Abstract
Multi-agent systems are increasingly utilized in the research community and industry, as they can complete tasks faster and more efficiently than single-agent systems. Therefore, in this paper, we present an approach to the multi-agent navigation problem in simply connected workspaces. The task requires each agent to reach its destination from an initial position along an optimal, collision-free trajectory. To achieve this, we design a decentralized control protocol, defined by a navigation function, in which each agent is equipped with a navigation controller that resolves imminent safety conflicts with the other agents, as well as with the workspace boundary, without requiring knowledge of the other agents' goal positions. Our approach is rendered sub-optimal, since each agent follows a predetermined optimal policy calculated by a novel off-policy iterative method; we adopt this scheme because the computational complexity of learning-based methods that compute the globally optimal solution becomes unrealistic as the number of agents increases. To evaluate our approach, we examine how much the resulting sub-optimal trajectories deviate from the optimal ones and how the time the multi-agent system needs to accomplish its task changes as we increase the number of agents. Finally, we compare our results with those of a discrete centralized policy method, the Multi-Agent Poli-RRT* algorithm, to demonstrate the validity of our method against other research algorithms.
1. Introduction
Motion Planning (MP) is one of the fundamental problems in robotics, and a great number of control algorithms have been proposed to solve it. The main purpose is to find safe and, to the greatest extent possible, optimal trajectories, either for a single mobile robot/agent (Single-Agent System—SAS) or for multiple mobile robots/agents (Multi-Agent System—MAS), that guide the robot/agents from any initial point of the workspace to their destination. In the research community and industry, controlling MASs is a state-of-the-art task, as these systems can complete tasks faster and more efficiently than a SAS. Despite their effectiveness, developing control policies that result in suitable trajectories with minimum cost (in terms of the agents' path length and input energy) is extremely challenging.
There are two control policies used to navigate a MAS: the centralized policy (CP), in which the agents are treated as a single system and every agent is guided by one global controller, and the decentralized policy (DP), in which each agent has its own control policy and exchanges information with the others in order to reach its destination safely and without collisions. Designing DP controllers is more challenging than designing centralized ones, since the CP simplifies the problem and the global controller has full information about the environment and the goal positions of all agents. In DP, on the other hand, the constraint of inter-agent communication plays a vital role in the control design (for more information about the differences between CP and DP, the reader is referred to [1]). There is an abundance of studies that utilize these policies to navigate the agents efficiently, mostly in the fields of exploration [2,3,4,5,6,7] and formation [8,9,10,11,12,13].
Furthermore, numerous researchers have proposed methods for navigating mobile robots to a final position; see [14,15,16,17,18]. Some of these studies use a non-degenerate navigation function [19], creating a potential field whose negated gradient is attractive towards the goal configuration and repulsive with respect to obstacles and the workspace boundary. The first studies to apply this function to navigate mobile agents to their destinations without conflicts are [20,21], which describe a CP and a DP, respectively. We note that this type of controller does not provide strictly global navigation, because a vector field on a manifold with boundary that has a unique attractive point must have at least as many saddle points as obstacles [19]. Related multi-agent navigation function methods can be found in [22,23,24,25,26,27].
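For concreteness, the navigation function construction of [19] on a sphere world is typically written as follows (the symbols below are illustrative reminders, not the notation used later in this paper):

```latex
\varphi(q) \;=\; \frac{\gamma_d(q)}{\big(\gamma_d^{\,k}(q) + \beta(q)\big)^{1/k}},
\qquad
\gamma_d(q) = \lVert q - q_d \rVert^{2},
\qquad
\beta(q) = \prod_{j=0}^{M} \beta_j(q),
```

where the term \(\gamma_d\) is the squared distance to the goal \(q_d\), the functions \(\beta_j\) measure the clearance from the workspace boundary (\(j=0\)) and the obstacles, and \(k>0\) is a tuning parameter; for sufficiently large \(k\), following the negated gradient \(-\nabla\varphi\) steers the agent to \(q_d\) from almost every initial configuration.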
As mentioned earlier, the ultimate goal in MAS navigation is to design methods that find the optimal path for the agents to follow. For a SAS, there are discrete methods (DM) and continuous methods (CM) that accomplish this task. Among the DM, the RRT* algorithm and its various versions [28,29] incrementally modify the robot's path in order to decrease its length. The CM rely on Reinforcement Learning (RL)-based methods that use iterative algorithms to converge to an optimal navigation policy [30,31,32]. In MAS, it is common to use methods based on Deep Reinforcement Learning (DRL), a powerful tool combining neural networks and RL algorithms that allows each agent to learn from its interactions with the environment [33,34,35,36,37,38]. Despite the effectiveness of RL-based methods, their main disadvantage in MAS is the computational complexity and the abundance of data required to converge to the global policy.
Therefore, in this paper, we introduce a sub-optimal approach for navigating a MAS in planar simply connected workspaces (i.e., workspaces without any internal obstacles). We refer to the approach as sub-optimal because we first compute the optimal navigation policy of each agent separately, using a novel off-policy iterative method [30] that avoids the computational complexity of solving the global MAS problem, and subsequently design a DP controller for multi-agent navigation based on the negated gradient of a navigation function from [19]. We apply a DP method to our MAS problem because every agent should operate independently of the others; each agent possesses predetermined knowledge of its own optimal navigation and follows this policy to its destination whenever it is isolated from the others. Additionally, we adopt the approach of [21,26], referred to above, to design the collision-avoidance controller of each agent; this approach also demands fewer numerical operations than the CP of [20], thus relaxing the requirements of the solution. Hence, the main contributions of this work are summarized as follows:
- We navigate each agent with a sub-optimal policy to its destination. To the best of our knowledge, this is the first work based on artificial potential fields that introduces optimality within a multi-agent navigation framework.
- No collisions with nearby agents or the workspace boundary occur.
- Only knowledge of the current positions of nearby agents, and not of their destinations, is required.
- The complexity is rendered linear with respect to the number of agents and, if combined with the recent work [39], may be rendered constant.
Additionally, to demonstrate the efficiency of the proposed controller, our simulation study examines how much the sub-optimal trajectory of each agent deviates from the optimal SAS trajectory, how much time the overall system needs to accomplish its task, and how this time changes as we increase the number of agents starting simultaneously from their initial positions. We also compare against the results obtained with the Multi-Agent Poli-RRT* method proposed in [40], in order to validate our approach.
The rest of the paper is organized as follows: Section 2 formulates the problem. Section 3 describes the proposed decentralized navigation policy. Section 4 provides the proof of correctness of our approach. Section 5 presents the simulation results and Section 6 discusses them. Finally, Section 7 concludes the work and proposes future research directions.
2. Problem Formulation
Let us assume that we have a sphere manifold (ball set) . Inside this set, we define a simply connected workspace and its boundary, which is given by . There are m mobile agents (or robots) in W, with each one of them occupying a disk , where , denote the center and radius of the disk, respectively. The configuration space is spanned by . The initial position vector is defined as and the desired destination vector as . The motion of each agent follows a single integrator model:
where denotes the velocity command applied to each agent. In particular, it will be a sub-optimal decentralized control policy that navigates the agent to its destination while preventing any conflicts/collisions with the other agents and the workspace boundary. Moreover, the controller possesses knowledge of the current positions of other nearby agents, but not of their destinations. Thus, every agent perceives the others as moving obstacles in W.
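To make the kinematic model concrete, the following minimal Python sketch integrates the single-integrator dynamics of one agent under a generic velocity command. The function name and the simple proportional policy in the usage example are illustrative placeholders, not the controller designed in this paper.

```python
import numpy as np

def simulate_single_integrator(q0, policy, q_goal=None, dt=0.01, t_max=30.0, tol=1e-3):
    """Forward-Euler integration of the single-integrator model dq/dt = u for one agent.

    q0     : initial position (length-2 array-like)
    policy : callable mapping the current position q to a velocity command u
             (placeholder for the sub-optimal decentralized controller)
    """
    q = np.asarray(q0, dtype=float)
    trajectory = [q.copy()]
    for _ in range(int(t_max / dt)):
        u = np.asarray(policy(q), dtype=float)   # velocity command applied to the agent
        q = q + dt * u                           # Euler update of dq/dt = u
        trajectory.append(q.copy())
        if q_goal is not None and np.linalg.norm(q - q_goal) < tol:
            break                                # destination reached
    return np.array(trajectory)

# Illustrative usage with a naive proportional policy (not the paper's controller):
q_goal = np.array([1.0, 2.0])
traj = simulate_single_integrator([0.0, 0.0], policy=lambda q: -(q - q_goal), q_goal=q_goal)
```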
Moreover, we assume that the workspace is spacious and that, at every point of it, each agent is capable of avoiding any conflict, since in this paper we mainly focus on the functionality and the appropriate design of the sub-optimal control policy. Thus, we exclude narrowly connected workspaces, in which the agents need global coordination to navigate towards their destinations and alleviate blocking issues, as in [41]; we will address this issue in future work. Finally, we define the following infinite-horizon cost function for each agent:
where is the starting point of the agent in W. Additionally, we define the state-related cost term Q and the input-related cost term R, respectively:
where is a weight parameter and denotes the Euclidean norm. The metric (2), along with (3) and (4), forms a classical cost function from Optimal Regulation theory [42]. The state-related term (3) expresses the minimization of the settling time of the system (1), while the input-related term (4) penalizes the Euclidean norm of the control input, which allows us to quantify the energy expenditure of the system (1). Furthermore, the value of the weight plays a vital role in the final value of the metric (2); it is chosen according to which specification (reduced settling time or reduced input energy) we wish to pass on to our system.
In summary, the aforementioned metric examines two factors: the first is the amount of time that the whole MAS needs to reach the goal configuration, and the second is the energy of the control input signal spent by each agent in order to successfully navigate the workspace. Thus, the objective of this work is to design appropriate velocity control policies such that collisions among the agents and with the workspace boundary are safely avoided and the overall cost
is rendered as small as possible for any , .
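As a minimal numerical illustration of how such a cost can be evaluated along a discretized trajectory, consider the Python sketch below. The specific forms assumed here for the state-related and input-related terms (a constant weight whose integral measures elapsed time, and the squared Euclidean norm of the input whose integral measures input energy) are our illustrative choices, not necessarily the exact terms (3) and (4).

```python
import numpy as np

def trajectory_cost(controls, dt, w=1.0):
    """Accumulate a cost of the form (2) along a finite, discretized trajectory.

    Illustrative choices (assumptions, not the paper's exact terms):
    state-related term Q := w        -> its time integral measures the settling time;
    input-related term R := ||u||^2  -> its time integral measures the input energy.
    """
    time_cost = w * dt * len(controls)                             # integral of Q dt
    energy_cost = dt * sum(float(np.dot(u, u)) for u in controls)  # integral of ||u||^2 dt
    return time_cost + energy_cost

# Usage with the integration sketch above (finite differences recover the applied inputs):
# controls = np.diff(traj, axis=0) / 0.01
# J = trajectory_cost(controls, dt=0.01)
```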
4. Proof of Correctness
Let . Define a ball . According to [20], we distinguish the following regions for the function :
- The destination point .
- The free space boundary: .
- The set near collisions: .
- The set away from collision: .
The following theorem examines the properties of the proposed decentralized optimal navigation protocol via an overall multi-agent Navigation Function.
Theorem 1.
Let be intervals and let and be analytic. Define the composition as . If σ is monotonically increasing on , then the set of critical points of and φ coincide and the (Morse) index of each critical point is identical.
Proof.
The proof proceeds similarly to [19]. Since we have a connected workspace and a well-defined function , we first present some results for the navigation function [20,26], which establish that the goal configurations are attainable without collisions and that there will always be a direction of movement decreasing the overall potential function. More specifically,
- Since the workspace is connected, the destination point is a non-degenerate local minimum of .
- All critical points of are in the interior of the free space.
- For every , there exists a positive integer such that if , then there are no critical points of in .
- There exists an , such that has no local minimum in , as long as .
Nevertheless, the aforementioned properties do not, on their own, guarantee global convergence of the system state to the goal configurations. To achieve such convergence, we define a time-invariant Lyapunov function for the overall system with respect to the positions of all agents, namely the sum of the agents' navigation functions. Differentiating this candidate Lyapunov function along the trajectories of the multi-agent system (i.e., substituting the navigation protocol (7)), we obtain a negative semi-definite derivative. Thus, the value of the Lyapunov function decreases (hence no collisions occur) and, moreover, the system state converges to the largest invariant set of the system, which is composed of the desired configuration and the remaining equilibria, which, for sufficiently large parameters, attain a domain of attraction of measure zero. Therefore, the system trajectories converge safely to the desired configuration when initialized from almost everywhere within the workspace. □
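To illustrate the structure of this argument, the simplified (centralized) version of the key step can be written as follows, under the assumption that each agent moves along the negated partial gradient of the summed potential; this is only a schematic of the reasoning, since the decentralized protocol (7) requires the finer analysis of [21,26]:

```latex
V(q_1,\dots,q_m) \;=\; \sum_{i=1}^{m} \varphi_i , \qquad
\dot q_i \;=\; -\,\frac{\partial V}{\partial q_i}
\;\;\Longrightarrow\;\;
\dot V \;=\; \sum_{i=1}^{m} \Big(\frac{\partial V}{\partial q_i}\Big)^{\!\top}\dot q_i
      \;=\; -\sum_{i=1}^{m} \Big\lVert \frac{\partial V}{\partial q_i}\Big\rVert^{2} \;\le\; 0 .
```

By LaSalle's invariance principle, the state then converges to the largest invariant set contained in \(\{\dot V = 0\}\), i.e., the critical points of \(V\), of which only the desired configuration retains a non-trivial domain of attraction for sufficiently large tuning parameters.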
5. Results
In this section, we present the simulation results of the proposed method. We use 8 mobile holonomic agents with radius , which lie in a workspace as shown in Figure 4, with the initial and desired positions given in Table 1. We also run the same simulations with the Multi-Agent Poli-RRT* method to compare against our method's results; a short description of this comparison method is provided in the next subsection. For the simulations, we use the programming and numeric computing platform MATLAB, version R2022b, on a computer with an Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz (up to 2.90 GHz) and 8 GB of RAM (Intel, Santa Clara, CA, USA).
Figure 4.
Workspace and the initial (symbol I) and desired (symbol G) position of each agent.
Table 1.
The initial and desired position of each agent.
5.1. Multi-Agent Poli-RRT* Algorithm
The Multi-Agent Poli-RRT* method was first proposed in [40]. In that work, the authors extended the Poli-RRT* algorithm [47] to a non-holonomic multi-agent system in which the agents move in a workspace with internal obstacles, by adopting a priority-based approach. That is, the agents are ranked according to a priority criterion and the algorithm finds trajectories in sequence, starting from the highest-priority agent and moving to the lowest-priority one. Furthermore, in each iteration, when their turn comes, the lower-priority agents learn about the trajectories of the higher-priority agents from an updated list, which contains the edges of the agents' trajectories and the time intervals during which they occupy them. With this structure, the agents can prevent possible collisions and maintain a safe distance from the previously planned agents. The aforementioned algorithm can easily be adjusted to our problem, and we adapt the agents' controller to be similar to (7), as sketched below.
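A minimal Python sketch of this priority-based loop is given below; `plan_single_agent` is a hypothetical placeholder for the single-agent Poli-RRT* planner, and the data layout of the reservation list is our own illustrative choice, not the authors' implementation.

```python
def multi_agent_priority_planning(agents, plan_single_agent):
    """Schematic sketch of the priority-based scheme described above.

    agents            : agent descriptors sorted from highest to lowest priority
    plan_single_agent : hypothetical placeholder for the single-agent Poli-RRT*
                        planner; it receives the (edge, time_interval) pairs already
                        reserved by higher-priority agents and returns the planned
                        trajectory of the current agent as such a list.
    """
    reserved = []          # space-time edges claimed by higher-priority agents so far
    trajectories = {}
    for agent in agents:                       # highest-priority agent plans first
        timed_edges = plan_single_agent(agent, reserved)
        trajectories[agent] = timed_edges
        reserved.extend(timed_edges)           # updated list seen by lower priorities
    return trajectories
```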
5.2. Simulations
Before we define the sub-optimal control law for each agent, we use the off-policy method to determine the optimal cost function of each agent. Because we want to attach equal importance to the minimization of the settling time and of the input energy of the system (1), we set the . In Figure 5, we illustrate the optimal vector fields of each agent using the method presented in [30]. Additionally, in Table 2 and Table 3, we present the time and cost value of each optimal policy, respectively.
Figure 5.
Optimal vector fields of each agent. The red-filled circle showcases the goal position. (a) Agent 1. (b) Agent 2. (c) Agent 3. (d) Agent 4. (e) Agent 5. (f) Agent 6. (g) Agent 7. (h) Agent 8.
Table 2.
Time duration of each agent and whole system, with sub-optimal control method and Multi-Agent Poli-RRT* method.
Table 3.
Cost value of each agent and whole system, with sub-optimal control method and Multi-Agent Poli-RRT* method.
As mentioned earlier, to demonstrate the efficiency of the proposed sub-optimal controller, in this simulation study we examine how much the trajectory of each agent deviates from the optimal SAS trajectory and how much time the global system takes to accomplish its task as we increase the number of agents starting simultaneously from their initial positions. The goal of the aforementioned MAS is for all agents to reach their destinations, starting from the above initial positions and following optimal collision-free trajectories. To this end, we ran three simulations: in the first, a group of two agents started towards their destinations; in the second, a group of four started together; and in the last, all the agents started together. We assume that an agent interacts only with the members of its group and does not affect the navigation of the agents outside the group. Notably, the comparison method was implemented using the same properties. For each simulation, we chose the group of agents that would interact and the parameters of the sub-optimal control law ().
For the simulations, we categorized the agents into the following groups:
- Two members , , and .
- Four members and .
- Eight members (all the agents).
For the Multi-Agent Poli-RRT* method, the first agent has the highest priority and the last one has the lowest.
As parameters of the sub-optimal control law for each simulation, we use the following values:
- Simulation 1: . https://www.youtube.com/watch?v=-qLbfTVryj8 (accessed on 31 March 2024)
- Simulation 2: . https://www.youtube.com/watch?v=M2rhUSAz1w0 (accessed on 31 March 2024)
- Simulation 3: . https://www.youtube.com/watch?v=Z__lYbZY7O0 (accessed on 31 March 2024)
We note that, in Simulation 1, the variables are omitted, because the complementary relations of each agent are not defined when only two agents interact. Moreover, the selected values are the best ones we found, yielding the best sub-optimal collision-free trajectory for each agent. Finally, the accompanying videos of the simulations are played in slow motion in order to make the collision avoidance property observable.
In Table 2, we list the time duration each agent needs to accomplish its task, and in Table 3 we list the cost value of each agent for each simulation; the last row of each table states the time duration and cost value of the global system. The results of both navigation methods are shown in these tables. Finally, Figure 6 shows the optimal (SAS) trajectory and the sub-optimal trajectory of each agent in each simulation, and Figure 7 presents the trajectories of the two navigation methods during Simulation 3.
Figure 6.
The Optimal (solid blue line), Simulation 1 (dashed red line), Simulation 2 (dashed purple line) and Simulation 3 (dashed magenta line) trajectories of each agent. The green-filled circle showcases the initial position and the red-filled circle showcases the goal position. (a) Agent 1. (b) Agent 2. (c) Agent 3. (d) Agent 4. (e) Agent 5. (f) Agent 6. (g) Agent 7. (h) Agent 8.
Figure 7.
Agents’ optimal navigation (solid blue line) and their trajectories calculated with the sub-optimal navigation method (dashed red line) and the Multi-Agent Poli-RRT* method (dashed purple line) in Simulation 3. The green-filled circle showcases the initial position and the red-filled circle the goal position. (a) Agent 1. (b) Agent 2. (c) Agent 3. (d) Agent 4. (e) Agent 5. (f) Agent 6. (g) Agent 7. (h) Agent 8.
6. Discussion
In Figure 6, we provide the agents' trajectories during the above simulations; specifically, we see the optimal (solid blue line) and sub-optimal (dashed lines) trajectories of each agent separately. In all simulations, at the start and the end of its trajectory, every agent stays close to its optimal navigation. On the contrary, in the center of the workspace, where the group of agents meets, each agent deviates from its optimal trajectory or remains idle for a short time period in order to avoid conflicts with the other agents. As a result, the sub-optimal cost value increases as the number of group members grows. This behavior follows from tuning the parameters of the sub-optimal controller according to Theorem 1, which establishes convergence from almost everywhere in the workspace. Had we not selected proper parameter values, we would have observed the agents remaining idle in the center of the workspace without accomplishing their tasks, owing to the presence of a stable unwanted equilibrium.
In comparison with the Multi-Agent Poli-RRT* method, Table 2 and Table 3 showcase the global efficiency of our method. We also observe the effect of the priority-based scheme of the comparison method: in Table 2 and Table 3, the highest-priority agents attain a smaller time duration and cost value than the respective agents under our method. The same result is observed in Figure 7, which presents the trajectories of the two methods during Simulation 3: the trajectories of the highest-priority agents (dashed purple line) remain closer to their optimal trajectories than the sub-optimal ones (dashed red line). However, as the updated list grows in each iteration, the lower-priority agents start to deviate from their optimal trajectories. For the above reasons, our approach remains globally effective in comparison with the above priority-based algorithm.
Moreover, in the aforementioned results, we observe the expected trade-off between total time duration and cost value: as we increase the number of agents interacting in the workspace, the total time the agents require to accomplish their tasks decreases, while the total cost value of the MAS increases, since the agents deviate from their optimal trajectories. Figure 8 shows the decrease in the total time duration (blue lines) and the percentage rise of the total cost value relative to the global optimal cost value (red lines) for the sub-optimal navigation (solid lines) and Multi-Agent Poli-RRT* (dashed lines) methods, with the horizontal axis in logarithmic scale. This trade-off provides additional confirmation of the global efficiency of our method.
Figure 8.
Total time duration of the sub-optimal navigation method (solid blue line) and the Multi-Agent Poli-RRT* method (dashed blue line), and the percentage increase of the total cost value of the sub-optimal navigation method (solid red line) and the Multi-Agent Poli-RRT* method (dashed red line), according to the number of agents in the group.
7. Conclusions
In this paper, we proposed a sub-optimal approach for the navigation problem of a MAS in a simply connected workspace. We observed a trade-off between the time duration needed to fulfill the task and the overall cost value as the number of members of a group increases. Moreover, we compared this method with the Multi-Agent Poli-RRT* algorithm to demonstrate its validity with respect to other research algorithms. We adopted this approach, designing a sub-optimal decentralized control law, in order to avoid the computational complexity of RL-based methods. Despite the promising results of our study, several issues remain open. First of all, when using this method, the agents may be blocked by others if the workspace has narrow passages. Moreover, we cannot achieve global asymptotic navigation, as discussed in the previous sections [19]. In future research, we will extend this methodology so that it remains functional in narrow workspaces, coordinating the agents properly to ensure they reach their destinations, and we will utilize algorithms based on game theory to asymptotically approach the globally optimal navigation of the system.
Author Contributions
Conceptualization, D.K. and C.P.B.; methodology, D.K. and C.P.B.; software, D.K.; validation, D.K.; formal analysis, D.K. and C.P.B.; investigation, D.K. and C.P.B.; resources, C.P.B.; writing—original draft preparation, D.K.; writing—review and editing, C.P.B.; visualization, D.K.; supervision, C.P.B.; project administration, C.P.B.; funding acquisition, C.P.B. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the project “Applied Research for Autonomous Robotic Systems” (MIS 5200632) which is implemented within the framework of the National Recovery and Resilience Plan “Greece 2.0” (Measure: 16618—Basic and Applied Research) and is funded by the European Union—NextGenerationEU.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| MP | Motion Planning. |
| MAS | Multi-Agent System. |
| SAS | Single-Agent System. |
| CP | Centralized Policy. |
| DP | Decentralized Policy. |
| RRT | Rapidly-exploring Random Tree. |
| RL | Reinforcement Learning. |
| DRL | Deep Reinforcement Learning. |
| NF | Navigation Function. |
| AHPF | Artificial Harmonic Potential Field. |
| RPF | Relation Proximity Function. |
| RVF | Relation Verification Function. |
References
- Xuan, P.; Lesser, V.R. Multi-Agent Policies: From Centralized Ones to Decentralized Ones. In Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems, Bologna, Italy, 15–19 July 2002; pp. 1098–1105. [Google Scholar]
- Atinç, G.M.; Stipanovic, D.M.; Voulgaris, P.G. Supervised Coverage Control of Multi-Agent Systems. Automatica 2014, 50, 2936–2942. [Google Scholar] [CrossRef]
- Gul, F.; Mir, A.; Mir, I.; Mir, S.; Islaam, T.U.; Abualigah, L.; Forestiero, A. A Centralized Strategy for Multi-Agent Exploration. IEEE Access 2022, 10, 126871–126884. [Google Scholar] [CrossRef]
- Ota, J. Multi-Agent Robot Systems as Distributed Autonomous Systems. Adv. Eng. Inform. 2006, 20, 59–70. [Google Scholar] [CrossRef]
- Romeh, A.E.; Mirjalili, S. Multi-Robot Exploration of Unknown Space Using Combined Meta-Heuristic Salp Swarm Algorithm and Deterministic Coordinated Multi-Robot Exploration. Sensors 2023, 23, 2156. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Yu, J.; Tang, J.; Xu, Y.; Wang, Y. MR-TopoMap: Multi-Robot Exploration Based on Topological Map in Communication Restricted Environment. IEEE Robot. Autom. Lett. 2022, 7, 10794–10801. [Google Scholar] [CrossRef]
- Rooker, M.N.; Birk, A. Multi-Robot Exploration under the Constraints of Wireless Networking. Control Eng. Pract. 2007, 15, 435–445. [Google Scholar] [CrossRef]
- Alonso-Mora, J.; Baker, S.; Rus, D. Multi-Robot Formation Control and Object Transport in Dynamic Environments via Constrained Optimization. Int. J. Robot. Res. 2017, 36, 1000–1021. [Google Scholar] [CrossRef]
- Borate, S.S.; Vadali, M. FFRRT: A Sampling-Based Path Planner for Flexible Multi-Robot Formations. In Proceedings of the 2021 5th International Conference on Advances in Robotics, Kanpur, India, 30 June–4 July 2021; pp. 53:1–53:6. [Google Scholar]
- Chipade, V.S.; Panagou, D. Multiagent Planning and Control for Swarm Herding in 2D Obstacle Environments under Bounded Inputs. IEEE Trans. Robot. 2021, 37, 1956–1972. [Google Scholar] [CrossRef]
- López-Nicolás, G.; Aranda, M.; Mezouar, Y. Adaptive Multi-Robot Formation Planning to Enclose and Track a Target with Motion and Visibility Constraints. IEEE Trans. Robot. 2020, 36, 142–156. [Google Scholar] [CrossRef]
- Cheng, P.D.C.; Indri, M.; Possieri, C.; Sassano, M.; Sibona, F. Path Planning in Formation and Collision Avoidance for Multi-Agent Systems. Nonlinear Anal. Hybrid Syst. 2023, 47, 101293. [Google Scholar] [CrossRef]
- Tong, X.; Yu, S.; Liu, G.; Niu, X.; Xia, C.; Chen, J.; Yang, Z.; Sun, Y. A Hybrid Formation Path Planning Based on A* and Multi-Target Improved Artificial Potential Field Algorithm in the 2D Random Environments. Adv. Eng. Inform. 2022, 54, 101755. [Google Scholar] [CrossRef]
- Banyassady, B.; de Berg, M.; Bringmann, K.; Buchin, K.; Fernau, H.; Halperin, D.; Kostitsyna, I.; Okamoto, Y.; Slot, S. Unlabeled Multi-Robot Motion Planning with Tighter Separation Bounds. In Proceedings of the 38th International Symposium on Computational Geometry, Berlin, Germany, 7–10 June 2022; pp. 12:1–12:16. [Google Scholar]
- Boardman, B.L.; Harden, T.; Martínez, S. Multiagent Motion Planning with Sporadic Communications for Collision Avoidance. IFAC J. Syst. Control 2021, 15, 100126. [Google Scholar] [CrossRef]
- Petrović, L.; Marković, I.; Seder, M. Multiagent Gaussian Process Motion Planning via Probabilistic Inference. In Proceedings of the 12th IFAC Symposium on Robot Control, St Etienne, France, 17–19 May 2018; Volume 51, pp. 160–165. [Google Scholar]
- Vlantis, P.; Bechlioulis, C.P.; Kyriakopoulos, K.J. Navigation of Multiple Disk-Shaped Robots with Independent Goals within Obstacle-Cluttered Environments. Sensors 2023, 23, 221. [Google Scholar] [CrossRef] [PubMed]
- Zhu, H.; Alonso-Mora, J. Chance-Constrained Collision Avoidance for MAVs in Dynamic Environments. IEEE Robot. Autom. Lett. 2019, 4, 776–783. [Google Scholar] [CrossRef]
- Koditschek, D.E.; Rimon, E. Robot Navigation Functions on Manifolds with Boundary. Adv. Appl. Math. 1990, 11, 412–442. [Google Scholar] [CrossRef]
- Loizou, S.G.; Kyriakopoulos, K.J. Closed Loop Navigation for Multiple Holonomic Vehicles. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002; pp. 2861–2866. [Google Scholar]
- Zavlanos, M.M.; Kyriakopoulos, K.J. Decentralized Motion Control of Multiple Mobile Agents. In Proceedings of the 11th Mediterranean Conference on Control and Automation, Rhodes, Greece, 18–20 June 2003. [Google Scholar]
- Chen, J.; Dawson, D.M.; Salah, M.; Burg, T. Cooperative Control of Multiple Vehicles with Limited Sensing. Int. J. Adapt. Control Signal Process. 2006, 21, 115–131. [Google Scholar] [CrossRef]
- Loizou, S.G.; Kyriakopoulos, K.J. Closed Loop Navigation for Multiple Non-Holonomic Vehicles. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 14–19 September 2003; pp. 4240–4245. [Google Scholar]
- Loizou, S.G.; Lui, D.G.; Petrillo, A.; Santini, S. Connectivity Preserving Formation Stabilization in an Obstacle-Cluttered Environment in the Presence of Time-Varying Communication Delays. IEEE Trans. Autom. Control 2022, 67, 5525–5532. [Google Scholar] [CrossRef]
- Luis, R.; Tanner, H.G. Flocking, Formation Control, and Path Following for a Group of Mobile Robots. IEEE Trans. Control Syst. Technol. 2015, 23, 1268–1282. [Google Scholar]
- Dimarogonas, D.V.; Loizou, S.G.; Kyriakopoulos, K.J.; Zavlanos, M.M. A Feedback Stabilization and Collision Avoidance Scheme for Multiple Independent Non-point Agents. Automatica 2006, 42, 229–243. [Google Scholar] [CrossRef]
- Tanner, H.G.; Boddu, A. Multiagent Navigation Functions Revisited. IEEE Trans. Robot. 2012, 28, 1346–1359. [Google Scholar] [CrossRef]
- Karaman, S.; Frazzoli, E. Sampling-Based Algorithms for Optimal Motion Planning. Int. J. Robot. Res. 2011, 30, 846–894. [Google Scholar] [CrossRef]
- Véras, L.G.D.; Medeiros, F.L.; Guimarães, L.N.F. Systematic Literature Review of Sampling Process in Rapidly-Exploring Random Trees. IEEE Access 2019, 7, 50933–50953. [Google Scholar] [CrossRef]
- Rousseas, P.; Bechlioulis, C.P.; Kyriakopoulos, K.J. A Continuous Off-Policy Reinforcement Learning Scheme for Optimal Motion Planning in Simply-Connected Workspaces. In Proceedings of the IEEE International Conference on Robotics and Automation, London, UK, 29 May–2 June 2023; pp. 10247–10253. [Google Scholar]
- Vlachos, C.; Rousseas, P.; Bechlioulis, C.P.; Kyriakopoulos, K.J. Reinforcement Learning-Based Optimal Multiple Waypoint Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation, London, UK, 29 May–2 June 2023; pp. 1537–1543. [Google Scholar]
- Parras, J.; Apellániz, P.A.; Zazo, S. Deep Learning for Efficient and Optimal Motion Planning for AUVs with Disturbances. Sensors 2021, 21, 5011. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.F.; Liu, M.; Everett, M.; How, J.P. Decentralized Non-communicating Multi-Agent Collision Avoidance with Deep Reinforcement Learning. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 285–292. [Google Scholar]
- Benjeddou, A.; Mechbal, N.; Deü, J. Smart Structures and Materials: Vibration and Control. J. Vib. Control 2020, 26, 1109. [Google Scholar] [CrossRef]
- Everett, M.; Chen, Y.F.; How, J.P. Motion Planning among Dynamic, Decision-Making Agents with Deep Reinforcement Learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 3052–3059. [Google Scholar]
- Qie, H.; Shi, D.; Shen, T.; Xu, X.; Li, Y.; Wang, L. Joint Optimization of Multi-UAV Target Assignment and Path Planning Based on Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 146264–146272. [Google Scholar] [CrossRef]
- Sartoretti, G.; Kerr, J.; Shi, Y.; Wagner, G.; Kumar, T.S.; Koenig, S.; Choset, H. PRIMAL: Path-Finding via Reinforcement and Imitation Multi-Agent Learning. IEEE Robot. Autom. Lett. 2019, 4, 2378–2385. [Google Scholar] [CrossRef]
- Wan, K.; Wu, D.; Li, B.; Gao, X.; Hu, Z.; Chen, D. MEMADDPG: An Efficient Learning-Based Motion Planning Method for Multiple Agents in Complex Environments. Int. J. Intell. Syst. 2022, 37, 2393–2427. [Google Scholar] [CrossRef]
- Rousseas, P.; Bechlioulis, C.P.; Kyriakopoulos, K.J. Reactive Optimal Motion Planning to Anywhere in the Presence of Moving Obstacles. Int. J. Robot. Res. 2024, 02783649241245729. [Google Scholar] [CrossRef]
- Ragaglia, M.; Prandini, M.; Bascetta, L. Multi-Agent Poli-RRT* Optimal Constrained RRT-based Planning for Multiple Vehicles with Feedback Linearisable Dynamics. In Modelling and Simulation for Autonomous Systems, Proceedings of the Third International Workshop, MESAS 2016, Rome, Italy, 15–16 June 2016; Springer International Publishing: New York, NY, USA, 2016; pp. 261–270. [Google Scholar]
- Peasgood, M.; Clark, C.M.; McPhee, J. A Complete and Scalable Strategy for Coordinating Multiple Robots within Roadmaps. IEEE Trans. Robot. 2008, 24, 283–292. [Google Scholar] [CrossRef]
- Kalman, R.E. Contributions to the Theory of Optimal Control. Bol. Soc. Mat. Mex. 1960, 5, 102–109. [Google Scholar]
- Song, R.; Lewis, F.L.; Wei, Q.; Zhang, H. Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems with Disturbances. IEEE Trans. Cybern. 2016, 46, 1041–1050. [Google Scholar] [CrossRef] [PubMed]
- Rousseas, P.; Bechlioulis, C.P.; Kyriakopoulos, K.J. Trajectory Planning in Unknown 2D Workspaces: A Smooth, Reactive, Harmonics-Based Approach. IEEE Robot. Autom. Lett. 2022, 7, 1992–1999. [Google Scholar] [CrossRef]
- Ames, A.D.; Xu, X.; Grizzle, J.W.; Tabuada, P. Control Barrier Function Based Quadratic Programs for Safety Critical Systems. IEEE Trans. Autom. Control 2017, 62, 3861–3876. [Google Scholar] [CrossRef]
- Majdisova, Z.; Skala, V. Radial Basis Function Approximations: Comparison and Applications. Appl. Math. Model. 2017, 51, 728–743. [Google Scholar] [CrossRef]
- Ragaglia, M.; Prandini, M.; Bascetta, L. Poli-RRT*: Optimal RRT-based Planning for Constrained and Feedback Linearisable Vehicle Dynamics. In Proceedings of the 14th European Control Conference, Linz, Austria, 15–17 July 2015; pp. 2521–2526. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).