A Path-Planning Method Based on Improved Soft Actor-Critic Algorithm for Mobile Robots
Abstract
1. Introduction
- In this study, we present a deep reinforcement learning method based on the soft actor–critic (SAC) framework for path planning in unknown environments. The algorithm handles a continuous action space and is trained off-policy. Maximum entropy is introduced to mitigate convergence to local optima and to improve robustness to disturbances (the standard maximum-entropy objective is written out after this list).
- Furthermore, the hindsight experience replay (HER) algorithm is incorporated to address the sparse rewards and slow training typical of goal-oriented reinforcement learning. HER recomputes reward values so that the experience gained from unsuccessful episodes is still used effectively during training (a minimal goal-relabeling sketch also follows this list).
- Third, path-planning simulation experiments on environment maps verify that the proposed HER-SAC algorithm plans paths effectively and improves the training speed and convergence of the algorithm.
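For reference, the maximum-entropy objective that SAC optimizes can be written as follows. This is the standard formulation from the SAC literature rather than the exact notation used in this paper; α is the temperature that weights the entropy bonus against the reward.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\Bigl[ r(s_t, a_t) + \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr],
\qquad
\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)
  = -\,\mathbb{E}_{a_t \sim \pi}\bigl[\log \pi(a_t \mid s_t)\bigr]
```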
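As an illustration of the HER mechanism described above, the sketch below relabels the goals of a finished episode with states that were actually reached and recomputes the rewards. The transition fields and the reward function are hypothetical placeholders, not the implementation used in the paper.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Relabel transitions of a finished episode with achieved goals.

    episode   : list of dicts with keys 'state', 'action', 'next_state', 'goal'
    reward_fn : callable (next_state, goal) -> float, recomputes the reward
    k         : number of future achieved states sampled as substitute goals
    """
    relabeled = []
    for t, tr in enumerate(episode):
        # Keep the original transition with its original goal.
        relabeled.append({**tr, "reward": reward_fn(tr["next_state"], tr["goal"])})
        # "Future" strategy: pick up to k later states of the same episode
        # and pretend they were the goal all along.
        future = episode[t:]
        for fut in random.sample(future, min(k, len(future))):
            new_goal = fut["next_state"]
            relabeled.append({
                "state": tr["state"],
                "action": tr["action"],
                "next_state": tr["next_state"],
                "goal": new_goal,
                "reward": reward_fn(tr["next_state"], new_goal),  # recomputed reward
            })
    return relabeled
```

Transitions produced this way go into the same replay buffer that the SAC updates sample from, which is how failed episodes still yield informative, non-sparse reward signals.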
2. Path-Planning Algorithm
2.1. Reinforcement Learning
2.2. Soft Actor-Critic (SAC) Algorithm
2.2.1. Maximizing Entropy
2.2.2. Soft Policy Iteration
2.2.3. Soft Actor-Critic
3. Improvement of SAC Algorithm
4. Path-Planning Simulation Experiment
4.1. Simulation Experiment Environment
4.2. Comparison of Algorithms
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Patle, B.K.; Babu, L.G.; Pandey, A.; Parhi, D.R.K.; Jagadeesh, A. A review: On path planning strategies for navigation of mobile robot. Def. Technol. 2019, 15, 582–606.
- Sanchez-Ibanez, J.R.; Perez-del-Pulgar, C.J.; Garcia-Cerezo, A. Path Planning for Autonomous Mobile Robots: A Review. Sensors 2021, 21, 7898.
- Zhan, S.; Zhang, T.; Lei, H.; Yin, Q.; Ali, L. Research on Path Planning of Mobile Robot Based on Deep Reinforcement Learning. In Big Data and Security. ICBDS 2020. Communications in Computer and Information Science; Springer: Singapore, 2021; Volume 1415.
- Li, C.; Huang, X.; Ding, J.; Song, K.; Lu, S. Global path planning based on a bidirectional alternating search A* algorithm for mobile robots. Comput. Ind. Eng. 2022, 168, 108123.
- Sedeno-noda, A.; Colebrook, M. A biobjective Dijkstra algorithm. Eur. J. Oper. Res. 2019, 276, 106–118.
- Adiyatov, O.; Varol, H.A. A Novel RRT*-Based Algorithm for Motion Planning in Dynamic Environments. In Proceedings of the 2017 IEEE International Conference on Mechatronics and Automation (ICMA), Takamatsu, Japan, 6–9 August 2017; pp. 1416–1421.
- Li, Q.; Xu, Y.; Bu, S.; Yang, J. Smart Vehicle Path Planning Based on Modified PRM Algorithm. Sensors 2022, 22, 6581.
- Yu, K.; Lee, M.; Chi, S. Dynamic Path Planning Based on Adaptable Ant Colony Optimization Algorithm. In Proceedings of the 2017 Sixth International Conference on Future Generation Communication Technologies (FGCT), Dublin, Ireland, 21–23 August 2017; pp. 60–66.
- Lamini, C.; Benhlima, S.; Elbekri, A. Genetic Algorithm Based Approach for Autonomous Mobile Robot Path Planning. Procedia Comput. Sci. 2018, 127, 180–189.
- Deepak, B.B.V.L.; Parhi, D.R.; Raju, B.M.V.A. Advance Particle Swarm Optimization-Based Navigational Controller for Mobile Robot. Arab. J. Sci. Eng. 2014, 39, 6477–6487.
- Agirrebeitia, J.; Avilés, R.; de Bustos, I.F.; Ajuria, G. A new APF strategy for path planning in environments with obstacles. Mech. Mach. Theory 2005, 40, 645–658.
- Liu, T.; Yan, R.; Wei, G.; Sun, L. Local Path Planning Algorithm for Blind-Guiding Robot Based on Improved DWA Algorithm. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; pp. 6169–6173.
- Wang, J.; Luo, Y.; Tan, X. Path Planning for Automatic Guided Vehicles (AGVs) Fusing MH-RRT with Improved TEB. Actuators 2021, 10, 314.
- Sung, I.; Choi, B.; Nielsen, P. On the training of a neural network for online path planning with offline path planning algorithms. Int. J. Inf. Manag. 2021, 57, 102142.
- Polydoros, A.S.; Nalpantidis, L. Survey of Model-Based Reinforcement Learning: Applications on Robotics. J. Intell. Robot. Syst. 2017, 86, 153–173.
- Duguleana, M.; Mogan, G. Neural networks based reinforcement learning for mobile robots obstacle avoidance. Expert Syst. Appl. 2016, 62, 104–115.
- Maoudj, A.; Hentout, A. Optimal path planning approach based on Q-learning algorithm for mobile robots. Appl. Soft Comput. 2020, 97, 106796.
- Pei, M.; An, H.; Liu, B.; Wang, C. An Improved Dyna-Q Algorithm for Mobile Robot Path Planning in Unknown Dynamic Environment. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 4415–4425.
- Wen, S.; Jiang, Y.; Cui, B.; Gao, K.; Wang, F. A Hierarchical Path Planning Approach with Multi-SARSA Based on Topological Map. Sensors 2022, 22, 2367.
- Yang, Y.; Li, J.; Peng, L. Multi-robot path planning based on a deep reinforcement learning DQN algorithm. CAAI Trans. Intell. Technol. 2020, 5, 177–183.
- Yang, X.; Shi, Y.; Liu, W.; Ye, H.; Zhong, W.; Xiang, Z. Global path planning algorithm based on double DQN for multi-tasks amphibious unmanned surface vehicle. Ocean Eng. 2022, 266, 112809.
- Sasaki, Y.; Matsuo, S.; Kanezaki, A.; Takemura, H. A3C Based Motion Learning for an Autonomous Mobile Robot in Crowds. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019; pp. 1036–1042.
- Chen, P.; Pei, J.; Lu, W.; Li, M. A deep reinforcement learning based method for real-time path planning and dynamic obstacle avoidance. Neurocomputing 2022, 497, 64–75.
- Xu, Y.; Wei, Y.; Jiang, K.; Chen, L.; Wang, D.; Deng, H. Action decoupled SAC reinforcement learning with discrete-continuous hybrid action spaces. Neurocomputing 2023, 537, 141–151.
- Tian, S.; Li, Y.; Zhang, X.; Zheng, L.; Cheng, L.; She, W.; Xie, W. Fast UAV path planning in urban environments based on three-step experience buffer sampling DDPG. Digit. Commun. Netw. 2023.
- Gao, J.; Ye, W.; Guo, J.; Li, Z. Deep Reinforcement Learning for Indoor Mobile Robot Path Planning. Sensors 2020, 20, 5493.
- Cheng, X.; Zhang, S.; Cheng, S.; Xia, Q.; Zhang, J. Path-Following and Obstacle Avoidance Control of Nonholonomic Wheeled Mobile Robot Based on Deep Reinforcement Learning. Appl. Sci. 2022, 12, 6874.
- Andrychowicz, M.; Wolski, F.; Ray, A.; Schneider, J.; Fong, R.; Welinder, P.; McGrew, B.; Tobin, J.; Abbeel, P.; Zaremba, W. Hindsight Experience Replay. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
Path-Planning Algorithm | Advantages | Drawbacks | Complexity |
---|---|---|---|
Q-learning | Simple and easy to implement, and works well for discrete state and action spaces | Requires a state–action–reward transition table, and does not apply to continuous state and action space problems | The complexity of the algorithm increases as the state and action spaces grow
Dyna-Q | Combines model learning and reinforcement learning to improve learning efficiency and stability | Requires additional computational and storage overhead to maintain the environment model | Higher time complexity, proportional to the number of model-learning and planning steps
SARSA | Easy to implement, and better results for discrete state and action space problems | Not applicable for continuous state and action space problems | Depends on the size of the state–action space |
DQN | Handles continuous, high-dimensional state spaces, is expressive, and can be trained off-policy | Unstable training process, long training time, and large sample requirements | Depends on the level of dimensionality, neural network size, number of iterations, and buffer size
A3C | Parallelization of training, fast convergence, and adaptability | Training is unstable and requires a large number of training samples | Depends on the level of dimensionality, the size of the neural network, the number of iterations, and the size of the parallelized training |
PPO | Fast convergence, efficient use of samples, and good algorithmic stability | Sensitive to parameter selection and demands many training samples | Depends on the level of dimensionality, the size of the neural network, the number of iterations, and the number of trajectories sampled
SAC | Applicable to path planning in high-dimensional space, adaptive adjustment of parameters, and good robustness | Longer training time, large sample data size, and may require more computational resources for complex tasks | Depends on neural network size and number of iterations |
HER-SAC | Solves sparse rewards, improves convergence and stability, and can handle complex tasks | Requires additional computation and storage overhead | Depends on the level of dimensionality, neural network size, number of iterations, and buffer size |
DDPG | Applicable to continuous state and action space problems, with good convergence and performance | Sensitive to initial conditions, long training time, and hyperparameters need tuning | Depends on the size of the neural network and the number of training iterations
DDPG-HL | Able to handle hierarchical (multi-level) problems, improving learning efficiency and performance | Requires additional computational and storage overhead to maintain the hierarchy | Depends on the size of the neural network, the number of training iterations, and the number of levels
Action Space | State Space | Action Bound |
---|---|---|
2 | 4 |
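Reading the table above, the robot uses a 2-dimensional continuous action and a 4-dimensional state. A Gym-style declaration of such spaces might look like the sketch below; the `gymnasium` package is used purely for illustration, and `action_bound` is left as a parameter because its value is not given here. The state bounds are placeholders, not taken from the paper.

```python
import numpy as np
from gymnasium import spaces

def make_spaces(action_bound: float):
    # 2-D continuous action with a symmetric bound, per the table above.
    action_space = spaces.Box(low=-action_bound, high=action_bound,
                              shape=(2,), dtype=np.float32)
    # 4-D continuous state; unbounded here only as a placeholder.
    observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                   shape=(4,), dtype=np.float32)
    return action_space, observation_space
```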
Description | Parameter | Value |
---|---|---|
Actor network learning rate | actor_lr | 3 × 10⁻⁴
Critic network learning rate | critic_lr | 3 × 10⁻³
Entropy temperature (α) learning rate | alpha_lr | 3 × 10⁻⁴
Hidden layer dimensions | hidden_dim | 128 |
Discount factor | gamma | 0.98 |
Soft update parameters | tau | 0.005 |
Buffer size | buffer_size | 10,000 |
Minimal size | minimal_size | 500 |
Batch size | batch_size | 64 |
Total training episodes | num_episodes | 1000 |
Minimal training episodes | minimal_episodes | 200 |
Target entropy | target_entropy | −0.1 |
Number of training samples | n_train | 20 |
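The hyperparameters listed above map naturally onto a configuration object for a SAC implementation. The dataclass below simply collects the values from the table; the commented-out `SACAgent` constructor is a hypothetical stand-in for whichever SAC implementation is used, not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class SACConfig:
    # Values taken directly from the hyperparameter table above.
    actor_lr: float = 3e-4        # actor network learning rate
    critic_lr: float = 3e-3       # critic network learning rate
    alpha_lr: float = 3e-4        # entropy temperature learning rate
    hidden_dim: int = 128         # hidden layer width
    gamma: float = 0.98           # discount factor
    tau: float = 0.005            # soft target-update coefficient
    buffer_size: int = 10_000     # replay buffer capacity
    minimal_size: int = 500       # transitions required before updates begin
    batch_size: int = 64
    num_episodes: int = 1000
    minimal_episodes: int = 200
    target_entropy: float = -0.1
    n_train: int = 20             # training samples (updates) per step

config = SACConfig()
# agent = SACAgent(config)  # hypothetical constructor; depends on the SAC implementation
```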
Algorithm | Environment | Training Episodes | Convergence Start (Episode) | Final Convergence (Episode) | Path Length |
---|---|---|---|---|---|
DDPG | Map1 | 1000 | 200 | / | 21.26 |
SAC | Map1 | 1000 | 241 | 800 | 21.31 |
HER-SAC | Map1 | 1000 | 200 | 264 | 21.31 |
DDPG | Map2 | 1000 | 200 | 850 | 21.89 |
SAC | Map2 | 1000 | 204 | 632 | 21.90 |
HER-SAC | Map2 | 1000 | 213 | 300 | 21.90 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, T.; Wang, M.; Zhao, Q.; Zheng, X.; Gao, H. A Path-Planning Method Based on Improved Soft Actor-Critic Algorithm for Mobile Robots. Biomimetics 2023, 8, 481. https://doi.org/10.3390/biomimetics8060481