Research on AGV Path Planning Based on Improved DQN Algorithm
Abstract
1. Introduction
2. AGV Path Planning Problem
2.1. Description of Path Planning Problem
2.2. Action Space Design
2.3. State Space Design
2.4. Multi-Objective Reward Function Design
- The distance reward compares the Manhattan distance from the current state to the endpoint with that of the previous state. If the distance decreases, a reward proportional to the reduction is given; otherwise, this term is set to −1.
- The goal reward is given for reaching the endpoint. If the endpoint is reached, it is set to 50; otherwise, it is set to −1. This positive incentive encourages the agent to reach the endpoint faster and accelerates the convergence of training.
- The obstacle penalty covers two situations: colliding with an obstacle in the environment and moving outside the boundaries of the environment. If the agent collides with an obstacle or leaves the map, this term is set to −5; otherwise, it is set to 0. Penalizing collisions reduces the agent's ineffective exploration near obstacles, so the agent learns that avoiding them is the better strategy.
- The step penalty slightly penalizes each move and is set to −0.1. Its continuous accumulation motivates the agent to minimize the number of steps, prevents it from wandering around the map for a long time, and helps it reach the goal faster. A combined sketch of the four terms follows this list.
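Taken together, these four terms define the per-step reward. The snippet below is a minimal Python sketch of such a reward on a grid map with Manhattan distances; the function name `compute_reward`, the magnitude of the distance term, and the episode-termination handling are illustrative assumptions, not the paper's exact formulation.

```python
def manhattan(a, b):
    """Manhattan distance between two grid cells (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def compute_reward(prev_state, state, goal, obstacles, grid_size):
    """Per-step reward combining the four terms described above (illustrative sketch)."""
    reward = 0.0

    # Distance term: reward progress toward the goal, otherwise -1.
    prev_d = manhattan(prev_state, goal)
    cur_d = manhattan(state, goal)
    reward += (prev_d - cur_d) if cur_d < prev_d else -1.0

    # Goal term: +50 on arrival, otherwise -1.
    reached = state == goal
    reward += 50.0 if reached else -1.0

    # Obstacle term: -5 for hitting an obstacle or leaving the map, otherwise 0.
    x, y = state
    out_of_bounds = not (0 <= x < grid_size[0] and 0 <= y < grid_size[1])
    collided = state in obstacles or out_of_bounds
    reward += -5.0 if collided else 0.0

    # Step penalty: -0.1 per move to discourage wandering.
    reward += -0.1

    done = reached or collided
    return reward, done
```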
3. Improved DQN Algorithm
3.1. Boltzmann Strategy
3.2. Priority Experience Replay
3.3. Improved DQN Algorithm Model
3.4. The Overall Framework of the B-PER Algorithm
Algorithm 1: B-PER DQN algorithm

1. Initialize the priority experience replay buffer and set its capacity
2. Initialize the environment parameters and training parameters
3. For episode = 1 to the maximum number of episodes do
4. Reset the environment, obtain the initial state, and initialize the episode reward to 0
5. While not done:
6. Select an action according to Equation (8)
7. Execute the action and store the resulting transition in the replay buffer
8. If the number of stored transitions > 256:
9. Sample a batch of batch-size transitions from the buffer according to Equation (5)
10. Calculate the loss and update the network parameters according to Equation (11)
11. End if
12. Dynamically adjust the parameters according to Equation (10)
13. Apply the decay according to Equation (9)
14. End for
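The loop in Algorithm 1 can be sketched in Python roughly as follows, assuming a PyTorch Q-network, a Gymnasium-style `env`, and a prioritized replay buffer exposing `add`, `sample`, and `update_priorities`. The softmax action selection, prioritized sampling, loss, and temperature decay stand in for Equations (8), (5), (11), (9), and (10), whose exact forms are given in Section 3; all names and hyperparameter values here (e.g., `tau_start`, `tau_decay`) are illustrative, not the authors' implementation.

```python
import torch

def boltzmann_action(q_values, temperature):
    """Sample an action from a softmax over Q-values (Boltzmann strategy)."""
    dist = torch.distributions.Categorical(logits=q_values / temperature)
    return int(dist.sample())

def train(env, q_net, target_net, buffer, optimizer, episodes=500,
          batch_size=32, gamma=0.99, warmup=256, tau_start=5.0,
          tau_min=0.1, tau_decay=0.995, target_update=100):
    """Training loop following the structure of Algorithm 1 (illustrative names)."""
    tau, step = tau_start, 0
    for episode in range(episodes):
        state, _ = env.reset()
        done, episode_reward = False, 0.0
        while not done:
            # Step 6: Boltzmann action selection (stands in for Equation (8)).
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
            action = boltzmann_action(q_values, tau)

            # Step 7: execute the action and store the transition.
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            buffer.add(state, action, reward, next_state, done)
            state, episode_reward = next_state, episode_reward + reward

            # Steps 8-11: learn once enough transitions are stored.
            if len(buffer) > warmup:
                # Prioritized sampling (stands in for Equation (5)).
                (s, a, r, s2, d), idx, weights = buffer.sample(batch_size)
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * (1 - d) * target_net(s2).max(1).values
                td_error = target - q
                # Weighted TD loss (stands in for Equation (11)).
                loss = (weights * td_error.pow(2)).mean()

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                # Refresh priorities with the new absolute TD errors.
                buffer.update_priorities(idx, td_error.abs().detach().numpy())

            step += 1
            if step % target_update == 0:
                target_net.load_state_dict(q_net.state_dict())

        # Steps 12-13: decay the temperature between episodes
        # (stands in for Equations (9) and (10)).
        tau = max(tau_min, tau * tau_decay)
```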
4. Simulation Experiments
4.1. Design of Experiments
4.2. Comparative Experiment and Analysis of Convergence Speed
4.3. Comparative Experiment and Analysis of Path Planning Effect
4.3.1. Comparison Experiment for Path Length
4.3.2. Comparative Experiment of Completion Rate
5. Discussion
6. Conclusions
- In the simple environment, the B-PER DQN algorithm stabilizes after about 50 rounds, while the traditional DQN and DDQN algorithms only begin to converge after 100 rounds. The planned path length is reduced by 29.83% compared with the DQN algorithm and by 11.52% compared with the DDQN algorithm. The completion rate is increased by 12.5% compared with the DDQN algorithm, by 8.4% compared with the PER DQN algorithm, and by 5.9% compared with the Dueling DQN algorithm. These results show that the B-PER DQN algorithm converges faster and is more stable in terms of cumulative reward and training steps, verifying the effectiveness of the adaptive temperature parameter and the priority experience replay mechanism.
- In the complex environment, the cumulative rewards of the B-PER DQN algorithm and the DDQN algorithm start to converge after around 100 rounds of training, but the DDQN algorithm is less stable. In terms of path planning effect, the standard deviation of the B-PER DQN algorithm is also the lowest, reduced by 36.64% compared with the DQN algorithm and by 19.81% compared with the DDQN algorithm. Its proportion of successful arrivals is likewise the highest, indicating that the algorithm is more stable.
- When the environment scale expands and the complexity increases, the completion rate of B-PER DQN decreases by 9.47%, the completion rate of conventional DQN decreases by 12.50%, and the completion rate of DDQN decreases by 10.0%, which indicates that B-PER DQN is more adaptable to changes in environmental complexity.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
AGV | Automated Guided Vehicle
DQN | Deep Q Network
PRM | Probabilistic Roadmap Method
GWO | Grey Wolf Optimizer
WOA | Whale Optimization Algorithm
DRL | Deep Reinforcement Learning
DDQN | Double Deep Q-Network
PER | Priority Experience Replay
MDP | Markov Decision Process
NP | Navigation Priority
B-PER DQN | Boltzmann Strategy-Priority Experience Replay Deep Q Network
TD | Temporal Difference Error
APF | Artificial Potential Field
RRT | Rapidly Exploring Random Trees
References
Parameter | Value
---|---
Discount factor | 0.99
Learning rate | 0.001 (Adam optimizer)
Network nodes, layer 1 | (State dimension, 128)
Network nodes, layer 2 | (128, 128)
Network nodes, layer 3 | (128, 128)
Network nodes, layer 4 | (128, Action dimension)
Maximum search steps per epoch | 300
Batch size | 32
Target network parameter update interval | 100
Replay buffer capacity | 2000
Priority weight | 0.6
Importance sampling weight | 0.4
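Read as a network definition, the layer sizes in the table correspond to a four-layer fully connected Q-network with 128 hidden units. A minimal PyTorch sketch is shown below; the ReLU activations and the class name `QNetwork` are assumptions, since the table only lists layer dimensions.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network matching the layer sizes in the table above."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),   # Layer 1: (State dimension, 128)
            nn.ReLU(),
            nn.Linear(128, 128),         # Layer 2: (128, 128)
            nn.ReLU(),
            nn.Linear(128, 128),         # Layer 3: (128, 128)
            nn.ReLU(),
            nn.Linear(128, action_dim),  # Layer 4: (128, Action dimension)
        )

    def forward(self, x):
        return self.net(x)
```

With the table's learning rate, the optimizer would then be constructed as `torch.optim.Adam(q_net.parameters(), lr=0.001)`.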
Statistic | DQN Algorithm | DDQN Algorithm | B-PER DQN Algorithm
---|---|---|---|
Median | 21 | 21 | 21 |
Average value | 34.70 | 27.52 | 24.35 |
Maximum | 300 | 300 | 300 |
Minimum | 18 | 18 | 18 |
Standard deviation | 42.77 | 29.99 | 24.80 |
Statistic | DQN Algorithm | DDQN Algorithm | B-PER DQN Algorithm
---|---|---|---|
Median | 339 | 278 | 260 |
Average value | 484.66 | 401.34 | 361.00 |
Maximum | 1200 | 1200 | 1200 |
Minimum | 68 | 70 | 65 |
Standard deviation | 378.01 | 298.70 | 219.13 |
Algorithm | 50 Rounds | 300 Rounds | 500 Rounds
---|---|---|---|
B-PER DQN | 0.8 | 0.9 | 0.95 |
DQN | 0.4 | 0.8 | 0.7 |
DDQN | 0.6 | 0.8 | 0.78 |
Dueling DQN | 0.6 | 0.85 | 0.85 |
PER DQN | 0.6 | 0.83 | 0.86 |