Risk-Constrained Multi-Objective Deep Reinforcement Learning for AGV Path Planning in Rail Transit
Abstract
1. Introduction
2. Materials and Methods
2.1. Quantitative Safety and Risk Assessment Criteria
2.1.1. Problem Statement and Multi-Objective Decomposition
2.1.2. Risk Assessment Criteria
2.2. Research on Heuristic Global Path Planning Based on Waypoint Fitting
2.2.1. Modeling of AGV Working Environment
2.2.2. Waypoint Fitting
2.2.3. Conflict Resolution
2.2.4. Collision Model and Choice of Radius
2.2.5. Reference Global Trajectory
- 1. Graph search (rail-aware A*)
- 2. Curvature-constrained smoothing (geometric feasibility)
- 3. Time-parameterization (dynamic feasibility)
- Safety corridor used by the optimizer.
- (i) Smoothness: a penalty on input increments.
- (ii) Tracking: a penalty on deviations from the reference (e.g., position and speed).
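The two penalty terms listed above can be illustrated with a minimal sketch. It assumes quadratic penalties and fixed scalar weights; the function names, the weights `w_smooth` and `w_track`, and the tuple-based state and input layout are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def smoothness_cost(inputs: Sequence[Vector], weight: float = 1.0) -> float:
    """Weighted sum of squared input increments between consecutive steps."""
    total = 0.0
    for prev, curr in zip(inputs, inputs[1:]):
        total += sum((c - p) ** 2 for p, c in zip(prev, curr))
    return weight * total


def tracking_cost(states: Sequence[Vector], reference: Sequence[Vector],
                  weight: float = 1.0) -> float:
    """Weighted sum of squared deviations of each state from its reference."""
    total = 0.0
    for x, r in zip(states, reference):
        total += sum((xi - ri) ** 2 for xi, ri in zip(x, r))
    return weight * total


def total_cost(states: Sequence[Vector], inputs: Sequence[Vector],
               reference: Sequence[Vector],
               w_smooth: float = 0.1, w_track: float = 1.0) -> float:
    """Combine both terms; the weights are assumed values, not the paper's."""
    return (smoothness_cost(inputs, w_smooth)
            + tracking_cost(states, reference, w_track))
```

Each state here might hold, for example, position and speed, so that the tracking term penalizes deviations in both, as the list item suggests. Raising `w_smooth` relative to `w_track` trades tracking accuracy for gentler input changes.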
2.2.6. Trajectory Optimization
Algorithm 1 Multi-AGV Smooth Trajectory Optimization
Require: Global reference paths , initial states ν, horizon length H
Ensure: Smoothed and collision-free trajectories {, }
1: Initialize trajectories {} for all AGVs i
2: repeat
3: for each AGV i = 1,..., n do
4: Set initial state , terminal state
5: for each step j = 1,..., H do
6: Update dynamics:
7: Enforce safety set constraint:
8: Bound control inputs:
9: end for
10: Check inter-AGV separation:
11: end for
12: Compute cost function:
13: Update trajectories { , } based on optimization results
14: until convergence or max iterations
15: Return optimized trajectories { , }
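Two of the per-iteration checks in Algorithm 1, bounding the control inputs (step 8) and checking inter-AGV separation (step 10), can be sketched as follows. This is an illustrative sketch only: the helper names, the shared time grid, and the Euclidean distance threshold `d_safe` are assumptions, and the dynamics and safety-set constraints are not reproduced because their expressions are not available here.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def clamp_input(u: Sequence[float], u_min: Sequence[float],
                u_max: Sequence[float]) -> Tuple[float, ...]:
    """Clip each control component into its [u_min, u_max] interval."""
    return tuple(min(max(ui, lo), hi) for ui, lo, hi in zip(u, u_min, u_max))


def separation_violations(trajectories: Dict[int, List[Point]],
                          d_safe: float) -> List[Tuple[int, int, int]]:
    """Return (agv_i, agv_j, step) triples where two AGVs are closer than d_safe.

    Assumes every trajectory is sampled on the same time grid, so index k
    refers to the same instant for all AGVs.
    """
    violations = []
    for (i, traj_i), (j, traj_j) in combinations(sorted(trajectories.items()), 2):
        for k, (p, q) in enumerate(zip(traj_i, traj_j)):
            if math.dist(p, q) < d_safe:
                violations.append((i, j, k))
    return violations
```

An optimizer loop could call `separation_violations` after each pass over the AGVs and treat a non-empty result as a signal to keep iterating or to add penalty terms.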
Algorithm 2 Priority-based Multi-AGV Trajectory Smoothing
Require: Global reference paths , safety set S, priority groups {,…, }, horizon length H
Ensure: Smoothed and collision-free trajectories }
1: Initialize fixed-trajectory set H ← ∅
2: for each priority group k = 1,…,m (high → low) do
3: Add trajectories of higher-priority groups to hard set:
   H ←
4: for each AGV i ∈ do
5: Time-parametrize reference path : H using uniform subdivision
6: Formulate optimization objective:
7: Apply constraints:
   (a) Dynamics:
   (b) Input/state bounds:
   (c) Static-obstacle clearance:
   (d) Cross-group separation:
8: Solve optimization problem using SQP/interior-point:
   ) ← argmin objective under constraints
9: end for
10: If infeasible:
   introduce slack ξ to soften non-critical constraints with penalty,
   re-solve; if still infeasible, apply trajectory shifts/reduced horizon
11: Add optimized group trajectories { } to H
12: end for
13: Return all optimized trajectories }
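The priority-ordered structure of Algorithm 2 can be sketched without a numerical solver. In the sketch below the SQP/interior-point solve of step 8 is deliberately replaced by a much simpler stand-in: each lower-priority AGV delays its start until its path is separated from every already-fixed trajectory, which loosely mirrors the "trajectory shifts" fallback of step 10. All function names, the distance threshold `d_safe`, and the `max_delay` limit are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def conflicts(traj: Trajectory, hard_set: List[Trajectory], d_safe: float) -> bool:
    """True if traj comes within d_safe of any fixed trajectory at a shared step."""
    for fixed in hard_set:
        for k in range(max(len(traj), len(fixed))):
            p = traj[min(k, len(traj) - 1)]    # hold the final position after arrival
            q = fixed[min(k, len(fixed) - 1)]
            if math.dist(p, q) < d_safe:
                return True
    return False


def shift_in_time(traj: Trajectory, delay: int) -> Trajectory:
    """Delay a trajectory by holding its start position for `delay` steps."""
    return [traj[0]] * delay + list(traj)


def plan_by_priority(groups: List[Dict[str, Trajectory]], d_safe: float,
                     max_delay: int = 20) -> Dict[str, Trajectory]:
    """Process groups from high to low priority, fixing each group in turn."""
    hard_set: List[Trajectory] = []   # trajectories fixed by higher-priority groups
    result: Dict[str, Trajectory] = {}
    for group in groups:
        planned: Dict[str, Trajectory] = {}
        for agv_id, reference in group.items():
            for delay in range(max_delay + 1):
                candidate = shift_in_time(reference, delay)
                if not conflicts(candidate, hard_set, d_safe):
                    planned[agv_id] = candidate
                    break
            else:
                raise RuntimeError(f"no feasible shift found for AGV {agv_id}")
        hard_set.extend(planned.values())   # the whole group becomes a hard constraint
        result.update(planned)
    return result
```

As in constraint (d), only cross-group separation is enforced here: AGVs inside the same group are not checked against each other, and a full implementation would handle that inside the joint optimization.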
2.2.7. Experimental Results and Analysis
- 1. Multi-AGV global path planning results
- 2. Conflict resolution
- 3. Trajectory optimization results
2.3. Research on Environmental Perception Algorithm Based on Multi-Sensor Feature Fusion
2.3.1. Visual Feature Processing Based on Dual Modules
2.3.2. LiDAR Information Processing
- If for consecutive frames, mark occupied.
- If for consecutive frames, mark free.
- Otherwise mark unknown and treat as a soft cost in planning.
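The three-way labelling rule above can be illustrated with a small per-cell filter. The exact conditions and frame counts are not given in this text, so the sketch assumes a per-frame occupancy probability compared against two thresholds; `p_occ`, `p_free`, and `n_frames` are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CellFilter:
    """Labels one grid cell from a stream of per-frame occupancy probabilities."""
    p_occ: float = 0.7    # assumed threshold above which a frame votes "occupied"
    p_free: float = 0.3   # assumed threshold below which a frame votes "free"
    n_frames: int = 3     # assumed number of consecutive frames required
    occ_run: int = 0
    free_run: int = 0

    def update(self, p: float) -> str:
        # Count consecutive frames satisfying each condition; any miss resets the run.
        self.occ_run = self.occ_run + 1 if p >= self.p_occ else 0
        self.free_run = self.free_run + 1 if p <= self.p_free else 0
        if self.occ_run >= self.n_frames:
            return "occupied"
        if self.free_run >= self.n_frames:
            return "free"
        return "unknown"   # a planner could treat this as a soft cost
```

Requiring several consecutive frames before committing to a label suppresses single-frame LiDAR noise, at the price of a short delay before a newly appeared obstacle is marked occupied.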
2.4. Research on Dynamic Path Planning of Multi-AGV Based on Deep Reinforcement Learning Under Global Guidance
2.4.1. Multi-AGV Dynamic Path Planning and Design
2.4.2. Reward Function Design
3. Experimental Results and Analysis
3.1. Static Evaluation on the Maze Benchmark Map
3.2. Dynamic Environment Under 10, 20, and 30 Obstacles
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jia, L.; Cheng, P.; Zhang, Z.; Ji, L.; Xu, C. Integrated Development of Rail Transit and Energies in China: Development Paths and Strategies. Strateg. Study CAE 2022, 24, 173–183.
- Yang, M.; Bian, Y.; Ma, L.; Liu, G.; Zhang, H. Research on Traffic Control Algorithm Based on Multi-AGV Path Planning. In Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 697–702.
- Pratama, A.Y.; Ariyadi, M.R.; Tamara, M.N.; Purnomo, D.S.; Ramadhan, N.A.; Pramujati, B. Design of Path Planning System for Multi-Agent AGV Using A* Algorithm. In Proceedings of the 2023 International Electronics Symposium (IES), Denpasar, Indonesia, 13–14 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 335–341.
- Hu, J.; Shang, W.; Lou, H. Research on AGV Path Based on Optimal Planning. In Proceedings of the 2021 4th International Conference on Data Science and Information Technology, Shanghai, China, 23–25 July 2021; pp. 236–240.
- Tien, T.N.; Nguyen, K.V. Updated Weight Graph for Dynamic Path Planning of Multi-AGVs in Healthcare Environments. In Proceedings of the 2022 International Conference on Advanced Technologies for Communications (ATC), Hanoi, Vietnam, 20–22 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 130–135.
- Huang, F.; Guo, W.; Zhao, H. AGV Path Planning Based on Improved Genetic Algorithm. In Proceedings of the 2023 2nd International Symposium on Control Engineering and Robotics (ISCER), Beijing, China, 19–21 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3323–3327.
- Shi, J.; Wang, W.; Wang, X.; Sun, H.; Lan, X.; Xin, J.; Zheng, N. Leveraging Spatio-Temporal Evidence and Independent Vision Channel to Improve Multi-Sensor Fusion for Vehicle Environmental Perception. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 591–596.
- Senel, N.; Kefferpütz, K.; Doycheva, K.; Elger, G. Multi-Sensor Data Fusion for Real-Time Multi-Object Tracking. Processes 2023, 11, 501.
- Bouain, M.; Ali, K.M.A.; Berdjag, D.; Fakhfakh, N.; Ben Atitallah, R. An Embedded Multi-Sensor Data Fusion Design for Vehicle Perception Tasks. J. Commun. 2018, 13, 8–14.
- Xiang, C.; Feng, C.; Xie, X.; Shi, B.; Lu, H.; Lv, Y.; Yang, M.; Niu, Z. Multi-Sensor Fusion and Cooperative Perception for Autonomous Driving: A Review. IEEE Intell. Transp. Syst. Mag. 2023, 15, 36–58.
- Jayaratne, M.; De Silva, D.; Alahakoon, D. Unsupervised Machine Learning Based Scalable Fusion for Active Perception. IEEE Trans. Autom. Sci. Eng. 2019, 16, 1653–1663.
- Liao, X.; Wang, Y.; Xuan, Y.; Wu, D. AGV Path Planning Model Based on Reinforcement Learning. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 6722–6726.
- Yin, H.; Lin, Y.; Yan, J.; Meng, Q.; Festl, K.; Schichler, L.; Watzenig, D. AGV Path Planning Using Curiosity-Driven Deep Reinforcement Learning. In Proceedings of the 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), Auckland, New Zealand, 26–30 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Guo, X.; Ren, Z.; Wu, Z.; Lai, J.; Zeng, D.; Xie, S. A Deep Reinforcement Learning Based Approach for AGVs Path Planning. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 6833–6838.
- Ye, X.; Deng, Z.; Shi, Y.; Shen, W. Toward Energy-Efficient Routing of Multiple AGVs with Multi-Agent Reinforcement Learning. Sensors 2023, 23, 5615.
- Zhang, L.; Yin, G.; Li, J.; Jiang, J. Research on AGV Path Planning Based on Reinforcement Learning. In Proceedings of the 2023 8th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 14–16 April 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 934–937.
- Theodorou, P.; Tsiligkos, K.; Meliones, A. Multi-Sensor Data Fusion Solutions for Blind and Visually Impaired: Research and Commercial Navigation Applications for Indoor and Outdoor Spaces. Sensors 2023, 23, 5411.
- Cao, X.; Tan, B.; Li, Y.; Ding, S. Dynamic Load Regulation of Robots with Multi-Sensor Fusion. J. Phys. Conf. Ser. 2022, 2400, 012022.
- Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review. Sensors 2021, 21, 2140.
- Wang, Y.; Fang, H. Estimation of LAI with the LiDAR Technology: A Review. Remote Sens. 2020, 12, 3457.
- Zhang, S.; Chen, Z.; Gao, Y.; Wan, W.; Shan, J.; Xue, H.; Sun, F.; Yang, Y.; Fang, B. Hardware Technology of Vision-Based Tactile Sensor: A Review. IEEE Sens. J. 2022, 22, 21410–21427.
- Wang, C.; Mao, J. Summary of AGV Path Planning. In Proceedings of the 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), Xiamen, China, 18–20 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 332–335.
- Yin, X.; Cai, P.; Zhao, K.; Zhang, Y.; Zhou, Q.; Yao, D. Dynamic Path Planning of AGV Based on Kinematical Constraint A* Algorithm and Following DWA Fusion Algorithms. Sensors 2023, 23, 4102.
- De Ryck, M.; Pissoort, D.; Holvoet, T.; Breslin, J.G. Decentral Task Allocation for Industrial AGV-Systems with Resource Constraints. J. Manuf. Syst. 2021, 59, 310–319.
- Liu, A.; Yang, Y.; Sun, Q.; Xu, Q. A Deep Fully Convolution Neural Network for Semantic Segmentation Based on Adaptive Feature Fusion. In Proceedings of the 2018 5th International Conference on Information Science and Control Engineering (ICISCE), Zhengzhou, China, 20–22 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 16–20.
- Li, S.; Yan, J.; Li, L. Automated Guided Vehicle: The Direction of Intelligent Logistics. In Proceedings of the 2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI), Singapore, 31 July–2 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 250–255.
- Zheyi, C.; Bing, X. AGV Path Planning Based on Improved Artificial Potential Field Method. In Proceedings of the 2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), Shenyang, China, 22–24 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 32–37.
- Jayasree, K.R.; Jayasree, P.R.; Vivek, A. Smoothed RRT Techniques for Trajectory Planning. In Proceedings of the 2017 International Conference on Technological Advancements in Power and Energy (TAP Energy), Kollam, India, 21–23 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8.
- Chen, X.; Liu, S.; Zhao, J.; Wu, H.; Xian, J.; Montewka, J. Autonomous port management based AGV path planning and optimization via an ensemble reinforcement learning framework. Ocean Coast. Manag. 2024, 251, 107087.
- Huang, Y. Deep Q-Networks. In Deep Reinforcement Learning: Fundamentals, Research and Applications; Springer: Singapore, 2020; pp. 135–160.
Map | Starting Point Coordinates (m) | Target Point Coordinates (m) |
---|---|---|
1 | (12.36, −10.68) | (−0.63, 6.16) |
2 | (11.17, 14.40) | (−13.78, −3.08) |
3 | (0.67, 14.91) | (6.38, −6.43) |
4 | (2.60, −12.11) | (3.98, 11.40) |
5 | (−9.98, −11.67) | (−4.29, 14.73) |

Metric | Statistic | RRT | DDPG [29] | GG-DRL (Ours) |
---|---|---|---|---|
Path Length (unit) | Min | 80 | 80 | 80 |
Path Length (unit) | Max | 87 | 86 | 85 |
Path Length (unit) | Mean | 84 | 80.37 | 80.12 |
Computation Time (s) | Min | 273.8 | 57.29 | 10.1 |
Computation Time (s) | Max | 319.1 | 70.58 | 11.8 |
Computation Time (s) | Mean | 296.08 | 62.38 | 10.94 |
Obstacle Avoidance Times | Min | 17 | 6 | 3 |
Obstacle Avoidance Times | Max | 30 | 14 | 9 |
Obstacle Avoidance Times | Mean | 24.3 | 9.8 | 5.2 |
Trajectory Smoothness | Min | 0.71 | 0.74 | 0.77 |
Trajectory Smoothness | Max | 0.83 | 0.87 | 0.90 |
Trajectory Smoothness | Mean | 0.78 | 0.81 | 0.85 |
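
To make the gap in mean computation time in the table above easier to read, the relative reduction can be computed directly from the tabulated values. The snippet below is only a worked calculation on those three numbers; the helper name `relative_reduction` is hypothetical. It gives roughly an 82.5% reduction relative to DDPG and roughly 96.3% relative to RRT.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Mean computation times (s) copied from the table above.
mean_time = {"RRT": 296.08, "DDPG": 62.38, "GG-DRL": 10.94}

reduction_vs_ddpg = relative_reduction(mean_time["DDPG"], mean_time["GG-DRL"])
reduction_vs_rrt = relative_reduction(mean_time["RRT"], mean_time["GG-DRL"])

print(f"vs DDPG: {reduction_vs_ddpg:.1f}%  vs RRT: {reduction_vs_rrt:.1f}%")
```

The same helper applies unchanged to the other rows, for instance the mean number of obstacle avoidance maneuvers.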

Metric | Statistic | RRT | DDPG [29] | GG-DRL (Ours) |
---|---|---|---|---|
Path Length (unit) | Min | 81 | 80 | 80 |
Path Length (unit) | Max | 83 | 84 | 84 |
Path Length (unit) | Mean | 81.7 | 82.1 | 81.9 |
Computation Time (s) | Min | 300.08 | 63.54 | 11.29 |
Computation Time (s) | Max | 323.78 | 70.78 | 12.64 |
Computation Time (s) | Mean | 307.21 | 66.19 | 12.08 |
Obstacle Avoidance Times | Min | 3 | 3 | 2 |
Obstacle Avoidance Times | Max | 7 | 6 | 5 |
Obstacle Avoidance Times | Mean | 5.1 | 4.8 | 4 |
Trajectory Smoothness | Min | 0.74 | 0.75 | 0.77 |
Trajectory Smoothness | Max | 0.9 | 0.92 | 0.92 |
Trajectory Smoothness | Mean | 0.84 | 0.85 | 0.87 |

Metric | Statistic | RRT | DDPG [29] | GG-DRL (Ours) |
---|---|---|---|---|
Path Length (unit) | Min | 81 | 80 | 80 |
Path Length (unit) | Max | 88 | 88 | 87 |
Path Length (unit) | Mean | 82.4 | 83.4 | 82.9 |
Computation Time (s) | Min | 319.40 | 64.10 | 12.03 |
Computation Time (s) | Max | 367.20 | 80.78 | 14.5 |
Computation Time (s) | Mean | 339.46 | 75.44 | 13.06 |
Obstacle Avoidance Times | Min | 5 | 5 | 4 |
Obstacle Avoidance Times | Max | 10 | 9 | 7 |
Obstacle Avoidance Times | Mean | 7.3 | 6.5 | 5.8 |
Trajectory Smoothness | Min | 0.76 | 0.77 | 0.8 |
Trajectory Smoothness | Max | 0.90 | 0.91 | 0.92 |
Trajectory Smoothness | Mean | 0.83 | 0.85 | 0.86 |

Metric | Statistic | RRT | DDPG [29] | GG-DRL (Ours) |
---|---|---|---|---|
Path Length (unit) | Min | 80 | 80 | 80 |
Path Length (unit) | Max | 82 | 82 | 86 |
Path Length (unit) | Mean | 81.6 | 81.9 | 84.4 |
Computation Time (s) | Min | 314.51 | 63.54 | 12.51 |
Computation Time (s) | Max | 340.07 | 72.68 | 15.09 |
Computation Time (s) | Mean | 326.01 | 70.57 | 14.09 |
Obstacle Avoidance Times | Min | 8 | 7 | 6 |
Obstacle Avoidance Times | Max | 15 | 13 | 11 |
Obstacle Avoidance Times | Mean | 12.1 | 10.8 | 8.7 |
Trajectory Smoothness | Min | 0.71 | 0.73 | 0.75 |
Trajectory Smoothness | Max | 0.9 | 0.91 | 0.93 |
Trajectory Smoothness | Mean | 0.82 | 0.84 | 0.86 |
© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Z.; Xiang, H. Risk-Constrained Multi-Objective Deep Reinforcement Learning for AGV Path Planning in Rail Transit. Appl. Syst. Innov. 2025, 8, 145. https://doi.org/10.3390/asi8050145