Open Access Article
Multi-AGV Collaborative Task Scheduling and Deep Reinforcement Learning Optimization Under Multi-Feature Constraints
by Dongping Zhao 1,*, Hui Li 2, Ziyang Wang 3 and Hang Li 4
1 School of Aircraft Engineering/School of Low-Altitude Technology and Engineering, Xihang University, Xi’an 710077, China
2 Shenyang Aircraft Design Institute, Shenyang 110035, China
3 School of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, China
4 School of Mechanical Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
* Author to whom correspondence should be addressed.
Processes 2025, 13(11), 3754; https://doi.org/10.3390/pr13113754
Submission received: 27 October 2025 / Revised: 19 November 2025 / Accepted: 19 November 2025 / Published: 20 November 2025
Abstract
To address low efficiency, instability, and the difficulty of satisfying multiple constraints simultaneously in multi-AGV (Automated Guided Vehicle) task scheduling for intelligent manufacturing and logistics, this paper introduces a scheduling method based on multi-feature constraints and an improved deep reinforcement learning (DRL) algorithm, Improved Proximal Policy Optimization (IPPO). The method integrates multiple constraints, including minimizing task completion time, reducing penalty levels, and minimizing scheduling time deviation, into the scheduling optimization process. Building on the conventional PPO algorithm, it introduces several enhancements: a dynamic penalty mechanism that adaptively adjusts constraint weights, a structured reward function that boosts learning efficiency, and sampling bias correction combined with global state awareness to improve training stability and global coordination. Simulation experiments show that, after 10,000 iterations, the minimum task completion time drops from 98.2 s to 30 s, the penalty level decreases from 130 to 82, and the scheduling time deviation falls from 12 s to 0.5 s, improvements of 69.4%, 37%, and 95.8%, respectively, in the same scenario. Compared with genetic algorithms (GAs) and rule-based scheduling methods, the IPPO approach offers significant advantages in average task completion time, total system makespan, and overall throughput, along with faster convergence and better stability. These results indicate that the proposed method enables effective multi-objective collaborative optimization and efficient task scheduling in complex dynamic environments, which is of significant value for intelligent manufacturing and logistics systems.
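The dynamic penalty mechanism and structured reward described in the abstract can be sketched as a weighted multi-objective reward. The function and coefficient names below are illustrative assumptions for exposition, not the authors' implementation; the actual IPPO reward design is given in the full paper.

```python
# Illustrative sketch (assumed, not the authors' code): a structured reward
# combining the three objectives from the abstract, where the penalty weight
# adapts to the recent constraint-violation rate ("dynamic penalty mechanism").

def dynamic_weight(violation_rate: float, base: float = 1.0, gain: float = 2.0) -> float:
    """Raise the penalty weight as the recent constraint-violation rate grows."""
    return base + gain * violation_rate

def shaped_reward(completion_time: float,
                  penalty_level: float,
                  time_deviation: float,
                  violation_rate: float) -> float:
    """Structured reward: negative weighted sum of the three scheduling objectives.

    All fixed coefficients (0.5, 1.0) are placeholder assumptions.
    """
    w_penalty = dynamic_weight(violation_rate)
    return -(0.5 * completion_time + w_penalty * penalty_level + 1.0 * time_deviation)

# A schedule that finishes faster, violates fewer constraints, and deviates
# less from the plan earns a strictly higher reward.
r_good = shaped_reward(30.0, 82.0, 0.5, violation_rate=0.1)
r_bad = shaped_reward(98.2, 130.0, 12.0, violation_rate=0.4)
assert r_good > r_bad
```

Because the penalty weight increases with the violation rate, the agent is pushed toward constraint satisfaction early in training while still trading off completion time and deviation, which is one common way to realize an adaptive-weight scheme in a PPO-style loop.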
Share and Cite
MDPI and ACS Style
Zhao, D.; Li, H.; Wang, Z.; Li, H. Multi-AGV Collaborative Task Scheduling and Deep Reinforcement Learning Optimization Under Multi-Feature Constraints. Processes 2025, 13, 3754. https://doi.org/10.3390/pr13113754
AMA Style
Zhao D, Li H, Wang Z, Li H. Multi-AGV Collaborative Task Scheduling and Deep Reinforcement Learning Optimization Under Multi-Feature Constraints. Processes. 2025; 13(11):3754. https://doi.org/10.3390/pr13113754
Chicago/Turabian Style
Zhao, Dongping, Hui Li, Ziyang Wang, and Hang Li. 2025. "Multi-AGV Collaborative Task Scheduling and Deep Reinforcement Learning Optimization Under Multi-Feature Constraints." Processes 13, no. 11: 3754. https://doi.org/10.3390/pr13113754
APA Style
Zhao, D., Li, H., Wang, Z., & Li, H. (2025). Multi-AGV Collaborative Task Scheduling and Deep Reinforcement Learning Optimization Under Multi-Feature Constraints. Processes, 13(11), 3754. https://doi.org/10.3390/pr13113754
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.