Research on Proximal Policy Optimization Algorithm in Path Planning for UAV-Based Vehicle Tracking
Highlights
- The proposed PPO-based method significantly improves the convergence speed and stability of UAV trajectory planning in vehicle tracking scenarios.
- Compared with traditional path planning approaches, the optimized policy effectively reduces tracking error and enhances trajectory smoothness.
- The results demonstrate that reinforcement learning-based trajectory planning can provide reliable and adaptive tracking performance in dynamic traffic environments.
- The proposed method offers a practical solution for intelligent transportation applications such as traffic monitoring, autonomous escorting, and aerial–ground cooperative systems.
Abstract
1. Introduction
1.1. Research Background and Significance
1.2. Research Status
1.3. Research Content and Contributions
2. Modeling of Path Planning for UAV-Tracking
2.1. Analysis of Commonly Used Path Planning Algorithms
2.1.1. Artificial Potential Field (APF)
2.1.2. TD3 Algorithm
2.1.3. Q-Learning
2.1.4. Proximal Policy Optimization
2.2. Structural Composition of the UAV-Tracking System
2.3. Modeling of the Tracking Scenario and Assumptions
2.4. Modeling of Three-Dimensional Path Constraints
2.4.1. Basic Safety Distance Constraint
2.4.2. Lateral and Longitudinal Tracking Error Constraints
2.4.3. Dynamic and Kinematic Feasibility Constraints
2.4.4. Simplified Dynamic Model
- The UAV is treated as a point mass, neglecting inertial coupling, air resistance, and nonlinear propulsion characteristics.
- The UAV can instantaneously adjust its forward velocity, vertical velocity, and yaw rate.
- The state vector consists of altitude, position, and attitude angles.
- Flight constraints such as maximum acceleration, tilt angle limits, and delays caused by rotational inertia are not considered.
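The assumptions listed above amount to a purely kinematic point-mass model, which can be sketched in a few lines. The Python snippet below is an illustration only, not the authors' MATLAB implementation: it assumes a reduced state (x, y, z, yaw), an explicit-Euler update, and the 0.05 s step size from the simulation settings. The names `UavState` and `step` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class UavState:
    x: float    # horizontal position, m
    y: float    # horizontal position, m
    z: float    # altitude, m
    yaw: float  # heading angle, rad (assumed to be the only attitude state)


def step(state: UavState, v_forward: float, v_vertical: float,
         yaw_rate: float, dt: float = 0.05) -> UavState:
    """Advance the point-mass model by one explicit-Euler step.

    Commands take effect instantaneously: there are no acceleration
    limits, tilt limits, or actuator delays, as in the assumptions above.
    """
    x = state.x + v_forward * math.cos(state.yaw) * dt
    y = state.y + v_forward * math.sin(state.yaw) * dt
    z = state.z + v_vertical * dt
    # Wrap the heading into [-pi, pi] so it stays bounded over long runs.
    new_yaw = state.yaw + yaw_rate * dt
    new_yaw = math.atan2(math.sin(new_yaw), math.cos(new_yaw))
    return UavState(x, y, z, new_yaw)
```

Because the commanded velocities are applied directly, any speed limit (for example, a maximum forward speed) would have to be enforced by clipping the commands before calling `step`.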
2.4.5. Airspace Regulations and Altitude Constraints
2.4.6. Relative Motion and Prediction Constraints
2.4.7. Optimization and Constraint Relaxation
3. Design of a PPO-Based Path Planning Method
3.1. Theoretical Foundations and Applicability Analysis of the Proximal Policy Optimization (PPO) Method
3.1.1. Algorithm Principle
3.1.2. Applicability Analysis
3.2. Algorithm Structure and Steps
3.3. Algorithm Complexity and Convergence Analysis
3.3.1. Algorithm Complexity Analysis
3.3.2. Convergence Analysis
4. Simulation and Results Analysis
4.1. Simulation Experiment Environment and Parameter Settings
4.1.1. Simulation Platform Construction
- Environment Module: The simulation scenario is constructed using MATLAB 3D graphics and dynamic modeling, including ground roads, building clusters, obstacles, and target points.
- UAV Dynamic Module: The UAV is modeled using simplified six-degree-of-freedom dynamics. The control inputs are the desired velocity and heading angle.
- Reinforcement Learning Training Module: PPO is employed to train the policy. Both the policy network and the value network adopt deep neural network architectures.
- Path Evaluation Module: After training, the performance of the UAV is analyzed using metrics such as average reward, success rate, and trajectory smoothness.
4.1.2. State Space and Action Space Design
4.1.3. Construction of the Actor–Critic Network
4.2. Tracking Performance Evaluation in Typical Scenarios
4.2.1. Simulation Environment and Parameters
4.2.2. Evaluation Metrics
- (1) Tracking Error
- (2) Convergence Stability
- (3) Response Time
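One plausible way to compute the three metrics from logged trajectories is sketched below in Python. The definitions are assumptions made for illustration, since this section does not give formulas. Tracking error is taken as the per-step Euclidean distance between UAV and target. Convergence stability is taken as the standard deviation of the error over a final window. Response time is taken as the moment the error falls below a threshold and stays there. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def tracking_errors(uav_path, target_path):
    """Per-step Euclidean distance between UAV and target positions."""
    return [math.dist(p, q) for p, q in zip(uav_path, target_path)]


def error_summary(errors):
    """Mean, population standard deviation, and maximum of the error."""
    return fmean(errors), pstdev(errors), max(errors)


def steady_state_std(errors, tail):
    """Standard deviation over the last `tail` samples (stability proxy)."""
    return pstdev(errors[-tail:])


def response_time(errors, threshold, dt=0.05):
    """Time at which the error drops below `threshold` and stays below it.

    Returns None if the error never settles within the logged data.
    """
    settled_from = None
    for i, e in enumerate(errors):
        if e < threshold:
            if settled_from is None:
                settled_from = i
        else:
            settled_from = None  # error rose again, so restart the search
    return None if settled_from is None else settled_from * dt
```

The threshold and window length are free choices here; reporting them alongside the results keeps comparisons between algorithms reproducible.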
4.2.3. Robustness Analysis
4.2.4. Adaptability to Continuous Action Spaces
4.2.5. Stability of Policy Updates
4.2.6. Ablation Study on the Reward Function
4.2.7. Tracking Error Variations Across Different Scenarios
5. Conclusions
- 3D Path Planning Model and Objective Design: Based on the kinematic characteristics of UAVs and ground vehicles, a 3D path planning model was developed that considers spatial coordinates, velocity, and attitude constraints. A well-designed objective function—including tracking error minimization, energy optimization, and safety distance constraints—ensures both the realism and operability of the model.
- PPO-Based Adaptive Learning: By designing an appropriate state space, action space, and reward function, the PPO algorithm can achieve adaptive learning in complex environments. MATLAB simulation results demonstrate that PPO-based UAV path planning outperforms comparative algorithms such as Q-learning in terms of tracking accuracy, convergence speed, and robustness. In specific scenarios, the trajectory error of Q-learning is approximately 1 m, whereas PPO achieves an error of about 0.2 m, with faster and more stable error convergence within roughly 10 s. In comparison, the APF algorithm converges in about 15 s, and TD3 converges in 10 s but exhibits oscillations. Incorporating a smoothness reward further improves the UAV path smoothness, allowing the UAV to follow the vehicle trajectory stably, indicating the promising application potential of PPO in intelligent UAV control.
- Enhanced Decision-Making and Path Optimization: The PPO-based UAV-tracking path planning method effectively enhances the UAV’s intelligent decision-making and path optimization capabilities, providing a new technical approach and research foundation for intelligent UAV traffic and cooperative control systems.
- Limitations of PPO: While PPO exhibits strong stability and robustness in continuous control problems, certain limitations remain in UAV-tracking tasks. First, as an on-policy algorithm, its sample efficiency is relatively low, resulting in moderate training efficiency. Second, PPO relies primarily on local policy optimization, limiting its long-term path planning capability; in complex obstacle environments, the planned path may not be optimal. Moreover, control precision in continuous action spaces is limited, the algorithm is sensitive to reward function design, and generalization across multiple scenarios remains insufficient. These issues partially constrain the application of PPO in high-precision, real-time UAV tracking tasks.
- Future Research Directions: Although PPO-based UAV-tracking path planning demonstrates strong performance in theory and simulation, there remains substantial scope for research in real-world validation, multi-UAV cooperation, and the integration of perception and decision-making. Future work could leverage hardware-in-the-loop simulation platforms or real UAV flight experiments to conduct engineering validation of the proposed PPO algorithm, further assessing its feasibility and reliability in real-world environments.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Parameter | Value | Description |
|---|---|---|
| Simulation Platform | MATLAB R2023b | Reinforcement Learning Toolbox |
| Simulation Step Size | 0.05 s | Time discretization interval |
| Path Type | Straight + Turns | Simulation of road scenarios |
| UAV Maximum Speed | 15 m/s | Flight safety constraints |
| Control Algorithm | PPO Algorithm | Based on Actor–Critic architecture |
| Reward Function | Tracking Error + Smooth Control Term | Ensures accuracy and stability |
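The reward row in the table above names two ingredients, a tracking error term and a smooth control term. A minimal Python sketch of such a reward follows. The exact functional form is not given in this section, so the quadratic penalty on action change and the weights `w_track` and `w_smooth` are hypothetical choices for illustration.

```python
import math


def reward(uav_pos, target_pos, action, prev_action,
           w_track=1.0, w_smooth=0.1):
    """Negative weighted sum of tracking error and control change.

    w_track and w_smooth are hypothetical weights, not values from the paper.
    """
    # Distance between UAV and tracked vehicle, in metres.
    tracking_error = math.dist(uav_pos, target_pos)
    # Squared change between consecutive actions penalizes jerky control.
    control_change = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    return -(w_track * tracking_error + w_smooth * control_change)
```

With this form the reward is at most zero and is reached only when the UAV sits on the target and repeats its previous action, so larger `w_smooth` trades tracking accuracy for smoother commands.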

| Algorithm | Learning Rate | Discount Factor | Network Structure |
|---|---|---|---|
| PPO | 3 × 10⁻⁴ | 0.99 | Policy Network + Value Network |
| Q-learning | 0.1 | 0.99 | Q-table |
| APF | - | - | Attractive/Repulsive Function |
| TD3 | 1 × 10⁻³ | 0.99 | Policy Network + Value Network |

| Algorithm | Activation Function | Optimizer | Other Key Parameters |
|---|---|---|---|
| PPO | ReLU | Adam | Clipping coefficient ε = 0.2, GAE λ = 0.95 |
| Q-learning | - | - | ε-Greedy Policy, Exploration Rate ε = 0.1 |
| APF | - | - | Goal Attraction Coefficient, Obstacle Repulsion Coefficient |
| TD3 | ReLU | Adam | Critic gradient clipping within [−1, 1] |
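To make the PPO hyperparameters in the tables concrete, the sketch below shows where the clipping coefficient ε = 0.2, the GAE parameter λ = 0.95, and the discount factor 0.99 enter the standard PPO computation. It is a plain-Python illustration of the generic clipped surrogate and generalized advantage estimation, not the authors' training code, and the function names are hypothetical.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    `values` holds one more entry than `rewards`; the last entry is the
    bootstrap value of the final state.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO-Clip objective: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Here `ratio` is the probability of the sampled action under the new policy divided by its probability under the old one. With ε = 0.2, a sample with positive advantage stops contributing extra objective once the ratio exceeds 1.2, which is what limits the size of each policy update.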

| Algorithm | Convergence Speed | Oscillation | Robustness |
|---|---|---|---|
| PPO | Fast | Very slight | Strong |
| Q-learning | Slow | Significant | Moderate |
| APF | Moderate | Noticeable | Weak |
| TD3 | Fast | Significant | Moderate |

| Algorithm | Mean Error (m) | Std (m) | Max Error (m) |
|---|---|---|---|
| PPO | 0.45 | 0.80 | 5.00 |
| TD3 | 0.55 | 0.90 | 5.00 |
| Q-learning | 1.00 | 1.25 | 5.60 |
| APF | 0.95 | 1.20 | 6.00 |
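The mean errors in the table above can be turned into relative reductions with simple arithmetic. The short Python sketch below uses only the tabulated values; the helper name `reduction_vs` is hypothetical.

```python
# Mean tracking errors in metres, copied from the table above.
mean_error_m = {"PPO": 0.45, "TD3": 0.55, "Q-learning": 1.00, "APF": 0.95}


def reduction_vs(baseline, method="PPO", table=mean_error_m):
    """Relative reduction in mean error of `method` against `baseline`, in %."""
    return 100.0 * (table[baseline] - table[method]) / table[baseline]


for name in ("TD3", "Q-learning", "APF"):
    print(f"{name}: {reduction_vs(name):.1f}%")
```

On these figures, PPO's mean error is about 18% lower than TD3's, 55% lower than Q-learning's, and about 53% lower than APF's. The standard deviations in the same table are larger than the gaps between PPO and TD3, so that particular difference should be read with caution.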
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Qiao, D.; Zhang, H. Research on Proximal Policy Optimization Algorithm in Path Planning for UAV-Based Vehicle Tracking. Drones 2026, 10, 319. https://doi.org/10.3390/drones10050319
