Learning-Based Optimal Control for Multiple Fixed-Wing UAVs with Prescribed Performance
Abstract
1. Introduction
- A simplified actor–critic RL scheme is developed to handle the coupled position–attitude dynamics of fixed-wing UAVs. By employing neural networks to approximate the solution of the HJB equation, the proposed method achieves approximately optimal control without relying on an accurate model, reducing computational complexity and improving control energy efficiency.
- A prescribed performance function is introduced to impose predefined boundary constraints on the formation errors. This ensures that safe inter-UAV distances are rigorously maintained throughout the formation flight and addresses a limitation of existing RL control methods, which typically neglect safety constraints.
- Using Lyapunov theory, it is proven that all error signals in the closed-loop system are semi-globally uniformly ultimately bounded (SGUUB) and that the formation errors remain strictly within the prescribed performance boundaries.
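To make the first contribution more concrete, the following is a minimal sketch of an actor–critic loop on a toy scalar error system. It is not the paper's controller: the drift `f`, the gains `beta`, `gamma_a`, `gamma_c`, the RBF centers, and the weight update laws are all assumptions, loosely modeled on the simplified update laws that appear in the optimized-backstepping literature. The point it illustrates is that the control input uses only the measured error and the learned weights, never the drift model.

```python
import numpy as np

def rbf(z, centers, width=1.0):
    """Gaussian basis vector S(z) for a scalar input z."""
    return np.exp(-((z - centers) ** 2) / width**2)

def simulate(z0=1.0, beta=4.0, gamma_a=0.8, gamma_c=1.0, dt=1e-3, steps=10_000):
    """Euler simulation of a toy error system dz/dt = f(z) + u.

    All numerical values here are illustrative assumptions.
    """
    centers = np.linspace(-2.0, 2.0, 5)  # hypothetical RBF centers
    w_c = 0.5 * np.ones(5)               # critic weights, arbitrary initial value
    w_a = 0.5 * np.ones(5)               # actor weights, arbitrary initial value
    z = z0
    history = [z]
    for _ in range(steps):
        s = rbf(z, centers)
        # Actor: linear feedback plus a learned correction term.
        # Note that the drift f is NOT used by the controller.
        u = -beta * z - 0.5 * (w_a @ s)
        # Toy drift, known only to the simulator (assumption).
        f = 0.5 * np.sin(z)
        # Assumed simplified update laws for critic and actor weights.
        ss = np.outer(s, s)
        dw_c = -gamma_c * (ss @ w_c)
        dw_a = -ss @ (gamma_a * (w_a - w_c) + gamma_c * w_c)
        # Forward-Euler integration of state and weights.
        z += dt * (f + u)
        w_c += dt * dw_c
        w_a += dt * dw_a
        history.append(z)
    return np.array(history), w_a, w_c

if __name__ == "__main__":
    hist, w_a, w_c = simulate()
    print("final error:", hist[-1])
```

In this toy setting the error decays toward zero while the weights evolve online; a formation controller would apply the same pattern to each UAV's position and attitude error channels.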
2. Preliminaries
2.1. Dynamics of Fixed-Wing Unmanned Aerial Vehicle
2.2. Problem Statement
2.3. Prescribed Performance Control
- (1) The formation tracking error of the fixed-wing UAVs remains strictly within the prescribed performance boundary.
- (2) All closed-loop signals of the fixed-wing UAVs are SGUUB.
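As an illustration of what a prescribed performance boundary can look like, the sketch below uses the exponentially decaying bound and the logarithmic error transformation that are common in the PPC literature. The specific form of the performance function and the parameter values `rho0`, `rho_inf`, and `kappa` are assumptions; the extracted text does not state which function the paper uses.

```python
import numpy as np

def performance_bound(t, rho0=2.0, rho_inf=0.1, kappa=1.0):
    """Assumed bound rho(t) = (rho0 - rho_inf) * exp(-kappa * t) + rho_inf."""
    return (rho0 - rho_inf) * np.exp(-kappa * np.asarray(t, dtype=float)) + rho_inf

def transformed_error(e, rho):
    """Map a constrained error |e| < rho to an unconstrained variable.

    Uses 0.5 * ln((1 + x) / (1 - x)) with x = e / rho, which grows without
    bound as |e| approaches rho.
    """
    x = np.asarray(e, dtype=float) / np.asarray(rho, dtype=float)
    if np.any(np.abs(x) >= 1.0):
        raise ValueError("error is outside the prescribed performance boundary")
    return 0.5 * np.log((1.0 + x) / (1.0 - x))

def within_bounds(e, t, **bound_params):
    """Check objective (1): |e(t)| < rho(t) at every sample."""
    return bool(np.all(np.abs(e) < performance_bound(t, **bound_params)))

if __name__ == "__main__":
    t = np.linspace(0.0, 10.0, 1001)
    e = 1.5 * np.exp(-2.0 * t)  # hypothetical formation error trajectory
    print(within_bounds(e, t))
    print(transformed_error(e[:3], performance_bound(t[:3])))
```

Keeping the transformed variable bounded is what keeps the original error strictly inside the boundary, which is why PPC designs typically stabilize the transformed error rather than the raw one.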
3. Optimal Controller Design and Stability Analysis
3.1. Optimal Controller Design
3.2. Stability Analysis
- (1) The formation error converges to a neighborhood of zero.
- (2) All signals of the closed-loop system are SGUUB.
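Ultimate-boundedness conclusions of this kind are usually obtained from a Lyapunov inequality of the form dV/dt <= -a V + b with a, b > 0, which gives V(t) <= V(0) e^{-a t} + (b/a)(1 - e^{-a t}), so V enters a ball of radius b/a. The sketch below checks that closed-form bound numerically. The inequality and the constants `a`, `b`, and `v0` are assumptions used for illustration; the paper's actual Lyapunov function is not part of the extracted text.

```python
import math

def lyapunov_bound(v0, a, b, t):
    """Closed-form upper bound implied by dV/dt <= -a*V + b."""
    decay = math.exp(-a * t)
    return v0 * decay + (b / a) * (1.0 - decay)

def simulate_worst_case(v0, a, b, t_end, dt=1e-4):
    """Euler integration of the equality case dV/dt = -a*V + b."""
    v = v0
    for _ in range(int(round(t_end / dt))):
        v += dt * (-a * v + b)
    return v

if __name__ == "__main__":
    v0, a, b = 5.0, 2.0, 0.1  # hypothetical constants
    print("simulated V(3):", simulate_worst_case(v0, a, b, 3.0))
    print("bound     V(3):", lyapunov_bound(v0, a, b, 3.0))
    print("ultimate bound b/a:", b / a)
```

The residual radius b/a shrinks as the design gains increase `a` or as approximation errors and disturbances (collected in `b`) decrease, which is why the result is a neighborhood of zero rather than exact convergence.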
4. Simulation Results
4.1. Parameter Settings
4.2. Simulation Verification
4.3. Sensitivity Analysis of Learning Parameters
4.4. Ablation Study on PPC and RL Components
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| UAV | Unmanned Aerial Vehicle |
| RL | Reinforcement Learning |
| PPC | Prescribed Performance Control |
| HJB | Hamilton–Jacobi–Bellman |
| RBFNN | Radial Basis Function Neural Network |
| MSE | Mean Square Error |
| RMSE | Root Mean Square Error |
| MAE | Mean Absolute Error |
Appendix A
Appendix A.1. RBF Neural-Network
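For readers unfamiliar with RBF networks, the following is a minimal sketch of a Gaussian RBFNN of the form W^T S(x), fitted to a known function by least squares. The centers, width, and target function are illustrative assumptions, and the offline least-squares fit stands in for the online weight adaptation that an adaptive controller would use.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian basis matrix.

    x: (n_samples, dim), centers: (n_nodes, dim).
    Returns an array of shape (n_samples, n_nodes) with entries
    exp(-||x - c_i||^2 / width^2).
    """
    diff = x[:, None, :] - centers[None, :, :]
    return np.exp(-np.sum(diff**2, axis=2) / width**2)

def rbfnn(x, weights, centers, width):
    """Network output W^T S(x) for each sample in x."""
    return rbf_features(x, centers, width) @ weights

def fit_weights(x, y, centers, width):
    """Least-squares weights (offline stand-in for online adaptation)."""
    phi = rbf_features(x, centers, width)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights

if __name__ == "__main__":
    x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y = np.sin(x[:, 0])  # hypothetical unknown nonlinearity
    centers = np.linspace(-np.pi, np.pi, 15).reshape(-1, 1)
    w = fit_weights(x, y, centers, width=1.0)
    print("max abs error:", np.max(np.abs(rbfnn(x, w, centers, 1.0) - y)))
```

The approximation holds only on the compact set covered by the centers, which is one reason stability results built on RBFNNs are stated as semi-global.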

| Method Category | Main Advantage | Main Limitation | Difference of This Work |
|---|---|---|---|
| Robust/adaptive UAV control | Handles uncertainties and disturbances | Usually does not optimize long-term control cost | Introduces actor–critic RL to approximate optimal control |
| Conventional actor–critic UAV control | Learns near-optimal policy under limited model knowledge | Safety constraints are not explicitly guaranteed | Uses PPC to enforce formation error bounds |
| PPC-based UAV control | Guarantees transient/steady-state error constraints | Often lacks optimality or energy-efficiency design | Combines PPC with HJB-based actor–critic learning |
| Existing RL-PPC control | Combines learning and performance constraints | Often limited to a single UAV, simplified dynamics, or general nonlinear systems | Focuses on UAV formation with coupled position–attitude dynamics |
| Proposed method | Safe and near-optimal formation control | Current validation is simulation-based | Ensures SGUUB stability and satisfaction of the prescribed boundaries |

| Case | | | Convergence Time | Learning Oscillation | Stability |
|---|---|---|---|---|---|
| Actor network | 0.8 | 1 | 11.7 | Low | √ |
| | 0.4 | 1 | 17.0 | Low | √ |
| | 1.2 | 1 | 8.8 | Medium | √ |
| | 0.8 | 1.2 | 13.0 | Medium | √ |

| Method | MSE | RMSE | MAE |
|---|---|---|---|
| Proposed PPC-RL | 0.000999 | 0.031613 | 0.009885 |
| Without PPC | 0.030153 | 0.173645 | 0.129873 |
| Without RL | 0.002762 | 0.052556 | 0.028103 |
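The three error metrics in the table above can be computed from a sampled error signal as sketched below. The function name `error_metrics` is hypothetical; the tabulated values are copied from the table and are used only for a consistency check (RMSE should equal the square root of MSE, and MAE can never exceed RMSE).

```python
import numpy as np

def error_metrics(errors):
    """Return MSE, RMSE, and MAE of a sampled error signal."""
    e = np.asarray(errors, dtype=float)
    mse = float(np.mean(e**2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(e))),
    }

# Values reported in the ablation table: (MSE, RMSE, MAE).
REPORTED = {
    "Proposed PPC-RL": (0.000999, 0.031613, 0.009885),
    "Without PPC": (0.030153, 0.173645, 0.129873),
    "Without RL": (0.002762, 0.052556, 0.028103),
}

def table_is_consistent(reported, tol=1e-4):
    """Check RMSE ~= sqrt(MSE) and MAE <= RMSE for every row."""
    return all(
        abs(np.sqrt(mse) - rmse) < tol and mae <= rmse
        for mse, rmse, mae in reported.values()
    )

if __name__ == "__main__":
    print(error_metrics([0.1, -0.2, 0.05]))
    print(table_is_consistent(REPORTED))
```

A small gap between MAE and RMSE indicates errors of similar magnitude throughout the run, while a large gap points to occasional large excursions.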
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Qiang, S.; Han, X.; Sun, D. Learning-Based Optimal Control for Multiple Fixed-Wing UAVs with Prescribed Performance. Machines 2026, 14, 583. https://doi.org/10.3390/machines14060583

