Neural Surrogate-Enhanced Metaheuristic Optimization for Distributed Quadrotor Swarm Control
Abstract
1. Introduction
- (1) We address the deployment-time bottleneck in an inherited modified MPIO swarm controller by replacing its two-dimensional online weight selection module with a lightweight neural surrogate, leaving the existing flocking and gap-based obstacle-avoidance rules unchanged.
- (2) We train the surrogate with scene randomization, DAgger, and risk-weighted supervision, allowing it to better handle learner-visited states and place greater emphasis on safety-related samples during training.
- (3) We evaluate the method on a fixed synchronous closed-loop benchmark as well as a qualitative AirSim case study. On the benchmark, the surrogate improves the true collision-free rate and safe success rate while greatly reducing whole-swarm per-step decision latency and eliminating step overruns under the current implementation. In AirSim, the same high-level controller remains executable in an asynchronous multirotor control loop, providing qualitative evidence of migration feasibility.
- (4) We release the source code, final trained model, and merged evaluation results to support reproducibility and further comparison (https://github.com/cliche71/quadrotor-swarm-neural-surrogate.git, accessed on 17 May 2026).
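To make the DAgger idea in contribution (2) concrete, the following is a minimal sketch of the data-aggregation loop on a toy problem. It is not the released implementation: `teacher`, `rollout`, the two-dimensional toy state, and the linear least-squares learner are all hypothetical stand-ins (the paper's surrogate is an MLP and its teacher is the online modified MPIO solver). The point is only the structure: roll out the current learner, relabel the states it visits with the teacher, aggregate, and refit.

```python
import numpy as np

def teacher(state):
    # Hypothetical stand-in for the slow online teacher: state -> 2-D weights.
    return np.array([np.tanh(state[0]), 0.5 * state[1]])

def fit_linear(states, labels):
    # Least-squares fit of a linear-plus-bias learner (stand-in for the MLP).
    X = np.hstack([states, np.ones((len(states), 1))])
    theta, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return theta

def predict(theta, state):
    # Learner output for one state: 2-D weight vector.
    return np.append(state, 1.0) @ theta

def rollout(policy, rng, steps=20):
    # Toy closed loop: the next state depends on the weights the policy chose.
    s = rng.normal(size=2)
    visited = []
    for _ in range(steps):
        visited.append(s.copy())
        w = policy(s)
        s = 0.9 * s + 0.1 * w + 0.05 * rng.normal(size=2)
    return np.array(visited)

def dagger(iterations=3, episodes=5, seed=0):
    rng = np.random.default_rng(seed)
    # Base data: states visited while the teacher is in control.
    states = np.vstack([rollout(teacher, rng) for _ in range(episodes)])
    labels = np.array([teacher(s) for s in states])
    theta = fit_linear(states, labels)
    for _ in range(iterations):
        # Roll out the current learner, then relabel its states with the teacher.
        learner = lambda s, th=theta: predict(th, s)
        visited = np.vstack([rollout(learner, rng) for _ in range(episodes)])
        relabeled = np.array([teacher(s) for s in visited])
        # Aggregate old and new data, then refit on everything.
        states = np.vstack([states, visited])
        labels = np.vstack([labels, relabeled])
        theta = fit_linear(states, labels)
    return theta, states, labels
```

Calling `dagger()` returns the fitted parameters together with the aggregated dataset, so the share of learner-visited states in the training set can be inspected directly.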
2. System Modeling and Problem Formulation
2.1. UAV Model
2.2. Swarm Flocking Objectives
2.3. Obstacles and Gap-Based Avoidance Model
3. Multi-Objective Optimization Formulation and Online Modified MPIO Solver
3.1. Objective Optimization and Feasible Pareto Selection
3.2. Online Modified MPIO Teacher
Algorithm 1. Online Modified MPIO Teacher for Per-step Weight Selection
3.3. From Weight Selection to Closed-Loop State Update
4. Neural Surrogate Learning
4.1. Base Data Collection and DAgger Relabeling
4.2. Surrogate Network and Risk-Weighted Training Objective
5. Results and Discussion
5.1. Experimental Protocol
5.2. Closed-Loop Simulation Results
- Overall Quantitative Results.
- Additional Seed Sensitivity Check.
- Cross-Scene Consistency.
- Failure Mode Breakdown.
- Training and Model Ablations.
5.3. AirSim Case Study
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhou, X.; Yi, Z.; Liu, Y.; Huang, K.; Huang, H. Survey on Path and View Planning for UAVs. Virtual Real. Intell. Hardw. 2020, 2, 56–69.
- Chung, S.J.; Paranjape, A.A.; Dames, P.; Shen, S.; Kumar, V. A Survey on Aerial Swarm Robotics. IEEE Trans. Robot. 2018, 34, 837–855.
- Rahman, M.; Sarkar, N.I.; Lutui, R. A Survey on Multi-UAV Path Planning: Classification, Algorithms, Open Research Problems, and Future Directions. Drones 2025, 9, 263.
- Alqudsi, Y.; Makaraci, M. UAV Swarms: Research, Challenges, and Future Directions. J. Eng. Appl. Sci. 2025, 72, 12.
- Arshid, K.; Krayani, A.; Marcenaro, L.; Gomez, D.M.; Regazzoni, C. Toward Autonomous UAV Swarm Navigation: A Review of Trajectory Design Paradigms. Sensors 2025, 25, 5877.
- Reynolds, C.W. Flocks, Herds, and Schools: A Distributed Behavioral Model. Comput. Graph. 1987, 21, 25–34.
- Olfati-Saber, R. Flocking for Multi-Agent Dynamic Systems: Algorithms and Theory. IEEE Trans. Autom. Control 2006, 51, 401–420.
- Cucker, F.; Smale, S. Emergent Behavior in Flocks. IEEE Trans. Autom. Control 2007, 52, 852–862.
- Fiorini, P.; Shiller, Z. Motion Planning in Dynamic Environments Using Velocity Obstacles. Int. J. Robot. Res. 1998, 17, 760–772.
- van den Berg, J.; Guy, S.J.; Lin, M.; Manocha, D. Reciprocal n-Body Collision Avoidance. In Robotics Research; Pradalier, C., Siegwart, R., Hirzinger, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 3–19.
- Wang, L.; Ames, A.D.; Egerstedt, M. Safety Barrier Certificates for Collisions-Free Multirobot Systems. IEEE Trans. Robot. 2017, 33, 661–674.
- Miettinen, K. Nonlinear Multiobjective Optimization; International Series in Operations Research & Management Science; Kluwer Academic Publishers: Boston, MA, USA, 1998; Volume 12.
- Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
- Tang, J.; Duan, H.; Lao, S. Swarm intelligence algorithms for multiple unmanned aerial vehicles collaboration: A comprehensive review. Artif. Intell. Rev. 2023, 56, 4295–4327.
- Poudel, S.; Arafat, M.Y.; Moh, S. Bio-Inspired Optimization-Based Path Planning Algorithms in Unmanned Aerial Vehicles: A Survey. Sensors 2023, 23, 3051.
- Qiu, H.; Duan, H. A Multi-Objective Pigeon-Inspired Optimization Approach to UAV Distributed Flocking among Obstacles. Inf. Sci. 2020, 509, 515–529.
- Ruan, W.Y.; Duan, H.B. Multi-UAV Obstacle Avoidance Control via Multi-Objective Social Learning Pigeon-Inspired Optimization. Front. Inf. Technol. Electron. Eng. 2020, 21, 740–748.
- Shi, H.; Gao, W.; Jiang, X.; Su, C.; Li, P. Two-dimensional model-free Q-learning-based output feedback fault-tolerant control for batch processes. Comput. Chem. Eng. 2024, 182, 108583.
- Hu, C.; Bai, J.; Zou, H. Two-dimensional iterative learning control under infinite horizon optimization for batch processes with partial actuator failures. Can. J. Chem. Eng. 2026, early view.
- Alessio, A.; Bemporad, A. A Survey on Explicit Model Predictive Control. In Nonlinear Model Predictive Control: Towards New Challenging Applications; Magni, L., Raimondo, D.M., Allgöwer, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 345–369.
- Hewing, L.; Wabersich, K.P.; Menner, M.; Zeilinger, M.N. Learning-Based Model Predictive Control: Toward Safe Learning in Control. Annu. Rev. Control Robot. Auton. Syst. 2020, 3, 269–296.
- Gonzalez, C.; Asadi, H.; Kooijman, L.; Lim, C.P. Neural Networks for Fast Optimisation in Model Predictive Control: A Review. arXiv 2023, arXiv:2309.02668.
- Khodaverdian, A.; Gohil, D.; Christofides, P.D. Neural Network Implementation of Model Predictive Control with Stability Guarantees. Digit. Chem. Eng. 2025, 16, 100262.
- Puente-Castro, A.; Rivero, D.; Pazos, A.; Fernandez-Blanco, E. A Review of Artificial Intelligence Applied to Path Planning in UAV Swarms. Neural Comput. Appl. 2022, 34, 153–170.
- Argall, B.D.; Chernova, S.; Veloso, M.; Browning, B. A Survey of Robot Learning from Demonstration. Robot. Auton. Syst. 2009, 57, 469–483.
- Ross, S.; Gordon, G.; Bagnell, D. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics; Proceedings of Machine Learning Research; PMLR: Fort Lauderdale, FL, USA, 2011; Volume 15, pp. 627–635.
- Shah, S.; Dey, D.; Lovett, C.; Kapoor, A. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. In Field and Service Robotics; Springer International Publishing: Cham, Switzerland, 2018; pp. 621–635.
- Sabatino, F. Quadrotor Control: Modeling, Nonlinear Control Design, and Simulation. Master’s Thesis, KTH Royal Institute of Technology, Automatic Control, Stockholm, Sweden, 2015.
- Zhang, X.; Li, X.; Wang, K.; Lu, Y. A Survey of Modelling and Identification of Quadrotor Robot. Abstr. Appl. Anal. 2014, 2014, 320526.
- Vicsek, T.; Czirók, A.; Ben-Jacob, E.; Cohen, I.; Shochet, O. Novel Type of Phase Transition in a System of Self-Driven Particles. Phys. Rev. Lett. 1995, 75, 1226–1229.
- Sezer, V.; Gokasan, M. A Novel Obstacle Avoidance Algorithm: “Follow the Gap Method”. Robot. Auton. Syst. 2012, 60, 1123–1134.
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2017; pp. 23–30.
- Qiu, H.; Duan, H. Multi-objective pigeon-inspired optimization for brushless direct current motor parameter design. Sci. China Technol. Sci. 2015, 58, 1915–1923.









| Symbol | Value | Notes |
|---|---|---|
| P | 58 | Population size |
| | 20 | Max iterations per step |
| | 2 | Removed per iteration |
| | 50 | Elite archive size |
| R | 0.3 | Leader decay factor |
| | 3.0 | Exploration-convergence factor |
| | 0.9 | General-leader ratio |
| | 0.01 | Follower-learning perturbation |
| | 2 | Follower-learning repetitions |
| | 0.01 | Fallback random-walk amplitude |
| | | Lower bound of the 2-D weight position |
| | | Upper bound of the 2-D weight position |
| | | Lower bound of the 2-D pigeon velocity |
| | | Upper bound of the 2-D pigeon velocity |
| | 0.5 s | Step size |
| | 59.5 s | Episode horizon |
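For orientation, the sketch below shows how a population size, an iteration cap, a decay factor, and position/velocity bounds of the kind listed above typically enter a pigeon-inspired search over a 2-D weight vector. It implements only the textbook single-objective map-and-compass update, not the paper's modified multi-objective teacher. The function name `pio_map_compass`, the toy objective, and the bound values are assumptions, and whether the table's "leader decay factor" R plays exactly the role of the decay factor used here is also an assumption.

```python
import numpy as np

def pio_map_compass(objective, pop=58, iters=20, R=0.3,
                    x_bounds=(0.0, 1.0), v_bounds=(-0.1, 0.1), seed=0):
    # pop, iters and R follow the table; the bounds are placeholder values.
    rng = np.random.default_rng(seed)
    x = rng.uniform(*x_bounds, size=(pop, 2))   # 2-D weight positions
    v = rng.uniform(*v_bounds, size=(pop, 2))   # 2-D pigeon velocities
    cost = np.array([objective(p) for p in x])
    best, best_cost = x[np.argmin(cost)].copy(), float(cost.min())
    for t in range(1, iters + 1):
        # Map-and-compass operator: decayed inertia plus attraction to the best.
        v = v * np.exp(-R * t) + rng.random((pop, 2)) * (best - x)
        v = np.clip(v, *v_bounds)
        x = np.clip(x + v, *x_bounds)
        cost = np.array([objective(p) for p in x])
        if cost.min() < best_cost:
            best, best_cost = x[np.argmin(cost)].copy(), float(cost.min())
    return best, best_cost
```

With these settings each call evaluates the objective 58 × (1 + 20) = 1218 times, which illustrates why running such a search for every UAV at every control step can become the deployment-time bottleneck that the surrogate is meant to remove.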

| Indices | Dim. | Content |
|---|---|---|
| 1–6 | 6 | Ego planar velocity, speed, heading, and vertical-state information given by altitude rate and altitude. |
| 7–8 | 2 | Desired cruise velocity |
| 9–23 | 15 | Up to three nearest neighbors: relative position, relative velocity, and distance, with |
| 24–41 | 18 | Up to three nearest obstacles: relative center position, radius, planar velocity, and clearance, with |
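The index ranges in the table imply a 41-dimensional input: 6 ego values, 2 cruise-velocity values, 3 × 5 neighbor values, and 3 × 6 obstacle values. The sketch below assembles such a vector under stated assumptions: planar (2-D) relative positions and velocities, nearest-first ordering by the last column of each row, and zero padding for missing neighbors or obstacles. None of these choices is confirmed by the source, and `build_features` is a hypothetical helper name.

```python
import numpy as np

N_NEAREST = 3  # "up to three" neighbors and obstacles, per the table

def _nearest_block(rows, width):
    # Sort rows by their last column (distance or clearance), keep the
    # closest N_NEAREST, and zero-pad missing slots (padding is an assumption).
    rows = np.asarray(rows, dtype=float).reshape(-1, width)
    rows = rows[np.argsort(rows[:, -1])][:N_NEAREST]
    block = np.zeros((N_NEAREST, width))
    block[:len(rows)] = rows
    return block.ravel()

def build_features(ego, cruise, neighbors, obstacles):
    # ego:       6 values (planar velocity, speed, heading, altitude rate, altitude)
    # cruise:    2 values (desired cruise velocity)
    # neighbors: rows of 5 (relative position 2, relative velocity 2, distance 1)
    # obstacles: rows of 6 (relative center 2, radius 1, velocity 2, clearance 1)
    ego = np.asarray(ego, dtype=float)
    cruise = np.asarray(cruise, dtype=float)
    assert ego.shape == (6,) and cruise.shape == (2,)
    return np.concatenate([ego, cruise,
                           _nearest_block(neighbors, 5),
                           _nearest_block(obstacles, 6)])
```

A fixed-length layout of this kind lets a plain MLP consume the state regardless of how many neighbors or obstacles are currently in range.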

| Symbol | Value | Notes |
|---|---|---|
| | | Speed time constant |
| | | Heading time constant |
| | | Speed range (m/s) |
| | | Max lateral overload (g) |
| N | | UAVs per episode |
| | | Target altitude range (m) |
| | | Cruise speed range (m/s) |
| | | Target spacing range (m) |
| | 40 | Neighbor radius (m) |
| | | UAV safety radius (m) |
| | | Obstacle sensing radius (m) |
| | | Spacing gain |
| | | Velocity-alignment gain |
| | | Inter-UAV collision-repulsion gain |
| | | Inflated obstacle safety radius (m) |
| | | Gap-selection half FOV |
| | | Clearance weight in |
| | | Gap-width weight in |
| | | Progress weight in |
| | | Turn-cost weight in |
| | | Boundary penalty in |

| Item | Value | Notes |
|---|---|---|
| Batch size | 512 | Mini-batch size |
| Learning rate | | Initial learning rate |
| Weight decay | | Weight-decay coefficient |
| Max epochs | 60 | Max training epochs |
| Patience | 10 | Early-stop patience |
| CPU | 13th Gen Intel Core i7-13700H | 14 cores, 20 threads, 2.40 GHz |
| GPU | NVIDIA GeForce RTX 4060 Laptop GPU | 8 GB VRAM |
| | | Reference neighbor margin in |
| | | Reference obstacle margin in |
| | | Neighbor-margin weight in |
| | | Obstacle-margin weight in |
| | | Neighbor-collision multiplier in |
| | | Obstacle-collision multiplier in |
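The last six rows name the ingredients of the risk weighting: a reference margin, a margin weight, and a collision multiplier for neighbors and for obstacles. The exact formula is not recoverable from this text, so the sketch below is one plausible form under loud assumptions: a sample's weight grows linearly as its margin falls below the reference margin and is multiplied when the sample is flagged as a collision. All default values and the names `risk_weights` and `weighted_mse` are placeholders, not the paper's settings.

```python
import numpy as np

def risk_weights(nbr_margin, obs_margin, nbr_hit, obs_hit,
                 ref_nbr=5.0, ref_obs=5.0, w_nbr=1.0, w_obs=1.0,
                 mult_nbr=2.0, mult_obs=2.0):
    # Assumed form: base weight 1, plus a term that rises from 0 to w_* as the
    # margin shrinks from the reference margin to zero. Defaults are placeholders.
    nbr_term = w_nbr * np.clip(1.0 - np.asarray(nbr_margin) / ref_nbr, 0.0, 1.0)
    obs_term = w_obs * np.clip(1.0 - np.asarray(obs_margin) / ref_obs, 0.0, 1.0)
    w = 1.0 + nbr_term + obs_term
    # Collision-flagged samples are up-weighted by a multiplier (assumption).
    w = np.where(nbr_hit, w * mult_nbr, w)
    w = np.where(obs_hit, w * mult_obs, w)
    return w

def weighted_mse(pred, target, weights):
    # Per-sample squared error on the 2-D weight output, averaged with the
    # risk weights so that safety-related samples dominate the loss.
    per_sample = np.mean((np.asarray(pred) - np.asarray(target)) ** 2, axis=1)
    return float(np.sum(weights * per_sample) / np.sum(weights))
```

Setting both margin weights to zero and both multipliers to one recovers a uniform loss, which corresponds in spirit to the "Uniform loss" ablation variant.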

| Method | True Collision-Free (%) ↑ | Safe Success (%) ↑ | Formation Pass (%) ↑ | Step Compute Time (ms) ↓ | Overrun Ratio (%) ↓ |
|---|---|---|---|---|---|
| Base MPIO [33] | | | | 26,772.83 ± 1396.42 | |
| Modified MPIO [16] | | | | | |
| Ours | | | | | |
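The overrun ratio in the last column can be read as the share of control steps whose whole-swarm decision time exceeds the step budget. The sketch below assumes that definition and takes the budget to be the 0.5 s step size listed in the solver table; both the definition and the name `overrun_ratio` are assumptions rather than the paper's stated metric.

```python
import numpy as np

def overrun_ratio(step_times_ms, budget_ms=500.0):
    # Percentage of steps whose compute time exceeds the control-step budget.
    # budget_ms = 500 assumes the 0.5 s step size is the real-time deadline.
    t = np.asarray(step_times_ms, dtype=float)
    return 100.0 * float(np.mean(t > budget_ms))
```

Under this reading, a method whose typical step time is on the order of the 26,772.83 ms reported for Base MPIO (if that figure is indeed a step compute time) would overrun on essentially every step.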

| Evaluation Seed | Episodes | True CF (%) | Safe Success (%) | Formation Pass (%) | Neighbor Coll. (%) | Obstacle Hard Coll. (%) | Step Time (ms) | Overrun (%) |
|---|---|---|---|---|---|---|---|---|
| 2028, 2029, 2030 | 450 | | | | | | | |
| 1000, 2000, 3000 | 450 | | | | | | | |
| 1000 | 150 | | | | | | | |
| 2000 | 150 | | | | | | | |
| 3000 | 150 | | | | | | | |

| Method | Neighbor Collision (%) ↓ | Obstacle Hard Collision (%) ↓ | True Collision-Free (%) ↑ |
|---|---|---|---|
| Base MPIO [33] | | | |
| Modified MPIO [16] | | | |
| Ours | | | |

| Variant | Setting | True CF (%) | Safe Success (%) | Formation Pass (%) | Neighbor Coll. (%) | Obstacle Hard Coll. (%) | Step Time (ms) | MAE |
|---|---|---|---|---|---|---|---|---|
| Full | DAgger + risk, 128–64 | | | | | | | |
| No DAgger | Base data + risk, 128–64 | | | | | | | |
| Uniform loss | DAgger + uniform, 128–64 | | | | | | | |
| Small MLP | DAgger + risk, 64–32 | | | | | | | |
| Large MLP | DAgger + risk, 256–128 | | | | | | | |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, J.; Wen, Z.; Ning, Z. Neural Surrogate-Enhanced Metaheuristic Optimization for Distributed Quadrotor Swarm Control. Sensors 2026, 26, 3398. https://doi.org/10.3390/s26113398

