Fault-Tolerant Cooperative Positioning for UAV Swarms in Degraded Environments: A Multi-Objective Deep Reinforcement Learning Approach
Highlights
- Proposes a novel MADRL-CEKF framework designed for highly resilient micro UAV swarm cooperative localization in severe GNSS-denied and obstacle-dense environments.
- Introduces a link-level dynamic soft isolation mechanism that independently evaluates ranging links, effectively severing the contagion paths of cascading cooperative errors.
- Integrates an adaptive Markov smoothing constraint to bridge discrete high-level AI decisions with low-level filtering, eliminating high-frequency control jitter.
- Develops a resource-aware multi-objective reward architecture, cutting processing delay and energy footprint by over 40% to strictly meet the 50 ms real-time limit.
Abstract
1. Introduction
- A Link-Level Dynamic Soft Isolation Mechanism Based on MADRL. Unlike traditional heuristic or node-level isolation methods, the proposed scheme independently evaluates the trust of each UWB ranging link. This enables the selective isolation of faulty links while preserving healthy ones, substantially enhancing network robustness.
- An Adaptive Markov Smoothing Constraint for Decision Continuity. To bridge discrete high-level AI decisions with continuous low-level actuator constraints, we introduce a dynamic smoothing mechanism. It adjusts the smoothing constant based on the swarm’s real-time status, mitigating sudden fluctuations in decision weights typical of conventional fixed-parameter filters.
- A Resource-Aware Multi-Objective Optimization Architecture. Addressing the strict computational and power limits of micro UAVs, this architecture implements a multi-dimensional reward structure that considers accuracy, processing delay, and energy costs. The agent autonomously balances these objectives, ensuring that single-run execution times remain safely within the 50 ms real-time threshold.
- Baseline performance comparison in ideal environments;
- Real-world robustness validation on the public MILUV dataset;
- Swarm scalability and error contagion analysis with 2/4/6/8 UAV nodes;
- XAI-based dynamic trust allocation and decision interpretability analysis;
- System robustness testing under cascaded multi-node IMU anomalies;
- Dual ablation experiments: quantifying contributions of weight smoothing and link isolation mechanisms;
- Multi-objective reward sub-ablation and engineering performance trade-off analysis.
2. Materials and Methods
2.1. CEKF Model and Vulnerability Analysis
2.1.1. State Prediction via IMU
2.1.2. Cooperative Ranging Observation
2.1.3. Vulnerability of Standard Update
2.2. Deep Reinforcement Learning Network Architecture and Decision Modeling
2.2.1. State Space Feature Extraction
2.2.2. Action Space and Lightweight Network Topology
2.2.3. Resource-Aware Multi-Objective Reward Mechanism
- The positioning accuracy reward is constructed with a Gaussian kernel.
- The computation delay penalty enforces the requirement that single-step computation time remain below the safe threshold. An exponential truncation penalty is applied.
- The computational energy consumption penalty forces agents to discard low-information, redundant nodes. It is formulated as the ratio of active filtering updates, where an indicator function returns 1 if the raw link weight exceeds the isolation threshold and 0 otherwise. In the algorithmic implementation, links whose raw weight does not exceed the threshold are skipped during the CEKF measurement update step, which directly reduces the total floating-point operations (FLOPs) and the corresponding computational energy usage.
- The control jitter penalty limits abrupt changes between consecutive actions to maintain smoothness.
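A minimal sketch of how the four reward terms listed above could be combined is given here. The exact formulas are not preserved in this text, so the functional forms are assumptions: the kernel width `sigma_m`, the cap on the delay penalty, the isolation threshold `alpha_th`, and the weights `w_*` are hypothetical placeholders, and the function name is illustrative. Only the 50 ms value is taken from the real-time limit stated in the contributions.

```python
import math

def multi_objective_reward(pos_error_m, step_time_ms, raw_weights, prev_action, action,
                           sigma_m=0.5, t_safe_ms=50.0, alpha_th=0.1,
                           w_acc=1.0, w_delay=0.5, w_energy=0.2, w_jitter=0.2):
    """Hypothetical 4-term reward; every constant and functional form is an assumption."""
    # Accuracy reward: Gaussian kernel of the position error, in (0, 1].
    r_acc = math.exp(-pos_error_m ** 2 / (2.0 * sigma_m ** 2))

    # Delay penalty: zero at or below the safe threshold, exponential above it,
    # truncated at an arbitrary cap of 10 so a single slow step cannot dominate.
    overshoot = max(0.0, step_time_ms - t_safe_ms) / t_safe_ms
    p_delay = min(math.exp(overshoot) - 1.0, 10.0)

    # Energy penalty: fraction of links whose raw weight exceeds the isolation
    # threshold, i.e. the share of links that trigger a filter update.
    n_links = max(len(raw_weights), 1)
    p_energy = sum(1 for a in raw_weights if a > alpha_th) / n_links

    # Jitter penalty: mean absolute change between consecutive actions.
    n_act = max(len(action), 1)
    p_jitter = sum(abs(a - b) for a, b in zip(action, prev_action)) / n_act

    return w_acc * r_acc - w_delay * p_delay - w_energy * p_energy - w_jitter * p_jitter
```

With this shape, an agent that keeps a low-information link active pays the energy term on every step, so isolating the link is rewarded even when its effect on accuracy is negligible.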
2.3. Dynamic Soft Isolation Mechanism and Adaptive Markov Smoothing Constraint
2.3.1. Topology Preservation and Dynamic Soft Isolation Mechanism
2.3.2. High-Frequency Oscillation Hazard in Deep Reinforcement Learning
2.3.3. Mathematical Construction of Adaptive Markov Smoothing Constraint
2.3.4. Physical Response Mechanism and Computational Efficiency
- Noise filtering in steady-state environments: When residual fluctuations are minimal, the smoothing factor β stays high and the smoother gives greater weight to historical data, removing noise from the neural network output. Filter state updates remain smooth and motor commands remain coherent.
- Instantaneous isolation under extreme faults: When hardware fails catastrophically or a node enters a critical NLOS blind spot, the residual spikes and the residual variation increases sharply. The residual-dependent term decays toward zero, and β drops precipitously. The system immediately discards historical weights and assigns a high adoption proportion to the current low-confidence raw weight. This allows the filter to inflate the observation covariance matrix within a single time step, achieving second-level response and isolation of the fault source.
- Notably, the Markov smoothing operation for each link requires only three simple scalar arithmetic operations. The computational overhead is negligible (less than 1% of the total system inference time). This ensures high efficiency and highlights the framework’s suitability for deployment on resource-constrained platforms such as micro UAVs.
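The two response regimes described above can be illustrated with a short sketch. The closed form of the smoothing factor is not preserved in this text, so the exponential decay `beta_max * exp(-gamma * delta)` is an assumed form chosen to match the described behaviour; `beta_max` is a hypothetical value, while `gamma = 2.0` is the setting marked as proposed in the sensitivity table.

```python
import math

def adaptive_markov_smooth(alpha_prev, alpha_raw, abs_res, abs_res_prev,
                           beta_max=0.9, gamma=2.0):
    """Blend the previous link weight with the new raw weight (assumed form)."""
    # Residual variation between consecutive steps for this link.
    delta = abs(abs_res - abs_res_prev)
    # Smoothing factor: close to beta_max when residuals are steady,
    # close to zero when the residual jumps.
    beta = beta_max * math.exp(-gamma * delta)
    # First-order (Markov) blend: only the previous weight is remembered.
    alpha = beta * alpha_prev + (1.0 - beta) * alpha_raw
    return alpha, beta

# Steady state: history dominates, the raw output is damped.
steady_alpha, steady_beta = adaptive_markov_smooth(0.8, 0.6, 0.10, 0.10)
# Fault: the residual jumps by 5 m, history is discarded almost entirely.
fault_alpha, fault_beta = adaptive_markov_smooth(0.8, 0.05, 5.10, 0.10)
```

In the steady case the smoothed weight moves only a tenth of the way toward the raw output, whereas in the fault case it lands essentially on the raw low-confidence weight within one step.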
Algorithm 1: MADRL-CEKF Fault-Tolerant Cooperative Positioning Algorithm
- Input: swarm size, number of anchors, initial states, baseline covariance matrix, and smoothing hyperparameters.
- Output: robust posterior position estimate and effective dynamic weights.
- Initialize the Actor network, the Critic network, and the replay buffer.
- Initialize the independent historical link weights and absolute residuals.
- For each time step:
  - For each UAV node:
    - Phase 1: Kinematic Prediction
      - Obtain the IMU reading and compute the acceleration rate.
      - Perform prior state and covariance prediction.
      - Obtain neighbor packets and UWB measurements.
    - Phase 2: MADRL-based Trust Evaluation & Smoothing. For each ranging link:
      - Compute the innovation residual and the absolute residual.
      - Construct the local state vector.
      - Run Actor network inference to obtain the raw trust weight.
      - Compute the residual variation.
      - Compute the adaptive smoothing factor.
      - Apply adaptive Markov smoothing.
      - Reconstruct the covariance.
    - Phase 3: Filter Update & Reward Storage
      - Compute the Kalman gain using the dynamically expanded observation covariance.
      - Update and output the posterior state estimate.
      - Compute the multi-objective reward and store the transition in the buffer.
  - Phase 4: Network Training (CTDE Paradigm, offline/asynchronous)
    - Sample a batch from the replay buffer to update the Actor and Critic using MAPPO.
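The covariance reconstruction and filter update in Phases 2 and 3 of Algorithm 1 can be sketched for a single ranging link as follows. This is an illustration under stated assumptions, not the authors' implementation: the state is reduced to a 2-D position, the neighbor position `q` is treated as known, the inflation rule `r0 / alpha` is one plausible way to map a trust weight to an expanded observation variance, and `r0` and `alpha_th` are hypothetical values.

```python
import math

def soft_isolated_range_update(x, P, z, q, alpha, r0=0.01, alpha_th=0.05):
    """One scalar range update for a 2-D position; returns (x, P, updated).

    Assumes P is symmetric and that x and q do not coincide.
    """
    if alpha <= alpha_th:
        # Link is isolated: skip the measurement update and save the arithmetic.
        return x, P, False
    dx, dy = x[0] - q[0], x[1] - q[1]
    pred = math.hypot(dx, dy)                  # predicted range
    h = (dx / pred, dy / pred)                 # measurement Jacobian (row vector)
    r_eff = r0 / alpha                         # low trust -> inflated variance
    # P h^T, innovation variance, and Kalman gain for a scalar measurement.
    ph = (P[0][0] * h[0] + P[0][1] * h[1], P[1][0] * h[0] + P[1][1] * h[1])
    s = h[0] * ph[0] + h[1] * ph[1] + r_eff
    k = (ph[0] / s, ph[1] / s)
    innov = z - pred
    x_new = [x[0] + k[0] * innov, x[1] + k[1] * innov]
    # P_new = (I - K H) P, written out using the symmetry of P.
    P_new = [[P[i][j] - k[i] * ph[j] for j in range(2)] for i in range(2)]
    return x_new, P_new, True

P0 = [[0.01, 0.0], [0.0, 0.01]]
# A range that is 2 m too long (e.g. NLOS bias), applied with three trust levels.
x_trust, _, used_trust = soft_isolated_range_update([0.0, 0.0], P0, 12.0, (10.0, 0.0), 1.0)
x_soft, _, used_soft = soft_isolated_range_update([0.0, 0.0], P0, 12.0, (10.0, 0.0), 0.1)
x_skip, _, used_skip = soft_isolated_range_update([0.0, 0.0], P0, 12.0, (10.0, 0.0), 0.01)
```

The fully trusted link pulls the estimate far more than the down-weighted one, and the link below the threshold leaves the state untouched, which is the graded behaviour that distinguishes soft isolation from a hard node-level cut.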
3. Results
3.1. Experiment 1: Baseline Performance Comparison in Ideal Environment
3.1.1. Experimental Objectives and Scenario Setup
3.1.2. Trajectory Tracking and Temporal Error Analysis
3.1.3. Quantitative Statistics and Mechanism Analysis
3.2. Experiment 2: Robustness Validation in Real-World Scenarios Using a Public Dataset
3.2.1. Experimental Objectives and Dataset Setup
3.2.2. Trajectory Reconstruction and Temporal Error Analysis
3.2.3. Quantitative Statistics and Dynamic Weight Mechanism Analysis
3.3. Experiment 3: Swarm Scalability and Error Contagion Analysis
3.3.1. Experimental Objectives and Scenario Setup
3.3.2. Scalability Curve and Inter-Node Dependency Analysis
3.3.3. Quantitative Improvement and Mechanism Analysis
3.4. Experiment 4: XAI-Based Dynamic Trust Allocation and Decision Interpretability Analysis
3.4.1. Experimental Objectives and Scenario Setup
3.4.2. Three-Stage Microscopic Decision Flow Analysis
3.4.3. Quantitative Metrics and Anti-Interference Isolation Mechanism
3.5. Experiment 5: Cascaded Fault Tolerance and Error Contagion Isolation Analysis
3.5.1. Experimental Objectives and Scenario Setup
3.5.2. Error Contagion and Trajectory Divergence Analysis
3.5.3. Quantitative Performance and Soft Isolation Mechanism
3.6. Experiment 6: Ablation Study Analysis
3.6.1. Experimental Objectives and Scenario Setup
3.6.2. Ablation of the Adaptive Markov Smoothing Constraint (System Stability Verification)
3.6.3. Ablation of the Link-Level Dynamic Soft Isolation Mechanism (Positioning Accuracy Verification)
3.6.4. Sensitivity Analysis of Markov Smoothing Parameters
3.7. Experiment 7: Resource-Aware Multi-Objective Optimization Architecture and Engineering Trade-Off Analysis
3.7.1. Experimental Objectives and Setup
3.7.2. Training Convergence Analysis
3.7.3. Multi-Objective Performance and Engineering Trade-Off
4. Discussion
4.1. The Art of Engineering Trade-Offs: Theoretical Accuracy Versus Real-World Constraints
4.2. Interpretability and Physical Mechanism of Error Contagion Isolation
4.3. Limitations and Future Work
4.4. Asymptotic Complexity and Scalability Bottlenecks for Large Swarms
- Communication Overhead: The current framework assumes a fully connected cooperative topology. The bandwidth required for state and measurement broadcasting per UAV grows linearly with the swarm size, so the total network communication load grows quadratically. For swarms of 20+ nodes, this quadratic growth easily saturates the limited wireless bandwidth (e.g., ZigBee or basic Wi-Fi modules), leading to packet loss and latency.
- Centralized Training (CTDE) Bottleneck: During offline training, the centralized critic network evaluates the global state of the entire swarm. The joint state-action space grows exponentially with the swarm size and suffers from the curse of dimensionality, so training a centralized critic for 20+ agents becomes very difficult to bring to convergence.
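The quadratic growth of the communication load can be checked by counting directed links in a fully connected swarm. The sketch below counts links only; packet sizes and update rates are not given in this text, so no bandwidth figures are derived, and the function name is illustrative.

```python
def broadcast_load(n_uavs):
    """Return (links per UAV, total directed links) for a fully connected swarm."""
    per_uav = n_uavs - 1          # each UAV exchanges packets with every other UAV
    total = n_uavs * per_uav      # all directed links in the network
    return per_uav, total

for n in (2, 4, 6, 8, 20):
    per_uav, total = broadcast_load(n)
    print(f"{n:>2} UAVs: {per_uav:>2} links per UAV, {total:>3} directed links in total")
```

Going from 8 to 20 nodes multiplies the swarm size by 2.5 but the total link count by almost 7 (56 to 380), which is why the fully connected assumption becomes the limiting factor first.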
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest












| Method Category | Link-Level Soft Isolation | Adaptive Markov Smoothing | Multi-Objective Optimization | Energy & Delay Awareness | Validation Type |
|---|---|---|---|---|---|
| Robust EKF/Heuristic Tuning | Partial (Heuristic rigid thresholds) | No (Direct state correction) | No (Accuracy-only) | No | Simulation/Dataset |
| GNN-Based Trust Evaluation | No (Usually Node-level hard isolation) | No | No (Accuracy-only) | No (Heavy tensor operations) | Simulation/Dataset |
| Standard DRL-Based Localization | Partial (Black-box weight outputs) | No (Prone to high-frequency jitter) | Rare (Mostly single-objective) | No | Simulation |
| Proposed MADRL-CEKF | Yes (Dynamic soft isolation via covariance) | Yes (Adaptive β based on residual) | Yes (4D Pareto reward architecture) | Yes (Explicit penalties in training) | High-fidelity Sim + MILUV Dataset |

| Method | Mean RMSE (m) | Max Error (m) | 90% CEP (m) |
|---|---|---|---|
| Standard CEKF | 0.1643 | 0.5189 | 0.2819 |
| Proposed MADRL | 0.1955 | 0.5823 | 0.3306 |

| Method | Global RMSE (m) | Median Error (m) |
|---|---|---|
| Pure DR | 1224.8440 | 588.0719 |
| Standard EKF | 1.0639 | 0.6933 |
| ) | 1.0214 | 0.6954 |
| Huber Robust CEKF | 1.0412 | 0.6913 |
| Proposed MADRL-CEKF | 1.0189 | 0.6910 |

| Number of Nodes | Traditional RMSE (m) | Proposed RMSE (m) | Positioning Accuracy Gain (%) |
|---|---|---|---|
| 2 | 0.5658 | 0.5824 | −2.93% |
| 4 | 0.6848 | 0.5739 | +16.19% |
| 6 | 0.7795 | 0.5791 | +25.71% |
| 8 | 0.8839 | 0.5844 | +33.88% |

| Analysis Metric | Standard EKF | MADRL-CEKF (Ours) |
|---|---|---|
| RMSE in Interference Zone (m) | 3.0906 | 0.8675 |
| Anti-Interference Accuracy Gain (%) | Baseline | 71.93% |
| Average Weight of Interfering Nodes (α) | 0.80 | 0.0875 |
| Average Weight of Normal Nodes (α) | 0.80 | 0.6456 |

| Performance Metric (Healthy Nodes) | Standard CEKF | Huber Robust CEKF | Proposed MADRL-CEKF |
|---|---|---|---|
| RMSE before fault (m) | 3.2358 | 3.2358 | 3.2304 |
| Average RMSE during fault (m) | 57.3306 | 35.6330 | 2.2853 |
| Maximum peak error during fault (m) | 121.6830 | 62.1137 | 4.9688 |
| Global robustness improvement (Avg Gain) | Baseline | 37.84% | 96.01% |
| Peak error suppression rate (Peak Gain) | Baseline | 48.95% | 95.91% |

| Ablation Group | Interference RMSE (m) | Accuracy Improvement Due to Mechanism |
|---|---|---|
| Ablation: No Evaluation | 8.5993 | – |
| Full Proposed Algorithm | 0.6618 | +92.30% |

| Parameter Value (γ) | Response Characteristic | RMSE During Fault (m) | Steady-State Jitter Level |
|---|---|---|---|
| γ = 0.5 | Sluggish response, excessive inertia | 1.8532 | Very Low |
| γ = 1.0 | Delayed isolation | 1.2045 | Low |
| γ = 2.0 (Proposed) | Optimal balance | 0.8675 | Normal |
| γ = 5.0 | Over-sensitive, triggers false isolation | 0.9410 | High |
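The gain column in the node-count table (2/4/6/8 UAVs) appears to be the relative RMSE reduction with respect to the traditional method. That formula is inferred here and not stated in the text; the short check below applies it to the tabulated RMSE values so the column can be recomputed.

```python
def accuracy_gain_percent(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction in percent; negative means the proposed method is worse."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

# (traditional RMSE, proposed RMSE) in metres, copied from the node-count table.
scalability_rows = {
    2: (0.5658, 0.5824),
    4: (0.6848, 0.5739),
    6: (0.7795, 0.5791),
    8: (0.8839, 0.5844),
}

for n_nodes, (traditional, proposed) in scalability_rows.items():
    gain = accuracy_gain_percent(traditional, proposed)
    print(f"{n_nodes} nodes: {gain:+.2f}%")
```

The proposed RMSE stays within roughly 0.57 to 0.58 m across all four swarm sizes while the traditional RMSE rises, so the growing gain reflects degradation of the baseline more than improvement of the proposed method.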
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, P.; Li, J.; Lan, X.; Pang, B. Fault-Tolerant Cooperative Positioning for UAV Swarms in Degraded Environments: A Multi-Objective Deep Reinforcement Learning Approach. Sensors 2026, 26, 3747. https://doi.org/10.3390/s26123747

