DTVIRM-Swarm: A Distributed and Tightly Integrated Visual-Inertial-UWB-Magnetic System for Anchor Free Swarm Cooperative Localization

Luo, Xincan; Du, Xueyu; Yue, Shuai; Lv, Yunxiao; Zhang, Lilian; He, Xiaofeng; Wu, Wenqi; Mao, Jun

doi:10.3390/drones10010049


by Xincan Luo ^1,2, Xueyu Du ^1,2, Shuai Yue ^1,2, Yunxiao Lv ^1,2, Lilian Zhang ^1,2, Xiaofeng He ^1,2, Wenqi Wu ^1,2 and Jun Mao ^1,2,*

¹ College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

² National Key Laboratory of Equipment State Sensing and Smart Support, Changsha 410073, China

^* Author to whom correspondence should be addressed.

Drones 2026, 10(1), 49; https://doi.org/10.3390/drones10010049

Submission received: 19 November 2025 / Revised: 25 December 2025 / Accepted: 6 January 2026 / Published: 9 January 2026

(This article belongs to the Special Issue Autonomous Drone Navigation in GPS-Denied Environments)


Highlights

This study proposes DTVIRM-Swarm, a distributed, anchor-free cooperative localization system for UAV swarms that operates without GNSS or pre-deployed anchors. It introduces a novel MDS-MAP initialization method, achieving faster and more accurate swarm pose initialization than conventional methods. The system demonstrates superior accuracy and strong robustness, maintaining operation during vision loss where other methods fail. It provides a practical, scalable solution for GNSS-denied environments, enabling complex applications like formation flying under severe conditions.

What are the main findings?

Superior Localization Accuracy and Efficiency: The proposed system and its novel MDS-MAP initialization method significantly outperform state-of-the-art alternatives, achieving faster and more accurate swarm pose initialization than conventional methods, culminating in real-world accuracy that is higher than leading VIO and collaborative methods.
Enhanced Robustness in Degraded Conditions: The tightly-coupled framework demonstrates exceptional resilience, maintaining stable operation and reliable positioning during complete vision deprivation, a scenario in which comparable state-of-the-art systems fail entirely.

What are the implications of the main findings?

A Practical Enabler for GNSS-Denied Operations: This work provides a viable, infrastructure-independent solution for multi-robot collaborative localization, effectively addressing a critical gap for deploying low-cost UAV swarms in complex environments where traditional GNSS or anchor-dependent systems are infeasible.
A Pathway to Scalable and Resilient Applications: The system’s design principles, including its efficient initialization and robust sensor fusion, pave the way for scalable applications involving large-scale swarms or heterogeneous robot teams, while its performance under harsh conditions expands the operational envelope for autonomous systems to more severe scenarios.

Abstract

Accurate Unmanned Aerial Vehicle (UAV) positioning is vital for swarm cooperation. However, this remains challenging in situations where Global Navigation Satellite System (GNSS) and other external infrastructures are unavailable. To address this challenge, we propose to use only the onboard Microelectromechanical System Inertial Measurement Unit (MIMU), magnetic sensor, monocular camera and Ultra-Wideband (UWB) device to construct a distributed and anchor-free cooperative localization system by tightly fusing the measurements. As the onboard UWB measurements under dynamic motion conditions are noisy and discontinuous, we propose an adaptive adjustment method based on chi-squared detection to effectively filter out inconsistent and false ranging information. Moreover, we introduce the pose-only theory to model the visual measurement, which improves the efficiency and accuracy of visual-inertial processing. A sliding window Extended Kalman Filter (EKF) is constructed to tightly fuse all the measurements, and it is capable of working under UWB-deprived or vision-deprived conditions. Additionally, a novel Multidimensional Scaling-MAP (MDS-MAP) initialization method fuses ranging, MIMU, and geomagnetic data to solve the non-convex optimization problem in ranging-aided Simultaneous Localization and Mapping (SLAM), ensuring fast and accurate swarm absolute pose initialization. To overcome the state consistency challenge inherent in the distributed cooperative structure, we model not only the UWB noise uncertainty but also the neighbor agent’s position uncertainty in the measurement model. Furthermore, we incorporate the Covariance Intersection (CI) method into our UWB measurement fusion process to address the challenge of unknown correlations between state estimates from different UAVs, ensuring consistent and robust state estimation. To validate the effectiveness of the proposed methods, we have established both simulation and hardware test platforms.
The proposed method is compared with state-of-the-art (SOTA) UAV localization approaches designed for GNSS-challenged environments. Extensive experiments demonstrate that our algorithm achieves superior positioning accuracy, higher computing efficiency and better robustness. Moreover, even when vision loss causes other methods to fail, our proposed method continues to operate effectively.

Keywords:

UAV swarm; cooperative localization; sensor fusion; covariance intersection

1. Introduction

Accurate localization is critical for efficient operations and precise control in aerial robotics, particularly in multi-UAV systems where collaborative tasks, formation flying, and mission execution depend on it. However, achieving reliable localization remains challenging in GNSS-denied environments [1]. To overcome this, researchers have investigated multimodal sensor fusion approaches. Among these, visual-inertial odometry (VIO) has emerged as a widely studied method that integrates camera and IMU data for relative motion estimation and localization [2,3].

VIO operates by capturing environmental images through cameras to extract and track visual features, while IMUs measure acceleration and angular velocity to estimate pose changes. Despite its advantages, VIO suffers from two primary limitations [4]. First, its performance heavily depends on visual conditions: feature extraction becomes unreliable in texture-sparse areas, and sudden environmental changes introduce tracking errors, degrading localization accuracy. Second, the computational complexity of real-time image processing and sensor fusion strains the limited processing capabilities of UAVs, further restricting accuracy and practicality [5,6].

To address these challenges in complex environments, multi-UAV collaborative localization methods have been developed. These approaches are broadly classified into centralized and distributed systems based on data processing strategies. Centralized methods rely on transmitting all UAV-collected data to a ground station or central node for unified computation. However, they require stable network connectivity and high-performance hardware at the central node, resulting in scalability and flexibility limitations. In contrast, distributed systems eliminate central nodes by enabling direct communication between neighboring UAVs, offering enhanced robustness and adaptability for dynamic environments. This advantage has made distributed methods increasingly prevalent in multi-UAV applications [7,8].

Current VIO-based collaborative localization methods often improve accuracy by sharing environmental characteristics observed across UAV clusters. While effective, such strategies impose high communication bandwidth requirements, which become problematic as UAV numbers or environmental complexity increases. Alternative approaches using pre-deployed UWB anchors enhance localization through anchor-based distance measurements. Nevertheless, these methods lack flexibility due to their dependence on carefully calibrated anchor infrastructure, making them impractical for unknown or dynamic environments where anchor deployment is costly or infeasible [9,10].

To address these challenges and meet the demands for low-cost, lightweight design, and real-time performance, this paper proposes a novel distributed anchor-free visual-inertial-UWB-magnetic cooperative localization system (DTVIRM-Swarm) for multi-UAVs based on the sliding window extended Kalman filter framework. Unlike existing distributed SLAM systems such as COVINS-G [11] and D2SLAM, which require significant communication bandwidth for map sharing, our approach leverages direct UWB ranging between UAVs without relying on pre-deployed anchors, achieving higher communication efficiency. Compared to COO-VIR [12], which still requires static UWB anchors, our system offers greater flexibility in unknown environments.

In this system, UAVs measure mutual distances using UWB technology without relying on UWB anchors. To fully utilize UWB measurements, multiple UWB keyframes are incorporated into the sliding window, and an adaptive adjustment method for UWB filtering errors based on chi-squared detection is employed to ensure robustness and real-time performance. Additionally, to further enhance accuracy, a visual observation model based on pose-only (PO) theory is adopted, improving the system’s adaptability to challenging visual environments [13,14].

The contributions of this paper are as follows:

Distributed Anchor-Free Architecture with Efficient Initialization: Proposes a distributed anchor-free cooperative localization framework for multi-UAV swarms, eliminating the reliance on central nodes or pre-deployed anchors. Integrates ranging, geomagnetic, and MIMU data to develop an MDS-MAP initialization method, which addresses the non-convex optimization problem in ranging-aided SLAM and enables fast and accurate absolute pose initialization of the swarm, outperforming traditional GTSAM-based optimization methods by 40.6% in positioning accuracy and reducing computation time by over 80%.
Optimized Sensor Measurement Processing: Targets the noise and discontinuity of UWB dynamic measurements, presents an adaptive adjustment method based on chi-squared detection, and fuses multiple UWB keyframes to enhance robustness. Adopts pose-only (PO) theory for visual measurement modeling, decouples 3D feature reconstruction, and achieves a balance between positioning accuracy and computational efficiency, reducing the computational burden compared to traditional VIO systems.
Covariance Intersection (CI) for Consistent Measurement Fusion: Incorporates the CI method into UWB measurement fusion to address the challenge of unknown correlations between state estimates from different UAVs. This ensures consistent and robust state estimation even in dynamic environments with frequent communication delays and packet losses, enhancing the overall reliability of the cooperative localization system.
Tightly-Coupled Multi-Sensor Fusion Framework: Constructs a sliding window EKF to fuse inertial, visual, UWB ranging, and geomagnetic data while explicitly modeling the position uncertainty of neighboring nodes. The framework maintains stable operation even in vision-deprived scenarios, breaking through the limitation of traditional methods that rely on continuous visual input and improving adaptability to complex environments. Compared to SOTA methods like VINS-Mono, OpenVINS, and SuperVINS, our system achieves an average positioning error reduction of 38.3%, 40.8%, and 29.7% respectively.

The structure of the paper is as follows. Section 2 presents a literature review. Section 3 introduces the proposed algorithm and the detailed mathematical model of collaborative localization. Section 4 presents simulation experiments and real experiments, and analyzes the results. Finally, Section 5 concludes the paper.

2. Related Work

In the field of UAV collaborative localization, a key issue is the observation of relative positions between UAVs, and many methods have been proposed for it. Marker-based visual mutual observation methods typically mount markers on the drones, for example ultraviolet LED lights paired with ultraviolet-sensitive cameras, to perform relative positioning [15]. Markerless visual mutual observation methods often rely on convolutional neural networks (CNNs) [16,17,18] to extract relative distances, but they are easily affected by changes in the appearance of targets and the environment and cannot provide accurate relative estimates.

Multi-robot SLAM methods can use map merging and inter-robot loop closure to obtain relative poses. However, this requires communicating a large amount of data and is not suitable for drone platforms with limited computational resources [19]. 3D LiDAR and UWB can also directly measure relative distances. In [20], 3D LiDAR, fisheye camera, and UWB data are fused to track drones flying above LiDAR-equipped unmanned ground vehicles (UGVs). In [21], a distributed LiDAR inertial group range measurement method was proposed, which uses the reflectivity values from LiDAR data to directly detect collaborating drones. However, using LiDAR as a relative position observation sensor is costly. In [22], a relative positioning method for micro drones based on VIO and LiDAR localization was proposed. Slave drones use LiDAR to observe the relative position and distance of the master drone and integrate these observations with VIO to improve their positioning accuracy. However, this method requires a high-precision master drone node. If the master node is damaged, the entire drone cluster cannot operate normally.

To address the issue of substantial positioning errors in GNSS-challenged environments, particularly when visual features are sparse, researchers have proposed several methods. For single-drone positioning, ref. [23] fuses data from static UWB anchors, LiDAR odometry, IMUs, and VIO to improve positioning accuracy in GNSS-challenged environments, but in emergency situations it is not feasible to place static UWB anchors in the area. In [24], a multi-sensor framework is proposed that uses UWB and visual-inertial odometry (VIO) to provide robust and low-drift positioning. In [25], a learning-based drone positioning method is proposed that fuses vision, IMU, and UWB sensors, combining VIO and UWB branches to predict the global pose. However, these two methods still require multiple ground UWB anchors to be placed in advance.

In terms of multi-drone positioning, ref. [26] proposed a method that integrates UWB and VIO for collaborative positioning of two drones, and ref. [27] proposed a method for distributed formation estimation in large drone clusters. In [8], a distributed collaborative SLAM system was proposed, which innovatively manages near-field estimation and integrates multiple sensors and map data to achieve accurate near-field relative state estimation and consistent global trajectory far-field estimation. In [28], the authors fused detection results from CNN with UWB data and VIO for relative positioning in drone clusters. Ref. [12] fuses data from static UWB anchors, mutual ranging between drones, IMU and VIO to improve the positioning accuracy of drones in GNSS-challenged environments. However, this active vision-based approach cannot cope with urgent tasks.

The authors of [29] focused on collaborative localization of UGV and UAV teams; their method relied mainly on UWB and VIO data and used 3D LiDAR detection during initialization. The authors of [30] utilized a heterogeneous team of LiDAR-carrying UGVs and camera-equipped drones, detecting the UGVs from the drones' onboard cameras and using them as landmarks to improve drone positioning. Ref. [31] proposed a cooperative localization framework based on optimized belief propagation (BP) for GNSS-denied areas, and ref. [32] proposed a technique to improve the accuracy of visual-inertial odometry (VIO) by combining it with UWB positioning. However, these methods all require the assistance of multiple UWB anchors, which is not conducive to practical use. Ref. [33] proposes a multi-drone relative localization scheme based on distributed graph optimization (DGO), which combines onboard UWB modules, cameras, and inertial sensors, but the cameras need to observe other drones for position estimation, making it unsuitable for large-scale scenarios. Ref. [34] fuses the data from these two types of sensors via the Kalman filter to address the positioning drift problem in GPS-denied environments. Ref. [35] adopts the Information Consistency Filter (ICF) to address the state estimation association problem among multiple devices. Ref. [36] adopts a strategy of second-order Kalman filter preprocessing combined with EKF gating fusion to handle intermittent and delayed sensor observations.

Therefore, considering the requirements of low cost, lightweight, and real-time performance, we propose a distributed anchorless vision-inertial-UWB-magnetic multi-drone collaborative positioning system. UAVs measure the distance between drones through UWB. There is no need to rely on UWB anchors, and a filtering framework based on EKF is used to fuse multiple observations, which enhances the lightweight and real-time nature of the algorithm. In order to further improve the accuracy, we incorporate multiple UWB key frames into the sliding window, and adopt an adaptive adjustment method for filtering errors based on chi-squared detection. In addition, in terms of vision observation, we adopt a visual observation model based on PO constraints, which improves the system’s adaptability to challenging visual environments.

3. Methods

The proposed method is a distributed and anchor-free visual-inertial-UWB-magnetic cooperative localization system for UAVs. It consists of three main components: a sliding window extended Kalman filter framework for tightly fusing inertial, visual, magnetic, and UWB ranging data; an adaptive adjustment method for UWB measurements based on chi-squared detection; and a visual observation model using pose-only (PO) theory.

3.1. Overview of the Architecture

The block diagram of the architecture of the proposed system is shown in Figure 1. Each UAV is equipped with an IMU, a magnetometer, a camera and a UWB sensor, and the UAVs can share each other’s position and relative distance information through communication links and UWB.

In the block diagram shown in Figure 1, feature extraction and tracking provide environmental feature information, and IMU data is used for state propagation and augmentation to predict the motion state of the UAV. Basic views and sensor data are used to build a visual observation model based on pose-only (PO) theory to improve the accuracy of the algorithm. After calibration, the magnetometer outputs heading angle information for subsequent fusion. The state predicted by the filter is then used to perform chi-squared detection on the UWB data and filter out outliers. Finally, the state estimate of the UAV is updated by fusing the measurements of all sensors in a tightly coupled manner.
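The chi-squared detection step mentioned above can be illustrated with a plain innovation gate on a single UWB range. This is a minimal sketch under stated assumptions, not the adaptive adjustment method of the paper: the function name `gate_uwb_range`, the fixed 95% threshold, and the treatment of the two position covariances as uncorrelated are all illustrative choices.

```python
import math

# 95% quantile of the chi-squared distribution with 1 degree of freedom.
CHI2_95_1DOF = 3.841


def _quad_form(h, P):
    """Return h P h^T for a 3-vector h and a 3x3 matrix P (nested lists)."""
    return sum(h[i] * P[i][j] * h[j] for i in range(3) for j in range(3))


def gate_uwb_range(z, p_self, p_nbr, P_self, P_nbr, sigma_uwb,
                   threshold=CHI2_95_1DOF):
    """Chi-squared gate for one scalar range measurement z.

    p_self, p_nbr : predicted positions of this UAV and the neighbor
    P_self, P_nbr : their 3x3 position covariances (assumed uncorrelated
                    here, which is a simplification)
    sigma_uwb     : standard deviation of the UWB ranging noise
    Returns (accepted, nis), where nis is the normalized innovation squared.
    """
    diff = [a - b for a, b in zip(p_self, p_nbr)]
    predicted = math.sqrt(sum(d * d for d in diff))
    if predicted == 0.0:
        raise ValueError("coincident positions: range Jacobian undefined")
    # Unit line-of-sight vector: Jacobian of the range w.r.t. p_self
    # (and, up to sign, w.r.t. p_nbr).
    h = [d / predicted for d in diff]
    innovation = z - predicted
    s = _quad_form(h, P_self) + _quad_form(h, P_nbr) + sigma_uwb ** 2
    nis = innovation * innovation / s
    return nis <= threshold, nis
```

A range whose normalized innovation squared exceeds the threshold is treated as an outlier and skipped; how the threshold or noise level is adapted online is specific to the paper and not reproduced here.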

3.2. Multi-Sensor Time Synchronization and Delay Handling

In a multi-sensor fusion system for cooperative localization, precise time synchronization is fundamental to estimation accuracy. The sensors in this system, including IMU, camera, magnetometer (MAG), UWB, and inter-UAV position information, operate at different sampling frequencies and exhibit varying latencies. In distributed scenarios, wireless communication delays introduce additional temporal uncertainties. We propose a time synchronization and delay handling scheme built upon the sliding window EKF framework, leveraging IMU’s high-frequency propagation capability for temporal alignment.

3.2.1. Time Synchronization for Onboard Sensors

All sensor measurements carry precise timestamps. When a measurement arrives at time $t_{m}$, the system state at this instant is obtained through IMU-driven interpolation. The detailed steps are given in Algorithm 1. Given filter states ${\hat{x}}_{k}$ at $t_{k}$ and ${\hat{x}}_{k + 1}$ at $t_{k + 1}$, where $t_{k} < t_{m} < t_{k + 1}$:

{\hat{x}}_{m} = {\hat{x}}_{k} + \frac{t_{m} - t_{k}}{t_{k + 1} - t_{k}} ({\hat{x}}_{k + 1} - {\hat{x}}_{k})

(1)

The synchronized residual is then computed as:

r = z - h ({\hat{x}}_{m})

(2)

where $h (\cdot)$ is the measurement model.

Algorithm 1 Multi-Sensor Time Synchronization and Delay Handling.

Require: Sliding window states $\{x_{k - N + 1}, \dots, x_{k}\}$, timestamps, measurement $z$ with timestamp $t_{m}$
Ensure: Aligned state ${\hat{x}}_{m}$ or updated state $x_{k}$
1: Onboard Sensor Synchronization
2: Find states $x_{k}$ at $t_{k}$ and $x_{k + 1}$ at $t_{k + 1}$ ($t_{k} < t_{m} < t_{k + 1}$)
3: ${\hat{x}}_{m} \leftarrow x_{k} + \frac{t_{m} - t_{k}}{t_{k + 1} - t_{k}} (x_{k + 1} - x_{k})$
4: $r \leftarrow z - h ({\hat{x}}_{m})$
5: Inter-UAV Delay Handling
6: $Δ t \leftarrow t_{r} - t_{c}$
7: if $Δ t > {threshold}_{\max}$ then
8: Discard measurement
9: else
10: $x_{d} \leftarrow Buffer [t_{d}]$, $ν \leftarrow z_{d} - h_{d} (x_{d})$
11: $K_{d} \leftarrow P_{d} H_{d}^{T} {(H_{d} P_{d} H_{d}^{T} + R_{d})}^{- 1}$
12: ${\tilde{x}}_{d} \leftarrow x_{d} + K_{d} ν$
13: $Φ_{k, d} \leftarrow I + F Δ t$
14: $x_{k} \leftarrow Φ_{k, d} {\tilde{x}}_{d}$
15: end if
16: return Updated state
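The onboard synchronization of Eqs. (1) and (2) can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and the linear blend is applied to a plain vector, so it suits Euclidean components such as position or velocity. An attitude state would need interpolation on the rotation group, which is not shown.

```python
def interpolate_state(x_k, x_k1, t_k, t_k1, t_m):
    """Linearly interpolate the state to the measurement time t_m (Eq. (1))."""
    if t_k1 <= t_k or not (t_k <= t_m <= t_k1):
        raise ValueError("t_m must lie inside a non-empty interval [t_k, t_k1]")
    alpha = (t_m - t_k) / (t_k1 - t_k)
    return [a + alpha * (b - a) for a, b in zip(x_k, x_k1)]


def synchronized_residual(z, h, x_m):
    """Residual r = z - h(x_m) evaluated at the interpolated state (Eq. (2))."""
    return z - h(x_m)
```

For example, with states `[0.0, 0.0]` at `t_k = 0.0` and `[2.0, 4.0]` at `t_k1 = 1.0`, a measurement stamped `t_m = 0.25` is compared against the state `[0.5, 1.0]`.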

3.2.2. Delay Handling for Inter-UAV Information

In distributed cooperative localization, inter-UAV information suffers from communication delays. Let $t_{c}$ be the generation timestamp and $t_{r}$ the reception timestamp, with delay $Δ t = t_{r} - t_{c}$. The delay handling mechanism consists of three steps.

State Storage: A sliding window buffer maintains historical states $\{x_{k - N + 1}, \dots, x_{k}\}$ with timestamps $\{t_{k - N + 1}, \dots, t_{k}\}$.

Backward Correction: Retrieve the historical state $x_{d}$ at timestamp $t_{d} = t_{c}$ from the buffer and compute the state correction from the innovation $z_{d} - h_{d} (x_{d})$:

δ x_{d} = K_{d} (z_{d} - h_{d} (x_{d}))

(3)

where $K_{d}$ is the Kalman gain at $t_{d}$. The corrected state is ${\tilde{x}}_{d} = x_{d} + δ x_{d}$.

Forward Propagation: Propagate ${\tilde{x}}_{d}$ from $t_{d}$ to the current time $t_{k}$ using the state transition matrix:

x_{k} = Φ_{k, d} {\tilde{x}}_{d} + G_{k, d} w_{d}

(4)

where $Φ_{k, d} \approx I + F Δ t$, with $F = \frac{\partial f}{\partial x}$ being the Jacobian of the system dynamics, and $G_{k, d} w_{d}$ is the process noise term.

For UWB ranging from neighboring UAVs with delay, the measurement model is:

z_{UWB}^{delayed} = ∥ p_{k}^{local} - p_{d}^{neighbor} ∥ + v_{UWB}

(5)

where $p_{d}^{neighbor}$ is the stored neighbor position at $t_{d}$, with its uncertainty explicitly modeled in the measurement noise.

This scheme achieves comprehensive temporal alignment: onboard sensors are synchronized through IMU interpolation, while inter-UAV delays are handled through backward correction and forward propagation, ensuring accurate cooperative localization under communication delays.
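The three delay-handling steps (state storage, backward correction, forward propagation) can be sketched as follows. This is a minimal illustration, assuming a dictionary keyed by timestamp as the buffer, a first-order transition matrix as in Eq. (4), and propagation of the mean only (the noise term has zero mean). The function name and argument layout are hypothetical, and covariance propagation is omitted.

```python
import numpy as np


def handle_delayed_measurement(buffer, t_c, t_r, z_d, h_d, H_d, R_d, F,
                               max_delay):
    """Fuse a delayed inter-UAV measurement and return the updated state.

    buffer    : dict mapping timestamp -> (state vector, covariance matrix)
    t_c, t_r  : generation and reception timestamps of the measurement
    z_d       : delayed measurement (1-D array)
    h_d, H_d  : measurement function and its Jacobian at the stored state
    R_d       : measurement noise covariance
    F         : Jacobian of the system dynamics (assumed constant over dt)
    Returns None when the measurement is discarded.
    """
    dt = t_r - t_c
    if dt > max_delay or t_c not in buffer:
        return None  # too old, or no stored state at the generation time

    # Backward correction at the generation time t_d = t_c (Eq. (3)).
    x_d, P_d = buffer[t_c]
    innovation = z_d - h_d(x_d)
    S = H_d @ P_d @ H_d.T + R_d
    K_d = P_d @ H_d.T @ np.linalg.inv(S)
    x_corrected = x_d + K_d @ innovation

    # Forward propagation to the current time with Phi ~= I + F * dt.
    Phi = np.eye(x_d.shape[0]) + F * dt
    return Phi @ x_corrected
```

The sketch takes the current time to be the reception time, as in Algorithm 1. A full implementation would also correct and propagate the covariance, and would inflate `R_d` with the neighbor's position uncertainty as described for Eq. (5).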

3.3. Swarm Absolute Pose and Attitude Initialization

In multi-UAV swarm systems, accurate initialization of absolute positions and orientations is crucial for subsequent collaborative localization tasks. Traditional methods often rely on GNSS or pre-deployed infrastructure, which are not available in GNSS-denied environments. To address this challenge, we propose a novel swarm absolute pose initialization method based on ranging and geomagnetic information.

Our approach utilizes IMU (Inertial Measurement Unit) to estimate roll and pitch angles, magnetometer (compass) to estimate yaw angle, and inter-node ranging information to estimate relative positions. These measurements are fused to estimate the absolute positions and orientations of the UAV swarm in the ENU (East-North-Up) coordinate system.

The initialization process consists of two main steps:

First, individual UAV attitude estimation: Each UAV independently estimates its attitude angles (roll, pitch, and yaw) while static. Roll and pitch angles are estimated from accelerometer measurements:

{roll}_{0} = \operatorname{atan2} (g_{b} [1], g_{b} [2]), \quad {pitch}_{0} = \operatorname{atan2} (- g_{b} [0], \sqrt{g_{b} {[1]}^{2} + g_{b} {[2]}^{2}})

(6)

where $g_{b} = {[g_{x}, g_{y}, g_{z}]}^{T}$ is the average gravity vector measured by the accelerometer in the body frame.

Yaw angle is estimated using calibrated magnetometer measurements. The raw magnetometer readings are first calibrated to correct for environmental disturbances:

[\begin{matrix} M_{x x} \\ M_{y x} \\ M_{z x} \end{matrix}] = [\begin{matrix} k_{x x} & k_{x y} & k_{x z} \\ k_{y x} & k_{y y} & k_{y z} \\ k_{z x} & k_{z y} & k_{z z} \end{matrix}] \cdot ([\begin{matrix} M_{x} \\ M_{y} \\ M_{z} \end{matrix}] - [\begin{matrix} b_{m x} \\ b_{m y} \\ b_{m z} \end{matrix}])

(7)

where $k_{i j}$ are scale factors and cross-axis sensitivity coefficients, and $b_{m}$ are bias parameters. The calibrated magnetometer measurement, denoted $m_{b}$, is then transformed to the navigation frame:

h_{n} = R {({roll}_{0}, {pitch}_{0}, 0)}^{- 1} \cdot m_{b}

(8)

and the yaw angle is computed as:

{yaw}_{0} = \operatorname{atan2} (h_{n} [1], h_{n} [0]) - D

(9)

where D is the magnetic declination obtained from geomagnetic maps.
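The static attitude initialization of Eqs. (6) to (9) can be sketched as below. The example rests on assumptions that the text does not fix: a ZYX Euler convention, $R(\cdot)$ in Eq. (8) read as the navigation-to-body rotation (so that its inverse levels the body-frame measurement), and the yaw sign taken literally from Eq. (9). The function names are hypothetical.

```python
import numpy as np


def initial_roll_pitch(g_b):
    """Roll and pitch from the averaged static accelerometer vector (Eq. (6))."""
    gx, gy, gz = g_b
    roll = np.arctan2(gy, gz)
    pitch = np.arctan2(-gx, np.hypot(gy, gz))
    return roll, pitch


def calibrate_magnetometer(m_raw, K, b_m):
    """Apply the scale/cross-axis matrix K and the bias b_m (Eq. (7))."""
    return K @ (np.asarray(m_raw, dtype=float) - b_m)


def level_rotation(roll, pitch):
    """Body-to-level rotation Ry(pitch) @ Rx(roll), i.e. yaw fixed to zero.

    Assumed to play the role of R(roll_0, pitch_0, 0)^{-1} in Eq. (8).
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    return Ry @ Rx


def initial_yaw(m_b, roll, pitch, declination):
    """Tilt-compensated heading with declination correction (Eqs. (8), (9))."""
    h_n = level_rotation(roll, pitch) @ np.asarray(m_b, dtype=float)
    return np.arctan2(h_n[1], h_n[0]) - declination
```

In practice the sign of the heading depends on the axis conventions of the specific IMU and magnetometer, so the formulas should be checked against the sensor datasheet before use.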

Second, initialization of absolute position: The first UAV in the swarm (i = 1) is selected as the origin of the ENU coordinate system, $P_{1} = {[0, 0, 0]}^{T}$. This UAV is then moved a short distance, and its VIO output is combined with the geomagnetic heading angle to obtain its displacement vector in the ENU coordinate system. The MDS-MAP method proposed in this paper then yields the absolute position coordinates of the entire UAV swarm in the ENU coordinate system. The pseudocode is given in Algorithm 2.

This algorithm takes the number of UAV nodes, the distance matrices before and after Node 1 moves, and Node 1's absolute displacement as inputs. It solves for relative coordinates via MDS, aligns the coordinate systems using the non-moving nodes, corrects the orientation, and applies a global translation to output the absolute positions of all nodes, providing fast, robust, and accurate initialization for swarm collaborative localization. A commonly used alternative for swarm position initialization builds residuals directly from the ranging information and solves a joint optimization problem. Its ranging residual is defined as $\sum_{i, j} {∥ d_{i j} - ∥ P_{i} - P_{j} ∥ ∥}^{2}$. This residual is non-convex in the positions, so the optimization may converge to a local optimum, which reduces the final accuracy. In contrast, the method proposed in this paper avoids non-convex optimization by using an algebraic solution, which also greatly simplifies the calculation and improves computational efficiency. This is verified in the subsequent experimental evaluation.

Algorithm 2 MDS-MAP Absolute Localization Algorithm.
1: function MDS_MAP_Localization(N, $D_{pre}$, $D_{post}$, $Δ X_{abs}$)
2: Input: N: number of UAVs; $D_{pre}$: distance matrix before Node 1 moves;
3: $D_{post}$: distance matrix after Node 1 moves; $Δ X_{abs}$: Node 1’s absolute displacement
4: Output: $X_{abs}$: absolute positions of all UAVs
5: 1. Compute Relative Coordinates via MDS
6: $J = I_{N} - \frac{1}{N} 1_{N} 1_{N}^{T}$	▹ Centering matrix
7: $B_{pre} = - \frac{1}{2} J D_{pre}^{2} J$ , $[V_{pre}, Λ_{pre}] = eig (B_{pre})$
8: $Λ_{pre}^{sorted} = sort (diag (Λ_{pre}), ’ descend ’)$
9: $X_{rel, pre} = V_{pre} (:, 1 : 2) \times \sqrt{max (Λ_{pre}^{sorted} (1 : 2), 0)}$
10: $B_{post} = - \frac{1}{2} J D_{post}^{2} J$ , $[V_{post}, Λ_{post}] = eig (B_{post})$
11: $Λ_{post}^{sorted} = sort (diag (Λ_{post}), ’ descend ’)$
12: $X_{rel, post} = V_{post} (:, 1 : 2) \times \sqrt{max (Λ_{post}^{sorted} (1 : 2), 0)}$
13: 2. Align Coordinate Systems
14: $S = {2, 3, . . ., N}$	▹ Non-moving nodes
15: $c_{pre} = mean (X_{rel, pre} (S, :))$ , $c_{post} = mean (X_{rel, post} (S, :))$
16: $X_{rel, pre}^{cent} = X_{rel, pre} - c_{pre}$ , $X_{rel, post}^{cent} = X_{rel, post} - c_{post}$
17: $P = X_{rel, pre}^{cent} {(S, :)}^{T} X_{rel, post}^{cent} (S, :)$ , $[U, Σ, V] = svd (P)$ , $R = U V^{T}$
18: $X_{rel, post}^{align} = {(R \times {X_{rel, post}^{cent}}^{T})}^{T} + c_{pre}$
19: 3. Estimate Absolute Heading and Correct
20: $Δ X_{rel} = X_{rel, post}^{align} (1, :) - X_{rel, pre} (1, :)$
21: $θ_{rel} = atan 2 (Δ X_{rel} (2), Δ X_{rel} (1))$ , $θ_{abs} = atan 2 (Δ X_{abs} (2), Δ X_{abs} (1))$
22: $\hat{θ} = θ_{abs} - θ_{rel}$
23: $R_{yaw} = [\begin{matrix} cos \hat{θ} & - sin \hat{θ} \\ sin \hat{θ} & cos \hat{θ} \end{matrix}]$
24: $X_{rel, rot} = {(R_{yaw} \times X_{rel, pre}^{T})}^{T}$
25: 4. Compute Absolute Positions
26: $t = Δ X_{abs} - (X_{rel, rot} (1, :) - X_{rel, pre} (1, :))$	▹ Global translation
27: $X_{abs} = X_{rel, rot} + t$
return $X_{abs}$
28: end function
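The steps of Algorithm 2 can be sketched with NumPy as follows. This is a hedged illustration rather than the authors' implementation: it works in the plane (two leading eigenvectors, as in the pseudocode), squares the distance matrix element-wise, reorders the eigenvectors together with the sorted eigenvalues, and, as a simplifying assumption that differs from line 26, fixes the translation by placing Node 1's pre-move position at the origin. The mirror ambiguity of a distance-only embedding is not resolved here.

```python
import numpy as np


def classical_mds(D, dim=2):
    """Classical MDS: coordinates (N x dim) from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # element-wise squared distances
    w, V = np.linalg.eigh(B)                      # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]               # keep the largest `dim`
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))


def mds_map_localization(D_pre, D_post, dX_abs):
    """Absolute 2-D positions of all nodes; Node 1 is row 0."""
    X_pre = classical_mds(D_pre)
    X_post = classical_mds(D_post)

    # Align the post-move embedding to the pre-move one using static nodes.
    S = np.arange(1, D_pre.shape[0])
    c_pre, c_post = X_pre[S].mean(axis=0), X_post[S].mean(axis=0)
    P = (X_pre[S] - c_pre).T @ (X_post[S] - c_post)
    U, _, Vt = np.linalg.svd(P)
    R = U @ Vt                                    # orthogonal Procrustes solution
    X_post_aligned = (X_post - c_post) @ R.T + c_pre

    # Heading correction from Node 1's relative and absolute displacement.
    d_rel = X_post_aligned[0] - X_pre[0]
    theta = (np.arctan2(dX_abs[1], dX_abs[0])
             - np.arctan2(d_rel[1], d_rel[0]))
    c, s = np.cos(theta), np.sin(theta)
    R_yaw = np.array([[c, -s], [s, c]])
    X_rot = X_pre @ R_yaw.T

    # Assumption: Node 1's pre-move position defines the ENU origin.
    return X_rot - X_rot[0]
```

Because only eigendecompositions and one small SVD are involved, no iterative solver or initial guess is needed, which is the algebraic property the text contrasts with range-residual optimization.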

3.4. Filter State and Propagation

In the proposed multi-UAV collaborative navigation system, we employ a sliding window EKF for state estimation, developed from OpenVINS [37]. The state of each drone in the swarm is defined as follows. The error state vector $δ x$ consists of the current INS error state $δ x_{I}$, the vision measurement key frames $δ x_{K F_{Visual}}$, the UWB measurement key frames $δ x_{K F_{UWB}}$, the magnetic measurement key frames $δ x_{K F_{MAG}}$, and a single time offset $δ c_{I}$ between the IMU and camera clocks:

δ x = {[δ x_{I}, δ x_{K F_{Visual}}, δ x_{K F_{UWB}}, δ x_{K F_{MAG}}, {δ c}_{I}]}^{T}

(10)

where:

δ x_{I} = [δ θ_{b}^{w}, δ p_{b}^{w}, δ v^{w}, δ b_{g}, δ b_{a}]

(11)

δ x_{K F_{V i s u a l}} = [δ x_{K F_{V i s u a l}, 0}, δ x_{K F_{V i s u a l}, 1}, \dots, δ x_{K F_{V i s u a l}, N - 1}]

(12)

δ x_{K F_{U W B}} = [δ x_{K F_{U W B}, 0}, δ x_{K F_{U W B}, 1}, \dots, δ x_{K F_{U W B}, M - 1}]

(13)

δ x_{K F_{M A G}} = [δ x_{K F_{M A G}, 0}, δ x_{K F_{M A G}, 1}, \dots, δ x_{K F_{M A G}, M - 1}]

(14)

Here, the current inertial error state $δ x_{I}$ includes the attitude error $δ θ_{b}^{w}$, position error $δ p_{b}^{w}$, velocity error $δ v^{w}$, gyroscope bias error $δ b_{g}$, and accelerometer bias error $δ b_{a}$ of the drone. The individual vision, UWB, and magnetic measurement key frames are defined as follows:

δ x_{K F_{V i s u a l, n}} = [δ θ_{b_{n}}^{w}, δ p_{b_{n}}^{w}]

(15)

δ x_{K F_{U W B, m}} = [δ θ_{b_{m}}^{w}, δ p_{b_{m}}^{w}]

(16)

δ x_{K F_{M A G, l}} = [δ θ_{b_{l}}^{w}, δ p_{b_{l}}^{w}]

(17)

where $δ θ_{b_{n}}^{w}$ and $δ p_{b_{n}}^{w}$ are the IMU attitude and position errors at the n-th vision key frame time, $δ θ_{b_{m}}^{w}$ and $δ p_{b_{m}}^{w}$ are the IMU attitude and position errors at the m-th UWB key frame time, and $δ θ_{b_{l}}^{w}$ and $δ p_{b_{l}}^{w}$ are the IMU attitude and position errors at the l-th MAG key frame time. The true state $x$ can be obtained from the estimated state $\hat{x}$ and the error state $δ x$:

x = \hat{x} ⊞ δ x

(18)

For the attitude error, the operator ⊞ is given by

R = \hat{R} Exp (δ θ) \approx \hat{R} (I + {(δ θ)}_{\times})

(19)
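The ⊞ operator can be sketched as a right-multiplicative update for the attitude, following Eq. (19), combined with ordinary addition for Euclidean states. This is an illustrative example: the function names are hypothetical, and Rodrigues' formula is used for the exponential map, falling back to the first-order form of Eq. (19) for very small angles.

```python
import numpy as np


def skew(v):
    """Skew-symmetric matrix (v)_x such that skew(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def so3_exp(dtheta):
    """Exponential map from a rotation vector to a rotation matrix."""
    dtheta = np.asarray(dtheta, dtype=float)
    angle = np.linalg.norm(dtheta)
    K = skew(dtheta)
    if angle < 1e-8:
        return np.eye(3) + K                      # first-order form of Eq. (19)
    return (np.eye(3)
            + (np.sin(angle) / angle) * K
            + ((1.0 - np.cos(angle)) / angle ** 2) * (K @ K))


def boxplus(R_hat, x_hat, dtheta, dx):
    """Apply an error state: R = R_hat Exp(dtheta), x = x_hat + dx."""
    return R_hat @ so3_exp(dtheta), np.asarray(x_hat, dtype=float) + dx
```

The attitude is updated on the rotation group so that the result stays orthonormal, whereas position, velocity, and bias errors are simply added.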

For other states, the operator ⊞ is equivalent to Euclidean addition. When the IMU measurement is available, the INS mechanization is conducted to output the high-frequency prior pose. Meanwhile, the forward propagation of the whole error state and its covariance is similar to the OpenVINS [37] and will not be repeated here. When a new key frame is added to the sliding window, state augmentation and covariance update are needed to enhance the state into the state vector, and the corresponding covariance

P_{n \times n}

is augmented as:

P_{(n + 6) \times (n + 6)} = [\begin{matrix} I_{n \times n} \\ J_{6 \times n} \end{matrix}] P_{n \times n} {[\begin{matrix} I_{n \times n} \\ J_{6 \times n} \end{matrix}]}^{T}

(20)

where

J_{6 \times n}

is the Jacobian matrix of the newly added state with respect to the original state. When the sliding window exceeds its maximum length, the oldest key frame is marginalized by directly deleting its state and its covariance entries [37].
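As an illustration of Equation (20), the following minimal Python sketch stacks the identity and a Jacobian and applies the result to a small covariance. The helper names (`mat_mul`, `transpose`, `augment_covariance`) and the toy dimensions (3 original states, 2 cloned states) are assumptions chosen for brevity; in the paper the cloned key frame contributes 6 error states (attitude and position).

```python
def mat_mul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a plain-list matrix."""
    return [list(row) for row in zip(*a)]


def augment_covariance(P, J):
    """Return [I; J] P [I; J]^T for an n x n covariance P and an m x n Jacobian J."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    T = identity + [list(row) for row in J]  # stacked (n + m) x n matrix
    return mat_mul(mat_mul(T, P), transpose(T))


# Toy example (hypothetical values): clone the first two of three states.
P = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.1],
     [0.0, 0.1, 3.0]]
J = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
P_aug = augment_covariance(P, J)
```

The top-left block of `P_aug` reproduces `P`, while the new rows and columns carry the cross-covariance between the cloned states and the original ones, which is what lets later measurements on the key frame correct the current state.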

3.5. Visual Measurement Based on PO Theory

To reduce the complexity and computational cost of the 3D reconstruction process in traditional VIO systems, and to avoid the accuracy limitations imposed by direct 3D reconstruction, we follow PO theory [38] and reformulate the measurement model so that it is expressed only in terms of pixel coordinates and relative poses. In PO theory, multi-view geometry can be described using camera poses alone. Instead of directly estimating the 3D coordinates of the feature points in the scene, we infer their positional relationships from the relative poses between the cameras. Assuming that the projection of feature point

p

in image i can be represented as

p_{i} = K [R_{i j} | t_{i j}] p_{j}

, where

K

is the camera intrinsic matrix,

R_{i j}

and

t_{i j}

are the rotation and translation from camera i to camera j, respectively, the reprojection error

e_{i}

can be represented by the camera poses in PO theory without directly using the 3D coordinates of the feature points.

e_{i} = p_{i} - K [R_{i j} | t_{i j}] p_{j}

(21)

Therefore, the geometric description of multiple views based on PO theory can be expressed as follows.

D (j, k) = \{d (j, i) ∣ p_{c_{i}} = d (j, k) {}^{c_{i}}R_{c_{j}} p_{c_{j}} + {}^{c_{i}}t_{c_{j}}, 1 \leq i \leq n, i \neq j\}

(22)

where

d (j, i)

is the constraint between images i and j,

p_{c_{i}}

is the normalized coordinate of the feature point in image i, and

{}^{c_{i}}R_{c_{j}}

and

{}^{c_{i}}t_{c_{j}}

are the rotation and translation from image i to image j, respectively. Thus the reprojection error can be redefined as follows [38,39].

r_{c_{i}}^{(l)} = ∥p_{c_{i}}^{(l)} - {\hat{p}}_{c_{i}}^{(l)}∥ = ∥\frac{{}^{c_{i}}p_{f_{l}}^{(P O)}}{e_{3}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)}} - p_{c_{i}}^{(l)}∥

(23)

where

r_{c_{i}}^{(l)}

is the reprojection error of the l-th feature point in the i-th image,

{}^{c_{i}}p_{f_{l}}^{(P O)}

is the projection obtained by using only the camera pose and 2D features, and

e_{3}^{T}

selects the third (depth) component of a 3D vector, which is used to normalize the projection onto the image plane. Thus the measurement model based on PO theory can be represented as follows [13].

r_{c_{i}}^{(l)} \approx H_{x_{i}}^{(l)} δ x + n_{i}^{(l)}

(24)

\begin{matrix} H_{x_{i}}^{(l)} & = \frac{\partial r_{c_{i}}^{(l)}}{\partial δ x} = \frac{\partial r_{c_{i}}^{(l)}}{\partial {}^{c_{i}}p_{f_{l}}^{(P O)}} \cdot \frac{\partial {}^{c_{i}}p_{f_{l}}^{(P O)}}{\partial δ x} \\ & = [\begin{matrix} \frac{1}{e_{3}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)}} & 0 & - \frac{e_{1}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)}}{{(e_{3}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)})}^{2}} \\ 0 & \frac{1}{e_{3}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)}} & - \frac{e_{2}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)}}{{(e_{3}^{T} {}^{c_{i}}p_{f_{l}}^{(P O)})}^{2}} \end{matrix}] \cdot [\begin{matrix} 0_{3 \times 15} & \frac{\partial {}^{c_{i}}p_{f_{l}}^{(P O)}}{\partial δ {}_{b_{m}}^{G}ϕ} & \frac{\partial {}^{c_{i}}p_{f_{l}}^{(P O)}}{\partial δ {}^{G}p_{b_{m}}} & \dots \end{matrix}] \end{matrix}

(25)

As a result, the new measurement model is expressed only in terms of pixel coordinates and system poses and is fully decoupled from the 3D features, which avoids the effects of inaccurate 3D reconstruction. The PO-based visual measurement model reduces computational complexity and avoids the error accumulation caused by 3D point reconstruction while maintaining high positioning accuracy.
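The first factor of the Jacobian in Equation (25) is the derivative of the normalization in Equation (23). The short Python sketch below shows that normalization, the residual, and the 2 × 3 Jacobian. The function names are illustrative assumptions, the residual uses the "projection minus observation" sign of the second form in Equation (23), and the pose-only point itself is taken as given because its construction from camera poses is not reproduced here.

```python
def normalize(p):
    """Project a 3D point onto the normalized image plane: p / (e3^T p), first two components."""
    x, y, z = p
    return (x / z, y / z)


def projection_jacobian(p):
    """2 x 3 Jacobian of normalize(p) with respect to the 3D point p."""
    x, y, z = p
    return [[1.0 / z, 0.0, -x / (z * z)],
            [0.0, 1.0 / z, -y / (z * z)]]


def residual(p_po, observed):
    """2D residual between the normalized pose-only projection and the observed feature."""
    u, v = normalize(p_po)
    return (u - observed[0], v - observed[1])


# Hypothetical point in the camera frame and a slightly offset observation.
p_po = (0.4, -0.2, 2.0)
r = residual(p_po, (0.21, -0.10))
H_proj = projection_jacobian(p_po)
```

In the full measurement model this 2 × 3 block would be multiplied by the derivative of the pose-only point with respect to the key-frame attitude and position errors.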

3.6. Collaborative Localization with Anchor-Free UWB Measurement and Magnetic Assisted

This section presents our collaborative localization framework that tightly integrates anchor-free UWB measurements and magnetic heading constraints. The framework shown in Figure 2 addresses the challenges of unknown correlations between UAV state estimates and robustly handles measurement noise and uncertainties.

3.6.1. Magnetic Heading Assisted Localization

Magnetic sensors provide valuable absolute heading information that helps constrain the unobservable yaw dimension identified in our observability analysis. Here we detail the magnetic heading measurement model and robustness enhancements.

Magnetic Heading Measurement Model

The geomagnetic heading provides an absolute measurement of the yaw angle in the IMU state, with the following measurement model:

ψ_{o} (t) = ψ (t) + n_{m} (t)

(26)

where

n_{m} (t)

is Gaussian white noise and

ψ_{o} (t)

is the observed heading at time t. The residual between the estimated heading

ψ_{e} (t)

and geomagnetic heading in the filter is:

r = ψ_{o} (t) - ψ_{e} (t)

(27)

The Jacobian matrix for the heading measurement is straightforward:

H_{mag} = [\begin{matrix} 0 & 0 & 1 \end{matrix}]

(28)

Robust Magnetic Heading Estimation

To enhance the reliability of magnetic heading estimation in challenging environments, we implement five key strategies with corresponding mathematical formulations:

1. Magnetic Calibration

We perform automatic pre-mission calibration to compensate for hard and soft iron distortions. The calibration process estimates the following parameters:

- Hard iron bias: $b = {[b_{x}, b_{y}, b_{z}]}^{T}$
- Soft iron matrix: $S = [\begin{matrix} s_{x x} & s_{x y} & s_{x z} \\ s_{y x} & s_{y y} & s_{y z} \\ s_{z x} & s_{z y} & s_{z z} \end{matrix}]$

The calibrated magnetic field vector

m_{c a l}

is computed as:

m_{c a l} = S (m_{r a w} - b)

(29)

2. Adaptive Magnetic Weighting

The weight

w_{m a g}

for magnetic measurements is dynamically adjusted based on consistency with IMU and visual data:

w_{m a g} = exp (- \frac{∥ ψ_{m a g} - ψ_{r e f} ∥^{2}}{2 σ_{c o n s i s t}^{2}})

(30)

where

ψ_{r e f}

is the reference heading from IMU/visual fusion and

σ_{c o n s i s t}

is the consistency threshold.

3. Magnetic Anomaly Detection

We detect magnetic anomalies by comparing current measurements with geomagnetic map predictions:

ζ = \frac{∥ m_{c a l} - m_{m a p} ∥}{σ_{m a p}}

(31)

Measurements are rejected if

ζ > ζ_{t h r e s h}

, where

ζ_{t h r e s h}

is the anomaly threshold.

4. Heading Complementary Filter

A complementary filter fuses magnetic heading

ψ_{m a g}

with gyroscope-derived heading

ψ_{g y r o}

:

ψ_{c o m p} (t) = α \cdot ψ_{c o m p} (t - 1) + (1 - α) \cdot ψ_{m a g} (t) + β \cdot ω (t)

(32)

where

ω (t)

is the gyroscope angular velocity, and

α, β

are filter gains (

0 < α < 1

,

β > 0

).

5. Multi-Sensor Heading Fusion

When magnetic measurements are unreliable, we fuse heading estimates from multiple sensors using weighted least squares:

ψ_{f u s e d} = {(\sum_{k = 1}^{K} w_{k} H_{k}^{T} R_{k}^{- 1} H_{k})}^{- 1} \sum_{k = 1}^{K} w_{k} H_{k}^{T} R_{k}^{- 1} ψ_{k}

(33)

where K is the number of sensors,

ψ_{k}

are individual heading estimates,

R_{k}

are their covariance matrices, and

w_{k}

are adaptive weights.
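The five strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the heading difference in Equation (30) is wrapped to [-π, π] (a detail the text does not specify), and Equation (33) is specialized to scalar headings with $H_{k} = 1$ and scalar variances.

```python
import math


def calibrate(m_raw, b, S):
    """Eq. (29): m_cal = S (m_raw - b)."""
    d = [m_raw[k] - b[k] for k in range(3)]
    return [sum(S[r][c] * d[c] for c in range(3)) for r in range(3)]


def wrap_angle(a):
    """Wrap an angle to [-pi, pi] (assumed convention)."""
    return math.atan2(math.sin(a), math.cos(a))


def consistency_weight(psi_mag, psi_ref, sigma_consist):
    """Eq. (30): Gaussian-shaped weight from the heading disagreement."""
    diff = wrap_angle(psi_mag - psi_ref)
    return math.exp(-diff * diff / (2.0 * sigma_consist ** 2))


def is_anomalous(m_cal, m_map, sigma_map, zeta_thresh):
    """Eq. (31): flag a measurement whose normalized map deviation exceeds the threshold."""
    zeta = math.dist(m_cal, m_map) / sigma_map
    return zeta > zeta_thresh


def complementary_step(psi_prev, psi_mag, omega, alpha, beta):
    """Eq. (32), evaluated literally for one time step."""
    return alpha * psi_prev + (1.0 - alpha) * psi_mag + beta * omega


def fuse_headings(psis, variances, weights):
    """Eq. (33) for scalar headings (H_k = 1): weighted least-squares average."""
    num = sum(w * p / r for p, r, w in zip(psis, variances, weights))
    den = sum(w / r for r, w in zip(variances, weights))
    return num / den


# Hypothetical values: identity soft-iron matrix and a small hard-iron offset.
S = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
m_cal = calibrate([1.2, 0.1, -0.3], [0.2, 0.1, -0.3], S)
w_mag = consistency_weight(psi_mag=0.30, psi_ref=0.25, sigma_consist=0.2)
```

The scalar fusion in `fuse_headings` averages angles directly, so it is only meaningful when the individual headings are close to each other and do not straddle the ±π discontinuity.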

3.6.2. Anchor-Free UWB Measurement Model

UWB sensors provide direct peer-to-peer distance measurements between neighboring UAVs, which are crucial for maintaining relative position consistency in the swarm.

Basic UWB Measurement Model

For UAV i and its neighbor j, the UWB ranging measurement is modeled as:

d_{i j} = ∥ p_{i} - p_{j} ∥ + v_{u w b, i j}

(34)

where

p_{i}

and

p_{j}

are the positions of UAV i and j in the global frame, and

v_{u w b, i j}

is Gaussian measurement noise.

Cumulative UWB Measurement with Neighbor Uncertainty

In distributed systems, neighboring UAVs’ position estimates contain uncertainties. We model these uncertainties by constructing a cumulative UWB measurement for UAV i with M neighbors:

r_{i} = \sum_{j \in N_{i}} w_{i j} (∥ {\tilde{p}}_{G}^{i} - {\tilde{p}}_{G}^{j} ∥ - d_{i j} + n_{i j})

(35)

where

N_{i}

is the set of neighboring UAVs,

w_{i j}

are dynamically adjusted weights (satisfying

\sum_{j \in N_{i}} w_{i j} = 1

) and

{\tilde{p}}_{G}^{i}

represents the estimated position.

The Jacobian matrix for this cumulative measurement is:

H_{U W B_{i}} = \sum_{j \in N_{i}} w_{i j} [\begin{matrix} \frac{x_{i} - x_{j}}{d_{i j}} & \frac{y_{i} - y_{j}}{d_{i j}} & \frac{z_{i} - z_{j}}{d_{i j}} & - \frac{x_{i} - x_{j}}{d_{i j}} & - \frac{y_{i} - y_{j}}{d_{i j}} & - \frac{z_{i} - z_{j}}{d_{i j}} \end{matrix}]

(36)

Uncertainty Propagation from Neighbors

We explicitly model the uncertainty introduced by neighboring UAVs’ position errors. The observation noise matrix for the cumulative UWB measurement is:

R_{U W B_{i}} = δ_{d}^{2} + \sum_{j \in N_{i}} w_{i j}^{2} H_{U W B_{i j}} P_{p_{G}^{j}} H_{U W B_{i j}}^{T}

(37)

where

δ_{d}^{2}

is the UWB ranging noise variance, and

P_{p_{G}^{j}}

is the position covariance matrix of UAV j.
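To make Equations (35) to (37) concrete, the following Python sketch evaluates the cumulative residual, the part of the Jacobian that acts on UAV i's own position, and the inflated noise variance. It is a sketch under assumptions: the names and the neighbor dictionary layout are hypothetical, the noise term $n_{i j}$ is omitted from the residual, and the Jacobian is normalized by the predicted distance (the exact derivative of the norm), whereas Equation (36) writes $d_{i j}$ in the denominator.

```python
import math


def range_jacobian(p_i, p_j):
    """Unit vector (p_i - p_j) / ||p_i - p_j||: derivative of the range w.r.t. p_i."""
    d = math.dist(p_i, p_j)
    return [(a - b) / d for a, b in zip(p_i, p_j)]


def cumulative_uwb(p_i, neighbors, sigma_d):
    """Return (residual, H_i, R) for UAV i.

    neighbors: list of dicts with keys 'p' (estimated position), 'd' (measured
    range), 'w' (weight, summing to 1) and 'P' (3 x 3 position covariance).
    """
    residual = 0.0
    H_i = [0.0, 0.0, 0.0]
    R = sigma_d ** 2                     # UWB ranging noise variance
    for nb in neighbors:
        u = range_jacobian(p_i, nb["p"])
        residual += nb["w"] * (math.dist(p_i, nb["p"]) - nb["d"])
        H_i = [h + nb["w"] * uk for h, uk in zip(H_i, u)]
        # Neighbor uncertainty projected onto the line of sight: u^T P_j u.
        quad = sum(u[r] * nb["P"][r][c] * u[c] for r in range(3) for c in range(3))
        R += nb["w"] ** 2 * quad
    return residual, H_i, R


def iso(v):
    """Isotropic 3 x 3 covariance with variance v (helper for the example)."""
    return [[v if r == c else 0.0 for c in range(3)] for r in range(3)]


# Hypothetical two-neighbor example.
neighbors = [
    {"p": [3.0, 4.0, 0.0], "d": 5.5, "w": 0.5, "P": iso(0.04)},
    {"p": [0.0, 0.0, 2.0], "d": 1.8, "w": 0.5, "P": iso(0.09)},
]
res, H_i, R = cumulative_uwb([0.0, 0.0, 0.0], neighbors, sigma_d=0.3)
```

The neighbor part of the Jacobian is the negative of the same unit vector, so the quadratic form in the noise term is unchanged by the sign.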

3.6.3. Covariance Intersection (CI) for Consistent UWB Fusion

To address unknown correlations between UAV state estimates, we incorporate Covariance Intersection (CI) into our UWB measurement fusion process.

CI Fusion Principle

When fusing state estimate

{\hat{x}}_{j}

with covariance

P_{j}

from UAV j with local estimate

{\hat{x}}_{i}

and

P_{i}

, CI computes the fused state

{\hat{x}}_{C I}

and covariance

P_{C I}

as:

P_{C I}^{- 1} = (1 - α) P_{i}^{- 1} + α P_{j}^{- 1}

(38)

{\hat{x}}_{C I} = P_{C I} ((1 - α) P_{i}^{- 1} {\hat{x}}_{i} + α P_{j}^{- 1} {\hat{x}}_{j})

(39)

The optimal mixing parameter

α^{*}

minimizes the trace of

P_{C I}

while maintaining consistency:

α^{*} = arg min_{α \in [0, 1]} tr (P_{C I} (α))

(40)
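A compact way to see Equations (38) to (40) at work is a 2D example in Python. The sketch below is illustrative only: the names are hypothetical, the state is restricted to two dimensions so that a closed-form 2 × 2 inverse suffices, and the optimal $α^{*}$ is found by a simple grid search rather than by whatever solver the authors use.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def ci_fuse(x_i, P_i, x_j, P_j, steps=100):
    """Covariance Intersection with a grid search for the trace-minimizing alpha."""
    Pi_inv, Pj_inv = inv2(P_i), inv2(P_j)
    best = None
    for s in range(steps + 1):
        alpha = s / steps
        info = [[(1 - alpha) * Pi_inv[r][c] + alpha * Pj_inv[r][c]
                 for c in range(2)] for r in range(2)]          # Eq. (38)
        P_ci = inv2(info)
        trace = P_ci[0][0] + P_ci[1][1]                          # objective of Eq. (40)
        if best is None or trace < best[0]:
            best = (trace, alpha, P_ci)
    _, alpha, P_ci = best
    y = [(1 - alpha) * sum(Pi_inv[r][c] * x_i[c] for c in range(2))
         + alpha * sum(Pj_inv[r][c] * x_j[c] for c in range(2)) for r in range(2)]
    x_ci = [sum(P_ci[r][c] * y[c] for c in range(2)) for r in range(2)]  # Eq. (39)
    return x_ci, P_ci, alpha


# Hypothetical estimates: each UAV is confident along a different axis.
x_ci, P_ci, alpha = ci_fuse([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                            [1.0, 1.0], [[4.0, 0.0], [0.0, 1.0]])
```

In this symmetric example the search settles on α = 0.5 and the fused trace (3.2) is smaller than the trace of either input (5.0), without assuming anything about the cross-correlation between the two estimates.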

CI-Enhanced Measurement Noise

The CI-derived covariance is incorporated into the UWB measurement noise matrix:

R_{U W B_{i}}^{C I} = δ_{d}^{2} + \sum_{j \in N_{i}} w_{i j}^{2} H_{U W B_{i j}} P_{C I_{j}} H_{U W B_{i j}}^{T}

(41)

The CI method ensures that the fused state estimate is consistent even when the correlations between UAV estimates are unknown or time-varying, enhancing the robustness and consistency of our cooperative localization system.

3.6.4. Integrated UWB Range Update Algorithm

Algorithm 3 integrates CI fusion, chi-squared validation, and EKF update into a comprehensive UWB range update process:

This integrated algorithm ensures robust and consistent fusion of UWB measurements in our anchor-free collaborative localization system, effectively constraining the divergence of unobservable dimensions identified in our observability analysis.

The ranging-aided SLAM (RA-SLAM) problem introduces additional non-convexity due to distance measurement models, making it challenging to obtain globally optimal solutions. The maximum a posteriori (MAP) estimation of RA-SLAM is a non-convex optimization problem that heavily relies on good initial values. Traditional methods for solving this problem often face challenges such as local optima and high computational complexity.

Algorithm 3 UWB Range Update with Chi-squared Test and Covariance Intersection Fusion.
Input: $Z_{k}$: UWB measurement; ${\hat{X}}_{i}$: local state estimate; $P_{i}$: local covariance; $\{{\hat{X}}_{j}, P_{j}\}_{j \in N_{i}}$: neighbor states and covariances
Output: ${\hat{X}}_{i}^{u p d a t e d}$: updated state estimate; $P_{i}^{u p d a t e d}$: updated covariance
1: function UWB_Range_Update($Z_{k}$, ${\hat{X}}_{i}$, $P_{i}$, $\{{\hat{X}}_{j}, P_{j}\}_{j \in N_{i}}$)
2: 1. Covariance Intersection Fusion
3: for all $j \in N_{i}$ do
4: $α^{*} = arg {min}_{α \in [0, 1]} tr (P_{C I} (α))$	▹ Optimal fusion weight
5: $P_{C I_{j}}^{- 1} = (1 - α^{*}) P_{i}^{- 1} + α^{*} P_{j}^{- 1}$	▹ Fused covariance
6: ${\hat{X}}_{C I_{j}} = P_{C I_{j}} ((1 - α^{*}) P_{i}^{- 1} {\hat{X}}_{i} + α^{*} P_{j}^{- 1} {\hat{X}}_{j})$	▹ Fused state
7: $R_{U W B_{i j}}^{C I} = δ_{d}^{2} + w_{i j}^{2} H_{U W B_{i j}} P_{C I_{j}} H_{U W B_{i j}}^{T}$	▹ CI-enhanced noise
8: end for
9: $R_{U W B} = \sum_{j \in N_{i}} w_{i j} R_{U W B_{i j}}^{C I}$	▹ Total noise matrix
10: 2. Chi-squared Measurement Validation
11: $r = Z_{k} - h ({\hat{X}}_{i})$	▹ Innovation
12: $S = H P_{i} H^{T} + R_{U W B}$	▹ Innovation covariance
13: $χ^{2} = r^{T} S^{- 1} r$	▹ Chi-squared statistic
14: 3. Conditional EKF Update
15: if $χ^{2} < χ_{t h r e s h}^{2}$ then
16: $[{\hat{X}}_{i}^{u p d a t e d}, P_{i}^{u p d a t e d}] = EKFUpdate ({\hat{X}}_{i}, P_{i}, r, H, S)$
17: else
18: $[{\hat{X}}_{i}^{u p d a t e d}, P_{i}^{u p d a t e d}] = [{\hat{X}}_{i}, P_{i}]$	▹ Reject unreliable measurement
19: end if
20: return ${\hat{X}}_{i}^{u p d a t e d}, P_{i}^{u p d a t e d}$
21: end function
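Steps 2 and 3 of Algorithm 3 can be illustrated with a reduced Python sketch. The assumptions are stated plainly: the state is only the 3D position of UAV i, a single neighbor at a known position provides one scalar range, the noise variance `R` is passed in (in the full algorithm it would come from the CI step), the covariance is symmetric, and the gate uses 3.841, the 95% chi-squared quantile for one degree of freedom. The function name and argument layout are hypothetical.

```python
import math

CHI2_95_1DOF = 3.841  # 95% quantile of the chi-squared distribution, 1 degree of freedom


def uwb_range_update(x, P, p_neighbor, z, R):
    """Gate one scalar range measurement and, if accepted, apply an EKF update.

    x: 3D position estimate, P: symmetric 3 x 3 covariance,
    z: measured range, R: measurement noise variance.
    Returns (x_updated, P_updated, accepted).
    """
    d_pred = math.dist(x, p_neighbor)
    H = [(a - b) / d_pred for a, b in zip(x, p_neighbor)]         # 1 x 3 Jacobian
    r = z - d_pred                                                # innovation
    PHt = [sum(P[i][k] * H[k] for k in range(3)) for i in range(3)]
    S = sum(H[i] * PHt[i] for i in range(3)) + R                  # innovation variance
    chi2 = r * r / S                                              # chi-squared statistic
    if chi2 >= CHI2_95_1DOF:
        return x, P, False                                        # reject: keep the prior
    K = [v / S for v in PHt]                                      # Kalman gain
    x_new = [x[i] + K[i] * r for i in range(3)]
    P_new = [[P[i][j] - K[i] * PHt[j] for j in range(3)] for i in range(3)]
    return x_new, P_new, True


# Hypothetical example: a consistent range, then a gross outlier.
P0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_ok, P_ok, accepted = uwb_range_update([0.0, 0.0, 0.0], P0, [10.0, 0.0, 0.0], 10.5, 0.09)
x_bad, P_bad, accepted_bad = uwb_range_update([0.0, 0.0, 0.0], P0, [10.0, 0.0, 0.0], 15.0, 0.09)
```

The accepted update moves the estimate away from the neighbor along the line of sight and shrinks the variance in that direction only; the 5 m outlier is gated out and leaves the prior unchanged.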

3.7. UWB Measurements Adaptive Adjustment Based on Chi-Squared Detection

In practical applications, UWB ranging measurements are often affected by non-line-of-sight (NLOS) propagation, multipath effects, and other environmental factors, resulting in outliers that can severely degrade localization performance. To address this issue, we propose an adaptive adjustment method for UWB measurements based on chi-squared detection, with alternative strategies when the chi-squared test fails.

For each UWB measurement, the innovation (residual) is computed as:

y_{k} = Z_{k} - H_{k} {\hat{X}}_{k | k - 1}

(42)

where

Z_{k}

is the measurement vector,

H_{k}

is the observation matrix, and

{\hat{X}}_{k | k - 1}

is the predicted state vector. The innovation covariance matrix and chi-squared statistic are then computed as:

S_{k} = H_{k} P_{k | k - 1} H_{k}^{T} + R_{k}

(43)

λ_{k} = y_{k}^{T} S_{k}^{- 1} y_{k}

(44)

where

P_{k | k - 1}

is the a priori error covariance matrix of the Kalman filter, and

R_{k}

is the observation noise covariance matrix. In this article, the confidence level of the chi-squared test is set to 95% and the dimension of the residual vector is 6, so the corresponding threshold is 12.59.

When the chi-squared test fails (i.e.,

λ_{k} \geq χ^{2} (0.95)

), our system applies a weighted measurement update instead of rejecting the measurement outright. The weight assigned to the measurement decays exponentially with the amount by which the chi-squared statistic exceeds the threshold:

w_{u w b} = exp (- \frac{λ_{k} - χ_{t h r e s h}^{2}}{2 σ_{u w b}^{2}})

(45)

where

σ_{u w b}

is the UWB measurement noise parameter. This approach allows for slightly inconsistent measurements to still contribute to the state update with reduced influence, while strongly inconsistent measurements are effectively downweighted.

Finally, each UAV applies the standard EKF update formula with the computed weight to fuse the UWB measurements and obtain the state estimate after collaborative positioning.
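The weighting rule of Equation (45) is simple to express in Python. In the sketch below, `uwb_weight` follows the equation directly, while `inflated_noise` shows one common way to apply such a weight, namely dividing the measurement variance by it. That second step is an assumption for illustration, since the text does not state how the weight enters the EKF update; both function names are hypothetical.

```python
import math


def uwb_weight(lam, chi2_thresh, sigma_uwb):
    """Weight for a UWB measurement: 1 if the test passes, Eq. (45) otherwise."""
    if lam < chi2_thresh:
        return 1.0
    return math.exp(-(lam - chi2_thresh) / (2.0 * sigma_uwb ** 2))


def inflated_noise(R, weight, floor=1e-6):
    """Assumed mechanism: down-weight a measurement by scaling its variance by 1 / weight."""
    return R / max(weight, floor)


# Threshold from the text (95%, 6 degrees of freedom); other values are hypothetical.
w_pass = uwb_weight(5.0, 12.59, sigma_uwb=1.0)
w_mild = uwb_weight(14.59, 12.59, sigma_uwb=1.0)
w_severe = uwb_weight(40.0, 12.59, sigma_uwb=1.0)
R_eff = inflated_noise(0.09, w_mild)
```

With these numbers a statistic just above the threshold keeps roughly a third of its influence, while a strongly inconsistent one is driven to a negligible weight.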

3.8. Observability Analysis for Ranging + Odometry Swarm Systems

We conduct an observability analysis for anchor-free UAV swarms that rely only on inter-UAV ranging and on-board odometry, and derive the key properties. For a swarm with N UAVs, the state vector

x = {[x_{1}^{T}, x_{2}^{T}, \dots, x_{N}^{T}]}^{T}

includes each UAV’s position

p_{i} = {[x_{i}, y_{i}, z_{i}]}^{T}

, velocity

v_{i}

, attitude

q_{i}

, and IMU biases

b_{i}

. Measurements include:

On-board odometry (relative motion)
Inter-UAV UWB ranging $d_{i j} = ∥ p_{i} - p_{j} ∥ + v_{u w b, i j}$

Consider a global transformation

τ = {[δ x_{G}, δ y_{G}, δ z_{G}, δ ψ_{G}]}^{T}

representing 3D translation and yaw rotation. Applying this to the swarm state:

p_{i}^{'} = R_{y} (δ ψ_{G}) p_{i} + {[δ x_{G}, δ y_{G}, δ z_{G}]}^{T}

(46)

v_{i}^{'} = R_{y} (δ ψ_{G}) v_{i}

(47)

q_{i}^{'} = q_{i} \otimes q_{y} (δ ψ_{G})

(48)

This transformation leaves all measurements invariant:

- Odometry measurements depend only on relative motion, so $v_{i}^{'}$ produces identical measurements to $v_{i}$
- Ranging measurements are preserved: $∥ p_{i}^{'} - p_{j}^{'} ∥ = ∥ p_{i} - p_{j} ∥$ (rotation matrices preserve Euclidean distance)

The observability matrix

O

for the linearized system is constructed as:

O = [\begin{matrix} H \\ H F \\ H F^{2} \\ ⋮ \\ H F^{n - 1} \end{matrix}]

(49)

where

H

is the combined measurement matrix,

F

is the block-diagonal state transition matrix, and

n = 16 N

is the total state dimension.

Since

τ

leaves all measurements invariant, the four directions it generates lie in the null space of

O

. This yields:

Theorem 1.

For an anchor-free UAV swarm with only inter-UAV ranging and on-board odometry, and without initial absolute pose information, the absolute heading (yaw) and absolute position (3D translation) are unobservable, resulting in 4 unobservable dimensions.

The rank of

O

is thus at most

16 N - 4

, confirming the unobservable subspace.
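The invariance behind Theorem 1 can be checked numerically. The Python sketch below applies a global translation and yaw rotation, as in Equation (46), to a set of positions and compares the inter-UAV ranges before and after. The yaw rotation is taken about the z axis, which is an assumed axis convention, and the positions and function names are made up for illustration.

```python
import math


def yaw_rotate(p, psi):
    """Rotate a 3D point about the vertical (z) axis by psi radians."""
    c, s = math.cos(psi), math.sin(psi)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]


def transform(positions, t, psi):
    """Apply p' = R(psi) p + t to every position, as in Eq. (46)."""
    return [[a + b for a, b in zip(yaw_rotate(p, psi), t)] for p in positions]


def pairwise_ranges(positions):
    """All inter-UAV distances, in a fixed pair order."""
    n = len(positions)
    return [math.dist(positions[i], positions[j])
            for i in range(n) for j in range(i + 1, n)]


# Hypothetical swarm of four UAVs and an arbitrary global transformation.
swarm = [[0.0, 0.0, 10.0], [5.0, 1.0, 12.0], [-3.0, 4.0, 9.0], [2.0, -6.0, 11.0]]
moved = transform(swarm, t=[100.0, -50.0, 7.0], psi=0.8)
max_diff = max(abs(a - b) for a, b in
               zip(pairwise_ranges(swarm), pairwise_ranges(moved)))
```

Here `max_diff` stays at the level of floating-point round-off, which is the numerical counterpart of the statement that ranging alone cannot distinguish the original swarm from its translated and yaw-rotated copy.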

While the 4 unobservable dimensions are fundamental, the divergence rate of these directions can be constrained. This is the core motivation for our absolute accuracy improvement mechanism: through strategic sensor fusion and drift mitigation, we can significantly slow the divergence of the unobservable directions, thereby improving absolute positioning accuracy without changing the fundamental observability properties.

3.9. Analysis of Absolute Accuracy Improvement Mechanism in Anchor-Free Collaborative Localization

Building upon the observability analysis, we now analyze how our proposed system improves the absolute positioning accuracy of anchor-free UAV swarms, even in the presence of 4 unobservable dimensions. While the fundamental unobservability cannot be eliminated without external absolute references, our research demonstrates that the absolute positioning accuracy can be significantly enhanced and network drift can be mitigated through three key mechanisms: (1) accurate absolute pose initialization using MDS-based methods fused with ranging, IMU, and geomagnetic information; (2) continuous accuracy improvement through multi-constraint fusion in a sliding window EKF framework; and (3) explicit network drift mitigation strategies. These mechanisms effectively constrain the divergence rate of the unobservable directions, leading to improved absolute positioning accuracy.

3.9.1. Swarm Initialization with Absolute Pose Estimation

As detailed in Section 3.2, our MDS-MAP initialization approach utilizes ranging measurements, IMU data, and geomagnetic information to establish accurate absolute positions for all UAVs at the beginning of the mission. This initialization process is crucial as it provides each UAV with a globally consistent reference frame, which serves as the foundation for subsequent collaborative localization. The accurate initialization of the swarm’s absolute pose significantly enhances the overall absolute positioning accuracy of the UAV cluster. By establishing a precise initial reference frame, the accumulated errors in subsequent navigation can be minimized, leading to more reliable and accurate positioning throughout the mission.

3.9.2. Continuous Accuracy Enhancement Through Multi-Constraint Fusion

During system operation, the sliding window EKF framework continuously fuses multiple sensor constraints to improve estimation accuracy and maintain long-term stability. The augmented state vector includes key frames from visual, UWB, and magnetic measurements:

δ x = {[δ x_{I}, δ x_{K F_{Visual}}, δ x_{K F_{UWB}}, δ x_{K F_{MAG}}]}^{T}

(50)

The observation model combines visual, ranging, and geomagnetic measurements:

z_{k} = [\begin{matrix} z_{v i s} \\ z_{u w b} \\ z_{m a g} \end{matrix}] = [\begin{matrix} h_{v i s} (x) \\ h_{u w b} (x) \\ h_{m a g} (x) \end{matrix}] + [\begin{matrix} n_{v i s} \\ n_{u w b} \\ n_{m a g} \end{matrix}]

(51)

This multi-constraint fusion significantly increases the rank of the information matrix, thereby improving the observability and accuracy of the system state estimation. The increased redundancy in measurements also enhances robustness against individual sensor failures or noise. Specifically:

Visual constraints provide high-frequency relative motion information and help correct drift in the inertial navigation system.
UWB ranging constraints offer direct peer-to-peer distance measurements, which are particularly valuable for maintaining inter-UAV positional consistency in the swarm.
Geomagnetic constraints supply absolute heading information, which is essential for maintaining global orientation consistency.

3.9.3. Network Drift Mitigation Strategies

Building upon our observability analysis, we now focus on how our system constrains the divergence rate of the 4 unobservable dimensions. While the fundamental unobservability cannot be eliminated, our proposed strategies effectively mitigate the drift in these unobservable directions, thereby improving the absolute positioning accuracy. We implement three key strategies that directly address the divergence of the unobservable dimensions:

Absolute Heading Constraint: By continuously incorporating geomagnetic measurements into the fusion framework, we provide a stable absolute reference for the swarm’s heading. While this does not change the fundamental unobservability of the yaw angle, it effectively constrains the divergence rate of the absolute heading. The geomagnetic measurements act as a virtual anchor for the yaw direction, preventing the entire swarm from rotating uncontrollably and maintaining consistent heading estimates across all UAVs.
Relative Distance Consistency: The UWB ranging measurements between neighboring UAVs provide strong constraints on the relative positions of the swarm members. These constraints effectively limit the divergence of the unobservable translational dimensions. While the entire swarm can still translate rigidly, the relative distance constraints ensure that this translation occurs in a coordinated manner, preventing the swarm from dispersing and maintaining the relative formation. This coordinated behavior significantly reduces the effective drift rate of the absolute position estimates.
Sliding Window Marginalization: The sliding window mechanism maintains a history of past states and measurements, allowing the system to detect and correct drift over time through loop closure-like effects. By marginalizing older states while retaining their information in the covariance matrix, the system creates implicit constraints that limit the divergence of all states, including the unobservable dimensions. The sliding window effectively extends the temporal horizon of the system, providing long-term consistency constraints that slow down the drift of the unobservable directions.

Collectively, these strategies effectively constrain the divergence rate of the 4 unobservable dimensions, even though they cannot eliminate the fundamental unobservability. By limiting the drift rate, our system significantly improves the absolute positioning accuracy of the swarm, making it suitable for practical applications in GNSS-denied environments.

3.10. Scalability Analysis

Through a series of key design considerations, this system achieves favorable scalability for large-scale unmanned aerial vehicle (UAV) swarms. Specifically, the Pose-Only (PO) theory improves computational efficiency by eliminating the need for 3D feature reconstruction; the sliding-window EKF limits the number of estimated states; and localized processing ensures a linear scaling of computational load. In addition, the system mitigates communication latency and packet loss via time alignment, synchronization, and delay compensation within the sliding window. For future deployment expansion, in terms of communication overhead, each UAV only communicates with neighboring UAVs within its communication range, thus ensuring a linear scaling of communication burden. A dynamic neighbor selection strategy further restricts communication to the nearest K neighbors. Regarding UWB signal management, the system implements the Time Division Multiple Access (TDMA) protocol and Frequency Hopping Spread Spectrum (FHSS) technology to avoid signal collisions in dense swarms. Analysis results show that this system can support swarms of 100 UAVs with appropriate parameter tuning, and is expected to accommodate even larger swarms through additional optimizations such as hierarchical communication and UWB modulation enhancement.

4. Experimental Evaluation

This section provides a comprehensive performance evaluation of our proposed collaborative navigation framework and benchmarks it against state-of-the-art (SOTA) VIO and collaborative positioning methods, which are commonly used for drone positioning in GNSS-denied situations. Experimental validation was conducted using datasets collected from both the AirSim simulation platform and real-world environments. All evaluations were performed on a laptop (13th Gen Intel® Core™ i9-13900HX 2.20 GHz processor) to ensure practical applicability. This paper employs EVO (Python package for the evaluation of odometry and SLAM), version 1.12.0, for data assessment. Visualizations were created with Python 3.9 and the Matplotlib plotting library (version 3.6.3).

4.1. Swarm Position Initialization Simulation Experiment

To validate the effectiveness of the proposed MDS-MAP-based swarm absolute position initialization method, we conducted comparative experiments against the traditional GTSAM-based optimization approach that builds ranging residuals. We simulated 12 nodes with a ranging error of 0.5 m and ran 500 Monte Carlo trials, averaging the final results. Both methods were evaluated in terms of average positioning accuracy and computational time.

The traditional approach formulates the initialization problem as a non-linear optimization task by constructing ranging residuals defined as

\sum_{i, j} ∥ d_{i j} - ∥ P_{i} - P_{j} ∥ ∥^{2}

, where

d_{i j}

represents the measured distance between UAVs i and j, and

P_{i}

,

P_{j}

are their respective absolute positions. This optimization problem is solved using the GTSAM library, which iteratively minimizes the residuals to estimate the absolute positions.

In contrast, our proposed MDS-MAP method avoids the non-convexity issues inherent in the optimization approach by employing algebraic methods based on Multidimensional Scaling (MDS) and coordinate alignment techniques. This approach directly computes the relative positions using MDS, aligns the coordinate systems using non-moving nodes, estimates the absolute heading, and finally solves for the absolute positions through global translation. The experimental results are shown in Table 1.

The results in Table 1 show that our proposed MDS-MAP method achieves a 44.1% reduction in positioning error and a reduction of more than 80% in computation time compared to the GTSAM-based optimization approach. To provide more reliable statistical results, we have included the error variance and 95% confidence intervals based on 500 Monte Carlo simulations. These improvements are attributed to the elimination of non-convex optimization challenges and the adoption of efficient algebraic computations instead of iterative optimization procedures.
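The algebraic core of the MDS step described above can be sketched with classical MDS in pure Python. This is a minimal illustration under assumptions, not the authors' MDS-MAP pipeline: it covers only the recovery of relative coordinates from a distance matrix (no coordinate alignment, heading estimation, or global translation), it works in 2D, and it extracts the leading eigenvectors by power iteration with deflation to avoid external libraries. The function name, the iteration count, and the test geometry are hypothetical.

```python
import math
import random


def classical_mds(D, dim=2, iters=500, seed=0):
    """Recover coordinates (up to rotation, reflection, translation) from distances D."""
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centering: B = -1/2 * J D^2 J, the Gram matrix of centered coordinates.
    B = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):                       # power iteration
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(lam, 0.0)) * v[i]
        # Deflation: remove the component just found before the next axis.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical noise-free layout of five nodes.
truth = [(0.0, 0.0), (6.0, 0.0), (6.0, 2.0), (0.0, 2.0), (3.0, 5.0)]
D = [[math.dist(a, b) for b in truth] for a in truth]
estimate = classical_mds(D)
```

With noise-free ranges the recovered layout reproduces every pairwise distance; with noisy ranges it provides a closed-form starting point, which is the property that lets an algebraic initialization sidestep the local minima of iterative solvers.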

4.2. Swarm Cooperative Localization Simulation Experiment

To verify the effectiveness of the new architecture, we conducted simulation experiments on the AirSim platform. Since AirSim does not simulate UWB data, we generated UWB measurements by adding zero-mean Gaussian white noise with a standard deviation of 0.3 m to the true distance. The parameter settings of the simulation experiment are shown in Table 2, and the simulation scenarios and ground truth trajectories are shown in Figure 3.

In the simulation experiment, “VINS-Mono” and “OpenVINS” denote the two SOTA VIO methods in [37,40], “SuperVINS” denotes the deep-learning-based SOTA VIO method proposed in [41], “COO_VIR” denotes the collaborative localization method proposed in [12], “COVINS-G” denotes the collaborative localization method proposed in [11], and “COO_OUR” denotes the method proposed in this paper. The drone positioning root-mean-square error (RMSE) statistics are shown in Table 3. Since this paper targets drones operating in large-scale outdoor environments that require real-time positioning output, loop-closure relocalization is not considered when comparing with the OpenVINS and VINS-Mono algorithms.

As shown in Table 3, in the simulation experiment with 6 UAVs, our method outperforms all other methods. This indicates that our method not only achieves excellent positioning accuracy for each individual UAV but also maintains stable performance in the multi-UAV system, verifying its robustness and effectiveness in extended simulation scenarios.

4.3. Vision Loss Experiment

To verify the efficacy of the proposed algorithm under visual loss, we introduced partial visual loss into the simulation framework established in the previous section, as illustrated in Figure 4. The curves with different colors represent the trajectories of the six drones, consistent with Figure 3. The bold part of each curve indicates visual disconnection, which lasts 10 s. Unless otherwise specified, the experimental settings (parameters, configurations, and environmental conditions) were the same as in the preceding simulation experiment.

In the simulated vision loss experiment, the UAV positioning results are shown in Table 4. COVINS-G failed completely during the vision loss period because it relies entirely on continuous visual input. The COO_VIR method showed significant performance degradation, with an average RMSE of 19.47 m, about 2.8 times the error of our method. This degradation occurs because these methods depend heavily on the continuous operation of the VIO system; when vision is lost, the VIO system fails to function properly, causing significant deviation in the overall positioning results. In contrast, our proposed method handles vision loss effectively, maintaining an average accuracy of 6.91 m, which is comparable to the results achieved under normal conditions as shown in Table 3. This demonstrates that our system maintains robust performance even under adverse visual conditions by leveraging the coupled fusion of UWB ranging and IMU data.

4.4. UWB Abnormal Experiment

To further evaluate the robustness of the proposed method under abnormal UWB conditions, we conducted additional simulation experiments that emulate two typical UWB failures in GNSS-denied outdoor environments: (1) UWB signal loss (no ranging output) and (2) UWB non-line-of-sight (NLOS) ranging bias. Unless otherwise specified, the simulation settings (sensor rates, trajectories, and evaluation protocol) are identical to those in the swarm cooperative localization simulation experiment (Table 2).

(1) UWB signal loss

In this experiment, the UWB ranging packets are assumed to be completely lost during several time intervals, i.e., no distance measurements are available for the EKF update. Practically, this is implemented by dropping the UWB measurements so that the filter performs propagation and updates only with the remaining sensors (IMU, vision, and magnetometer). The UWB loss intervals are highlighted by the bold red segments in Figure 5.

(2) UWB NLOS with positive bias (3–5 m)

Relevant studies have demonstrated that in NLOS scenarios, UWB ranging measurements may generate values larger than the true ones owing to the multipath effect. To model NLOS-induced ranging errors, we add a positive bias to the true inter-UAV distance during NLOS intervals. Specifically, for each inter-UAV link i–j, the corrupted ranging measurement is generated as

$$\tilde{d}_{ij} = d_{ij}^{true} + b_{ij} + n_{ij}, \quad b_{ij} \sim U(3, 5)\ \text{m}, \quad n_{ij} \sim N(0, \sigma_{d}^{2}) \qquad (52)$$

where $d_{ij}^{true}$ is the ground-truth distance, $b_{ij}$ is a random positive bias uniformly sampled from 3 to 5 m, and $n_{ij}$ is zero-mean Gaussian noise with $\sigma_{d} = 0.3$ m, consistent with the nominal UWB setting. The NLOS intervals are highlighted by the bold black segments in Figure 5.
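As an illustration of Equation (52), the following sketch generates corrupted ranges from ground-truth distances. It assumes that outside NLOS intervals the bias term is zero and only the nominal Gaussian noise is applied; the function name `simulate_uwb_range` and the boolean `nlos` mask are hypothetical helpers, not part of the authors' simulator.

```python
import numpy as np


def simulate_uwb_range(d_true, nlos, rng, sigma_d=0.3, bias_low=3.0, bias_high=5.0):
    """Simulated UWB range following Eq. (52).

    d_true : ground-truth inter-UAV distances in meters
    nlos   : boolean mask, True where the link is in an NLOS interval
    rng    : numpy.random.Generator
    """
    d_true = np.asarray(d_true, dtype=float)
    nlos = np.asarray(nlos, dtype=bool)
    noise = rng.normal(0.0, sigma_d, size=d_true.shape)        # n_ij ~ N(0, sigma_d^2)
    bias = rng.uniform(bias_low, bias_high, size=d_true.shape)  # b_ij ~ U(3, 5) m
    bias = np.where(nlos, bias, 0.0)                            # assumed: no bias in LOS
    return d_true + bias + noise


rng = np.random.default_rng(0)
d = np.full(10000, 50.0)
los_ranges = simulate_uwb_range(d, np.zeros_like(d, dtype=bool), rng)
nlos_ranges = simulate_uwb_range(d, np.ones_like(d, dtype=bool), rng)
```

Averaged over many samples, the NLOS ranges exceed the true distance by about 4 m (the mean of the uniform bias), while the LOS ranges remain unbiased.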

In the simulated UWB abnormality experiment, the UAVs’ positioning results are shown in Table 5. COO_VIR degraded to an average RMSE of 11.39 m, compared with 7.35 m under nominal conditions (Table 3). This degradation occurs because the method depends heavily on UWB ranging data for collaborative positioning; when UWB data are lost or contain large errors, its positioning accuracy is significantly affected.

Our proposed method with the UWB adaptive adjustment (UWB_AA) strategy handles UWB abnormalities effectively, maintaining an average accuracy of 6.47 m. To demonstrate the effectiveness of the chi-squared detection based UWB_AA strategy, we conducted an ablation experiment by removing it, resulting in COO_OUR (without_UWB_AA) with an average RMSE of 8.39 m. The chi-squared detection based UWB_AA strategy therefore reduces the average RMSE by 1.92 m (from 8.39 m to 6.47 m) in UWB abnormality scenarios.

The superior performance is attributed to two key factors: (1) our system adopts a multi-modal fusion approach that integrates visual, inertial, and magnetic sensor data, which can compensate for the loss or errors of UWB data; (2) our chi-squared detection method effectively filters out UWB outliers, further improving the system’s robustness to UWB abnormalities.
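To make the chi-squared detection idea concrete, the sketch below applies a standard innovation gate to a single inter-UAV range. It is a simplified illustration under stated assumptions, not the UWB_AA strategy of Section 3.7: it uses a joint 6 × 6 position covariance for the two UAVs, a fixed 95% threshold for one degree of freedom (3.841), and a plain accept/reject decision, whereas the paper's method adaptively adjusts the UWB measurements and handles neighbor uncertainty through covariance intersection. The name `range_gate` is hypothetical.

```python
import numpy as np

CHI2_1DOF_95 = 3.841  # 95% quantile of the chi-squared distribution, 1 DOF


def range_gate(p_i, p_j, P, z, sigma_d=0.3, threshold=CHI2_1DOF_95):
    """Chi-squared innovation test for one scalar range measurement.

    p_i, p_j : estimated 3D positions of the two UAVs (meters)
    P        : 6x6 joint covariance of [p_i, p_j] (simplifying assumption)
    z        : measured range (meters)
    Returns (statistic, accepted).
    """
    p_i = np.asarray(p_i, dtype=float)
    p_j = np.asarray(p_j, dtype=float)
    diff = p_i - p_j
    predicted = np.linalg.norm(diff)
    u = diff / predicted                   # unit line-of-sight vector
    H = np.hstack([u, -u])                 # Jacobian of the range w.r.t. [p_i, p_j]
    S = H @ P @ H + sigma_d ** 2           # innovation variance
    statistic = (z - predicted) ** 2 / S   # normalized innovation squared
    return float(statistic), bool(statistic <= threshold)


P = 0.01 * np.eye(6)
stat_ok, ok = range_gate([0, 0, 0], [10, 0, 0], P, z=10.2)
stat_nlos, nlos_ok = range_gate([0, 0, 0], [10, 0, 0], P, z=14.0)
```

A range that is 0.2 m off passes the gate, whereas a range carrying a 4 m NLOS-like bias yields a statistic far above the threshold and is flagged.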

To further demonstrate the effectiveness of our chi-squared detection method in identifying abnormal UWB measurements, we conducted a statistical analysis of the UWB data processing results. The abnormality criterion is defined as follows: if the error between the UWB measurement and the ground truth is greater than 1 m, the measurement is considered abnormal. The statistical results for the simulation experiment are shown in Table 6.

Table 6 shows that in the simulation experiment, our chi-squared detection method achieves a high anomaly detection rate of 93%. Specifically, 20.0% of the raw UWB measurements are abnormal, and the measurements that our method correctly identifies as abnormal account for 18.6% of the raw data, i.e., 18.6/20.0 = 93% of all anomalies. Since the same proportion of UWB measurement anomalies was set for each UAV in the simulation experiment, a single set of statistics is sufficient to represent the overall performance. This result demonstrates the effectiveness of our chi-squared detection method in filtering out abnormal UWB measurements, which is crucial for improving the robustness and accuracy of the cooperative localization system in simulation scenarios.
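The three statistics in Table 6 can be computed from per-measurement errors and detector flags as sketched below. The 1 m criterion comes from the text; the function name `detection_stats` and the toy input values are illustrative assumptions.

```python
def detection_stats(errors_m, flagged, threshold_m=1.0):
    """Anomaly statistics in the style of Table 6.

    errors_m : UWB measurement minus ground-truth distance, per measurement (m)
    flagged  : detector decision per measurement (True = flagged as abnormal)
    Returns (anomaly ratio, correctly identified ratio, detection rate).
    """
    abnormal = [abs(e) > threshold_m for e in errors_m]
    n = len(errors_m)
    n_abnormal = sum(abnormal)
    n_hit = sum(1 for a, f in zip(abnormal, flagged) if a and f)
    rate = n_hit / n_abnormal if n_abnormal else float("nan")
    return n_abnormal / n, n_hit / n, rate


# Toy example: two of five measurements are abnormal, one of them is flagged.
ratio, hit_ratio, rate = detection_stats(
    [0.2, 3.5, 0.1, 4.2, 0.3], [False, True, False, False, False]
)

# Consistency check on the Table 6 entries.
table6_rate = 18.6 / 20.0
```

For the Table 6 entries, `table6_rate` is 0.93, matching the reported 93% detection rate.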

4.5. Physical System Construction

In order to verify the practical application capabilities of the algorithm, this research builds a distributed multi-drone collaborative localization hardware platform. The collaborative localization platform integrates multi-modal sensors and carries out clock synchronization, and has a built-in embedded computing module that can process sensor data in real time. The platform can be mounted on a small quad-rotor drone, and realizes status information sharing and UWB ranging data interaction through a wireless communication module.

The hardware architecture is shown in Figure 6 and mainly includes the following modules: (1) Vision module: a FLIR Blackfly S USB3 global shutter camera (FLIR Systems, Inc., Wilsonville, OR, USA); microsecond-level time synchronization between the IMU and the camera is achieved through hardware trigger signals. (2) Inertial module: a MEMS-IMU that outputs 200 Hz raw gyroscope, accelerometer, and magnetometer data (Ruiyan Xinchuang Technology Co., Ltd., Chengdu, Sichuan, China). (3) UWB module: the ranging frequency is 10 Hz, and an accuracy of 0.3 m in dynamic environments is achieved through TDoA technology (Air Recycling Technology Co., Ltd., Shenzhen, Guangdong, China). (4) RTK positioning module: an integrated RTK-GNSS module (positioning accuracy of 2 cm) that serves as the ground-truth reference and is enabled only during the algorithm evaluation stage (Ruiyan Xinchuang Technology Co., Ltd., Chengdu, Sichuan, China). (5) Main processor: an RK3588 (Rockchip Electronics Co., Ltd., Fuzhou, Fujian, China) embedded computing module, selected to meet the real-time computing requirements of the sliding window EKF framework. The parameters of each module are shown in Table 7. Each UAV is equipped with a compact communication station that provides a communication bandwidth of no less than 50 Mbps and a communication range exceeding 10 km. Since the proposed cooperative method only requires sharing positional data and ranging information, and no image data, this communication setup adequately fulfills the operational requirements.
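A back-of-envelope estimate illustrates why the 50 Mbps link leaves ample headroom when only positions and ranges are exchanged. Every message field size below, the 10 Hz sharing rate, and the function name `swarm_bitrate_bps` are hypothetical assumptions chosen for illustration; the paper does not specify its message format.

```python
def swarm_bitrate_bps(
    n_uavs,
    rate_hz=10.0,         # assumed sharing rate, matching the 10 Hz UWB rate
    header_bytes=16,      # assumed framing overhead
    stamp_bytes=8,        # assumed 64-bit timestamp
    position_bytes=24,    # assumed 3 x float64 position
    covariance_bytes=72,  # assumed 3 x 3 float64 position covariance
    range_bytes=8,        # assumed one float64 range per neighbor
):
    """Aggregate payload bitrate when every UAV broadcasts one state message per cycle."""
    per_message = (
        header_bytes
        + stamp_bytes
        + position_bytes
        + covariance_bytes
        + range_bytes * (n_uavs - 1)
    )
    return n_uavs * rate_hz * per_message * 8


three_uav_bps = swarm_bitrate_bps(3)
six_uav_bps = swarm_bitrate_bps(6)
```

Under these assumptions, a three-UAV swarm needs roughly 33 kbps and a six-UAV swarm roughly 77 kbps in total, several orders of magnitude below the 50 Mbps available.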

4.6. Real-World Experiment

To demonstrate the practicality of the proposed system in real-world scenarios, we conducted a real-world collaborative localization experiment with three drones. The flight trajectories of the drones are shown in Figure 7b, and the parameters of the onboard sensors are listed in Table 7. The experimental environment is an outdoor scene, as shown in Figure 7a. The drones are equipped with high-precision RTK positioning equipment as the ground-truth reference. The images captured by the drone cameras are shown in Figure 8. Compared with indoor UAVs, outdoor UAVs operate at higher altitudes, where only downward-facing cameras can capture continuous scenes and feature points are typically tens of meters away. This makes VIO accuracy highly susceptible to environmental factors and reduces it, so multi-UAV collaboration is needed to improve accuracy.

In order to verify several innovative points of the method proposed in this paper, especially the novel processing methods for both UWB and visual measurements, ablation experiments with different settings were also carried out.

Table 8 reports the positioning errors in the real-world experiments, where “COO (ours)” denotes the full algorithm proposed in this paper, “COO (ours_without_PO)” omits the PO constraints, “COO (ours_without_UWB)” omits the UWB measurements, “COO (ours_without_MAG)” omits the magnetic assistance, and “COO (ours_without_UWB_AA)” omits the chi-squared test based UWB adaptive adjustment (UWB_AA) strategy. Figure 9 compares the running times of the different algorithms to evaluate operating efficiency. Our algorithm maintains high operating efficiency thanks to its lightweight, filter-based architecture. Moreover, on the RK3588 board (16 GB RAM) of our device, the algorithm runs with a memory usage of ≤10% and a CPU usage of ≤30%, which meets real-time demands. The system is powered by an independent 12 V/6000 mAh lithium battery, supporting continuous operation for ≥1 h. In summary, compared with the open-source SOTA VIO methods and multi-UAV collaborative positioning methods, the proposed algorithm achieves the lowest positioning error (Table 8) while remaining computationally efficient.
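The contribution of each component can be read off Table 8 by subtracting the full system's average RMSE from each ablated variant. The short sketch below does this arithmetic with the averages copied from Table 8; the helper name `ablation_increase` is an illustrative assumption.

```python
# Average RMSE in meters, copied from the "Avg." column of Table 8.
table8_avg = {
    "COO (ours)": 1.86,
    "COO (ours_without_PO)": 2.24,
    "COO (ours_without_UWB)": 2.59,
    "COO (ours_without_MAG)": 2.10,
    "COO (ours_without_UWB_AA)": 2.40,
}


def ablation_increase(avg, full_key="COO (ours)"):
    """RMSE increase (m) of each ablated variant relative to the full system."""
    base = avg[full_key]
    return {k: round(v - base, 2) for k, v in avg.items() if k != full_key}


increase = ablation_increase(table8_avg)
```

Removing UWB raises the average RMSE the most (0.73 m), followed by UWB_AA (0.54 m), the PO constraints (0.38 m), and magnetic assistance (0.24 m).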

Furthermore, in real-world experiments, the test scenario for unmanned aerial vehicles (UAVs) is a relatively flat urban square. In contrast, the scenario in simulation experiments is a complex terrain environment that includes undulating buildings, grasslands, rivers and other elements, with a larger motion scale. Consequently, the accuracy of the results obtained from real-world experiments is higher than that of the results from simulation experiments.

Additionally, we conducted a statistical analysis of the UWB data processing results in the real-world experiment. Using the same abnormality criterion (error > 1 m), the statistical results are shown in Table 9.

Table 9 shows the UWB anomaly detection performance for each UAV and the average performance in the real-world experiment. On average, our chi-squared detection method achieves a high anomaly detection rate of 89.9% across all UAVs. This result further confirms the effectiveness of our chi-squared detection method in real-world scenarios, demonstrating its robustness and practical applicability even when different UAVs experience varying UWB measurement conditions.

5. Conclusions

This paper addresses the challenge of degraded navigation and positioning accuracy for unmanned platforms in GNSS-challenged environments, where visual features are sparse and GNSS signal reception is constrained. We propose a distributed anchor-free visual-inertial-UWB-magnetic multi-UAV cooperative localization system. In this system, UWB measurements between UAVs are used to estimate inter-drone distances without relying on pre-deployed UWB anchors. A sliding window EKF framework is adopted to fuse multi-modal observations, keeping the algorithm lightweight and real-time capable. To further improve accuracy, a PO visual observation model is introduced, which strengthens the system’s adaptability to challenging visual environments with limited features. Additionally, a novel MDS (Multidimensional Scaling)-MAP initialization method fuses ranging, IMU, and geomagnetic data to solve the non-convex optimization problem in ranging-aided SLAM, ensuring fast and accurate swarm absolute pose initialization. The mechanism by which absolute positioning accuracy improves in anchor-free cooperative positioning is analyzed, and multiple keyframes are integrated into the sliding window to ensure robustness and precision. Unlike current visual-inertial-UWB cooperative localization systems that are built on a VIO system, we adopt a tightly coupled cooperative localization system centered on the IMU, which can cope effectively with vision loss.

Extensive simulation and real-world experiments demonstrate that the proposed method significantly outperforms state-of-the-art localization approaches in GNSS-denied scenarios. In particular, our system achieved an average positioning accuracy of 5.74 m in an outdoor 50-m flight simulation experiment and 1.86 m in the actual experiment, which is a significant improvement over existing methods. Importantly, our method can effectively deal with situations of vision loss, maintaining an accuracy of 6.91 m even when other approaches fail completely. Notably, our chi-squared test-based UWB adaptive adjustment (UWB_AA) strategy effectively filters abnormal UWB measurements, achieving over 88% anomaly detection rate in both simulation and real-world experiments. Ablation experiments confirm it improves accuracy in UWB abnormality scenarios, enhancing the system’s robustness and accuracy.

The successful deployment of our system on small-scale UAV platforms validates its practicality and applicability in real-world lightweight drone systems.

In future research work, we will further explore the application of our system in challenging environments such as large-scale deployments and day-night transitions to obtain a more comprehensive and powerful drone cluster positioning system.

Author Contributions

Conceptualization: X.L. and J.M.; methodology: X.L. and X.D.; software: X.L. and S.Y.; validation: X.L., Y.L. and L.Z.; formal analysis: X.L. and X.H.; investigation: X.L. and W.W.; resources: J.M.; data curation: X.L.; writing—original draft preparation: X.L.; writing—review and editing: J.M. and X.D.; visualization: X.L.; supervision: J.M.; project administration: J.M.; funding acquisition: J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (grant numbers: 62103430, 62103427, 62073331) and Major Project of Natural Science Foundation of Hunan Province (No.2021JC0004).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. The code that supports the findings of this study is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Architecture of the proposed distributed anchor-free cooperative localization system.

Figure 2. Schematic diagram of UAVs using UWB measurements for collaboration.

Figure 3. (a) UAV flight environment in simulation experiments. (b) Motion trajectories of the six UAVs in simulation experiments.

Figure 4. Trajectories containing visual loss. The six differently colored trajectories represent the six drones; refer to Figure 3 for details.

Figure 5. Trajectories under UWB abnormal conditions. The blue curve denotes the normal trajectory, while the bold red and bold black segments indicate UWB signal loss and UWB NLOS periods, respectively. The six differently colored trajectories represent the six drones; refer to Figure 3 for details.

Figure 6. Collaborative localization physical system hardware platform.

Figure 7. (a) UAVs in the experiment. (b) Motion trajectories of the three UAVs (RTK). (c) Comparison of UWB measured distance and real distance between UAV 1 and UAV 2.

Figure 8. Images captured by UAV cameras in real-world experiments.

Figure 9. Runtime comparison of different algorithms in real-world experiments, in milliseconds.

Table 1. Comparison of Swarm Absolute Position Initialization Methods.

Method	Mean Error (m)	Variance	95% CI	Mean Time (ms)
GTSAM	0.34	0.045	[0.32, 0.36]	152.6
MDS-MAP (Ours)	0.19	0.012	[0.18, 0.20]	25.8

Table 2. Sensors carried by the UAV platform in simulation experiments.

Sensors	Data Frequency	Parameters
Camera	10 Hz	Resolution: 752 × 480 pixels
		View angle: 90°
IMU	100 Hz	In-run bias stability of Gyroscope: $10^{\circ} / h$
		Noise density of Gyroscope: $1^{\circ} / s / \sqrt{Hz}$
		In-run bias stability of Accelerometer: $2000 µ g$
		Noise density of Accelerometer: $200 µ g / \sqrt{Hz}$
UWB	10 Hz	Range accuracy: 0.3 m
		Maximum range: 500 m
Magnetometer	50 Hz	Resolution: 0.1°

Table 3. RMSE of the position in extended simulation experiments with 6 UAVs, in meters.

Method	UAV1	UAV2	UAV3	UAV4	UAV5	UAV6	Avg.
VINS-Mono	6.12	11.45	9.21	8.76	10.34	9.87	9.30
OpenVINS	5.87	12.01	8.67	9.12	11.05	10.43	9.53
SuperVINS	5.21	9.78	7.98	8.23	9.12	8.65	8.16
COVINS-G	5.43	9.23	7.56	7.89	8.92	8.15	7.86
COO_VIR	4.89	8.67	7.12	7.45	8.21	7.73	7.35
COO_OUR	3.78	6.23	5.89	6.12	6.45	5.97	5.74

Table 4. Positioning RMSE (m) in the vision loss simulation experiment.

Method	UAV1	UAV2	UAV3	UAV4	UAV5	UAV6	Avg.
COVINS	Failed	Failed	Failed	Failed	Failed	Failed	Failed
COO_VIR	12.78	22.98	18.76	19.32	22.54	20.43	19.47
COO_OUR	5.72	7.55	6.81	7.34	7.59	6.44	6.91

Table 5. Positioning RMSE (m) in the UWB abnormality simulation experiment.

Method	UAV1	UAV2	UAV3	UAV4	UAV5	UAV6	Avg.
COO_VIR	8.23	11.12	10.89	12.01	12.34	13.76	11.39
COO_OUR	4.12	6.56	6.34	7.11	7.67	7.03	6.47
COO_OUR (without_UWB_AA)	5.89	8.43	8.12	9.26	9.87	8.75	8.39

Table 6. Statistical analysis of UWB anomaly detection performance in simulation experiment.

Statistic Index	Simulation Experiment
Anomaly ratio in raw UWB data	20.0%
Correctly identified anomaly ratio	18.6%
Anomaly detection rate	93%

Table 7. Sensors carried by the UAV platform in real-world experiments.

Sensors	Data Frequency	Parameters
Camera	10 Hz	Resolution: 1280 × 1024 pixels
IMU	100 Hz	In-run bias stability of Gyroscope: $10^{\circ} / h$
		Noise density of Gyroscope: $1^{\circ} / s / \sqrt{Hz}$
		In-run bias stability of Accelerometer: $2000 µ g$
		Noise density of Accelerometer: $200 µ g / \sqrt{Hz}$
UWB	10 Hz	Range accuracy: 0.3 m
		Maximum range: 500 m
Magnetometer	50 Hz	Resolution: 0.1°
Barometer	20 Hz	Accuracy: 10 cm

Table 8. RMSE of the position in real-world experiments, in meters.

Method	UAV1	UAV2	UAV3	Avg.
VINS-Mono	4.01	4.29	4.03	4.11
OpenVINS	3.97	4.06	3.41	3.81
SuperVINS	3.85	3.87	3.57	3.76
COO_VIR	3.04	3.49	2.57	3.03
COVINS-G	3.22	3.57	2.64	3.14
COO (ours_without_PO)	2.06	2.44	2.23	2.24
COO (ours_without_UWB)	2.43	2.95	2.38	2.59
COO (ours_without_MAG)	1.72	2.33	2.26	2.10
COO (ours_without_UWB_AA)	1.98	2.67	2.54	2.40
COO (ours)	1.53	2.08	1.97	1.86

Table 9. Statistical analysis of UWB anomaly detection performance in real-world experiment.

Statistic Index	UAV1	UAV2	UAV3	Average
Anomaly ratio in raw UWB data	12.2%	8.5%	6.8%	9.2%
Correctly identified anomaly ratio	11.0%	7.5%	6.2%	8.2%
Anomaly detection rate	90.2%	88.2%	91.2%	89.9%
