1. Introduction
The proliferation of unmanned aerial vehicles (UAVs) has fundamentally transformed numerous application domains, ranging from logistics and transportation to emergency response [1,2]. Accurate and continuous localization constitutes a prerequisite for autonomous flight. While Global Navigation Satellite Systems (GNSSs) provide reliable solutions in outdoor environments, their signals experience severe attenuation or complete loss in indoor spaces, underground structures, and dense urban canyons, creating critical operational bottlenecks [3]. This limitation has catalyzed the development of alternative localization technologies for GNSS-denied environments. Among these, multi-UAV cooperative localization has emerged as a powerful paradigm wherein swarm members share sensor data to mutually enhance positioning accuracy, thereby achieving system-level robustness exceeding the capabilities of any individual agent [4].
Existing research has predominantly focused on Simultaneous Localization and Mapping (SLAM), particularly visual SLAM (V-SLAM) systems that navigate GNSS-denied regions using onboard cameras [5]. However, V-SLAM systems are highly susceptible to failure in visually degraded conditions—such as texture-sparse surfaces, dramatic illumination variations, or rapid motion—resulting in severe accumulated drift [6]. To overcome these limitations, researchers have turned toward multi-sensor fusion. Ultra-wideband (UWB) technology, with its high-precision ranging capabilities and strong resilience to multipath interference, has proven effective in providing absolute or relative distance constraints that anchor SLAM estimates and suppress cumulative errors [7,8,9]. The fusion of UWB with visual and inertial measurement unit (IMU) data has demonstrated enhanced robustness under challenging conditions [6,10].
For state estimation in such nonlinear, non-Gaussian systems, particle filters (PFs) have been widely adopted due to their ability to represent arbitrary probability distributions [11,12]. Nevertheless, standard particle filters face two fundamental challenges. First, particle degeneracy—wherein a small subset of particles accumulates the majority of the weight mass—leads to loss of particle diversity. Second, the substantial computational cost associated with maintaining a large fixed particle population for accuracy becomes prohibitive on resource-constrained UAV platforms [13]. These issues severely constrain the effectiveness of particle filters in robotic applications requiring high-dimensional state estimation and real-time responsiveness.
In recent years, learning-based methods and adaptive strategies have emerged to address the inherent limitations of particle filters. Adaptive particle filters improve computational efficiency by dynamically adjusting particle counts to match estimation uncertainty [12,14]. Concurrently, neural network integration has demonstrated substantial potential in optimizing filter performance. For instance, some studies employ neural networks to learn superior resampling strategies [15] or to adaptively adjust filter parameters [16]. More advanced end-to-end learning frameworks, such as PF-Net [17], attempt to learn entire state estimation models. Recently, more sophisticated deep learning-based frameworks, including Differentiable Particle Filters [18] and Deep Latent Space Particle Filters (D-LSPF) [19], have shown remarkable capabilities by performing filtering in low-dimensional latent spaces or leveraging gradient information, demonstrating the powerful synergy between learning and classical filtering algorithms.
Despite these significant advances, a holistic framework that integrates multi-sensor fusion, adaptive computation, and intelligent weight optimization specifically designed for multi-UAV cooperative localization remains an open research gap.
Table 1 provides a comprehensive comparison of representative approaches in multi-UAV localization and particle filtering, summarizing their key methodologies, primary contributions, and inherent limitations. This analysis reveals that while existing methods address specific aspects of the localization problem, none offers a unified solution combining multi-sensor fusion, adaptive computation, and intelligent weight optimization specifically tailored for multi-UAV cooperative localization in GNSS-denied environments.
To address this challenge, we present an adaptive particle filter-neural network (PF-NN) fusion framework for multi-UAV cooperative localization. Our solution combines a Monte Carlo particle filter with a lightweight neural network that learns motion consistency to optimize particle weights, thereby enhancing the filter’s robustness against observation noise and model uncertainties. Simultaneously, an adaptive resampling mechanism dynamically adjusts particle counts based on the effective sample size (ESS), ensuring efficient allocation of computational resources. The system fuses UWB-based inter-UAV ranging information with visual observations of environmental landmarks, enabling each UAV to benefit from both local environmental features and global swarm-level geometric constraints. Simulation results reveal that, in a complex indoor environment involving six UAVs, PF-NN exhibits superior localization performance with an average root-mean-square error (RMSE) of 0.437 m. The proposed framework achieves sub-meter accuracy while effectively mitigating uncertainties induced by sensor noise and environmental sparsity.
Our principal contributions are:
- (1) The introduction of a PF-NN fusion framework that leverages neural networks to optimize particle weight allocation;
- (2) The development of an adaptive resampling strategy that balances accuracy with computational load;
- (3) The design of a cooperative observation model that robustly fuses UWB and visual data.
The remainder of this paper is organized as follows. Section 2 details the system model and the proposed PF-NN algorithm. Section 3 presents the simulation setup and analyzes the localization performance and adaptive mechanisms. Finally, Section 4 discusses the findings and concludes the paper with directions for future work.
2. System Model and Proposed Algorithm
2.1. State Space Model
We consider a multi-UAV system with $N$ UAVs in a 3D indoor space. The state vector for each UAV $i$ at time step $k$ is defined as:

$$\mathbf{x}_{i,k} = \left[\mathbf{p}_{i,k}^{T},\; \mathbf{v}_{i,k}^{T},\; \mathbf{q}_{i,k}^{T}\right]^{T}$$

where $\mathbf{p}_{i,k} \in \mathbb{R}^{3}$ represents the 3-dimensional position, $\mathbf{v}_{i,k} \in \mathbb{R}^{3}$ denotes the velocity vector, and $\mathbf{q}_{i,k}$ is the unit quaternion representing the vehicle's orientation at time step $k$.
The state evolves according to a nonlinear motion model driven by inertial measurement unit (IMU) inputs:

$$\mathbf{x}_{i,k} = f\left(\mathbf{x}_{i,k-1}, \mathbf{u}_{k}\right) + \mathbf{w}_{k}$$

where $\mathbf{u}_{k} = \left[\mathbf{a}_{k}^{T}, \boldsymbol{\omega}_{k}^{T}\right]^{T}$ consists of the measured linear acceleration and angular velocity, and $\mathbf{w}_{k}$ represents additive process noise.
The position and velocity are updated using a constant acceleration model:

$$\mathbf{p}_{i,k} = \mathbf{p}_{i,k-1} + \mathbf{v}_{i,k-1}\Delta t + \tfrac{1}{2}\mathbf{a}_{k}\Delta t^{2}$$

The velocity update equation integrates acceleration measurements:

$$\mathbf{v}_{i,k} = \mathbf{v}_{i,k-1} + \mathbf{a}_{k}\Delta t$$
Orientation is propagated using quaternion multiplication with incremental rotation derived from gyroscope measurements, providing a singularity-free attitude representation.
After each quaternion update, normalization is performed to maintain unit length:

$$\mathbf{q}_{i,k} \leftarrow \frac{\mathbf{q}_{i,k}}{\left\|\mathbf{q}_{i,k}\right\|}$$
This ensures the quaternion remains a valid rotation representation. Process noise integration for orientation updates requires careful handling due to the nonlinear nature of quaternion multiplication. The angular velocity measurement incorporates gyroscope noise as $\tilde{\boldsymbol{\omega}}_{k} = \boldsymbol{\omega}_{k} + \mathbf{n}_{\omega}$, where $\mathbf{n}_{\omega} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\omega}\right)$ and $\boldsymbol{\Sigma}_{\omega}$ is the gyroscope noise covariance. The incremental rotation quaternion is computed from the noisy angular velocity, and the quaternion update in Equation (2) propagates this noise through the multiplicative operation. The resulting orientation uncertainty is represented by the dispersion of particle orientations in the quaternion space. After each update, small-angle approximations allow us to model the quaternion noise as a Gaussian distribution in the tangent space (3D rotation vector), which is then mapped back to the quaternion space. This approach maintains the unit quaternion constraint while properly accounting for orientation uncertainty in the prediction step.
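As an illustration, the prediction step described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: per-axis Gaussian noise standard deviations, the Hamilton quaternion convention [w, x, y, z], and illustrative function and parameter names that are not taken from the original implementation.

```python
import numpy as np

def quat_mult(a, b):
    # Hamilton product of quaternions in [w, x, y, z] convention
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def predict(p, v, q, acc, omega, dt, acc_std=0.1, gyro_std=0.01, rng=None):
    """One particle prediction step: constant-acceleration translation
    plus quaternion attitude propagation with additive gyroscope noise."""
    rng = rng or np.random.default_rng()
    a = acc + rng.normal(0.0, acc_std, 3)      # noisy accelerometer reading
    w = omega + rng.normal(0.0, gyro_std, 3)   # noisy gyroscope reading
    p_new = p + v*dt + 0.5*a*dt**2             # position update
    v_new = v + a*dt                           # velocity update
    theta = np.linalg.norm(w)*dt               # incremental rotation angle
    if theta > 1e-12:
        axis = w / np.linalg.norm(w)
        dq = np.concatenate(([np.cos(theta/2)], np.sin(theta/2)*axis))
    else:
        dq = np.array([1.0, 0.0, 0.0, 0.0])    # no rotation this step
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)             # renormalize to unit length
    return p_new, v_new, q_new
```

Renormalizing after every multiplicative update keeps the quaternion on the unit sphere despite floating-point drift, matching the normalization step above.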
2.2. Monte Carlo Particle Filter Framework
The localization problem is cast within a Bayesian filtering framework, aiming to estimate the posterior probability distribution $p\left(\mathbf{x}_{k} \mid \mathbf{z}_{1:k}\right)$. We use a sequential Monte Carlo method, or particle filter, to approximate this posterior with a set of $N_{p}$ weighted particles $\left\{\mathbf{x}_{k}^{(j)}, w_{k}^{(j)}\right\}_{j=1}^{N_{p}}$:

$$p\left(\mathbf{x}_{k} \mid \mathbf{z}_{1:k}\right) \approx \sum_{j=1}^{N_{p}} w_{k}^{(j)}\, \delta\left(\mathbf{x}_{k} - \mathbf{x}_{k}^{(j)}\right)$$

where $N_{p}$ is the number of particles, $\mathbf{x}_{k}^{(j)}$ represents the $j$-th particle state, $\delta(\cdot)$ denotes the Dirac delta distribution, and $w_{k}^{(j)}$ are normalized weights satisfying $\sum_{j=1}^{N_{p}} w_{k}^{(j)} = 1$.
Particle weights are updated according to the observation likelihood. When using the state transition distribution as the proposal distribution, the weight update simplifies to:

$$w_{k}^{(j)} \propto w_{k-1}^{(j)}\, p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right)$$
To address numerical stability issues arising from likelihood underflow, we employ log-likelihood computations. The observation likelihood combines contributions from both landmark observations and inter-vehicle range measurements.
The log-likelihood computation operates as follows:
For each particle, we compute the logarithm of the observation likelihood $\log p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right)$ instead of the raw likelihood. This transformation converts the product of independent likelihoods into a sum:

$$\log p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right) = \sum_{m} \log p\left(\mathbf{z}_{k,m} \mid \mathbf{x}_{k}^{(j)}\right)$$

where $\mathbf{z}_{k,m}$ represents individual measurements (landmark or UWB).
For Gaussian measurement models, the log-likelihood has the closed form:

$$\log p\left(\mathbf{z}_{k,m} \mid \mathbf{x}_{k}^{(j)}\right) = -\tfrac{1}{2}\left(\mathbf{z}_{k,m} - h\left(\mathbf{x}_{k}^{(j)}\right)\right)^{T} \mathbf{R}^{-1}\left(\mathbf{z}_{k,m} - h\left(\mathbf{x}_{k}^{(j)}\right)\right) - \tfrac{1}{2}\log\left(\left(2\pi\right)^{d}\left|\mathbf{R}\right|\right)$$

where $h(\cdot)$ is the observation function and $\mathbf{R}$ is the measurement noise covariance. Working in log space prevents numerical underflow when multiplying many small probability values (each likelihood is typically $\ll 1$), as the sum of log-likelihoods remains numerically stable even with hundreds of particles and multiple observations per step.
After computing all log-weights, we apply the log-sum-exp trick for normalization:

$$w_{k}^{(j)} = \frac{\exp\left(\ell_{k}^{(j)} - \ell_{\max}\right)}{\sum_{j'=1}^{N_{p}} \exp\left(\ell_{k}^{(j')} - \ell_{\max}\right)}$$

where $\ell_{\max} = \max_{j} \ell_{k}^{(j)}$. This ensures numerical stability throughout the weight update process.
2.3. Neural Network-Optimized Weight Update
Traditional particle filters rely solely on observation likelihood for weight assignment, potentially overlooking valuable information from state estimation history. To enhance robustness against observation noise and model inaccuracies, we introduce a lightweight neural network to optimize particle weight allocation by incorporating motion consistency features.
The neural network architecture consists of three layers, as shown in Figure 1.
The input $\mathbf{f}^{(j)}$ is a feature vector encoding the deviation of a particle's state from the previous time step's estimate:

$$\mathbf{f}^{(j)} = \left[\left(\mathbf{p}_{k}^{(j)} - \hat{\mathbf{p}}_{k-1}\right)^{T},\; \left(\mathbf{v}_{k}^{(j)} - \hat{\mathbf{v}}_{k-1}\right)^{T}\right]^{T}$$

The hidden layer applies the hyperbolic tangent activation function for nonlinear feature extraction:

$$\mathbf{h}^{(j)} = \tanh\left(\mathbf{W}_{1}\mathbf{f}^{(j)} + \mathbf{b}_{1}\right)$$

where $\mathbf{W}_{1} \in \mathbb{R}^{n_{h} \times 6}$ and $\mathbf{b}_{1} \in \mathbb{R}^{n_{h}}$ are the weight matrix and bias vector, respectively, and $n_{h}$ is the number of hidden neurons. The sigmoid-activated output layer produces a weight correction coefficient:

$$y^{(j)} = \sigma\left(\mathbf{W}_{2}\mathbf{h}^{(j)} + b_{2}\right)$$
This output is then used to modulate the original particle weight. The sigmoid output is mapped to a correction factor $\alpha^{(j)} = 0.4 + 0.6\, y^{(j)} \in [0.4, 1.0]$, and the neural-weighted likelihood is calculated as:

$$\tilde{p}\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right) = \alpha^{(j)}\, p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right)$$
To determine the optimal neural weight correction range, we conducted an ablation study comparing different correction ranges under identical simulation conditions.
Figure 2 presents the localization performance achieved with various neural weight correction ranges.
It can be observed from
Figure 2 that the proposed range [0.4, 1.0] delivers the optimal performance across all evaluated metrics. Specifically, this range achieves the lowest average RMSE (0.437 m) and maximum error (6.91 m), while maintaining a moderate particle count (720 particles). In contrast, overly aggressive penalty ranges (e.g., [0.2, 1.0] and [0.3, 1.0]) result in increased RMSE and larger fluctuations in particle number, whereas overly conservative ranges (e.g., [0.5, 1.0]) fail to adequately suppress inconsistent particles. Variations in the upper bound ([0.4, 1.2] and [0.4, 0.9]) show limited impact on performance, indicating that the primary function of the neural network is penalty rather than reward.
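The weight-correction forward pass can be sketched as follows. This is an illustrative sketch of a 6-12-1 network with tanh hidden units and a sigmoid output; the affine map of the sigmoid output into the [0.4, 1.0] correction range is an assumption consistent with the range reported above, and the parameter values shown are random placeholders, not trained weights.

```python
import numpy as np

def weight_correction(features, W1, b1, W2, b2, lo=0.4, hi=1.0):
    """Forward pass of the lightweight 6-12-1 network: tanh hidden layer,
    sigmoid output, then an affine map of the output into [lo, hi]."""
    h = np.tanh(W1 @ features + b1)            # hidden activations, shape (12,)
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output in (0, 1)
    return lo + (hi - lo) * y                  # correction factor in [lo, hi]

# Example with hypothetical, randomly initialized parameters:
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (12, 6)), np.zeros(12)
W2, b2 = rng.normal(0, 0.1, 12), 0.0
alpha = weight_correction(np.zeros(6), W1, b1, W2, b2)  # alpha in [0.4, 1.0]
```

With a zero feature vector (a particle perfectly matching the previous estimate) and zero biases, the sigmoid output is 0.5 and the correction factor sits at the midpoint 0.7 of the range.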
The network parameters are updated through online gradient descent using particle states as training data. The training target is derived from particle position accuracy via an exponential decay:

$$t^{(j)} = \exp\left(-\left\|\mathbf{p}_{k}^{(j)} - \mathbf{p}_{k}^{\mathrm{true}}\right\|\right)$$

where $\mathbf{p}_{k}^{(j)}$ is the particle position and $\mathbf{p}_{k}^{\mathrm{true}}$ is the ground truth position. This exponential decay function maps position errors to target values in $(0, 1]$, with perfect particles (zero error) receiving a target of 1. The loss function minimizes the squared error between the network output $y^{(j)}$ and the target:

$$\mathcal{L} = \frac{1}{N_{t}}\sum_{j=1}^{N_{t}} \left(y^{(j)} - t^{(j)}\right)^{2}$$

where $N_{t}$ is the number of training particles.
Training occurs every 10 time steps, enabling the network to adapt to changing environmental conditions and system dynamics.
It is important to acknowledge a fundamental limitation of the current online training approach: the loss function in Equation (16) relies on ground-truth positions, which are unavailable during real-world deployment. This dependency currently restricts the framework to simulation environments or scenarios where external positioning systems (e.g., motion capture systems) can provide training supervision during an initial calibration phase. To address this limitation, future work will investigate self-supervised and unsupervised learning paradigms.
Potential approaches include: (1) using multi-view geometric constraints from overlapping UAV observations to generate pseudo-labels; (2) incorporating factor graph optimization to produce consensus estimates that serve as training targets; and (3) exploring contrastive learning techniques that learn motion consistency without explicit position labels. These directions aim to achieve fully autonomous online learning without ground-truth dependency.
To bridge the gap between simulation and real-world deployment, we propose a phased training strategy that can be implemented in practical scenarios. Phase 1 (Calibration Phase): During an initial deployment period in a controlled environment (e.g., a warehouse or indoor facility with pre-installed UWB anchors at known positions), the UAVs can utilize the known anchor positions and trilateration to obtain approximate position estimates. These estimates, while less accurate than motion capture systems, can serve as pseudo-ground-truth labels for initial network training. Phase 2 (Transfer Learning): The pre-trained network weights from Phase 1 are then used to initialize the network for operational deployment. During this phase, the network switches to a self-supervised mode where training targets are generated through multi-view geometric consistency checks among cooperating UAVs. Specifically, when two or more UAVs observe common landmarks, their relative poses can be constrained through epipolar geometry, providing consistency-based supervision signals. Phase 3 (Online Adaptation): In the fully operational mode, the network employs a slow-learning-rate adaptation strategy where weight updates are driven by temporal consistency losses—penalizing particles that deviate significantly from the filter’s predicted state based on motion model continuity.
2.4. Adaptive Resampling Strategy
Particle degeneracy, wherein a small subset of particles accumulates disproportionate weight mass, represents a fundamental challenge in particle filtering. We employ the effective sample size (ESS) as a diagnostic metric for degeneracy monitoring:

$$\mathrm{ESS}_{k} = \frac{1}{\sum_{j=1}^{N_{k}} \left(w_{k}^{(j)}\right)^{2}}$$

The resampling threshold is set to $N_{\mathrm{th}} = 0.5\, N_{k}$, meaning resampling is triggered when the ESS falls below 50% of the current particle count $N_{k}$. The ESS ranges from 1 (complete degeneracy) to $N_{k}$ (uniform weight distribution). When the ESS falls below the threshold, systematic resampling is triggered to restore particle diversity.
To balance accuracy and computational cost, we implement adaptive particle number adjustment:

$$N_{k+1} = \begin{cases} \min\left(N_{k} + \Delta N,\; N_{\max}\right), & \mathrm{ESS}_{k} < 0.3\, N_{k} \\ \max\left(N_{k} - \Delta N,\; N_{\min}\right), & \mathrm{ESS}_{k} > 0.8\, N_{k} \\ N_{k}, & \text{otherwise} \end{cases}$$

This mechanism increases particle count during high-uncertainty periods and reduces it when the filter has converged, achieving significant computational savings without sacrificing accuracy.
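A sketch of the ESS diagnostic and the adaptive count rule, using the 30%/80% ratio thresholds and the [100, 800] particle bounds reported in Section 3; the fixed adjustment step of 100 particles is an assumption based on the adaptive-increase behavior described in the analysis:

```python
import numpy as np

def effective_sample_size(w):
    """ESS = 1 / sum(w_j^2) for normalized weights; ranges from 1
    (complete degeneracy) to len(w) (uniform weights)."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w**2)

def adapt_particle_count(n, ess, n_min=100, n_max=800, step=100):
    """Grow the particle set under high uncertainty (ESS < 0.3*n) and
    shrink it once the filter has converged (ESS > 0.8*n)."""
    ratio = ess / n
    if ratio < 0.3 and n < n_max:
        return min(n + step, n_max)    # high uncertainty: add particles
    if ratio > 0.8 and n > n_min:
        return max(n - step, n_min)    # converged: trim particles
    return n                           # no change in the middle band
```

With 300 particles, an ESS of 60 (ratio 0.2) triggers growth to 400, while an ESS of 270 (ratio 0.9) triggers a reduction to 200.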
2.5. Multi-UAV Cooperative Observation Model
The observation model fuses two sensor modalities: visual landmark observations and UWB inter-vehicle ranging.
For visual observations, the UAV camera detects environmental landmarks within a maximum range of 10 m. Landmarks are detected using ORB features, matched across frames via BFMatcher with Hamming distance [31]. Given a landmark's world position $\mathbf{m}_{l}$ and the UAV's estimated position $\mathbf{p}$ and orientation $\mathbf{q}$, its body-frame position is computed as:

$$\mathbf{m}_{l}^{b} = \mathbf{R}\left(\mathbf{q}\right)^{T}\left(\mathbf{m}_{l} - \mathbf{p}\right)$$

yielding distance $d_{l} = \left\|\mathbf{m}_{l}^{b}\right\|$ and bearing $\mathbf{m}_{l}^{b} / \left\|\mathbf{m}_{l}^{b}\right\|$. This requires prior knowledge of landmark positions, which is feasible in structured indoor environments with pre-mapped or artificial markers.
Alternative approaches that relax this requirement include (1) direct visual odometry (DSO, LSD-SLAM); (2) deep learning features (SuperPoint, R2D2); and (3) visual-inertial odometry (VIO).
The observation function maps landmark positions to measured distances and bearing vectors:

$$\mathbf{z}_{k,l} = h_{l}\left(\mathbf{x}_{k}\right) + \mathbf{v}_{k}, \qquad \mathbf{v}_{k} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{R}_{l}\right)$$

where $\mathbf{R}_{l}$ is the measurement noise covariance, and $h_{l}(\cdot)$ stacks the predicted distance and bearing of landmark $l$.
UWB provides pairwise distance measurements between UAVs:

$$\tilde{d}_{ij,k} = \left\|\mathbf{p}_{i,k} - \mathbf{p}_{j,k}\right\| + n_{k}^{\mathrm{uwb}}$$

where $n_{k}^{\mathrm{uwb}}$ captures ranging noise. The UWB likelihood penalizes deviations between measured and expected inter-vehicle distances:

$$p\left(\tilde{d}_{ij,k} \mid \mathbf{x}_{i,k}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{uwb}}} \exp\left(-\frac{\left(\tilde{d}_{ij,k} - \hat{d}_{ij,k}\right)^{2}}{2\sigma_{\mathrm{uwb}}^{2}}\right)$$

where $\hat{d}_{ij,k} = \left\|\mathbf{p}_{i,k} - \hat{\mathbf{p}}_{j,k}\right\|$ is the expected distance, and $\sigma_{\mathrm{uwb}}$ is the UWB ranging noise standard deviation.
The joint observation likelihood for UAV $i$ is the product of likelihoods from all observed landmarks and neighboring UAVs:

$$p\left(\mathbf{z}_{k} \mid \mathbf{x}_{i,k}\right) = \prod_{l \in \mathcal{L}_{i}} p\left(\mathbf{z}_{k,l} \mid \mathbf{x}_{i,k}\right) \prod_{j \in \mathcal{N}_{i}} p\left(\tilde{d}_{ij,k} \mid \mathbf{x}_{i,k}\right)$$

where $\mathcal{L}_{i}$ denotes the set of observed landmarks and $\mathcal{N}_{i}$ represents the set of neighboring UAVs within communication range. This fusion allows each UAV to correct its position estimate using both local environmental features and global information from the swarm's geometry.
In practice, landmark association is subject to false matches and visual outliers, particularly in environments with repetitive textures or symmetric structures. To enhance robustness, the framework can incorporate outlier rejection mechanisms such as: (1) Random Sample Consensus (RANSAC) to identify and reject inconsistent landmark correspondences; (2) Mahalanobis distance gating to filter observations with unlikely innovation statistics; (3) temporal consistency checks that verify landmark observations across consecutive frames; and (4) multi-UAV cross-validation where consistent observations from multiple UAVs increase confidence in landmark associations. These mechanisms prevent incorrect data associations from causing severe accumulated drift in the localization estimates.
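As one concrete instance of mechanism (2), Mahalanobis distance gating can be sketched as follows. The chi-square gate value 7.815 corresponds to a 95% confidence level with 3 degrees of freedom; this is an illustrative helper under those assumptions, not part of the original implementation.

```python
import numpy as np

def mahalanobis_gate(innovation, S, gate=7.815):
    """Accept a measurement if its squared Mahalanobis distance
    d2 = nu^T S^-1 nu falls below a chi-square gate
    (7.815 = 95% quantile of chi-square with 3 DOF)."""
    nu = np.atleast_1d(innovation)
    d2 = float(nu @ np.linalg.solve(S, nu))  # solve avoids explicit inverse
    return d2 < gate
```

With a visual noise standard deviation of 0.15 m, a 0.1 m innovation passes the gate while a 1 m innovation (a likely false match) is rejected.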
In addition, the proposed observation model framework is designed to be extensible, allowing integration of additional sensing modalities for improved localization in limited 3D scenes.
Potential extensions include:
- (1)
Scattered optical signals: In environments with external light sources (e.g., windows, lamps), photodiodes or light sensors can measure incident light angles and intensities. These measurements can be fused with UWB and visual data by modeling the expected light field given the UAV’s position and known source locations.
- (2)
WiFi/Bluetooth RSSI: Received signal strength indicators from existing infrastructure can provide coarse position constraints, particularly useful for initialization or when other sensors are degraded.
- (3)
Ultrasonic ranging: Short-range ultrasonic sensors can provide proximity measurements to walls and obstacles, complementing the longer-range UWB and visual observations.
- (4)
Magnetic field sensing: Indoor magnetic field anomalies can serve as location fingerprints when mapped a priori.
These additional modalities can be incorporated into the joint likelihood function (Equation (24)) as additional product terms, with appropriate measurement models for each sensor type. The particle filter framework naturally handles the heterogeneous noise characteristics of different sensors through their respective likelihood functions.
2.6. Algorithm Implementation
This section presents the complete algorithmic implementation of the proposed PF-NN fusion framework. Algorithm 1 outlines the main PF-NN fusion procedure, which integrates Monte Carlo prediction, neural network inference, and adaptive weight updates for each UAV.
Algorithm 1: PF-NN Fusion Algorithm

Input: Initial particles $\{\mathbf{x}_{0}^{(j)}, w_{0}^{(j)}\}$, neural network parameters $\theta$
Output: State estimates $\hat{\mathbf{x}}_{k}$

1:  for k = 1, 2, …, K do
2:    for each UAV i = 1, …, N do
3:      // Prediction step: Monte Carlo propagation through the motion model
4:      Propagate each particle: $\mathbf{x}_{k}^{(j)} \sim p(\mathbf{x}_{k} \mid \mathbf{x}_{k-1}^{(j)}, \mathbf{u}_{k})$
5:      // Observation acquisition
6:      Collect landmark observations and UWB ranges $\mathbf{z}_{k}$
7:      // Neural network forward propagation
8:      Compute correction factors $\alpha^{(j)}$ from motion-consistency features
9:      // Weight update with neural correction (in log space)
10:     $w_{k}^{(j)} \propto w_{k-1}^{(j)}\, \alpha^{(j)}\, p(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)})$; normalize via log-sum-exp
11:     // Compute effective sample size
12:     $\mathrm{ESS}_{k} = 1 / \sum_{j} (w_{k}^{(j)})^{2}$
13:     if $\mathrm{ESS}_{k} < 0.5\, N_{k}$ then
14:       Systematic resampling (Algorithm 2)
15:       Adaptive particle adjustment
16:     end if
17:     // State estimation
18:     $\hat{\mathbf{x}}_{k} = \sum_{j} w_{k}^{(j)}\, \mathbf{x}_{k}^{(j)}$
19:     // Neural network online training
20:     if k mod 10 = 0 then
21:       Update $\theta$ by gradient descent on the training loss
22:     end if
23:   end for
24: end for
Algorithm 2 details the systematic resampling mechanism with adaptive particle number adjustment, which maintains estimation accuracy while optimizing computational resources.
Algorithm 2: Adaptive Resampling Strategy

Input: Particle set $\{\mathbf{x}_{k}^{(j)}, w_{k}^{(j)}\}_{j=1}^{N_{k}}$, ESS, $N_{\min}$, $N_{\max}$
Output: Resampled particles

1:  // Systematic resampling
2:  Compute cumulative distribution: $C_{i} = \sum_{j=1}^{i} w_{k}^{(j)}$
3:  Generate stratified samples: $u_{j} = (j - 1 + u)/N_{k}$, $u \sim \mathcal{U}[0, 1)$
4:  for j = 1 to $N_{k}$ do
5:    Find smallest i such that $C_{i} \ge u_{j}$
6:    $\mathbf{x}_{k}^{(j)\prime} \leftarrow \mathbf{x}_{k}^{(i)}$
7:  end for
8:  // Add perturbation to prevent sample impoverishment
9:  for j = 1 to $N_{k}$ do
10:   $\mathbf{x}_{k}^{(j)\prime} \leftarrow \mathbf{x}_{k}^{(j)\prime} + \boldsymbol{\epsilon}^{(j)}$, $\boldsymbol{\epsilon}^{(j)} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\epsilon})$
11: end for
12: // Reset weights
13: $w_{k}^{(j)} = 1/N_{k}$ for all j
14: // Adaptive particle number adjustment
15: if $\mathrm{ESS} < 0.3\, N_{k}$ and $N_{k} < N_{\max}$ then
16:   Increase the particle count; add new particles by perturbing existing ones
17: else if $\mathrm{ESS} > 0.8\, N_{k}$ and $N_{k} > N_{\min}$ then
18:   Decrease the particle count; keep the top particles by weight
19: else
20:   Keep the particle count unchanged
21: end if
22: return resampled particle set
3. Simulation Results and Analysis
3.1. Simulation Parameters and Experimental Setup
The simulation environment was designed to rigorously evaluate the proposed PF-NN fusion algorithm under realistic indoor conditions. Six UAVs operate within an indoor space populated with 20 randomly distributed landmarks serving as visual reference points. The simulation duration spans 100 s with a discrete time step of 0.1 s, resulting in 1000 total iterations. Each UAV follows a circular trajectory at constant angular and linear velocity, with phase offsets distributed evenly among the six UAVs at intervals of 60° to ensure spatial separation and prevent collisions. The motion model incorporates realistic accelerometer and gyroscope noise.
Key parameters for the simulation are summarized in Table 2.
The adaptive particle mechanism initializes with 300 particles per UAV, with dynamic adjustment bounds between 100 and 800 particles. The resampling threshold is set to 0.5, meaning resampling triggers when the effective sample size falls below 50% of the current particle count. The UWB ranging noise is modeled with a standard deviation of 0.20 m, while visual observations have a standard deviation of 0.15 m with a maximum detection range of 10 m.
The lightweight neural network consists of 6 input neurons representing the state deviation vector, 12 hidden neurons with hyperbolic tangent activation, and 1 output neuron with sigmoid activation. Online training occurs every 10 time steps using 50 randomly selected particles, with a learning rate of 0.01.
We acknowledge that the current evaluation is limited to simulated environments with idealized sensor models. Real-world indoor flight presents additional challenges not fully captured in simulation, including: (1) UWB multipath effects caused by signal reflections from walls and obstacles; (2) visual degradation due to motion blur, illumination variations, and texture-sparse regions; (3) dynamic communication latency in wireless networks; and (4) unmodeled aerodynamic disturbances. To validate the framework’s practical utility, future work will evaluate performance on: (1) standard open-source MAV datasets such as the EuRoC dataset or UZH-FPV drone racing dataset; (2) hardware-in-the-loop simulations incorporating real sensor characteristics; and (3) real-world experiments with physical UAV platforms equipped with UWB and visual sensors.
3.2. Localization Performance
3.2.1. Three-Dimensional Trajectory Comparison
The algorithm demonstrated high-fidelity tracking of all six UAVs. Figure 3 shows the estimated trajectories (dashed lines) closely following the ground truth paths (solid lines), visually confirming the accuracy of the localization system.
The estimated trajectories closely follow the ground truth trajectories for all six UAVs throughout the one-hundred-second simulation, demonstrating the algorithm’s capability to maintain accurate localization. The overlapping nature of true and estimated paths indicates sub-meter tracking accuracy. The six UAVs follow circular trajectories with phase offsets of sixty degrees, creating a hexagonal formation pattern that ensures adequate separation while maintaining UWB communication range. The black squares represent the twenty environmental landmarks randomly distributed throughout the space. The trajectories demonstrate that UAVs traverse regions with varying landmark densities, testing the algorithm’s robustness under different observability conditions. UAVs flying through landmark-sparse regions must rely more heavily on UWB inter-vehicle measurements for localization. The z-axis variation from approximately 1.2 m to 1.8 m reflects the sinusoidal vertical motion component added to each UAV’s trajectory, testing the algorithm’s ability to track three-dimensional position.
3.2.2. Temporal Evolution of Localization Error
Figure 4 illustrates the localization error evolution over time for all six UAVs. All UAVs exhibit rapid error reduction during the first ten seconds, with errors dropping from initial values exceeding two meters to below one meter. This convergence behavior is characteristic of particle filters, where the initial uniform particle distribution gradually concentrates around high-likelihood regions as observations accumulate. The weight update mechanism assigns higher weights to particles with greater observation likelihood, driving the filter toward accurate state estimates. Several UAVs exhibit transient error spikes exceeding two meters at various time instants. UAV-3 and UAV-4 show spikes around forty to fifty seconds, reaching approximately nine to ten meters. UAV-6 exhibits a spike near sixty seconds, reaching about seven meters. These spikes correlate with periods when UAVs traverse regions with limited landmark visibility. When a UAV enters a landmark-sparse region, the observation likelihood becomes less informative, causing particle diversity to decrease as indicated by a lower effective sample size. The adaptive particle increase mechanism addresses this by adding up to one hundred particles when the effective sample size falls below thirty percent of the current particle count.
After the initial convergence, most UAVs maintain errors below one meter for the majority of the simulation. UAV-2 and UAV-5 demonstrate particularly stable performance, with errors consistently remaining below 0.5 m after twenty seconds. This stability reflects the effectiveness of the neural network weight optimization in maintaining accurate particle weight distributions. The error curves show some correlation between UAVs, particularly during time intervals when multiple UAVs simultaneously experience increased uncertainty. This correlation arises from the cooperative observation model, where each UAV’s localization depends on its neighbors’ estimated positions through UWB ranging measurements.
3.2.3. Quantitative Performance Metrics
Quantitative performance metrics are provided in Table 3 and visualized in Figure 5.
Table 3 presents the comprehensive performance metrics for each UAV calculated over the entire one-hundred-second simulation. The RMSE values range from 0.374 m for UAV-2 to 0.535 m for UAV-4, with an average of 0.437 m across all six UAVs. This sub-meter accuracy validates the algorithm's suitability for indoor navigation applications, where typical accuracy requirements range from 0.3 to 1.0 m. The mean error values range from 0.143 m to 0.343 m and are consistently lower than the RMSE values, indicating that the error distribution is right-skewed with occasional large deviations. This skewness is evident in the box plots shown in Figure 6, where the median errors are significantly lower than the maximum errors.
UAV-4 experiences the largest maximum error of 10.385 m, followed by UAV-3 with 9.783 m, as shown in Figure 5. These extreme values occur during transient periods of high uncertainty and represent isolated incidents lasting only a few time steps. The adaptive mechanism's response time limits the duration of such error spikes. The performance variation among UAVs, with an RMSE standard deviation of 0.065 m, can be attributed to trajectory-dependent landmark visibility, relative position to other UAVs affecting UWB measurement geometry, and random initialization of neural network weights.
3.2.4. Error Distribution Characteristics
The error distribution statistics, presented as box plots in Figure 6, further highlight the system's consistency. The median errors for all UAVs remain below 0.2 m, indicating that fifty percent of all time steps achieve errors below this threshold. This demonstrates consistent performance across the majority of the simulation. UAV-2 and UAV-5 have the smallest interquartile ranges, indicating the most consistent performance, while UAV-1 and UAV-4 show larger interquartile ranges, reflecting greater variability. The outliers represent transient error spikes, with UAV-3 and UAV-4 exhibiting the most outliers, consistent with their higher maximum errors. These outliers cluster in the five to ten meter range, corresponding to the spikes visible in Figure 4. The asymmetric box plots confirm the right-skewed error distribution, where most errors are small with occasional large deviations.
3.3. Adaptive Mechanism Analysis
3.3.1. Particle Number Adaptation
Figure 7 illustrates the adaptive particle number evolution over the simulation duration. All UAVs maintain particle counts near the maximum value of eight hundred throughout most of the simulation, indicating sustained uncertainty that triggers the adaptive increase mechanism. At simulation start, particles are uniformly distributed with large initial variance, requiring many particles to adequately represent the posterior. The circular trajectories with sinusoidal vertical components create continuously changing state estimates, preventing the filter from fully converging to a small particle set. The UWB and visual observation noises limit the achievable localization precision, maintaining persistent uncertainty.
The adaptive increase condition triggers when the effective sample size falls below thirty percent of the current particle count and the current count is below the maximum. This condition triggers frequently due to the challenging observation environment, causing most UAVs to reach and maintain the maximum of eight hundred particles. The particle decrease condition, which triggers when the effective sample ratio exceeds eighty percent, rarely activates in this simulation as the sustained motion and observation noise prevent such high ratios for extended periods. Maintaining eight hundred particles per UAV results in approximately four thousand eight hundred total particles for the six-UAV system. With each particle representing a 10-dimensional state vector, the computational cost scales with the product of the number of UAVs, particles per UAV, and state dimension.
3.3.2. Effective Sample Size Dynamics
Figure 8 presents the effective particle ratio over time, providing insights into particle filter health. The effective particle ratios fluctuate predominantly in the 0.3 to 0.6 range, with occasional excursions below 0.2 and above 0.8. This distribution indicates moderate particle degeneracy that is actively managed through the resampling mechanism. Most UAVs experience resampling events every ten to thirty time steps, with UAV-1 and UAV-4 showing more frequent resampling due to their higher error variability.
The periodic drops in effective ratio correspond to several factors. When UAVs fly through regions with few visible landmarks, the observation likelihood becomes less informative, causing weight concentration on fewer particles. As UAVs move relative to each other, the UWB measurement geometry changes, affecting the information content of inter-vehicle range measurements. The online neural network training every ten steps temporarily perturbs the weight distribution as the network adapts to new motion patterns. After each resampling event, the effective ratio resets to approximately one, followed by gradual degradation as observations accumulate. The recovery rate depends on observation informativeness and motion model accuracy. The neural network training process significantly influences the observed results.
The network trains every ten time steps using fifty randomly selected particles, representing approximately six to seventeen percent of the total particle count. The training target maps particle position errors to values between zero and one, where perfect particles receive a target of one. The mean squared error between network output and target typically decreases from initial values around 0.15 to steady-state values below 0.05 after approximately fifty training iterations. This convergence is evident in the improved error stability observed after fifty seconds in Figure 3. The neural weight correction maps the sigmoid output to correction factors between 0.4 and 1.0. Particles with correction factors above 0.7 receive boosted weights, while those below 0.7 are penalized, effectively implementing learned importance sampling.
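The training target and weight-correction steps can be illustrated as below. The [0.4, 1.0] mapping range and the 0.7 pivot come from the text; the exponential error-to-target kernel, its scale, and the division-by-pivot weight update are assumptions made for this sketch.

```python
import numpy as np

def training_target(pos_error, scale=0.5):
    """Assumed kernel mapping position error (m) to a [0, 1] training
    target; a perfect particle (zero error) receives a target of 1."""
    return np.exp(-(np.asarray(pos_error) / scale) ** 2)

def apply_weight_correction(weights, nn_sigmoid_out, pivot=0.7):
    """Map sigmoid outputs to correction factors in [0.4, 1.0], then
    boost weights whose factor exceeds the pivot and penalize those
    below it (the division-by-pivot form is an illustrative choice)."""
    corr = 0.4 + 0.6 * np.asarray(nn_sigmoid_out)   # factors in [0.4, 1.0]
    w = np.asarray(weights) * (corr / pivot)        # >pivot boosts, <pivot penalizes
    return w / w.sum()                              # renormalize
```

After renormalization, particles the network judges consistent gain relative weight, which is the learned importance sampling effect described above.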
3.4. Comparison with Baseline Methods
3.4.1. Effectiveness Comparison with Baseline Methods
To evaluate the effectiveness of the proposed PF-NN fusion framework, we compare it against several baseline methods under identical simulation conditions, as well as against established state-of-the-art cooperative localization algorithms to assess whether its gains hold beyond internal ablations. The compared methods are:
(1) Internal baselines (degraded versions of our framework):
Standard PF (N = 500): A conventional particle filter with fixed 500 particles, representing a typical implementation without adaptive mechanisms.
Fixed PF (N = 800): A particle filter with fixed 800 particles (the maximum used by our adaptive method), representing increased computational cost.
Adaptive PF (no NN): The proposed adaptive resampling mechanism without neural network weight optimization, isolating the contribution of NN.
Proposed PF-NN: The complete proposed framework with both adaptive resampling and neural network weight optimization.
(2) External state-of-the-art algorithms:
EKF-based Cooperative Localization: A distributed Extended Kalman Filter that fuses UWB and visual observations using linearized motion and observation models [32]. This represents a widely used alternative to particle filters in multi-robot systems.
Distributed Factor Graph Optimization (DFGO): A graph-based optimization approach that performs maximum a posteriori estimation over sliding windows of poses and landmarks [33]. This method has shown superior accuracy in cooperative SLAM applications.
Deep Learning-based Filter (D-LSPF): The Deep Latent Space Particle Filter [21], which performs filtering in a learned low-dimensional latent space using neural network encoders.
Table 4 presents the quantitative comparison results averaged over 20 Monte Carlo runs (mean ± standard deviation).
The results demonstrate that the proposed PF-NN framework achieves the best performance across all metrics:
Compared to Standard PF (N = 500), PF-NN reduces RMSE by 28.6% (0.612 m → 0.437 m) and maximum error by 44.5% (12.45 m → 6.91 m), demonstrating the effectiveness of both adaptive particle adjustment and neural network optimization.
Compared to Fixed PF (N = 800), PF-NN achieves better accuracy with fewer average particles (720 vs. 800), validating the efficiency of the adaptive mechanism.
The comparison between Adaptive PF (no NN) and PF-NN shows that neural network optimization contributes an additional 9.1% RMSE reduction (0.481 m → 0.437 m), confirming its value in improving weight allocation.
The comparison reveals that PF-NN achieves competitive performance with DFGO (0.437 m vs. 0.398 m RMSE) while maintaining lower computational cost. PF-NN outperforms EKF-based methods by 36.4% in RMSE, demonstrating the advantage of non-Gaussian uncertainty representation. Compared to D-LSPF, PF-NN achieves 4.2% lower RMSE without requiring complex encoder networks.
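The reported improvements follow directly from the means quoted above; the short check below recomputes them from the values given in the text (Table 4).

```python
def pct_reduction(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Figures quoted in the text (Table 4 means)
rmse_vs_standard = pct_reduction(0.612, 0.437)   # PF-NN vs Standard PF (N = 500)
max_err_vs_standard = pct_reduction(12.45, 6.91) # maximum error reduction
rmse_vs_adaptive = pct_reduction(0.481, 0.437)   # contribution of the NN stage
```

Rounding to one decimal place reproduces the 28.6%, 44.5%, and 9.1% figures reported in the comparison.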
3.4.2. Computational Performance Analysis
To evaluate the computational efficiency of the proposed algorithm, we measure runtime performance on a standard desktop computer with an Apple M1 CPU and 8 GB RAM. The implementation is in MATLAB R2025b without GPU acceleration.
Table 5 summarizes the computational performance metrics.
The computational performance comparison across all evaluated methods reveals distinct characteristics in execution time and memory utilization, as summarized in Table 5.
Among the internal baseline methods, the proposed PF-NN framework achieves an average computation time of 63.4 ± 6.7 ms per step, a 12.9% reduction compared to Fixed PF (N = 800), together with a 13.2% reduction in memory consumption (178 MB versus 205 MB). This efficiency gain stems from the adaptive particle mechanism, which dynamically reduces the particle count during stable estimation periods. Relative to Standard PF (N = 500), PF-NN incurs a 40.3% computational overhead attributable to neural network forward propagation and online training.
The comparison with external state-of-the-art algorithms provides additional context for assessing the computational characteristics of PF-NN. The EKF Cooperative method achieves the lowest computational cost (18.5 ± 2.3 ms/step) and memory footprint (65 MB) due to its reliance on Gaussian assumptions and efficient matrix operations. However, the linearization errors inherent in EKF may limit its applicability in highly nonlinear scenarios.
The DFGO requires substantial computational resources (125.6 ± 15.8 ms/step, 385 MB memory). The batch optimization nature of factor graphs, which performs maximum a posteriori estimation over sliding windows, exceeds the real-time constraint for 10 Hz operation (100 ms/step), limiting its applicability for real-time UAV control.
The D-LSPF incurs significant computational overhead (89.3 ± 8.7 ms/step, 268 MB memory) due to its neural network encoder architecture. The 40.9% higher computation time and 50.6% higher memory consumption of D-LSPF relative to PF-NN highlight the efficiency advantages of the proposed lightweight neural network architecture.
Given the 0.1 s (100 ms) control loop requirement for typical UAV applications, the proposed PF-NN framework satisfies real-time constraints with approximately 36.6 ms of margin available for auxiliary tasks such as flight control, path planning, and inter-vehicle communication. Among the evaluated methods, EKF Cooperative and PF-NN meet the real-time requirement with comfortable margins, D-LSPF operates close to the budget, and DFGO exceeds it; among these, PF-NN additionally offers superior representational capacity for non-Gaussian uncertainties. The comprehensive comparison demonstrates that PF-NN achieves a strong balance between computational efficiency and representational capability among particle filter-based approaches.
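A quick check of each method's margin against the 100 ms control budget, using the per-step mean times quoted above (means only; variance tails are ignored):

```python
BUDGET_MS = 100.0  # 10 Hz control loop

# Mean per-step computation times (ms) quoted in the text
step_times = {
    "EKF Cooperative": 18.5,
    "PF-NN (proposed)": 63.4,
    "D-LSPF": 89.3,
    "DFGO": 125.6,
}

margins = {name: BUDGET_MS - t for name, t in step_times.items()}
real_time = {name: t <= BUDGET_MS for name, t in step_times.items()}
```

The PF-NN margin evaluates to the 36.6 ms figure stated above, and DFGO is the only listed method whose mean step time exceeds the budget.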
While the current implementation was tested on an Apple M1 desktop processor, multi-UAV swarms typically operate on resource-constrained embedded platforms. To contextualize the algorithm's scalability, we analyze its computational and memory requirements on typical embedded hardware. The primary computational costs are (1) particle propagation: O(N_p · d), where N_p is the particle count and d is the state dimension; (2) neural network inference: O(N_p · h), where h is the hidden layer size (12 neurons); and (3) weight update: O(N_p · m), where m is the measurement dimension. For the tested configuration (N_p = 720, d = 10, h = 12), this translates to approximately 45 MFLOPs per UAV per step.
On embedded platforms such as the NVIDIA Jetson Nano (472 GFLOPS) or Raspberry Pi 4 (13.5 GFLOPS), the algorithm is expected to achieve step times of 5–10 ms and 150–200 ms, respectively. Memory requirements scale linearly with particle count: approximately 250 bytes per particle (10D state + weight + metadata), yielding 180 KB per UAV at N_p = 720. This is well within the memory capacity of typical embedded platforms (2–8 GB RAM). These projections suggest the algorithm is deployable on modern embedded systems with appropriate particle count tuning.
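The linear memory scaling can be verified with a one-line estimate using the 250 bytes-per-particle figure from the text (1 KB = 1000 bytes assumed here):

```python
BYTES_PER_PARTICLE = 250  # 10D state + weight + metadata (figure from the text)

def memory_kb(n_particles, bytes_per_particle=BYTES_PER_PARTICLE):
    """Per-UAV particle storage in KB; scales linearly with count."""
    return n_particles * bytes_per_particle / 1000.0

per_uav_kb = memory_kb(720)      # 180.0 KB at N_p = 720, as stated
swarm_kb = 6 * per_uav_kb        # roughly 1.08 MB for the six-UAV system
```

Even the whole-swarm particle storage is three orders of magnitude below the 2–8 GB capacity of the embedded platforms mentioned above.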
3.4.3. Statistical Validation
To ensure the reliability of our results, we conducted 30 independent Monte Carlo simulations with different random seeds for the initial particle distribution, sensor noise realizations, and landmark placements, as shown in Figure 9. Each simulation runs for 100 s (1000 time steps) with 6 UAVs.
The proposed method (green solid line) consistently achieves lower RMSE values compared to Standard PF (N = 500, red dashed), Fixed PF (N = 800, orange dash-dot), and Adaptive PF without NN (blue dotted).
3.5. Performance Summary
The simulation results demonstrate that the proposed PF-NN fusion algorithm achieves sub-meter accuracy with an average RMSE of 0.437 m across six UAVs, meeting requirements for indoor navigation applications. The algorithm exhibits robust convergence with rapid error reduction during the initial ten seconds and stable performance thereafter. The adaptive resource management through dynamic particle adjustment maintains filter health while controlling computational cost. The fusion of UWB ranging and visual observations provides complementary information, enhancing robustness compared to single-modality approaches. The learned weight optimization contributes to improved accuracy, as evidenced by the correlation between training convergence and error stabilization. These results validate the algorithm’s effectiveness for multi-UAV cooperative localization in GNSS-denied indoor environments, with performance characteristics suitable for real-world deployment in surveillance, search-and-rescue, and industrial inspection scenarios.
4. Conclusions
This work proposes and validates a novel framework that fuses adaptive particle filtering with neural networks, aiming to address the challenge of cooperative localization for UAV systems in GNSS-denied environments. By integrating learning-driven weight optimization with dynamic resource management, our approach overcomes the critical bottlenecks of conventional particle filters. Simulation results demonstrate that the framework achieves sub-meter localization accuracy (with an average RMSE of 0.437 m), effectively mitigates uncertainties arising from sensor noise and environmental sparsity, and exhibits exceptional robustness and real-time application potential.
Our work confirms that introducing a lightweight neural network to learn and evaluate the motion consistency of particles can substantially improve the quality of weight assignment, thereby effectively suppressing particle degeneracy and maintaining stable filter performance even with poor observational data. Furthermore, an effective sample size-based adaptive particle number adjustment mechanism enables intelligent allocation of computational resources according to real-time task demands, which is critical for deploying advanced estimation algorithms on computationally constrained UAV platforms. This synergy between learning and adaptive strategies represents the core advantage of the present work over traditional fixed-parameter filters.
Despite the promising outcomes of this study, several directions warrant further exploration. First, the online training of the current neural network relies on simulated “ground-truth” positions as supervisory signals, which are unavailable in real-world deployment; this is a fundamental limitation that must be addressed before the method can be fielded. We propose the following research directions to overcome this challenge. (1) Multi-view geometric consistency: leveraging overlapping visual observations from multiple UAVs to generate pseudo-ground-truth through epipolar geometry and triangulation. (2) Consensus-based supervision: using distributed factor graph optimization to produce consensus state estimates that can serve as training targets. (3) Self-supervised motion learning: implementing contrastive learning frameworks that learn motion consistency from temporal sequences without position labels. (4) Transfer learning: pre-training the network on simulation data and fine-tuning with limited real-world calibration data. These approaches aim to eliminate the ground-truth dependency while retaining the benefits of online neural network optimization.
Second, the proposed framework can be extended to high-dimensional state spaces (for instance, by incorporating landmark positions into the state vector for joint estimation) to achieve true cooperative SLAM. This will impose higher demands on algorithm scalability and computational efficiency, potentially benefiting from recent advances in differentiable SLAM and neural radiance field (NeRF)-based localization. Third, the current simulation employs idealized circular trajectories with constant velocity, which may not fully represent the chaotic and unpredictable motion patterns encountered in real-world applications such as search-and-rescue operations. In such scenarios, UAVs must perform aggressive maneuvers, including rapid acceleration, sharp turns, and hovering at waypoints, while navigating through cluttered environments. The robustness of the proposed PF-NN framework to these challenging trajectory patterns warrants further investigation. Specifically, we will evaluate the algorithm's performance under (1) jerk-constrained trajectories that mimic emergency response maneuvers; (2) multi-scale motion patterns combining fast transit and slow inspection phases; (3) communication-constrained scenarios where UWB measurements become intermittent due to occlusion; and (4) dynamic formation changes where UAVs must rapidly reconfigure their relative positions. These evaluations will be conducted using realistic flight dynamics models and benchmark trajectory datasets from actual search-and-rescue missions. The adaptive particle mechanism and neural network weight optimization are expected to provide enhanced robustness against motion model mismatches during aggressive maneuvers, but empirical validation is essential to quantify performance bounds.
Finally, physical validation of the algorithm on real UAV swarms, and evaluation of its performance in more complex and dynamic real-world scenarios (e.g., in the presence of moving obstacles) will be a critical step toward verifying its ultimate practical utility. Collectively, this work lays a solid foundation for the development of next-generation intelligent and efficient UAV cooperative navigation systems.