1. Introduction
The proliferation of unmanned aerial vehicles (UAVs) has fundamentally transformed numerous application domains, ranging from logistics and transportation to emergency response [1,2]. Accurate and continuous localization constitutes a prerequisite for autonomous flight. While Global Navigation Satellite Systems (GNSSs) provide reliable solutions in outdoor environments, their signals experience severe attenuation or complete loss in indoor spaces, underground structures, and dense urban canyons, creating critical operational bottlenecks [3]. This limitation has catalyzed the development of alternative localization technologies for GNSS-denied environments. Among these, multi-UAV cooperative localization has emerged as a powerful paradigm wherein swarm members share sensor data to mutually enhance positioning accuracy, thereby achieving system-level robustness exceeding the capabilities of any individual agent [4].
Existing research has predominantly focused on Simultaneous Localization and Mapping (SLAM), particularly visual SLAM (V-SLAM) systems that navigate GNSS-denied regions using onboard cameras [5]. However, V-SLAM systems are highly susceptible to failure in visually degraded conditions—such as texture-sparse surfaces, dramatic illumination variations, or rapid motion—resulting in severe accumulated drift [6]. To overcome these limitations, researchers have turned toward multi-sensor fusion. Ultra-wideband (UWB) technology, with its high-precision ranging capabilities and strong resilience to multipath interference, has proven effective in providing absolute or relative distance constraints that anchor SLAM estimates and suppress cumulative errors [7,8,9]. The fusion of UWB with visual and inertial measurement unit (IMU) data has demonstrated enhanced robustness under challenging conditions [6,10].
For state estimation in such nonlinear, non-Gaussian systems, particle filters (PFs) have been widely adopted due to their ability to represent arbitrary probability distributions [11,12]. Nevertheless, standard particle filters face two fundamental challenges. First, particle degeneracy—wherein a small subset of particles accumulates the majority of the weight mass—leads to loss of particle diversity. Second, the substantial computational cost associated with maintaining a large fixed particle population for accuracy becomes prohibitive on resource-constrained UAV platforms [13]. These issues severely constrain the effectiveness of particle filters in robotic applications requiring high-dimensional state estimation and real-time responsiveness.
In recent years, learning-based methods and adaptive strategies have emerged to address the inherent limitations of particle filters. Adaptive particle filters improve computational efficiency by dynamically adjusting particle counts to match estimation uncertainty [12,14]. Concurrently, neural network integration has demonstrated substantial potential in optimizing filter performance. For instance, some studies employ neural networks to learn superior resampling strategies [15] or to adaptively adjust filter parameters [16]. More advanced end-to-end learning frameworks, such as PF-Net [17], attempt to learn entire state estimation models. Recently, more sophisticated deep learning-based frameworks, including Differentiable Particle Filters [18] and Deep Latent Space Particle Filters (D-LSPF) [19], have shown remarkable capabilities by performing filtering in low-dimensional latent spaces or leveraging gradient information, demonstrating the powerful synergy between learning and classical filtering algorithms.
Despite these significant advances, a holistic framework that integrates multi-sensor fusion, adaptive computation, and intelligent weight optimization specifically designed for multi-UAV cooperative localization remains an open research gap.
Table 1 provides a comprehensive comparison of representative approaches in multi-UAV localization and particle filtering, summarizing their key methodologies, primary contributions, and inherent limitations. This analysis reveals that while existing methods address specific aspects of the localization problem, none offers a unified solution combining multi-sensor fusion, adaptive computation, and intelligent weight optimization specifically tailored for multi-UAV cooperative localization in GNSS-denied environments.
To address this challenge, we present an adaptive particle filter-neural network (PF-NN) fusion framework for multi-UAV cooperative localization. Our solution combines a Monte Carlo particle filter with a lightweight neural network that learns motion consistency to optimize particle weights, thereby enhancing the filter’s robustness against observation noise and model uncertainties. Simultaneously, an adaptive resampling mechanism dynamically adjusts particle counts based on the effective sample size (ESS), ensuring efficient allocation of computational resources. The system fuses UWB-based inter-UAV ranging information with visual observations of environmental landmarks, enabling each UAV to benefit from both local environmental features and global swarm-level geometric constraints. Simulation results reveal that, in a complex indoor environment involving six UAVs, PF-NN exhibits superior localization performance with an average root-mean-square error (RMSE) of 0.437 m. The proposed framework achieves sub-meter accuracy while effectively mitigating uncertainties induced by sensor noise and environmental sparsity.
Our principal contributions are:
- (1) The introduction of a PF-NN fusion framework that leverages neural networks to optimize particle weight allocation;
- (2) The development of an adaptive resampling strategy that balances accuracy with computational load;
- (3) The design of a cooperative observation model that robustly fuses UWB and visual data.
The remainder of this paper is organized as follows. Section 2 details the system model and the proposed PF-NN algorithm. Section 3 presents the simulation setup and analyzes the localization performance and adaptive mechanisms. Finally, Section 4 discusses the findings and concludes the paper with directions for future work.
2. System Model and Proposed Algorithm
2.1. State Space Model
We consider a multi-UAV system with $N$ UAVs in a 3D indoor space. The state vector for each UAV $i$ at time step $k$ is defined as:

$$\mathbf{x}_{i,k} = \left[\mathbf{p}_{i,k}^{T},\; \mathbf{v}_{i,k}^{T},\; \mathbf{q}_{i,k}^{T}\right]^{T}$$

where $\mathbf{p}_{i,k} \in \mathbb{R}^{3}$ represents the 3-dimensional position, $\mathbf{v}_{i,k} \in \mathbb{R}^{3}$ denotes the velocity vector, and $\mathbf{q}_{i,k}$ is the unit quaternion representing the vehicle's orientation at time step $k$.
The state evolves according to a nonlinear motion model driven by inertial measurement unit (IMU) inputs:

$$\mathbf{x}_{i,k} = f\left(\mathbf{x}_{i,k-1}, \mathbf{u}_{k}\right) + \mathbf{w}_{k}$$

where $\mathbf{u}_{k} = \left[\mathbf{a}_{k}^{T}, \boldsymbol{\omega}_{k}^{T}\right]^{T}$ consists of the measured linear acceleration and angular velocity, and $\mathbf{w}_{k}$ represents additive process noise.
The position and velocity are updated using a constant acceleration model:

$$\mathbf{p}_{i,k} = \mathbf{p}_{i,k-1} + \mathbf{v}_{i,k-1}\Delta t + \tfrac{1}{2}\mathbf{a}_{k}\Delta t^{2}$$

The velocity update equation integrates acceleration measurements:

$$\mathbf{v}_{i,k} = \mathbf{v}_{i,k-1} + \mathbf{a}_{k}\Delta t$$
Orientation is propagated using quaternion multiplication with incremental rotation derived from gyroscope measurements, providing a singularity-free attitude representation.
After each quaternion update, normalization is performed to maintain unit length:

$$\mathbf{q}_{i,k} \leftarrow \frac{\mathbf{q}_{i,k}}{\left\|\mathbf{q}_{i,k}\right\|}$$
This ensures the quaternion remains a valid rotation representation. Process noise integration for orientation updates requires careful handling due to the nonlinear nature of quaternion multiplication. The angular velocity measurement incorporates gyroscope noise as $\tilde{\boldsymbol{\omega}}_{k} = \boldsymbol{\omega}_{k} + \mathbf{n}_{\omega}$, where $\mathbf{n}_{\omega} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\omega}\right)$ and $\boldsymbol{\Sigma}_{\omega}$ is the gyroscope noise covariance. The incremental rotation quaternion is computed from the noisy angular velocity, and the quaternion update in Equation (2) propagates this noise through the multiplicative operation. The resulting orientation uncertainty is represented by the dispersion of particle orientations in the quaternion space. After each update, small-angle approximations allow us to model the quaternion noise as a Gaussian distribution in the tangent space (3D rotation vector), which is then mapped back to the quaternion space. This approach maintains the unit quaternion constraint while properly accounting for orientation uncertainty in the prediction step.
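As an illustration, the prediction step described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: per-axis Gaussian noise standard deviations, the Hamilton quaternion convention [w, x, y, z], and illustrative function and parameter names that are not taken from the original implementation.

```python
import numpy as np

def quat_mult(a, b):
    # Hamilton product of quaternions in [w, x, y, z] convention
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def predict(p, v, q, acc, omega, dt, acc_std=0.1, gyro_std=0.01, rng=None):
    """One particle prediction step: constant-acceleration translation
    plus quaternion attitude propagation with additive gyroscope noise."""
    rng = rng or np.random.default_rng()
    a = acc + rng.normal(0.0, acc_std, 3)      # noisy accelerometer reading
    w = omega + rng.normal(0.0, gyro_std, 3)   # noisy gyroscope reading
    p_new = p + v*dt + 0.5*a*dt**2             # position update
    v_new = v + a*dt                           # velocity update
    theta = np.linalg.norm(w)*dt               # incremental rotation angle
    if theta > 1e-12:
        axis = w / np.linalg.norm(w)
        dq = np.concatenate(([np.cos(theta/2)], np.sin(theta/2)*axis))
    else:
        dq = np.array([1.0, 0.0, 0.0, 0.0])    # no rotation this step
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)             # renormalize to unit length
    return p_new, v_new, q_new
```

Renormalizing after every multiplicative update keeps the quaternion on the unit sphere despite floating-point drift, matching the normalization step above.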
2.2. Monte Carlo Particle Filter Framework
The localization problem is cast within a Bayesian filtering framework, aiming to estimate the posterior probability distribution $p\left(\mathbf{x}_{k} \mid \mathbf{z}_{1:k}\right)$. We use a sequential Monte Carlo method, or particle filter, to approximate this posterior with a set of $N_{p}$ weighted particles $\left\{\mathbf{x}_{k}^{(j)}, w_{k}^{(j)}\right\}_{j=1}^{N_{p}}$:

$$p\left(\mathbf{x}_{k} \mid \mathbf{z}_{1:k}\right) \approx \sum_{j=1}^{N_{p}} w_{k}^{(j)}\, \delta\left(\mathbf{x}_{k} - \mathbf{x}_{k}^{(j)}\right)$$

where $N_{p}$ is the number of particles, $\mathbf{x}_{k}^{(j)}$ represents the $j$-th particle state, $\delta(\cdot)$ denotes the Dirac delta distribution, and $w_{k}^{(j)}$ are normalized weights satisfying $\sum_{j=1}^{N_{p}} w_{k}^{(j)} = 1$.
Particle weights are updated according to the observation likelihood. When using the state transition distribution as the proposal distribution, the weight update simplifies to:

$$w_{k}^{(j)} \propto w_{k-1}^{(j)}\, p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right)$$
To address numerical stability issues arising from likelihood underflow, we employ log-likelihood computations. The observation likelihood combines contributions from both landmark observations and inter-vehicle range measurements.
The log-likelihood computation operates as follows:
For each particle, we compute the logarithm of the observation likelihood $\log p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right)$ instead of the raw likelihood. This transformation converts the product of independent likelihoods into a sum:

$$\log p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right) = \sum_{m} \log p\left(\mathbf{z}_{k,m} \mid \mathbf{x}_{k}^{(j)}\right)$$

where $\mathbf{z}_{k,m}$ represents individual measurements (landmark or UWB).
For Gaussian measurement models, the log-likelihood has the closed form:

$$\log p\left(\mathbf{z}_{k,m} \mid \mathbf{x}_{k}^{(j)}\right) = -\tfrac{1}{2}\left(\mathbf{z}_{k,m} - h\left(\mathbf{x}_{k}^{(j)}\right)\right)^{T} \mathbf{R}^{-1}\left(\mathbf{z}_{k,m} - h\left(\mathbf{x}_{k}^{(j)}\right)\right) - \tfrac{1}{2}\log\left(\left(2\pi\right)^{d}\left|\mathbf{R}\right|\right)$$

where $h(\cdot)$ is the observation function and $\mathbf{R}$ is the measurement noise covariance. Working in log space prevents numerical underflow when multiplying many small probability values (each likelihood is typically $\ll 1$), as the sum of log-likelihoods remains numerically stable even with hundreds of particles and multiple observations per step.
After computing all log-weights, we apply the log-sum-exp trick for normalization:

$$w_{k}^{(j)} = \frac{\exp\left(\ell_{k}^{(j)} - \ell_{\max}\right)}{\sum_{j'=1}^{N_{p}} \exp\left(\ell_{k}^{(j')} - \ell_{\max}\right)}$$

where $\ell_{\max} = \max_{j} \ell_{k}^{(j)}$. This ensures numerical stability throughout the weight update process.
2.3. Neural Network-Optimized Weight Update
Traditional particle filters rely solely on observation likelihood for weight assignment, potentially overlooking valuable information from state estimation history. To enhance robustness against observation noise and model inaccuracies, we introduce a lightweight neural network to optimize particle weight allocation by incorporating motion consistency features.
The neural network architecture consists of three layers, as shown in Figure 1.
The input $\mathbf{f}^{(j)}$ is a feature vector encoding the deviation of a particle's state from the previous time step's estimate:

$$\mathbf{f}^{(j)} = \left[\left(\mathbf{p}_{k}^{(j)} - \hat{\mathbf{p}}_{k-1}\right)^{T},\; \left(\mathbf{v}_{k}^{(j)} - \hat{\mathbf{v}}_{k-1}\right)^{T}\right]^{T}$$

The hidden layer applies the hyperbolic tangent activation function for nonlinear feature extraction:

$$\mathbf{h}^{(j)} = \tanh\left(\mathbf{W}_{1}\mathbf{f}^{(j)} + \mathbf{b}_{1}\right)$$

where $\mathbf{W}_{1} \in \mathbb{R}^{n_{h} \times 6}$ and $\mathbf{b}_{1} \in \mathbb{R}^{n_{h}}$ are the weight matrix and bias vector, respectively, and $n_{h}$ is the number of hidden neurons. The sigmoid-activated output layer produces a weight correction coefficient:

$$y^{(j)} = \sigma\left(\mathbf{W}_{2}\mathbf{h}^{(j)} + b_{2}\right)$$
This output is then used to modulate the original particle weight. The sigmoid output is mapped to a correction factor $\alpha^{(j)} = 0.4 + 0.6\, y^{(j)} \in [0.4, 1.0]$, and the neural-weighted likelihood is calculated as:

$$\tilde{p}\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right) = \alpha^{(j)}\, p\left(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)}\right)$$
To determine the optimal neural weight correction range, we conducted an ablation study comparing different correction ranges under identical simulation conditions.
Figure 2 presents the localization performance achieved with various neural weight correction ranges.
It can be observed from
Figure 2 that the proposed range [0.4, 1.0] delivers the optimal performance across all evaluated metrics. Specifically, this range achieves the lowest average RMSE (0.437 m) and maximum error (6.91 m), while maintaining a moderate particle count (720 particles). In contrast, overly aggressive penalty ranges (e.g., [0.2, 1.0] and [0.3, 1.0]) result in increased RMSE and larger fluctuations in particle number, whereas overly conservative ranges (e.g., [0.5, 1.0]) fail to adequately suppress inconsistent particles. Variations in the upper bound ([0.4, 1.2] and [0.4, 0.9]) show limited impact on performance, indicating that the primary function of the neural network is penalty rather than reward.
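The weight-correction forward pass can be sketched as follows. This is an illustrative sketch of a 6-12-1 network with tanh hidden units and a sigmoid output; the affine map of the sigmoid output into the [0.4, 1.0] correction range is an assumption consistent with the range reported above, and the parameter values shown are random placeholders, not trained weights.

```python
import numpy as np

def weight_correction(features, W1, b1, W2, b2, lo=0.4, hi=1.0):
    """Forward pass of the lightweight 6-12-1 network: tanh hidden layer,
    sigmoid output, then an affine map of the output into [lo, hi]."""
    h = np.tanh(W1 @ features + b1)            # hidden activations, shape (12,)
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))   # sigmoid output in (0, 1)
    return lo + (hi - lo) * y                  # correction factor in [lo, hi]

# Example with hypothetical, randomly initialized parameters:
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (12, 6)), np.zeros(12)
W2, b2 = rng.normal(0, 0.1, 12), 0.0
alpha = weight_correction(np.zeros(6), W1, b1, W2, b2)  # alpha in [0.4, 1.0]
```

With a zero feature vector (a particle perfectly matching the previous estimate) and zero biases, the sigmoid output is 0.5 and the correction factor sits at the midpoint 0.7 of the range.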
The network parameters are updated through online gradient descent using particle states as training data. The training target is derived from particle position accuracy via an exponential decay:

$$t^{(j)} = \exp\left(-\left\|\mathbf{p}_{k}^{(j)} - \mathbf{p}_{k}^{\mathrm{true}}\right\|\right)$$

where $\mathbf{p}_{k}^{(j)}$ is the particle position and $\mathbf{p}_{k}^{\mathrm{true}}$ is the ground truth position. This exponential decay function maps position errors to target values in $(0, 1]$, with perfect particles (zero error) receiving a target of 1. The loss function minimizes the squared error between the network output $y^{(j)}$ and the target:

$$\mathcal{L} = \frac{1}{N_{t}}\sum_{j=1}^{N_{t}} \left(y^{(j)} - t^{(j)}\right)^{2}$$

where $N_{t}$ is the number of training particles.
Training occurs every 10 time steps, enabling the network to adapt to changing environmental conditions and system dynamics.
It is important to acknowledge a fundamental limitation of the current online training approach: the loss function in Equation (16) relies on ground-truth positions, which are unavailable during real-world deployment. This dependency currently restricts the framework to simulation environments or scenarios where external positioning systems (e.g., motion capture systems) can provide training supervision during an initial calibration phase. To address this limitation, future work will investigate self-supervised and unsupervised learning paradigms.
Potential approaches include: (1) using multi-view geometric constraints from overlapping UAV observations to generate pseudo-labels; (2) incorporating factor graph optimization to produce consensus estimates that serve as training targets; and (3) exploring contrastive learning techniques that learn motion consistency without explicit position labels. These directions aim to achieve fully autonomous online learning without ground-truth dependency.
To bridge the gap between simulation and real-world deployment, we propose a phased training strategy that can be implemented in practical scenarios. Phase 1 (Calibration Phase): During an initial deployment period in a controlled environment (e.g., a warehouse or indoor facility with pre-installed UWB anchors at known positions), the UAVs can utilize the known anchor positions and trilateration to obtain approximate position estimates. These estimates, while less accurate than motion capture systems, can serve as pseudo-ground-truth labels for initial network training. Phase 2 (Transfer Learning): The pre-trained network weights from Phase 1 are then used to initialize the network for operational deployment. During this phase, the network switches to a self-supervised mode where training targets are generated through multi-view geometric consistency checks among cooperating UAVs. Specifically, when two or more UAVs observe common landmarks, their relative poses can be constrained through epipolar geometry, providing consistency-based supervision signals. Phase 3 (Online Adaptation): In the fully operational mode, the network employs a slow-learning-rate adaptation strategy where weight updates are driven by temporal consistency losses—penalizing particles that deviate significantly from the filter’s predicted state based on motion model continuity.
2.4. Adaptive Resampling Strategy
Particle degeneracy, wherein a small subset of particles accumulates disproportionate weight mass, represents a fundamental challenge in particle filtering. We employ the effective sample size (ESS) as a diagnostic metric for degeneracy monitoring:

$$\mathrm{ESS}_{k} = \frac{1}{\sum_{j=1}^{N_{k}} \left(w_{k}^{(j)}\right)^{2}}$$

The resampling threshold is set to $N_{\mathrm{th}} = 0.5\, N_{k}$, meaning resampling is triggered when the ESS falls below 50% of the current particle count $N_{k}$. The ESS ranges from 1 (complete degeneracy) to $N_{k}$ (uniform weight distribution). When the ESS falls below the threshold, systematic resampling is triggered to restore particle diversity.
To balance accuracy and computational cost, we implement adaptive particle number adjustment:

$$N_{k+1} = \begin{cases} \min\left(N_{k} + \Delta N,\; N_{\max}\right), & \mathrm{ESS}_{k} < 0.3\, N_{k} \\ \max\left(N_{k} - \Delta N,\; N_{\min}\right), & \mathrm{ESS}_{k} > 0.8\, N_{k} \\ N_{k}, & \text{otherwise} \end{cases}$$

This mechanism increases particle count during high-uncertainty periods and reduces it when the filter has converged, achieving significant computational savings without sacrificing accuracy.
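A sketch of the ESS diagnostic and the adaptive count rule, using the 30%/80% ratio thresholds and the [100, 800] particle bounds reported in Section 3; the fixed adjustment step of 100 particles is an assumption based on the adaptive-increase behavior described in the analysis:

```python
import numpy as np

def effective_sample_size(w):
    """ESS = 1 / sum(w_j^2) for normalized weights; ranges from 1
    (complete degeneracy) to len(w) (uniform weights)."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w**2)

def adapt_particle_count(n, ess, n_min=100, n_max=800, step=100):
    """Grow the particle set under high uncertainty (ESS < 0.3*n) and
    shrink it once the filter has converged (ESS > 0.8*n)."""
    ratio = ess / n
    if ratio < 0.3 and n < n_max:
        return min(n + step, n_max)    # high uncertainty: add particles
    if ratio > 0.8 and n > n_min:
        return max(n - step, n_min)    # converged: trim particles
    return n                           # no change in the middle band
```

With 300 particles, an ESS of 60 (ratio 0.2) triggers growth to 400, while an ESS of 270 (ratio 0.9) triggers a reduction to 200.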
2.5. Multi-UAV Cooperative Observation Model
The observation model fuses two sensor modalities: visual landmark observations and UWB inter-vehicle ranging.
For visual observations, the UAV camera detects environmental landmarks within a maximum range of 10 m. Landmarks are detected using ORB features, matched across frames via BFMatcher with Hamming distance [31]. Given a landmark's world position $\mathbf{m}_{l}$ and the UAV's estimated position $\mathbf{p}$ and orientation $\mathbf{q}$, its body-frame position is computed as:

$$\mathbf{m}_{l}^{b} = \mathbf{R}\left(\mathbf{q}\right)^{T}\left(\mathbf{m}_{l} - \mathbf{p}\right)$$

yielding distance $d_{l} = \left\|\mathbf{m}_{l}^{b}\right\|$ and bearing $\mathbf{m}_{l}^{b} / \left\|\mathbf{m}_{l}^{b}\right\|$. This requires prior knowledge of landmark positions, which is feasible in structured indoor environments with pre-mapped or artificial markers.
Alternative approaches that relax this requirement include (1) direct visual odometry (DSO, LSD-SLAM); (2) deep learning features (SuperPoint, R2D2); and (3) visual-inertial odometry (VIO).
The observation function maps landmark positions to measured distances and bearing vectors:

$$\mathbf{z}_{k,l} = h_{l}\left(\mathbf{x}_{k}\right) + \mathbf{v}_{k}, \qquad \mathbf{v}_{k} \sim \mathcal{N}\left(\mathbf{0}, \mathbf{R}_{l}\right)$$

where $\mathbf{R}_{l}$ is the measurement noise covariance, and $h_{l}(\cdot)$ stacks the predicted distance and bearing of landmark $l$.
UWB provides pairwise distance measurements between UAVs:

$$\tilde{d}_{ij,k} = \left\|\mathbf{p}_{i,k} - \mathbf{p}_{j,k}\right\| + n_{k}^{\mathrm{uwb}}$$

where $n_{k}^{\mathrm{uwb}}$ captures ranging noise. The UWB likelihood penalizes deviations between measured and expected inter-vehicle distances:

$$p\left(\tilde{d}_{ij,k} \mid \mathbf{x}_{i,k}\right) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathrm{uwb}}} \exp\left(-\frac{\left(\tilde{d}_{ij,k} - \hat{d}_{ij,k}\right)^{2}}{2\sigma_{\mathrm{uwb}}^{2}}\right)$$

where $\hat{d}_{ij,k} = \left\|\mathbf{p}_{i,k} - \hat{\mathbf{p}}_{j,k}\right\|$ is the expected distance, and $\sigma_{\mathrm{uwb}}$ is the UWB ranging noise standard deviation.
The joint observation likelihood for UAV $i$ is the product of likelihoods from all observed landmarks and neighboring UAVs:

$$p\left(\mathbf{z}_{k} \mid \mathbf{x}_{i,k}\right) = \prod_{l \in \mathcal{L}_{i}} p\left(\mathbf{z}_{k,l} \mid \mathbf{x}_{i,k}\right) \prod_{j \in \mathcal{N}_{i}} p\left(\tilde{d}_{ij,k} \mid \mathbf{x}_{i,k}\right)$$

where $\mathcal{L}_{i}$ denotes the set of observed landmarks and $\mathcal{N}_{i}$ represents the set of neighboring UAVs within communication range. This fusion allows each UAV to correct its position estimate using both local environmental features and global information from the swarm's geometry.
In practice, landmark association is subject to false matches and visual outliers, particularly in environments with repetitive textures or symmetric structures. To enhance robustness, the framework can incorporate outlier rejection mechanisms such as: (1) Random Sample Consensus (RANSAC) to identify and reject inconsistent landmark correspondences; (2) Mahalanobis distance gating to filter observations with unlikely innovation statistics; (3) temporal consistency checks that verify landmark observations across consecutive frames; and (4) multi-UAV cross-validation where consistent observations from multiple UAVs increase confidence in landmark associations. These mechanisms prevent incorrect data associations from causing severe accumulated drift in the localization estimates.
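As one concrete instance of mechanism (2), Mahalanobis distance gating can be sketched as follows. The chi-square gate value 7.815 corresponds to a 95% confidence level with 3 degrees of freedom; this is an illustrative helper under those assumptions, not part of the original implementation.

```python
import numpy as np

def mahalanobis_gate(innovation, S, gate=7.815):
    """Accept a measurement if its squared Mahalanobis distance
    d2 = nu^T S^-1 nu falls below a chi-square gate
    (7.815 = 95% quantile of chi-square with 3 DOF)."""
    nu = np.atleast_1d(innovation)
    d2 = float(nu @ np.linalg.solve(S, nu))  # solve avoids explicit inverse
    return d2 < gate
```

With a visual noise standard deviation of 0.15 m, a 0.1 m innovation passes the gate while a 1 m innovation (a likely false match) is rejected.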
In addition, the proposed observation model framework is designed to be extensible, allowing integration of additional sensing modalities for improved localization in limited 3D scenes.
Potential extensions include:
- (1)
Scattered optical signals: In environments with external light sources (e.g., windows, lamps), photodiodes or light sensors can measure incident light angles and intensities. These measurements can be fused with UWB and visual data by modeling the expected light field given the UAV’s position and known source locations.
- (2)
WiFi/Bluetooth RSSI: Received signal strength indicators from existing infrastructure can provide coarse position constraints, particularly useful for initialization or when other sensors are degraded.
- (3)
Ultrasonic ranging: Short-range ultrasonic sensors can provide proximity measurements to walls and obstacles, complementing the longer-range UWB and visual observations.
- (4)
Magnetic field sensing: Indoor magnetic field anomalies can serve as location fingerprints when mapped a priori.
These additional modalities can be incorporated into the joint likelihood function (Equation (24)) as additional product terms, with appropriate measurement models for each sensor type. The particle filter framework naturally handles the heterogeneous noise characteristics of different sensors through their respective likelihood functions.
2.6. Algorithm Implementation
This section presents the complete algorithmic implementation of the proposed PF-NN fusion framework. Algorithm 1 outlines the main PF-NN fusion procedure, which integrates Monte Carlo prediction, neural network inference, and adaptive weight updates for each UAV.
Algorithm 1: PF-NN Fusion Algorithm

Input: Initial particles $\{\mathbf{x}_{0}^{(j)}, w_{0}^{(j)}\}$, neural network parameters $\theta$
Output: State estimates $\hat{\mathbf{x}}_{k}$

1:  for k = 1, 2, …, K do
2:    for each UAV i = 1, …, N do
3:      // Prediction step: Monte Carlo propagation through the motion model
4:      Propagate each particle: $\mathbf{x}_{k}^{(j)} \sim p(\mathbf{x}_{k} \mid \mathbf{x}_{k-1}^{(j)}, \mathbf{u}_{k})$
5:      // Observation acquisition
6:      Collect landmark observations and UWB ranges $\mathbf{z}_{k}$
7:      // Neural network forward propagation
8:      Compute correction factors $\alpha^{(j)}$ from motion-consistency features
9:      // Weight update with neural correction (in log space)
10:     $w_{k}^{(j)} \propto w_{k-1}^{(j)}\, \alpha^{(j)}\, p(\mathbf{z}_{k} \mid \mathbf{x}_{k}^{(j)})$; normalize via log-sum-exp
11:     // Compute effective sample size
12:     $\mathrm{ESS}_{k} = 1 / \sum_{j} (w_{k}^{(j)})^{2}$
13:     if $\mathrm{ESS}_{k} < 0.5\, N_{k}$ then
14:       Systematic resampling (Algorithm 2)
15:       Adaptive particle adjustment
16:     end if
17:     // State estimation
18:     $\hat{\mathbf{x}}_{k} = \sum_{j} w_{k}^{(j)}\, \mathbf{x}_{k}^{(j)}$
19:     // Neural network online training
20:     if k mod 10 = 0 then
21:       Update $\theta$ by gradient descent on the training loss
22:     end if
23:   end for
24: end for
Algorithm 2 details the systematic resampling mechanism with adaptive particle number adjustment, which maintains estimation accuracy while optimizing computational resources.
Algorithm 2: Adaptive Resampling Strategy

Input: Particle set $\{\mathbf{x}_{k}^{(j)}, w_{k}^{(j)}\}_{j=1}^{N_{k}}$, ESS, $N_{\min}$, $N_{\max}$
Output: Resampled particles

1:  // Systematic resampling
2:  Compute cumulative distribution: $C_{i} = \sum_{j=1}^{i} w_{k}^{(j)}$
3:  Generate stratified samples: $u_{j} = (j - 1 + u)/N_{k}$, $u \sim \mathcal{U}[0, 1)$
4:  for j = 1 to $N_{k}$ do
5:    Find smallest i such that $C_{i} \ge u_{j}$
6:    $\mathbf{x}_{k}^{(j)\prime} \leftarrow \mathbf{x}_{k}^{(i)}$
7:  end for
8:  // Add perturbation to prevent sample impoverishment
9:  for j = 1 to $N_{k}$ do
10:   $\mathbf{x}_{k}^{(j)\prime} \leftarrow \mathbf{x}_{k}^{(j)\prime} + \boldsymbol{\epsilon}^{(j)}$, $\boldsymbol{\epsilon}^{(j)} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\epsilon})$
11: end for
12: // Reset weights
13: $w_{k}^{(j)} = 1/N_{k}$ for all j
14: // Adaptive particle number adjustment
15: if $\mathrm{ESS} < 0.3\, N_{k}$ and $N_{k} < N_{\max}$ then
16:   Increase the particle count; add new particles by perturbing existing ones
17: else if $\mathrm{ESS} > 0.8\, N_{k}$ and $N_{k} > N_{\min}$ then
18:   Decrease the particle count; keep the top particles by weight
19: else
20:   Keep the particle count unchanged
21: end if
22: return resampled particle set
3. Simulation Results and Analysis
3.1. Simulation Parameters and Experimental Setup
The simulation environment was designed to rigorously evaluate the proposed PF-NN fusion algorithm under realistic indoor conditions. Six UAVs operate within an indoor space populated with 20 randomly distributed landmarks serving as visual reference points. The simulation duration spans 100 s with a discrete time step of 0.1 s, resulting in 1000 total iterations. Each UAV follows a circular trajectory at constant angular and linear velocity, with phase offsets distributed evenly among the six UAVs at intervals of 60° to ensure spatial separation and prevent collisions. The motion model incorporates realistic accelerometer and gyroscope noise.
Key parameters for the simulation are summarized in Table 2.
The adaptive particle mechanism initializes with 300 particles per UAV, with dynamic adjustment bounds between 100 and 800 particles. The resampling threshold is set to 0.5, meaning resampling triggers when the effective sample size falls below 50% of the current particle count. The UWB ranging noise is modeled with a standard deviation of 0.20 m, while visual observations have a standard deviation of 0.15 m with a maximum detection range of 10 m.
The lightweight neural network consists of 6 input neurons representing the state deviation vector, 12 hidden neurons with hyperbolic tangent activation, and 1 output neuron with sigmoid activation. Online training occurs every 10 time steps using 50 randomly selected particles, with a learning rate of 0.01.
We acknowledge that the current evaluation is limited to simulated environments with idealized sensor models. Real-world indoor flight presents additional challenges not fully captured in simulation, including: (1) UWB multipath effects caused by signal reflections from walls and obstacles; (2) visual degradation due to motion blur, illumination variations, and texture-sparse regions; (3) dynamic communication latency in wireless networks; and (4) unmodeled aerodynamic disturbances. To validate the framework’s practical utility, future work will evaluate performance on: (1) standard open-source MAV datasets such as the EuRoC dataset or UZH-FPV drone racing dataset; (2) hardware-in-the-loop simulations incorporating real sensor characteristics; and (3) real-world experiments with physical UAV platforms equipped with UWB and visual sensors.
3.2. Localization Performance
3.2.1. Three-Dimensional Trajectory Comparison
The algorithm demonstrated high-fidelity tracking of all six UAVs. Figure 3 shows the estimated trajectories (dashed lines) closely following the ground truth paths (solid lines), visually confirming the accuracy of the localization system.
The estimated trajectories closely follow the ground truth trajectories for all six UAVs throughout the one-hundred-second simulation, demonstrating the algorithm’s capability to maintain accurate localization. The overlapping nature of true and estimated paths indicates sub-meter tracking accuracy. The six UAVs follow circular trajectories with phase offsets of sixty degrees, creating a hexagonal formation pattern that ensures adequate separation while maintaining UWB communication range. The black squares represent the twenty environmental landmarks randomly distributed throughout the space. The trajectories demonstrate that UAVs traverse regions with varying landmark densities, testing the algorithm’s robustness under different observability conditions. UAVs flying through landmark-sparse regions must rely more heavily on UWB inter-vehicle measurements for localization. The z-axis variation from approximately 1.2 m to 1.8 m reflects the sinusoidal vertical motion component added to each UAV’s trajectory, testing the algorithm’s ability to track three-dimensional position.
3.2.2. Temporal Evolution of Localization Error
Figure 4 illustrates the localization error evolution over time for all six UAVs. All UAVs exhibit rapid error reduction during the first ten seconds, with errors dropping from initial values exceeding two meters to below one meter. This convergence behavior is characteristic of particle filters, where the initial uniform particle distribution gradually concentrates around high-likelihood regions as observations accumulate. The weight update mechanism assigns higher weights to particles with greater observation likelihood, driving the filter toward accurate state estimates. Several UAVs exhibit transient error spikes exceeding two meters at various time instants. UAV-3 and UAV-4 show spikes around forty to fifty seconds, reaching approximately nine to ten meters. UAV-6 exhibits a spike near sixty seconds, reaching about seven meters. These spikes correlate with periods when UAVs traverse regions with limited landmark visibility. When a UAV enters a landmark-sparse region, the observation likelihood becomes less informative, causing particle diversity to decrease as indicated by a lower effective sample size. The adaptive particle increase mechanism addresses this by adding up to one hundred particles when the effective sample size falls below thirty percent of the current particle count.
After the initial convergence, most UAVs maintain errors below one meter for the majority of the simulation. UAV-2 and UAV-5 demonstrate particularly stable performance, with errors consistently remaining below 0.5 m after twenty seconds. This stability reflects the effectiveness of the neural network weight optimization in maintaining accurate particle weight distributions. The error curves show some correlation between UAVs, particularly during time intervals when multiple UAVs simultaneously experience increased uncertainty. This correlation arises from the cooperative observation model, where each UAV’s localization depends on its neighbors’ estimated positions through UWB ranging measurements.
3.2.3. Quantitative Performance Metrics
Quantitative performance metrics are provided in Table 3 and visualized in Figure 5.
Table 3 presents the comprehensive performance metrics for each UAV calculated over the entire one-hundred-second simulation. The RMSE values range from 0.374 m for UAV-2 to 0.535 m for UAV-4, with an average of 0.437 m across all six UAVs. This sub-meter accuracy validates the algorithm's suitability for indoor navigation applications, where typical accuracy requirements range from 0.3 to 1.0 m. The mean error values range from 0.143 m to 0.343 m and are consistently lower than the RMSE values, indicating that the error distribution is right-skewed with occasional large deviations. This skewness is evident in the box plots shown in Figure 6, where the median errors are significantly lower than the maximum errors.
UAV-4 experiences the largest maximum error of 10.385 m, followed by UAV-3 with 9.783 m, as shown in Figure 5. These extreme values occur during transient periods of high uncertainty and represent isolated incidents lasting only a few time steps. The adaptive mechanism's response time limits the duration of such error spikes. The performance variation among UAVs, with an RMSE standard deviation of 0.065 m, can be attributed to trajectory-dependent landmark visibility, relative position to other UAVs affecting UWB measurement geometry, and random initialization of neural network weights.
3.2.4. Error Distribution Characteristics
The error distribution statistics, presented as box plots in Figure 6, further highlight the system's consistency. The median errors for all UAVs remain below 0.2 m, indicating that fifty percent of all time steps achieve errors below this threshold. This demonstrates consistent performance across the majority of the simulation. UAV-2 and UAV-5 have the smallest interquartile ranges, indicating the most consistent performance, while UAV-1 and UAV-4 show larger interquartile ranges, reflecting greater variability. The outliers represent transient error spikes, with UAV-3 and UAV-4 exhibiting the most outliers, consistent with their higher maximum errors. These outliers cluster in the five to ten meter range, corresponding to the spikes visible in Figure 4. The asymmetric box plots confirm the right-skewed error distribution, where most errors are small with occasional large deviations.
3.3. Adaptive Mechanism Analysis
3.3.1. Particle Number Adaptation
Figure 7 illustrates the adaptive particle number evolution over the simulation duration. All UAVs maintain particle counts near the maximum value of eight hundred throughout most of the simulation, indicating sustained uncertainty that triggers the adaptive increase mechanism. At simulation start, particles are uniformly distributed with large initial variance, requiring many particles to adequately represent the posterior. The circular trajectories with sinusoidal vertical components create continuously changing state estimates, preventing the filter from fully converging to a small particle set. The UWB and visual observation noises limit the achievable localization precision, maintaining persistent uncertainty.
The adaptive increase condition triggers when the effective sample size falls below thirty percent of the current particle count and the current count is below the maximum. This condition triggers frequently due to the challenging observation environment, causing most UAVs to reach and maintain the maximum of eight hundred particles. The particle decrease condition, which triggers when the effective sample ratio exceeds eighty percent, rarely activates in this simulation as the sustained motion and observation noise prevent such high ratios for extended periods. Maintaining eight hundred particles per UAV results in approximately four thousand eight hundred total particles for the six-UAV system. With each particle representing a 10-dimensional state vector, the computational cost scales with the product of the number of UAVs, particles per UAV, and state dimension.
3.3.2. Effective Sample Size Dynamics
Figure 8 presents the effective particle ratio over time, providing insights into particle filter health. The effective particle ratios fluctuate predominantly in the 0.3 to 0.6 range, with occasional excursions below 0.2 and above 0.8. This distribution indicates moderate particle degeneracy that is actively managed through the resampling mechanism. Most UAVs experience resampling events every ten to thirty time steps, with UAV-1 and UAV-4 showing more frequent resampling due to their higher error variability.
The periodic drops in effective ratio correspond to several factors. When UAVs fly through regions with few visible landmarks, the observation likelihood becomes less informative, causing weight concentration on fewer particles. As UAVs move relative to each other, the UWB measurement geometry changes, affecting the information content of inter-vehicle range measurements. The online neural network training every ten steps temporarily perturbs the weight distribution as the network adapts to new motion patterns. After each resampling event, the effective ratio resets to approximately one, followed by gradual degradation as observations accumulate. The recovery rate depends on observation informativeness and motion model accuracy. The neural network training process significantly influences the observed results.
The network trains every ten time steps using fifty randomly selected particles, representing approximately six to seventeen percent of the total particle count. The training target maps particle position errors to values between zero and one, where perfect particles receive a target of one. The mean squared error between network output and target typically decreases from initial values around 0.15 to steady-state values below 0.05 after approximately fifty training iterations. This convergence is evident in the improved error stability observed after fifty seconds in Figure 3. The neural weight correction maps the sigmoid output to correction factors between 0.4 and 1.0. Particles with correction factors above 0.7 receive boosted weights, while those below 0.7 are penalized, effectively implementing learned importance sampling.
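The training target and weight-correction steps can be illustrated as below. The [0.4, 1.0] mapping range and the 0.7 pivot come from the text; the exponential error-to-target kernel, its scale, and the division-by-pivot weight update are assumptions made for this sketch.

```python
import numpy as np

def training_target(pos_error, scale=0.5):
    """Assumed kernel mapping position error (m) to a [0, 1] training
    target; a perfect particle (zero error) receives a target of 1."""
    return np.exp(-(np.asarray(pos_error) / scale) ** 2)

def apply_weight_correction(weights, nn_sigmoid_out, pivot=0.7):
    """Map sigmoid outputs to correction factors in [0.4, 1.0], then
    boost weights whose factor exceeds the pivot and penalize those
    below it (the division-by-pivot form is an illustrative choice)."""
    corr = 0.4 + 0.6 * np.asarray(nn_sigmoid_out)   # factors in [0.4, 1.0]
    w = np.asarray(weights) * (corr / pivot)        # >pivot boosts, <pivot penalizes
    return w / w.sum()                              # renormalize
```

After renormalization, particles the network judges consistent gain relative weight, which is the learned importance sampling effect described above.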
3.4. Comparison with Baseline Methods
3.4.1. Effectiveness Comparison with Baseline Methods
To evaluate the effectiveness of the proposed PF-NN fusion framework, we compare it against several baseline methods under identical simulation conditions, as well as against established state-of-the-art cooperative localization algorithms to assess whether its gains hold beyond internal ablations. The compared methods are:
(1) Internal baselines (degraded versions of our framework):
Standard PF (N = 500): A conventional particle filter with fixed 500 particles, representing a typical implementation without adaptive mechanisms.
Fixed PF (N = 800): A particle filter with fixed 800 particles (the maximum used by our adaptive method), representing increased computational cost.
Adaptive PF (no NN): The proposed adaptive resampling mechanism without neural network weight optimization, isolating the contribution of NN.
Proposed PF-NN: The complete proposed framework with both adaptive resampling and neural network weight optimization.
(2) External state-of-the-art algorithms:
EKF-based Cooperative Localization: A distributed Extended Kalman Filter that fuses UWB and visual observations using linearized motion and observation models [32]. This represents a widely used alternative to particle filters in multi-robot systems.
Distributed Factor Graph Optimization (DFGO): A graph-based optimization approach that performs maximum a posteriori estimation over sliding windows of poses and landmarks [33]. This method has shown superior accuracy in cooperative SLAM applications.
Deep Learning-based Filter (D-LSPF): The Deep Latent Space Particle Filter [21], which performs filtering in a learned low-dimensional latent space using neural network encoders.
Table 4 presents the quantitative comparison results averaged over 20 Monte Carlo runs (mean ± standard deviation).
The results demonstrate that the proposed PF-NN framework achieves the best performance across all metrics:
Compared to Standard PF (N = 500), PF-NN reduces RMSE by 28.6% (0.612 m → 0.437 m) and maximum error by 44.5% (12.45 m → 6.91 m), demonstrating the effectiveness of both adaptive particle adjustment and neural network optimization.
Compared to Fixed PF (N = 800), PF-NN achieves better accuracy with fewer average particles (720 vs. 800), validating the efficiency of the adaptive mechanism.
The comparison between Adaptive PF (no NN) and PF-NN shows that neural network optimization contributes an additional 9.1% RMSE reduction (0.481 m → 0.437 m), confirming its value in improving weight allocation.
The comparison reveals that PF-NN achieves competitive performance with DFGO (0.437 m vs. 0.398 m RMSE) while maintaining lower computational cost. PF-NN outperforms EKF-based methods by 36.4% in RMSE, demonstrating the advantage of non-Gaussian uncertainty representation. Compared to D-LSPF, PF-NN achieves 4.2% lower RMSE without requiring complex encoder networks.
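The reported improvements follow directly from the means quoted above; the short check below recomputes them from the values given in the text (Table 4).

```python
def pct_reduction(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Figures quoted in the text (Table 4 means)
rmse_vs_standard = pct_reduction(0.612, 0.437)   # PF-NN vs Standard PF (N = 500)
max_err_vs_standard = pct_reduction(12.45, 6.91) # maximum error reduction
rmse_vs_adaptive = pct_reduction(0.481, 0.437)   # contribution of the NN stage
```

Rounding to one decimal place reproduces the 28.6%, 44.5%, and 9.1% figures reported in the comparison.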
3.4.2. Computational Performance Analysis
To evaluate the computational efficiency of the proposed algorithm, we measure runtime performance on a standard desktop computer with an Apple M1 CPU and 8 GB RAM. The implementation is in MATLAB R2025b without GPU acceleration.
Table 5 summarizes the computational performance metrics.
The computational performance comparison across all evaluated methods reveals distinct characteristics in execution time and memory utilization, as summarized in Table 5.
Among the internal baseline methods, the proposed PF-NN framework achieves an average computation time of 63.4 ± 6.7 ms per step, a 12.9% reduction compared to Fixed PF (N = 800), together with a 13.2% reduction in memory consumption (178 MB versus 205 MB). This efficiency gain stems from the adaptive particle mechanism, which dynamically reduces the particle count during stable estimation periods. Relative to Standard PF (N = 500), PF-NN incurs a 40.3% computational overhead attributable to neural network forward propagation and online training.
The comparison with external state-of-the-art algorithms provides additional context for assessing the computational characteristics of PF-NN. The EKF Cooperative method achieves the lowest computational cost (18.5 ± 2.3 ms/step) and memory footprint (65 MB) due to its reliance on Gaussian assumptions and efficient matrix operations. However, the linearization errors inherent in EKF may limit its applicability in highly nonlinear scenarios.
The DFGO requires substantial computational resources (125.6 ± 15.8 ms/step, 385 MB memory). The batch optimization nature of factor graphs, which performs maximum a posteriori estimation over sliding windows, exceeds the real-time constraint for 10 Hz operation (100 ms/step), limiting its applicability for real-time UAV control.
The D-LSPF incurs significant computational overhead (89.3 ± 8.7 ms/step, 268 MB memory) due to its neural network encoder architecture. The 40.9% higher computation time and 50.6% higher memory consumption of D-LSPF relative to PF-NN highlight the efficiency advantages of the proposed lightweight neural network architecture.
Given the 0.1 s (100 ms) control loop requirement for typical UAV applications, the proposed PF-NN framework satisfies real-time constraints with approximately 36.6 ms of margin available for auxiliary tasks such as flight control, path planning, and inter-vehicle communication. Among the evaluated methods, EKF Cooperative and PF-NN meet the real-time requirement with comfortable margins, D-LSPF operates close to the budget, and DFGO exceeds it; among these, PF-NN additionally offers superior representational capacity for non-Gaussian uncertainties. The comprehensive comparison demonstrates that PF-NN achieves a strong balance between computational efficiency and representational capability among particle filter-based approaches.
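A quick check of each method's margin against the 100 ms control budget, using the per-step mean times quoted above (means only; variance tails are ignored):

```python
BUDGET_MS = 100.0  # 10 Hz control loop

# Mean per-step computation times (ms) quoted in the text
step_times = {
    "EKF Cooperative": 18.5,
    "PF-NN (proposed)": 63.4,
    "D-LSPF": 89.3,
    "DFGO": 125.6,
}

margins = {name: BUDGET_MS - t for name, t in step_times.items()}
real_time = {name: t <= BUDGET_MS for name, t in step_times.items()}
```

The PF-NN margin evaluates to the 36.6 ms figure stated above, and DFGO is the only listed method whose mean step time exceeds the budget.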
While the current implementation was tested on an Apple M1 desktop processor, multi-UAV swarms typically operate on resource-constrained embedded platforms. To contextualize the algorithm's scalability, we analyze its computational and memory requirements on typical embedded hardware. The primary computational costs are (1) particle propagation: O(N_p · d), where N_p is the particle count and d is the state dimension; (2) neural network inference: O(N_p · h), where h is the hidden layer size (12 neurons); and (3) weight update: O(N_p · m), where m is the measurement dimension. For the tested configuration (N_p = 720, d = 10, h = 12), this translates to approximately 45 MFLOPs per UAV per step.
On embedded platforms such as the NVIDIA Jetson Nano (472 GFLOPS) or Raspberry Pi 4 (13.5 GFLOPS), the algorithm is expected to achieve step times of 5–10 ms and 150–200 ms, respectively. Memory requirements scale linearly with particle count: approximately 250 bytes per particle (10D state + weight + metadata), yielding 180 KB per UAV at N_p = 720. This is well within the memory capacity of typical embedded platforms (2–8 GB RAM). These projections suggest the algorithm is deployable on modern embedded systems with appropriate particle count tuning.
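The linear memory scaling can be verified with a one-line estimate using the 250 bytes-per-particle figure from the text (1 KB = 1000 bytes assumed here):

```python
BYTES_PER_PARTICLE = 250  # 10D state + weight + metadata (figure from the text)

def memory_kb(n_particles, bytes_per_particle=BYTES_PER_PARTICLE):
    """Per-UAV particle storage in KB; scales linearly with count."""
    return n_particles * bytes_per_particle / 1000.0

per_uav_kb = memory_kb(720)      # 180.0 KB at N_p = 720, as stated
swarm_kb = 6 * per_uav_kb        # roughly 1.08 MB for the six-UAV system
```

Even the whole-swarm particle storage is three orders of magnitude below the 2–8 GB capacity of the embedded platforms mentioned above.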
3.4.3. Statistical Validation
To ensure the reliability of our results, we conducted 30 independent Monte Carlo simulations with different random seeds for the initial particle distribution, sensor noise realizations, and landmark placements, as shown in Figure 9. Each simulation runs for 100 s (1000 time steps) with 6 UAVs.
The proposed method (green solid line) consistently achieves lower RMSE values compared to Standard PF (N = 500, red dashed), Fixed PF (N = 800, orange dash-dot), and Adaptive PF without NN (blue dotted).
3.5. Performance Summary
The simulation results demonstrate that the proposed PF-NN fusion algorithm achieves sub-meter accuracy with an average RMSE of 0.437 m across six UAVs, meeting requirements for indoor navigation applications. The algorithm exhibits robust convergence with rapid error reduction during the initial ten seconds and stable performance thereafter. The adaptive resource management through dynamic particle adjustment maintains filter health while controlling computational cost. The fusion of UWB ranging and visual observations provides complementary information, enhancing robustness compared to single-modality approaches. The learned weight optimization contributes to improved accuracy, as evidenced by the correlation between training convergence and error stabilization. These results validate the algorithm’s effectiveness for multi-UAV cooperative localization in GNSS-denied indoor environments, with performance characteristics suitable for real-world deployment in surveillance, search-and-rescue, and industrial inspection scenarios.
4. Conclusions
This work proposes and validates a novel framework that fuses adaptive particle filtering with neural networks, aiming to address the challenge of cooperative localization for UAV systems in GNSS-denied environments. By integrating learning-driven weight optimization with dynamic resource management, our approach overcomes the critical bottlenecks of conventional particle filters. Simulation results demonstrate that the framework achieves sub-meter localization accuracy (with an average RMSE of 0.437 m), effectively mitigates uncertainties arising from sensor noise and environmental sparsity, and exhibits exceptional robustness and real-time application potential.
Our work confirms that introducing a lightweight neural network to learn and evaluate the motion consistency of particles can substantially improve the quality of weight assignment, thereby effectively suppressing particle degeneracy and maintaining stable filter performance even with poor observational data. Furthermore, an effective sample size-based adaptive particle number adjustment mechanism enables intelligent allocation of computational resources according to real-time task demands, which is critical for deploying advanced estimation algorithms on computationally constrained UAV platforms. This synergy between learning and adaptive strategies represents the core advantage of the present work over traditional fixed-parameter filters.
Despite the promising outcomes of this study, several directions warrant further exploration. First, the online training of the current neural network relies on simulated “ground-truth” positions as supervisory signals, which are unavailable in real-world deployment; this is a fundamental limitation that must be addressed before the method can be fielded. We propose the following research directions to overcome this challenge. (1) Multi-view geometric consistency: leveraging overlapping visual observations from multiple UAVs to generate pseudo-ground-truth through epipolar geometry and triangulation. (2) Consensus-based supervision: using distributed factor graph optimization to produce consensus state estimates that can serve as training targets. (3) Self-supervised motion learning: implementing contrastive learning frameworks that learn motion consistency from temporal sequences without position labels. (4) Transfer learning: pre-training the network on simulation data and fine-tuning with limited real-world calibration data. These approaches aim to eliminate the ground-truth dependency while retaining the benefits of online neural network optimization.
Second, the proposed framework can be extended to high-dimensional state spaces (for instance, by incorporating landmark positions into the state vector for joint estimation) to achieve true cooperative SLAM. This will impose higher demands on algorithm scalability and computational efficiency, potentially benefiting from recent advances in differentiable SLAM and neural radiance field (NeRF)-based localization. Third, the current simulation employs idealized circular trajectories with constant velocity, which may not fully represent the chaotic and unpredictable motion patterns encountered in real-world applications such as search-and-rescue operations. In such scenarios, UAVs must perform aggressive maneuvers, including rapid acceleration, sharp turns, and hovering at waypoints, while navigating through cluttered environments. The robustness of the proposed PF-NN framework to these challenging trajectory patterns warrants further investigation. Specifically, we will evaluate the algorithm's performance under (1) jerk-constrained trajectories that mimic emergency response maneuvers; (2) multi-scale motion patterns combining fast transit and slow inspection phases; (3) communication-constrained scenarios where UWB measurements become intermittent due to occlusion; and (4) dynamic formation changes where UAVs must rapidly reconfigure their relative positions. These evaluations will be conducted using realistic flight dynamics models and benchmark trajectory datasets from actual search-and-rescue missions. The adaptive particle mechanism and neural network weight optimization are expected to provide enhanced robustness against motion model mismatches during aggressive maneuvers, but empirical validation is essential to quantify performance bounds.
Finally, physical validation of the algorithm on real UAV swarms, and evaluation of its performance in more complex and dynamic real-world scenarios (e.g., in the presence of moving obstacles) will be a critical step toward verifying its ultimate practical utility. Collectively, this work lays a solid foundation for the development of next-generation intelligent and efficient UAV cooperative navigation systems.