CPFL: Resilient Continuous UAV Localization via Cross-View Perception and Particle Filtering
Highlights
- A vision-only Cross-view Particle Filter Localization (CPFL) framework is proposed. By fusing homography-based visual motion propagation with a Dual-Granularity Adaptive Gating (DGAG) network, it keeps cumulative drift bounded during continuous localization.
- Experiments on the real-world MAFS dataset demonstrate that CPFL achieves a 5.28 m mean error and an 89.7% success rate (under a 10 m threshold), outperforming mainstream vision-only and IMU-fusion baselines in global accuracy and long-term robustness.
- By addressing the perceptual aliasing problem in homogeneously textured regions and constraining cumulative drift via the filtering mechanism, this approach provides a resilient, map-dependent solution for continuous UAV localization in outdoor GNSS-denied environments.
- The introduction of the TCI@d metric establishes a more rigorous standard for evaluating long-term localization stability. Furthermore, the CPFL framework provides a new self-localization paradigm with engineering deployment potential for resource-constrained UAVs executing long-endurance missions in complex dynamic environments.
Abstract
1. Introduction
- (1) A vision-only Cross-view Particle Filter Localization (CPFL) framework. By fusing visual motion propagation with cross-view observation corrections, it effectively bounds cumulative drift and achieves continuous global localization;
- (2) A Dual-Granularity Adaptive Gating (DGAG) cross-view feature extraction network. It utilizes dynamic routing to adjust the fusion ratio of global semantics and local details, which mitigates perceptual aliasing and improves cross-view matching robustness in complex environments;
- (3) A trajectory-level evaluation metric, the Trajectory Continuity Index (TCI@d). By incorporating temporal smoothness constraints and cumulative error suppression penalties, it provides an objective standard specifically tailored for evaluating the long-term stability of continuous UAV localization.
2. Related Work
2.1. UAV Cross-View Geo-Localization Methods
2.1.1. Image Retrieval
2.1.2. FPI
2.1.3. Particle Filter
2.2. Feature Representation Learning Methods in Cross-View Tasks
2.3. Motion Estimation Based on Inter-Frame Visual Displacement
3. Method
3.1. Cross-View Feature Extraction Network
3.1.1. Baseline Model: Dual-Branch Siamese ViT Architecture
- Input and Preprocessing
- Backbone Network and Feature Extraction
- Feature Processing and Output
- Supervised Training and Loss Functions
3.1.2. Dual-Granularity Adaptive Gating (DGAG) Head Design
- Local Feature Aggregation: Generalized Mean (GeM) Pooling [38]
- Adaptive Fusion Mechanism for Cross-Granularity Features
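The two components named above can be illustrated with a minimal NumPy sketch. GeM pooling follows its standard definition (the p-th root of the mean of p-th powers over the local tokens). The gate, by contrast, is only an assumption for illustration: the function `gated_fusion`, its parameters `w` and `b`, and the sigmoid form are hypothetical and are not taken from the DGAG head itself.

```python
import numpy as np


def gem_pool(tokens: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean pooling over the token axis.

    tokens: (num_tokens, dim) local features.
    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    clamped = np.clip(tokens, eps, None)  # GeM assumes non-negative activations
    return np.mean(clamped ** p, axis=0) ** (1.0 / p)


def gated_fusion(global_feat: np.ndarray, local_feat: np.ndarray,
                 w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical gate (an assumption, not the paper's layer).

    A sigmoid over the concatenated descriptors gives a per-channel
    ratio g in (0, 1) that mixes the global and local descriptors.
    w: (2 * dim, dim), b: (dim,).
    """
    z = np.concatenate([global_feat, local_feat]) @ w + b
    g = 1.0 / (1.0 + np.exp(-z))
    fused = g * global_feat + (1.0 - g) * local_feat
    return fused / (np.linalg.norm(fused) + 1e-12)  # L2-normalize for matching


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    local_tokens = rng.random((16, dim))   # stand-in for patch tokens
    global_token = rng.random(dim)         # stand-in for the class token
    local_desc = gem_pool(local_tokens)
    w = rng.normal(0.0, 0.1, (2 * dim, dim))
    fused = gated_fusion(global_token, local_desc, w, np.zeros(dim))
    print(fused.shape)
```

With `w` and `b` set to zero the gate outputs 0.5 everywhere, so the fused descriptor is simply the normalized average of the two inputs; a learned gate would shift this ratio per channel.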
3.1.3. Feature Similarity Calculation
3.2. Particle Filter Localization Framework
3.2.1. Particle Set Construction
- State Prediction: Each particle undergoes independent state propagation according to the state transition model.
- Observation Update: The weight of each particle is calculated and updated according to the observation model.
- Resampling: The particle set is filtered based on the weight distribution to maintain particle diversity and approximate the true posterior.
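The three steps listed above can be sketched as a generic bootstrap particle filter on a 2D position. This is a toy illustration under stated assumptions, not the CPFL implementation: the exponential likelihood with a `temperature` parameter, the systematic resampling scheme, the effective-sample-size trigger, and the distance-based stand-in for the cross-view similarity score are all hypothetical choices.

```python
import numpy as np


def predict(particles, motion, noise_std, rng):
    """State prediction: shift every particle by the motion estimate plus noise."""
    return particles + motion + rng.normal(0.0, noise_std, size=particles.shape)


def update(weights, similarities, temperature=0.1):
    """Observation update with a placeholder likelihood exp(similarity / temperature)."""
    likelihood = np.exp((similarities - similarities.max()) / temperature)
    new_weights = weights * likelihood
    return new_weights / new_weights.sum()


def systematic_resample(particles, weights, rng):
    """Resampling: draw particles in proportion to weight, then reset the weights."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    indices = np.searchsorted(cumulative, positions)
    return particles[indices], np.full(n, 1.0 / n)


def run_demo(steps=20, n=500, seed=0):
    """Track a point moving at constant velocity; returns the final error in metres."""
    rng = np.random.default_rng(seed)
    truth = np.zeros(2)
    motion = np.array([1.0, 0.5])
    particles = rng.uniform(-50.0, 50.0, size=(n, 2))
    weights = np.full(n, 1.0 / n)
    for _ in range(steps):
        truth = truth + motion
        particles = predict(particles, motion, noise_std=0.5, rng=rng)
        # Stand-in for the cross-view similarity score (assumption):
        # higher when a particle is closer to the true position.
        similarities = -np.linalg.norm(particles - truth, axis=1) / 10.0
        weights = update(weights, similarities)
        estimate = weights @ particles  # weighted-mean state estimate
        # Resample only when the effective sample size drops below n / 2.
        if 1.0 / np.sum(weights ** 2) < n / 2:
            particles, weights = systematic_resample(particles, weights, rng)
    return float(np.linalg.norm(estimate - truth))


if __name__ == "__main__":
    print(f"final error: {run_demo():.2f} m")
```

In a real system the `similarities` vector would come from comparing the UAV image descriptor with the map descriptor at each particle location, and `motion` from the inter-frame visual displacement estimate.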
3.2.2. State Prediction
- Feature Extraction and Matching
- Robust Estimation of Motion Parameters
3.2.3. Cross-View Coordinate Transformation and Scale Mapping
3.2.4. Observation Update
3.2.5. Resampling and State Estimation
Algorithm 1. Cross-View Particle Filter Localization (CPFL)
3.3. Trajectory Continuity Index (TCI@d)
4. Experiments
4.1. Dataset
4.2. Main Localization Performance Comparison
- DenseUAV [12]: A representative method based on the image retrieval paradigm, which localizes by retrieving the most similar reference image and interpolating its geographic tags.
- SSPT [15]: A coordinate regression paradigm based on FPI (Finding Point with Image), employing a Transformer architecture to directly predict the pixel coordinates of the UAV within the satellite image.
- SWA-PF [8]: A state-of-the-art method based on sequential filtering, which utilizes semantic segmentation features for matching and relies on IMU sensors for state prediction.
4.2.1. Experimental Setup and Baseline Implementation
4.2.2. Quantitative Performance Analysis
4.2.3. Cumulative Error Distribution (CED) Analysis
- The curve of our method (Ours, solid red line) has the largest initial slope and rises the fastest. It exceeds a 60% recall rate at the 5 m error threshold and approaches 90% at 10 m, demonstrating both faster convergence and higher accuracy.
- SWA-PF (dashed blue line), although sharing a similar overall trend with our method, is consistently outperformed by our method in the high-accuracy interval of 3 m–8 m.
- The curves for SSPT (dash-dotted green line) and DenseUAV (dotted gray line) grow slowly and exhibit significant long-tail effects, further confirming the limitations of non-sequential localization methods in dynamic environments.
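For readers reproducing this kind of plot, a CED curve is simply the fraction of frames whose localization error falls at or below each threshold. The sketch below is a minimal NumPy version; the error values are made up for illustration, and the choice of `<=` rather than `<` at the threshold is an assumption.

```python
import numpy as np


def cumulative_error_distribution(errors_m, thresholds_m):
    """Fraction of frames with error <= each threshold (both in metres)."""
    errors = np.asarray(errors_m, dtype=float)
    thresholds = np.asarray(thresholds_m, dtype=float)
    # Compare every threshold against every error, then average over frames.
    return (errors[None, :] <= thresholds[:, None]).mean(axis=1)


if __name__ == "__main__":
    # Made-up per-frame errors, for illustration only.
    errors = [1.2, 3.4, 4.9, 6.0, 7.5, 9.9, 12.0, 25.0]
    ced = cumulative_error_distribution(errors, [5.0, 10.0, 20.0])
    print(ced)  # [0.375 0.75  0.875]
```

Under this definition, the success rate at a 10 m threshold is the CED value evaluated at 10 m.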
4.2.4. Continuity and Smoothness Trade-Off Analysis
4.3. Comparative Analysis of Feature Extraction Strategies
4.3.1. Quantitative Performance Evaluation
- Robustness Improvement:
- Trade-off Between Convergence and Retention:
4.3.2. Error Distribution and Temporal Stability Analysis
4.3.3. Summary of Feature Extraction Strategies
4.4. Drift Elimination and Global Consistency Verification
4.4.1. Experimental Setup and Comparison Schemes
- Visual Dead Reckoning (Open-Loop):
- Ours (Closed-Loop):
4.4.2. Qualitative Analysis of Trajectory Consistency
- Drift Accumulation in Open-Loop Dead Reckoning:
- Error Correction Capability of the Closed-Loop System:
4.4.3. Quantitative Error Evolution
- Drift Accumulation and High Variance:
- Bounded Oscillation and Distribution Compression:
4.4.4. Summary of Drift Elimination
4.5. Real-Time Performance on Edge Platforms
4.5.1. Edge Hardware Setup and System Optimization
- Backend Acceleration via TensorRT: The DGAG-ViT cross-view observation model was converted from the native PyTorch (v2.1.0) dynamic graph into an optimized TensorRT static engine. By converting the network to FP16 precision, we exploited the GPU's Tensor Cores for accelerated matrix multiplication and reduced the memory bandwidth bottleneck.
- Frontend Lightweighting: The visual odometry was optimized to ensure continuous relative pose estimation without saturating the ARM CPU. We introduced an image downsampling strategy and replaced computationally intensive feature extractors (e.g., SIFT) with lightweight ORB features. By utilizing binary descriptors and Hamming distance matching, the frontend processing time was reduced to under 100 ms.
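The reason binary descriptors are cheap to match is that the Hamming distance reduces to an XOR followed by a bit count. The following NumPy sketch illustrates this on 256-bit descriptors stored as 32 bytes (the ORB descriptor size); the cross-check rule and the `max_distance` threshold are illustrative assumptions, not the settings used in the frontend above.

```python
import numpy as np


def hamming_distances(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between binary descriptors.

    desc_a: (Na, 32) uint8, desc_b: (Nb, 32) uint8 -> (Na, Nb) bit counts.
    """
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


def match_cross_check(desc_a, desc_b, max_distance=64):
    """Keep (i, j, distance) only when i and j are each other's nearest neighbour."""
    d = hamming_distances(desc_a, desc_b)
    best_b = d.argmin(axis=1)  # nearest descriptor in B for each row of A
    best_a = d.argmin(axis=0)  # nearest descriptor in A for each row of B
    matches = []
    for i, j in enumerate(best_b):
        if best_a[j] == i and d[i, j] <= max_distance:
            matches.append((i, int(j), int(d[i, j])))
    return matches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_a = rng.integers(0, 256, size=(10, 32), dtype=np.uint8)
    perm = rng.permutation(10)
    frame_b = frame_a[perm].copy()
    frame_b[:, 0] ^= 1  # flip one bit per descriptor to mimic noise
    for i, j, dist in match_cross_check(frame_a, frame_b):
        print(i, j, dist)
```

The matched index pairs would then feed a robust motion estimator; libraries such as OpenCV provide optimized equivalents of this brute-force matcher.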
4.5.2. State-Dependent Particle Decay and Accuracy-Latency Trade-Off
4.5.3. Tracking Stability and Resilience Analysis
4.5.4. Summary of Real-Time Edge Performance
4.6. Robustness Analysis Under Sudden Wind Gust Disturbances
4.6.1. Intrinsic Fault Tolerance (50 m Gust Disturbance)
4.6.2. Extreme Disturbance and Adaptive Noise Amplification (90 m Gust)
4.6.3. Summary of Disturbance Robustness
4.7. System Sensitivity and Boundary Condition Evaluation
4.7.1. Robustness Against Map Degradation
- Low-Res Interpolation: The reference map is downsampled by a factor of 0.25 and subsequently upsampled to its original resolution using linear interpolation, followed by a minor Gaussian blur to remove mosaic artifacts. This simulates the reliance on outdated or low-resolution satellite databases.
- Severe Blur: A large Gaussian blur kernel is applied to simulate severe optical defocus or thick atmospheric interference (e.g., fog or clouds) that obscures ground details.
- Gaussian Noise: Heavy additive Gaussian noise is injected into the image to simulate sensor thermal noise or signal corruption under suboptimal environmental conditions.
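The three degradation modes above can be approximated with a short NumPy-only sketch on a grayscale image. All numeric settings here (`sigma` values, noise `std`, integer striding for the 0.25 scale) are placeholder assumptions chosen for illustration; they are not the parameters used in the experiments.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflect padding (2D float image)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def _resize_linear(arr: np.ndarray, new_len: int, axis: int) -> np.ndarray:
    """Linear interpolation along one axis."""
    old_x = np.linspace(0.0, 1.0, arr.shape[axis])
    new_x = np.linspace(0.0, 1.0, new_len)
    return np.apply_along_axis(lambda v: np.interp(new_x, old_x, v), axis, arr)


def low_res_interpolation(img: np.ndarray, scale: float = 0.25,
                          blur_sigma: float = 1.0) -> np.ndarray:
    """Downsample, upsample back with linear interpolation, then blur lightly."""
    step = int(round(1.0 / scale))
    small = img[::step, ::step]  # crude decimation, an assumption
    up = _resize_linear(small, img.shape[0], axis=0)
    up = _resize_linear(up, img.shape[1], axis=1)
    return gaussian_blur(up, blur_sigma)


def add_gaussian_noise(img: np.ndarray, std: float, rng) -> np.ndarray:
    """Additive zero-mean Gaussian noise, clipped to the 8-bit range."""
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 255.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref_map = rng.uniform(0.0, 255.0, size=(64, 64))  # stand-in for a map tile
    variants = {
        "low_res": low_res_interpolation(ref_map),
        "severe_blur": gaussian_blur(ref_map, sigma=5.0),  # placeholder sigma
        "noise": add_gaussian_noise(ref_map, std=40.0, rng=rng),
    }
    for name, image in variants.items():
        print(name, image.shape, round(float(image.std()), 2))
```

Each degraded map would replace the clean reference map in the observation update, leaving the rest of the pipeline unchanged.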
4.7.2. Parameter Sensitivity of the Observation Likelihood
4.7.3. Tolerance to Scale Factor and Altitude Perturbations
4.7.4. Summary of Sensitivity and Boundary Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Shakhatreh, H.; Sawalmeh, A.H.; Al-Fuqaha, A.; Dou, Z.; Almaita, E.; Khalil, I.; Othman, N.S.; Khreishah, A.; Guizani, M. Unmanned aerial vehicles (UAVs): A survey on civil applications and key research challenges. IEEE Access 2019, 7, 48572–48634.
- Couturier, A.; Akhloufi, M.A. A review on absolute visual localization for UAV. Robot. Auton. Syst. 2021, 135, 103666.
- Zhou, X.; Zhang, X.; Yang, X.; Zhao, J.; Liu, Z.; Shuang, F. Towards UAV Localization in GNSS-Denied Environments: The SatLoc Dataset and a Hierarchical Adaptive Fusion Framework. Remote Sens. 2025, 17, 3048.
- Partsinevelos, P.; Chatziparaschis, D.; Trigkakis, D.; Tripolitsiotis, A. A novel UAV-assisted positioning system for GNSS-denied environments. Remote Sens. 2020, 12, 1080.
- Lin, T.Y.; Cui, Y.; Belongie, S.; Hays, J. Learning deep representations for ground-to-aerial geolocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2015; pp. 5007–5015.
- Durgam, A.; Paheding, S.; Dhiman, V.; Devabhaktuni, V. Cross-view geo-localization: A survey. IEEE Access 2024, 12, 192028–192050.
- Qin, T.; Li, P.; Shen, S. Vins-mono: A robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
- Yuan, J.; Dai, M.; Zheng, E.; Su, C.; Chen, N.; Hu, Q.; Zhu, S.; Cao, Y. SWA-PF: Semantic-Weighted Adaptive Particle Filter for Memory-Efficient 4-DoF UAV Localization in GNSS-Denied Environments. arXiv 2025, arXiv:2509.13795.
- Zheng, Z.; Wei, Y.; Yang, Y. University-1652: A multi-view multi-source benchmark for drone-based geo-localization. In Proceedings of the 28th ACM International Conference on Multimedia; ACM: New York, NY, USA, 2020; pp. 1395–1403.
- Workman, S.; Souvenir, R.; Jacobs, N. Wide-area image geolocalization with aerial reference imagery. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2015; pp. 3961–3969.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 5297–5307.
- Dai, M.; Zheng, E.; Feng, Z.; Qi, L.; Zhuang, J.; Yang, W. Vision-based UAV self-positioning in low-altitude urban environments. IEEE Trans. Image Process. 2023, 33, 493–508.
- Li, J.; Sun, Y.; Xiang, Y.; Lei, L. One-to-Many Retrieval Between UAV Images and Satellite Images for UAV Self-Localization in Real-World Scenarios. Remote Sens. 2025, 17, 3045.
- Shugaev, M.; Semenov, I.; Ashley, K.; Klaczynski, M.; Cuntoor, N.; Lee, M.W.; Jacobs, N. Arcgeo: Localizing limited field-of-view images using cross-view matching. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: New York, NY, USA, 2024; pp. 209–218.
- Fan, J.; Zheng, E.; He, Y.; Yang, J. A cross-view geo-localization algorithm using UAV image and satellite image. Sensors 2024, 24, 3719.
- Li, H.; Yang, W.; Xu, F.; Tan, H.; Zhang, H.; Li, S.; Xia, G.S. Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception. arXiv 2026, arXiv:2604.01747.
- Van Dalen, G.J.; Magree, D.P.; Johnson, E.N. Absolute localization using image alignment and particle filtering. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, San Diego, CA, USA, 4–8 January 2016; p. 0647.
- Patel, B. Visual Localization for UAVs in Outdoor GPS-Denied Environments; University of Toronto: Toronto, ON, Canada, 2019.
- Miller, I.D.; Cowley, A.; Konkimalla, R.; Shivakumar, S.S.; Nguyen, T.; Smith, T.; Taylor, C.J.; Kumar, V. Any way you look at it: Semantic crossview localization and mapping with lidar. IEEE Robot. Autom. Lett. 2021, 6, 2397–2404.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
- Wang, T.; Zheng, Z.; Yan, C.; Zhang, J.; Sun, Y.; Zheng, B.; Yang, Y. Each part matters: Local patterns facilitate cross-view geo-localization. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 867–879.
- Shi, Y.; Liu, L.; Yu, X.; Li, H. Spatial-aware feature aggregation for image based cross-view geo-localization. Adv. Neural Inf. Process. Syst. 2019, 32, 10090–10100.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Zhu, S.; Shah, M.; Chen, C. Transgeo: Transformer is all you need for cross-view image geo-localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2022; pp. 1162–1171.
- Dai, M.; Hu, J.; Zhuang, J.; Zheng, E. A transformer-based feature segmentation and region alignment method for UAV-view geo-localization. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4376–4389.
- Wang, X.; Zheng, L. UAV Cross-View Geo-Localization Based on Multi-Scale Partitioning and Attention-Enhanced Transformer. In Proceedings of the 2025 International Conference on Signal Processing, Computer Networks and Communications (SPCNC); IEEE: New York, NY, USA, 2025; pp. 512–516.
- Prutyanov, V.V.; Korolev, N.L.; Romanov, A.Y. UAV Visual Localization System Empowered by Zero-Shot Deep Feature Matching. In Proceedings of the 2025 International Russian Automation Conference (RusAutoCon); IEEE: New York, NY, USA, 2025; pp. 350–354.
- Nguyen, T.; Chen, S.W.; Shivakumar, S.S.; Taylor, C.J.; Kumar, V. Unsupervised deep homography: A fast and robust homography estimation model. IEEE Robot. Autom. Lett. 2018, 3, 2346–2353.
- Amidi, O.; Kanade, T.; Fujita, K. A visual odometer for autonomous helicopter flight. Robot. Auton. Syst. 1999, 28, 185–193.
- Caballero, F.; Merino, L.; Ferruz, J.; Ollero, A. Vision-based odometry and SLAM for medium and high altitude flying UAVs. J. Intell. Robot. Syst. 2009, 54, 137–161.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision; IEEE: New York, NY, USA, 2011; pp. 2564–2571.
- Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image matching from handcrafted to deep features: A survey. Int. J. Comput. Vis. 2021, 129, 23–79.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; IEEE: New York, NY, USA, 2018; pp. 224–236.
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2021; pp. 8922–8931.
- Xiao, J.; Loianno, G. Uasthn: Uncertainty-aware deep homography estimation for uav satellite-thermal geo-localization. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2025; pp. 14066–14072.
- Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737.
- Radenović, F.; Tolias, G.; Chum, O. Fine-tuning CNN image retrieval with no human annotation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1655–1668.
- Arulampalam, M.S.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188.
- Muja, M.; Lowe, D.G. Fast approximate nearest neighbors with automatic algorithm configuration. In Proceedings of the International Conference on Computer Vision Theory and Applications, Scitepress, Lisboa, Portugal, 5–8 February 2009; Volume 1, pp. 331–340.
- Ranftl, R.; Lasinger, K.; Hafner, D.; Schindler, K.; Koltun, V. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1623–1637.

| Method | Type | Sensors | Mean Error (m) ↓ | RMSE (m) ↓ | Success Rate (10 m) ↑ | TCI@5 ↑ | TCI@10 ↑ | TCI@20 ↑ |
|---|---|---|---|---|---|---|---|---|
| SSPT | Regression | Vision Only | 42.08 | 62.26 | 41.25% | 0.006 | 0.035 | 0.104 |
| DenseUAV | Retrieval | Vision Only | 176.15 | 260.54 | 5.06% | 0.0001 | 0.001 | 0.007 |
| SWA-PF | Sequential | Vision + IMU | 5.76 ± 0.35 | 6.60 ± 0.40 | 80.77% | 0.061 | 0.319 | 0.924 |
| Ours | Sequential | Vision Only | 5.28 ± 0.38 | 5.92 ± 0.44 | 89.74% | 0.033 | 0.329 | 0.906 |

| Head | Mean Error (m) ↓ | RMSE (m) ↓ | Success Rate (10 m) ↑ | TCI@5 ↑ | TCI@10 ↑ | TCI@20 ↑ |
|---|---|---|---|---|---|---|
| Global (baseline) | 6.43 ± 0.59 | 7.44 ± 0.76 | 76.92% | 0.038 | 0.192 | 0.900 |
| MaxPool | 8.58 ± 1.54 | 10.26 ± 1.85 | 62.82% | 0.011 | 0.082 | 0.761 |
| AvgPool | 9.21 ± 2.03 | 11.99 ± 2.64 | 76.92% | 0.082 | 0.217 | 0.696 |
| AvgMaxPool | 5.76 ± 0.40 | 6.71 ± 0.47 | 83.33% | 0.031 | 0.277 | 0.888 |
| GemPool | 6.84 ± 1.63 | 7.72 ± 1.82 | 73.08% | 0.024 | 0.182 | 0.876 |
| LPN (block = 2) | 10.25 ± 2.63 | 11.57 ± 3.02 | 47.44% | 0.002 | 0.0738 | 0.439 |
| FSRA (block = 2) | 5.28 ± 0.41 | 6.20 ± 0.54 | 85.90% | 0.046 | 0.306 | 0.937 |
| DGAG (ours) | 5.28 ± 0.38 | 5.92 ± 0.44 | 89.74% | 0.033 | 0.329 | 0.906 |

| Decay Strategy | Init Particles | Tracking Particles | Mean Error (m) ↓ | RMSE (m) ↓ | Tracking Latency (s/frame) ↓ |
|---|---|---|---|---|---|
| Conservative (Baseline) | 1000 | 550 | 6.70 | 7.29 | ∼3.58 |
| Moderate | 1000 | 200 | 7.45 | 8.18 | ∼1.74 |
| Aggressive | 1000 | 50 | 8.94 | 9.34 | ∼0.95 |
| Divergent | 1000 | 20 | 28.40 | 30.56 | ∼0.80 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Su, C.; Yuan, J.; Zheng, E.; Xu, W.; Liu, Z.; Hu, J. CPFL: Resilient Continuous UAV Localization via Cross-View Perception and Particle Filtering. Drones 2026, 10, 437. https://doi.org/10.3390/drones10060437

