WAMF-FPI: A Weight-Adaptive Multi-Feature Fusion Network for UAV Localization
Abstract
1. Introduction
- We propose a new end-to-end framework called WAMF-FPI, which treats positioning as a low-level task. This alleviates the loss of location information caused by feature-map compression. In addition, the WAMF module strengthens the model's ability to handle multi-scale problems.
- We develop a new Hanning loss that assigns different weights to positive samples, making the model pay more attention to the center of the target region; experiments confirm its effectiveness.
- With RDS as the evaluation metric, the proposed model improves performance from 57.22 to 65.33, an 8.11-point gain over FPI. We also evaluate the model with MA (meter-level accuracy): localization accuracy at the 5 m, 10 m, and 20 m levels reaches 26.99%, 52.63%, and 69.73%, respectively, achieving state-of-the-art (SOTA) performance on the UL14 dataset.
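The MA metric above can be read as the fraction of test samples whose predicted position falls within a given radius of the ground truth. A minimal sketch of that computation, assuming planar coordinates already expressed in metres (the function name and array layout are our own, not the paper's):

```python
import numpy as np

def meter_accuracy(pred_xy, gt_xy, thresholds=(5.0, 10.0, 20.0)):
    """Fraction of predictions whose Euclidean error (in metres)
    falls within each distance threshold."""
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    err = np.linalg.norm(pred_xy - gt_xy, axis=1)  # per-sample error in metres
    return {t: float(np.mean(err <= t)) for t in thresholds}

# Toy check: three predictions with errors of 3 m, 8 m, and 25 m.
pred = [[3.0, 0.0], [0.0, 8.0], [15.0, 20.0]]
gt = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
ma = meter_accuracy(pred, gt)  # e.g. ma[5.0] is the 5 m-level accuracy
```

In the paper's evaluation the thresholds would include the 5 m, 10 m, and 20 m levels reported above.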
2. Related Work
2.1. Overview of UAV Geo-Localization Methods
2.2. Vision Transformer
3. Materials and Methods
3.1. The Previous Methods
3.2. The Framework of WAMF-FPI
3.3. Feature Extraction Module
3.4. Weight-Adaptive Multi-Feature Fusion Module
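The core idea of weight-adaptive fusion is to combine feature maps from several pyramid levels with learnable scalar weights rather than a fixed sum. The sketch below is illustrative only: it assumes the branch maps have already been brought to a common resolution, and it normalizes the weights with a softmax, which may differ from the paper's exact formulation.

```python
import numpy as np

def softmax(w):
    """Numerically stable softmax over a 1-D weight vector."""
    e = np.exp(w - np.max(w))
    return e / e.sum()

def wamf_fuse(features, weights):
    """Weight-adaptive fusion sketch: normalise the learnable scalar
    weights, then take a weighted sum of the same-size feature maps."""
    a = softmax(np.asarray(weights, dtype=float))
    return sum(ai * fi for ai, fi in zip(a, features))

# Three toy 4x4 single-channel maps; in training, `weights` would be
# learnable parameters updated by backpropagation.
f1, f2, f3 = (np.full((4, 4), v) for v in (1.0, 2.0, 3.0))
fused = wamf_fuse([f1, f2, f3], weights=[0.0, 0.0, 0.0])  # equal weights -> mean
```

Making the weights learnable lets the network decide, per dataset, how much each scale contributes, which is what the learnable-parameters ablation in Section 5.3 probes.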
3.5. Hanning Loss
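The Hanning loss weights positive samples by a 2-D Hanning window, so pixels near the center of the target region contribute more to the loss than pixels at its edges. A minimal sketch of the weight map, with a plain L2 term standing in for the paper's actual training objective (which is not reproduced here):

```python
import numpy as np

def hanning_weight_map(size):
    """2-D Hanning window: outer product of two 1-D Hanning windows,
    peaking at the centre and falling to zero at the edges."""
    w = np.hanning(size)
    return np.outer(w, w)

def hanning_weighted_loss(pred_heatmap, target_mask, window):
    """Weighted positive-sample loss over a heat map: the window
    emphasises errors near the centre of the target region."""
    per_pixel = (pred_heatmap - target_mask) ** 2           # L2 stand-in
    return float(np.sum(window * target_mask * per_pixel))

m = hanning_weight_map(5)  # m[2, 2] == 1.0 at the centre, 0.0 at corners
```

The effect is the same center-weighting described in the contributions: the model is pushed hardest to be right where the target actually is.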
4. Experiment
4.1. Implementation Details
4.2. Evaluation Metrics and Dataset
4.3. Main Result
5. Ablation Experiment
5.1. The Effect of Feature Pyramid Structure
5.2. The Effect of Multi-Feature Fusion
5.3. The Effect of Learnable Parameters
5.4. The Effect of Hanning Loss
5.5. The Effect of Upsampling
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Karaca, Y.; Cicek, M.; Tatli, O.; Sahin, A.; Pasli, S.; Beser, M.F.; Turedi, S. The potential use of unmanned aircraft systems (drones) in mountain search and rescue operations. Am. J. Emerg. Med. 2018, 36, 583–588.
- Li, Y.C.; Ye, D.M.; Ding, X.B.; Teng, C.S.; Wang, G.H.; Li, T.H. UAV Aerial Photography Technology in Island Topographic Mapping. In Proceedings of the 2011 International Symposium on Image and Data Fusion, Tengchong, China, 9–11 August 2011.
- Radoglou-Grammatikis, P.; Sarigiannidis, P.; Lagkas, T.; Moscholios, I. A compilation of UAV applications for precision agriculture. Comput. Netw. 2020, 172, 107148.
- Ding, L.; Zhou, J.; Meng, L.; Long, Z. A Practical Cross-View Image Matching Method between UAV and Satellite for UAV-Based Geo-Localization. Remote Sens. 2020, 13, 47.
- Dannenberg, M.; Wang, X.; Yan, D.; Smith, W. Phenological characteristics of global ecosystems based on optical, fluorescence, and microwave remote sensing. Remote Sens. 2020, 12, 671.
- Zhang, X.; Zhou, J.; Liang, S.; Wang, D. A practical reanalysis data and thermal infrared remote sensing data merging (RTM) method for reconstruction of a 1-km all-weather land surface temperature. Remote Sens. Environ. 2021, 260, 112437.
- Lee, S.; Jeon, S. A Study on the roughness measurement for joints in rock mass using LIDAR. Tunn. Undergr. Space 2017, 27, 58–68.
- Lato, M.J.; Vöge, M. Automated mapping of rock discontinuities in 3D lidar and photogrammetry models. Int. J. Rock Mech. Min. Sci. 2012, 54, 150–158.
- Ge, Y.; Tang, H.; Xia, D.; Wang, L.; Zhao, B.; Teaway, J.W.; Chen, H.; Zhou, T. Automated measurements of discontinuity geometric properties from a 3D-point cloud based on a modified region growing algorithm. Eng. Geol. 2018, 242, 44–54.
- Lin, Y.C.; Habib, A. Quality control and crop characterization framework for multi-temporal UAV LiDAR data over mechanized agricultural fields. Remote Sens. Environ. 2021, 256, 112299.
- Opromolla, R.; Fasano, G.; Rufino, G.; Grassi, M.; Savvaris, A. LIDAR-inertial integration for UAV localization and mapping in complex environments. In Proceedings of the 2016 International Conference on Unmanned Aircraft Systems (ICUAS), Arlington, VA, USA, 7–10 June 2016; pp. 649–656.
- Pritzl, V.; Vrba, M.; Štĕpán, P.; Saska, M. Cooperative navigation and guidance of a micro-scale aerial vehicle by an accompanying UAV using 3D LiDAR relative localization. In Proceedings of the 2022 International Conference on Unmanned Aircraft Systems (ICUAS), Dubrovnik, Croatia, 21–24 June 2022; pp. 526–535.
- Meng, F.; Yang, D. Research of UAV Location Control System Based on SINS, GPS and Optical Flow. In Proceedings of the 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 6–8 November 2020.
- Dai, M.; Chen, J.; Lu, Y.; Hao, W.; Zheng, E. Finding Point with Image: An End-to-End Benchmark for Vision-based UAV Localization. arXiv 2022, arXiv:2208.06561.
- Chu, X.; Tian, Z.; Wang, Y.; Zhang, B.; Ren, H.; Wei, X.; Xia, H.; Shen, C. Twins: Revisiting the design of spatial attention in vision transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 9355–9366.
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 10347–10357.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
- Kim, H.J.; Dunn, E.; Frahm, J.M. Learned Contextual Feature Reweighting for Image Geo-Localization. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Kim, H.J.; Dunn, E.; Frahm, J.M. Predicting Good Features for Image Geo-Localization Using Per-Bundle VLAD. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
- Saurer, O.; Baatz, G.; Köser, K.; Ladický, L.; Pollefeys, M. Image Based Geo-localization in the Alps. Int. J. Comput. Vis. 2015, 116, 213–225.
- Hays, J.; Efros, A.A. IM2GPS: Estimating geographic information from a single image. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008.
- Vo, N.; Jacobs, N.; Hays, J. Revisiting IM2GPS in the Deep Learning Era. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Sattler, T.; Havlena, M.; Schindler, K.; Pollefeys, M. Large-Scale Location Recognition and the Geometric Burstiness Problem. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
- Zamir, A.R.; Shah, M. Image Geo-Localization Based on Multiple Nearest Neighbor Feature Matching Using Generalized Graphs. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1546–1558.
- Tian, Y.; Chen, C.; Shah, M. Cross-View Image Matching for Geo-Localization in Urban Environments. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Lin, T.Y.; Belongie, S.; Hays, J. Cross-View Image Geolocalization. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 23–28 June 2013.
- Hu, S.; Feng, M.; Nguyen, R.M.H.; Lee, G.H. CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Zhai, M.; Bessinger, Z.; Workman, S.; Jacobs, N. Predicting Ground-Level Scene Layout from Aerial Imagery. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
- Workman, S.; Souvenir, R.; Jacobs, N. Wide-Area Image Geolocalization with Aerial Reference Imagery. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 7–13 December 2015.
- Lin, T.Y.; Cui, Y.; Belongie, S.; Hays, J. Learning deep representations for ground-to-aerial geolocalization. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
- Shi, Y.; Liu, L.; Yu, X.; Li, H. Spatial-aware feature aggregation for image based cross-view geo-localization. Adv. Neural Inf. Process. Syst. 2019, 32, 10090–10100.
- Toker, A.; Zhou, Q.; Maximov, M.; Leal-Taixe, L. Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Zeng, Z.; Wang, Z.; Yang, F.; Satoh, S. Geo-Localization via Ground-to-Satellite Cross-View Image Retrieval. IEEE Trans. Multimed. 2022.
- Zheng, Z.; Wei, Y.; Yang, Y. University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020.
- Wang, T.; Zheng, Z.; Yan, C.; Zhang, J.; Sun, Y.; Zheng, B.; Yang, Y. Each Part Matters: Local Patterns Facilitate Cross-View Geo-Localization. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 867–879.
- Dai, M.; Huang, J.; Zhuang, J.; Lan, W.; Cai, Y.; Zheng, E. Vision-Based UAV Localization System in Denial Environments. arXiv 2022, arXiv:2201.09201.
- Zhu, S.; Yang, T.; Chen, C. VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016.
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
- Kim, M.; Lee, Y.; Lim, C. Deformable convolutional networks based Mask R-CNN. J. Korean Data Inf. Sci. Soc. 2020, 31, 993–1008.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Springer International Publishing: New York, NY, USA, 2020; pp. 213–229.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
- Jiang, Y.; Chang, S.; Wang, Z. TransGAN: Two pure transformers can make one strong GAN, and that can scale up. Adv. Neural Inf. Process. Syst. 2021, 34, 14745–14758.
- Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning Texture Transformer Network for Image Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- He, S.; Luo, H.; Wang, P.; Wang, F.; Li, H.; Jiang, W. TransReID: Transformer-based Object Re-Identification. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919.
- Zhang, Q.; Yang, Y.B. ResT: An efficient transformer for visual recognition. Adv. Neural Inf. Process. Syst. 2021, 34, 15475–15485.
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing Convolutions to Vision Transformers. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Ren, S.; Zhou, D.; He, S.; Feng, J.; Wang, X. Shunted self-attention via multi-scale token aggregation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 10853–10862.
| Dataset | Satellite | UAV | Universities |
|---|---|---|---|
| Train | 6768 | 6768 | 10 |
| Test | 2331 | 27,972 | 4 |
| Model | RDS | GFLOPs | Params (M) | Inference Time |
|---|---|---|---|---|
| FPI | 57.22 | 14.88 | 44.48 | 1× |
| WAMF-FPI | 65.33 | 13.32 | 48.5 | 1.69× |
| Pyramid Structure | Output Size | RDS |
|---|---|---|
| ✓ | 101 × 101 | 61.32 |
| × | 26 × 26 | 58.01 |
| Feature | Output Size | RDS | 3 m (%) | 5 m (%) | 10 m (%) | 20 m (%) | 30 m (%) | 40 m (%) | 50 m (%) |
|---|---|---|---|---|---|---|---|---|---|
| A3 | 400 × 400 | 61.32 | 9.49 | 20.52 | 43.89 | 63.65 | 69.08 | 71.82 | 74.58 |
| A2 | 400 × 400 | 58.74 | 9.00 | 20.44 | 42.88 | 61.45 | 65.99 | 68.66 | 71.20 |
| A1 | 400 × 400 | 49.62 | 5.64 | 13.19 | 30.71 | 48.28 | 54.96 | 58.58 | 61.93 |
| A1 + A2 + A3 | 400 × 400 | 63.70 | 11.56 | 24.94 | 49.18 | 67.28 | 71.16 | 73.54 | 75.84 |
| Learnable Parameters | RDS | 3 m (%) | 5 m (%) | 10 m (%) | 20 m (%) | 50 m (%) |
|---|---|---|---|---|---|---|
| ✓ | 65.33 | 12.50 | 26.99 | 52.63 | 69.73 | 77.24 |
| × | 63.70 | 11.56 | 24.94 | 49.18 | 67.28 | 75.84 |
| Hanning Weight | Output Size | RDS | 3 m (%) | 5 m (%) | 10 m (%) | 20 m (%) | 30 m (%) | 40 m (%) | 50 m (%) |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | 400 × 400 | 65.33 | 12.50 | 26.99 | 52.63 | 69.73 | 73.11 | 75.12 | 77.24 |
| × | 400 × 400 | 63.56 | 10.22 | 22.21 | 46.99 | 66.77 | 71.72 | 74.52 | 76.82 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, G.; Chen, J.; Dai, M.; Zheng, E. WAMF-FPI: A Weight-Adaptive Multi-Feature Fusion Network for UAV Localization. Remote Sens. 2023, 15, 910. https://doi.org/10.3390/rs15040910