UAV Geo-Localization Dataset and Method Based on Cross-View Matching
Abstract
1. Introduction
- Leveraging digital twin technology, we construct the VDUAV dataset, which addresses the limited scene diversity and high production cost of traditional cross-view matching datasets.
- A new UAV cross-view geo-localization model, VRLM, is proposed. It extracts multi-scale features from UAV and satellite images through the first three stages of a FocalNet backbone, which mitigates the severe loss of positional information caused by repeated downsampling, and it introduces an adaptive multi-scale feature weighted fusion module (SCFF) to retain as much image information as possible. Together, these improve both the running speed and the localization accuracy of the model (a sketch of this fusion idea follows this list).
- On the VDUAV dataset, using RDS as the evaluation metric, the proposed model improves accuracy from 67.07% (FPI) to 74.13%. Under the meter-level accuracy (MA) metric, it achieves localization accuracies of 45.13%, 64.72%, and 83.35% at the 5 m, 10 m, and 20 m levels, respectively.
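To make the fusion idea above concrete, the following is a minimal PyTorch sketch of an adaptive multi-scale weighted fusion layer. It is not the authors' released SCFF implementation: the class name, the choice of three stage resolutions (1/4, 1/8, 1/16), and the softmax-normalized per-stage weights are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedFusion(nn.Module):
    """Illustrative SCFF-style module: project multi-scale feature maps to a
    shared width, upsample them to the finest resolution, and blend them with
    softmax-normalized learned weights."""

    def __init__(self, stage_channels, out_channels):
        super().__init__()
        # 1x1 convolutions bring every stage to the same channel count.
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in stage_channels
        )
        # One learnable scalar per stage; softmax keeps the weights positive
        # and summing to 1, so the fusion stays a convex combination.
        self.stage_weights = nn.Parameter(torch.zeros(len(stage_channels)))

    def forward(self, feats):
        target = feats[0].shape[-2:]  # spatial size of the finest stage
        maps = [
            F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
            for p, f in zip(self.proj, feats)
        ]
        w = torch.softmax(self.stage_weights, dim=0)
        return sum(wi * m for wi, m in zip(w, maps))

# Dummy stage outputs at 1/4, 1/8, and 1/16 of a 384x384 input:
feats = [torch.randn(1, 96, 96, 96),
         torch.randn(1, 192, 48, 48),
         torch.randn(1, 384, 24, 24)]
fused = AdaptiveWeightedFusion([96, 192, 384], 64)(feats)
print(fused.shape)  # torch.Size([1, 64, 96, 96])
```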
2. Related Work
2.1. Geo-Location Datasets
2.2. Deep Learning-Based Geo-Localization Method for UAV
2.3. Transformer
3. Digital Twin Platform Building
3.1. UAV Tilt-Photography Modeling
- Pre-Flight Setup: Data were collected with a DJI M210 drone (Shenzhen DJI Innovation Technology Co., Ltd., Shenzhen, China). Prior to takeoff, the reconstruction area was defined, and the flight path was planned automatically from the required overlap rate to ensure comprehensive coverage. During the flight, an SRT file containing time-synchronized telemetry was recorded, and exposure compensation and other camera parameters were adjusted to keep the images sharp.
- Data Preprocessing: The raw video and its corresponding SRT files were organized, the video was exported frame by frame as images, and the waypoint data from the SRT files were embedded into the metadata of the corresponding images (see the sketch after this list). This step also included calibration of the drone camera's intrinsic parameters.
- 3D Data Optimization: The camera calibration data and images were imported into Metashape 2.0.4 to extract feature points from the images taken from five different viewpoints. Bundle adjustment was applied as a global adjustment to remove gross-error points; through repeated iterations of bundle adjustment and point-position refinement, optimized camera poses were obtained for each image.
- 3D Model Construction: The collected images and processed data were imported into ContextCapture [47], together with the optical properties of the sensor, including its dimensions and focal length. After this information was verified, image matching was performed, followed by aerial triangulation. Ground control points were then imported, manually associated with the images, and adjusted. Finally, the aerial triangulation was recomputed and, once verified, a 3D model and an orthophoto were generated.
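As referenced in the data-preprocessing step above, the following is a minimal Python sketch of frame export and waypoint embedding, assuming OpenCV for video decoding and the piexif library for EXIF writing. The function names and the fixed sampling step are hypothetical, and the parsing of the DJI SRT log itself is omitted.

```python
import cv2     # OpenCV, for reading the flight video frame by frame
import piexif  # EXIF read/write; assumes `pip install piexif`

def to_dms_rational(deg):
    """Convert decimal degrees to the EXIF degrees/minutes/seconds rationals."""
    d = int(deg)
    m = int((deg - d) * 60)
    s = round(((deg - d) * 60 - m) * 60 * 100)
    return ((d, 1), (m, 1), (s, 100))

def embed_waypoint(jpeg_path, lat, lon):
    """Write a latitude/longitude pair (parsed from the SRT log) into a JPEG's EXIF GPS tags."""
    gps = {
        piexif.GPSIFD.GPSLatitudeRef: b"N" if lat >= 0 else b"S",
        piexif.GPSIFD.GPSLatitude: to_dms_rational(abs(lat)),
        piexif.GPSIFD.GPSLongitudeRef: b"E" if lon >= 0 else b"W",
        piexif.GPSIFD.GPSLongitude: to_dms_rational(abs(lon)),
    }
    piexif.insert(piexif.dump({"GPS": gps}), jpeg_path)

def export_frames(video_path, out_pattern, step=30):
    """Save every `step`-th frame as a JPEG (e.g., one per second at 30 fps)."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(out_pattern.format(saved), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```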
3.2. Model Import
3.3. Accurate Mapping of Real and Virtual Space
4. VDUAV Dataset
4.1. Dataset Creation
4.2. Drone–Satellite Map Data Pairs
5. Methods
5.1. Deep Learning Modeling Framework
5.2. Backbone Network
5.3. Similarity Computation Multi-Feature Fusion Module
6. Experiment
6.1. Experimental Details
6.2. Datasets and Evaluation Indicators
6.3. Main Results
6.3.1. Comparative Analysis of Positioning Methods
6.3.2. Comparison of Different Datasets
7. Controlled Experiment and Analysis
7.1. Comparative Experiments on Backbone Networks
7.2. Comparative Experiments of Fusion Methods
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mohsan, S.A.H.; Othman, N.Q.H.; Khan, M.A.; Amjad, H.; Żywiołek, J. A comprehensive review of micro UAV charging techniques. Micromachines 2022, 13, 977.
- Mohsan, S.A.H.; Khan, M.A.; Noor, F.; Ullah, I.; Alsharif, M.H. Towards the unmanned aerial vehicles (UAVs): A comprehensive review. Drones 2022, 6, 147.
- Grenier, A.; Lohan, E.S.; Ometov, A.; Nurmi, J. A survey on low-power GNSS. IEEE Commun. Surv. Tutor. 2023, 25, 1482–1509.
- Rodriguez-Alvarez, N.; Munoz-Martin, J.F.; Morris, M. Latest advances in the global navigation satellite system—Reflectometry (GNSS-R) field. Remote Sens. 2023, 15, 2157.
- Sonugür, G. A review of quadrotor UAV: Control and SLAM methodologies ranging from conventional to innovative approaches. Robot. Auton. Syst. 2023, 161, 104342.
- Luo, H.; Li, G.; Zou, D.; Li, K.; Li, X.; Yang, Z. UAV navigation with monocular visual inertial odometry under GNSS-denied environment. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1001615.
- Gyagenda, N.; Hatilima, J.V.; Roth, H.; Zhmud, V. A review of GNSS-independent UAV navigation techniques. Robot. Auton. Syst. 2022, 152, 104069.
- Rezwan, S.; Choi, W. Artificial intelligence approaches for UAV navigation: Recent advances and future challenges. IEEE Access 2022, 10, 26320–26339.
- Couturier, A.; Akhloufi, M.A. A review on absolute visual localization for UAV. Robot. Auton. Syst. 2021, 135, 103666.
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8922–8931.
- Li, X.; Zhu, R.; Yu, X.; Wang, X. High-performance detection-based tracker for multiple object tracking in UAVs. Drones 2023, 7, 681.
- Catalano, I.; Yu, X.; Queralta, J.P. Towards robust UAV tracking in GNSS-denied environments: A multi-LiDAR multi-UAV dataset. In Proceedings of the 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO), Koh Samui, Thailand, 4–9 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–7.
- Ye, J.; Fu, C.; Cao, Z.; An, S.; Zheng, G.; Li, B. Tracker meets night: A transformer enhancer for UAV tracking. IEEE Robot. Autom. Lett. 2022, 7, 3866–3873.
- Kang, X.; Shao, Y.; Bai, G.; Sun, H.; Zhang, T.; Wang, D. Dual-UAV collaborative high-precision passive localization method based on optoelectronic platform. Drones 2023, 7, 646.
- Delibasoglu, I. UAV images dataset for moving object detection from moving cameras. arXiv 2021, arXiv:2103.11460.
- Elashry, A.; Toth, C. A novel approach to image retrieval for vision-based positioning utilizing graph topology. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2024, 10, 49–56.
- Dai, M.; Chen, J.; Lu, Y.; Hao, W.; Zheng, E. Finding point with image: An end-to-end benchmark for vision-based UAV localization. arXiv 2022, arXiv:2208.06561.
- Lin, T.Y.; Cui, Y.; Belongie, S.; Hays, J. Learning deep representations for ground-to-aerial geolocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5007–5015.
- Zhai, M.; Bessinger, Z.; Workman, S.; Jacobs, N. Predicting ground-level scene layout from aerial imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 867–875.
- Tian, Y.; Chen, C.; Shah, M. Cross-view image matching for geo-localization in urban environments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3608–3616.
- Shi, Y.; Liu, L.; Yu, X.; Li, H. Spatial-aware feature aggregation for image-based cross-view geo-localization. In Advances in Neural Information Processing Systems; IEEE Computer Society: Los Alamitos, CA, USA, 2019; Volume 32.
- Shi, Y.; Yu, X.; Liu, L.; Zhang, T.; Li, H. Optimal feature transport for cross-view image geo-localization. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11990–11997.
- Shi, Y.; Yu, X.; Campbell, D.; Li, H. Where am I looking at? Joint location and orientation estimation by cross-view matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4064–4072.
- Zhu, S.; Yang, T.; Chen, C. VIGOR: Cross-view image geo-localization beyond one-to-one retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3640–3649.
- Zheng, Z.; Wei, Y.; Yang, Y. University-1652: A multi-view multi-source benchmark for drone-based geo-localization. In Proceedings of the 28th ACM International Conference on Multimedia, Virtual Event/Seattle, WA, USA, 12–16 October 2020; pp. 1395–1403.
- Dai, M.; Zheng, E.; Feng, Z.; Qi, L.; Zhuang, J.; Yang, W. Vision-based UAV self-positioning in low-altitude urban environments. IEEE Trans. Image Process. 2023, 33, 493–508.
- Zhu, R.; Yin, L.; Yang, M.; Wu, F.; Yang, Y.; Hu, W. SUES-200: A multi-height multi-scene cross-view image benchmark across drone and satellite. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4825–4839.
- Vo, N.N.; Hays, J. Localizing and orienting street views using overhead imagery. In Proceedings of Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 494–509.
- Workman, S.; Souvenir, R.; Jacobs, N. Wide-area image geolocalization with aerial reference imagery. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3961–3969.
- Li, H.; Wang, J.; Wei, Z.; Xu, W. Jointly optimized global-local visual localization of UAVs. arXiv 2023, arXiv:2310.08082.
- Xu, W.; Yao, Y.; Cao, J.; Wei, Z.; Liu, C.; Wang, J.; Peng, M. UAV-VisLoc: A large-scale dataset for UAV visual localization. arXiv 2024, arXiv:2405.11936.
- Ding, L.; Zhou, J.; Meng, L.; Long, Z. A practical cross-view image matching method between UAV and satellite for UAV-based geo-localization. Remote Sens. 2020, 13, 47.
- Tian, X.; Shao, J.; Ouyang, D.; Shen, H.T. UAV-satellite view synthesis for cross-view geo-localization. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4804–4815.
- Mughal, M.H.; Khokhar, M.J.; Shahzad, M. Assisting UAV localization via deep contextual image matching. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2445–2457.
- Cui, Z.; Zhou, P.; Wang, X.; Zhang, Z.; Li, Y.; Li, H.; Zhang, Y. A novel geo-localization method for UAV and satellite images using cross-view consistent attention. Remote Sens. 2023, 15, 4667.
- Vaswani, A. Attention is all you need. In Advances in Neural Information Processing Systems, 2017. Available online: https://user.phil.hhu.de/~cwurm/wp-content/uploads/2020/01/7181-attention-is-all-you-need.pdf (accessed on 1 August 2024).
- Devlin, J. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Hu, S.; Feng, M.; Nguyen, R.M.; Lee, G.H. CVM-Net: Cross-view matching network for image-based ground-to-aerial geo-localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7258–7267.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Dosovitskiy, A. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 10347–10357.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. PVT v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424.
- Sun, P.; Cao, J.; Jiang, Y.; Zhang, R.; Xie, E.; Yuan, Z.; Wang, C.; Luo, P. TransTrack: Multiple object tracking with transformer. arXiv 2020, arXiv:2012.15460.
- Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8126–8135.
- Wang, G.; Chen, J.; Dai, M.; Zheng, E. WAMF-FPI: A weight-adaptive multi-feature fusion network for UAV localization. Remote Sens. 2023, 15, 910.
- Saglam, İ.E.; Karasaka, L. Evaluation of multi-camera images in different SfM-MVS based photogrammetric software and comparison of digital products in generating 3D city models. Ain Shams Eng. J. 2024, 15, 102700.
| Dataset | Images | Sampling | Target | Source | Platform | Evaluation | Coverage |
|---|---|---|---|---|---|---|---|
| VDUAV | 12.4k | Dense | UAV | Virtual reality scene | Virtual drone-satellite | RDS & MA | 5 provinces |
| UL14 | 10k | Dense | UAV | Real scenes | Drone-satellite | RDS & MA | 14 universities |
| DenseUAV | 20.3k | Dense | UAV | Real scenes | Drone-satellite | SDM | 14 universities |
| SUES-200 | 6.1k | Dense | UAV | Real scenes | Drone-satellite | Recall@K & AP | 1 university |
| University-1652 | 50.2k | Discrete | Building | Google Map | Drone-ground-satellite | Recall@K & AP | 1652 buildings of 72 universities |
| VIGOR | 144k | Discrete | User | Google Map | Ground-aerial | MA | 4 U.S. states |
| CVUSA | 71k | Discrete | User | Google Map | Ground-satellite | Recall@K | United States |
Number | Scenes | Training (Count) | Testing (Count) |
---|---|---|---|
1 | City | 2156 | 693 |
2 | Plain | 1667 | 574 |
3 | Hill | 1196 | 292 |
4 | Factory | 1354 | 433 |
5 | University | 2892 | 1153 |
Total | Multiple Scenes | 9265 | 3145 |
Split | UAV (Count) | Satellite (Count) |
---|---|---|
Train | 9265 | 9265 |
Test | 3145 | 37,740 |
| Model | RDS (%) | GFLOPs | Params (M) | MA@5 (%) | MA@10 (%) | MA@20 (%) |
|---|---|---|---|---|---|---|
| FPI | 67.07 | 12.66 | 42.57 | 31.81 | 52.72 | 71.97 |
| WAMF-FPI | 70.48 | 12.04 | 34.69 | 40.27 | 60.31 | 78.49 |
| VRLM | 74.13 | 10.28 | 21.79 | 45.13 | 64.72 | 83.35 |
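For reference, here is a minimal sketch of the two evaluation metrics in the table above. The meter-level accuracy (MA@K) computation is straightforward; the RDS formula shown (exponential decay of the size-normalized pixel error with k = 10) follows our reading of the FPI benchmark and should be treated as an assumption rather than the exact published definition.

```python
import numpy as np

def rds(pred_xy, true_xy, img_wh, k=10.0):
    """Relative Distance Score (assumed form): exponential decay of the
    prediction error after normalizing by the image width and height."""
    dx = (pred_xy[0] - true_xy[0]) / img_wh[0]
    dy = (pred_xy[1] - true_xy[1]) / img_wh[1]
    return float(np.exp(-k * np.sqrt((dx**2 + dy**2) / 2)))

def meter_accuracy(errors_m, thresholds=(5, 10, 20)):
    """MA@K: fraction of test samples localized within K meters of the truth."""
    errors = np.asarray(errors_m, dtype=float)
    return {t: float((errors <= t).mean()) for t in thresholds}

# Example: three test samples with 3 m, 12 m, and 25 m of error.
print(meter_accuracy([3.0, 12.0, 25.0]))  # {5: 0.33..., 10: 0.33..., 20: 0.66...}
```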
| DeiT | PVT | PCPVT | FocalNet (Ours) | RDS (%) |
|---|---|---|---|---|
| ✓ | | | | 62.94 |
| | ✓ | | | 66.29 |
| | | ✓ | | 74.13 |
| | | | ✓ | 74.13 |
| FPN | ASFF | SCFF (Ours) | RDS (%) |
|---|---|---|---|
| ✓ | | | 62.94 |
| | ✓ | | 66.29 |
| | | ✓ | 74.13 |