Monocular-Vision-Based Moving Target Geolocation Using Unmanned Aerial Vehicle
Abstract
1. Introduction
- To avoid the limitations of the traditional methods, we propose a novel moving target geolocation framework based on monocular vision. In this framework, we design a learning-based corresponding point matching model to address the challenge of applying multiview geometry with a single monocular camera to geolocate a moving target (a minimal triangulation sketch follows this list).
- We then analyze the shortcomings of the base model and propose an enhanced model with two outputs, in which a row-ness loss and a column-ness loss are defined to achieve better performance. Moreover, we propose a coordinate mapping method that greatly reduces the error of corresponding point matching.
- To evaluate the proposed framework, we constructed a dataset of aerial images with corresponding-point annotations for training and evaluating the proposed learning-based models, and we verified the effectiveness of the method through experiments in both simulated and real environments.
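To make the multiview-geometry step concrete, the sketch below triangulates a target from two camera poses by taking the midpoint of the shortest segment between the two viewing rays. This is a generic textbook construction under assumed poses and ray directions, not the paper's implementation; the positions, directions, and function name are illustrative.

```python
import numpy as np

def triangulate_midpoint(p1, d1, p2, d2):
    """Estimate a 3D point from two viewing rays p1 + t*d1 and p2 + s*d2
    by taking the midpoint of the shortest segment between the rays."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = p1 - p2
    b = d1 @ d2
    denom = 1.0 - b * b          # ~0 when rays are parallel (too-short baseline)
    t = (b * (d2 @ w) - (d1 @ w)) / denom
    s = ((d2 @ w) - b * (d1 @ w)) / denom
    return 0.5 * ((p1 + t * d1) + (p2 + s * d2))

# Two assumed UAV camera positions (baseline ~30 m) observing one target.
p1, d1 = np.array([0.0, 0.0, 50.0]), np.array([0.1, 0.0, -1.0])
p2, d2 = np.array([30.0, 0.0, 50.0]), np.array([-0.5, 0.0, -1.0])
print(triangulate_midpoint(p1, d1, p2, d2))  # ~[5, 0, 0]: estimated target
```

The near-zero denominator for parallel rays is exactly why a long baseline between the two observation positions matters: the longer the baseline, the better conditioned the intersection.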
2. Related Work
2.1. Moving Target Geolocation
2.2. Corresponding Point Matching
3. Methods
3.1. Base Model
3.1.1. Siamese Subnetwork
3.1.2. Center-Ness Subnetwork
3.2. Enhanced Model
- Blank area in the search image. A long baseline threshold (the distance between the two camera positions in Figure 3) is beneficial for improving the accuracy of target geolocation. In practical applications, we therefore choose as long a threshold as possible, which pushes the corresponding points toward the edges of previous frames. The base model takes the search patch X centered on the corresponding point as its input; in this case, a large area of the search patch X is blank, which reduces the accuracy of corresponding point matching.
- Unreliable scoring mechanism. In the inference phase of the base model, the point with the highest score in the response map is selected and mapped back to the search image. Determining the final result from the highest-scoring point alone is unreliable because the model is not perfectly accurate; notably, in the base model, an error of 1 pixel in the response map corresponds to an error of approximately 24 pixels in the original image.
- Error in coordinate mapping. In the base model, the point with the highest score is selected and mapped back to the search image as the result. However, the response map is much smaller than the search image, so points in the response map can be mapped back to only a sparse subset of points in the search image, and the true corresponding point may not lie in that subset (see the coordinate mapping sketch after this list).
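To illustrate the scoring and mapping issues above, here is a minimal sketch of how an argmax peak in a low-resolution response map is mapped back to search-image coordinates. The stride of 24 follows the 1-pixel-to-24-pixel ratio quoted above; the function name, offset convention, and compensation values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

STRIDE = 24  # per the text, 1 response-map pixel ~ 24 search-image pixels

def peak_to_image(response, compensation=(0.0, 0.0)):
    """Map the highest-scoring response-map cell back to (x, y) in the
    search image (illustrative offset convention)."""
    r, c = np.unravel_index(np.argmax(response), response.shape)
    # Without compensation, only every STRIDE-th pixel is reachable, so the
    # true corresponding point may fall between these coarse grid points.
    return c * STRIDE + compensation[0], r * STRIDE + compensation[1]

resp = np.zeros((9, 9))
resp[4, 5] = 1.0                          # pretend the matcher peaked here
print(peak_to_image(resp))                # coarse grid point: (120, 96)
print(peak_to_image(resp, (-0.5, -2.5)))  # with a learned compensation value
```

A sub-pixel compensation value, as introduced in Section 3.2.3, shifts this coarse grid so that the mapped point is no longer restricted to multiples of the stride.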
3.2.1. Siamese Subnetwork
3.2.2. Row-Ness and Column-Ness Subnetwork
3.2.3. Compensation Value of Coordinate Mapping
4. Evaluation
4.1. Learning-Based Model
4.1.1. Training and Test Datasets
4.1.2. Results on the Test Dataset
4.2. Moving Target Geolocation Method
4.2.1. Evaluation in Simulation Environment
4.2.2. Evaluation in Real Environment
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Model | Inputs | Head Structure | Compensation Value | (Pixel) | |
---|---|---|---|---|---|
Base model | 2 | Cen | No | 8.79 | 0 |
CPointNet without Cen + Com | 3 | Row + Col | No | 6.63 | −2.16 |
CPointNet without Cen | 3 | Row + Col | Yes | 3.86 | −2.53 |
CPointNet | 3 | Row + Col + Cen | Yes | 3.41 | −0.48 |
Method | Metric | Coordinate | 50 m (FOV 110°) | 100 m (70°) | 150 m (60°) | 200 m (50°) | 250 m (35°) | 300 m (25°)
---|---|---|---|---|---|---|---|---
One-shot Method | MAE | X (m) | 1.267 | 1.610 | 2.298 | 2.761 | 3.517 | 4.331
 | | Y (m) | 1.230 | 1.647 | 2.090 | 2.907 | 3.535 | 3.985
 | | Z (m) | 3.948 | 3.813 | 3.779 | 4.045 | 4.071 | 3.956
 | | Position (m) | 4.683 | 5.202 | 5.566 | 6.521 | 7.433 | 8.241
 | STD | X (m) | 0.882 | 1.255 | 1.730 | 2.095 | 2.719 | 3.423
 | | Y (m) | 0.936 | 1.275 | 1.579 | 2.262 | 2.659 | 3.124
 | | Z (m) | 2.843 | 2.887 | 2.947 | 3.084 | 3.022 | 3.004
 | | Position (m) | 2.552 | 2.522 | 2.767 | 3.256 | 3.412 | 3.786
Our Method | MAE | X (m) | 0.672 | 0.950 | 1.108 | 1.515 | 2.036 | 2.644
 | | Y (m) | 0.669 | 0.878 | 1.105 | 1.514 | 2.064 | 2.630
 | | Z (m) | 2.829 | 3.012 | 3.305 | 3.851 | 4.575 | 5.337
 | | Position (m) | 2.915 | 3.212 | 3.946 | 4.691 | 5.766 | 7.179
 | STD | X (m) | 0.515 | 0.736 | 0.827 | 1.123 | 1.546 | 1.933
 | | Y (m) | 0.496 | 0.701 | 0.845 | 1.015 | 1.573 | 1.921
 | | Z (m) | 2.139 | 2.230 | 2.593 | 2.881 | 3.395 | 4.006
 | | Position (m) | 1.707 | 2.056 | 2.583 | 3.027 | 3.951 | 4.981
Number of Images | Flight Altitude | Flight Speed | |
---|---|---|---|---
4 | 2.8 m | 0.26 m/s | 2° | 0.2 m
Coordinate | MAE (m) | STD (m) | MAX (m) |
---|---|---|---|
X | 0.046 | 0.033 | 0.132 |
Y | 0.044 | 0.031 | 0.119 |
Z | 0.165 | 0.138 | 0.463 |