A Cross-View Image Matching Method with Feature Enhancement
Abstract
1. Introduction
- (1) A combination of cross-attention and cross-convolution is designed for cross-view geo-localization with remote sensing images, which often contain complex backgrounds and redundant information. Using image edge information, it filters out features that do not help matching, such as cars and clouds.
- (2) A deeper network and a feature fusion module are introduced to strengthen the model's use of multi-scale information and improve the scale robustness of the extracted features. The experiments show that deep fused features play an important role in cross-view image matching.
- (3) A model that effectively improves the accuracy of cross-view matching is proposed.
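As a rough illustration of the cross-convolution idea in contribution (1), the sketch below applies a horizontal 1 × 3 filter and a vertical 3 × 1 filter in parallel and sums the two responses, which emphasizes edges along both image axes. It assumes the factorized-filter formulation associated with Cross-SRN [20]. The function names `conv2d_same` and `cross_conv` and the fixed difference kernels are hypothetical (in a network the filter weights would be learned), so this is a sketch and not the authors' implementation.

```python
import numpy as np


def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation with odd-sized kernels (output keeps the input size)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    height, width = image.shape
    for di in range(kh):
        for dj in range(kw):
            # Shifted view of the padded image, weighted by one kernel tap.
            out += kernel[di, dj] * padded[di:di + height, dj:dj + width]
    return out


def cross_conv(image, horizontal, vertical):
    """Sum of a 1 x k horizontal response and a k x 1 vertical response (assumed form)."""
    row_response = conv2d_same(image, horizontal.reshape(1, -1))
    col_response = conv2d_same(image, vertical.reshape(-1, 1))
    return row_response + col_response


# Toy input: a vertical step edge (hypothetical data, not from the paper).
step = np.zeros((5, 6))
step[:, 3:] = 1.0
diff = np.array([-1.0, 0.0, 1.0])  # simple difference filter standing in for learned weights
edges = cross_conv(step, diff, diff)
print(edges[2])  # interior row: non-zero only near the step and at the zero-padded border
```

Because each branch is one-dimensional, the pair uses fewer weights than a full k × k filter while still responding to horizontal and vertical structure.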
2. Related Work
3. Methods
3.1. Polar Coordinate Transformation and Cross-Attention
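A minimal sketch of a polar coordinate transformation is given here for orientation. It assumes the mapping widely used in earlier cross-view work such as SAFA [13], where the centre of a square aerial image corresponds to the camera position, output rows correspond to distance from the centre, and output columns correspond to azimuth. The function name `polar_transform`, the nearest-neighbour sampling, and the output size are assumptions for illustration; the exact mapping used in this paper may differ.

```python
import numpy as np


def polar_transform(aerial, out_h, out_w):
    """Resample a square aerial image into a panorama-like polar layout.

    Assumed mapping (SAFA-style): for output pixel (i, j),
        radius  = (S / 2) * (out_h - i) / out_h
        theta   = 2 * pi * j / out_w
        src_row = S / 2 - radius * cos(theta)
        src_col = S / 2 + radius * sin(theta)
    where S is the aerial image side length.
    """
    size = aerial.shape[0]
    if aerial.shape[1] != size:
        raise ValueError("expected a square aerial image")

    rows = np.arange(out_h).reshape(-1, 1)  # radial axis
    cols = np.arange(out_w).reshape(1, -1)  # angular axis
    radius = (size / 2.0) * (out_h - rows) / out_h
    theta = 2.0 * np.pi * cols / out_w

    src_row = size / 2.0 - radius * np.cos(theta)
    src_col = size / 2.0 + radius * np.sin(theta)

    # Nearest-neighbour sampling, clipped to the image bounds.
    src_row = np.clip(np.rint(src_row), 0, size - 1).astype(int)
    src_col = np.clip(np.rint(src_col), 0, size - 1).astype(int)
    return aerial[src_row, src_col]


# Hypothetical usage on a random 3-channel image.
aerial = np.random.rand(64, 64, 3)
panorama_like = polar_transform(aerial, out_h=16, out_w=64)
print(panorama_like.shape)  # (16, 64, 3)
```

Bilinear interpolation would give smoother results than nearest-neighbour sampling; it is omitted here to keep the sketch short.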
3.2. Image Cross-Feature Enhancement and Multi-Scale Information Fusion
3.3. Cross-View Image Matching Strategy
4. Experimental Data and Evaluation Metrics
4.1. Datasets and Experimental Details
4.2. Evaluation Metrics
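The r@1, r@5, r@10, and r@1% columns reported in the result tables are top-K recall rates. The sketch below shows one common way to compute them from a query-by-gallery similarity matrix, assuming a one-to-one setting in which the true match of ground query `i` is aerial image `i`. The function names and the rounding rule for the 1% cutoff are assumptions, not taken from the paper.

```python
def true_match_ranks(similarity):
    """Return the 1-based rank of the true match for each query.

    similarity[i][j] is the score between ground query i and aerial image j;
    the true match of query i is assumed to be aerial image i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        better = sum(1 for score in row if score > row[i])
        ranks.append(better + 1)
    return ranks


def recall_at_k(ranks, k):
    """Fraction of queries whose true match is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def recall_at_one_percent(ranks, gallery_size):
    """Recall with k set to 1% of the gallery (floor, at least 1; an assumed convention)."""
    k = max(1, gallery_size // 100)
    return recall_at_k(ranks, k)


# Hypothetical 3 x 3 similarity matrix.
similarity = [
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.3],
    [0.1, 0.2, 0.3],
]
ranks = true_match_ranks(similarity)
print(ranks)                  # [1, 2, 1]
print(recall_at_k(ranks, 2))  # 1.0
```

Multiplying the returned fractions by 100 gives percentages in the form used in the tables.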
5. Results
5.1. Comparison with State-of-the-Art Methods
5.2. Ablation Experiments
6. Discussion
6.1. Comparison with Other Methods in Different Datasets
6.2. Comparison with State-of-the-Art Methods
6.3. Discussion Related to Ablation Experiments
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Choi, J.; Friedland, G. Multimodal Location Estimation of Videos and Images; Springer: Cham, Switzerland, 2014.
- Hays, J.; Efros, A.A. IM2GPS: Estimating geographic information from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Lin, T.-Y.; Cui, Y.; Belongie, S.; Hays, J. Learning deep representations for ground-to-aerial geolocalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5007–5015.
- Vo, N.N.; Hays, J. Localizing and Orienting Street Views Using Overhead Imagery. In European Conference on Computer Vision (ECCV); Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 494–509.
- Hu, S.; Feng, M.; Nguyen, R.M.H.; Lee, G.H. CVM-net: Cross-view matching network for image-based ground-to-aerial geo-localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7258–7267.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1437–1451.
- Shi, Y.; Yu, X.; Liu, L.; Zhang, T.; Li, H. Optimal feature transport for cross-view image geo-localization. Proc. Conf. AAAI Artif. Intell. 2020, 34, 11990–11997.
- Wang, T.; Zheng, Z.; Yan, C.; Zhang, J.; Sun, Y.; Zheng, B.; Yang, Y. Each part matters: Local patterns facilitate cross-view geo-localization. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 867–879.
- Liu, L.; Li, H. Lending Orientation to Neural Networks for Cross-View Geo-Localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 20 June 2019; pp. 5617–5626.
- Zhu, S.J.; Yang, T.; Chen, C. VIGOR: Cross-view image geo-localization beyond one-to-one retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5316–5325.
- Zhu, S.J.; Yang, T.; Chen, C. Revisiting street-to-aerial view image geo-localization and orientation estimation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 756–765.
- Shi, Y.; Yu, X.; Campbell, D.; Li, H. Where am I looking at? Joint location and orientation estimation by cross-view matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
- Shi, Y.; Liu, L.; Yu, X.; Li, H. Spatial-aware feature aggregation for image based cross-view geo-localization. Adv. Neural Inf. Process. Syst. 2019, 32, 10090–10100.
- Toker, A.; Zhou, Q.; Maximov, M.; Leal-Taixe, L. Coming down to earth: Satellite-to-street view synthesis for geo-localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6484–6493.
- Mousavian, A.; Kosecka, J. Semantic image based geolocation given a map. arXiv 2016, arXiv:1609.00278.
- Regmi, K.; Borji, A. Cross-view image synthesis using conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3501–3510.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947.
- Liu, Y.; Jia, Q.; Fan, X.; Wang, S.; Ma, S.; Gao, W. Cross-SRN: Structure-Preserving Super-Resolution Network with Cross Convolution. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4927–4939.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Workman, S.; Souvenir, R.; Jacobs, N. Wide-area image geolocalization with aerial reference imagery. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3961–3969.
- Zhai, M.; Bessinger, Z.; Workman, S.; Jacobs, N. Predicting ground-level scene layout from aerial imagery. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 867–875.
- Sun, B.; Chen, C.; Zhu, Y.; Jiang, J. Geocapsnet: Aerial to ground view image geo-localization using capsule network. arXiv 2019, arXiv:1904.06281.
- Regmi, K.; Shah, M. Bridging the domain gap for ground-to-aerial image matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 470–479.
- Cai, S.; Guo, Y.; Khan, S.; Hu, J.; Wen, G. Ground-to-aerial image geo-localization with a hard exemplar reweighting triplet loss. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019.
- Lin, J.; Zheng, Z.; Zhong, Z.; Luo, Z.; Li, S.; Yang, Y.; Sebe, N. Joint Representation Learning and Keypoint Detection for Cross-view Geo-localization. IEEE Trans. Image Process. 2022, 31, 3780–3792.
- Zhu, S.; Shah, M.; Chen, C. Transgeo: Transformer is all you need for cross-view image geo-localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1162–1171.
Dataset | CVUSA | Vo and Hays | VIGOR | CVUSA- |
---|---|---|---|---|
Spatial resolution (m) | ~0.6 | Sub-meter | 0.114 | ~0.3 |
Number of image pairs | 220,000 | 341,809 | 40,000 | 44,416 |
Aerial image size (pixels) | 800 × 800 | 354 × 354 | 640 × 640 | 750 × 750 |
Ground image size (pixels) | 3072 × 1536 | 230 × 230 | 2048 × 1024; 832 × 1664 | 1232 × 224 |
Area | Across the USA | Boston; Houston | Manhattan; Chicago; | Across the USA |
Panoramic images | √ | × | √ | √ |
Seamless coverage | × | × | √ | × |
One-to-one | √ | √ | × | √ |
Method | r@1 | r@5 | r@10 | r@1% |
---|---|---|---|---|
Workman et al. [22] | — | — | — | 34.30 |
Zhai et al. [23] | — | — | — | 43.20 |
Vo and Hays [4] | — | — | — | 63.70 |
CVM-Net [5] | 22.47 | 49.98 | 63.18 | 93.62 |
Liu and Li [9] | 40.79 | 66.82 | 76.36 | 96.12 |
Regmi and Shah [25] | 48.75 | — | 81.27 | 95.98 |
Siam-FCANet34 [26] | — | — | — | 98.30 |
CVFT [7] | 61.43 | 84.69 | 90.49 | 99.02 |
DSM [12] | 91.96 | 97.50 | 98.54 | 99.67 |
SAFA [13] | 89.84 | 96.93 | 98.14 | 99.64 |
RK-Net [27] | 91.22 | — | — | 99.67 |
TransGeo [28] | 94.08 | 98.36 | — | 99.77 |
Ours | 92.23 | 98.41 | 99.12 | 99.73 |
Method | r@1 | r@5 | r@10 | r@1% |
---|---|---|---|---|
VGG | 33.17 | 60.68 | 74.23 | 91.83 |
Res | 34.21 | 65.44 | 78.33 | 92.21 |
Polar + VGG | 84.95 | 93.40 | 97.63 | 98.92 |
Polar + Res | 88.94 | 95.62 | 98.67 | 99.09 |
Polar + Res + Cross conv | 85.88 | 95.09 | 97.30 | 98.80 |
Polar + VGG + Cross conv | 82.77 | 90.64 | 92.14 | 97.85 |
Polar + Res + FFM | 90.03 | 96.84 | 98.96 | 99.23 |
Polar + VGG + FFM | 85.97 | 96.11 | 97.24 | 98.88 |
Polar + VGG + Cross conv + FFM | 88.96 | 95.59 | 96.88 | 99.28 |
Polar + Res + Cross conv + FFM (ours) | 92.89 | 98.47 | 99.24 | 99.74 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rao, Z.; Lu, J.; Li, C.; Guo, H. A Cross-View Image Matching Method with Feature Enhancement. Remote Sens. 2023, 15, 2083. https://doi.org/10.3390/rs15082083