Attention-Based Matching Approach for Heterogeneous Remote Sensing Images
Abstract
1. Introduction
- (1) A feature point-matching method based on the attention mechanism is proposed. CSAM first encodes each feature point's position and descriptor: the horizontal and vertical coordinates are encoded separately and then concatenated, so that the algorithm exploits context and information from surrounding points during feature aggregation. To improve aggregation efficiency, linear attention is introduced. In the deeper network layers, self-attention focuses on surrounding reliable points, whereas cross-attention focuses on the corresponding matching points. Finally, the differentiable Sinkhorn algorithm [51] optimizes the matching, and a loss function is designed for supervised training. (A minimal sketch of these components follows this list.)
- (2) A method for building a training dataset of heterogeneous remote sensing images is presented, and a dataset containing 700,000 image pairs is generated for training and testing. Each pair comprises a simulated UAV image and a satellite orthophoto with pixel-level correspondence. The dataset spans a period of time with obvious changes in the ground objects and covers typical environments such as cities, suburbs, rural areas, and jungles. The original satellite images used to generate the dataset are available at https://drive.google.com/drive/folders/1EJIxYZthkN7GbfQNfdD1l8D5ADsl3WqX?usp=sharing (accessed on 9 October 2022).
- (3) The attention mechanism is visualized, and the differences between self-attention and cross-attention aggregation for matched point pairs versus unmatched points are analyzed as the network deepens. The analysis confirms the effectiveness of the attention mechanism, and the precision-recall (PRC) curves of CSAM under different numbers of attention layers are studied to determine the optimal configuration.
- (4) Comparative experiments against the NNDR, FSC, GMS, OA-Net, and SuperGlue algorithms are conducted on the test dataset to verify the effectiveness of CSAM. A comparison with RIFT and CFOG on multimodal images tests the generalization ability of CSAM, and a multimodal image registration experiment tests its efficiency in practical applications. The experiments show that CSAM is highly robust and efficient and therefore offers a reasonable solution to heterogeneous remote sensing image matching.
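The components named in contribution (1) can be sketched compactly. The NumPy snippet below shows one plausible form of each building block: the split x/y coordinate encoding that is concatenated and fused with the descriptor, a linear-complexity attention step in the spirit of efficient attention [54], and Sinkhorn normalization [51,55] of a score matrix into a soft assignment. The function names, single-layer tanh encoders, weight shapes, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode_keypoints(xy, desc, Wx, Wy, Wf):
    """Hypothetical encoder: embed x and y separately, concatenate the two
    codes, add them to the descriptor, and fuse with a linear layer."""
    px = np.tanh(xy[:, 0:1] @ Wx)            # (N, d/2) x-coordinate code
    py = np.tanh(xy[:, 1:2] @ Wy)            # (N, d/2) y-coordinate code
    pos = np.concatenate([px, py], axis=1)   # (N, d) concatenated position code
    return (desc + pos) @ Wf                 # (N, d) fused feature

def linear_attention(Q, K, V):
    """Linear-complexity attention in the spirit of efficient attention [54]:
    normalize queries over channels and keys over tokens, then compute
    Q @ (K^T V) instead of (Q K^T) @ V, avoiding the N x N matrix."""
    Qn = softmax(Q, axis=1)   # (N, d), normalized over channels
    Kn = softmax(K, axis=0)   # (M, d), normalized over tokens
    return Qn @ (Kn.T @ V)    # (N, d) aggregated messages, O(N d^2)

def sinkhorn(scores, n_iters=50, eps=0.1):
    """Sinkhorn-Knopp iteration [51,55]: alternately normalize the rows and
    columns of exp(scores / eps) toward a doubly stochastic assignment."""
    P = np.exp(scores / eps)
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # row normalization
        P = P / P.sum(axis=0, keepdims=True)  # column normalization
    return P

# Toy run: encode and aggregate two small point sets, then soft-match them.
rng = np.random.default_rng(0)
N, d = 6, 8
Wx = rng.standard_normal((1, d // 2))
Wy = rng.standard_normal((1, d // 2))
Wf = rng.standard_normal((d, d))
feats = []
for _ in range(2):                                  # two "images"
    xy, desc = rng.random((N, 2)), rng.random((N, d))
    f = encode_keypoints(xy, desc, Wx, Wy, Wf)
    f = f + linear_attention(f, f, f)               # one self-attention step
    feats.append(f / np.linalg.norm(f, axis=1, keepdims=True))  # bound scores
fa, fb = feats
assignment = sinkhorn(fa @ fb.T)                    # (N, N) soft matching matrix
print(assignment.round(2))
```

The point of the linear form is that Q @ (K^T V) costs O(N d^2) rather than the O(N^2 d) of standard attention, which is what makes dense per-keypoint aggregation affordable as the number of feature points grows.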
2. Methodology
2.1. The CSAM Architecture
2.2. Training Data Set
2.2.1. UAV Imagery
2.2.2. Satellite Orthophoto
3. Experiments
3.1. Training and Visualizing the Attention
3.2. Experiments with Attention Network Layers
3.3. Comparison Test for Different Algorithms
3.4. Multimodal Image Matching
3.5. Multimodal Image Registration
3.6. Limitation Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Zhang, Y. Understanding image fusion. Photogramm. Eng. Remote Sens. 2004, 70, 657–661.
2. Fang, Y.; Li, P.; Zhang, J.; Ren, P. Cohesion intensive hash code book co-construction for efficiently localizing sketch depicted scenes. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5629016.
3. Radke, R.J.; Andra, S.; Al-Kofahi, O.; Roysam, B. Image change detection algorithms: A systematic survey. IEEE Trans. Image Process. 2005, 14, 294–307.
4. Tuia, D.; Marcos, D.; Camps-Valls, G. Multi-temporal and multi-source remote sensing image classification by nonlinear relative normalization. ISPRS J. Photogramm. Remote Sens. 2016, 120, 1–12.
5. Brunner, D.; Lemoine, G.; Bruzzone, L. Earthquake damage assessment of buildings using VHR optical and SAR imagery. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2403–2420.
6. Deren, L. Towards geo-spatial information science in big data era. Acta Geod. Cartogr. Sin. 2016, 45, 379.
7. Jiang, X.; Ma, J.; Xiao, G.; Shao, Z.; Guo, X. A review of multimodal image matching: Methods and applications. Inf. Fusion 2021, 73, 22–71.
8. Ma, J.; Jiang, X.; Fan, A.; Jiang, J.; Yan, J. Image matching from handcrafted to deep features: A survey. Int. J. Comput. Vis. 2020, 129, 23–79.
9. Mikolajczyk, K.; Schmid, C. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1615–1630.
10. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
11. Chum, O.; Matas, J.; Kittler, J. Locally optimized RANSAC. In Proceedings of the Joint Pattern Recognition Symposium, Madison, WI, USA, 18–20 June 2003; pp. 236–243.
12. Wu, Y.; Ma, W.; Gong, M.; Su, L.; Jiao, L. A novel point-matching algorithm based on fast sample consensus for image registration. IEEE Geosci. Remote Sens. Lett. 2014, 12, 43–47.
13. Torr, P.H.; Zisserman, A. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 2000, 78, 138–156.
14. Ni, K.; Jin, H.; Dellaert, F. GroupSAC: Efficient consensus in the presence of groupings. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2193–2200.
15. Pilet, J.; Lepetit, V.; Fua, P. Fast non-rigid surface detection, registration and realistic augmentation. Int. J. Comput. Vis. 2008, 76, 109–122.
16. Ma, J.; Zhao, J.; Tian, J.; Bai, X.; Tu, Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 2013, 46, 3519–3532.
17. Ma, J.; Zhao, J.; Tian, J.; Yuille, A.L.; Tu, Z. Robust point matching via vector field consensus. IEEE Trans. Image Process. 2014, 23, 1706–1721.
18. Ma, J.; Zhou, H.; Zhao, J.; Gao, Y.; Jiang, J.; Tian, J. Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6469–6481.
19. Ma, J.; Wu, J.; Zhao, J.; Jiang, J.; Zhou, H.; Sheng, Q.Z. Nonrigid point set registration with robust transformation learning under manifold regularization. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 3584–3597.
20. Bian, J.; Lin, W.-Y.; Matsushita, Y.; Yeung, S.-K.; Nguyen, T.-D.; Cheng, M.-M. GMS: Grid-based motion statistics for fast, ultra-robust feature correspondence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4181–4190.
21. Ma, J.; Jiang, J.; Zhou, H.; Zhao, J.; Guo, X. Guided locality preserving feature matching for remote sensing image registration. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4435–4447.
22. Ma, J.; Zhao, J.; Jiang, J.; Zhou, H.; Guo, X. Locality preserving matching. Int. J. Comput. Vis. 2019, 127, 512–531.
23. Jiang, X.; Ma, J.; Jiang, J.; Guo, X. Robust feature matching using spatial clustering with heavy outliers. IEEE Trans. Image Process. 2019, 29, 736–746.
24. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
25. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782.
26. Liu, R.; Yang, C.; Sun, W.; Wang, X.; Li, H. StereoGAN: Bridging synthetic-to-real domain gap by joint optimization of domain translation and stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12757–12766.
27. Sarlin, P.-E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4938–4947.
28. Brachmann, E.; Krull, A.; Nowozin, S.; Shotton, J.; Michel, F.; Gumhold, S.; Rother, C. DSAC: Differentiable RANSAC for camera localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6684–6692.
29. Ma, J.; Jiang, X.; Jiang, J.; Zhao, J.; Guo, X. LMR: Learning a two-class classifier for mismatch removal. IEEE Trans. Image Process. 2019, 28, 4045–4059.
30. Kluger, F.; Brachmann, E.; Ackermann, H.; Rother, C.; Yang, M.Y.; Rosenhahn, B. CONSAC: Robust multi-model fitting by conditional sample consensus. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4634–4643.
31. Yi, K.M.; Trulls, E.; Ono, Y.; Lepetit, V.; Salzmann, M.; Fua, P. Learning to find good correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2666–2674.
32. Zhang, J.; Sun, D.; Luo, Z.; Yao, A.; Zhou, L.; Shen, T.; Chen, Y.; Quan, L.; Liao, H. Learning two-view correspondences and geometry using order-aware network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5845–5854.
33. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
34. Zhao, X.; Zhang, J.; Yang, C.; Song, H.; Shi, Y.; Zhou, X.; Zhang, D.; Zhang, G. Registration for optical multimodal remote sensing images based on FAST detection, window selection, and histogram specification. Remote Sens. 2018, 10, 663.
35. Uss, M.; Vozel, B.; Lukin, V.; Chehdi, K. Efficient discrimination and localization of multimodal remote sensing images using CNN-based prediction of localization uncertainty. Remote Sens. 2020, 12, 703.
36. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
37. Ye, Y.; Bruzzone, L.; Shan, J.; Bovolo, F.; Zhu, Q. Fast and robust matching for multimodal remote sensing image registration. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9059–9070.
38. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
39. Li, J.; Hu, Q.; Ai, M. RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform. IEEE Trans. Image Process. 2019, 29, 3296–3310.
40. Xiong, X.; Xu, Q.; Jin, G.; Zhang, H.; Gao, X. Rank-based local self-similarity descriptor for optical-to-SAR image matching. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1742–1746.
41. Shechtman, E.; Irani, M. Matching local self-similarities across images and videos. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
42. Chaozhen, L.; Wanjie, L.; Junming, Y.; Qing, X. Deep learning algorithm for feature matching of cross modality remote sensing images. Acta Geod. Cartogr. Sin. 2021, 50, 189.
43. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
44. Wang, H.; Zhu, Y.; Green, B.; Adam, H.; Yuille, A.; Chen, L.-C. Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 108–126.
45. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
46. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
47. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
48. Esser, P.; Rombach, R.; Ommer, B. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12873–12883.
49. Lee, J.; Lee, Y.; Kim, J.; Kosiorek, A.; Choi, S.; Teh, Y.W. Set Transformer: A framework for attention-based permutation-invariant neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 3744–3753.
50. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
51. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Adv. Neural Inf. Process. Syst. 2013, 26, 2292–2300.
52. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional sequence to sequence learning. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1243–1252.
53. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1263–1272.
54. Shen, Z.; Zhang, M.; Zhao, H.; Yi, S.; Li, H. Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2021; pp. 3531–3539.
55. Sinkhorn, R.; Knopp, P. Concerning nonnegative matrices and doubly stochastic matrices. Pac. J. Math. 1967, 21, 343–348.
56. Munkres, J. Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math. 1957, 5, 32–38.
57. Linder, W. Digital Photogrammetry; Springer: Berlin/Heidelberg, Germany, 2009; Volume 1.
58. Sheng, Y. Theoretical analysis of the iterative photogrammetric method to determining ground coordinates from photo coordinates and a DEM. Photogramm. Eng. Remote Sens. 2005, 71, 863–871.
59. Hou, H.; Xu, Q.; Lan, C.; Lu, W.; Zhang, Y.; Cui, Z.; Qin, J. UAV pose estimation in GNSS-denied environment assisted by satellite imagery deep learning features. IEEE Access 2020, 9, 6358–6367.
60. Brown, M.; Hua, G.; Winder, S. Discriminative learning of local image descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 43–57.
61. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
62. DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-supervised interest point detection and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 224–236.
63. Gesto-Diaz, M.; Tombari, F.; Gonzalez-Aguilera, D.; Lopez-Fernandez, L.; Rodriguez-Gonzalvez, P. Feature matching evaluation for multimodal correspondence. ISPRS J. Photogramm. Remote Sens. 2017, 129, 179–188.
| Local Features | Matcher | NCM | Precision (%) | PT (s) |
|---|---|---|---|---|
| SIFT | NNDR + FSC | 5 | 21.29 | 0.41 |
| RIFT | NNDR + FSC | 43 | 81.27 | 12.23 |
| CFOG | CFOG | 26 | 83.65 | 3.19 |
| SuperPoint | NNDR + FSC | 8 | 29.91 | 0.24 |
| SuperPoint | NNDR + GMS | 9 | 30.13 | 0.25 |
| SuperPoint | OA-Net | 6 | 10.12 | 2.60 |
| SuperPoint | SuperGlue | 35 | 34.52 | 0.14 |
| SuperPoint | CSAM | 183 | 81.29 | 0.13 |

NCM: number of correct matches; PT: processing time in seconds.
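As a rough sketch of how figures like those in the table above can be computed, the helper below (a hypothetical name and signature, not the paper's evaluation code) counts a predicted correspondence as correct when it appears exactly in a ground-truth set; published evaluations typically use a reprojection-error threshold instead of exact equality.

```python
def match_metrics(pred_pairs, gt_pairs):
    """Count correct matches (NCM) and precision for one image pair.
    pred_pairs, gt_pairs: iterables of (index_in_A, index_in_B) tuples."""
    gt = set(map(tuple, gt_pairs))
    ncm = sum(tuple(p) in gt for p in pred_pairs)
    precision = 100.0 * ncm / max(len(pred_pairs), 1)   # in percent
    return ncm, precision

# Example: 3 of 4 predicted correspondences appear in the ground truth.
pred = [(0, 0), (1, 1), (2, 5), (3, 3)]
gt = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
print(match_metrics(pred, gt))   # -> (3, 75.0)
```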
| Category | Reference Sensor | GSD (m) | Size (px) | Sensed Sensor | GSD (m) | Size (px) | Image Characteristic |
|---|---|---|---|---|---|---|---|
| Optical–SAR | ZY-3 PAN optical | 2.5 | 1000 × 1000 | GF-3 SL SAR | 3 | 1000 × 1000 | Different bands |
| Day–night | Optical | - | 1000 × 1000 | SNPP/VIIRS | - | 1000 × 1000 | Day–night |
| Optical–infrared | Optical | - | 800 × 800 | Infrared | - | 800 × 800 | Different bands with a certain rotation |
| Optical–infrared | UAV optical | - | 512 × 512 | UAV infrared | - | 512 × 512 | Different bands with more details |
| Optical–map | Google Earth | 15 | 500 × 500 | OpenStreetMap | 15 | 500 × 500 | Different data model |
| Optical–LiDAR | WorldView-2 optical | 2.0 | 500 × 500 | LiDAR depth | 2.0 | 500 × 500 | Different data model |
| Category | CFOG | RIFT + NNDR + FSC | SuperPoint + CSAM |
|---|---|---|---|
| Optical–SAR | 16 | 175 | 251 |
| Day–night | 27 | 158 | 72 |
| Optical–infrared | 0 | 47 | 358 |
| UAV Optical–infrared | 135 | 188 | 233 |
| Optical–map | 158 | 349 | 325 |
| Optical–LiDAR | 148 | 337 | 269 |

Values are NCM per category.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Hou, H.; Lan, C.; Xu, Q.; Lv, L.; Xiong, X.; Yao, F.; Wang, L. Attention-Based Matching Approach for Heterogeneous Remote Sensing Images. Remote Sens. 2023, 15, 163. https://doi.org/10.3390/rs15010163