# Multi-Class Double-Transformation Network for SAR Image Registration


## Abstract


## 1. Introduction

- We treat each key point directly as a class to build a multi-class model of SAR image registration, which avoids the difficulty of constructing positive instances (matched-point pairs) required by the traditional two-class registration model.
- We design a double-transformation network with a coarse-to-precise structure, in which the key points of the two images are used, respectively, to train two sub-networks that alternately predict the matched key points in the other image. This addresses the inconsistency of categories between the training and testing sets.
- We design a precise-matching module that modifies the predictions of the two sub-networks to obtain consistent matched-points, introducing the nearest points of each key point to refine the predicted matched-points.
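As a hypothetical sketch of how "consistent matched-points" between the two sub-networks could be checked, the following cycle-consistency filter keeps a key point only when mapping it through one branch and back through the other returns close to where it started. This is a simplification for illustration; `net_rs`, `net_sr`, and the tolerance are illustrative stand-ins for the trained R-S and S-R branches, not the paper's exact module.

```python
import numpy as np

def cycle_consistent(ref_pts, net_rs, net_sr, tol=2.0):
    """Keep only matched-points that survive a round trip: a reference key
    point mapped to the sensed image (R-S branch) and back (S-R branch)
    must land within `tol` pixels of its starting position."""
    sensed = net_rs(ref_pts)   # R-S branch predictions, shape (N, 2)
    back = net_sr(sensed)      # S-R branch predictions, shape (N, 2)
    keep = np.linalg.norm(back - ref_pts, axis=1) <= tol
    return ref_pts[keep], sensed[keep]
```

Points whose two predictions disagree are dropped, leaving a consistent set of matched pairs for estimating the transformation.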

## 2. Related Works

#### 2.1. The Attention Mechanism

#### 2.2. The Transformer Model

## 3. The Proposed Method

#### 3.1. The Multi-Class Double-Transformation Networks

#### 3.1.1. Constructing Samples-Based Key Points

#### 3.1.2. Multi-Class Double-Transformation Networks

#### 3.2. The Precise-Matching Module

## 4. Experiments and Analyses

1. $RMS_{all}$ is the root mean square error of the registration result; $RMS_{all}\le 1$ means the registration reaches sub-pixel accuracy.
2. $N_{red}$ is the number of matched-point pairs. A higher value may be beneficial for obtaining a transformation matrix with better registration performance.
3. $RMS_{LOO}$ is the root mean square error obtained with the Leave-One-Out strategy: for each of the $N_{red}$ points, the error is computed from the remaining $N_{red}-1$ points, and $RMS_{LOO}$ is the average of all these errors.
4. $P_{quad}$ detects whether the retained feature points are evenly distributed over the quadrants; its value should be less than $95\%$.
5. $BPP(r)$ is the bad-point proportion among the obtained matched-point pairs, where a point whose residual exceeds a threshold $r$ is called a bad point.
6. $S_{kew}$ is the absolute value of the computed correlation coefficient. Note that the Spearman correlation coefficient is used when $N_{red}<20$; otherwise, the Pearson correlation coefficient is applied.
7. $S_{cat}$ is a statistical evaluation of the feature-point distribution over the entire image [43], which should be less than $95\%$.
8. $\varphi$ is a linear combination of the above seven indicators, calculated by
$$\varphi =\frac{1}{12}\left[2\times \left(\frac{1}{N_{red}}+RMS_{LOO}+BPP(1.0)+S_{cat}\right)+RMS_{all}+1.5\times \left(P_{quad}+S_{kew}\right)\right].$$
When $N_{red}<20$, $P_{quad}$ is not used, and the formula simplifies to
$$\varphi =\frac{1}{10.5}\left[2\times \left(\frac{1}{N_{red}}+RMS_{LOO}+BPP(1.0)+S_{cat}\right)+RMS_{all}+1.5\times S_{kew}\right].$$
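The combined indicator $\varphi$ can be computed directly from the seven metrics above; the sketch below encodes both variants of the formula, with the $N_{red}$ threshold deciding whether $P_{quad}$ enters.

```python
def combined_indicator(n_red, rms_all, rms_loo, bpp, s_cat, s_kew, p_quad=None):
    """Combined registration-quality indicator (lower is better)."""
    # Weighted sum shared by both variants of the formula.
    base = 2.0 * (1.0 / n_red + rms_loo + bpp + s_cat) + rms_all
    if n_red >= 20 and p_quad is not None:
        # Full formula: the quadrant-distribution term P_quad is available.
        return (base + 1.5 * (p_quad + s_kew)) / 12.0
    # Simplified formula used when N_red < 20 (P_quad not computed).
    return (base + 1.5 * s_kew) / 10.5
```

Plugging in, e.g., the STDT-Net row ($N_{red}=78$) or the DNN + RANSAC row ($N_{red}=8$) of the first comparison table reproduces the reported $\varphi$ values of 0.4122 and 0.4484.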

#### 4.1. Comparison and Analysis of the Experimental Results

**SIFT** mainly matches features by using the Euclidean distance ratio between the nearest and second-nearest neighbors of the corresponding features.
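This nearest/second-nearest criterion is Lowe's ratio test, which can be sketched as follows; the descriptor arrays and the 0.8 threshold are illustrative, not SIFT's full pipeline.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Lowe-style ratio test: accept (i, j) only when descriptor i's nearest
    neighbour in desc_b is markedly closer than its second-nearest."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distances
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

Ambiguous features, whose two closest candidates are nearly equidistant, are rejected rather than matched.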

**SAR-SIFT** is an improvement of the SIFT method that is more consistent with the characteristics of SAR images.

**VGG16-LS, ResNet50-LS and ViT-LS** are deep-learning-based classification methods.

**DNN + RANSAC** [12] constructs the training sample set by self-learning, and then uses a DNN to obtain matched image pairs.

**MSDF-Net** [16] uses a deep forest to construct multiple matching models based on multi-scale fusion to obtain the matched-point pairs, and then uses RANSAC to calculate the transformation matrix.
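Estimating a transformation matrix from matched-point pairs with RANSAC can be sketched in a few lines of NumPy; this is a generic illustration, not the implementation used in [16], and `n_iter`/`thresh` are illustrative choices.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src -> dst (both (N, 2)).
    Returns a (3, 2) matrix M so that [x, y, 1] @ M = [x', y']."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def ransac_affine(src, dst, n_iter=200, thresh=1.0, seed=0):
    """RANSAC: repeatedly fit an affine model to a minimal 3-point sample,
    keep the model with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    A = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(n_iter):
        idx = rng.choice(len(src), 3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        inliers = np.linalg.norm(A @ M - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```

Outlier pairs (bad matches) are excluded from the final least-squares fit, which is what makes the estimated transformation robust.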

**AdaSSIR** [25] is an adaptive self-supervised SAR image registration method that treats registration as a self-supervised learning problem and regards each key point as a category-independent instance, constructing a contrastive model to search for accurate matched points.

#### 4.2. The Visual Results of SAR Image Registration

#### 4.3. Analyses on the Precise-Matching Module

#### 4.4. Analyses on the Double-Transformation Network

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Fitch, J.P. Synthetic Aperture Radar; Springer Science & Business Media: New York, NY, USA, 2012.
2. Joyce, K.E.; Samsonov, S.V.; Levick, S.R.; Engelbrecht, J.; Belliss, S. Mapping and monitoring geological hazards using optical, LiDAR, and synthetic aperture RADAR image data. Nat. Hazards 2014, 73, 137–163.
3. Quartulli, M.; Olaizola, I.G. A review of EO image information mining. ISPRS J. Photogramm. Remote Sens. 2013, 75, 11–28.
4. Wang, Y.; Du, L.; Dai, H. Unsupervised SAR image change detection based on SIFT keypoints and region information. IEEE Geosci. Remote Sens. Lett. 2016, 13, 931–935.
5. Poulain, V.; Inglada, J.; Spigai, M. High-resolution optical and SAR image fusion for building database updating. IEEE Trans. Geosci. Remote Sens. 2011, 49, 2900–2910.
6. Byun, Y.; Choi, J.; Han, Y. An area-based image fusion scheme for the integration of SAR and optical satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2212–2220.
7. Moser, G.; Serpico, S.B. Unsupervised change detection from multichannel SAR data by Markovian data fusion. IEEE Trans. Geosci. Remote Sens. 2009, 47, 2114–2128.
8. Song, T.; Yi, S. Fast and accurate target detection based on multiscale saliency and active contour model for high-resolution SAR images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 5729–5744.
9. Giusti, E.; Ghio, S.; Oveis, A.H.; Martorella, M. Proportional similarity-based Openmax classifier for open set recognition in SAR images. Remote Sens. 2022, 14, 4665.
10. Schwind, P.; Suri, S.; Reinartz, P.; Siebert, A. Applicability of the SIFT operator to geometric SAR image registration. Int. J. Remote Sens. 2010, 31, 1959–1980.
11. Wang, S.H.; You, H.J.; Fu, K. BFSIFT: A novel method to find feature matches for SAR image registration. IEEE Geosci. Remote Sens. Lett. 2012, 9, 649–653.
12. Wang, S.; Quan, D.; Liang, X.; Ning, M.; Guo, Y.; Jiao, L. A deep learning framework for remote sensing image registration. ISPRS J. Photogramm. Remote Sens. 2018, 145, 148–164.
13. Dellinger, F.; Delon, J.; Gousseau, Y.; Michel, J.; Tupin, F. SAR-SIFT: A SIFT-like algorithm for SAR images. IEEE Trans. Geosci. Remote Sens. 2015, 53, 453–466.
14. Wu, Y.; Miao, Q.; Ma, W.; Gong, M.; Wang, S. PSOSAC: Particle swarm optimization sample consensus algorithm for remote sensing image registration. IEEE Geosci. Remote Sens. Lett. 2018, 15, 242–246.
15. Liu, Y.; Zhou, Y.; Zhou, Y.; Ma, L.; Wang, B.; Zhang, F. Accelerating SAR image registration using swarm-intelligent GPU parallelization. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5694–5703.
16. Mao, S.; Yang, J.; Gou, S.; Jiao, L.; Xiong, T.; Xiong, L. Multi-scale fused SAR image registration based on deep forest. Remote Sens. 2021, 13, 2227.
17. Zhang, S.; Sui, L.; Zhou, R.; Xun, Z.; Du, C.; Guo, X. Mountainous SAR image registration using image simulation and an L2E robust estimator. Sustainability 2022, 14, 9315.
18. Gong, M.; Zhao, S.; Jiao, L.; Tian, D.; Wang, S. A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information. IEEE Trans. Geosci. Remote Sens. 2013, 52, 4328–4338.
19. Sollers, J.J.; Buchanan, T.W.; Mowrer, S.M.; Hill, L.K.; Thayer, J.F. Comparison of the ratio of the standard deviation of the RR interval and the root mean squared successive differences (SD/rMSSD) to the low frequency-to-high frequency (LF/HF) ratio in a patient population and normal healthy controls. Biomed. Sci. Instrum. 2007, 43, 158–163.
20. Ma, W.; Zhang, J.; Wu, Y.; Jiao, L.; Zhu, H.; Zhao, W. A novel two-step registration method for remote sensing images based on deep and local features. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4834–4843.
21. Quan, D.; Wang, S.; Ning, M.; Xiong, T.; Jiao, L. Using deep neural networks for synthetic aperture radar image registration. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 2799–2802.
22. Ye, F.; Su, Y.; Xiao, H.; Zhao, X.; Min, W. Remote sensing image registration using convolutional neural network features. IEEE Geosci. Remote Sens. Lett. 2018, 15, 232–236.
23. Mu, J.; Gou, S.; Mao, S.; Zheng, S. A stepwise matching method for multi-modal image based on cascaded network. In Proceedings of the 29th ACM International Conference on Multimedia, Nice, France, 21–25 October 2021; pp. 1284–1292.
24. Zou, B.; Li, H.; Zhang, L. Self-supervised SAR image registration with SAR-SuperPoint and transformation aggregation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5201115.
25. Mao, S.; Yang, J.; Gou, S.; Lu, K.; Jiao, L.; Xiong, T.; Xiong, L. Adaptive self-supervised SAR image registration with modifications of alignment transformation. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5203715.
26. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
27. Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An attentive survey of attention models. ACM Trans. Intell. Syst. Technol. 2021, 12, 53.
28. Kim, Y.; Lee, J.; Lee, E.B.; Lee, J.H. Application of natural language processing (NLP) and text-mining of big-data to engineering-procurement-construction (EPC) bid and contract documents. In Proceedings of the 6th Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 4–5 March 2020; pp. 123–128.
29. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2048–2057.
30. Guo, H.; Zheng, K.; Fan, X.; Yu, H.; Wang, S. Visual attention consistency under image transforms for multi-label image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 729–739.
31. Haut, J.M.; Paoletti, M.E.; Plaza, J.; Plaza, A.; Li, J. Visual attention-driven hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8065–8080.
32. Li, W.; Liu, K.; Zhang, L.; Cheng, F. Object detection based on an adaptive attention mechanism. Sci. Rep. 2020, 10, 11307.
33. Zhu, Y.; Zhao, C.; Guo, H.; Wang, J.; Zhao, X.; Lu, H. Attention CoupleNet: Fully convolutional attention coupling network for object detection. IEEE Trans. Image Process. 2018, 28, 113–126.
34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
35. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
36. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110.
37. Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive language models beyond a fixed-length context. arXiv 2019, arXiv:1901.02860.
38. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
39. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
40. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
41. Mansard, E.P.; Funke, E.R. The measurement of incident and reflected spectra using a least squares method. Coast. Eng. Proc. 1980, 17, 8.
42. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 10347–10357.
43. Goncalves, H.; Goncalves, J.A.; Corte-Real, L. Measures for an objective evaluation of the geometric correction process quality. IEEE Geosci. Remote Sens. Lett. 2009, 6, 292–296.
44. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the IEEE International Conference on Computer Vision, 1999; Volume 2, pp. 1150–1157.
45. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
46. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.

**Figure 2.** A visual example of the eight near-points around a key point from the sensed image at an offset of k pixels, where $k=5$; the predictions are obtained by the R-S branch ($Net_{R}$).
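Assuming the eight near-points form the 8-neighbourhood of the key point on a grid spaced k pixels apart (one reading of the figure; the exact layout is an assumption), they can be enumerated as:

```python
def near_points(x, y, k=5):
    """Hypothetical layout of the eight near-points: the 8-neighbourhood
    of the key point (x, y) at an offset of k pixels."""
    return [(x + dx, y + dy)
            for dx in (-k, 0, k)
            for dy in (-k, 0, k)
            if (dx, dy) != (0, 0)]  # exclude the key point itself
```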

**Figure 3.** Reference and sensed images of Wuhan data. The image size is $400\times 400$ pixels and the resolution is 10 m.

**Figure 4.**Reference and sensed images of Australia-Yama data. The image size is $650\times 350$ pixels.

**Figure 5.**Reference and sensed images of YellowR1 data. The image size is $700\times 700$ pixels and the resolution is 8 m.

**Figure 6.**Reference and sensed images of YellowR2 data. The image size is $1000\times 1000$ pixels and the resolution is 8 m.

**Figure 11.** Comparison of the sub-images corresponding to the matched-points obtained by the proposed method without and with the precise-matching module. For each point, the left sub-image corresponds to that point, the top-right sub-image to its matched-point obtained without the precise-matching module, and the bottom-right sub-image (labeled in the red box) to the matched result with the precise-matching module.

**Figure 12.**The comparison of the proposed double-transformation network to two single branches (the R-S branch and the S-R branch).

| Methods | $N_{red}$ | $RMS_{all}$ | $RMS_{LOO}$ | $P_{quad}$ | $BPP(r)$ | $S_{kew}$ | $S_{cat}$ | $\varphi$ |
|---|---|---|---|---|---|---|---|---|
| SIFT | 17 | 1.2076 | 1.2139 | — | 0.6471 | 0.1367 | 0.9991 | 0.7048 |
| SAR-SIFT | 66 | 1.2455 | 1.2491 | 0.6300 | 0.6212 | 0.1251 | 0.9961 | 0.6784 |
| VGG16-LS | 58 | 0.5611 | 0.5694 | 0.6665 | 0.2556 | 0.0389 | 1.0000 | 0.4420 |
| ResNet50-LS | 68 | 0.4818 | 0.4966 | 0.7162 | 0.2818 | 0.1943 | 0.9766 | 0.4489 |
| ViT-LS | 64 | 0.5218 | 0.5304 | 0.6101 | 0.2330 | 0.1072 | 1.0000 | 0.4296 |
| DNN + RANSAC | 8 | 0.6471 | 0.6766 | — | 0.1818 | 0.0943 | 0.9766 | 0.4484 |
| MSDF-Net | 39 | 0.4345 | 0.4893 | 0.6101 | 0.3124 | 0.1072 | 1.0000 | 0.4304 |
| AdaSSIR | 47 | 0.4217 | 0.4459 | 0.6254 | 0.3377 | 0.1165 | 1.0000 | 0.4287 |
| STDT-Net (Ours) | **78** | 0.4490 | 0.4520 | 0.6254 | 0.2277 | 0.1165 | 1.0000 | **0.4122** |
| Rank/All | 1/10 | 3/10 | 2/10 | 2/7 | 2/10 | 4/10 | 4/4 | 1/10 |

| Methods | $N_{red}$ | $RMS_{all}$ | $RMS_{LOO}$ | $P_{quad}$ | $BPP(r)$ | $S_{kew}$ | $S_{cat}$ | $\varphi$ |
|---|---|---|---|---|---|---|---|---|
| SIFT | 69 | 1.1768 | 1.1806 | 0.9013 | 0.6812 | **0.0975** | 0.9922 | 0.7010 |
| SAR-SIFT | **151** | 1.2487 | 1.2948 | 0.6016 | 0.6755 | 0.1274 | 0.9980 | 0.6910 |
| VGG16-LS | 112 | 0.5604 | 0.5685 | 0.6150 | 0.3621 | 0.1271 | 1.0000 | 0.4626 |
| ResNet50-LS | 120 | 0.4903 | 0.5064 | **0.5873** | 0.2515 | 0.1027 | 1.0000 | 0.4215 |
| ViT-LS | 109 | 0.5276 | 0.5371 | 0.7162 | 0.2529 | 0.1105 | 1.0000 | 0.4472 |
| DNN + RANSAC | 8 | 0.7293 | 0.7582 | — | 0.5000 | 0.1227 | 0.9766 | 0.5365 |
| MSDF-Net | 12 | 0.4645 | 0.4835 | — | 0.4000 | 0.1175 | 0.9999 | 0.4356 |
| AdaSSIR | 71 | 0.4637 | 0.4707 | 0.6013 | 0.4545 | 0.1072 | 1.0000 | 0.4504 |
| STDT-Net (Ours) | 115 | **0.4604** | 0.4732 | 0.6740 | **0.2173** | 0.1175 | 1.0000 | **0.4205** |
| Rank/All | 3/9 | 1/9 | 2/9 | 5/7 | 2/9 | 4/9 | 4/4 | 1/9 |

| Methods | $N_{red}$ | $RMS_{all}$ | $RMS_{LOO}$ | $P_{quad}$ | $BPP(r)$ | $S_{kew}$ | $S_{cat}$ | $\varphi$ |
|---|---|---|---|---|---|---|---|---|
| SIFT | 11 | 0.9105 | 0.9436 | — | 0.5455 | 0.1055 | 0.9873 | 0.5908 |
| SAR-SIFT | **31** | 1.1424 | 1.2948 | 0.5910 | 0.7419 | 0.0962 | 1.0000 | 0.6636 |
| VGG16-LS | 19 | 0.6089 | 0.6114 | — | 0.4211 | 0.1061 | 1.0000 | 0.4703 |
| ResNet50-LS | 25 | 0.5725 | 0.5889 | 0.5814 | 0.6058 | 0.1387 | 1.0000 | 0.5102 |
| ViT-LS | 20 | 0.5986 | 0.5571 | 0.5821 | 0.5875 | 0.1266 | 1.0000 | 0.5118 |
| DNN + RANSAC | 10 | 0.8024 | 0.8518 | — | 0.6000 | 0.1381 | 0.9996 | 0.5821 |
| MSDF-Net | 11 | 0.5923 | 0.6114 | — | 0.4351 | 0.0834 | 0.9990 | 0.4753 |
| AdaSSIR | 20 | 0.5534 | 0.5720 | 0.5395 | 0.4444 | 0.1086 | 1.0000 | 0.4715 |
| STDT-Net (Ours) | 24 | **0.5487** | **0.5531** | 0.5486 | 0.4038 | 0.1088 | 1.0000 | **0.4610** |
| Rank/All | 3/9 | 1/9 | 1/9 | 2/7 | 1/9 | 6/9 | 4/4 | 1/9 |

| Methods | $N_{red}$ | $RMS_{all}$ | $RMS_{LOO}$ | $P_{quad}$ | $BPP(r)$ | $S_{kew}$ | $S_{cat}$ | $\varphi$ |
|---|---|---|---|---|---|---|---|---|
| SIFT | 88 | 1.1696 | 1.1711 | 0.6399 | 0.7841 | 0.1138 | **0.9375** | 0.6757 |
| SAR-SIFT | **301** | 1.1903 | 1.1973 | 0.8961 | 0.8671 | 0.1318 | 1.0000 | 0.7390 |
| VGG16-LS | 54 | 0.5406 | 0.5504 | 0.6804 | 0.3187 | 0.1277 | 1.0000 | 0.4607 |
| ResNet50-LS | 70 | 0.5036 | 0.5106 | 0.7162 | 0.2778 | 0.1208 | 0.9999 | 0.4470 |
| ViT-LS | 67 | 0.5015 | 0.5095 | **0.6000** | 0.2925 | 0.1281 | 1.0000 | 0.4356 |
| DNN + RANSAC | 10 | 0.5784 | 0.5906 | — | 0.0000 | 0.1308 | 0.9999 | 0.3946 |
| MSDF-Net | 52 | 0.5051 | 0.5220 | 0.6112 | 0.7692 | 0.1434 | 1.0000 | 0.5215 |
| AdaSSIR | 68 | 0.4858 | 0.4994 | 0.6013 | 0.5714 | 0.1149 | 1.0000 | 0.4776 |
| STDT-Net (Ours) | 79 | 0.4808 | 0.4954 | 0.6740 | 0.2692 | 0.1134 | 1.0000 | 0.4347 |
| Rank/All | 3/9 | 1/9 | 1/9 | 5/7 | 2/9 | 1/9 | 4/4 | 2/9 |

| Datasets | Branch | Without Precise-Matching | With Precise-Matching |
|---|---|---|---|
| Wuhan | R→S | 0.4598 | 0.4579 |
| | S→R | 0.4620 | 0.4590 |
| YellowR1 | R→S | 0.5798 | 0.5525 |
| | S→R | 0.5585 | 0.5535 |
| YAMBA | R→S | 0.4788 | 0.4960 |
| | S→R | 0.4858 | 0.4763 |
| YellowR2 | R→S | 0.5253 | 0.5185 |
| | S→R | 0.5093 | 0.4960 |

| Datasets | Performance | VGG16 | ResNet50 | ViT | Swin-Transformer |
|---|---|---|---|---|---|
| YellowR1 | $Acc$ (%) | 87.13 | 89.32 | 89.59 | 92.74 |
| | $Time$ (m) | 47 | 38 | 42 | 31 |
| Wuhan | $Acc$ (%) | 89.26 | 92.71 | 91.10 | 94.83 |
| | $Time$ (m) | 19 | 13 | 28 | 10 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Deng, X.; Mao, S.; Yang, J.; Lu, S.; Gou, S.; Zhou, Y.; Jiao, L.
Multi-Class Double-Transformation Network for SAR Image Registration. *Remote Sens.* **2023**, *15*, 2927.
https://doi.org/10.3390/rs15112927
