# Multi-Class Double-Transformation Network for SAR Image Registration

## Abstract

## 1. Introduction

- We utilize each key point directly as a class to design the multi-class model of SAR image registration, which avoids the difficulty of constructing the positive instances (matched-point pairs) in the traditional (two-classification) registration model.
- We design the double-transformation network with a coarse-to-precise structure, where the key points from the two images are used, respectively, to train two sub-networks that alternately predict the key points of the other image. This addresses the inconsistency of categories between the training and testing sets.
- A precise-matching module is designed to modify the predictions of two sub-networks and obtain the consistent matched-points, where the nearest points of each key point are introduced to refine the predicted matched-points.

## 2. Related Works

#### 2.1. The Attention Mechanism

#### 2.2. The Transformer Model

## 3. The Proposed Method

#### 3.1. The Multi-Class Double-Transformation Networks

#### 3.1.1. Constructing Samples-Based Key Points

#### 3.1.2. Multi-Class Double-Transformation Networks

#### 3.2. The Precise-Matching Module

## 4. Experiments and Analyses

1. $RM{S}_{all}$ expresses the root mean square error of the registration result. Note that $RM{S}_{all}\le 1$ means that the performance reaches sub-pixel accuracy.
2. ${N}_{red}$ is the number of matched-point pairs. A higher value may be beneficial for obtaining a transformation matrix with better registration performance.
3. $RM{S}_{LOO}$ expresses the error obtained with the Leave-One-Out strategy and the root mean square error. Each of the ${N}_{red}$ points is left out in turn, the error ($RM{S}_{all}$) of the remaining ${N}_{red}-1$ points is computed, and $RM{S}_{LOO}$ is the average of all these errors.
4. ${P}_{quad}$ is used to detect whether the retained feature points are evenly distributed over the quadrants, and its value should be less than $95\%$.
5. $BPP\left(r\right)$ expresses the proportion of bad points among the obtained matched-point pairs, where a point with a residual value above a certain threshold ($r$) is called a bad point.
6. ${S}_{kew}$ denotes the absolute value of the calculated correlation coefficient. Note that the Spearman correlation coefficient is used when ${N}_{red}<20$; otherwise, the Pearson correlation coefficient is applied.
7. ${S}_{cat}$ is a statistical evaluation of the feature point distribution over the entire image [43], which should be less than $95\%$.
8. $\varphi $ is the linear combination of the above seven indicators, calculated by
   $$\begin{aligned}\varphi =\frac{1}{12}\Big[&2\times \Big(\frac{1}{{N}_{red}}+RM{S}_{LOO}+BPP\left(1.0\right)+{S}_{cat}\Big)\\ &+RM{S}_{all}+1.5\times ({P}_{quad}+{S}_{kew})\Big].\end{aligned}$$
   When ${N}_{red}<20$, ${P}_{quad}$ is not used, and the above formula is simplified as
   $$\begin{aligned}\varphi =\frac{1}{10.5}\Big[&2\times \Big(\frac{1}{{N}_{red}}+RM{S}_{LOO}+BPP\left(1.0\right)+{S}_{cat}\Big)\\ &+RM{S}_{all}+1.5\times {S}_{kew}\Big].\end{aligned}$$
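
The arithmetic behind these indicators can be sketched in a few lines. The following is a minimal illustration, assuming that the per-pair residuals are already available as Euclidean distances in pixels; the function names (`rms`, `bpp`, `phi`) and the convention of passing `p_quad=None` when ${P}_{quad}$ is not used are hypothetical and only serve the example.

```python
import numpy as np

def rms(residuals):
    """Root mean square of per-pair residuals (assumed to be distances in pixels)."""
    res = np.asarray(residuals, dtype=float)
    return float(np.sqrt(np.mean(res ** 2)))

def bpp(residuals, r=1.0):
    """Proportion of pairs whose residual is above the threshold r."""
    res = np.asarray(residuals, dtype=float)
    return float(np.mean(res > r))

def phi(n_red, rms_all, rms_loo, bpp_1, s_kew, s_cat, p_quad=None):
    """Weighted combination of the indicators.

    With p_quad the weights sum to 12; without it they sum to 10.5.
    """
    core = 2.0 * (1.0 / n_red + rms_loo + bpp_1 + s_cat) + rms_all
    if p_quad is None:
        return (core + 1.5 * s_kew) / 10.5
    return (core + 1.5 * (p_quad + s_kew)) / 12.0

# Toy values, not taken from the experiments.
residuals = [0.5, 1.5, 2.0, 0.9]
print(rms(residuals), bpp(residuals, r=1.0))
print(phi(20, 0.5, 0.5, 0.2, 0.1, 1.0, p_quad=0.6))
print(phi(10, 0.5, 0.5, 0.2, 0.1, 1.0))
```

Since every term enters with a positive weight, a smaller $\varphi $ corresponds to smaller errors, fewer bad points, and more matched pairs.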

#### 4.1. Comparison and Analysis of the Experimental Results

**SIFT** matches key points mainly by the ratio of the Euclidean distances to the nearest and second-nearest neighbors of the corresponding features.
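
The nearest/second-nearest distance ratio can be sketched as follows. This is a minimal NumPy illustration on toy 2-D descriptors, not the SIFT implementation used in the experiments; the function name `ratio_test_matches` and the threshold of 0.8 (a commonly used value) are assumptions.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to desc_b using the distance ratio test.

    desc_b must contain at least two descriptors.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(len(desc_a)):
        nearest, second = order[i, 0], order[i, 1]
        # Keep the match only if the nearest neighbor is clearly closer
        # than the second-nearest one.
        if dists[i, nearest] < ratio * dists[i, second]:
            matches.append((i, int(nearest)))
    return matches

desc_a = [[0.0, 0.0], [5.0, 5.0], [1.5, 1.5]]
desc_b = [[0.1, 0.0], [3.0, 3.0], [5.0, 5.2]]
ratio_matches = ratio_test_matches(desc_a, desc_b)
print(ratio_matches)
```

The third descriptor in `desc_a` is almost equally far from two candidates, so the ratio test rejects it as ambiguous.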

**SAR-SIFT** is an improvement of the SIFT method, and it is more consistent with the SAR image characteristics.

**VGG16-LS, ResNet50-LS and ViT-LS** are deep-learning-based classification methods.

**DNN + RANSAC** [12] constructs the training sample set by using self-learning methods, and then it uses DNN networks to obtain matched image pairs.

**MSDF-Net** [16] uses deep forest to construct multiple matching models based on multi-scale fusion to obtain the matched-point pairs, and then it uses RANSAC to calculate the transformation matrix.
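
How RANSAC turns matched-point pairs into a transformation matrix can be sketched as below. This is a generic illustration, not the procedure of any compared method: an affine model is assumed (the text does not state the transformation type), and the function names, iteration count, and inlier threshold are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~ src @ A[:, :2].T + A[:, 2]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)   # shape (3, 2)
    return sol.T                                     # shape (2, 3)

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(src, dst, n_iter=200, thresh=1.0, seed=0):
    """Estimate an affine transform while ignoring outlier pairs."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        # Three pairs are the minimum needed to define an affine transform.
        idx = rng.choice(len(src), size=3, replace=False)
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# Toy data: eight pairs related by a pure shift of (5, -3), plus two wrong pairs.
src = np.array([[0, 0], [10, 0], [20, 0], [0, 10], [20, 10],
                [0, 20], [10, 20], [20, 20], [10, 10], [8, 12]], dtype=float)
dst = src + np.array([5.0, -3.0])
dst[8] += np.array([40.0, 25.0])    # outlier
dst[9] += np.array([-30.0, 50.0])   # outlier
A_est, inlier_mask = ransac_affine(src, dst)
print(np.round(A_est, 3), int(inlier_mask.sum()))
```

The final refit on all inliers is a common design choice: the minimal three-point sample only selects the consensus set, while the least-squares fit over that set gives the reported matrix.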

**AdaSSIR** [25] is an adaptive self-supervised SAR image registration method, which treats the registration of SAR images as a self-supervised learning problem and regards each key point as a category-independent instance to construct a contrastive model for searching out accurate matched points.

#### 4.2. The Visual Results of SAR Image Registration

#### 4.3. Analyses on the Precise-Matching Module

#### 4.4. Analyses on the Double-Transformation Network

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

**Figure 2.** A visual example of eight near-points around a key point from the sensed image with k pixels, where $k=5$ and the predictions are obtained by the R-S branch ($Ne{t}_{R}$).
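
One way to enumerate eight near-points around a key point is sketched below. The caption does not specify the exact layout, so the code assumes a 3×3 grid with a spacing of k pixels centered on the key point; the function name `near_points` is hypothetical.

```python
def near_points(x, y, k=5):
    """Eight points at offsets of k pixels around (x, y).

    Assumed layout: the 8-neighborhood of a 3x3 grid with spacing k,
    excluding the key point itself.
    """
    offsets = [(dx, dy)
               for dy in (-k, 0, k)
               for dx in (-k, 0, k)
               if (dx, dy) != (0, 0)]
    return [(x + dx, y + dy) for dx, dy in offsets]

neighbors = near_points(100, 50, k=5)
print(neighbors)
```

In practice, points falling outside the image would also have to be discarded or clipped, which is omitted here.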

**Figure 3.** Reference and sensed images of Wuhan data. The image size is $400\times 400$ pixels and the resolution is 10 m.

**Figure 4.** Reference and sensed images of Australia-Yama data. The image size is $650\times 350$ pixels.

**Figure 5.** Reference and sensed images of YellowR1 data. The image size is $700\times 700$ pixels and the resolution is 8 m.

**Figure 6.** Reference and sensed images of YellowR2 data. The image size is $1000\times 1000$ pixels and the resolution is 8 m.

**Figure 11.** Comparison of the sub-images corresponding to the matched-points obtained by the proposed method without and with the precise-matching module. For each point, the left sub-image corresponds to that point, the top-right sub-image corresponds to its matched-point obtained without the precise-matching module, and the bottom-right sub-image (marked by the red box) corresponds to the matched result obtained with the precise-matching module.

**Figure 12.** Comparison of the proposed double-transformation network with the two single branches (the R-S branch and the S-R branch).

Methods | ${N}_{red}$ | $RM{S}_{all}$ | $RM{S}_{LOO}$ | ${P}_{quad}$ | $BPP\left(r\right)$ | ${S}_{kew}$ | ${S}_{cat}$ | $\varphi$ |
---|---|---|---|---|---|---|---|---|
SIFT | 17 | 1.2076 | 1.2139 | – | 0.6471 | 0.1367 | 0.9991 | 0.7048 |
SAR-SIFT | 66 | 1.2455 | 1.2491 | 0.6300 | 0.6212 | 0.1251 | 0.9961 | 0.6784 |
VGG16-LS | 58 | 0.5611 | 0.5694 | 0.6665 | 0.2556 | 0.0389 | 1.0000 | 0.4420 |
ResNet50-LS | 68 | 0.4818 | 0.4966 | 0.7162 | 0.2818 | 0.1943 | 0.9766 | 0.4489 |
ViT-LS | 64 | 0.5218 | 0.5304 | 0.6101 | 0.2330 | 0.1072 | 1.0000 | 0.4296 |
DNN + RANSAC | 8 | 0.6471 | 0.6766 | – | 0.1818 | 0.0943 | 0.9766 | 0.4484 |
MSDF-Net | 39 | 0.4345 | 0.4893 | 0.6101 | 0.3124 | 0.1072 | 1.0000 | 0.4304 |
AdaSSIR | 47 | 0.4217 | 0.4459 | 0.6254 | 0.3377 | 0.1165 | 1.0000 | 0.4287 |
STDT-Net (Ours) | 78 | 0.4490 | 0.4520 | 0.6254 | 0.2277 | 0.1165 | 1.0000 | 0.4122 |
Rank/All | 1/10 | 3/10 | 2/10 | 2/7 | 2/10 | 4/10 | 4/4 | 1/10 |

Methods | ${N}_{red}$ | $RM{S}_{all}$ | $RM{S}_{LOO}$ | ${P}_{quad}$ | $BPP\left(r\right)$ | ${S}_{kew}$ | ${S}_{cat}$ | $\varphi$ |
---|---|---|---|---|---|---|---|---|
SIFT | 69 | 1.1768 | 1.1806 | 0.9013 | 0.6812 | **0.0975** | 0.9922 | 0.7010 |
SAR-SIFT | **151** | 1.2487 | 1.2948 | 0.6016 | 0.6755 | 0.1274 | 0.9980 | 0.6910 |
VGG16-LS | 112 | 0.5604 | 0.5685 | 0.6150 | 0.3621 | 0.1271 | 1.0000 | 0.4626 |
ResNet50-LS | 120 | 0.4903 | 0.5064 | **0.5873** | 0.2515 | 0.1027 | 1.0000 | 0.4215 |
ViT-LS | 109 | 0.5276 | 0.5371 | 0.7162 | 0.2529 | 0.1105 | 1.0000 | 0.4472 |
DNN + RANSAC | 8 | 0.7293 | 0.7582 | – | 0.5000 | 0.1227 | 0.9766 | 0.5365 |
MSDF-Net | 12 | 0.4645 | 0.4835 | – | 0.4000 | 0.1175 | 0.9999 | 0.4356 |
AdaSSIR | 71 | 0.4637 | 0.4707 | 0.6013 | 0.4545 | 0.1072 | 1.0000 | 0.4504 |
STDT-Net (Ours) | 115 | **0.4604** | 0.4732 | 0.6740 | **0.2173** | 0.1175 | 1.0000 | **0.4205** |
Rank/All | 3/9 | 1/9 | 2/9 | 5/7 | 2/9 | 4/9 | 4/4 | 1/9 |

Methods | ${N}_{red}$ | $RM{S}_{all}$ | $RM{S}_{LOO}$ | ${P}_{quad}$ | $BPP\left(r\right)$ | ${S}_{kew}$ | ${S}_{cat}$ | $\varphi$ |
---|---|---|---|---|---|---|---|---|
SIFT | 11 | 0.9105 | 0.9436 | – | 0.5455 | 0.1055 | 0.9873 | 0.5908 |
SAR-SIFT | **31** | 1.1424 | 1.2948 | 0.5910 | 0.7419 | 0.0962 | 1.0000 | 0.6636 |
VGG16-LS | 19 | 0.6089 | 0.6114 | – | 0.4211 | 0.1061 | 1.0000 | 0.4703 |
ResNet50-LS | 25 | 0.5725 | 0.5889 | 0.5814 | 0.6058 | 0.1387 | 1.0000 | 0.5102 |
ViT-LS | 20 | 0.5986 | 0.5571 | 0.5821 | 0.5875 | 0.1266 | 1.0000 | 0.5118 |
DNN + RANSAC | 10 | 0.8024 | 0.8518 | – | 0.6000 | 0.1381 | 0.9996 | 0.5821 |
MSDF-Net | 11 | 0.5923 | 0.6114 | – | 0.4351 | 0.0834 | 0.9990 | 0.4753 |
AdaSSIR | 20 | 0.5534 | 0.5720 | 0.5395 | 0.4444 | 0.1086 | 1.0000 | 0.4715 |
STDT-Net (Ours) | 24 | **0.5487** | **0.5531** | 0.5486 | 0.4038 | 0.1088 | 1.0000 | **0.4610** |
Rank/All | 3/9 | 1/9 | 1/9 | 2/7 | 1/9 | 6/9 | 4/4 | 1/9 |

Methods | ${N}_{red}$ | $RM{S}_{all}$ | $RM{S}_{LOO}$ | ${P}_{quad}$ | $BPP\left(r\right)$ | ${S}_{kew}$ | ${S}_{cat}$ | $\varphi$ |
---|---|---|---|---|---|---|---|---|
SIFT | 88 | 1.1696 | 1.1711 | 0.6399 | 0.7841 | 0.1138 | **0.9375** | 0.6757 |
SAR-SIFT | **301** | 1.1903 | 1.1973 | 0.8961 | 0.8671 | 0.1318 | 1.0000 | 0.7390 |
VGG16-LS | 54 | 0.5406 | 0.5504 | 0.6804 | 0.3187 | 0.1277 | 1.0000 | 0.4607 |
ResNet50-LS | 70 | 0.5036 | 0.5106 | 0.7162 | 0.2778 | 0.1208 | 0.9999 | 0.4470 |
ViT-LS | 67 | 0.5015 | 0.5095 | **0.6000** | 0.2925 | 0.1281 | 1.0000 | 0.4356 |
DNN + RANSAC | 10 | 0.5784 | 0.5906 | – | 0.0000 | 0.1308 | 0.9999 | 0.3946 |
MSDF-Net | 52 | 0.5051 | 0.5220 | 0.6112 | 0.7692 | 0.1434 | 1.0000 | 0.5215 |
AdaSSIR | 68 | 0.4858 | 0.4994 | 0.6013 | 0.5714 | 0.1149 | 1.0000 | 0.4776 |
STDT-Net (Ours) | 79 | 0.4808 | 0.4954 | 0.6740 | 0.2692 | 0.1134 | 1.0000 | 0.4347 |
Rank/All | 3/9 | 1/9 | 1/9 | 5/7 | 2/9 | 1/9 | 4/4 | 2/9 |

Datasets | Branch | Without Precise-Matching | With Precise-Matching |
---|---|---|---|
Wuhan | R→S | 0.4598 | 0.4579 |
Wuhan | S→R | 0.4620 | 0.4590 |
YellowR1 | R→S | 0.5798 | 0.5525 |
YellowR1 | S→R | 0.5585 | 0.5535 |
YAMBA | R→S | 0.4788 | 0.4960 |
YAMBA | S→R | 0.4858 | 0.4763 |
YellowR2 | R→S | 0.5253 | 0.5185 |
YellowR2 | S→R | 0.5093 | 0.4960 |

Datasets | Performance | VGG16 | ResNet50 | ViT | Swin-Transformer |
---|---|---|---|---|---|
YellowR1 | $Acc$ (%) | 87.13 | 89.32 | 89.59 | 92.74 |
YellowR1 | $Time$ (min) | 47 | 38 | 42 | 31 |
Wuhan | $Acc$ (%) | 89.26 | 92.71 | 91.10 | 94.83 |
Wuhan | $Time$ (min) | 19 | 13 | 28 | 10 |

