Multi-Scale Fused SAR Image Registration Based on Deep Forest

Abstract: SAR image registration is a crucial problem in SAR image processing, since high-precision registration results improve the quality of downstream tasks such as SAR image change detection. Recently, most DL-based SAR image registration methods have regarded registration as a binary classification problem with matching and non-matching categories, where a fixed scale is generally set to capture the pair of image blocks corresponding to each key point when generating the training set. However, image blocks with different scales contain different information, which affects the registration performance. Moreover, the number of key points is usually insufficient to generate a mass of class-balanced training samples. Hence, we propose a new SAR image registration method that simultaneously utilizes the information of multiple scales to construct the matching models. Specifically, considering that the number of training samples is small, deep forest is employed to train multiple matching models. Moreover, a multi-scale fusion strategy is proposed to integrate the multiple predictions and obtain the best pairs of matching points between the reference image and the sensed image. Finally, experimental results on four datasets illustrate that the proposed method outperforms the compared state-of-the-art methods, and the analyses for different scales also indicate that the fusion of multiple scales is more effective and more robust for SAR image registration than a single fixed scale.


Introduction
Synthetic Aperture Radar (SAR) [1,2] offers all-day, all-weather, high-resolution imaging compared with infrared and optical imaging. Based on these characteristics, SAR image processing has drawn much attention in military and civilian fields [3,4]. In particular, some problems of SAR image processing require two or more SAR images to be analyzed and processed simultaneously, such as SAR image change detection [5][6][7], SAR image fusion [8,9], object detection in SAR images [10,11], etc. However, the multiple SAR images to be analyzed are generally captured under different conditions, such as different SAR sensors, different viewpoints, and different times, which makes the captured images diverse even for the same scene. Moreover, different imaging algorithms also result in diverse SAR images, such as BP [12], RD [13], and compressive-sensing-based approaches [14], and SAR imagery may be unfocused due to motion errors [15,16]. Therefore, SAR image registration is significant for problems that need to process two SAR images at the same time. For example, the registration accuracy of the unchanged and changed SAR images directly affects the performance of SAR image change detection. In DL-based SAR image registration, pairs of image patches corresponding to key points are generally used as the training samples. However, compared to natural images, it is tough to obtain a mass of annotated training samples for SAR image registration, since manually annotating pairs of image patches (especially matching patches) is very time consuming for SAR images, and noisy labels are easily produced during manual annotation.
Moreover, most studies of DL-based SAR image registration regard an image patch with a fixed size as the sample representing a matching point, whereas the information contained in patches of different sizes may differ in practice. An illustration is shown in Figure 1, where the left image is the reference image and the right is the sensed image. We mark three points in the two SAR images: point A in the reference image, and points B and C in the sensed image. Point B (labeled in red) matches point A (labeled in red), but point C (labeled in green) does not match A. For each point, region patches with two different sizes (m and s) are given to represent the corresponding point, where the region in the blue box has size m × m and the region in the purple box has size s × s. Obviously, patches with different sizes contain diverse information. If the patch of size m × m is used to represent a point, the patch of A is similar to the patches of both B and C; if size s is used, the patch of A is similar to that of B but different from that of C, while point A is actually matched with point B and not with point C. This indicates that the size of image patches directly affects the confidence of the matching prediction and that one fixed size is insufficient for DL-based SAR image registration. In practice, however, it is tough to determine which size is most suitable for improving registration accuracy because of the complexity of SAR images. Based on the analyses above, we propose a new method in this paper: a multi-scale fused SAR image registration framework based on deep forest.
In the proposed method, a self-learning method is first utilized to generate matching and non-matching pairs of image blocks at multiple scales, based on the key points of the reference image and its transformed images; the generated pairs are used to construct multi-scale training sets. Then, the diversity map between each pair of image blocks is taken as the input sample to train multiple binary classification models via deep forest. Finally, a multi-scale fusion strategy is proposed to integrate the multiple predictions and obtain the best pairs of matching points between the reference image and the sensed image. Experimental results indicate that the proposed method obtains better registration performance than the state-of-the-art methods. The analyses of the performance at different block scales also illustrate that the fusion of multiple scales is more effective and robust for SAR image registration than a single fixed scale.
The remainder of this paper is organized as follows. Section 2 discusses the details of the proposed SAR image registration method, and Sections 3 and 4 give the experimental results and analytical discussions. Finally, the conclusion is provided in Section 5.

The Proposed Method
In this paper, we propose a multi-scale fused SAR image registration method based on deep forest; the framework of the proposed method is shown in Figure 2. As shown in Figure 2, the proposed method is mainly composed of three parts: constructing multi-scale training sets, training the matching models, and multi-scale fusion. First, the part of constructing multi-scale training sets focuses on generating training samples from the key points obtained between the reference image and its transformed images, using image blocks of different sizes around the key points. Second, image registration is treated as a binary classification problem, and multiple matching models are trained by deep forest on the constructed multi-scale training sets, yielding multiple different predictions. Finally, a multi-scale fusion strategy is proposed to combine the obtained predictions for SAR image registration. The details of the three parts are introduced below; before that, we briefly describe deep forest in the first sub-section.

Deep Forest
Deep forest [44] was proposed by Zhou et al. in 2018; it is a deep model based on random forest [45] and differs from deep neural networks. Random forest is a classical ensemble learning method [46][47][48][49] in which multiple decision trees are constructed and combined, and the training samples and features used to generate each decision tree are randomly selected. Deep forest is mainly composed of two parts: multi-granularity scanning and the cascade forest structure. The most critical part is the cascade forest structure, where each layer includes two completely random tree forests and two random forests, and each forest generates a class vector. A performance test is done at the end of each level; if there is a significant performance improvement, the next level is generated, otherwise training is terminated. Since the number of layers of the cascade forest structure is self-adjusting, deep forest works well on small-sample tasks, and only a few parameters need to be adjusted when training the model. We use the difference feature vector of an image pair as the input feature, whose size is 1 × s_k^2. The output is the corresponding matching label: if the centroids of the two image blocks correspond, the matching label is 1, otherwise it is 0.
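The layer-growing behaviour described above can be sketched with off-the-shelf forests. The following is a minimal illustration (not the authors' implementation), assuming scikit-learn is available and using `ExtraTreesClassifier` as a stand-in for the completely random tree forests; the function name and layer sizes are our own choices:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

def train_cascade(X_train, y_train, X_val, y_val, max_layers=5, seed=0):
    """Grow cascade layers (2 random forests + 2 completely-random
    forests each) while validation accuracy keeps improving."""
    layers, best_acc = [], -1.0
    aug_train, aug_val = X_train, X_val
    for _ in range(max_layers):
        forests = [RandomForestClassifier(50, random_state=seed + i)
                   for i in range(2)] + \
                  [ExtraTreesClassifier(50, random_state=seed + i)
                   for i in range(2)]
        for f in forests:
            f.fit(aug_train, y_train)
        # each forest emits a class-probability vector; concatenate them
        # to the raw features as the input of the next layer
        tr_vecs = np.hstack([f.predict_proba(aug_train) for f in forests])
        va_probs = [f.predict_proba(aug_val) for f in forests]
        acc = np.mean(np.argmax(np.mean(va_probs, axis=0), axis=1) == y_val)
        if acc <= best_acc:          # no improvement: stop adding layers
            break
        best_acc, layers = acc, layers + [forests]
        aug_train = np.hstack([X_train, tr_vecs])
        aug_val = np.hstack([X_val, np.hstack(va_probs)])
    return layers, best_acc
```

This self-adjusting depth is what makes the model usable when only a few hundred pairs of blocks are available.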
In recent years, random forest has been applied in the field of remote sensing image processing [50,51]. For example, Pierce et al. [52] used a random forest algorithm to prevent forest fires, which reduced the occurrence of fires to a certain extent. Zou et al. [53] proposed a random clustered forest algorithm that solved the problem of terrain classification in polarimetric SAR. Ma et al. [54] proposed a SAR image change detection method based on deep forest, which adequately extracted useful information from local image blocks and significantly improved the detection accuracy. However, to the best of our knowledge, deep forest has not been applied to the SAR image registration task. In this paper, we utilize deep forest to improve the registration performance of SAR images.

Constructing Multi-Scale Training Sets
At present, most DL-based image registration methods utilize the key points obtained in the reference image to construct the training set, where an image block with a fixed scale is captured to represent each key point. The classical strategy for setting the block size is to select, from several candidate sizes, the one yielding the best registration performance as the final fixed size. As illustrated in Figure 1, the information contained in image blocks of different sizes around one key point is diverse, which indicates that an image block with a single fixed size is insufficient to characterize the neighborhood of a key point. Moreover, it is tough to find one size suitable for different remote sensing images, which reduces the robustness of the SAR image registration method.
Based on this, we utilize multi-scale information of the image blocks corresponding to key points to construct training sets with multiple scales. In this part, the self-learning strategy [42] is employed to produce pairs of image blocks from the reference image and its transformed images, since this yields sufficient pairs of image blocks with accurate labels. Differently from [42], we construct pairs of training samples at multiple scales. Note that multiple transformation strategies are employed to obtain transformed images of the reference image in terms of rotation, scale, and translation.
Given a reference SAR image I_R and multiple scales S = {s_1, . . . , s_K}, the reference image I_R is first transformed by t given transformation matrices {T_1, . . . , T_t}, respectively, and the transformed images {I_T1, . . . , I_Tt} are obtained. Second, a traditional local feature point extraction method (Scale-Invariant Feature Transform, SIFT [27]) is used to find M key points in the reference image I_R, P_R = {P_R1, . . . , P_RM}. Each key point P_Rm (m = 1, . . . , M) has one and only one matching point and M − 1 non-matching points in each transformed image I_T.
Then, for a scale s_k, image blocks B^R_km and B^T_km of size s_k × s_k are captured from I_R and I_T, respectively, centred on the key point P_Rm. Combining the obtained pairs of matching points in I_R and I_T, a pair of matching image blocks corresponding to P_Rm is obtained, denoted as (B^R_km, B^T_km); similarly, a pair of non-matching image blocks corresponding to P_Rm is obtained, denoted as (B^R_km, B^T_kr) with r ≠ m.

A simple example is shown in Figure 3. In Figure 3, the reference SAR image I_R is first transformed by three affine transformation strategies, the scale transformation T_1, the rotation transformation T_2 and the affine transformation T_3, and three transformed SAR images (I_T1, I_T2 and I_T3) are obtained. Then, nine key points are obtained by SIFT from the reference image I_R. According to the nine key points, pairs of matching image blocks and pairs of non-matching image blocks are captured at scale s_k × s_k from I_R and each I_T, where the centre of each image block is located at its corresponding key point. As shown in Figure 3, nine pairs of matching image blocks and nine pairs of non-matching image blocks are obtained for each scale.

From the above, it is known that the training set is constructed based on the reference image and its transformed images. Differently from the training set, the testing set for SAR image registration is constructed from the sensed image and the reference image. Similarly to the reference image I_R, N key points are first obtained from the sensed image I_S by SIFT. Then, based on the M key points of I_R and the N key points of I_S, each key point of I_S is paired with all key points of I_R, and N × M pairs of image blocks are obtained for each scale. Finally, the N × M diversity maps of size s_k × s_k form the testing set D_Tk for the scale s_k, without labels.
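The multi-scale block capture around key points can be sketched as follows; the function name and the (row, column) key point format are our own assumptions, and points too close to the image border are simply skipped:

```python
import numpy as np

def extract_blocks(image, keypoints, scales):
    """Crop an s_k x s_k block centred on each key point, for every
    scale in `scales`; key points too close to the border are skipped."""
    blocks = {s: {} for s in scales}
    for idx, (r, c) in enumerate(keypoints):
        for s in scales:
            h = s // 2
            if h <= r < image.shape[0] - h and h <= c < image.shape[1] - h:
                blocks[s][idx] = image[r - h:r + s - h, c - h:c + s - h]
    return blocks
```

Running this on the reference image and on each transformed image with the same key points yields the matching block pairs described above.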

Training Matching Model
In general, the problem of SAR image registration is converted into a binary classification problem in learning-based SAR image registration, where pairs of matching image blocks are regarded as positive instances and pairs of non-matching image blocks are regarded as negative instances. According to the part on constructing multi-scale training sets, we can obtain M pairs of matching image blocks as positive instances and M_non pairs of non-matching image blocks as negative instances at each scale. Noticeably, in order to construct a class-balanced training set, we set M_non = M. However, the number M of key points obtained by SIFT is limited, which means that a mass of pair image blocks cannot be generated, especially matching pairs. Based on this, we apply deep forest [44] as the basic classification model to train the multiple matching models, considering that deep forest is more effective than a deep neural network on a training set of small size.
According to the constructed multi-scale training sets, we utilize the diversity map between a pair of image blocks to represent the input corresponding to that pair, and all pixel values of the diversity map are cascaded into one training sample for the classification model. For instance, given a pair of matching image blocks (B^R_km, B^T_km) of size s_k × s_k, the diversity map D_km is obtained by subtracting the pixel values of B^R_km from those of B^T_km. The diversity map D_km is then vectorized as a vector z_km of size 1 × s_k^2, and z_km is regarded as a training sample of the positive category with label y_km = 1. Similarly, for a pair of non-matching image blocks (B^R_km, B^T_kr), their diversity map D*_km is obtained and vectorized as the vector z*_km of the negative category, with label y*_km = 0. Figure 4 shows examples of diversity maps corresponding to a matching pair and a non-matching pair of image blocks from the YellowR1 data, respectively. From Figure 4, it is seen that the diversity map of a matched pair is darker since the similarity between matched image blocks is higher; in contrast, the difference maps of non-matched image pairs are somewhat brighter. In the training process of deep forest, the key is to train the cascade forest structure, where each layer includes two completely random tree forests and two random forests. Therefore, for the training set with scale s_k, a two-dimensional class vector [a_ki, b_ki] is obtained by each of the four forests (i = 1, . . . , 4), where a_ki and b_ki express the probability that the sample is classified into the positive and the negative category, respectively.
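The construction of one training sample from a block pair can be sketched as follows; the function name is ours, and the sign of the difference (transformed minus reference, following the text) is immaterial to the classifier:

```python
import numpy as np

def diversity_sample(block_ref, block_trans, matched):
    """Pixel-wise difference of a pair of s_k x s_k blocks, flattened to
    a 1 x s_k^2 feature vector; label 1 for a matching pair, else 0."""
    d = block_trans.astype(np.float32) - block_ref.astype(np.float32)
    return d.reshape(1, -1), (1 if matched else 0)
```

For a well-matched pair the difference map is near zero everywhere, which is why the matched diversity maps in Figure 4 appear darker.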
Finally, the output class vector [a_k, b_k] is obtained by averaging the four class vectors, and the predicted label y is the category with the largest value in the final class vector:

[a_k, b_k] = (1/4) Σ_{i=1}^{4} [a_ki, b_ki],

y = 1 if a_k ≥ b_k, and y = 0 otherwise.

By the above formulas, if a sample is classified into the positive category (y = 1), its corresponding pair of image blocks is matched; otherwise, the pair is not matched.
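The averaging and decision step can be written out directly; the function name is ours, and the first component of each vector is taken to be the positive-class probability a_ki as in the text:

```python
import numpy as np

def fuse_class_vectors(vectors):
    """Average the per-forest class vectors [a_ki, b_ki] and return the
    fused vector [a_k, b_k] plus the predicted match label y."""
    fused = np.mean(np.asarray(vectors, dtype=float), axis=0)
    y = 1 if fused[0] >= fused[1] else 0  # positive class -> matched
    return fused, y
```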
According to the part on constructing multi-scale training sets, K training sets with multiple scales {s_1, s_2, . . . , s_K} are constructed. Based on these K training sets, K classification models are trained, denoted as {φ_1, φ_2, . . . , φ_K}. Then, the prediction for a diversity map is obtained by each model φ_k (k ∈ {1, . . . , K}). Algorithm 1 shows the procedure of training the matching models based on the multi-scale training sets.

Algorithm 1 The Procedure of Training Matching Models.

Input: The constructed multi-scale training sets {D_1, . . . , D_K}.
Output: The trained matching models {φ_1, . . . , φ_K}.
1: for each scale s_k, k = 1, . . . , K do
2:    repeat
3:       Add a new cascade layer (two completely random tree forests and two random forests) trained on D_k;
4:       Calculate the accuracy Acc_j of the model with the current layer j;
5:    until (Acc_j < Acc_{j−1})
6:    Get a trained matching model φ_k;
7: end for

Multi-Scale Fusion
For the K trained models with multiple scales, we propose a multi-scale fusion strategy for SAR image registration that fuses the predictions corresponding to the multiple scales, in order to exploit their complementarity more effectively. With the K trained models, a set of predictions {Y_T1, . . . , Y_TK} is obtained for the K testing sets {D_T1, . . . , D_TK} with different scales.
Due to the remote sensing imaging mechanism and the use of image block matching, for each point of the sensed image I_S, more than one pair of key points may be classified as matching (label 1). However, theoretically, at most one point of I_R is matched with each point of I_S. This means that some pseudo-matching predictions are given by φ_k, k = 1, . . . , K, and these pseudo-matching predictions are not conducive to the calculation of the final transformation matrix. Therefore, our fusion strategy is composed of a local constraint, a multi-scale union and a global constraint to delete pseudo-matching points, and the details are as follows.
Local Constraint: Normalized Cross Correlation (NCC) [55] measures the similarity between two image blocks based on their pixel intensities. In our method, for each point of the sensed image, the candidate pair with the largest NCC value is kept as the matched image pair. The NCC value c between two blocks B_1 and B_2 is calculated as

c = Σ_{i,j} (B_1(i,j) − μ_1)(B_2(i,j) − μ_2) / sqrt( Σ_{i,j} (B_1(i,j) − μ_1)^2 · Σ_{i,j} (B_2(i,j) − μ_2)^2 ),

where μ_1 and μ_2 are the mean pixel values of the two blocks.

Multi-Scale Union: By the local constraint, we obtain a set G_k of matched image pairs for each scale s_k; then all sets of matching point pairs corresponding to all scales are integrated, and the final set of matching points is given by

G = G_1 ∪ G_2 ∪ · · · ∪ G_K,

where K is the number of scales.

Global Constraint: We use the RANdom SAmple Consensus algorithm (RANSAC) [56] to remove imprecise matched points by iteratively selecting random subsets of matched points and estimating the consistency of the current model. Finally, w pairs of matched points are obtained from G. Based on these w pairs, the transformation matrix T_F between the reference SAR image and the sensed SAR image is calculated [19] as

[x′, y′, 1]^T = T [x, y, 1]^T,

where (x, y) is the coordinate of a key point from I_S and (x′, y′) is the coordinate of the point after the affine transformation; the affine transformation matrix T is given by

T = [[a_11, a_12, t_x], [a_21, a_22, t_y], [0, 0, 1]],

where a_11, . . . , a_22 encode rotation and scale and (t_x, t_y) is the translation.
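The local constraint and multi-scale union can be sketched as below; the data layout (one dictionary per scale mapping each sensed point to NCC-scored reference candidates) and the function names are our own assumptions, and the final RANSAC global constraint is omitted here:

```python
import numpy as np

def ncc(block_a, block_b):
    """Normalized cross correlation between two equal-size blocks."""
    a = block_a.astype(float).ravel(); a -= a.mean()
    b = block_b.astype(float).ravel(); b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def fuse_scales(candidates_per_scale):
    """Local constraint: per scale, keep for each sensed point only the
    reference point with the largest NCC score.  Union: merge the kept
    pairs over all K scales (duplicate pairs collapse)."""
    union = set()
    for candidates in candidates_per_scale:      # one dict per scale
        for p_sensed, scored in candidates.items():
            best_ref = max(scored, key=scored.get)   # largest NCC wins
            union.add((p_sensed, best_ref))
    return union
```

The resulting union set G would then be pruned by RANSAC to estimate the affine matrix T_F.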

Experimental Results and Analyses
In this section, we validate the performance of the proposed method in several respects: (1) the comparison of our method with the state-of-the-art methods on four SAR image registration datasets; (2) the visualization of registration via chessboard diagrams; (3) the analysis of the performance obtained at different scales. We first introduce the experimental datasets and settings.

Experimental Data and Settings
In our experiments, four SAR image datasets, captured by Radarsat-2 and ALOS-PALSAR, are used to test the performance: Wuhan Data, YellowR1 Data, Australia-Yama Data and YellowR2 Data. The YellowR1 and YellowR2 image pairs were collected in June 2008 and June 2009, respectively. In the YellowR1 data, the size of the two SAR images is 700 × 700 pixels and the resolution is 8 m, as shown in Figure 6. In the YellowR2 data, the size of the two SAR images is 1000 × 1000 pixels and the resolution is 8 m, as shown in Figure 7. Note that the YellowR1 and YellowR2 data are cropped from the Yellow River SAR images of size 7666 × 7692; moreover, the sensed SAR image obtained in 2009 has more multiplicative speckle noise than the reference SAR image obtained in 2008. The two SAR images of the Australia-Yama data were collected by the ALOS-PALSAR satellite over the Yamba region of Australia in 2018 and 2019, respectively; the size of the two images is 650 × 350 pixels, and they are shown in Figure 8.

[Figures 5–8: reference and sensed SAR image pairs of the Wuhan, YellowR1, YellowR2 and Australia-Yama data.]

According to Section 2.2, the training samples are constructed from the reference image and its transformed images. In our experiments, the affine transformations used are scale and rotation transformations: the scale parameter is drawn from the range [0.5, 1.5], and the rotation angle is randomly selected from 1 to 90 degrees. The parameters of deep forest follow [44].
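The sampling of a random scale-plus-rotation transformation described above can be sketched as follows; the function name is ours, and the composition into a single homogeneous 3 × 3 matrix (without translation) is an assumption:

```python
import numpy as np

def random_transform(rng):
    """Random scale in [0.5, 1.5] and rotation in [1, 90] degrees,
    composed into a 3x3 homogeneous affine matrix."""
    s = rng.uniform(0.5, 1.5)
    theta = np.deg2rad(rng.uniform(1.0, 90.0))
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * si, 0.0],
                     [s * si,  s * c, 0.0],
                     [0.0,     0.0,   1.0]])
```

Applying such matrices to the reference image yields the transformed images from which the self-learned training pairs are cropped.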
Moreover, to validate the registration performance of the proposed method better, we apply seven evaluation criteria [57] to evaluate the accuracy of SAR image registration, shown as follows:

1. RMS_all represents the root mean square error of the registration, calculated as

RMS_all = sqrt( (1/M) Σ_{i=1}^{M} ( (x′_i − x_i)^2 + (y′_i − y_i)^2 ) ),

where (x′_i, y′_i) is the position of the i-th point after applying the estimated transformation and (x_i, y_i) is its reference position, i = 1, . . . , M. Note that RMS_all ≤ 1 means the performance reaches sub-pixel accuracy.
2. N_red is the number of matching pairs. For the transformation matrix, a bigger value tends to yield a better registration performance.
3. RMS_LOO expresses the error obtained with the Leave-One-Out strategy and the root mean square error: for each feature point in N_red, we calculate the RMS_all of the remaining N_red − 1 feature points, and RMS_LOO is the average of these values.
4. P_quad detects whether the retained feature points are evenly distributed over the quadrants; its value should be less than 95%. First, we calculate the residuals between the key points of the reference image and the points of the sensed image transformed by the estimated matrix. Then, the number of residual distances in each quadrant is counted. Finally, the χ² goodness-of-fit test is used to assess the distribution of the feature points. This index is not suitable for the case N_red < 20.
5. BPP(r) is the abbreviation of Bad Point Proportion. A point whose residual lies above a given threshold r is called a Bad Point, and BPP(r) is the ratio of Bad Points to the number of detected matching pairs.
6. Skew is defined as the absolute value of the correlation coefficient, a statistical evaluation of a preference axis in the residual scatter plot, and should be less than 0.4. As stated in [57], the correlation coefficient is a robust way of identifying the presence of a preference axis in the residual distribution. When N_red < 20, the Spearman correlation coefficient is used; otherwise, the Pearson correlation coefficient is adequate.
7. Scat is a statistical evaluation of the feature point distribution over the entire image, which should be less than 95%; its calculation follows [57].
8. φ is a linear combination of the above seven indicators [57]. When N_red ≥ 20, P_quad is not used and the formula is simplified accordingly; the resulting value should be less than 0.605.
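The RMS_all and BPP(r) criteria can be computed as sketched below; this is our reading of the standard definitions, and the mapping direction (sensed points through T, compared against reference points) is an assumption:

```python
import numpy as np

def rms_all(ref_pts, sensed_pts, T):
    """Root mean square residual after mapping the sensed points through
    the estimated 3x3 homogeneous transformation T."""
    hom = np.hstack([sensed_pts, np.ones((len(sensed_pts), 1))])
    mapped = (T @ hom.T).T[:, :2]
    resid = np.linalg.norm(mapped - ref_pts, axis=1)
    return float(np.sqrt(np.mean(resid ** 2))), resid

def bpp(resid, r=1.0):
    """Bad Point Proportion: share of residuals above threshold r."""
    return float(np.mean(resid > r))
```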

The Comparison Performance
In this part, we compare the proposed method with five classical and effective methods: SIFT [31], SAR-SIFT [33], PSO-SIFT [58], DNN+RANSAC [42] and SNCNet+RANSAC [59]. The compared methods are introduced as follows:
• SIFT detects key points by constructing a difference-of-Gaussian scale space, uses the 128-dimensional descriptors of the key points to obtain matching pairs, and finally filters the matching pairs with the RANSAC algorithm to find the transformation parameters.
• Differently from SIFT, SAR-SIFT uses the SAR-Harris space instead of the difference-of-Gaussian scale space to find key points.
• PSO-SIFT introduces an enhanced feature matching method that combines the position, scale and orientation of each key point based on the SIFT algorithm, greatly increasing the number of correctly corresponding point pairs.
• DNN+RANSAC constructs training sample sets using self-learning, and then uses a DNN to obtain matched image pairs.
• SNCNet+RANSAC uses the Sparse Neighborhood Consensus Network (SNCNet), whose code is public, to obtain the matching points, and then uses the RANSAC algorithm to calculate the transformation matrix parameters.
Among the five compared methods, SIFT, SAR-SIFT and PSO-SIFT are traditional SAR image registration methods, while DNN+RANSAC and SNCNet+RANSAC are DL-based methods. Tables 1-4 show the experimental results obtained by the six methods on the four datasets, respectively, where the best performance for each index is bolded.
From the four tables, it is clearly seen that the proposed method is superior to the five compared methods on RMS_all and RMS_LOO, and our method reaches the sub-pixel level (RMS_all and RMS_LOO less than 1.0) on all four SAR image registration datasets. φ is the overall weighted measure of the seven metrics, and a smaller φ value implies a better combined measure; the proposed method obtains the best φ values on the Wuhan, YellowR1 and Yamba datasets. In addition, the proposed method obtains a better spatial point distribution (P_quad and Skew) and a lower bad point ratio (BPP(r)).

The Visualization on SAR Image Registration
In order to visually validate the registration effectiveness of the proposed method, the chessboard mosaicked images of the registrations on the four datasets are shown in Figures 9-12, respectively. In a chessboard mosaicked image, the continuity of edges and overlapping regions illustrates the registration performance. In particular, to make the checkerboard pattern more visible, the reference image was darkened overall when making the chessboard mosaicked images. Figures 9-12 show the chessboard mosaicked images given by the proposed method for the Wuhan, YellowR1, Yamba and YellowR2 data, respectively. Moreover, to better visualize the details of the registration between the reference and sensed images, chessboard mosaicked results of two local regions are enlarged for each dataset, shown in blue and red boxes, respectively. From the four figures, it is visually seen that the foreground and the background are well overlapped in each chessboard mosaicked image, where the foreground corresponds to the registration image aligned by the proposed method. Moreover, from the enlarged images (in blue and red boxes), it is clearly observed that the lines are continuous and the regions are well overlapped in Figures 9, 11 and 12, and the river is well aligned and its edges are smooth in Figure 10. In short, the chessboard mosaicked images demonstrate that the proposed method achieves higher registration accuracy.

Analyses on Registration Performance with Different Scales
Considering that our method is based on multi-scale fusion, we analyze the registration performance with different scales to validate the effectiveness of the fusion. In this experiment, we test five different scales of image blocks: 8 × 8, 16 × 16, 24 × 24, 32 × 32, and 64 × 64. Table 5 shows the performance obtained with the five scales, where the best performance for each dataset is bolded.
From Table 5, it is clearly seen that our method performs better than every single scale on the four datasets, which illustrates that the fusion of multiple scales is more effective for SAR image registration. Note that the registration fails on the test images when the scale of the image blocks is set to 8 × 8, because this scale is too small for each image block to contain useful information. Moreover, the scale corresponding to the best performance differs across SAR images: for the Wuhan data, the 64 × 64 scale performs best among the four usable scales, while 24 × 24 performs best for the YellowR1 data. This also indicates that a bigger scale is not always better for SAR image registration; as the scale of the image blocks increases, the neighborhood information of the key point does not necessarily provide positive feedback to image registration. For our method, we use three scales (16 × 16, 24 × 24 and 32 × 32) in the experiments; 64 × 64 is not fused since such a large scale increases the computational complexity. In short, the analysis of multiple scales illustrates that fusing multiple scales is more beneficial and robust for SAR image registration than using a single fixed scale.

Discussion
The experimental results in Section 3 illustrate that the proposed method achieves better registration performance for SAR images than the compared state-of-the-art methods. The reasons mainly include three points. First, deep forest is employed as the basic training model, which is better suited to training sets of small size. Second, a multi-scale strategy is proposed to construct multiple training models based on image blocks with different scales for each key point, since blocks with different scales carry richer information about each key point. Third, a multi-scale fusion based on global and local constraints is constructed to seek the most precise matched points between the reference and sensed SAR images. While the comparison performance, the visualization of registration results and the analyses of the parameter settings have been discussed, the running time of SAR image registration is also worth observing. Thus, we analyze the running time of SAR image registration for the proposed method and the five compared methods in this part. Additionally, we analyze an application of SAR image registration to validate the significance of the registration performance of two images.

Running Time
In this part, we compare the running time of the proposed method with the existing approaches, where all experimental settings are the same as in Section 3; the results are shown in Table 6. From Table 6, it is seen that the running time of the proposed method ('Ours') is longer than that of the compared methods. The main reason is that our method uses the multi-scale strategy and constructs the whole training model from samples with multiple scales. While the multi-scale training model has a longer running time, the previous experimental results illustrate that multiple scales improve the performance of SAR image registration. Compared with our method, SNCNet+RANSAC has a shorter running time because it is an already-trained model used directly for testing, and DNN+RANSAC is also faster since it fixes the size of the image patches and trains only one model. In future work, we will consider speeding up the registration method by redesigning its training model.

An Application on Change Detection
Generally, it is necessary to simultaneously analyze two SAR images in some problems of SAR image processing, such as change detection of SAR images, SAR image fusion, object detection in SAR images, etc., where the two SAR images are diverse and captured under different conditions. Hence, SAR image registration is helpful and crucial for enhancing the performance of these problems. In order to validate the significance of SAR image registration in such applications, we make a simple analysis in which the registration result is applied to the task of SAR image change detection; the change detection project is from GitHub https://github.com/summitgao/SAR_Change_Detection_CWNN (accessed on 3 August 2019). The dataset used in the project is called Bern Flood Data. The two SAR images of Bern Flood Data were collected by the ERS-2 SAR sensor in April 1999 and May 1999 over the city of Bern, and the size of the two images is 301 × 301. Figure 13 shows the reference and sensed SAR images of Bern Flood Data.

In this experiment, the two SAR images of Bern Flood Data are first registered by our proposed method and the five compared methods, and then change detection is performed on the results of the six registration methods, respectively, where the PCA-Kmeans method [60] is used as the basic change detection method. Table 7 shows the experimental results, where 'Methods' denotes the registration method, 'RMS_all' the registration result and 'Kappa' the Kappa coefficient of change detection [61]; the best performance for each index is bolded. From Table 7, it is clearly seen that the registration performance of the proposed method ('Ours') is superior to the five compared methods, and its Kappa value is also higher than the others, which illustrates that a more accurate registration helps obtain a better change detection result for different SAR images.

Conclusions
In this paper, we propose a multi-scale fused SAR image registration method, where deep forest is employed as the basic learning model to construct the matching models. Considering that the information contained in image blocks with different scales differs, multi-scale training sets are constructed to train a matching model for each scale. Specifically, a multi-scale fusion strategy is proposed to integrate the predicted pairs of matching points from local and global views. Experimental results demonstrate that the proposed method obtains better registration performance on four datasets than the compared methods. Meanwhile, the performance at different scales illustrates that the fusion of multiple scales is superior to a single fixed scale, which validates the effectiveness of the multi-scale fusion for SAR image registration. Furthermore, it is also observed that the number of matching point pairs is small, and thus in future work we will focus on obtaining more, and more strictly verified, pairs of matching points between the reference and sensed images.