Region-by-Region Registration Combining Feature-Based and Optical Flow Methods for Remote Sensing Images

Abstract: While geometric registration has been studied in the remote sensing community for decades, methods that successfully register images in the presence of locally inconsistent deformation caused by topographic relief remain rare. Toward this end, a region-by-region registration combining the feature-based and optical flow methods is proposed. The proposed framework is built on the calculation of pixel-wise displacements and the mosaicking of displacement fields. Concretely, the initial displacement fields for a pair of images are calculated by the block-weighted projective (BWP) model in flat-terrain regions and by Brox optical flow (BOF) estimation in complex-terrain regions. The abnormal displacements resulting from the sensitivity of optical flow to land use or land cover (LULC) changes are adaptively detected and corrected by a weighted Taylor expansion. Subsequently, the displacement fields are mosaicked seamlessly for the subsequent steps. Experimental results show that the proposed method outperforms comparative algorithms, achieving the highest registration accuracy both qualitatively and quantitatively.


Introduction
The spatial position consistency of the same objects across multiple images underpins the accuracy of subsequent applications in remote sensing [1,2], medical imaging [3,4], and computer vision [5,6]. It is usually enforced by geometric registration, the process of aligning different images of the same scene acquired at different times, from different viewing angles, and/or by different sensors [7].
In particular, quite a few algorithms have been proposed over the past decades for remote sensing image registration, broadly falling into two categories [8,9]: area-based methods and feature-based methods. Area-based methods register images directly with their intensity information, but they cannot cope with large rotation and scale changes [10]. Consequently, more attention has been paid to feature-based methods [10][11][12][13][14][15][16], in which the sensed image is aligned to the reference image by salient geometric features rather than intensity information, through feature extraction, feature matching, transformation model construction, coordinate transformation, and resampling [14]. Taking feature point extraction as an example, the scale-invariant feature transform (SIFT) [17], the speeded-up robust feature (SURF) [18], and extended variants of SIFT or SURF [15,16] are well-known operators. The exhaustive search method [17] and the KD-tree matching algorithm are popular representatives of feature matching. The random sample consensus (RANSAC) [19] or maximum likelihood estimation sample consensus is commonly applied to remove mismatches.
The proposed framework comprises the following steps. Firstly, the initial displacement fields are calculated by the BWP model and BOF estimation in the flat- and complex-terrain regions, respectively. Secondly, abnormal displacement correction is conducted by integrating the feature-based GP model for adaptive detection with the weighted Taylor expansion for thorough modification. Afterwards, the displacement fields of the flat- and complex-terrain regions are mosaicked to generate a seamless one. Ultimately, the aligned image is obtained by coordinate transformation and resampling.

Initial Displacements Calculation
As mentioned above, the remote sensing images to be registered are divided into a series of flat- and complex-terrain parts according to their topographic characteristics for initial displacement calculation. Since elevation data is the typical representation of terrain characteristics, and considering its stability and wide availability among the many available elevation datasets, the digital elevation model (DEM) of the optimized Shuttle Radar Topography Mission (SRTM) was employed in our experiments. The local elevation difference (LED), indicating the terrain differences among adjacent DEM cells in a specified window, is chosen as the criterion of region division. Herein, the LED is calculated from the elevation differences between the central cell and its eight neighbors.

where DEM_i represents the elevation in the i-th local window and LED_i is the local terrain difference of the central pixel. mask_LED_i is the terrain mask; it equals one when LED_i is larger than the threshold T_r, and zero otherwise. T_r is an integer ranging from five to eight, empirically set to seven in the experiments. However, fragmentary blobs are inevitable, e.g., a small hole belonging to the complex-terrain region may appear inside the flat-terrain region. The hole-filling algorithm [32] and the morphological dilation approach [33,34] are utilized for terrain mask post-processing, with the dilation radius set to ten. In this way, images are divided into parts with two kinds of terrain characteristics. The displacement fields of the flat- and complex-terrain regions will then be calculated by the feature-based BWP model and BOF estimation, respectively.
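The terrain-division step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LED is taken here as the maximum absolute elevation difference between a cell and its eight neighbors (one plausible reading of the description above), and `scipy.ndimage` stands in for the hole-filling and dilation routines of [32-34]; the function name is ours.

```python
import numpy as np
from scipy import ndimage

def terrain_mask(dem, t_r=7, expand_radius=10):
    """Divide a DEM into flat- (0) and complex-terrain (1) regions.

    LED is taken here as the maximum absolute elevation difference between
    each cell and its eight neighbours; the mask is then hole-filled and
    morphologically dilated as described in the text."""
    led = np.zeros(dem.shape, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # np.roll wraps at the borders, which is acceptable for a sketch
            shifted = np.roll(np.roll(dem, dy, axis=0), dx, axis=1)
            led = np.maximum(led, np.abs(dem - shifted))
    mask = led > t_r                           # mask_LED: 1 = complex terrain
    mask = ndimage.binary_fill_holes(mask)     # remove fragmentary holes
    mask = ndimage.binary_dilation(mask, iterations=expand_radius)
    return mask.astype(np.uint8)
```

A production version would replace the wrap-around shifts with border-aware handling and read the SRTM DEM from a georeferenced raster.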

Displacements Calculation in the Flat-Terrain Region
In the flat-terrain region, the feature-based method is utilized to calculate the pixel-wise displacements between the reference and sensed images. The feature points are extracted by SIFT and matched by the nearest neighbor distance ratio (NNDR) algorithm for transformation model construction [17]. As one of the local models, BWP is employed to calculate the displacements [35] by dividing the image into blocks and designing block-wise geometric relationships. Generally, if the transformed point in the sensed image coincides with the corresponding point in the reference image under the transformation model H, the relation can be written as X' × HX = 0 [14]. Linearizing this equation, the projective transformation model h of a specified block is estimated as follows [25].
where N is the number of matched feature points, m is the left-hand side of the linearized X' × HX = 0, and M represents the stack of all m. w_i is the weight of the specified feature point, calculated by the inverse distance weighting (IDW) function and varying from block to block. For each block, the pixel-wise displacements are then obtained with the transformation model h.
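The weighted estimation of a block's projective model can be illustrated with a small weighted direct linear transform (DLT). This is a sketch, not the paper's BWP implementation: the block partition, the exact IDW weighting, and outlier handling of [25,35] are omitted, and the function names are ours.

```python
import numpy as np

def weighted_projective(src, dst, weights):
    """Weighted direct linear transform: estimate a 3x3 projective model H
    (dst ~ H @ src in homogeneous coordinates) from matched points, each
    correspondence scaled by its weight (e.g. an IDW weight per block)."""
    rows = []
    for (x, y), (u, v), wt in zip(src, dst, weights):
        # two equations per correspondence from the linearised X' x HX = 0
        rows.append(wt * np.array([-x, -y, -1, 0, 0, 0, u * x, u * y, u]))
        rows.append(wt * np.array([0, 0, 0, -x, -y, -1, v * x, v * y, v]))
    m = np.vstack(rows)                       # the stacked matrix "M"
    _, _, vt = np.linalg.svd(m)
    h = vt[-1].reshape(3, 3)                  # least-squares null-space solution
    return h / h[2, 2]

def apply_h(h, pts):
    """Transform 2-D points with the homogeneous model h."""
    p = np.c_[pts, np.ones(len(pts))] @ h.T
    return p[:, :2] / p[:, 2:3]
```

With per-block weights from IDW, calling `weighted_projective` on each block's matches yields the block-wise models h from which pixel-wise displacements follow.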

Displacements Estimation in the Complex-Terrain Region
As for displacement field estimation in the complex-terrain region, the classical motion estimation method proposed by Brox et al. [36] (namely BOF) is employed, whose energy can be written as follows:

E(o_of) = ∫ ψ(|I_s(x + o_of) − I_r(x)|²) + γ ψ(|∇I_s(x + o_of) − ∇I_r(x)|²) dx + α ∫ ψ(|∇u_of|² + |∇v_of|²) dx, (4)

where o_of = (u_of, v_of) is the displacement field and (x, y) represents the coordinates in the reference image. I_s and I_r denote the intensity values at the specified location of the sensed and reference images, respectively. ∇ = (∂_x, ∂_y)^T is the gradient operator. ψ(s²) = √(s² + ε²) is the modified L1 minimization. γ and α are the weights of the gradient consistency and smoothness terms, respectively.
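Brox et al.'s estimator minimizes this robust energy with a coarse-to-fine variational solver, which is beyond a short listing. The same idea in its simplest form, brightness constancy plus a quadratic smoothness term, is the classic Horn-Schunck scheme, sketched below as an illustrative stand-in (not the paper's BOF implementation; names and parameters are ours):

```python
import numpy as np
from scipy.ndimage import convolve

def dense_flow(i_r, i_s, alpha=0.3, n_iter=300):
    """Horn-Schunck-style dense flow: brightness constancy plus a quadratic
    smoothness term, a simplified stand-in for the robust Brox energy."""
    i_r = i_r.astype(float); i_s = i_s.astype(float)
    avg = 0.5 * (i_r + i_s)
    ix = np.gradient(avg, axis=1)   # spatial derivatives
    iy = np.gradient(avg, axis=0)
    it = i_s - i_r                  # temporal difference
    u = np.zeros_like(avg); v = np.zeros_like(avg)
    k = np.array([[0., .25, 0.], [.25, 0., .25], [0., .25, 0.]])
    for _ in range(n_iter):
        u_bar = convolve(u, k, mode='nearest')   # neighbourhood means
        v_bar = convolve(v, k, mode='nearest')
        t = (ix * u_bar + iy * v_bar + it) / (alpha**2 + ix**2 + iy**2)
        u = u_bar - ix * t
        v = v_bar - iy * t
    return u, v
```

The linearized brightness constraint only handles small motions; Brox's method adds robust penalties, a gradient-constancy term, and image pyramids to cope with large displacements.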

Abnormal Displacements Correction
However, BOF is sensitive to occlusions, a situation similar to the LULC changes in remote sensing images, which leads to abnormal displacements. These abnormal displacements will in turn alter the content of the aligned image. Therefore, they should be corrected by the following algorithms.

Incorrect Displacements Detection
Generally, abnormal displacements appear in regions where objects in the reference image cannot be found in the sensed image. In this situation, the displacements are discontinuous and mutated compared to their surroundings, with overly large or small magnitudes as well as different directions. Since the feature-based global model yields a continuous and stable displacement field even around regions of LULC changes, it is used to detect the abnormal displacements. By comparison with the displacements of the feature-based method, the abnormal displacements of BOF are adaptively detected with a given threshold. Concretely, the feature points are extracted by SIFT, and the relationship between the reference and sensed images is assumed to be described by the GP model H_p. Then the pixel-wise displacement o_p is generated by Equation (5) and the mask for abnormal displacements is obtained by Equation (6).
where ∆o = [∆o(x), ∆o(y)] represents the displacement differences in the x and y directions. T_mx and T_my are the thresholds determined by a specified percentile of the ascending-ordered ∆o. It is hard to detect abnormal displacements that coincide exactly with the LULC changes. To ensure the correction effect, the thresholds are chosen so that the detected abnormal displacement region is slightly larger than the actual extent of the LULC changes. Our experiments verified that the extra small-scale false abnormal displacements (in fact, normal displacements) do not reduce the final registration accuracy. According to our experimental experience, the available percentile range is [0.7, 0.9], which is set to 0.75 in the following experiments.
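The percentile-based detection of Equations (5)-(6) reduces to a few array operations. A minimal sketch (function name ours; `o_of` and `o_p` are the BOF and GP-model displacement fields as H×W×2 arrays):

```python
import numpy as np

def abnormal_mask(o_of, o_p, percentile=75):
    """Flag BOF displacements as abnormal where they deviate from the
    feature-based GP-model field by more than a per-direction percentile
    threshold (a sketch of Equations (5)-(6))."""
    d = np.abs(o_of - o_p)                    # |dO(x)|, |dO(y)| per pixel
    t_mx = np.percentile(d[..., 0], percentile)   # threshold, x direction
    t_my = np.percentile(d[..., 1], percentile)   # threshold, y direction
    return (d[..., 0] > t_mx) | (d[..., 1] > t_my)
```

Lowering the percentile enlarges the detected region, matching the paper's preference for over-detection around LULC changes.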


Incorrect Displacements Rectification
Guided by the mask from the detection result, the abnormal displacements are corrected by the weighted first-order Taylor expansion. As illustrated in Figure 2, the abnormal displacements on the boundary are firstly recalculated, utilizing the neighboring accurate displacements specified with a yellow dotted rectangle. The boundary gradually propagates into the abnormal displacement region in white along the direction of the arrows, until the region is corrected completely. Concretely, supposing q is the position of an abnormal displacement, it will be corrected with the pixels p_i in its neighborhood ℜ_q. The assumption is that the displacement variation is locally small in ℜ_q [37,38], as shown in Figure 3. Taking a part of the displacement field of 21 × 21 pixels, the displacement is displayed with a yellow arrow every 5 pixels. Except for the abnormal displacements in the lower right corner, the displacement variation is locally small in both magnitude and direction, which is consistent with the continuity of remote sensing images. Under this circumstance, the displacement estimate at q from a neighbor p (denoted as o_p(q)) can be written by the first-order Taylor series expansion in Equation (8) [37,38].
where ∇o_of = (∇u_of, ∇v_of) is the gradient of the displacement vector and contains the x- and y-direction gradients. (q − p) = (∆x, ∆y)^T denotes the coordinate difference between pixels q and p. The abnormal displacement at q (denoted as o_CBOF(q)) is recalculated by Equation (9).
where w_p is a weighting function given by the product of two essential factors: the geometric distance weight w_dis and the structural similarity weight w_PC. w_dis is calculated by IDW, meaning that the closer a neighbor is to the pixel q to be corrected, the greater its weight.
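The boundary-inward correction of Equations (8)-(9) can be sketched as follows. Several details are our assumptions rather than the paper's implementation: `pc` is any structure map (the paper uses phase congruency), the structure weight is approximated by an inverse absolute difference of that map, and contributions are restricted to "trusted" neighbors whose finite differences avoid abnormal cells (a robustness choice of ours).

```python
import numpy as np
from scipy.ndimage import binary_dilation

def correct_displacements(o, mask, pc, radius=3, eps=1e-6):
    """Correct abnormal displacements (mask==True) from the boundary inward,
    each estimated by weighted first-order Taylor expansions from valid
    neighbours (a sketch of Equations (8)-(9))."""
    o = o.copy()
    bad = mask.astype(bool).copy()
    h, w = bad.shape
    while bad.any():
        boundary = bad & binary_dilation(~bad)   # bad pixels touching valid ones
        if not boundary.any():
            break                                # nothing valid to propagate from
        trusted = ~binary_dilation(bad)          # valid pixels whose finite
        gy, gx = np.gradient(o, axis=(0, 1))     # differences avoid bad cells
        for qy, qx in zip(*np.nonzero(boundary)):
            num, den = np.zeros(2), 0.0
            for py in range(max(0, qy - radius), min(h, qy + radius + 1)):
                for px in range(max(0, qx - radius), min(w, qx + radius + 1)):
                    if not trusted[py, px]:
                        continue
                    dy, dx = qy - py, qx - px
                    # first-order Taylor estimate of the displacement at q
                    est = o[py, px] + gy[py, px] * dy + gx[py, px] * dx
                    w_dis = 1.0 / (np.hypot(dy, dx) + eps)             # IDW term
                    w_pc = 1.0 / (abs(pc[qy, qx] - pc[py, px]) + eps)  # structure term
                    num += w_dis * w_pc * est
                    den += w_dis * w_pc
            if den > 0:
                o[qy, qx] = num / den
        bad &= ~boundary                         # peel the corrected boundary
    return o
```

On a locally linear displacement field the Taylor estimates are exact, so the corrected values agree with the surrounding field, which is precisely the continuity assumption of Figure 3.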
Neighboring pixels with similar structures from the same object are likely to have similar displacements. The image structure extracted by phase congruency [13] is similar to the image gradient, whereas it is insensitive to illumination and contrast changes. Since LULC changes in the sensed image would lead to incorrect weights, the structural similarity is calculated in the reference image.
Equation (11) should not be evaluated if the denominator is zero. X(·) represents the coordinate of the specified pixel, and PC means the phase congruency in the reference image. Although the measure does not capture the exact structural similarity between q and p, the structural similarity between q and p on the reference image gives a reasonable approximation.
To summarize, with the geometric distance term w_dis, the effect of distant accurate displacements decreases; additionally, the structural similarity w_PC approximates the anisotropic propagation of neighboring accurate displacements. To avoid a jump between the corrected displacements and the original accurate displacements, median filtering is conducted as a final adjustment. An example of abnormal displacement detection and correction is shown in Figure 4. Figure 4a,b are the reference and sensed images, with the LULC changes marked by a yellow rounded rectangle. These changes lead to abnormal displacement estimates in Figure 4c, where the magnitude and direction of the displacements are inconsistent with their neighborhood. The abnormal displacements further change the image content, as seen by comparing the regions marked with a yellow rounded rectangle in Figure 4b,e. The detected abnormal displacements and their corresponding mask are shown in Figure 4d, where white pixels represent the abnormal displacements and black pixels the accurate ones. The corrected displacements are shown in Figure 4f and are similar to the displacements in their surrounding region. Furthermore, the aligned image (see Figure 4g) is similar to the corresponding region of the sensed image in Figure 4b. Thus, the corrected BOF method is effective for abnormal displacement correction.


Displacement Fields Mosaic
As mentioned previously, the proposed algorithm calculates the displacements of the flat-terrain and complex-terrain regions separately. To generate a displacement field for the entire image, the displacement fields from the different regions should be mosaicked. To avoid a jump at the edge between different displacement fields, a buffer is constructed between the two, as shown in Figure 5, similar to image mosaicking [39]. Outside the buffer, the displacements at the bottom and top come from the corrected BOF and the feature-based BWP, respectively. Inside the buffer, each pixel's displacement is a weighted combination of the two displacement fields. Without loss of generality, we choose a simple weighting criterion: distance. As shown in Figure 5, for any pixel l(i, j) in the buffer, d is the buffer size from the bottom border to the top. The closer the pixel l(i, j) lies to the bottom of the buffer, the heavier the weight of the displacement from the complex-terrain region and the lighter the weight of the flat-terrain displacement, and vice versa.

Concretely, supposing o_f is the final mosaicked displacement field, it is calculated as follows.
where w_CBOF and w_BWP are the respective weights of the displacements from the complex- and flat-terrain parts in the buffer, and o_CBOF is the displacement field from the complex-terrain region. c represents the complex-terrain region and f the flat-terrain region. It is required that w_CBOF(i, j) + w_BWP(i, j) = 1 and 0 ≤ w_CBOF(i, j), w_BWP(i, j) ≤ 1. The weights can therefore be designed according to IDW: denoting the distance from any pixel (i, j) in the buffer to the border closer to the complex-terrain region as d_c(i, j), then w_CBOF(i, j) = 1 − d_c(i, j)/d. With the final displacement field, the coordinates in the sensed image are directly transformed; after resampling, the aligned image is obtained.
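The distance-weighted blend inside the buffer can be sketched for the simple geometry of Figure 5 (a horizontal buffer with the complex-terrain region below; the function name and row-index interface are ours):

```python
import numpy as np

def mosaic_fields(o_cbof, o_bwp, buf_top, buf_bottom):
    """Mosaic two displacement fields with a horizontal buffer: flat-terrain
    BWP above row `buf_top`, complex-terrain CBOF below row `buf_bottom`,
    and a linear distance blend (w_CBOF + w_BWP = 1) in between."""
    out = np.empty_like(o_cbof)
    out[:buf_top] = o_bwp[:buf_top]            # flat-terrain region (top)
    out[buf_bottom:] = o_cbof[buf_bottom:]     # complex-terrain region (bottom)
    d = buf_bottom - buf_top                   # buffer size
    for i in range(buf_top, buf_bottom):
        d_c = buf_bottom - i                   # distance to the complex-side border
        w_cbof = 1.0 - d_c / d                 # heavier near the complex side
        out[i] = w_cbof * o_cbof[i] + (1.0 - w_cbof) * o_bwp[i]
    return out
```

At both buffer borders the blend weight reaches 0 or 1, so the mosaicked field joins each region's field continuously.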

Experiments and Evaluations
In this section, four representative pairs of remote sensing images, listed in Table 1, were used to evaluate the proposed algorithm qualitatively and quantitatively. Firstly, the results of the proposed method were visually compared with those of three local transformation models under the feature-based framework and the original optical flow estimation, i.e., PLM [22], TPS [24], BWP-OIS [25], and BOF [36]. PLM designs a series of local affine transformation models for each triangle constructed by the extracted feature points [22]. TPS estimates the geometric relationship between the reference and sensed images by a systematic affine transformation model plus a weighted radial basis function accounting for local distortion [24]. BWP-OIS locally registers images in a two-step process by integrating the block-weighted projective transformation model and the outlier-insensitive model [25]. BOF registers the image by directly transforming the coordinates and resampling with pixel-wise displacements calculated under brightness consistency and gradient consistency [36]. Secondly, three assessment indicators were used for quantitative evaluation in Section 3.2: root-mean-square error (RMSE), normalized mutual information (NMI), and the correlation coefficient (CC).

Visual Judgment
The registration results of the proposed algorithm are shown in Figure 6. As seen from the original images in Figure 6a,b, they show a large radiation difference ("test-I"), are contaminated by uneven clouds ("test-II"), include a typical hybrid terrain region ("test-III"), or cover complicated terrain ("test-IV"). The terrain mask is calculated with the threshold T_r in Equation (1) and shown in Figure 6c, where the white pixels represent the complex-terrain region and the black ones the flat-terrain region. The masks are approximately consistent with visual observation and the elevation data. In particular, since the test-IV images are located in Chongqing, China, known as the "mountain city", most of the image belongs to the complex-terrain region and is white in the terrain mask. The overlapping results of the original images in Figure 6d show blurs and ghosts, which indicates that geometric deformation exists between the reference and sensed images. By contrast, the overlapping results of the images aligned by the proposed algorithm in Figure 6e are visually clear and distinct. In other words, the experiments demonstrate the effectiveness of the proposed algorithm.
Subsequently, the comparison results of the proposed algorithm and the other methods are shown in Figure 7, which presents enlarged sub-regions of the red and green dotted-rectangle regions in Figure 6a for fine comparison. The vertical alignment is labeled with a red "A" and the horizontal alignment with a green "B". The boundaries of the reference and aligned images are used to judge the registration accuracy. The white arrows and filled circles are auxiliary marks indicating the displacements between the reference and aligned images: the arrows depict the direction of misalignment, with lengths proportional to the dislocation from the aligned image to the reference image, while the white filled circles represent well-registered results.

Focusing first on the comparison for "test-I" in Figure 7, row "A" shows the vertical registration of the different methods. The "Original" broken linear features indicate a large vertical deformation between the reference and sensed images. PLM and TPS eliminate most of the vertical deformation, but not completely. BWP-OIS overcompensates [25] the sensed image, so that an opposite misalignment of the linear features is introduced on the boundary. BOF and the proposed algorithm achieve highly accurate registration, where the linear features are continuous and smooth at the same position. For row "B", although most of the horizontal displacements have been eliminated, the results of PLM and TPS still show dislocations. BWP-OIS, BOF, and the proposed algorithm register the sensed image well. However, abnormal objects are introduced by BOF, marked with white dotted ellipses, which are not visible in the aligned image of the proposed method. Therefore, the proposed algorithm delivers high registration accuracy while simultaneously guarding the content against changes.
The second comparison is for "test-II" in Figure 7. In row "A", the road at the bottom and the edge of the second Chinese character at the top are used to judge the registration result. The second Chinese character should be symmetrical, whereas its left and right parts are asymmetric due to the vertical dislocation. Although PLM aligns the road continuously and smoothly, the deformation of the second Chinese character is magnified. TPS cannot align the sensed image well to the reference image. The Chinese character is aligned by BWP-OIS, but the road is not continuous. Fortunately, BOF and the proposed algorithm obtain well-aligned results for both the road and the second Chinese character, although the road in the BOF result is changed. In row "B", the road in the sensed image is shifted considerably to the right by PLM and TPS, while the proposed algorithm, BOF, and BWP-OIS provide the desired alignments; the only drawback is that the content of the sensed image is altered by BOF. In summary, PLM and TPS fail to align the sensed image to the reference image spatially; BWP-OIS gives desirable registration in the horizontal direction but not in the vertical direction; BOF generates accurate alignments accompanied by abnormal changes marked with white dotted ellipses; and the proposed algorithm outperforms the others in both alignment accuracy and content fidelity.
The selected regions of "test-III" and "test-IV" in Figure 6 further verify the proposed algorithm, as shown in Figure 7. For "test-III", a similar phenomenon is observed, with the proposed algorithm outperforming the others in both the vertical and horizontal directions; the content change introduced by BOF is removed by the proposed algorithm. For "test-IV", the three algorithms other than PLM provide well-aligned results in row "A". In row "B", PLM and TPS widen the dislocation, and BWP-OIS introduces an opposite misalignment of the bridge. BOF and the proposed algorithm completely eliminate the deformation, and the bridge is connected smoothly and continuously; however, BOF changes the content (labeled by a white dotted ellipse), whereas the proposed algorithm preserves the content of the sensed image well. In conclusion, the proposed algorithm visually outperforms the other algorithms in all experiments. To further evaluate it, a quantitative evaluation is conducted.

Quantitative Evaluation
The quantitative evaluation with three indicators was further used to verify the visual judgment. On one hand, RMSE depicts the spatial registration accuracy. On the other hand, NMI and CC judge the similarity between the aligned image and the reference image according to their corresponding pixel values.
The RMSE evaluates the registration result by calculating the average distance between corresponding points in the reference and aligned images. As in most of the literature, some distinct feature points were extracted manually from the corresponding reference and aligned images to calculate the RMSE [7,40]. The RMSE of the comparative algorithms is listed in Table 2, with the number of manually extracted evaluation points given in brackets. "Original" denotes the RMSE between the reference and sensed images before registration. As shown in Table 2, all the algorithms relieve the geometric distortion between the reference and sensed images. TPS gives the largest RMSE among the registration algorithms, and BWP-OIS performs better than PLM in the last three experiments. The smallest RMSE is obtained by the proposed algorithm, with BOF giving suboptimal performance: some abnormal objects and pseudo traces in the aligned image of BOF change the shape or original track of linear features, causing lower registration accuracy. The proposed algorithm performs better than the three feature-based algorithms, which means it is more suitable for registering complex-terrain regions. Furthermore, the RMSE of the proposed algorithm is smaller than that of BOF, indicating that the feature-based method achieves similar registration accuracy without being affected by LULC changes in the flat-terrain region, while the proposed algorithm copes well with LULC changes in the complex-terrain region. To further evaluate the performance of the proposed method, we extract the complex-terrain region of the aligned image and the corresponding part of the reference image, and calculate the NMI and CC on these extracted regions, as shown in Figure 8a,b. NMI and CC are similarity criteria, usually employed as the iteration termination condition in area-based registration algorithms.
The NMI index measures the statistical correlation of two images and is defined as

NMI(X_ref, X_sen) = (H(X_ref) + H(X_sen)) / H(X_ref, X_sen), (14)

where H(X_ref) and H(X_sen) are the entropies of the images X_ref and X_sen, respectively, and H(X_ref, X_sen) is their joint entropy. A larger NMI corresponds to a greater similarity between the two images. The CC value of images X_ref and X_sen is calculated as

CC(X_ref, X_sen) = σ_{X_ref X_sen} / (σ_{X_ref} σ_{X_sen}), (15)

where σ_{X_ref X_sen} denotes the covariance between the two images, and σ_{X_ref} and σ_{X_sen} are their standard deviations. CC ranges from −1 to 1, with 1 indicating perfect correlation.
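Both indicators are straightforward to compute. A minimal sketch (the 64-bin joint histogram for the entropy estimates is our implementation choice, not specified in the paper):

```python
import numpy as np

def nmi(x, y, bins=64):
    """Normalised mutual information, NMI = (H(X) + H(Y)) / H(X, Y),
    with entropies estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)     # marginal distributions
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log2(py[py > 0]))
    hxy = -np.sum(pxy[pxy > 0] * np.log2(pxy[pxy > 0]))
    return (hx + hy) / hxy

def cc(x, y):
    """Correlation coefficient sigma_xy / (sigma_x * sigma_y)."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]
```

For identical images the joint histogram is diagonal, so H(X, Y) = H(X) and NMI attains its maximum of 2; CC attains 1 for any affine intensity relation with positive slope.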

As shown, the proposed method achieves the largest NMI and CC in all the experiments. The registration accuracy of PLM is close to that of BWP-OIS, whereas the latter is slightly superior in "test-III" and "test-IV" in Figure 8. The local mapping functions of PLM are constructed by feature points, so its accuracy is limited by the precision of feature point extraction. BWP-OIS alleviates this situation by designing the transformation functions with all feature points under different weights, and therefore performs better than PLM in most instances. Additionally, TPS gives the worst registration accuracy in the four experiments: it is suitable for images with dense and uniformly distributed feature points [23], but it is difficult to extract such feature points in the complex-terrain region, especially with a uniform distribution. In addition, the registration accuracy of BOF is comparable to the proposed method in the complex-terrain regions when evaluated with NMI and CC, which is consistent with the RMSE result. It is also worth noting that there is a big jump on test-I, because a large deformation between the reference and sensed images results in smaller NMI and CC before registration compared with those of the registration results.
Not only does the local region outlined with yellow rectangles of test-I in Figure 5 present an obvious geometric dislocation, but the RMSE of the test-I original images in Table 2 is also larger than those of the other experiments. Hence, there is a large jump in the NMI and CC values on test-I in Figure 8. The proposed method registers images automatically, including image division according to the terrain characteristics, alignment of the flat-terrain region with the feature-based algorithm, and pixel-wise displacement calculation of the complex-terrain region with the improved optical flow estimation. The thresholds in these steps are determined from the experience of a large number of experiments. The evaluation demonstrates that the proposed method performs well on the mixed-terrain region compared with the conventional feature-based methods and the original optical flow algorithm. To a certain extent, it also shows that the scheme of region-by-region registration is feasible.
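The NMI, CC, and RMSE indicators used throughout this evaluation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the histogram bin count and the histogram-based entropy estimate for NMI are our assumptions.

```python
import numpy as np

def nmi(a, b, bins=64):
    """Normalized mutual information, (H(A) + H(B)) / H(A, B),
    estimated from a joint intensity histogram (bin count assumed)."""
    p_ab, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = p_ab / p_ab.sum()

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return (entropy(p_ab.sum(axis=1)) + entropy(p_ab.sum(axis=0))) / entropy(p_ab.ravel())

def cc(a, b):
    """Pearson correlation coefficient between two aligned images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def rmse(pts_ref, pts_aligned):
    """Root-mean-square error over matched check points of shape (N, 2)."""
    d = np.asarray(pts_ref, float) - np.asarray(pts_aligned, float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))
```

For two identical images, NMI reaches its maximum of 2 and CC equals 1; larger NMI and CC, and smaller RMSE, indicate better alignment.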

Discussions
The previous section validated the registration results of the proposed method. This section focuses on the analysis of the core parameter, the LED threshold T_r for terrain division, which determines the algorithm used to register each local region of the images and thus influences the final registration accuracy.
T_r determines whether a pixel is assigned to the complex-terrain region or the flat-terrain region, which further influences the overall registration accuracy. To reveal this relationship, a series of experiments was conducted with different thresholds. Without loss of generality, the experimental data in Figure 6 are employed, with the threshold varied from five to thirteen. The quantitative evaluation results of the final registration under the varied thresholds are shown in Table 3, with NMI and CC as the evaluation indicators. For the four groups of experiments, as the threshold increases, the two indicators at first fluctuate slightly, or even remain stable, and then decrease, showing a relatively simple relationship. However, focusing on the quantitative evaluation of "test-III", the largest NMI is obtained when the LED threshold is set to eight, whereas the best CC is generated when T_r equals six. Except for the third experiment, the tests share the same phenomenon: the optimal registration results are obtained when the LED threshold is set to seven. When the threshold is larger than seven, more pixels are assigned to the flat-terrain region even though the terrain is in practice complex, weakening the registration accuracy. However, this does not mean that the smaller the threshold, the higher the registration accuracy, as the third and fourth groups of experiments show. Within the available range of five to eight for the LED threshold, the indicators do not show an obvious difference; the largest numerical difference is 0.0008. Under this circumstance, the difference between the registration results cannot be caught by the eye. After comprehensive consideration, the LED threshold was set to seven in the experiments.
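The threshold sweep described above can be sketched as follows. All three function names (`led_map`, `register_fn`, `evaluate_fn`) are hypothetical placeholders standing in for the paper's pipeline, not its actual interfaces, and the combined NMI + CC score is our own simple selection criterion.

```python
import numpy as np

def sweep_led_threshold(led_map, register_fn, evaluate_fn, thresholds=range(5, 14)):
    """Register with each candidate LED threshold and return the one whose
    result scores best.  `led_map` holds the per-pixel local elevation
    difference; `register_fn` maps a complex-terrain mask to a registered
    image; `evaluate_fn` returns (NMI, CC) for that result."""
    best_t, best_score = None, -np.inf
    for t in thresholds:
        complex_mask = led_map > t                 # True = complex terrain
        nmi_val, cc_val = evaluate_fn(register_fn(complex_mask))
        score = nmi_val + cc_val                   # simple combined criterion (assumption)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

In the paper's experiments this sweep over the range five to thirteen, scored with NMI and CC, led to the recommended value of seven.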
To facilitate application of this technique, taking test-III as an example, the terrain masks generated by different thresholds are shown in Figure 9. The intention is to divide the image to be registered into multiple local regions with flat- and complex-terrain characteristics; that is, heterogeneous features distributed in a topographic block should, as far as possible, not be assigned to that block. Therefore, the hole-filling algorithm and the morphological expansion approach are utilized, as described in Section 2.1. Even before this post-processing, the generated mask should provide an ideal foundation, so the LED threshold for terrain division is set with both visual observation and quantitative evaluation of the registration results. In Figure 9, black indicates the complex-terrain region and white the flat-terrain region. When the threshold is 5, there are heterogeneous, small blobs in both the complex- and flat-terrain regions. When the threshold is 7, there are relatively few heterogeneous blobs in the flat-terrain region (denoted in white) and more white blobs in the complex-terrain region compared with the first two masks; compared with the last six masks, the terrain division is relatively regular, namely most flat-terrain pixels are in the white region and complex-terrain blobs are in black. This conclusion agrees with Table 3. The three other experiments, not shown here, give a similar conclusion. Therefore, after quantitative comparison and qualitative observation of a large number of experiments, we provide the available range of the threshold T_r in Section 2.1, with seven as the recommended value.
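The hole-filling and morphological expansion steps mentioned above can be sketched in plain NumPy. This is an illustrative sketch, not the paper's implementation: the 4-connectivity and the default iteration count are our assumptions.

```python
import numpy as np

def fill_holes(mask):
    """Fill holes in a boolean complex-terrain mask: background pixels not
    reachable from the image border become foreground, so small flat blobs
    enclosed by complex terrain are absorbed into that block."""
    h, w = mask.shape
    reach = np.zeros((h, w), dtype=bool)
    # seed a flood fill with every non-mask pixel on the image border
    stack = [(i, j) for i in range(h) for j in range(w)
             if (i in (0, h - 1) or j in (0, w - 1)) and not mask[i, j]]
    for i, j in stack:
        reach[i, j] = True
    while stack:
        i, j = stack.pop()
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < h and 0 <= nj < w and not mask[ni, nj] and not reach[ni, nj]:
                reach[ni, nj] = True
                stack.append((ni, nj))
    return mask | ~reach          # anything the border flood cannot reach is a hole

def expand(mask, iterations=2):
    """4-connected morphological dilation of the complex-terrain mask."""
    out = mask.copy()
    for _ in range(iterations):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]
        grown[:-1, :] |= out[1:, :]
        grown[:, 1:] |= out[:, :-1]
        grown[:, :-1] |= out[:, 1:]
        out = grown
    return out
```

The same operations are available as `scipy.ndimage.binary_fill_holes` and `scipy.ndimage.binary_dilation` when SciPy is acceptable as a dependency.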

Figure 9. The terrain mask of test-III with different thresholds (Note: black represents the complex-terrain region and white means the flat-terrain region).

Conclusions
In this paper, we proposed a region-by-region registration combining feature-based and optical flow methods for remote sensing images. The proposed method innovatively exploits the fact that the geometric deformation caused by topographic relief varies with region, registering the flat- and complex-terrain regions separately. Owing to the pixel-wise displacement estimation of the dense optical flow algorithm, the registration accuracy in the complex-terrain region is reinforced. Moreover, the adaptive detection and weighted Taylor expansion make up for the abnormal displacements in regions of LULC change, to which dense optical flow estimation is sensitive. In other words, the feature-based method, dense optical flow estimation, adaptive detection technique, and weighted Taylor expansion are all employed to obtain high-precision pixel-wise displacements and, in turn, accurate registration. The accuracy of the complex-terrain region determines the overall registration precision of images covering mixed terrain; it therefore receives much attention in the proposed framework and is achieved by improving the optical flow algorithm. As confirmed in the experiments, the proposed approach outperforms the conventional feature-based methods and the original optical flow algorithm both qualitatively and quantitatively, in terms of visual observation, NMI, CC, and RMSE.
However, the proposed method is only a first trial of mixed-terrain image registration. The results are preliminary, and there is still room for improvement. For instance, when the experiments were conducted in MATLAB R2018a on a computer with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60 GHz, the runtimes of our four experiments were 295.237 s, 294.044 s, 373.050 s, and 143.808 s, respectively. Registering images with the proposed algorithm is thus time-consuming, exceeding the time expected for image registration in the pre-processing stage, let alone real-time processing. Therefore, the time efficiency needs to be improved through parallel processing on the C++ platform, particularly for the complex-terrain region alignment. Additionally, when registering two degraded images with the optical flow algorithm in the complex-terrain region, if both images are contaminated at the same location, the abnormal displacements cannot be restored, because the phase-congruency weight for the Taylor expansion is computed abnormally, which further changes the aligned image content relative to the sensed image. Moreover, more mixed-terrain images from other sensors should be tested with our proposed method. These issues will be addressed in future work.