High-Resolution Optical Remote Sensing Image Registration via Reweighted Random Walk Based Hyper-Graph Matching

: High-resolution optical remote sensing image registration is still a challenging task due to non-linearity in the intensity di ﬀ erences and geometric distortion. In this paper, an e ﬃ cient method utilizing a hyper-graph matching algorithm is proposed, which can simultaneously use the high-order structure information and radiometric information, to obtain thousands of feature point pairs for accurate image registration. The method mainly consists of the following steps: ﬁrstly, initial matching by Uniform Robust Scale-Invariant Feature Transform (UR-SIFT) is carried out in the highest pyramid image level to derive the approximate geometric relationship between the images; secondly, two-stage point matching is performed to ﬁnd the matches, that is, a rotation and scale invariant area-based matching method is used to derive matching candidates for each feature point and an e ﬃ cient hyper-graph matching algorithm is applied to ﬁnd the best match for each feature point; thirdly, a local quadratic polynomial constraint framework is used to eliminate match outliers; ﬁnally, the above process is iterated until ﬁnishing the matching in the original image. Then, the obtained correspondences are used to perform the image registration. The e ﬀ ectiveness of the proposed method is tested with six pairs of high-resolution optical images, covering di ﬀ erent landscape types—such as mountain area, urban, suburb, and ﬂat land—and registration accuracy of sub-pixel level is obtained. The experiments show that the proposed method outperforms the conventional matching algorithms such as SURF, AKAZE, ORB, BRISK, and FAST in terms of total number of correct matches and matching precision.


Introduction
Image registration is to align two or more images taken by different sensors or by the same sensor but from different viewpoints [1,2], which is the basis processing of many remote sensing applications, such as change detection [3], image fusion [4], and environment monitoring [5]. For high resolution optical remote images, the relationship between intensity value of conjugate pixels is complex, which is known as non-linear differences and large geometric distortion [6], making the as the graph matching problem, such as spectral graph matching [36], probabilistic graph matching (HGM) [37,38], balanced graph matching [39], tensor-based high-order graph matching (TM) [40], and reweighted random walks graph matching [41,42]. However, in these algorithms, the number of graph nodes is only a few dozen, the computational cost is very high and a large amount of computer memory is required, which could be unaffordable for the remote sensing image registration.
On the basis of previous studies, this paper presents a reweighted random walk based hyper-graph matching for registration of multi-sensor optical remote-sensing imagery, and it mainly has two main characteristics: (1) Improving the robustness and success rate of image matching without paying too high of a computational cost. Image matching is an ill-posed problem, and ABM, FBM, as well as graph matching methods have their pros and cons. Currently, most graph matching methods have high computation cost and require large amount of computer memory, and many of them are not suitable for the remote sensing image registration since it needs to match a large number of feature points. This paper describes a framework of image matching that integrates ABM, FBM and graph matching methods together to improve the image matching robustness and success rate without paying too much computation cost; (2) Simultaneously utilizing high-order structure information and one-order intensity similarity in the matching process in an efficient way. Taking building the three-order similarity tensor for example, most graph matching algorithms will randomly sample a certain number of triangles for each point in the reference image, and all the possible triangles will be selected. In this paper, the candidates for each matching feature point are firstly searched by the ABM method, and the feature points' candidate relationship is utilized to build the hyper-edge tensor, which can significantly improve the sparseness of association graph and the computational efficiency.

Matching Feature
Various features-such as point features, linear features, and regional features-could be utilized. In the case of multi-sensor remote sensing imagery registration, although the linear and regional features are more stable and easy to identify and match, the extracted features often appear to be fragmented, incomplete, or not completely coincidental. Moreover, homogenous linear features and areal regions do not guarantee existing or having a well distribution in the image. All these factors are not conducive to the image registration of high accuracy. For point features, they are not limited to the image content and more suitable for the general cases. Through developing suitable matching strategies to enhance the matching success rate and robustness, it could guarantee the correct matching of point features, and when hundreds and thousands of feature points are successfully matched, they can depict the local differences accurately and derive satisfactory results. Therefore, the point feature is selected as the matching primitive. Strategies-such as initial point position prediction for false matching points, rotation and scale invariant area-based matching, and hyper-graph matching-are integrated to obtain large amount of well-distributed feature points.

Matching Process
The framework of our image registration method is illustrated in Figure 1. Before starting matching, the image pyramid with three levels is generated from original image in which different levels have different spatial resolutions (see Section 2.2). Firstly, well-distributed feature points were extracted by the Förstner operator. Next, the initial matching was conducted by SIFT matching method, and the affine transformation between the matching images was computed. Then, a rotation and scale invariant area-based matching method is used to derive the candidate matching points.
Finally, the hyper-graph matching which can simultaneously consider the radiometric and high-order structure information is used to find the correct matches from the matching candidates. Local quadratic polynomial constraint framework and TIN-based image resampling are adopted to eliminate the outliers and perform accurate image registration. The coarse-to-fine strategy is adopted and the matching results from the higher pyramid level are used as guidance for the matching in the next pyramid level.
Remote Sens. 2019, 11, 2841 4 of 21 and the matching results from the higher pyramid level are used as guidance for the matching in the next pyramid level.

Initial Matching by FBM Method
Since the size of remote sensing imagery is usually large, the coarse-to-fine strategy is adopted to increase the matching efficiency. The pyramid image is generated by the 3 × 3 pixel average method. The original image is regarded as the pyramid image of level zero, then the average gray value of 3 × 3 pixels in the original image is assigned to the corresponding pixel of the pyramid image of level one, iterating this process until a pyramid image of level three is generated.
The initial matching is only carried out in the pyramid image of highest level. The main purpose of initial matching is deriving the approximate geometric relationship between the reference image and the searching image. With this relationship, we can obtain the approximate values about the rotation and scale differences between the images and the range of overlapping area. In this paper, UR-SIFT [43] is employed for the initial matching. Compared with SIFT, UR-SIFT feature would more likely retain high-quality features across the entire image. The ratio R between the Euclidean distance to the closest neighbor and that of the second closest is set to be 0.7.
In order to robustly estimate the parameters of transformation model, the RANSAC algorithm is used to eliminate the effect of false matches. Since the geometric deformations are very small for the pyramid image of the highest level, the affine transformation model is sufficient to describe the geometric constraint between the matches

Initial Matching by FBM Method
Since the size of remote sensing imagery is usually large, the coarse-to-fine strategy is adopted to increase the matching efficiency. The pyramid image is generated by the 3 × 3 pixel average method. The original image is regarded as the pyramid image of level zero, then the average gray value of 3 × 3 pixels in the original image is assigned to the corresponding pixel of the pyramid image of level one, iterating this process until a pyramid image of level three is generated.
The initial matching is only carried out in the pyramid image of highest level. The main purpose of initial matching is deriving the approximate geometric relationship between the reference image and the searching image. With this relationship, we can obtain the approximate values about the rotation and scale differences between the images and the range of overlapping area. In this paper, UR-SIFT [43] is employed for the initial matching. Compared with SIFT, UR-SIFT feature would more likely retain high-quality features across the entire image. The ratio R between the Euclidean distance to the closest neighbor and that of the second closest is set to be 0.7.
In order to robustly estimate the parameters of transformation model, the RANSAC algorithm is used to eliminate the effect of false matches. Since the geometric deformations are very small for the pyramid image of the highest level, the affine transformation model is sufficient to describe the geometric constraint between the matches where (x 1 , y 1 ) are the coordinates of a point in the reference image, (x 2 , y 2 ) are the coordinates of its corresponding point in the searching image, h ij is the parameters of affine transformation model, and h 11 , h 21 represent the offset, h 12 , h 13 , h 22 and h 23 reflect the rotation and scale between the images. After obtaining the affine transform coefficients, the rotation angle θ and the scale λ between the images can be calculated using following formula: After the initial matching, the scope of overlapping area can be determined, which is divided into virtual grid cells. For each grid cell, one feature point is extracted in it by the Förstner operator.

Two-Stage Point Matching
In order to improve the success rate and robustness, a two-stage point matching approach is employed. Firstly, an ABM method-a rotation and scale invariant area-based matching algorithm is used to search the candidate matching point; and secondly, the high-order structural information between the feature points are exploited to find the correct matching points.

Candidate Point Matching by a Rotation and Scale Invariant ABM
For the multi-sensor optical satellite image registration, there would be large rotation and scale changes between the images, and this would lead to the failure of traditional NCC method. Thus, the correlation windows are warped before matching in the presence of such geometric distortions.
Specifically, we open a matching window in the reference image, predict its corresponding center position in the searching image, perform the image matching window warping, and finally calculate the similarity. The specific steps are as follows: (1) Prediction of initial position of conjugate point in the searching image. In the pyramid image of highest level, affine transformation model is directly used to calculate the initial position of conjugate point in the searching image for each point in the reference image. In other levels, for the points successfully matched in the higher level, just directly project it to the current level. For the point failed to be matched in the higher level, its nearest successfully matched point is used to derive the initial position.
As shown in Figure 2, (P 1 , P 1 ) is a point pair successfully matched in the higher pyramid level, and P 1 is the nearest successfully matched feature point for P 2 , which fails to match in the higher pyramid level. Based on the knowledge that structural information is relatively stable, through (dx, dy), which is the image distance between the feature point P 1 and P 2 , the rotation angle θ and the scale λ calculated from formula (2), we can obtain the initial position of the conjugate point of P 2 by the following formula where (x 1 , y 1 ) is the image coordinates of point P 1 , (x 2 , y 2 ) is the predicted image coordinates of point P 2 . (P , P ) ′ is a point pair successfully matched in the higher pyramid level, 2 P is a feature point failed to matching in the higher level, 2 P′ is the initial position predicted by our method. θ is the rotation angel between the reference image and the searching image.
(dx,dy) is the distance between the feature point 1 P and 2 P in the image.
(dx ,dy ) ′ ′ is the calculated distance between the 1 P′ and 2 P′ with consideration of the rotation and scale difference between two matching images. Since the 1 P is successfully matched and the position of P′ is known, we can derive the initial position for the feature point 2 P although it was failed to match previously.
(2) Perform the image matching window wrapping. The image matching window in the searching image is rotated and scaled according to the rotation angle θ and scale parameter λ . By doing the image matching window wrapping, the geometrical distortion effects can be partially compensated.

Point Matching by Hyper-Graph Matching
After finding candidate points in the searching image, the next step is to find correspondences between two sets of features, this process can be defined as a graph matching problem. This kind of approach is usually restricted to the normal graph embedding unary and pairwise relations. Pairwise relations are not enough to incorporate the information about the entire geometrical structure of features. Embedding higher-order information into the matching will overcome the limitation of pairwise similarity. Recent hyper-graph matching methods incorporate higher-order similarity measures to achieve more accurate results.
A hyper-graph ∈ , hyper-edges e ∈ Ε , and attributes A α ∈ associated with the hyper-edges. We consider two feature sets P and Q , which can formulate two hyper-graphs ( , , ) The goal of the hyper-graph matching is to establish mapping between nodes of two hyper-graphs ( , , ) p n and Q n denote the numbers of node in P G and Q G , respectively. We do not assume p Q n n = , i.e., there may be different numbers of feature points in the two feature sets to be matched. In the case of tie point matching, Q n is usually larger than p n . Suppose a set of all possible node correspondence P Q C V V = × , and k -tuples among them. For k -th order hypergraph matching, the similarities of the k -tuples are measured by comparing attributes of two k -th order hyper-edges  (P 1 , P 1 ) is a point pair successfully matched in the higher pyramid level, P 2 is a feature point failed to matching in the higher level, P 2 is the initial position predicted by our method. θ is the rotation angel between the reference image and the searching image. (dx, dy) is the distance between the feature point P 1 and P 2 in the image. (dx , dy ) is the calculated distance between the P 1 and with consideration of the rotation and scale difference between two matching images. Since the P 1 is successfully matched and the position of P is known, we can derive the initial position for the feature point P 2 although it was failed to match previously.
(2) Perform the image matching window wrapping. The image matching window in the searching image is rotated and scaled according to the rotation angle θ and scale parameter λ. By doing the image matching window wrapping, the geometrical distortion effects can be partially compensated.

Point Matching by Hyper-Graph Matching
After finding candidate points in the searching image, the next step is to find correspondences between two sets of features, this process can be defined as a graph matching problem. This kind of approach is usually restricted to the normal graph embedding unary and pairwise relations. Pairwise relations are not enough to incorporate the information about the entire geometrical structure of features. Embedding higher-order information into the matching will overcome the limitation of pairwise similarity. Recent hyper-graph matching methods incorporate higher-order similarity measures to achieve more accurate results.
A hyper-graph G = (V, E, A) consists of nodes v ∈ V, hyper-edges e ∈ E, and attributes α ∈ A associated with the hyper-edges. We consider two feature sets P and Q, which can formulate two hyper-graphs G P = (V P , E P , A P ) and G Q = (V Q , E Q , A Q ), The goal of the hyper-graph matching is to establish mapping between nodes of two hyper-graphs G P = (V P , E P , A P ) and G Q = (V Q , E Q , A Q ). n p and n Q denote the numbers of node in G P and G Q , respectively. We do not assume n p = n Q , i.e., there may be different numbers of feature points in the two feature sets to be matched. In the case of tie point matching, n Q is usually larger than n p . Suppose a set of all possible node correspondence For k-th order hyper-graph matching, the similarities of the k-tuples are measured by comparing attributes of two k-th order hyper-edges e P p 1 ,...,p k and e Q q 1 ,...,q k , which mean the hyper-edge connecting v P p 1 , . . . , v P p k and v Q q 1 , . . . , v Q q k respectively. The similarity function is denoted by Ω(·, ·), the k-th order similarity of the k-tuple is measured by Ω k (α P p 1 ,...,p k , α Q q 1 ,...,q k ), so the k-th order similarity tensor H (k) can be derived in a recursive manner as H Assuming the value of maximum order among all hyper-edges is δ, the resulting similarity tensor H (δ) contains the entire similarity information. In the rest, we abbreviate H (δ) as H.
The problem of hyper-graph matching is equivalent to looking for a binary assignment matrix X ∈ {0, 1} n p ×n Q such that X p,q is equal to 1 when v P p ∈ V P matches to v Q q ∈ V Q , and to 0 otherwise. It is natural to force one-to-one constraints that make X a permutation matrix where 1 n×1 denotes an all-ones vector with size n. Given x is the vector version of matrix X, the hyper-graph matching score is defined as Here the product x c w 1 . . . x c wk will be equal to 1 if the points p 1 , . . . , p k are all matched to the points q 1 , . . . , q k , and 0 otherwise. In the first case, it will add H c w 1 ,...,c w k to the total score function. This is a similarity measure, which will be high if the sets of features p 1 , . . . , p k is similar to the set q 1 , . . . , q k . We can find that H contains similarity values of all the order, S(x) amounts to the summation of all similarity values in all order, then the goal of the hyper-graph matching is to find the assignment vector x * which maximizes the matching score function S(x) under the constraints of Equation (5) Several algorithms have been proposed to solve this problem, including tensor matching (TM) [40], reweighted random walks for hyper-graph matching method (RRWHM) [41], hyper-graph matching (HGM) [42], etc. In this letter, a modified RRWHM method is introduced with consideration of the candidate relationship among the feature points.
First, an association graph G w = (V w , E w , A w ) is defined, in which every node represents a candidate correspondence. A random walk from a node v w1 ∈ G w to another node v w2 ∈ G w on the graph another correspondence c w2 between G P and G Q . To address the problem of scaling up the outlier nodes weight, an absorbing node is added to G w to perform the affinity-preserving random walks, the transition tensor P and the updating rule for the hyper-graph random walks can be summarized as In the equation, d w is the degree of the node represented by the sum of the weight values associated with it, x (t) is x at t-th iteration, r is the vector for personalization, λ is the bias parameters, and 1 is the all one vector.
In the method of RRWHM, to build the three order similarity tensor H, first a certain number of triangles per points in reference image are randomly sampled, and all possible triangles in the searching image are sampled and their descriptors are calculated and stored by KD-tree. For each of selected triangles in the reference image, several hundred, such as 300, nearest neighbors in the searching image are searched to build H. It can be found that in the RRWHM, building tensor H is very time consuming and needs large amount of computer memory. For image registration, the number of feature points would be a few thousand, building three-order tensor would demand too much memory that regular computer cannot afford.
To solve this problem, a sparse high-order similarity tensor without losing any useful structure information is built. After the candidate points are derived by the rotation and scale invariant ABM matching method, when vertex of the triangle in the reference image and that of in the searching image is not the correspondence candidate, then these two triangles would definitely not become the corresponding hyper-edges. Therefore, when we set the maximum number of candidate points for each feature point as 5, then the number of candidate triangles is only 5 3 at most, and there is no need to calculate all possible triangles in the searching image but only sample a few hundred nearest triangles to serve as the candidate triangles for building the tensor H.
Due to the image distortion, pairwise similarity measure, which is sensitive to the scale change, is not used in this study. Only the unary similarity and three-order similarity measure are adopted. For the unary similarity in H, NCC value is directly used. For the three-order similarity, three angles in the triangle are compared with their sine values where θ P wk and θ Q wk are the angles of node related to the correspondence w k . A balance weight is used to merge the unary similarity tensor H (1) and triple similarity tensor H (3) , where f 1 is the unary similarity, f 3 is the triple similarity, w H is the balanced weight, n 1 is the number of matched points and n 2 is the number of triangles. In this way, geometric and radiometric information are simultaneous utilized in the matching process.

Outlier Elimination and Image Resampling
In the matching process, the outliers are inevitably present, and they must be eliminated before undergoing the image registration. Traditionally, RANSAC-like algorithms are used to estimate the transformation model, such as homography and quadratic polynomial. The matches not conforming with the transformation model are regarded as outliers. However, when there is a relatively complex image distortion, the transformation model cannot serve as a criterion to determine outliers. Therefore, a local quadratic polynomial constraint framework is proposed in this letter. The quadratic polynomial is shown as where (a i , b i )(i = 0, · · · , 5) are the polynomial coefficients, (x m , y m ) and (x s , y s ) are the image coordinates of the matched point pairs in the reference image and searching image respectively. The procedure of outlier elimination is as follows: (1) Adopt the kd-tree to store the image coordinates of the matching points.
(2) Traverse and judge each matching point. For the current judging point, several nearest neighboring points around it are collected by using the K-NN strategy on the basis of image coordinates distance. For quadratic polynomial is used, we recommend the number of nearest neighboring points is better larger than 10. The estimated quadratic polynomial is used to calculate the coordinate residual of current judging point. When the coordinate residual is greater than RMSE twice, the judge point is regarded as outliers and indexed. (3) Return to step (1) to reconstruct kd-tree using the matching points which are not labeled as outlier after traversing all matching points. (4) Iteratively perform above process until no matching point is labeled as outlier.
The retained matching points are used as control points (CP) to form a pair of dense TINs to describe the mapping function. For each pair of triangles, the affine transformation is calculated, which is used to resample the image of the triangle in the searching image to the reference image. Iterating to process all the triangles, the image registration between the reference image and searching image is completed.

Description of Test Data
Six pairs of images captured by WorldView-1/3, GeoEye, GF-1, and SPOT-5 sensors are used for the evaluation. These datasets are all formed by the high-resolution optical satellite remote sensing images, covering different land type, such as mountain area, urban, suburb, and flat land. The images of the last two datasets are also covered by some clouds. The detail information of the experimental datasets is presented in Table 1.

Matching Results and Analysis
According to the matching strategy mentioned above, UR-SIFT matching is performed in the highest pyramid image level, to derive rough transformation information between the matching images. With this, the values of rotation angle and scale difference between the reference image and searching image are obtained, which are used for the compensation of searching image window. With the coarse-to-fine strategy, the matching results in the higher level are used as guidance for the matching in the next level. It is mainly manifested in the following two aspects: firstly, for the points successfully matched, just project its coordinates to the current level to derive its initial position; secondly, for the points that failed to match, search the nearest successfully matching point and use the relative geometric relationship between them to calculate its initial position in the current level.
Through this strategy, the matching failed points in the higher level can still be matched, and this would enhance the matching success rate effectively.
The block size used in our approach is 800 × 800 pixels, for each block several hundred or a few thousand feature points are extracted by the Förstner operator. The size of template window and search window is 13 × 13 pixels and 35 × 35 pixels respectively, and the threshold for normalized correlation coefficient is 0.7. After searching matched candidates for each point, the hyper graph matching is carried out block by block. As shown in the Figure 3, abundant feature points with effective distribution in all of the experimental images can be obtained.
Through this strategy, the matching failed points in the higher level can still be matched, and this would enhance the matching success rate effectively.
The block size used in our approach is 800 × 800 pixels, for each block several hundred or a few thousand feature points are extracted by the Förstner operator. The size of template window and search window is 13 × 13 pixels and 35 × 35 pixels respectively, and the threshold for normalized correlation coefficient is 0.7. After searching matched candidates for each point, the hyper graph matching is carried out block by block. As shown in the Figure 3, abundant feature points with effective distribution in all of the experimental images can be obtained. The first image pair is acquired from WorldView-1 satellite sensor, covering the mountainous area in Qinghai province of China with obvious illuminations differences and clear topographic relief. These factors result in locally non-linear difference in intensity and distortions in geometry. Nevertheless, our method can still obtain evenly distributed matched points.
The second image pair is also from the WorldView-1 satellite sensor, and it mainly covers suburb area containing large amount of terrace. The topographic relief is also obvious, and the repetitive texture from the terrace would make trouble to the feature point matching. However, our method can obtain satisfactory matching results as well.
The third image pair is from the Geoeye satellite sensor covering the urban areas and with a very high resolution of 0.5 m. The images are characterized with structured textures, and these textures benefit the feature point matching, but the locally geometric distortion caused by the buildings makes the accurate image registration very difficult.
The fourth image pair is from the flat area, and the image resolution is relatively low (5 m) compared with the aforementioned three datasets (0.5m). The two images in this pair are from SPOT-5 satellite, and they contain large amount of farmland and buildings. They are captured at different seasons with large time span of over two years, and so forth the images have considerable textural changes due to the rapid development in China (as shown in Figure 3h). It can find matches successfully in the unchanged textures, and good matching results are obtained.
The fifth image pair is from GF-1 satellite, and the image resolution is relatively low (16 m) compared with the aforementioned three datasets (0.5 m). Due to the low image resolution, the geometric distortion is relatively small, but some parts of the image are covered by the clouds, which would make trouble for the accurate image registration.
The sixth image pair is from the Worldview-3 satellite sensor covering the urban areas with an image resolution of 0.5 m. For the high image resolution and tall buildings, the distortion of building show the details of the sub-images marked by the rectangle in the (a-f). The red box represents the place of detail window on the right, the yellow circle represents the approximate position of the sub-images of registration chessboard images showed later in Figure 4, and the green cross represents the successfully matched corresponding points.
The first image pair is acquired from WorldView-1 satellite sensor, covering the mountainous area in Qinghai province of China with obvious illuminations differences and clear topographic relief. These factors result in locally non-linear difference in intensity and distortions in geometry. Nevertheless, our method can still obtain evenly distributed matched points.
The second image pair is also from the WorldView-1 satellite sensor, and it mainly covers suburb area containing large amount of terrace. The topographic relief is also obvious, and the repetitive texture from the terrace would make trouble to the feature point matching. However, our method can obtain satisfactory matching results as well.
The third image pair is from the Geoeye satellite sensor covering the urban areas and with a very high resolution of 0.5 m. The images are characterized with structured textures, and these textures benefit the feature point matching, but the locally geometric distortion caused by the buildings makes the accurate image registration very difficult.
The fourth image pair is from the flat area, and the image resolution is relatively low (5 m) compared with the aforementioned three datasets (0.5 m). The two images in this pair are from SPOT-5 satellite, and they contain large amount of farmland and buildings. They are captured at different seasons with large time span of over two years, and so forth the images have considerable textural changes due to the rapid development in China (as shown in Figure 3h). It can find matches successfully in the unchanged textures, and good matching results are obtained.
The fifth image pair is from GF-1 satellite, and the image resolution is relatively low (16 m) compared with the aforementioned three datasets (0.5 m). Due to the low image resolution, the geometric distortion is relatively small, but some parts of the image are covered by the clouds, which would make trouble for the accurate image registration.
The sixth image pair is from the Worldview-3 satellite sensor covering the urban areas with an image resolution of 0.5 m. For the high image resolution and tall buildings, the distortion of building is very severe, in additional, there is some clouds covering a large part of the top of the image, which makes the registration extremely hard.
To further examine the registration effectiveness of our method, the registration chessboard images of the sub-images marked by the circle in the Figure 3a-d are generated, depicted in Figure 4. The brightness of the slave image is specially adjusted to better express the registration effect.
It can be found that, for the first image pair-because the terrain is undulating and the mountain registration result is relatively poor-the main features such as mountain undulations, roads, and residential areas are accurately registered. For the image pairs of 2-4, significant linear objects such as roads and rivers have been accurately registered in suburbs, urban areas, and the continuity of image features has been well preserved. For the image pairs of 5-6, although the images covered by the clouds, which affect the distribution of the feature points, sufficient feature points can be derived in the areas not covered by the clouds, which enables us to get good registration results. is very severe, in additional, there is some clouds covering a large part of the top of the image, which makes the registration extremely hard.
To further examine the registration effectiveness of our method, the registration chessboard images of the sub-images marked by the circle in the Figure 3(a-d) are generated, depicted in Figure  4. The brightness of the slave image is specially adjusted to better express the registration effect.
It can be found that, for the first image pair-because the terrain is undulating and the mountain registration result is relatively poor-the main features such as mountain undulations, roads, and residential areas are accurately registered. For the image pairs of 2-4, significant linear objects such as roads and rivers have been accurately registered in suburbs, urban areas, and the continuity of image features has been well preserved. For the image pairs of 5-6, although the images covered by the clouds, which affect the distribution of the feature points, sufficient feature points can be derived in the areas not covered by the clouds, which enables us to get good registration results. (e) (f) Figure 4. Registration results of our method: (a-f) is the registration details of the sub-images marked by the circle in the Figure 3(a-f). For each dataset, three enlarged detail windows are illustrated, which are numbered with 1, 2, 3 and colored with red, green and blue accordingly.
To quantitatively analyze the registration accuracy, a number of conjugate points are selected as checkpoints in the above experimental area. The calculated coordinates of checkpoints on the slave images are obtained by interpolation using the dense TINs formed by the matching results and are compared with the correct coordinates on the slave image. The differences between the two are statistically analyzed. The registration results for the six datasets are shown in the Table 2.

Comparison with Other Methods
In the paper, SURF, AKAZE, ORB, BRISK, and FAST are chosen as the competitors to evaluate the effectiveness of the method. Recall and precision [43], were used as criteria in the evaluation. M is the number of matching points, C is the number of existing correspondences, CM is the number of correctly correspondences, and the two criteria are defined as: recall = CM/C and precision = CM/M.
Since the comparing algorithms often adopt a global transformation such as projective model to eliminate the erroneous matches, which could not accurately express the geometric transformation for the image with large size, especially covering mountainous and urban areas. Thus, four pairs of sub-images with size of 800 × 800 pixels are grabbed from the four datasets are used for evaluation.  Table 3 and Figure 5.  To quantitatively analyze the registration accuracy, a number of conjugate points are selected as checkpoints in the above experimental area. The calculated coordinates of checkpoints on the slave images are obtained by interpolation using the dense TINs formed by the matching results and are compared with the correct coordinates on the slave image. The differences between the two are statistically analyzed. The registration results for the six datasets are shown in the Table 2.

Comparison with Other Methods
In the paper, SURF, AKAZE, ORB, BRISK, and FAST are chosen as the competitors to evaluate the effectiveness of the method. Recall and precision [43], were used as criteria in the evaluation. M is the number of matching points, C is the number of existing correspondences, CM is the number of correctly correspondences, and the two criteria are defined as: recall = CM/C and precision = CM/M.
Since the comparing algorithms often adopt a global transformation such as projective model to eliminate the erroneous matches, which could not accurately express the geometric transformation for the image with large size, especially covering mountainous and urban areas. Thus, four pairs of sub-images with size of 800 × 800 pixels are grabbed from the four datasets are used for evaluation. For the algorithms like SURF, BRISK, and FAST, we directly used the relevant functions provide by the MATLAB 2018b with default values to perform the experiment; and for the algorithms like AKAZE and ORB, as they are not provided by the MATLAB 2018b, the open source software OpenCV with version 4.1.1 is used, and the parameters for the feature detection, description and matching are all default values. The computer configuration is as follows: Window10, Inter (R) Core (TM) i7-6700 @ 3.4.0GHz, 8G RAM. The matching results for four image pairs are shown in Table 3 and Figure 5.
As shown in the Table 3 and Figure 5, our method can obtain large number of feature points and outperform the other matching algorithms, especially in the terms of number of correct matches and matching precision. Hundreds of point pairs can be acquired by our method, while other algorithms can only obtain a few dozens of them, which are not sufficient for the accurate image registration, especially for the image with complex geometric distortion. The precision and the successfully matched points of our method are dramatically better than the other methods. However, we find that the recall of FAST algorithm is better than our method. We find that C is the successfully matched points, and C of FAST algorithm is very small than our method; for example, C is only 63 for FAST but 1428 for our method, and larger the C number higher the false match probability, that may be the reason why FAST algorithm have a higher recall. However, the C from FAST algorithm is too small, even though it has a relatively high recall, the correctly matched points are too few to perform accurate image registration. As shown in the Table 3 and Figure 5, our method can obtain large number of feature points and outperform the other matching algorithms, especially in the terms of number of correct matches and matching precision. Hundreds of point pairs can be acquired by our method, while other algorithms can only obtain a few dozens of them, which are not sufficient for the accurate image registration, especially for the image with complex geometric distortion. The precision and the successfully matched points of our method are dramatically better than the other methods. However, we find that (a) In general, the methods such as SURF, AKAZE, ORB, BRISK, and FAST, are carried out by nearest neighboring distance ratio of radiometric descriptors. When encountered image with repetitive textures or non-linear intensity differences, these algorithms are hard to find conjugate features, and this is the main reason that this kind of algorithms can only obtain a few feature points. For our method, it can simultaneously make use of the radiometric and high-order geometric information to search the corresponding points, which largely enhance the matching robustness and success rate, and by integrating the coarse-to-fine and outlier detection strategy, the matching in the current level will benefit from the matching results from former level, which ensures the whole matching process is robust and produces more correct points. This can be easily seen in Figure 6, which is the comparing results between our method and other classical feature matching methods for the image pair 2. Figure 7 is the matching results from our method without hyper-graph matching algorithm, which determines the correspondence for each feature point only by the rotation and scale invariant ABM. From the results, it is found that-through the strategies of rotation and scale invariant ABM, blunder rejection, and coarse-to-fine matching-it can derive large number of matches. However, for the repetitive textures of the image, it is prone to obtain false matches, and hard to reject them completely, as shown by the circle in the Figure 7. By integrating the high-order structural information in the matching process, it contributes a lot to the matching robustness and success rate. In general, the methods such as SURF, AKAZE, ORB, BRISK, and FAST, are carried out by nearest neighboring distance ratio of radiometric descriptors. When encountered image with repetitive textures or non-linear intensity differences, these algorithms are hard to find conjugate features, and this is the main reason that this kind of algorithms can only obtain a few feature points. For our method, it can simultaneously make use of the radiometric and high-order geometric information to search the corresponding points, which largely enhance the matching robustness and success rate, and by integrating the coarse-to-fine and outlier detection strategy, the matching in the current level will benefit from the matching results from former level, which ensures the whole matching process is robust and produces more correct points. This can be easily seen in Figure 6, which is the comparing results between our method and other classical feature matching methods for the image pair 2. Figure 7 is the matching results from our method without hyper-graph matching algorithm, which determines the correspondence for each feature point only by the rotation and scale invariant ABM. From the results, it is found that-through the strategies of rotation and scale invariant ABM, blunder rejection, and coarse-to-fine matching-it can derive large number of matches. However, for the repetitive textures of the image, it is prone to obtain false matches, and hard to reject them completely, as shown by the circle in the Figure 7. By integrating the high-order structural information in the matching process, it contributes a lot to the matching robustness and success rate. In general, the methods such as SURF, AKAZE, ORB, BRISK, and FAST, are carried out by nearest neighboring distance ratio of radiometric descriptors. When encountered image with repetitive textures or non-linear intensity differences, these algorithms are hard to find conjugate features, and this is the main reason that this kind of algorithms can only obtain a few feature points. For our method, it can simultaneously make use of the radiometric and high-order geometric information to search the corresponding points, which largely enhance the matching robustness and success rate, and by integrating the coarse-to-fine and outlier detection strategy, the matching in the current level will benefit from the matching results from former level, which ensures the whole matching process is robust and produces more correct points. This can be easily seen in Figure 6, which is the comparing results between our method and other classical feature matching methods for the image pair 2. Figure 7 is the matching results from our method without hyper-graph matching algorithm, which determines the correspondence for each feature point only by the rotation and scale invariant ABM. From the results, it is found that-through the strategies of rotation and scale invariant ABM, blunder rejection, and coarse-to-fine matching-it can derive large number of (a)

Conclusions
In this paper, a reweighted random walk based hyper-graph matching method is presented for the registration of high-resolution optical remote-sensing imagery. For the feature-based methods, they mainly utilize the radiometric information and the number of the correct matches is few and their distribution is uneven. This study proposes an efficient way to use the high-order structure information and radiometric information simultaneously and extends the hyper-graph matching method to the quasi-dense image matching domain. The proposed algorithm involves three steps: initial matching by a FBM method, two stage point matching, and outlier elimination. The initial matching is only carried out in the pyramid image of highest level to obtain the rough geometric relationship between the matching images. In the process of two-stage point matching, a rotation and scale invariant ABM is used to find candidate points for each feature point, and then by considering the candidate relationship between the matching points, a sparse high-order similarity tensor is efficiently built for hyper-graph matching, which helps to find correspondences between two sets of features. A local quadratic polynomial constraint framework is used to eliminate outliers of matched points. The acquired corresponding points are used as control points to carry out the image registration.
The proposed method has been evaluated using six datasets, which are captured by the high-resolution satellite optical sensors and cover different land types, such as mountain area, urban, suburb, and flat land. Furthermore, two datasets are covered by the cloud. These illustrated that the proposed method is highly adaptable to various situations and could be used for the practical high-resolution remote sensing image registration.

Conclusions
In this paper, a reweighted random walk based hyper-graph matching method is presented for the registration of high-resolution optical remote-sensing imagery. For the feature-based methods, they mainly utilize the radiometric information and the number of the correct matches is few and their distribution is uneven. This study proposes an efficient way to use the high-order structure information and radiometric information simultaneously and extends the hyper-graph matching method to the quasi-dense image matching domain. The proposed algorithm involves three steps: initial matching by a FBM method, two stage point matching, and outlier elimination. The initial matching is only carried out in the pyramid image of highest level to obtain the rough geometric relationship between the matching images. In the process of two-stage point matching, a rotation and scale invariant ABM is used to find candidate points for each feature point, and then by considering the candidate relationship between the matching points, a sparse high-order similarity tensor is efficiently built for hyper-graph matching, which helps to find correspondences between two sets of features. A local quadratic polynomial constraint framework is used to eliminate outliers of matched points. The acquired corresponding points are used as control points to carry out the image registration.
The proposed method has been evaluated using six datasets, which are captured by the high-resolution satellite optical sensors and cover different land types, such as mountain area, urban, suburb, and flat land. Furthermore, two datasets are covered by the cloud. These illustrated that the proposed method is highly adaptable to various situations and could be used for the practical high-resolution remote sensing image registration.

Conclusions
In this paper, a reweighted random walk based hyper-graph matching method is presented for the registration of high-resolution optical remote-sensing imagery. For the feature-based methods, they mainly utilize the radiometric information and the number of the correct matches is few and their distribution is uneven. This study proposes an efficient way to use the high-order structure information and radiometric information simultaneously and extends the hyper-graph matching method to the quasi-dense image matching domain. The proposed algorithm involves three steps: initial matching by a FBM method, two stage point matching, and outlier elimination. The initial matching is only carried out in the pyramid image of highest level to obtain the rough geometric relationship between the matching images. In the process of two-stage point matching, a rotation and scale invariant ABM is used to find candidate points for each feature point, and then by considering the candidate relationship between the matching points, a sparse high-order similarity tensor is efficiently built for hyper-graph matching, which helps to find correspondences between two sets of features. A local quadratic polynomial constraint framework is used to eliminate outliers of matched points. The acquired corresponding points are used as control points to carry out the image registration.
The proposed method has been evaluated using six datasets, which are captured by the high-resolution satellite optical sensors and cover different land types, such as mountain area, urban, suburb, and flat land. Furthermore, two datasets are covered by the cloud. These illustrated that the proposed method is highly adaptable to various situations and could be used for the practical high-resolution remote sensing image registration.
To demonstrate the advantage of the hyper-graph matching strategies, we have performed the experiment on image pair two with our method but without hyper-graph matching algorithms. Although the other strategies, such as rotation and scale invariant ABM, blunder rejection, and coarse-to-fine matching, enabled us to obtain large number of correct matches. However, when the image has repetitive textures, the intensity information is insufficient to find the correct matches. The hyper-graph matching algorithm simultaneously utilizes the radiometric and high-order geometric information to search the corresponding points, which contributes a lot to the improvement of matching robustness and success rate.
The relationship of the feature points are used, and a sparse high-order similarity tensor without losing any useful structure information is built, which enables us to overcome the computational burden and computer memory cost and introduce the hyper-graph matching algorithm to the remote sensing image registration.
To demonstrate the advantage of the proposed method, we compared it with the conventional matching algorithms, such as SURF, AKAZE, ORB, BRISK, and FAST. Four pairs of sub-images measuring 800 × 800 pixels are grabbed from the datasets of 1-4 and used for evaluation. Two criteria of recall and precision are used to evaluate the methods. From the experimental results, we find that the conventional methods can only derive very few feature points, which can only be used to derive global approximate geometric transformation. However, for our method, thousands of feature points can be derived, which guarantees the accurate image registration even for the difficult situations. The matching results can meet the image registration requirements of sub-pixel level accuracy.
However, as the high-order affinity tensor is needed to compute in our method, the computation time is longer than that of the compared methods, and how to improve the computational efficacy of the algorithm is our future work.