An ASIFT-based local registration method for satellite imagery

: Imagery registration is a fundamental step, which greatly affects later processes in image mosaic, multi-spectral image fusion, digital surface modelling, etc. , where the final solution needs blending of pixel information from more than one images. It is highly desired to find a way to identify registration regions among input stereo image pairs with high accuracy, particularly in remote sensing applications in which ground control points (GCPs) are not always available, such as in selecting a landing zone on an outer space planet. In this paper, a framework for localization in image registration is developed. It strengthened the local registration accuracy from two aspects: less reprojection error and better feature point distribution. Affine scale-invariant feature transform (ASIFT) was used for acquiring feature points and correspondences on the input images. Then, a homography matrix was estimated as the transformation model by an improved random sample consensus (IM-RANSAC) algorithm. In order to identify a registration region with a better spatial distribution of feature points, the Euclidean distance between the feature points is applied (


Introduction
Image registration refers to aligning two image point sets that share the same scene in a common coordination system.It is widely used in remote sensing applications, including multi-spectral classification, environment surveillance, super-resolution reconstruction, change detection and outer space exploitation [1][2][3][4].It is known that among those factors, the number of control points (CPs) and their spatial distribution across the registered region are of great importance to perform reliable registration [5].Inadequate, inaccurate or ill-distributed CPs may lead to undesirable consequences [6,7].In traditional registration approaches, CPs are obtained either manually selected by an experienced expert or pre-defined ground control points (GCP) [8].Unfortunately, the accuracy of manually-selected CPs are dependent on the judgment of the specific expert, and this may vary from person to person, while it is expensive or even impossible to establish these GCPs in some areas, such as the Arctic or outer space planets.To solve the problem mentioned above, automatic registration algorithms that require no or little manual work have emerged in the last few decades.Automatic registration algorithms can be classified into two main categories, intensity based and feature based [2,9].Intensity-based algorithms are more sensitive to variations in contrast, illumination, image noise and scale effects than feature-based ones [1,2].Moreover, they are unsuitable for multispectral image registration, because the intensity value of multispectral image does not change linearly [10].As a result, researchers gradually put their attention toward the feature-based local descriptors.Lowe [11,12] proposed a feature-based algorithm to extract scale-invariant key point and to find correspondence between image pairs, named scale-invariant feature transform (SIFT).SIFT locates key point by finding scale-space extreme in the difference-of-Gaussian (DoG) function; once a key point is detected, it will be described and matched based on its orientation, scale and coordination.SIFT and its variations are the most popular and successful feature detection algorithms in many remote sensing registration applications [2,9,[13][14][15][16].However, if the image pairs are obtained under a large difference of camera viewpoints, the correspondences generated by SIFT would be too few to perform a reliable registration.Affine-SIFT (ASIFT) is the anti-affine-enhanced edition of SIFT, and it outperformed many state-of-the-art methods in finding correspondences under affine distortion and large viewpoint changes [17].Though ASIFT can acquire sufficient correspondences even under a severe viewpoint change, it suffers from low registration accuracy [18].
In many remote sensing applications, local registration accuracy is much more important than global registration accuracy.For instance, in the selection of a lunar landing site, in order to obtain the actual morphology of the lunar terrain, a high registration accuracy is essential for acquiring the three-dimensional information of the lunar surface.However, it is unnecessary to perform a global registration, since a small area with high registration accuracy within the preselected landing area is large enough for landing on the moon surface.Inspired by this practical demand, we developed a local registration method in this paper to identify a region between an input image pair with higher local registration accuracy than global registration has.Given a stereo image pair as the input reference and sensed image, conversional feature-based registration methods take all of the feature points generated by a feature extraction algorithm as the CPs after the procedure of outlier removal, but pay little attention to the distribution of the CPs.The locations of the feature points actually depend on the context of input images, in other words the feature points are "randomly" distributed across the registration region.Toward achieving the maximum local registration accuracy between the reference and a sensed image in the proposed method, we firstly take advantage of the feature extraction capacity of ASIFT to obtain sufficient feature points and coarse correspondences (tentative correspondences), the mismatched and low accuracy correspondences among the tentative correspondences are then eliminated by the improved RANSAC (IM-RANSAC) algorithm, which is proposed in this paper to gain high accuracy inliers.Secondly, with the proposed S criterion, the standard deviation of Euclidean distance summation of each feature point to remaining feature points is calculated to select an image region in which the feature points are uniformly distributed.This ensures a reasonable distribution for the leaving correspondences.Finally, the non-linear parameter optimization algorithm Levenberg-Marquardt (LM) is employed to further minimize the matching error of correspondences.After the above three steps, it automatically secures a region in the image pair to be registered with a high registration accuracy.
The rest of paper is organized as follows.Section 2 describes the proposed method in detail.Section 3 gives the experimental results and a discussion, followed by the conclusions in Section 4.

Methodology
As shown in Figure 1, the developed local registration algorithm consists of four main steps.With the input of a stereo image pair, ASIFT is applied to generate a vast amount of correspondences; IM-RANSAC is then used for removing incorrect and low accuracy matching feature points to create a set of inliers; an automated region selection mechanism named the S criterion is further developed in the algorithm, combined with an optimization process to locate the image registration region and obtain the transformation model.The details of the method are discussed in the following sections.

A Brief Introduction to ASIFT
The key idea behind ASIFT is to simulate different viewpoints of input images by changing the spatial position of camera optical axis, and then, the same procedure of SIFT is applied to extract correspondences from simulated images.It is known that any affine matrix A can be decomposed as: where, as shown in Figure 2, λ is a scale factor, κ is a rotation angle around image normal by the camera, θ is the roll angle around the camera axis, t T is a tilt defined as 1 t ( / cos ) = φ , where φ is the angle between the camera optical axis and the image normal, and θ R and κ R are rotation matrices.Because SIFT is robust to rotation and scale, thus λ and θ can be neglected from Equation (1).As a result, the affine matrix A becomes: Therefore, ASIFT can achieve full affine invariance simulation by only changing the combination of the t and κ parameters.Some examples of the camera position according to different ( t, ) κ pairs are denoted by black dots in Figure 3.After locating the camera position, simulated affine images can be obtained from each of the positions.The simulated image pair with the largest number of correspondences is then used for registration.For details about ASIFT, please refer to [16].[19].The RANSAC algorithm is a typical method used in remote sensing image registration to eliminate incorrect correspondences (outliers) and to estimate the transformation matrix between stereo image pairs.However, the traditional RANSAC requires the setting of a predefined threshold, which often depends on specific problems.Unfortunately, it is usually difficult to give a proper threshold, because too large of a threshold will introduce big reprojection error to inliers, while too small of a threshold may lead to the failure of the algorithm due to insufficient inliers.Reprojection error refers to the image coordination difference between a point from the registered image and its matching point in the reference image.Here, in this paper, we develop an improved RANSAC algorithm, which is named IM-RANSAC, to give a new iteration strategy for the threshold selection.As the flowchart in Figure 4 shows, given a small positive number Δt as a threshold increment, let the initial threshold T = Δt, and a random sample set M with the size of m is selected from the ASIFT-generated tentative corresponding point set N. According to multi-view geometry [20], images of a plane captured at different viewpoints can be related by a homography transformation model.The terrain height variation within the range of the area where the experimental image pairs are taken can be ignored compared to an orbit height of 100 km of Chang'E-2, and the spatial resolution of a pixel in the horizontal is 7 meters.As a result, the experimental image pairs are considered as being captured from the same plane.Bearing these in mind, we use the eight-parameter homography matrix as the transformation model to estimate geometric distortions between stereo image pairs.The homography matrix H is calculated by M through the least squares method; if the reprojection error between two correspondences is equal to or less than the threshold, they will be considered as inliers.Then, a consensus consisting of inliers is generated as the possible output of IM-RANSAC.P is the probability that the final homograph matrix is calculated by a random outlier containing M; it is given by: where ω is the probability of choosing an inlier from the tentative corresponding point set N; k is the number of times of sampling M. When P is less than a sufficiently small positive NP, the homography matrix H is then considered to be acceptable.The number of inliers output by IM-RANSAC is constrained by the parameter inliermin, which represents for the minimum acceptable number of inliers, to ensure that there are enough inliers to go through the rest of the processes in the proposed algorithm.
The inliermin is defined as [21]: where n is the total number of correspondences in N; function QN(i) represents the probability that i out of n tentative correspondences support an arbitrary transformation model derived from an outlier-containing sample set M. When  is smaller than a small positive ϕ , the probability that an inlier set that has a number of inliers more than i is incorrect will be far less than ϕ .Under the premise that the event of a correspondence belongs to inlier set is independent Bernoulli [22], QN(i) can be given by: where β is the probability that an transformation model derived by outlier-containing sample set M is supported by one of the correspondences in N, which is not included in the outlier-containing sample set M; i is in the value range of m i n < ≤ ; m is the size of the M. If the number of inliers is less than inliermin, the generated consensus will be rejected, meanwhile another IM-RANSAC evaluation begins with the updated threshold, which is increased by Δt on the previous one.With the increase of the threshold, the requirement of inliermin needs to be satisfied, in other words, and the IM-RANSAC always has an output set of inliers with the minimum threshold unless the number of tentative correspondences is too few in N.After the IM-RANSAC process, the final threshold, homography matrix H, and inliers with the minimum final threshold are obtained simultaneously.The coarse correspondences that entered into IM-RANSAC are from ASIFT, so the quantity and reliability are promised unless the input stereo image pair is totally unmatchable.However, even when the input images are not matching at all, the algorithm will stop at ASIFT before it goes into IM-RANSAC.The stopping criterion of IM-RANSAC is controlled by inliermin; once the number of inliers in the consensus is more than inliermin, IM-RANSAC will stop.Thus, the probability that the final threshold becomes too large to be reliable is very small.In IM-RANSAC, there is no need to pre-define the threshold for determining inliers, for it is automatically generated in the course of iteration by gradually increasing the initial input value.This dramatically improves the conventional RANSAC algorithm, which requires a pre-defined threshold in operation.

Automatic Selection of Registration Area
The value of final threshold from the IM-RANSAC output is dependent on the small increment; therefore, the increment should be set as small as possible to obtain inliers with small reprojection error.With the smaller threshold, the number of inliers will be also lesser.As a result, the distribution of some inliers will be scattered from others, leading to an unevenly spatial distribution, and the registration accuracy in the scattered potion will be low.Moreover, the derived homography matrix also will be less accurate even for the dense portion.In this paper, a novel algorithm named S criterion is applied to eliminate this kind of scattered point and automatically obtain a local registration region in which the feature points are distributed uniformly.Suppose that the Euclidean distance between point i and point j is denoted by dij and the standard deviation of Euclidean distance summation of point i to all of the other remaining feature points is denoted by S. If the point i has: the point i, as well as its matching point on the other image of the stereo image pair should be removed from the inliers.As shown in Figure 5a, the inliers on one of the images of the stereo image pair are illustrated in the o-x-y image coordinate system; round black dots represent the inliers.According to the S criterion, scattered points A and B should be removed.It can also use σS (0 1) σ < ≤ as the criterion to remove more feature points for a denser region, but with the consideration of the size of registration region; here, we chose 1 σ = in the following experiments.After the removal of bad distributed feature points, the automatic region selection mechanism of the S criterion can be summarized as follows: (1) the point that has the maximum coordinate in the y dimension among the remaining points of inliers is selected as the starting point.Draw a line though the starting point parallel to the x-axis; as shown in Figure 5b, point P is the starting point, and the derived line is denoted by l; (2) rotate the line l counter clockwise (or clockwise) around the starting point until it reaches any one of the points of inliers; as shown in Figure 5c, the Point P1 is the reached point, and the dotted line is the stopping position of line l; (3) connect the starting point and the reached point, as shown in Figure 5d; set the reached point as the new starting point, and continue to rotate the line l from its last stopping position; (4) repeat Steps (2) and (3) until the very starting point with the maximum y coordinate is reached by line l again; and (5) finally, the registration region as shown in Figure 5e can be automatically obtained.Both the distribution and the reprojection error of inliers optimized in this region result in better registration accuracy.The procedures above can locate a relative larger area with a profile of convex polygon, and they are computationally light algorithms with only a few steps.Because the coordinates of inlier points are not integers, we reassigned their locations to the nearest integer coordinates during the procedure.

LM Optimization
Although IM-RANSAC can obtain smaller reprojection error by means of automatically choosing a proper threshold, the homography transformation matrix H is not yet in its optimal status, due to the fact that it is computed from all of the inliers outputted by IM-RANSAC rather than adapting to the S criterion selected region.Thus, we apply the nonlinear optimization algorithm LM [23] to optimize the parameters of H merely by the inliers in the S criterion selected area; by doing so, the homography transformation will be more precise with respect to the local registration region.We denote the parameters of H by [ ] , , , , = h h h h h  and the correspondences in the S criterion selected region by (xi, yi) and (Xi, Yi) (i = 1,2,3,…n).Let ( ) X Y be the homography matrix H transformed coordinates of (xi, yi); thus, the transform equations were defined as: ( ; ; ) 1 The total reprojection error to be minimized between the correspondences can be formulated as: The LM algorithm is particularly suited to a problem that is expressed by a sum of squared residuals.Therefore, the reduction of reprojection error is converted to a minimization problem in which optimal solution is given by the LM algorithm via minimizing over h: Usually, less than thirty iterations are sufficient for the convergence of the LM algorithm to the optimal solution.By now, the proposed algorithm (named AIRRL hereafter), which includes ASIFT, IM-RANSAC, the S criterion Region selection and LM optimization, is complete.

Source Data Used in the Experiment
The experimental source data are the stereo image pairs captured by the linear push-broom imaging system of the Chang'E-2 satellite.It is a patch from No. 0581 orbit images with a spatial resolution of seven meters.The Chang'E-2 is the second lunar exploration satellite of China to acquire a three-dimensional morphological and tectonic model of the lunar surface, lunched 1 October 2010.The CCD camera mounted on Chang'E-2 is a two-line push-broom sensor assembled on a focal plane separately, as shown in Figure 6.Two CCD arrays, which acquire forward and backward sight angle simultaneously, share the same optical axis with the same focal length of 144.2 mm.Each CCD line array has 6144 pixels, and the forward and backward sight angle is 8° and 17.2°, respectively; the orbit height is 100 km [24].The original format of Chang'E-2 CCD images is in PDS (Planetary Data System) 2C-level.The PDS is created by NASA's Science Mission Directorate for planetary missions, and the 2C-level images represent the images after the radiometric calibration, geometric correction and photometric correction; thus, they can be directly used in image registration.In Figure 7, the stereo image pair used in the experiment is shown in the selenographic coordinates system.In order to reduce computational complexity, the resolution of the image pair is fixed to 800 × 600 pixels.

Discriminative Ability
The conventional RANSAC is applied to SIFT and ASIFT, forming SIFT-RANSAC (SIFT-R) and ASIFT-RANSAC (ASIFT-R), respectively, for performance comparison to the proposed local registration algorithm.Because AIRRL, SIFT-RANSAC and ASIFT-RANSAC are all SIFT-based algorithms, the initial matching results are obtained by comparing the ratio of the Euclidean distance of 128 dimensional feature vectors between the nearest neighbour and second nearest neighbour; thus, varying the threshold of the ratio (distance ratio) will affect the final registration result to a large extent.The smaller the value, the more accurate the matching result will be, but the inliers also will be lesser.Lowe suggested 0.8 as the distance ratio [10]; thus, we chose a range of 0.6-0.8, with a 0.05 interval, to compare the performance of the three methods.The threshold of conventional RANSAC in SIFT-R and ASIFT-R is set to three, which is the empirical value typically used in remote sensing image registration [25], while the threshold increment of IM-RANSAC is set to 0.1 with respect to the compromise between time and accuracy.Because the transformation model is an eight-parameter homography, the size of M is set to four.The source codes of SIFT, RANSAC and ASIFT were written by Rob Hans [26] and Guoshen Yu et al. [27].The value of parameters NP, β in IM-RANSAC is set to 0.01, 0.01 according to the source code of RANSAC and [22], and the small positive φ in Equation ( 4) is set to a typical value of 0.05 [21].
In this experiment, we employ the RMSE of the Euclidean distance as the evaluation criteria to investigate the matching accuracy of the correspondences.This is defined as follows [28]: where ( ) E h is the total reprojection error in Equation ( 9).Additionally, the RMSE of intensity (I-RMSE) here represents the intensity difference at the pixel level; it is used for evaluating the quality of the image mosaic, as well as the quality of registration.I-RMSE is defined as follows: where r is the number of pixels in the registration region, and Δ is the intensity difference between each pixel on the reference image from the local registration region and its corresponding pixel on the registered image.
The correspondences generated by the three methods are shown in Figure 8; from left to right are the matches acquired by SIFT-R, ASIFT-R and AIRRL.The final threshold automatically obtained by IM-RANSAC is 0.6.We can see the difference density and distribution of correspondences between the three methods.ASIFT-R has the densest correspondences, followed by AIRRL and SIFT-R.The distributions of inliers of ASIFT-R and SIFT-R are almost the same; their correspondences are spread all over the image, while AIRRL has no correspondences on the right side of the image.That is because the process of IM-RANSAC and region selection has filtered out many correspondences with relatively large reprojection error or undesired spatial distribution.Figure 8 shows registration results obtained at a distance ratio of 0.7, while this ratio varies between 0.6-0.8, and the number of correspondences generated by three methods is shown in Figure 9.As expected, ASIFT-R generated the greatest number of correspondences, at least two-times more than those of the other two methods.Although AIRRL eliminated many "unqualified" correspondences, it still has more correspondences than SIFT-R does.Figure 10 shows the RMSE value of reprojection error.AIRRL performed best among the three methods, and it is not sensitive to variation of the distance ratio.With the increase of the distance ratio, the RMSE value of ASIFT-R increases and reached its maximum 1.33 at a distance ratio of 0.8; while SIFT-R decreases after reaching a maximum value of 1.07 at a distance ratio of 0.7, but still about two-times larger than the RMSE value of AIRRL.The performances of ASIFT and SIFT are nearly same when the difference of imaging angle between the stereo image pair is small; however, the result may change as the content of the stereo image pair varies; in this experiment, SIFT-R performed better than ASIFT-R with respect to the RMSE value.In Figures 9 and 10, the RMSE of ASIFT-R and SIFT-R are obtained at the empirical RANSAC threshold value of three, while the final thresholds generated by IM-RANSAC at each distance ratio are 0.6, 0.5, 0.6, 0.5 and 0.5, respectively.When the RANSAC threshold of SIFT-R is decreased to 0.8 manually, the correspondence is too little to perform a successful registration.On the other hand, though ASIFT-R can reach the same threshold and RMSE level of AIRRL, it needs manually attempting to gain an appropriate threshold, while AIRRL can automatically perform the process.The experiments were conducted on a laptop featuring a 2.6-GHz Intel core i5-3320 processor and 4 GB RAM.The average time consumption of the SIFT-R, ASIFT-R and AIRRL are 5.93 s, 7.05 s and 8.67 s, respectively.

Local Registration Region Selection
In this section, the local registration region was determined by applying the S criterion, and then, the image mosaicking was performed to inspect the registration accuracy.According to the experiment results shown in Section 3.2, when the distance ratio is 0.7, AIRRL performed at its best, so we set the ratio to 0.7 in the later experiments.Figure 11a shows the result of registration region selection; tiny pink circles represent the locations of the feature point of inliers.The white rectangle is the area covered by all of the inliers from the output of IM-RANSAC, and the distribution of feature points is scattered in the right part of the region.If including these points in the registration process, the registration accuracy will be lowered.The region outlined by green line segments is the selection region of the S criterion; feature points in this region are obviously denser and evener.Figure 11b shows the mosaic result of AIRRL; the backward quadratic interpolation is used to perform resampling.Figure 11c shows the intensity difference of image mosaic; the brighter the pixel point is, the larger the mosaic error will be.Figure 11d-f is the registration region, image mosaic and mosaic difference with all of the inliers from IM-RANSAC without the S criterion.In Figure 11f, we notice that the mosaic error of the right portion is much larger than that of the left portion, and the S criterion precisely excluded the right portion (as shown in Figure 11c).Due to that RANSAC-based outlier removal algorithms having a certain randomness, feature points in Figure 11a are different from those in Figure 11d, but the overall trend is constant.We conducted the comparison of the image mosaic between the S criterion and non-S criterion many times and found that the S criterion region selection could acquire lower I-RMSE, as shown in Table 1.Table 1.The average of intensity (I)-RMSE with and without the S criterion.

I-RMSE
7.380 Figure 11f 13.925 The result shown in Table 1 is just a relative comparison, and the absolute value of I-RMSE with the S criterion was high compared to the value of 4.50 in [5].We notice that the difference of exposure level between the input image pair is too significant to be ignored, as the I-RMSE is based on the measurement of image intensity; this could be a major reason that affects the I-RMSE value.In order to avoid this problem, a synthetic image pair was developed for the experiment by choosing the backward view in Figure 7 along with its affine simulation as the new stereo image pair, as shown in Figure 12.The Figure 12b is the affine simulation of Figure 12a; the warp angle is 25.2°, which is the same intersection angle between the forward and the backward sight of Chang'E-2.The experimental result of affine image pair is shown in Figure 13; from left to right is the result of region selection, image mosaic and intensity difference.Comparing Figure 13c to Figure 11c, the exposure difference is corrected.The value of I-RMSE of the affine simulation with and without the S criterion in the selected region is 2.045 and 5.915, respectively.It should be pointed out that because of the large orbit height (100 km) of Chang'E-2, the captured moon surface images can be treated as from a plane at relatively planar terrains, so we use a homography matrix to estimate the transformation model between the input image pair in this experiment.However, from Figure 11c and Figure 11f, it can be seen that in the image portion with a large intensity gradient, for example in the registration area, which is hilly or involves craters, the mosaic error will be larger.A possible solution is to use a fundamental matrix, utilizing the epipolar constraint, to replace the homography matrix for better stereo correspondence.However, this approach may not be practical in dense matching, as shown in Figure 8, due to the epipolar constraint possibly resulting in one point matching many epipolar lines or vice versa, and it is hard to decide the exact match.Therefore, the homography transformation model is used in this research.Consequently, the registration accuracy of the proposed method will be lowered when the spatial resolution of image pair is low or the terrain relief (parallax) in the content of image pair is high.

Conclusions
In this paper, we proposed a novel framework with various techniques to identify overlapping regions with high registration accuracy for a stereo image pair.The key contributes of the research can be summarized in four aspects.(1) The IM-RANSAC algorithm optimizes the inliers that are generated by ASIFT and dramatically improves the matching accuracy.In the case where ground control points (GCPs) are not available, these matching points can be used as GCPs for image registration.(2) In the process of IM-RANSAC, there is no need to pre-define the threshold.The optimal threshold can be achieved automatically by an iteratively-incremental algorithm.It significantly simplifies the registration process.(3) The S criterion is developed to automatically obtain a better distribution region of correspondences between the image pairs according to the positions of the feature points.(4) The registration regions selected by the local image registration framework presented in this paper can be used as a seed portion for global registration.Future research will explore potential algorithms that can take the local registration region to grow into large-scale registered areas and also to narrow the reprojection error of the large topographic gradient portion to further improve the local registration accuracy.

Figure 2 .
Figure 2. Spatial relationship of image plane u and camera optical axis.

Figure 3 .
Figure 3. Illustration of different camera positions.

Figure 5 .
Figure 5. Illustration of S criterion algorithm.(a) Distribution of IM-RANSAC outputted inliers.(b) Remaining inliers after the S criterion selection.(b-e) Demonstration of registration region locating by the S criterion.

Figure 7 .
Figure 7. Experimental stereo image pair shown in selenographic coordinates system.The backward view is on the left, and the forward view is on the right.

Figure 9 .
Figure 9. Number of correspondences of different methods.

Figure 10 .
Figure 10.RMSE between correspondences of different methods.

Figure 11 .
Figure 11.Demonstration of region selection and image mosaic.(a) S criterion region selection.(b) S criterion image mosaic.(c) Mosaic error of the S criterion.(d,e,f) The demonstration of the registration region, image mosaic and mosaic error without the S criterion.