Automatic Sub-Pixel Co-Registration of Remote Sensing Images Using Phase Correlation and Harris Detector

In this paper, we propose a new approach for sub-pixel co-registration based on Fourier phase correlation combined with the Harris detector. Due to the limitation of the standard phase correlation method to achieve only pixel-level accuracy, another approach is required to reach subpixel matching precision. We first applied the Harris corner detector to extract corners from both references and sensed images. Then, we identified their corresponding points using phase correlation between the image pairs. To achieve sub-pixel registration accuracy, two optimization algorithms were used. The effectiveness of the proposed method was tested with very high-resolution (VHR) remote sensing images, including Pleiades satellite images and aerial imagery. Compared with the speeded-up robust features (SURF)-based method, phase correlation with the Blackman window function produced 91% more matches with high reliability. Moreover, the results of the optimization analysis have revealed that Nelder–Mead algorithm performs better than the two-point step size gradient algorithm regarding localization accuracy and computation time. The proposed approach achieves better accuracy than 0.5 pixels and outperforms the speeded-up robust features (SURF)based method. It can achieve sub-pixel accuracy in the presence of noise and produces large numbers of correct matching points.


Introduction
The image registration process is an important task in image processing in applications such as automatic change detection, data fusion, motion tracking, and image mosaicking. When images of the same location taken at different dates or from different viewpoints need to be compared, they must be geometrically registered to one another. In the process of image registration, the image to be registered, called the sensed image, is aligned against a reference image. Accurate registration allows detection and monitoring of changes between images that might otherwise be difficult to ascertain without sub-pixel misregistration.
In remote sensing applications, the registration process is most often carried out manually, which is not feasible for large numbers of images. Hence, automated techniques are needed that require little or no operator supervision. Automatic image registration is a classical problem and has been an active research topic [1][2][3][4][5][6][7][8]. Although considerable research has been conducted, until now, significant challenges remain. There is, therefore, a critical need to develop an accurate robust and efficient image registration algorithm that requires minimal or no operator supervision to register multi-temporal remote sensing images.
A review of the most common image registration techniques was conducted in [9][10][11][12]. The existing methods can be broadly divided into two major categories: area-based and feature-based techniques. Several studies examined the improvements in accuracy and reliability and means of mitigating the limitations of both groups of registration techniques.
In feature-based techniques, the algorithms involve two stages. Firstly, the method extracts feature from the reference and sensed images. Secondly, corresponding points geometric distribution and have low sensitivity to noise, outliers, and occlusions. To achieve the above-mentioned objectives, an appropriate method should be selected and used for each of the following four components [42]: (1) a feature space, (2) a search space, (3) a search strategy, and (4) a similarity metric.
Here, we propose a hybrid algorithm that combines Harris corner detection and Fourier phase correlation for co-registering very high-resolution remote sensing images. This approach uses phase correlation to reduce sensitivity to noise and Harris corner to improve efficiency [43]. The proposed method comprises three major steps: (1) feature extraction; (2) Fourier phase matching; and (3) phase-matching optimization. First, the corners are generated by the Harris corner detector, which provides useful reference points but still contains sub-pixel shifts. We then refine the location of the corners to obtain subpixel accuracy using optimization techniques. The proposed phase correlation approach is formulated as an optimization problem that minimizes a cost function.
The proposed method is presented in Section 3, followed by the experimental results and discussion in Section 4. The conclusions are presented in Section 5.

Brief Overview of Related Work
In this section, we first present a summary of representative hybrid registration operations in both an area-based and feature-based approach, followed by a discussion of sub-pixel phase correlation methods.

Hybrid Registration Approach
In recent years, various hybrid approaches combining both area-based and featurebased methods have been developed in order to achieve reliable and precise image registration [8]. In [44], the authors proposed a hybrid image registration method based on an entropy-based detector and a robust similarity measure. Their approach consists of two steps: the extraction of scale invariant salient region features for each image and the estimation of correspondences using a joint correspondence between multiple pairs of region features and the entropy correlation coefficient. Experimental results on medical images showed that the proposed method has excellent robustness to image noise. In [45], the concept of the wavelet-based hierarchical pyramid is combined with MI and SIFT to align medical images. This approach is compared to different image registration methods, namely cross-correlation, MI-based hierarchical registration, registration using SIFT features, and the hybrid registration technique described in [46]. The idea of this study was therefore to investigate the performance of the proposed method with results obtained from the same algorithms in the spatial domain. Experimental results showed that the proposed registration methods in the wavelet domain could achieve better performance than those in the spatial domain. Ezzeldeen et al. conducted a comparative study between a fast Fourier transform (FFT)-based technique, a contour-based technique, a wavelet-based technique, a Harris-pulse coupled neural network (PCNN)-based technique, and a Harrismoment-based technique for Landsat Thematic Mapper (TM) and SPOT remote sensing images. It was observed that the most appropriate technique was FFT, although it had the largest RMSE, of above two, whereas the method that detected the greatest number of control points in both images was the wavelet-based technique.
In [47], a two-stage registration algorithm was proposed which integrates featurebased and area-based methods for remote sensing images at different spatial resolutions (3.7, 16, and 30 m). In the first stage, the corresponding points are detected and matched using the SIFT algorithm. After outliers are removed using the random sample consensus (RANSAC), a weight function is introduced to distribute weight to different feature points to improve the performance of the local projective model. Then, a second registration is applied using Huber estimation and the structure tensor (ST) at the finer scale to minimize the influence of the remaining outliers on the transformation model. The results show that the proposed algorithm can achieve a reliable registration accuracy for varied terrain features.
In [48], another approach was proposed based on the phase correlation and Harris affine detector for images of outdoor buildings. Firstly, the overlapping areas in reference and sensed images are determined using translation parameters recovered using the phase correlation. Then, interest points are detected using the Harris affine detector and matched with the normalized cross-correlation (NCC). The experiment results demonstrate that the feature matching method achieves an accuracy of about 69.2% for 13 matched feature point pairs.
More recently, new approaches for hybrid approaches have been developed [47,48]. In [47], the authors apply the registration twice. Therefore, the resampling of the sensed image introduces aliasing in the similarity measure and therefore may reduce the accuracy of any correlation process, and hence any registration process [49]. In [48], the first step of the study was to determine the overlapping areas between the reference and sensed image by phase correlation; however, image matching is performed using a normalized cross-correlation algorithm. In addition, the proposed approach does not address outliers, which are inevitable, particularly for real data sets.
Although most approaches achieve a reasonable result, their performances tend to be heavily affected by the presence of local distortions. Knowledge about the characteristics of each type of local distortion must be considered to design and develop a robust registration approach for VHR remote sensing images. Therefore, our choice of algorithms was primarily dictated by the following criteria: (1) appropriate features should be robust to noise; (2) a large number of control points evenly distributed are required for an accurate result; (3) the matching method should be robust to noise; (4) sub-pixel localization accuracy of control points using optimization algorithms; (5) able to address inevitable outliers; (6) able to rapidly treat high data volume, which affects the processing speed; (7) automatic operation to avoid the requirement of manual experience and a priori knowledge; and (8) allow selection of the proper transformation function suitable for each type of geometric difference between the images.

Fine Registration Using Phase Correlation
Precise image matching is a crucial step in the registration of multi-source remote sensing images that may have significantly different illumination. For these images, matching algorithms can often only achieve accuracy at the pixel level due to the effect of noise and differences in acquired image pairs. Therefore, there is a need for sub-pixel correlation algorithms that enhance the accuracy, robustness, and efficiency of matching. Indeed, to address frequency-dependent noise due to illumination or changes in sensors, phase correlation based on the invariant properties of the Fourier transform is a good candidate.
Several sub-pixel phase correlation approaches have been developed, which are generally classified into two major categories [31,46,50,51]. In the first approach, the sub-pixel displacement is estimated in the spatial domain using the inverse Fourier transform method through interpolation, fitting, or other approaches with a certain range of neighbors [6]. The sinc function is one of the most commonly used fitting models [52][53][54]. Argyriou modified the conventional sinc function to fit the exact sub-pixel peak location using Gaussian weighting, which can better approximate the noisy correlation surface [55]. Another feasible alternative, which can provide accurate estimates of the real peak location through minimization of the correlation function, is defined as the inverse Fourier transform of the cross-power spectrum [56,57]. The sub-pixel methods calculated in the spatial domain are sensitive to the aliasing errors dispersed following inverse transforming of the cross-power spectrum [58].
In contrast, these artifacts are avoided in the second approach, which can estimate the sub-pixel shifts directly in the frequency domain. Three algorithms have been proposed: the SVD (singular value decomposition) method [59], the 2D fitting method [60], and nonlinear optimization [49,61]. Hoge used the SVD method, according to the Eckart-Young-Mirsky theorem, to determine the sub-pixel shifts by finding the optimum rank-one approximation of the normalized cross-power spectrum matrix [62]. In [61], a robust nonlinear optimization was proposed to estimate the sub-pixel displacements, with respect to the Hermitian inner product, by maximizing the norm of the projection of the computed normalized cross-spectrum of the images onto the continuous space defined by the theoretical space. In [49], another optimization method was proposed based on the Frobenius norm, which minimizes the difference between the computed and theoretical cross-power spectrum. More recently, in [3] the authors combined enhanced sub-pixel PC matching with a block-based phase congruency feature detector to precisely estimate translation for satellite imagery from different sensors. In [63], another approach focused on the effects of aliasing on the shift estimation and estimated the translation parameter using a least-squares fit to a two-dimensional data set. Foroosh et al. [52] extended the phase correlation method to sub-pixel accuracy using down-sampled images.
To improve the accuracy of registration algorithms at the sub-pixel level, the optimization method should be initialized appropriately to converge toward the optimal solution [49]. In addition, the task of designing or selecting the optimal sub-pixel phase correlation method for a specific problem should consider the kind of features extracted, the matching strategy, and the source of distortion present in the images to be registered [42]. Although most phase-matching approaches achieve a reasonable result, their performances tend to be heavily affected by the presence of edge effects. Therefore, it is of interest to develop a co-registration algorithm that combines good quality, robustness to noise, and a low computational cost. Common strategies for obtaining robust algorithms often involve the optimization of nonlinear cost functions [2,21], and significantly reduce the total computation cost. In the present contribution, inspired by the co-registration technique presented by Leprince et al. [49] and Alba [64], we propose an efficient sub-pixel registration approach based on an iterative optimization algorithm.

Materials and Methods
Our co-registration approach can be divided into four major steps. In the first step, corners are extracted from a set of reference images and sensed images. Then the corresponding points are identified on the sensed and reference images using phase correlation. After obtaining the corresponding points, accurate sub-pixel measurement is performed by minimizing a cost function. Following point-wise correspondence, it is important to address outliers prior to computing the spatial transformation model in order to achieve more accurate registration. All of the experiments were performed on a desktop with a 64-bit Windows 10 OS with MATLAB. The hardware of the desktop comprised 2.30 GHz Intel i5-6200U CPU and 12 GB memory.

Harris Corners Extraction
The fundamental step in image registration is to choose which features are to be used for robust matching in order to determine the most optimal transformation between the two images [11]. In order to achieve reliable matching, corner points are chosen as matching features as they are the most stable features in the image; they are invariant to scale, rotation, and viewpoint changes [42]. As a result, that computing the optimal transformation parameters depends on these features, a sufficient number of invariant features, evenly distributed across the image, are required.
The proposed approach consists of detecting interest points using a suitable corner detector. Compared to previous detectors, the Harris corner detector has been proven to provide more fine and stable results [65,66]. Here, we propose the detection of accurate corners using a Harris corner detector to significantly enhance the registration performance. Moreover, since precise homologous feature points with a homogeneous spatial distribution over the images are important for reliable determination of spatial transformation model parameters [67], our approach involves first subdividing the image into nine sub-regions and then extracting the Harris corners points for each area using the Harris detection algorithm. This process ensures a uniform distribution of points throughout the image and avoids out-of-memory errors.

Point Correspondence Using Fourier Phase Matching
Following the extraction of distinctive control points from the reference image, the search for corresponding control points (CP) is then performed using Fourier phase matching, which consists of searching the best match for each corner in the reference image from a set of candidate points in the sensed image. The candidate corner with maximum magnitude of the inverse Fourier transform of the normalized cross-power spectrum is considered the correct corresponding point.
Phase correlation methods used to estimate the displacement [2,18,19,21,29] with sub-pixel accuracy are affected by aliasing and edge effects, which are mostly present at higher frequencies. The bandwidth of the magnitude of the cross-spectrum may, therefore, be essential to limit these artifacts. Stone recommended masking the phase-correlation matrix by retaining only the frequencies for which the magnitude of the cross-spectrum crosses above a certain threshold value [63].
Additionally, image features close to the image edges can have a negative effect on the ability to identify translational motion between two images. Applying a 2D spatial Blackman or Blackman-Harris window greatly reduces image-boundary effects in the Fourier domain [63]. In [59], the Blackman window removes a significant amount of the signal energy. The Hann window achieves a good compromise between noise reduction and losing the desired phase information [49]. The idea behind denoising is to choose the appropriate frequency masking such that the undesired signal is removed while preserving the necessary information.
The more successful phase matching methods typically apply Hann or Blackman windows to reduce the impact of edge effects on the phase correlation process. Although performing phase matching can provide accurate results, it is important to improve the image clarity using windowing. Identification of efficient image denoising techniques remains a critical challenge. To choose a suitable window for the reference and the sensed images, we evaluated the effect of the Blackman window and the Hann window prior to the Fourier transform as a post-process to minimize the amount of loss in the image intensity information, and therefore obtain a more accurate estimation of the translational shifts.

Sub-Pixel Translation Estimation
Among the existing sub-pixel registration techniques, Fourier domain methods achieve excellent robustness against correlated and frequency-dependent noise [4,[19][20][21][22]. Most of these methods are in fact variations of the original phase-correlation method [28]. The translation (∆x, ∆y) that is given by R w x , w y can be recovered in either the spatial or frequency domains. In the first approach, the inverse Fourier transform of the normalized cross-power spectrum r[∆x, ∆y] was computed in [15,19], and we obtain a function that is an impulse; that is, it is approximately zero everywhere except at the displacement (x o , y o ), which is needed for the shift to optimally register the two images. As a result, it is limited to integer-pixel accuracy. Getting sub-pixel accuracy requires finding the displacement (∆x, ∆y) that maximizes the inverse Fourier transform of the normalized cross-power spectrum r[∆x, ∆y] defined as the phase-only correlation (POC) function. The normalized cross-spectrum of the images is defined by: where w x and w y are the frequency variables in column and row. And the POC function is given by with N x = 2M x + 1 and N y = 2M y + 1 , where N x , and N y are, respectively, the width and height of the input images. The problem of finding the extrema of the cost function with sub-pixel accuracy is equivalent to finding the zeros of r [∆x, ∆y]. The derivative of r[∆x, ∆y] takes the form of a vector-valued bivariate gradient function, given by: with and This stage involves computing the sub-pixel displacements that minimize the cost function. The process is then iterated until convergence to a close neighborhood or it has reached its maximum number of iterations; then, the algorithm is implemented to obtain a fast and accurate solution. The drawback of phase correlation is that it cannot cope with large noisy data. Therefore, both a good initial estimate of the displacement and an optimal filter for noise reduction without losing the desired information are required.
The derivative of the cost function has increased high-frequency content with respect to r[∆x, ∆y]. Thus, r [∆x, ∆y] is very sensitive to noise and border effects. For this reason, two methods are used to minimize the objective function for accurate local searching. The first uses only function evaluations, and the second method requires the first derivative to update the solution point.

Nelder-Mead (NM) Optimization
The Nelder-Mead (NM) method [68] was designed to solve multidimensional unconstrained minimization of an objective function of n variables using (n + 1) vertices arranged as a simplex. For two variables, a simplex is a triangle, and the algorithm begins with a randomly-generated simplex requiring three starting points; the first was the integer solution (∆x 0 , ∆y 0 ) and the remaining two were chosen as (∆x 0 ±0.5, ∆y 0 ) and (∆x 0 , ∆y 0 ±0.5) where the sign was selected depending on which neighbor was best. At every iteration step, the NM algorithm modifies a single vertex of the current simplex towards an optimal region in the search space to determine the optimal value of the objective function. Finally, the vertex of the triangle that yields the most optimal objective value is returned.

The Two-Point Step Size (TPSS) Gradient
In [21], the TPSS is an optimization algorithm used to minimize the POC function at a point x k by iteratively moving in the direction of steepest descent until convergence is achieved.
The updating equation for any gradient algorithm has the following iterative form: where ∇ r (x k ) is the gradient vector of r at x k , x 0 is a given starting point, and the scalar α k is given by: where s k−1 = x k − x k−1 and y k−1 = ∇ r (x k ) − ∇ r (x k−1 ).
Iterations stop when |x k+1 − x k | ≤ where is a pre-specified small scalar.

Detection of Outliers
The choice of the best key points plays a crucial role in order to achieve the most accurate registration. Co-registration can be inaccurate if outliers are present. Thus, dealing with outliers is absolutely necessary prior to computing a spatial transformation model [69,70]. To identify and handle outliers efficiently, robust algorithms are required. Our approach is based on the Random Sample Consensus (RANSAC) algorithm for filtering outliers in an efficient way. RANSAC algorithm proposed by Fischler and Bolles [71], is one of the most robust techniques designed to detect and deal with a large percentage of outliers in the key points set. This method aims to repeatedly select a minimal set of points and evaluates their likelihood with other data until the optimal solution is found. The basic steps of the RANSAC algorithm are as follows: Sampling of all minimal random points from all point-wise correspondences to estimate a model by minimizing a cost function that is the sum of the distance error of the inlier. All other point correspondences are checked against the fitted model. A point with a residual larger than a threshold will be considered an outlier. The estimated model with sufficiently many inliers is classified as the optimal transformation. Finally, all the inliers are used to re-estimate the model. The RANSAC algorithm is listed in Algorithm 1. Select N random data sets required to determine the model parameters.

2.
Estimate the parameters of the model.

3.
Fit how many data items within a user-defined tolerance ε fit the model. Call this k.

4.
If k is large enough, accept model fit and exit with success.
Algorithm will be exit with fail.

Transformation Model
A precise registration requires an adequate choice of the optimal transformation that correctly models the distortions expected to be present within the images [11]. A transformation is selected depending on the nature of distortions. The most commonly used registration is the global transformations which are implemented by a single equation that maps the entire image using a single function. Generally, the registration of satellite imagery has been modeled as a global deformation, mainly utilizing affine and polynomial transformations [1,72]. For very high spatial resolution imagery with rugged terrain, a global transformation may not handle the local distortions. Global transformations that can accommodate slight local variations include thin-plate spline (TPS) [1,73] and multiquadric transformations (MQ). To overcome the problem of geometric distortions, several local mapping functions have been proposed. These transformations are more accurate, but are time-consuming and need large numbers of evenly distributed features to represent the local variation [74]. In [75], Goshtasby found that TPS and MQ functions are effective in the registration of remote sensing images with few local geometric differences and a small number of widely spread points. By comparison, piecewise linear transformation (PL) is more appropriate for images with local geometric differences because local accuracy is kept in a small neighborhood surrounding the corresponding control points [76,77]. When the correspondences are noisy, weighted mean and weighted linear methods are preferred to PL, TPS, and MQ because inaccuracies in the control point are smoothed and the rational weight is adapted to the density and organization of the control points [77]. Here we use TPS and polynomial transformations to register VHR remote sensing images.

Workflow of the Proposed Approach
The proposed hybrid algorithm used for sub-pixel co-registration can be summarized in the following steps ( Figure 1):

1.
Extract Harris corners from the sensed and reference images for each of the nine sub-regions.

2.
For each point PC i in the reference image, weigh the reference template image p 1 [x, y] and the candidate template image p 2 [x, y] for each of the k-nearest neighbors in the sensed image by a Blackman window.

3.
Compute the discrete Fourier transformp k of each filtered image p k [x, y].

5.
The candidate points with the maximum magnitude of the phase-only correlation are considered as the exact corresponding points. 6.
Eliminate the pairs for which the score is less than 0.3, in addition to the many-to-one match with the minimum score. 7.
Deal with outliers using the RANSAC algorithm. 8.
Compute the displacement ∆x 0 , ∆y 0 . for each corresponding point pairs using phase correlation. 9.
Using ∆x 0 and ∆y 0 as initial approximations, two optimization algorithms are used to find (∆x, ∆y) that maximize the POC function r[∆x, ∆y]. 10. Compute the parameters of the model transformation using least squares minimization. 11. Apply model transformations to align the sensed image with the reference image.

Evaluation Criteria
We evaluated the overall performance of our proposed co-registration workflow using the precision of phase matching and the root mean square error (RMSE) of measured displacements before and after the registration.
The RMSE can be written generically as:

Evaluation Criteria
We evaluated the overall performance of our proposed co-registration workflow using the precision of phase matching and the root mean square error (RMSE) of measured displacements before and after the registration.
The RMSE can be written generically as: where i is the CP number, n is the number of evaluated points, xt i and yt i are the true coordinates, and xr i and yr i are the retransformed coordinates for CP i . Feature matching precision is used to evaluate the matching performance. Matching precision refers to the ratio of the number of correct matches over the total number of matches.
where N c is the number of correct matches and N w is the number of wrong matches.

Results and Discussion
In this section, we first present the results obtained using our proposed method to automatically find corresponding points and accurately register the sensed image. Then, we evaluate the effects of different window functions on the correlation measures. After identifying a suitable window function, in the third series of experiments we apply the Nelder-Mead and TPSS optimization methods to improve the matching accuracy. The proposed methods were carried out using ground-truth sub-pixel shifts.

Descriptions of Experimental Data
The reliability and validity of the proposed method were evaluated with two sets of VHR remote sensing images acquired from different sensors. These data include a Pleiades multispectral scene and an aerial image at Fes and Rabat city, respectively. Their specific parameters are given in Tables 1 and 2, and the aerial imagery is displayed in Figure 2. Each set of images contains a reference image and a sensed image registered based on the metadata and georeferencing information. The approximate height range at the Fes city is between 180 and 837 m. The elevation for the Rabat test site ranges between 2 and 332 m.

Large-Scale Displacements Estimation with Pixel-Level Accuracy
In this section, we evaluate the effect of the window functions on the phase matching and identified the corresponding control points using phase correlation based on the win dows function that provided the largest number of matches.

Effect of Two Window Functions on the Correlation Measures
Here, phase matching results for a pair of aerial images 2014-2016 are presented and discussed. To choose the optimal window, we applied Blackman and Hann window func tions to the templates of sensed and reference images centered on 128 Harris points. A correlation was considered to be successful if the magnitude of the POC function was les than 0.3. After outlier removal, 110 corresponding point pairs remained. Examples of th template images of reference points are shown in Figure 3. The experiment results show that the use of the Blackman window for both reference and sensed images generated greater number of correct matches than the Hann window function; however, the Hann window produced more accurate results. The responses of both the Blackman and Hann window functions are robust to noise and produce more accurate matching. The result are summarized in Table 3.

Large-Scale Displacements Estimation with Pixel-Level Accuracy
In this section, we evaluate the effect of the window functions on the phase matching and identified the corresponding control points using phase correlation based on the windows function that provided the largest number of matches.

Effect of Two Window Functions on the Correlation Measures
Here, phase matching results for a pair of aerial images 2014-2016 are presented and discussed. To choose the optimal window, we applied Blackman and Hann window functions to the templates of sensed and reference images centered on 128 Harris points. A correlation was considered to be successful if the magnitude of the POC function was less than 0.3. After outlier removal, 110 corresponding point pairs remained. Examples of the template images of reference points are shown in Figure 3. The experiment results show that the use of the Blackman window for both reference and sensed images generated a greater number of correct matches than the Hann window function; however, the Hann window produced more accurate results. The responses of both the Blackman and Hann window functions are robust to noise and produce more accurate matching. The results are summarized in Table 3.

Large-Scale Displacements Estimation with Pixel-Level Accuracy
In this section, we evaluate the effect of the window functions on the phase matching and identified the corresponding control points using phase correlation based on the windows function that provided the largest number of matches.

Effect of Two Window Functions on the Correlation Measures
Here, phase matching results for a pair of aerial images 2014-2016 are presented and discussed. To choose the optimal window, we applied Blackman and Hann window functions to the templates of sensed and reference images centered on 128 Harris points. A correlation was considered to be successful if the magnitude of the POC function was less than 0.3. After outlier removal, 110 corresponding point pairs remained. Examples of the template images of reference points are shown in Figure 3. The experiment results show that the use of the Blackman window for both reference and sensed images generated a greater number of correct matches than the Hann window function; however, the Hann window produced more accurate results. The responses of both the Blackman and Hann window functions are robust to noise and produce more accurate matching. The results are summarized in Table 3.    Table 4 shows the results of phase matching and our algorithm for aerial images. For the first algorithm, which uses phase correlation only, the experiment results confirm that using only phase matching may not always provide precise correlation, particularly for multipixel displacements between reference and sensed images; this confirms the results of Leprince et al. [49]. To improve the accuracy of image matching, we applied a hybrid algorithm that combines phase correlation and a feature-based technique. After extracting the interest points using the Harris corner detector, we efficiently identified the corresponding feature matches using phase correlation with a template image centered in the Harris corner. The Harris corner was first used to achieve a good initial position, based on which matching was then refined using the phase matching method. Our approach avoids decorrelation caused by multipixel displacements and produces a displacement estimation with pixel precision. The results of the correlation process of our approach and the SURF-based method are shown in Table 5. The matching score of the proposed approach indicates better performance was achieved than that of the SURF descriptor; the average matching precisions are about 0.1 and 0.02 for aerial images, respectively, and for satellite images, the average matching scores obtained are about 0.11 and 0.15, respectively. The main difference between the SURF-based method and our method is the number of correct matches. Figure 4 shows that our approach provides the largest number of corresponding feature points for both aerial and satellite images. In terms of the overhead computation time, for satellite images, our approach requires an average of 237.5 mn due to the large number of corresponding points. Conversely, the SURF-based matching can obtain a result in a shorter time.   The success of the proposed approach is largely due to the use of the Blackman and Hann windows, which avoid edge effects by using patches of 16 × 16 pixels and the RAN-SAC algorithm to deal with outliers; the larger patch size increases the number of incorrectly matched point. To obtain this performance for the corresponding algorithm, the results of many phase matching trials were examined for various experimental conditions. The success of the proposed approach is largely due to the use of the Blackman and Hann windows, which avoid edge effects by using patches of 16 × 16 pixels and the RANSAC algorithm to deal with outliers; the larger patch size increases the number of incorrectly matched point. To obtain this performance for the corresponding algorithm, the results of many phase matching trials were examined for various experimental conditions. The experiments demonstrated that with VHR remote sensing images, the proposed method was able to find a large number of correct matches with homogeneous distribution.

Enhanced Sub-Pixel Displacement Estimation
To achieve more accurate results, we tested two optimization approaches using POC function maximization. To assess the quality and accuracy of this automated process on aerial and satellite images, we manually relocated the corresponding point in the sensed image for 40 control points.
A comparison of two-point step size and Nelder-Mead algorithms are shown in Table 6. The analysis of the optimization results shows that both of the optimization algorithms provide the same accuracy of the displacement estimate. The obtained RMSE values of the NM algorithm are similar to those of the solution without optimization, even if both algorithms are initialized with the same displacement generated using phase correlation. In addition, when the MN algorithm attempts to iteratively reduce the error for both the east/west and north/south displacement to improve the minimum of the POC function, the estimations provide the same interest point localizations as the Harris detector algorithm. The difference between the two-point step size and Nelder-Mead algorithms are the overhead computation time: the two-point step size algorithm (107 mn) is more timeconsuming than the Nelder-Mead algorithm, which required an average of 3 mn for the same number of corners (in addition to the preprocessing and corresponding point phase matching steps). It was observed that our sub-pixel approach based on Nelder-Mead optimization performs better in terms of the number of correct matches, localization accuracy, and computation time.

Validation of the Proposed Registration Method
Following sub-pixel refinement, the corresponding points were used to estimate the model transformation for each data set. The proposed approach found 150,619, and 33,565 corresponding points in aerial image 2014, aerial image 2016, and satellite image 2014, respectively. In this work, we registered the three images using TPS and polynomial transformation. This step was performed using PCI Geomatica.
The estimated misregistration, measured on a set of independent check points, was found using TPS and first-order polynomial transformations, separately, in each of the x and y dimensions. As shown in Table 7, the polynomial transformation of degree one provides better performance than TPS, which is time-consuming for large images. The proposed approach was able to align remote sensing images and achieved an average accuracy of 0.3 pixels in the x and y directions for aerial image 2016. This performance was lower for aerial image 2014, which was co-registered with the aerial image 2009 acquired with an analogue camera. Regarding the co-registration of the satellite image, the accuracy was about 0.6 pixels.

Conclusions
Very high-resolution remote sensing images usually contain obvious geometric distortion in local areas introduced by increasing the spatial resolution and lowering the altitude of the sensors. To co-register these images more effectively, this paper proposes a hybrid registration approach based on combining Harris feature corners and phase correlation, which has two advantages. First, the proposed approach uses Harris feature points, which improve the reliability and efficiency of Fourier phase matching. Second, it is more insensitive to image noise and sub-pixel resolution is feasible.
As a result that the Harris corner location is at the pixel level, two optimization algorithms were used to refine the localization to sub-pixel accuracy. The proposed approach was used to co-register the aerial and satellite imagery and was validated in terms of precision and robustness to noise.
The experiment results show that the responses of both the Blackman and Hann window functions are robust to noise and result in more accurate matching. The use of the Blackman window for both reference and sensed images generates a larger number of correct matches than the Hann window function and results in a correct matching rate of 0.8. In addition, the correct matching rate of the proposed approach is superior to that of the SURF-based method. The experiments also demonstrate that the proposed approach based on the MN optimization algorithm performs better in terms of the number of correct matches, localization accuracy, and computation time. The sensed image was then registered using polynomial and TPS transformations, and the parameters of the polynomial mapping function were estimated via the weighted least squares method. Experiments confirmed that the proposed approach can achieve registration accuracy better than 0.5 pixels and is more robust to noise. Our approach is a better compromise between accurate results and a high number of correspondences. The advantages of our approach are: (1) it produces a high number of true correspondences that match with the sub-pixel location; (2) it results in high performance of phase matching, which is based on corners and produces more precise and stable results; (3) it reduces the computation cost because the correlation is based on corners, and allows accurate error evaluation because the error can be detected and measured more easily.
Three limiting factors of our approach need to be improved. First, our adopted phase correlation approach recovers only the translation between the reference and sensed images. Second, the high computational processing time can be addressed using a coarse-to-fine registration strategy and hardware with high capabilities. The third factor relates to the control points used to assess the optimization algorithms. The control points are manually relocated, which means that the localization may contain errors because it is difficult to localize the same corner with sub-pixel accuracy. Therefore, the performance and efficacy of the optimization methods must be revalidated with simulated data.
Our future work will focus on Fourier-Mellin transformation to match images that are translated, rotated, and scaled, with the aim to improve sub-pixel image registration accuracy using satellite and drone imagery.