Area-Based Dense Image Matching with Subpixel Accuracy for Remote Sensing Applications: Practical Analysis and Comparative Study

: Dense image matching is a crucial step in many image processing tasks. Subpixel accuracy and fractional measurement are commonly pursued, considering the image resolution and application requirement, especially in the ﬁeld of remote sensing. In this study, we conducted a practical analysis and comparative study on area-based dense image matching with subpixel accuracy for remote sensing applications, with a speciﬁc focus on the subpixel capability and robustness. Twelve representative matching algorithms with two types of correlation-based similarity measures and seven types of subpixel methods were selected. The existing matching algorithms were compared and evaluated in a simulated experiment using synthetic image pairs with varying amounts of aliasing and two real applications of attitude jitter detection and disparity estimation. The experimental results indicate that there are two types of systematic errors: displacement-dependent errors, depending on the fractional values of displacement, and displacement-independent errors represented as unexpected wave artifacts in this study. In addition, the strengths and limitations of di ﬀ erent matching algorithms on the robustness to these two types of systematic errors were investigated and discussed.


Introduction
Dense image matching is the process of determining correspondences between two or more overlapping images in a high-density or even pixel-to-pixel manner. It is a crucial step in many image processing tasks, revealing information from the combination of multiple data sources. Dense image matching is actually a generalization of various terms for different tasks, such as image alignment [1], stereo matching [2], optical flow [3], and digital image correlation [4]. Image matching can be generally implemented by feature-based and area-based methods. The area-based methods possess advantages in the aspects of precision, distribution, and stability [5]. The present study concentrated on area-based dense image matching, involving high-density correspondence estimation by directly maximizing the similarity in intensity information between the transformed template image and the reference image. The transformation function w that maps the template image to the reference image can be expressed by a spatially-varying displacement vector (m 0 , n 0 ) associated with each position as: different implementations that could be extended to subpixel measurement [48,49], while the selected matching algorithms have been often used in different applications with success and cover seven types of subpixel methods mentioned above. The objective and contribution of this study are to compare and investigate the relative performance of typical algorithms for dense image matching on remote sensing data with a particular focus on the subpixel capability and robustness to the possible systematic errors. The remainder of this paper is organized as follows. In Section 2, we describe in detail the subpixel dense image matching, including the overall workflow, the adopted similarity measures, and the subpixel methods under study. Section 3 presents the implementations of the compared subpixel matching algorithms and the settings of simulated and real data experiments. The experimental results and discussions are then provided in Section 4. Finally, the conclusions of this study are reported in Section 5.

Dense Image Matching
Subpixel dense image matching aims at identifying and quantifying high-density correspondences with subpixel accuracy from overlapping images. In this section, we first introduce the overall workflow of dense image matching, and then the adopted two correlation-based similarity measures, and finally the evaluated subpixel methods that can obtain fractional measurements.

Overall Workflow
Although the implementation of dense image matching is not completely consistent in different applications, the overall processing can be summarized as a workflow shown in Figure 1. The inputs are two overlapping images for pairwise matching that transforms the template image to a reference image. The general framework contains three basic components: preprocessing, image matching, and postprocessing. The outputs are two-directional displacement maps that could represent different meanings, such as disparity information, motion information, or deformation information, in different applications. The preprocessing stage mainly includes two aspects, namely radiometric and geometric preprocessing. For the radiometric preprocessing, the objective is to ensure the good radiometric properties for the following image matching by correcting the radiometric artifacts and improving the contrast levels of the images [31]. The geometric preprocessing is a process of image co-registration that can decrease the large differences, including rotation between stereo images and the image deformations caused by unexpected factors in order to enhance the computational efficiency and reliability of image matching. The co-registration can be conducted based on the feature correspondences, the sensor model, and the prior knowledge in various ways, considering the application scenario, such as epipolar rectification [50], orthorectification [7], and alignment using global transformation models [1]. The image matching stage is normally performed in a coarse-to-fine manner. The coarse matching provides initial estimates, and narrows search ranges for the fine matching by computing approximate correspondences using a pixel-level matching or a pyramid-based search [42]. The fine matching involves refining the estimates by means of local search or global optimization based on the subpixel methods described below. The postprocessing stage here identifies and deletes the erroneous matches and optionally fills the hole by adopting a certain filter to improve the quality of displacement maps. The filtering should require little user interaction and is generally implemented, depending on the prior knowledge, the neighboring The preprocessing stage mainly includes two aspects, namely radiometric and geometric preprocessing. For the radiometric preprocessing, the objective is to ensure the good radiometric properties for the following image matching by correcting the radiometric artifacts and improving the contrast levels of the images [31]. The geometric preprocessing is a process of image co-registration that can decrease the large differences, including rotation between stereo images and the image deformations caused by unexpected factors in order to enhance the computational efficiency and reliability of image matching. The co-registration can be conducted based on the feature correspondences, the sensor model, and the prior knowledge in various ways, considering the application scenario, such as epipolar rectification [50], orthorectification [7], and alignment using global transformation models [1]. The image matching stage is normally performed in a coarse-to-fine manner. The coarse matching provides initial estimates, and narrows search ranges for the fine matching by computing approximate correspondences using a pixel-level matching or a pyramid-based search [42]. The fine matching involves refining the estimates by means of local search or global optimization based on the subpixel methods described below. The postprocessing stage here identifies and deletes the erroneous matches and optionally fills the hole by adopting a certain filter to improve the quality of displacement maps. The filtering should require little user interaction and is generally implemented, depending on the prior knowledge, the neighboring values, and the confidence measures, such as correlation coefficient, Remote Sens. 2020, 12, 696 4 of 24 signal-to-noise ratio, and left-right consistency [13]. All these three components would determine the final performance of outputs. In this study, in order to exclusively investigate the performance of subpixel dense matching and the induced systematic errors, no additional radiometric preprocessing was adopted, and the mismatch removal in postprocessing was conducted by selecting a filtering method that was independent of the matching method used.

Similarity Measures
The similarity measure plays a crucial role in area-based image matching. In our study, two correlation-based similarity measures were tested. The first one was NCC that was operated in the spatial domain, and another option was PC that was calculated in the frequency domain.
NCC is frequently used to estimate the similarity between matching entities in two images due to its simplicity and reliability [51]. For a patch r(x, y) in the reference image and a corresponding patch t(x + m, y + n) in the template image, where m and n stand for the offsets within the search range, the NCC between two patches is defined as [52]: where r and t are the mean intensities of the patches r(x, y) and t(x + m, y + n), respectively. The values of NCC range between −1, indicating exactly inverse, and +1, indicating the same. NCC is a compatible measure for different correlation attempts and offers robustness to the offset and linear scale in illumination changes [22]. It is worth noting that directly integrating NCC in a continuous optimization scheme is complicated due to its non-differentiability. Fortunately, NCC can be related to the zero-normalized sum of squared differences (ZNSSD) according to the relationship [22,44]: The optimization using ZNSSD is much easier. PC is also a commonly used patch-based similarity measure that conducts normalization in the frequency domain based on the translation property of Fourier transform [53,54]. Different from NCC, the computation of PC avoids the iterative search process due to the use of Fourier transform routine. The correlation function of PC that is calculated from the inverse Fourier transform of the normalized cross-power spectrum Q is expressed by: where F r (u, v) and F t (u, v) are the corresponding Fourier transform of r(x, y) and t(x, y), F −1 denotes the inverse Fourier transform, * denotes the complex conjugate, and i = √ −1. Considering the phase information solely, PC is also insensitive to image content. Compared to the classical correlation, including NCC, PC has advantages in computational efficiency with the help of fast Fourier transform and robustness to frequency-dependency noise that will not alter the dominant phase difference [55]. It can be found that in the case of integer displacements, the correlation function of PC is theoretically a Dirac delta function centered at (m 0 , n 0 ) [56]. The subpixel methods for using PC can be realized by locating the peak of the correlation function in the spatial domain or by estimating the phase difference in the frequency domain [24].

Subpixel Image Matching Methods
For spatially discrete images, a specific way to approximate continuous signal is required to obtain subpixel precision. In this study, seven different types of subpixel methods were tested and evaluated: peak centroid, similarity fitting, similarity interpolation, intensity interpolation, local continuous optimization, global continuous optimization, and phase-based methods.
Peak centroid is an intuitive but effective method to obtain the subpixel peak location in the correlation surface. Considering the correlation values of the neighboring pixels around the integer peak location is inversely related to the distance between the true peak and each pixel, the subpixel displacement (m 0 , n 0 ) can be simply estimated as centroid in a local neighborhood N by weighted averaging the correlation values [29]: x,y∈N C(x, y) , n 0 = x,y∈N yC(x, y) x,y∈N C(x, y) .
A threshold can be predefined to suppress the low correlation values with less reliability. A straightforward alternative method to achieve fractional measurement is similarity fitting that finds the peak location by fitting the 1-D similarity curve or 2-D similarity surface. The difficulty is the selection of an optimal model that can correctly describe the real curve or surface. A variety of models have been investigated in empirical and theoretical studies [24]. In this study, one of the most commonly used fitting models, namely the quadratic function, was tested due to its simplicity. For 1-D parabola fitting, the subpixel peak location is found using the integer-valued main peak (x 0 , y 0 ) and its two nearest neighbors separately in each direction as: where C can be either similarity values in local search or matching costs in the global energy function.
For the correlation function of PC, the analytical derivation indicates that PC yields a 2-D Dirichlet kernel after the inverse Fourier transform [56]: where α ≤ 1 denotes the peak value, W and H are the width and height of a patch. The subpixel locations can be solved by nonlinear optimization with a set of points around the main peak [57]. A 1-D peak fitting model can be given after approximating Equation (7) to a sinc function as [45]: In addition to peak estimation via averaging or fitting of similarity values, another intuitive method is direct resampling of similarity values around the peak with a higher sampling rate. With regard to the Fourier-based image correlation, the inverse Fourier transform of the cross-power spectrum can be upsampled in the frequency domain by means of zero-padding [29] and matrix-multiply discrete Fourier transform [20]. The advantage of the latter one is that it is more suitable for local upsampling within a small neighborhood and without the need to zero-pad the entire cross-power spectrum, which makes it more computational and memory efficient. For instance, the locally upsampled correlation function in a neighborhood of s × s pixels around the initial peak can be calculated by multiplying three matrices: Remote Sens. 2020, 12, 696 6 of 24 where Q[U, V] is the original cross-power spectrum matrix calculated from Equation (4), k is the upsampling factor, [X , Y ] and [U, V] are the spatial distributions of patch pixels in the upsampled grid and the frequency distributions of cross-power spectrum matrix in the original grid, respectively. The peak can be relocated at the new resolution. Similar to the above similarity interpolation, we can also interpolate the intensity values instead of similarity values to reconstruct a higher resolution signal since the correlation of two upsampled images equals to upsampling the correlation of two original images [58]. This method includes two forms. The first one is resampling images or matching patches to the desired resolution [21]. Another one is applying fractional steps to sliding patch or matching cost computation by image interpolation [27]. Then, a certain pixel-level matching algorithm is utilized, and the subpixel displacements can be measured according to the new resolution. The parameter managing the upsampling factor or fractional step determines the computational burden and the subpixel precision of the algorithm, and the choice of intensity interpolation method largely affects the performance of the matching algorithm [36,59]. This type of subpixel method is somewhat computationally expensive for large-scale applications and high precision requirements.
More sophisticated methods regard subpixel matching as an optimization problem and solve it using continuous optimization algorithms in which the variables are allowed to take on any real values permitted by the constraints. Local methods normally perform independent estimations in small patches and express relative deformations using parametric models, while global methods perform optimization globally and make prior assumptions by imposing an explicit regularization term [23]. The two most widely used local methods are iterative spatial gradient-based methods that avoid the derivatives of similarity criteria, such as LSM [26], and gradient descent approximation-based methods that involve the optimization of similarity criterion, such as Lucas-Kanade algorithm and its variants [60]. The identical objective of these two methods is to solve the parametric vector. The commonly adopted mapping function between the corresponding patches is the first-order affine warp model that allows translation, rotation, shear transformation. In LSM, considering the linear radiometric difference, the relationship between two patches in the reference image and template image can be expressed by: where (a 0 , a 1 , a 2 , b 0 , b 1 , b 2 ) are the unknown parameters of the affine warp model, (k 1 , k 2 ) control the contrast and brightness differences, and n(x, y) denotes the random noise. After linearization with a first-order Taylor series, the residual function is given by: where X = [da 0 , da 1 , da 2 , db 0 , db 1 , db 2 , dk 1 , dk 2 ] T is the incremental parameter vector. C is a Jacobian or coefficient matrix that records the partial derivatives to each unknown parameter. L = k 01 t(a 00 + a 01 x + a 02 y, b 00 + b 01 x + b 02 y) + k 02 − r(x, y) is the observation term. The parameter vector can be solved using least-squares adjustment iteratively by updating the parameters until reaching a certain termination criterion [61]. Alternatively, for the parameter vector p = [a 0 , a 1 , a 2 , b 0 , b 1 , b 2 ] T that also considers the first-order mapping function, we can solve it by directly optimizing the similarity measure, such as the ZNSSD in Equation (3). The solution using the iterative Newton-Raphson algorithm can be written as [22]: where p 0 is the initial values of the parameter vector, ∇C(p 0 ) is the gradients of similarity function, and ∇∇C(p 0 ) is the second-order derivatives of similarity function, commonly called the Hessian matrix. In addition to this forward additive scheme, a more computational efficient inverse compositional scheme that avoids the redundant computation of the Hessian matrix in each iteration can be adopted [60,62]. Global methods pose the dense matching problem as the minimization of a global energy function that is normally formulated as the weighted sum of two terms [23]: where Ω denotes the entire image domain, w is the transformation function, E data is the data term that defines the total matching error involving similarity measures, E reg is the regularization term that provides the smoothness constraints in the homogeneous region while preserves the discontinuities at the boundaries, and λ is a parameter controlling the balance between two terms. The commonly used regularization models include spatial gradient constraint and non-local regularization [3]. This variational framework can be solved using the continuous optimization algorithms, such as Euler-Lagrange equations-based solutions and proximal splitting [23]. With regard to the similarity measure of PC, subpixel displacements can be recovered in the frequency domain by explicitly finding the best approximation to the phase difference without the inverse Fourier transform process. According to the translation property of Fourier transform and the Euler's formula, the phase difference between two patches that is the phase angle of the normalized cross-power spectrum Q in Equation (4) can be expressed by: This implies that the phase difference represents a 2-D plane in the Cartesian coordinates of m-n with the slopes determined by displacements. Furthermore, the normalized cross-power spectrum matrix is theoretically a rank-one matrix as it is separable. Therefore, the subpixel displacements can be estimated by line fitting of the phase difference after low-rank approximation [63].
Another natural way to retrieve subpixel displacement from the normalized cross-power spectrum is to directly compare the computed one Q(u, v) with the theoretical one exp(−i(um 0 + vn 0 )). For instance, in order to minimize the Frobenius norm of the weighted difference matrix, the objective function is formulated as [7]: where W(u, v) is a frequency weighting matrix and can be defined according to the magnitudes of cross-power spectrum [7]. The subpixel displacements (m 0 , n 0 ) are solved by minimizing φ using a gradient descent algorithm.

Algorithmic Implementations
In this paper, there were a total of twelve dense matching algorithms with subpixel accuracy for conducting the practical analysis and comparative study, as shown in Table 1. Among them, seven ones adopts the NCC-related similarity measure, and five ones adopts the PC-related similarity measure. The implementation details of these algorithms are described as follows.
(1) Centroid-NCC. This algorithm calculated the correlation function within the search range using NCC and estimated the subpixel displacements using Equation (5) in a 5 × 5 neighborhood around the initial peak for each matching position. The correlation values below the mean value were not considered. The implementation in the ImGRAFT toolbox [41] was used in this study.
(2) SimiFit-NCC. This algorithm estimated the subpixel displacements for each matching position by fitting the NCC correlation values to a quadratic curve in x and y direction independently. The implementation in COSI-Corr [7] called "statistic correlator" was used. (3) SimiFit-SGM. This algorithm also utilized quadratic function fitting, while the fitting object was the matching costs of a global discrete optimization algorithm instead. The implementation in the Ames Stereo Pipeline (ASP) [42] was used. The dense matching was implemented using a two-stage coarse-to-fine process. The costs were calculated using a variant of semi-global matching [25], which was extended to a 2-D disparity search and to update the costs using the information from more directions that could reduce the streaking artifacts [64]. (4) LSM. This algorithm iteratively updated the 8-parameters vector by solving Equation (11) via least-squares adjustment until the new NCC value would not increase. The initial displacements were provided by the conventional NCC matching. (5) IC-GN. This algorithm directly optimized the ZNSSD similarity measure between two patches and solved the parameters of first-order mapping function by the iterative Gauss-Newton algorithm. The ZNSSD similarity measure was related to NCC according to Equation (3). An inverse compositional scheme with higher computational efficiency and robustness was employed, which iteratively solved for an incremental warp rather than an additive update to the parameter vector, and inverted the roles of the reference image and template image [60]. The implementation in Ncorr [43] was used. A reliability-guided strategy was adopted to guide the correlation and provide initial guesses for the following positions [65]. (6) MicMac. The 2-D dense image matching algorithm implemented in MicMac calculated displacements for every position defined by the discretization step and intensity interpolation [27].
With the subpixel step, this algorithm optimized an energy function, including data term and regularization term, using multi-directional dynamic programming [66]. The data term was constructed using non-linear matching costs based on NCC. (7) Correlation flow. This is an optical flow algorithm that solves a global energy function in the form of Equation (13) [44]. In order to overcome the difficulty in integrating NCC in this variational framework, a correlation transform was separately performed on each image according to the relationship in Equation (3), and the data term was computed as the sum of squared differences between the correlation transforms. The variational framework employed a non-local regularization term and was optimized based on the projected-proximal-point algorithm in a coarse-to-fine warping strategy [44]. (8) Centroid-OC. This algorithm is similar to Centroid-NCC but employed OC [67] that is a PC-related similarity measure. OC can be regarded as a different type of normalization compared to PC, which performs cross-correlation in the frequency domain on the complex orientation images calculated from the orientation of intensity gradient. (9) SimiFit-PC. According to the 1-D peak fitting model in Equation (8), this algorithm found the subpixel peak locations of PC correlation function from multiple tri-tuples consisting of the initial peak and its corresponding surrounding points, without the need of iteration [45]. A low-pass Gaussian function was additionally used on the normalized cross-power spectrum for spectral weighting. (10) Upsampling. This algorithm directly resampled the PC similarity values to a higher resolution based on the matrix-multiplication implementation of discrete Fourier transform [20], as shown in Equation (9). A 1.5 × 1.5 pixels local neighborhood around the initial peak was upsampled with a factor of 200. (11) SVD-RANSAC. This algorithm estimated the subpixel displacements from the phase angle of domain singular vectors of normalized cross-power spectrum matrix by robust line fitting using a unified random sample consensus algorithm [46]. The frequency masking and a fringe filter with a size of 5 × 5 pixels were additionally applied to improve the robustness.
(12) COSI-Corr-Freq. The implementation of "frequency correlator" in COSI-Corr [7] estimated the subpixel displacements by minimizing Equation (15) using the two-point step size algorithm with the initial values from peak centroid. The frequency weighting matrix was adaptively determined based on the magnitude of normalized log-spectrum, and the correlation was iterated five times with the normalized cross-power spectrum matrix and the frequency weighting matrix adjusted in each iteration. The other parameter settings of these algorithms followed the recommended defaults in the original papers. The patch size for calculating similarity measure between the corresponding patches in subpixel matching was fixed in all experiments. For spatial image correlation algorithms, a patch size of 25 × 25 pixels was chosen. For matching algorithms with PC-related similarity measure, the use of an additional window function prior to Fourier transform to attenuate the edge effects makes the actual effective patch narrower, a slightly larger patch size of 32 × 32 pixels was thus chosen. For matching algorithms with regularization (SimiFit-SGM, MicMac, Correlation flow), a smaller patch size (e.g., 7 × 7 pixels) was more suitable.

Experimental Settings
In this study, the matching performance and robustness to systematic errors of the above twelve subpixel algorithms were quantitatively compared in a simulated experiment with known ground truth and qualitatively investigated using two real remote sensing applications.

Simulated Experiment
To evaluate the subpixel accuracy of different matching algorithms, the synthetic image pairs in the simulated experiment were produced with the ground truth from a single real high-resolution satellite image. The subpixel displacements were generated by low-pass filtering and downsampling [46]. We applied an integer displacement S to the high-resolution image and downsampled the original and translated images by a factor of M. The relative displacement between the downsampled image pair became a fractional value of S/M. In this study, the horizontal integer displacements S x were ranged from 1 to 10 pixels with an increment of 1 pixel, the vertical displacement S y was fixed at 10 pixels, and the downsampling factor was set as 10. To conform to the real radiometric conditions, we also introduced systematic radiometric differences to the reference image that were modeled as a linear model with gain and offset [13]. The gain and offset parameters were randomly generated for each subset in a 3 × 3 grid. In addition, aliasing that occurs if images are not spatially band-limited and is common with satellite systems was considered. To control the amount of aliasing, a low-pass Gaussian filter with a standard deviation of σ was applied prior to downsampling [68]. The standard deviation σ varied from 1 to 5 in a step of 1. Totally 50 synthetic image pairs with sizes of 1000 × 1000 pixels were, therefore, generated. The adopted high-resolution satellite image was a ZiYuan-3 panchromatic image with a resolution of 2.1 m and a quantization level of 10 bits. Figure 2 shows the sample reference images generated with different standard deviations of the Gaussian filter. It could be seen from the enlarged subsets that the amount of aliasing was decreased, and the image became smoother with the increasing value of σ. In the simulated experiment, the postprocessing stage only removed the mismatches whose biases exceeded the predefined threshold. The matching errors of different algorithms were then calculated between the derived displacements and actual displacements, and the performance of dense matching was assessed in terms of mean value and standard deviation of the error vector for each image pair.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 24 amount of aliasing, a low-pass Gaussian filter with a standard deviation of σ was applied prior to downsampling [68]. The standard deviation σ varied from 1 to 5 in a step of 1. Totally 50 synthetic image pairs with sizes of 1000 × 1000 pixels were, therefore, generated. The adopted high-resolution satellite image was a ZiYuan-3 panchromatic image with a resolution of 2.1 m and a quantization level of 10 bits. Figure 2 shows the sample reference images generated with different standard deviations of the Gaussian filter. It could be seen from the enlarged subsets that the amount of aliasing was decreased, and the image became smoother with the increasing value of σ. In the simulated experiment, the postprocessing stage only removed the mismatches whose biases exceeded the predefined threshold. The matching errors of different algorithms were then calculated between the derived displacements and actual displacements, and the performance of dense matching was assessed in terms of mean value and standard deviation of the error vector for each image pair.

Real Application One: Attitude Jitter Detection
Attitude jitter refers to periodic instability of satellite attitude. This subtle attitude variation can be transmitted to imaging sensors and result in image distortions. Attitude jitter can thus be effectively detected and estimated from short-time asynchronous image data [69]. In this process, the accuracy of dense image matching to retrieve the image distortions caused by attitude jitter is crucial. Here, we pixel-to-pixel matched different multispectral bands with a resolution of 5.8 m from the ZiYuan-3 satellite. The Ziyuan-3 multispectral sensor provided four spectral bands (blue, green, red, and near-infrared). For simplicity, the first two bands (blue and green) were used in this study. The derived displacements resulted from attitude jitter, which have been demonstrated to appear as an approximate sinusoidal model with an amplitude of fractional pixels [70]. Therefore, dense matching in attitude jitter detection of the ZiYuan-3 satellite is a reasonable choice for the practical study of different subpixel matching algorithm and possible systematic errors. Figure 3 gives the first band of the multispectral image used in this study. The coarse co-registration in the preprocessing stage was realized based on the designed distance between two bands on the focal plane. Since attitude jitter varied with time, and the ZiYuan-3 multispectral image was acquired by a push-broom sensor, the image distortion caused by attitude jitter in the same image line should be constant. Therefore, outliers in the results of dense matching were filtered based on the statistics within the image line. The false displacements that differed from the mean value beyond three times

Real Application One: Attitude Jitter Detection
Attitude jitter refers to periodic instability of satellite attitude. This subtle attitude variation can be transmitted to imaging sensors and result in image distortions. Attitude jitter can thus be effectively detected and estimated from short-time asynchronous image data [69]. In this process, the accuracy of dense image matching to retrieve the image distortions caused by attitude jitter is crucial. Here, we pixel-to-pixel matched different multispectral bands with a resolution of 5.8 m from the ZiYuan-3 satellite. The Ziyuan-3 multispectral sensor provided four spectral bands (blue, green, red, and near-infrared). For simplicity, the first two bands (blue and green) were used in this study. The derived displacements resulted from attitude jitter, which have been demonstrated to appear as an approximate sinusoidal model with an amplitude of fractional pixels [70]. Therefore, dense matching in attitude jitter detection of the ZiYuan-3 satellite is a reasonable choice for the practical study of different subpixel matching algorithm and possible systematic errors. Figure 3 gives the first band of the multispectral image used in this study. The coarse co-registration in the preprocessing stage was realized based on the designed distance between two bands on the focal plane. Since attitude jitter varied with time, and the ZiYuan-3 multispectral image was acquired by a push-broom sensor, the image distortion caused by attitude jitter in the same image line should be constant. Therefore, outliers in the results of dense matching were filtered based on the statistics within the image line. The false displacements that differed from the mean value beyond three times the standard deviation in each line were removed. Finally, the 1-D median filter was separately applied in every line for xand y-displacement maps.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 24 the standard deviation in each line were removed. Finally, the 1-D median filter was separately applied in every line for x-and y-displacement maps.

Real Application Two: Disparity Estimation
Another application scenario to evaluate the performance of different dense matching algorithms in this study is disparity estimation, which is an essential step for 3-D reconstruction and digital elevation model generation [47]. Here, we applied two subsets of engineering data record images from the narrow-angle camera (NAC) of NASA's lunar reconnaissance orbiter camera (LROC) [71] to extract topographic information. The stereo image pair was acquired from two adjacent orbits with a temporal interval of about 2 h and captured the same area around the landing site of Apollo 17, the left image of which is shown in Figure 4a. The convergence angle for this image set was 12.2°, and the ground sampling distance was approximately 1.4 m. As shown in Figure 4b, we preprocessed the image pair using affine epipolar transformation in ASP [42] to roughly align two images. After matching the rectified images using different algorithms, the outliers in disparity maps were filtered, depending on the neighboring disparities [13]. The poor correspondences whose disparities deviated more than a certain threshold from the 5 × 5 mean filtered results were removed, and the disparity maps were finally refined using a 3 × 3 median filter.

Results of Simulated Experiment
The mean value and standard deviation of the absolute error vector as a function of σ calculated from different matching algorithms are shown in Figure 5. For each algorithm and each value of σ, there were 10 million matching measurements in total. For better visualization, we illustrated the subpixel algorithms based on NCC-related and PC-related similarity measures separately.
As could be seen, all the algorithms can generate the results with mean absolute error and standard deviation less than 0.2 pixels, which demonstrated the subpixel capability of these

Real Application Two: Disparity Estimation
Another application scenario to evaluate the performance of different dense matching algorithms in this study is disparity estimation, which is an essential step for 3-D reconstruction and digital elevation model generation [47]. Here, we applied two subsets of engineering data record images from the narrow-angle camera (NAC) of NASA's lunar reconnaissance orbiter camera (LROC) [71] to extract topographic information. The stereo image pair was acquired from two adjacent orbits with a temporal interval of about 2 h and captured the same area around the landing site of Apollo 17, the left image of which is shown in Figure 4a. The convergence angle for this image set was 12.2 • , and the ground sampling distance was approximately 1.4 m. As shown in Figure 4b, we preprocessed the image pair using affine epipolar transformation in ASP [42] to roughly align two images. After matching the rectified images using different algorithms, the outliers in disparity maps were filtered, depending on the neighboring disparities [13]. The poor correspondences whose disparities deviated more than a certain threshold from the 5 × 5 mean filtered results were removed, and the disparity maps were finally refined using a 3 × 3 median filter.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 24 the standard deviation in each line were removed. Finally, the 1-D median filter was separately applied in every line for x-and y-displacement maps.

Real Application Two: Disparity Estimation
Another application scenario to evaluate the performance of different dense matching algorithms in this study is disparity estimation, which is an essential step for 3-D reconstruction and digital elevation model generation [47]. Here, we applied two subsets of engineering data record images from the narrow-angle camera (NAC) of NASA's lunar reconnaissance orbiter camera (LROC) [71] to extract topographic information. The stereo image pair was acquired from two adjacent orbits with a temporal interval of about 2 h and captured the same area around the landing site of Apollo 17, the left image of which is shown in Figure 4a. The convergence angle for this image set was 12.2°, and the ground sampling distance was approximately 1.4 m. As shown in Figure 4b, we preprocessed the image pair using affine epipolar transformation in ASP [42] to roughly align two images. After matching the rectified images using different algorithms, the outliers in disparity maps were filtered, depending on the neighboring disparities [13]. The poor correspondences whose disparities deviated more than a certain threshold from the 5 × 5 mean filtered results were removed, and the disparity maps were finally refined using a 3 × 3 median filter.

Results of Simulated Experiment
The mean value and standard deviation of the absolute error vector as a function of σ calculated from different matching algorithms are shown in Figure 5. For each algorithm and each value of σ, there were 10 million matching measurements in total. For better visualization, we illustrated the subpixel algorithms based on NCC-related and PC-related similarity measures separately.
As could be seen, all the algorithms can generate the results with mean absolute error and standard deviation less than 0.2 pixels, which demonstrated the subpixel capability of these

Results of Simulated Experiment
The mean value and standard deviation of the absolute error vector as a function of σ calculated from different matching algorithms are shown in Figure 5. For each algorithm and each value of σ, there were 10 million matching measurements in total. For better visualization, we illustrated the subpixel algorithms based on NCC-related and PC-related similarity measures separately.
As could be seen, all the algorithms can generate the results with mean absolute error and standard deviation less than 0.2 pixels, which demonstrated the subpixel capability of these matching algorithms in the presence of aliasing. It was evident that the matching errors for most algorithms presented an overall descending trend with the rising of σ, so as the standard deviations. The common trends indicate that the performance of subpixel matching becomes worse with the increased amount of aliasing. For the algorithms using NCC-related similarity measure, Correlation flow yielded better results in the case of large amounts of aliasing, and IC-GN and MicMac achieved the minimum errors when σ > 3. For the algorithms using PC-related similarity measure, the centroid method provided better results in the case of large amounts of aliasing, and SVD-RANSAC and COSI-Corr-Freq obtained the minimum errors when σ > 3. In contrast, SimiFit-SGM and Upsampling performed relatively worse in this simulated experiment. Similar conclusions can also be drawn from the standard deviation.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 24 matching algorithms in the presence of aliasing. It was evident that the matching errors for most algorithms presented an overall descending trend with the rising of σ, so as the standard deviations. The common trends indicate that the performance of subpixel matching becomes worse with the increased amount of aliasing. For the algorithms using NCC-related similarity measure, Correlation flow yielded better results in the case of large amounts of aliasing, and IC-GN and MicMac achieved the minimum errors when σ > 3. For the algorithms using PC-related similarity measure, the centroid method provided better results in the case of large amounts of aliasing, and SVD-RANSAC and COSI-Corr-Freq obtained the minimum errors when σ > 3. In contrast, SimiFit-SGM and Upsampling performed relatively worse in this simulated experiment. Similar conclusions can also be drawn from the standard deviation. In order to investigate the detailed relationship between the matching performance and the real displacement, an error distribution analysis was further conducted for each matching algorithm and each value of σ. Since the relative displacements were designed to vary in the x-direction, we calculated the mean value and standard deviation of x-direction matching errors in each synthetic image pair with a certain horizontal subpixel displacement, as shown in Figures 6 and 7, respectively.
It could be seen that the mean error and standard deviation curves of these twelve matching algorithms presented systematic trends with regard to subpixel displacements. The mean error curves revealed an S-shape trend with varying displacements. This implied the systematic errors, so-called pixel locking effect, depending on the matching algorithms and image characteristics [46,72]. The pixel locking effect refers to the phenomenon by which the estimated displacements tend to be biased toward discrete pixel values. Therefore, the matching errors show a periodic distribution within a pixel interval, in that the errors at approximate 0.5 pixels and integral pixel locations are relatively small, and the errors where displacements are larger than 0.5 pixels are positive values, while the errors where displacements were smaller than 0.5 pixels were negative values. With a decreasing value of σ, the incomplete sampling and aliasing become more serious, which is found to deteriorate the pixel locking effect. In addition, the degree of pixel locking effect for various matching algorithms was different. In the case of a smaller amount of aliasing (σ > 3), the degree of pixel locking effect for most matching algorithms was reduced considerably, while the algorithms still with relatively serious pixel locking effect included SimiFit-NCC, SimiFit-SGM, and Upsampling. Correlation flow presented the anti-pixel locking effect by which the estimated In order to investigate the detailed relationship between the matching performance and the real displacement, an error distribution analysis was further conducted for each matching algorithm and each value of σ. Since the relative displacements were designed to vary in the x-direction, we calculated the mean value and standard deviation of x-direction matching errors in each synthetic image pair with a certain horizontal subpixel displacement, as shown in Figures 6 and 7, respectively.
It could be seen that the mean error and standard deviation curves of these twelve matching algorithms presented systematic trends with regard to subpixel displacements. The mean error curves revealed an S-shape trend with varying displacements. This implied the systematic errors, so-called pixel locking effect, depending on the matching algorithms and image characteristics [46,72]. The pixel locking effect refers to the phenomenon by which the estimated displacements tend to be biased toward discrete pixel values. Therefore, the matching errors show a periodic distribution within a pixel interval, in that the errors at approximate 0.5 pixels and integral pixel locations are relatively small, and the errors where displacements are larger than 0.5 pixels are positive values, while the errors where displacements were smaller than 0.5 pixels were negative values. With a decreasing value of σ, the incomplete sampling and aliasing become more serious, which is found to deteriorate the pixel locking effect. In addition, the degree of pixel locking effect for various matching algorithms was different. In the case of a smaller amount of aliasing (σ > 3), the degree of pixel locking effect for most matching algorithms was reduced considerably, while the algorithms still with relatively serious pixel locking effect included SimiFit-NCC, SimiFit-SGM, and Upsampling. Correlation flow presented the anti-pixel locking effect by which the estimated displacements inclined away integer values, and the sign of matching errors was reverse. The standard deviation curves appeared as a bell shape with varying displacements. In the distribution of standard deviation, the maximum value was obtained at approximately 0.5 pixels for most cases, especially in the case of a large amount of aliasing. The standard deviations at other positions increased with the distance to the nearest integer value and were symmetric with respect to 0.5 pixels. It is possible that the displacements around 0.5 pixels are biased toward two integer sides more randomly. The standard deviations of diverse matching algorithms also varied differently and grew with a larger amount of aliasing.
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 24 displacements inclined away integer values, and the sign of matching errors was reverse. The standard deviation curves appeared as a bell shape with varying displacements. In the distribution of standard deviation, the maximum value was obtained at approximately 0.5 pixels for most cases, especially in the case of a large amount of aliasing. The standard deviations at other positions increased with the distance to the nearest integer value and were symmetric with respect to 0.5 pixels. It is possible that the displacements around 0.5 pixels are biased toward two integer sides more randomly. The standard deviations of diverse matching algorithms also varied differently and grew with a larger amount of aliasing.   Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 24 displacements inclined away integer values, and the sign of matching errors was reverse. The standard deviation curves appeared as a bell shape with varying displacements. In the distribution of standard deviation, the maximum value was obtained at approximately 0.5 pixels for most cases, especially in the case of a large amount of aliasing. The standard deviations at other positions increased with the distance to the nearest integer value and were symmetric with respect to 0.5 pixels. It is possible that the displacements around 0.5 pixels are biased toward two integer sides more randomly. The standard deviations of diverse matching algorithms also varied differently and grew with a larger amount of aliasing.   The real displacements in the y-direction were fixed to 1 pixel for every synthetic image pair in this experiment. It was, however, surprising that the varying displacements in the x-direction affected the matching results in another direction as well, particularly for standard deviations. Figure 8 shows the relationships between y-direction standard deviations and x-direction subpixel displacements for each matching algorithm. It is worth noting that the patterns of y-direction standard deviations were similar to ones in the x-direction, and the values of y-direction standard deviations were slightly larger than ones in the x-direction. The y-direction standard deviations also achieved maximum value around 0.5 pixels and were proportional to the distance between x-direction displacements and the nearest integer value. A larger amount of aliasing degraded the matching stability in the y-direction as well.
Remote Sens. 2020, 12, x FOR PEER REVIEW 14 of 24 The real displacements in the y-direction were fixed to 1 pixel for every synthetic image pair in this experiment. It was, however, surprising that the varying displacements in the x-direction affected the matching results in another direction as well, particularly for standard deviations. Figure 8 shows the relationships between y-direction standard deviations and x-direction subpixel displacements for each matching algorithm. It is worth noting that the patterns of y-direction standard deviations were similar to ones in the x-direction, and the values of y-direction standard deviations were slightly larger than ones in the x-direction. The y-direction standard deviations also achieved maximum value around 0.5 pixels and were proportional to the distance between x-direction displacements and the nearest integer value. A larger amount of aliasing degraded the matching stability in the y-direction as well.

Results of Real Application One
After pixel-to-pixel matching the two multispectral band images, two filtered displacement maps were outputted from every matching algorithm, as shown in Figures 9 and 10. The x-direction and y-direction displacements referred to the image distortions caused by attitude jitter of roll angle and pitch angle, respectively. The stripe pattern with a color variation of displacement maps demonstrated the existence of attitude jitter, which was dominant in roll angle. As shown in Figure  9, all the compared dense matching algorithms were able to capture this significant variation of image distortion in the x-direction, and few differences could be found by visual inspection of displacement maps. The displacement maps were rather regular, in which the interval of these stripes reflected the frequency, and the degree of color variation reflected the amplitude.

Results of Real Application One
After pixel-to-pixel matching the two multispectral band images, two filtered displacement maps were outputted from every matching algorithm, as shown in Figures 9 and 10. The x-direction and y-direction displacements referred to the image distortions caused by attitude jitter of roll angle and pitch angle, respectively. The stripe pattern with a color variation of displacement maps demonstrated the existence of attitude jitter, which was dominant in roll angle. As shown in Figure 9, all the compared dense matching algorithms were able to capture this significant variation of image distortion in the x-direction, and few differences could be found by visual inspection of displacement maps. The displacement maps were rather regular, in which the interval of these stripes reflected the frequency, and the degree of color variation reflected the amplitude.
From the results in Figure 10, it was observed that the stripe patterns of y-direction displacement maps were less apparent since the amplitude of attitude jitter of pitch angle is relatively small, and the jitter distortion is contaminated by topographic information due to the short baseline [69]. The displacement maps obtained by Centroid-NCC and SimiFit-SGM were noisy, and the results from the SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling seemed to appear some artifacts. In contrast, the displacement maps of other matching algorithms were visually similar to each other. From the results in Figure 10, it was observed that the stripe patterns of y-direction displacement maps were less apparent since the amplitude of attitude jitter of pitch angle is relatively small, and the jitter distortion is contaminated by topographic information due to the short baseline [69]. The displacement maps obtained by Centroid-NCC and SimiFit-SGM were noisy, and the results from the SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling seemed to appear some artifacts. In contrast, the displacement maps of other matching algorithms were visually similar to each other. In order to more clearly compare the performance of different matching algorithms, the two-directional displacement maps were averaged by line as jitter distortions should be constant in each line. Figure 11 displays the averaged displacement curves in the x-direction. The results of most matching algorithms were found to agree well with each other and resemble a sinusoidal function  From the results in Figure 10, it was observed that the stripe patterns of y-direction displacement maps were less apparent since the amplitude of attitude jitter of pitch angle is relatively small, and the jitter distortion is contaminated by topographic information due to the short baseline [69]. The displacement maps obtained by Centroid-NCC and SimiFit-SGM were noisy, and the results from the SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling seemed to appear some artifacts. In contrast, the displacement maps of other matching algorithms were visually similar to each other. In order to more clearly compare the performance of different matching algorithms, the two-directional displacement maps were averaged by line as jitter distortions should be constant in each line. Figure 11 displays the averaged displacement curves in the x-direction. The results of most matching algorithms were found to agree well with each other and resemble a sinusoidal function In order to more clearly compare the performance of different matching algorithms, the two-directional displacement maps were averaged by line as jitter distortions should be constant in each line. Figure 11 displays the averaged displacement curves in the x-direction. The results of most matching algorithms were found to agree well with each other and resemble a sinusoidal function with a regular frequency and values ranging from −1 to 0.5 pixels. Note that the results of SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling suffered from the pixel-locking effect. Therefore, the displacement curves of these four algorithms accorded with others at the displacements of multiples of half and integral pixels and were convex at the displacements in the range of −0.5 to 0 pixels, while were concave at the displacements in the ranges of 0 to 0.5 and −1 to −0.5 pixels. In contrast, the result of Correlation flow presented a slight anti-pixel locking effect. In the y-direction, as shown in Figure 12, the periodic displacements were smaller with values, mainly ranging from 0 to 0.2 pixels. The pixel locking and anti-pixel locking effect can also be observed, by which the results of some algorithms were overall higher or lower. In addition, the results of SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling appeared obvious artifacts. The positions of the artifacts were basically consistent, at where the x-direction displacements were around multiples of 0.5 pixels. One possible explanation for these artifacts is that the matching results in the y-direction were affected by the fractional values of x-direction displacements. Figure 13 gives the y-direction standard deviation curves, and the relationship between x-direction averaged displacements and y-direction standard deviations obtained by SimiFit-NCC and SimiFit-SGM. It could be seen that y-direction standard deviations were related to the fractional values of x-direction displacements, and the matching results in the y-direction were most unstable when the x-direction displacements were close to half pixels. The cases of systematic errors in this application agreed well with the phenomena in the simulated experiment.
with a regular frequency and values ranging from −1 to 0.5 pixels. Note that the results of SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling suffered from the pixel-locking effect. Therefore, the displacement curves of these four algorithms accorded with others at the displacements of multiples of half and integral pixels and were convex at the displacements in the range of −0.5 to 0 pixels, while were concave at the displacements in the ranges of 0 to 0.5 and −1 to −0.5 pixels. In contrast, the result of Correlation flow presented a slight anti-pixel locking effect. In the y-direction, as shown in Figure 12, the periodic displacements were smaller with values, mainly ranging from 0 to 0.2 pixels. The pixel locking and anti-pixel locking effect can also be observed, by which the results of some algorithms were overall higher or lower. In addition, the results of SimiFit-NCC, SimiFit-SGM, LSM, and Upsampling appeared obvious artifacts. The positions of the artifacts were basically consistent, at where the x-direction displacements were around multiples of 0.5 pixels. One possible explanation for these artifacts is that the matching results in the y-direction were affected by the fractional values of x-direction displacements. Figure 13 gives the y-direction standard deviation curves, and the relationship between x-direction averaged displacements and y-direction standard deviations obtained by SimiFit-NCC and SimiFit-SGM. It could be seen that y-direction standard deviations were related to the fractional values of x-direction displacements, and the matching results in the y-direction were most unstable when the x-direction displacements were close to half pixels. The cases of systematic errors in this application agreed well with the phenomena in the simulated experiment.

Results of Real Application Two
The disparity maps can be obtained by matching the rectified image pair. Figure 14 displays the disparity maps outputted from twelve different matching algorithms, in which the color variation corresponded to the terrain relief. The overall depth trend from different matching algorithms was basically consistent, and several lunar craters could be identified in all the disparity maps. However, a simple visual inspection of the disparity maps showed that there existed systematic errors in some

Results of Real Application Two
The disparity maps can be obtained by matching the rectified image pair. Figure 14 displays the disparity maps outputted from twelve different matching algorithms, in which the color variation corresponded to the terrain relief. The overall depth trend from different matching algorithms was basically consistent, and several lunar craters could be identified in all the disparity maps. However, a simple visual inspection of the disparity maps showed that there existed systematic errors in some

Results of Real Application Two
The disparity maps can be obtained by matching the rectified image pair. Figure 14 displays the disparity maps outputted from twelve different matching algorithms, in which the color variation corresponded to the terrain relief. The overall depth trend from different matching algorithms was basically consistent, and several lunar craters could be identified in all the disparity maps. However, a simple visual inspection of the disparity maps showed that there existed systematic errors in some matching results. The disparity maps obtained by SimiFit-NCC, SimiFit-SGM, and MicMac appeared obvious wave artifacts, and the result from LSM can also see a relatively slight artifact, while the others were free of this artifact. It is noteworthy that the systematic error in this application was different from the above two experiments, which was not related to the fractional value of displacements. In contrast, the wave artifacts were induced regularly along a certain direction.
Remote Sens. 2020, 12, x FOR PEER REVIEW 18 of 24 matching results. The disparity maps obtained by SimiFit-NCC, SimiFit-SGM, and MicMac appeared obvious wave artifacts, and the result from LSM can also see a relatively slight artifact, while the others were free of this artifact. It is noteworthy that the systematic error in this application was different from the above two experiments, which was not related to the fractional value of displacements. In contrast, the wave artifacts were induced regularly along a certain direction. Besides, we further examined the disparity values of different matching algorithms along two profiles (i.e., profiles A and B, illustrated in Figure 14a), as shown in Figure 15. As could be seen, all the matching algorithms were able to capture the shape of large lunar crater and the underlying trend of terrain relief. However, in addition to the noisy results caused by random errors, some matching results were unexpectedly fluctuating around the underlying trend due to the systematic errors. The profile B in Figure 15b with a narrower disparity range illustrated this case more clearly. It was found that the most severe result influenced by these systematic errors was from MicMac. Besides, we further examined the disparity values of different matching algorithms along two profiles (i.e., profiles A and B, illustrated in Figure 14a), as shown in Figure 15. As could be seen, all the matching algorithms were able to capture the shape of large lunar crater and the underlying trend of terrain relief. However, in addition to the noisy results caused by random errors, some matching results were unexpectedly fluctuating around the underlying trend due to the systematic errors. The profile B in Figure 15b with a narrower disparity range illustrated this case more clearly. It was found that the most severe result influenced by these systematic errors was from MicMac.

Summary and Discussion
Based on the experimental results using both simulated and real data, the systematic errors induced in subpixel dense image matching can be generally divided into two types: displacement-dependent and displacement-independent errors. The displacement-dependent systematic errors have been revealed in the simulated experiment and real application one, which are represented as pixel locking effect and standard deviation artifact. The phenomenon that bias and standard deviation vary with the fractional values of displacements has been reported in the field of particle image velocimetry and digital image correlation [73,74]. It was found in this study that this phenomenon also exists in remote sensing applications. Besides, the fractional values of displacements in one direction are shown to influence the matching results in another direction as well. This issue is ascribed to the mismatch between the continuous assumption in subpixel matching and the real situation. Therefore, improvement of sensor resolution, frequency smoothing, more sophisticated interpolation method, and fitting model can attenuate these systematic errors, while the increasing amount of aliasing will aggravate the degree of pixel locking effect and standard deviation artifact. However, the specific mechanism of the displacement-dependent systematic errors still needs further investigation. The displacement-independent systematic errors are also complicated and will not obviously present in all scenarios. Some matching results of real application two appear wave artifacts that are not related to the values of displacements. These unexpected artifacts are possibly caused by the frequency-dependent radiometric noise of the used images. It is noted that similar wave artifacts can also be observed in previous dense matching applications [7].
Various dense matching algorithms with two types of correlation-based similarity measures and seven types of subpixel methods were investigated and evaluated in this study. The similarity measure, NCC, and PC are, respectively, operated in the spatial domain and frequency domain. They use different manners of normalizations and are all insensitive to radiometric changes. PC performs better in the case of poor visual contrast and large intensity differences, while NCC performs better with small patch size [13]. The experimental results indicate that matching algorithms using PC are free of the displacement-independent systematic errors since PC is robust to the possible frequency-dependent noise that is limited to a small range of frequencies [55]. In contrast, PC would experience more severe displacement-dependent systematic errors than cross-correlation since it is sensitive to noise spread over the frequencies, and the correlation peak is sharper. However, as calculated in the frequency domain and having a theoretical model, it is convenient for PC-related algorithms to deal with aliasing and noise by adopting simple additional operations [24]. For instance, the performance of SimiFit-PC is mainly attributed to the spectral

Summary and Discussion
Based on the experimental results using both simulated and real data, the systematic errors induced in subpixel dense image matching can be generally divided into two types: displacement-dependent and displacement-independent errors. The displacement-dependent systematic errors have been revealed in the simulated experiment and real application one, which are represented as pixel locking effect and standard deviation artifact. The phenomenon that bias and standard deviation vary with the fractional values of displacements has been reported in the field of particle image velocimetry and digital image correlation [73,74]. It was found in this study that this phenomenon also exists in remote sensing applications. Besides, the fractional values of displacements in one direction are shown to influence the matching results in another direction as well. This issue is ascribed to the mismatch between the continuous assumption in subpixel matching and the real situation. Therefore, improvement of sensor resolution, frequency smoothing, more sophisticated interpolation method, and fitting model can attenuate these systematic errors, while the increasing amount of aliasing will aggravate the degree of pixel locking effect and standard deviation artifact. However, the specific mechanism of the displacement-dependent systematic errors still needs further investigation. The displacement-independent systematic errors are also complicated and will not obviously present in all scenarios. Some matching results of real application two appear wave artifacts that are not related to the values of displacements. These unexpected artifacts are possibly caused by the frequency-dependent radiometric noise of the used images. It is noted that similar wave artifacts can also be observed in previous dense matching applications [7].
Various dense matching algorithms with two types of correlation-based similarity measures and seven types of subpixel methods were investigated and evaluated in this study. The similarity measure, NCC, and PC are, respectively, operated in the spatial domain and frequency domain. They use different manners of normalizations and are all insensitive to radiometric changes. PC performs better in the case of poor visual contrast and large intensity differences, while NCC performs better with small patch size [13]. The experimental results indicate that matching algorithms using PC are free of the displacement-independent systematic errors since PC is robust to the possible frequency-dependent noise that is limited to a small range of frequencies [55]. In contrast, PC would experience more severe displacement-dependent systematic errors than cross-correlation since it is sensitive to noise spread over the frequencies, and the correlation peak is sharper. However, as calculated in the frequency domain and having a theoretical model, it is convenient for PC-related algorithms to deal with aliasing and noise by adopting simple additional operations [24]. For instance, the performance of SimiFit-PC is mainly attributed to the spectral weighting with Gaussian function. SVD-RANSAC integrates frequency filtering and robust estimation, and COSI-Corr-Freq also embeds frequency masking and robustness iteration. Therefore, these three algorithms offer satisfactory robustness to systematic errors. In the NCC-related algorithms, similarity fitting with simple models, either using NCC values or using SGM costs, performs poorly. MicMac is limited by displacement-independent systematic errors, while the algorithms based on local and global continuous optimization possess the implicit smoothing effect to some degree and are able to reduce the influence of systematic errors. Surprisingly, peak centroid is a simple but effective subpixel method with low susceptibility to systematic errors, although the results of Centroid-NCC are somewhat noisy in real applications. Considering the warping assumption, MicMac and Correlation flow impose explicit smoothness constraint in the subpixel calculation, compared to the others that adopt a zero-order or first-order warp model between local patches. Therefore, they can generate smoother results with smaller standard deviations and handle the motion discontinuities better but could fail to recognize the unreliable surface changes [30]. In light of the results of comparative analysis in this study, the following suggestions on the selection of matching algorithms can be presented. (1) For time-crucial applications, peak centroid method is a reasonable option due to its simple implementation and pleasurable robustness. (2) PC-related algorithms exhibit favorable insensitivity to the displacement-independent systematic errors and are, thus, preferred for scenarios in the presence of frequency-dependent noise. (3) In the case of input images with aliasing, subpixel matching algorithms that achieve promising accuracy and robustness performance, such as IC-GN, MicMac, SVD-RANSAC, and COSI-Corr-Freq, are appropriate options to suppress the possible displacement-dependent systematic errors. (4) For the matching of scenes with considerable motion discontinuities, the algorithms based on global optimization with smaller patch sizes, such as MicMac and Correlation flow, are highly recommended.
In this study, we concentrated on the subpixel methods for area-based dense image matching and avoided additional operations that affected the exclusive comparison. In addition to the subpixel methods, the final quality of dense matching is also determined by the preprocessing stage that can change the characteristics of input images and the postprocessing stage that can directly filter out the false matches. It has been reported that a pre-filtering is able to effectively alleviate the displacement-dependent systematic errors [38,75], and additional tests in this study found that a low-pass filtering or radiometric calibration could attenuate the displacement-independent artifacts as well. However, the image details and gradients are usually reduced by the smoothing filters, which could lead to an increase in random errors in subpixel matching. It is not a trivial task to design a prefilter to balance the systematic errors and random errors. Therefore, the investigation of the intrinsic robustness of different subpixel matching algorithms is still indispensable in practical applications. In addition, the patch size is an essential parameter in dense image matching, which was empirically fixed in this study. The selection of suitable patch sizes for all cases is difficult. The larger patch sizes with enough information can improve the matching performance, especially the stability of matching, but increase the possibility of deformations that deviate the warping assumption. The patch size will not affect the presence of systematic errors [28]. Several adaptive strategies for adjusting patch size have been developed based on local structural statistics [14,76] or hierarchical variation [34,77]. The performance of various matching algorithms with varying patch sizes or adaptive strategies will be investigated in future work.

Conclusions
This paper presents a practical analysis and comparative study on the subpixel capability and robustness to systematic errors of area-based dense image matching for remote sensing applications. Experiments were conducted using both simulated and real data to compare and evaluate twelve existing matching algorithms. The simulated experiment and jitter detection application revealed the displacement-dependent systematic errors that were related to the intrinsic continuous assumption of matching algorithms and were exacerbated by the increasing amount of aliasing, and the disparity estimation application exhibited the displacement-independent artifacts that were related to the high-frequency image noise. According to the experimental results, peak centroid is a simple but effective subpixel method, while similarity fitting using quadratic function easily induces systematic errors. MicMac is susceptible to the displacement-independent artifacts, and the algorithms based on local and global continuous optimization can ensure the robustness to the systematic errors to different extents. PC-related algorithms are free of the displacement-independent artifacts and could be easily equipped with additional frequency operations to attenuate the displacement-dependent systematic errors. The future work will focus on the theoretical analysis and more implementation settings for various matching algorithms and matching frameworks.
Author Contributions: All authors contributed to this manuscript: Conceptualization, Z.Y. and Y.X.; methodology and software, Z.Y., Y.X. and J.Z.; experiment and analysis, Z.Y. and H.C.; data curation, Y.X. and X.T.; writing-original draft preparation, Z.Y.; writing-review and editing, Y.X.; supervision and funding acquisition, X.T. and U.S. All authors have read and agreed to the published version of the manuscript.