A Double Epipolar Resampling Approach to Reliable Conjugate Point Extraction for Accurate Kompsat-3 / 3A Stereo Data Processing

: Kompsat-3 / 3A provides along-track and across-track stereo data for accurate three-dimensional (3D) topographic mapping. Stereo data preprocessing involves conjugate point extraction and acquisition of ground control points (GCPs), rational polynomial coe ﬃ cient (RPC) bias compensation, and epipolar image resampling. Applications where absolute positional accuracy is not a top priority do not require GCPs, but require precise conjugate points from stereo images for subsequent RPC bias compensation, i.e., relative orientation. Conjugate points are extracted between the original stereo data using image-matching methods by a proper outlier removal process. Inaccurate matching results and potential outliers produce geometric inconsistency in the stereo data. Hence, the reliability of conjugate point extraction must be improved. For this purpose, we proposed to apply the coarse epipolar resampling using raw RPCs before the conjugate point matching. We expect epipolar images with even inaccurate RPCs to show better stereo similarity than the original images, providing better conjugate point extraction. To this end, we carried out the quantitative analysis of the conjugate point extraction performance by comparing the proposed approach using the coarsely epipolar resampled images to the traditional approach using the original stereo images. We tested along-track Kompsat-3 stereo and across-track Kompsat-3A stereo data with four well-known image-matching methods: phase correlation (PC), mutual information (MI), speeded up robust features (SURF), and Harris detector combined with fast retina keypoint (FREAK) descriptor (i.e., Harris). These matching methods were applied to the original stereo images and coarsely resampled epipolar images, and the conjugate point extraction performance was investigated. Experimental results showed that the coarse epipolar image approach was very helpful for accurate conjugate point extraction, realizing highly accurate RPC reﬁnement and sub-pixel y-parallax through ﬁne epipolar image resampling, which was not achievable through the traditional approach. MI and PC provided the most stable results for both along-track and across-track test data with larger patch sizes of more than 400 pixels.

For most stereo applications, epipolar image resampling is necessary for efficient image processing and stereo display in a stereo plotter [9,10].
The requirements for good epipolar image resampling are near zero y-parallax and x-parallax linearly proportional to the ground height; hence, accurate sensor modeling such as that based on rational polynomial coefficients (RPCs) is necessary. Epipolar image-resampling algorithms [11][12][13] use sensor modeling information to establish accurate correspondence between the ground object space and the image space. However, the conventional positional accuracy of direct sensor modeling or raw RPCs range from a few pixels to hundreds of pixels depending on the satellite. Therefore, the bias compensation for the sensor modeling is performed using the conjugate points and ground control points (GCPs) that are obtained from in situ surveys or from existing geospatial data. In addition, the conjugate points and GCPs in the image space should not be skewed but well distributed over the entire image. Accurate sensor modeling achieves less than one-pixel-level of accuracy [14,15].
Sensor modeling using conjugate points and GCPs increases not only the absolute positional accuracy, but also the relative geometric consistency between the stereo data. However, some applications over inaccessible areas do not necessarily require GCPs when rapid 3D monitoring is the top priority and the absolute positional accuracy is the secondary concern. Even under these circumstances, the direct use of the raw RPCs from data providers may not satisfy the geometric consistency between the stereo images because the RPCs of each image contain errors due to the satellite ephemeris. Even in a single trajectory, a satellite changes its attitude to acquire overlapping data between stereo data acquisitions, resulting in geometric inconsistency between stereo data [16]. Therefore, reliable conjugate points must be extracted through image matchings and used for accurate sensor modeling to ensure high geometric consistency among the stereo data, which comprise relative orientation.
In terms of image matching, conjugate points are extracted using reliable image-matching methods that are generally classified into two types according to their similarity measures-area-based and feature-based methods [17]. The area-based methods use templates of a predefined size at each image and identify the peak of similarity between them for extracting the conjugate points. It works well in situations in which the geometric differences between images in terms of scale, rotation, and translation are not serious. In this situation, a search space of the sliding template can be geometrically constrained to identify the similarity peak associated with the location of the conjugate points. Mutual information (MI) [18], phase correlation (PC) [19], and least square matching [20] are representative area-based matching approaches. However, the area-based methods are likely to fail when the images show significant distortions or large geometric differences. The feature-based methods extract conjugate points based on representative points, e.g., dominant features, line intersections, and centroid pixels of close-boundary regions. The extracted representative points between images are matched with each other using descriptors or similarity measures and the spatial relationships. These methods can be applied even when images have significant distortions and geometric differences [21,22]. Scale invariant feature transform (SIFT) [23], speeded up robust features (SURF) [24], and binary robust invariant scalable keypoints (BRISK) [25] are representative feature-based matching approaches. However, the limitations of this method are that they find conjugate points from the entire image that usually has similar features and might result in a large number of mismatched pairs.
To reduce the number of falsely matched points extracted in these methods, the search space for finding the correspondences should be minimized. A limited search space for finding conjugate points enables successful extraction in both the area-based and feature-based methods. Since RPCs relating the ground object space and image space when the satellite image was acquired are available, a rough geometric relation between stereo images can be derived. Even raw RPCs, i.e., RPCs without bias compensation, improve the matching results by reducing the search space. Using trajectory information of the sensor is an example approach to reduce the search space for conjugate point extraction [26,27]. Coarse-To-Fine matching approach is also based on the concept to reduce the search space in the coarse matching step to extract the reliable conjugate points in a fine matching step [28,29].
Remote Sens. 2020, 12, 2940 3 of 26 However, geometrically overlapped images (or local templates of the images) with raw RPCs still include differences in the translation, scale, rotation, and shear because of the errors in the RPCs. These differences can cause a large number of false matches. Apart from reducing the search space, studies focusing on satellite imagery to improve the matching performance were also introduced. In [30], uniform robust SIFT was developed by extracting features focusing on the numbers and their distribution. Studies for extracting well-distributed conjugate points over the overlapping regions were conducted by splitting the region for carrying out the matching process [18,21].
Generally, the conjugate point extraction is carried out on the original image space. In the case that the similarity between the original stereo data is high, these approaches show decent performance with proper outlier removal. However, there are many cases when low similarity stereo sets show poor matching or failure. Therefore, it is important to increase the reliability of conjugate point extraction. Therefore, we propose a double epipolar image-resampling method for more reliable and robust conjugate point extraction and for generating precise epipolar images and accurate 3D processing. First, a coarse epipolar resampling is performed using raw RPCs, and this is followed by the conjugate point extraction and RPC bias compensation. Second, fine epipolar image resampling is performed with highly stereo-consistent RPCs for achieving good-quality 3D stereo data. The basic idea is that coarse epipolar images with even inaccurate RPCs should show better similarity than the original images for better conjugate point extraction. In addition, various image-matching methods, including the area-based methods-MI and PC-and feature-based methods-SURF and Harris corners together with fast retina keypoint (FREAK) description (hereafter, "Harris"), were tested for quantitative analysis of the conjugate points extraction performance between the proposed and traditional approaches.
The experiments were conducted for along-track Kompsat-3 stereo and across-track Kompsat-3A stereo data. We compared the performance of the traditional and proposed stereo data-processing approaches from the perspectives of conjugate point extraction, RPC bias compensation, and resulting y-parallax. For the traditional approach, we tested the matching methods for the original stereo data to investigate their conjugate point extraction performance. The conjugate points extracted by each matching method were used for relative RPC bias compensation [16]. The conjugate image points were used to generate quasi-GCPs by the space intersection using the RPCs, followed by the reprojection of the quasi-GCPs back to the stereo data to analyze the errors in the image space. The errors were minimized by applying bias compensation to the RPCs [31][32][33]. In contrast, for the proposed approach, we generated coarse epipolar images using the raw RPCs, followed by image matching with different methods and template sizes. The extracted conjugate points were transformed back into the original image space using the coarse epipolar image transformation parameters for relative orientation. Then, fine epipolar image resampling was performed with the bias-compensated RPCs. We analyzed the performances of conjugate point extraction, RPC bias compensation, and y-parallax for each case.
The paper is organized as follows: Section 2 introduces and elaborates on relative orientation. Section 3 provides the experimental results. The conclusions are presented in Section 4.

Methods
The flowchart of the study is presented in Figure 1. Figure 1a shows the traditional relative orientation-based epipolar image resampling. Given stereo satellite data with RPCs, image matchings were performed with outlier removal to extract conjugate points. We applied several matching methods, such as Harris, SURF, MI, and PC to investigate their conjugate point extraction performance. The extracted conjugate points were intersected for ground-point coordinate computation using the raw RPCs, i.e., for quasi-GCP generation. The quasi-GCPs were back projected to the image space to perform RPC bias compensation of the stereo data. Then, epipolar image resampling and y-parallax analysis were performed. Figure 1b shows the process when coarse epipolar image resampling is introduced before conjugate point extraction and RPC bias compensation, and this coarse resampling is followed by fine epipolar resampling; in other words, double epipolar resampling is performed. Image matching is performed for the coarse epipolar resampled images, and the performances were compared. The conjugate points extracted from the coarse epipolar images were transformed back to the original image space using the resampling transformation parameters. After RPC bias compensation, fine epipolar resampling was performed to ensure near zero y-parallax between the stereo data.
Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 29 double epipolar resampling is performed. Image matching is performed for the coarse epipolar resampled images, and the performances were compared. The conjugate points extracted from the coarse epipolar images were transformed back to the original image space using the resampling transformation parameters. After RPC bias compensation, fine epipolar resampling was performed to ensure near zero y-parallax between the stereo data.

Epipolar Resampling
Stereo satellite data are resampled to satisfy the aforementioned epipolar condition (zero yparallax and linear relationship between the x-parallax and ground height). Several epipolar image resampling algorithms [11][12][13] provide satisfactory results in terms of sub-pixel-level y-parallax. All these algorithms require accurate sensor modeling information such as RPCs to perform accurate projections between the ground object space and image space. In this study, we used the piecewise approach [11] that iteratively extracts epipolar curve points over entire images to establish epipolar image transformation. Figure 2 shows the epipolar curve pair extraction from the image-to-ground projection and the alignment of the epipolar curve to the same image line with an example satellite stereo data.

Epipolar Resampling
Stereo satellite data are resampled to satisfy the aforementioned epipolar condition (zero y-parallax and linear relationship between the x-parallax and ground height). Several epipolar image resampling algorithms [11][12][13] provide satisfactory results in terms of sub-pixel-level y-parallax. All these algorithms require accurate sensor modeling information such as RPCs to perform accurate projections between the ground object space and image space. In this study, we used the piecewise approach [11] that iteratively extracts epipolar curve points over entire images to establish epipolar image transformation. Figure 2 shows the epipolar curve pair extraction from the image-to-ground projection and the alignment of the epipolar curve to the same image line with an example satellite stereo data.

Conjugate Point Extraction
Extracting conjugate points with the help of RPCs that allow projection between the ground object space and image space can reduce the search area, thereby increasing the matching rate. The basic idea is that when even coarsely epipolar images are resampled with raw RPCs, i.e., without bias compensation, theoretically, it is possible to minimize dissimilarities (e.g., y-direction parallax, rotation amount, etc.) Remote Sens. 2020, 12, 2940 5 of 26 other than the x-direction parallax. In other words, we aim to demonstrate that the extraction of conjugate points from coarse epipolar images will be more advantageous than extraction from the original images. Therefore, two representative feature-based methods, SURF and Harris, and two representative area-based methods, MI and PC, were applied to the original and coarsely epipolar resampled stereo data, and the results were examined.

Conjugate Point Extraction
Extracting conjugate points with the help of RPCs that allow projection between the ground object space and image space can reduce the search area, thereby increasing the matching rate. The basic idea is that when even coarsely epipolar images are resampled with raw RPCs, i.e., without bias compensation, theoretically, it is possible to minimize dissimilarities (e.g., y-direction parallax, rotation amount, etc.) other than the x-direction parallax. In other words, we aim to demonstrate that the extraction of conjugate points from coarse epipolar images will be more advantageous than extraction from the original images. Therefore, two representative feature-based methods, SURF and Harris, and two representative area-based methods, MI and PC, were applied to the original and coarsely epipolar resampled stereo data, and the results were examined.
To detect well-distributed conjugate points over overlapping stereo images, local templates were constructed over the entire region with the same intervals in the x and y directions. When a template of one image (reference image) was determined, the location of the corresponding template of the other image (target image) was determined based on the coordinate information derived from the RPCs.
After extracting the conjugate points using each method, the outliers, that is, the falsely matched conjugate point pairs, were removed. To this end, the extracted conjugate points were used to estimate affine transformation coefficients, and the outliers with large root mean square errors (RMSEs) compared to other conjugate points were eliminated from the conjugate point set. The process of the outlier removal was repeated until all the remaining conjugate points satisfied the condition that the RMSE values were smaller than the predefined threshold. The target image was then warped to the coordinates of the reference image by using the estimated affine coefficients to calculate the correlation between the images to check the quality of the warping performance.
The detailed explanation of each matching method considering the local template concept is given in the following sub-sections.

Feature-Based Matching Using Local Templates
In this study, we adopted SURF and Harris approaches as examples of the feature-based matching methods. General feature-based matching methods that extract features and find correspondences from the entire image based on descriptor similarity result in a large number of To detect well-distributed conjugate points over overlapping stereo images, local templates were constructed over the entire region with the same intervals in the x and y directions. When a template of one image (reference image) was determined, the location of the corresponding template of the other image (target image) was determined based on the coordinate information derived from the RPCs.
After extracting the conjugate points using each method, the outliers, that is, the falsely matched conjugate point pairs, were removed. To this end, the extracted conjugate points were used to estimate affine transformation coefficients, and the outliers with large root mean square errors (RMSEs) compared to other conjugate points were eliminated from the conjugate point set. The process of the outlier removal was repeated until all the remaining conjugate points satisfied the condition that the RMSE values were smaller than the predefined threshold. The target image was then warped to the coordinates of the reference image by using the estimated affine coefficients to calculate the correlation between the images to check the quality of the warping performance.
The detailed explanation of each matching method considering the local template concept is given in the following sub-sections.

Feature-Based Matching Using Local Templates
In this study, we adopted SURF and Harris approaches as examples of the feature-based matching methods. General feature-based matching methods that extract features and find correspondences from the entire image based on descriptor similarity result in a large number of mismatched pairs. To reduce such falsely matched points extracted in the feature-based methods, the search space for finding the correspondences should be minimized. To this end, the entire image was divided into regular templates, and the conjugate point extraction approach was applied to the corresponding templates [18,21]. More specifically, the template in the reference image was first constructed, and the center coordinate of the template was converted to the ground coordinates using the RPCs of the reference image. Then, the ground coordinate was converted to the coordinates of the target image using the RPCs of the target image. The corresponding local template of the target image was Remote Sens. 2020, 12, 2940 6 of 26 determined by selecting the image coordinate as the center of the template. According to the properties of the feature-based matching method, large numbers of conjugate points can be extracted for each corresponding template pair. To fairly compare the area-based and feature-based matching methods in terms of extracted numbers of the conjugate points, for each corresponding template pair, we extracted only the conjugate point that shows the highest similarity among the conjugate points.
The first feature-based matching method that we used is the well-known SURF algorithm proposed by Bay et al. [24]. The SURF algorithm uses the Hessian matrix to solve the problem of reduced repeatability and to increase the processing speed. Descriptors were generated to describe the characteristics of the extracted features. The SURF algorithm generates a 64-dimensional description vector that describes each feature by calculating the Haar-wavelet response value for the subdivided area. Finally, the ratio of the first to the second closest distances between the features described by the 64-dimensional description vector is calculated to extract the conjugate points.
In [34], authors compared combinations of well-known feature detectors and descriptors including Harris-FREAK, Hessian-SURF, maximally stable extremal regions (MSER)-SURF, and Fast-FREAK to identify the optimal detector and descriptor pair that best suits the matching procedure. Experiments conducted on 50 images revealed that the Harris-FREAK combination outperformed the other combinations. Therefore, as a second feature-based matching method, Harris-FREAK was adopted. In this method, a feature is identified by the Harris corner detector and creates a descriptor of each feature using the FREAK feature extractor [35].
The Harris detector [36] is invariant against geometric transformation and is resistant to change in illumination and noise. In this method, the corner point is detected based on the difference in pixel intensities measured using the auto correlation function. FREAK, which is inspired by the retina of the human visual system, is a keypoint descriptor [37]. A retinal sampling pattern is used for computing a cascade of binary strings by efficiently comparing image intensities. A binary descriptor is constructed by thresholding the difference between pairs of receptive fields with their corresponding Gaussian kernels. To combine the feature detector and descriptor, the FREAK descriptor is generated from the neighboring region of the point extracted from the Harris corner detector.

Area-Based Matching Using Feature-Centered Local Templates
PC and MI are the area-based matching methods used in this study. In such methods, generally, the center of the local template in the reference image is used as a feature, and its corresponding template in the target image is employed by using the geometric relationship between them. However, these methods do not work properly even in local regions since the center of the template rarely corresponds to a keypoint with a strong characteristic for descriptors, which provides the advantage of being extracted as a matching point. Therefore, we applied the area-based method in a local region where the center was extracted from a keypoint with a strong characteristic. Specifically, the center of each local template in the reference image was selected from the position of the strongest feature extracted based on the Hessian box filter used in the SURF method [24]. Then, the center of the template was converted to the ground location using the RPCs of the reference image, and the converted ground location was projected to the coordinate of the target image using the RPCs of the target image. The corresponding template of the target image was determined based on the position of its center for the matching process.
The PC approach was adopted to identify the similarity peak, which is associated with the position where the corresponding local templates show the optimal translation difference. The translation difference between template images in the x and y directions can be extracted by PC [19,38]. The method searches the difference in the frequency domain. Since we assume that the template images only have translation differences, the phase correlation values are approximately zero everywhere except at the shifted displacement. The location of this peak can be interpreted as the translation difference between the two corresponding local templates. The MI method [18] matches two template images by measuring their statistical correlation. The MI value can be calculated using the sum of the entropies of the template images subtracted by their joint entropy. The larger the MI value between template images, the greater is the similarity between them.

Relative Orientation Based on Bias Compensation of RPCs with Outlier Removal
In photogrammetry, the relative orientation is basically the estimation of the position and orientation of one image relative to another from the correspondences of ray pairs without GCPs. For satellite sensors, the ray pairs are generated from extracted conjugate points using the rational function model (RFM) shown in Equation (1) [14]. The quasi-GCPs are computed by intersecting the light rays ( Figure 3). Note that the light rays do not actually intersect if the RPCs, i.e., the coefficients of the RFM, are not sufficiently accurate, but they should be estimated by the least square adjustment. Because the quasi-GCPs are not on the light rays, the image coordinates of the quasi-GCPs also differ from those of the original image points when the ground coordinates of quasi-GCPs are back projected to the image space using Equation (2). These differences indicate the geometric inconsistency between the stereo data, and they must be minimized over the entire image using affine or polynomial relationships in the image space as shown in Equation (3). This minimization process is called RPCs bias compensation and well explained in [14]. In this study, we used the bias compensation to investigate how much accuracy the derived conjugate points can achieve from the proposed coarse epipolar image approach and the traditional conjugate point extraction method.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 29 template images only have translation differences, the phase correlation values are approximately zero everywhere except at the shifted displacement. The location of this peak can be interpreted as the translation difference between the two corresponding local templates. The MI method [18] matches two template images by measuring their statistical correlation. The MI value can be calculated using the sum of the entropies of the template images subtracted by their joint entropy. The larger the MI value between template images, the greater is the similarity between them.

Relative Orientation Based on Bias Compensation of RPCs with Outlier Removal
In photogrammetry, the relative orientation is basically the estimation of the position and orientation of one image relative to another from the correspondences of ray pairs without GCPs. For satellite sensors, the ray pairs are generated from extracted conjugate points using the rational function model (RFM) shown in Equation (1) [14]. The quasi-GCPs are computed by intersecting the light rays ( Figure 3). Note that the light rays do not actually intersect if the RPCs, i.e., the coefficients of the RFM, are not sufficiently accurate, but they should be estimated by the least square adjustment. Because the quasi-GCPs are not on the light rays, the image coordinates of the quasi-GCPs also differ from those of the original image points when the ground coordinates of quasi-GCPs are back projected to the image space using Equation (2). These differences indicate the geometric inconsistency between the stereo data, and they must be minimized over the entire image using affine or polynomial relationships in the image space as shown in Equation (3). This minimization process is called RPCs bias compensation and well explained in [14]. In this study, we used the bias compensation to investigate how much accuracy the derived conjugate points can achieve from the proposed coarse epipolar image approach and the traditional conjugate point extraction method.
j ) are the j-th conjugate point image coordinates from the image matching for the left and right image, respectively. Further, (l ) are the image coordinates computed from the quasi-ground points using RPCs.
where A, B are the bias compensation coefficients.
There is a high possibility of outliers among the extracted conjugate points. Therefore, we applied an outlier removal process based on the data snooping [39] for the RPC bias compensation process.

Experimental Dataset
Experiments were conducted for along-track Kompsat-3 and across-track Kompsat-3A images over Yangsan and Daejeon in South Korea, respectively. The detailed specifications of the scenes are listed in Table 1, and the images are presented in Figure 4. The processing level is radiometrically corrected 1R. The Kompsat-3 data include various land covers such as urban, agriculture, and mountains, but Kompsat-3A data include more areas with high-rise buildings, resulting in large geometric dissimilarities between the stereo data. In addition, Kompsat-3 data are along-track data acquired for high radiometric similarity, but Kompsat-3A data are across-track data with a time gap of 7 days and much less radiometric similarity; for example, these data show differences in the length of shadows. Moreover, the Kompsat-3A data have acquisition angle differences in terms of the roll and pitch, unlike the Kompsat-3 data, which mostly show only differences in pitch angle.

Coarse Epipolar Image Resampling Using Raw RPCs
We performed the coarse resampling using the raw RPCs. Figures 5 and 6 show Kompsat-3 along-track stereo and Kompsat-3A across-track stereo data, respectively. The lines and red and blue dots indicate the image boundaries and epipolar points in the original and resampled images, respectively. Note that the blue dots are aligned on the same image line in the images on the left and right. We used a second order polynomial model for the transformation from the original image points (red dots) to the resampled image (blue dots). Note that RPCs are used to produce the epipolar curve points (red dots) in the original stereo images. The points are rearranged to satisfy the epipolar Remote Sens. 2020, 12, 2940 9 of 26 geometry as blue dots. This rearrangement is the geometric transformation between epipolar image space and the original image space that can be modeled using a polynomial model. In [11], the second order polynomial model could achieve a mean residual less than 0.1 pixels.

Coarse Epipolar Image Resampling Using Raw RPCs
We performed the coarse resampling using the raw RPCs. Figures 5 and 6 show Kompsat-3 along-track stereo and Kompsat-3A across-track stereo data, respectively. The lines and red and blue dots indicate the image boundaries and epipolar points in the original and resampled images, respectively. Note that the blue dots are aligned on the same image line in the images on the left and right. We used a second order polynomial model for the transformation from the original image points (red dots) to the resampled image (blue dots). Note that RPCs are used to produce the epipolar curve points (red dots) in the original stereo images. The points are rearranged to satisfy the epipolar geometry as blue dots. This rearrangement is the geometric transformation between epipolar image space and the original image space that can be modeled using a polynomial model. In [11], the second order polynomial model could achieve a mean residual less than 0.1 pixels.     Cyan and red colors are for the right and left images, respectively. The figure shows that the buildings seen in stereo data are now aligned along the horizontal direction. By measuring the line coordinates differences between the same objects that appeared in the stereo, we can compute the y-parallax that hinders the efficient and accurate stereo processing. The two dotted lines show each line coordinate of the same object (e.g., building corners in cyan and blue, respectively) and the discrepancy between the dotted lines shows large y-parallaxes of up to 17-25 pixels. For better stereo post-processing, such as that required for precise 3D display and dense image matching, these y-parallaxes should be minimized to the sub-pixel level.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 29 Figure 7 shows parts of the coarse epipolar resampled images as examples. Cyan and red colors are for the right and left images, respectively. The figure shows that the buildings seen in stereo data are now aligned along the horizontal direction. By measuring the line coordinates differences between the same objects that appeared in the stereo, we can compute the y-parallax that hinders the efficient and accurate stereo processing. The two dotted lines show each line coordinate of the same object (e.g., building corners in cyan and blue, respectively) and the discrepancy between the dotted lines shows large y-parallaxes of up to 17-25 pixels. For better stereo post-processing, such as that required for precise 3D display and dense image matching, these y-parallaxes should be minimized to the sub-pixel level.

Conjugate Point Extraction
Four matching methods were applied to the original images and the coarsely epipolar resampled images. The matching processes were employed for each corresponding template image pair. To

Conjugate Point Extraction
Four matching methods were applied to the original images and the coarsely epipolar resampled images. The matching processes were employed for each corresponding template image pair. To analyze the matching performance according to template sizes, the size of the template was increased from 100 × 100 pixels to 1000 × 1000 pixels, increasing the size by 100 pixels in both the x and y directions. The interval of the local template was set to 1000 pixels in the x and y directions regardless of the template size.
The matching results were analyzed in terms of the extracted numbers of the conjugate points, distribution of the conjugate points, correlation between the relatively oriented images, and efficiency. A distribution quality index was used for evaluating the distribution of the conjugated points [40]. Distribution quality describes how evenly the conjugate points are distributed over an image. To determine the distribution quality, triangulations were performed using the Delaunay algorithm. The distribution quality consists of area descriptor D A and shape descriptor D S : where n is the number of triangles; A i , the area of triangle i; and Max(J i ), the largest internal angle i in radians. A low descriptor value indicates better point distribution. Considering both the area and shape descriptors, the distribution of points can be described by the distribution quality DQ: The problem of the distribution quality is that the value is calculated according to the triangular regions constructed from the conjugate points. It means that the region where the conjugate points were not extracted, i.e., outside the triangular region, does not affect the value calculation. Therefore, we additionally selected four pseudo-conjugate points along the boundary of the overlapping regions between the images to enlarge the triangular region [41]. Accordingly, the distribution quality enables the evaluation of the distribution of the conjugate points while considering the entire overlapping region.
The correlation coefficients (CCs) between the reference and warped target images were calculated [38]. To transform the target image, we used affine coefficients, which were estimated in the outlier removal process by considering conjugate points with RMSE > 10 as outliers and repeatedly removing such points from the set to obtain the final affine coefficients. The CC between two images was calculated using the following equation: where σ I ref I tar denotes the covariance of the pixel values between the corresponding reference and the sensed images, and σ I ref and σ I tar are the standard deviations of the pixel values for each image. The higher the CC value, the better the warping result. The acquisition times during the matching process were also measured to compare the efficiency of the matching processes. All the process of the conjugate point extraction was implemented using Matlab. Conjugate point extraction results for Kompsat-3 along-track stereo data are shown in Figure 8. First, in terms of the number of the extracted conjugated points, a large number of matching pairs were extracted when the PC method with a template size of 100 was applied. Although the number of matching pairs was large, the likelihood of falsely matched pairs is high because of the small search area resulting from the small template size so that the matched points are extracted from the local optimum rather than from the location of the actual matching pair. In other words, the template size of 100 is insufficient to compensate for the geometric errors of the Kompsat-3 1R stereo data to find proper conjugate points. For the other template sizes, it was confirmed that a similar number of matching pairs were extracted for each method. More matching pairs were extracted from the coarsely epipolar resampled images than from the original resampled images, regardless of the template size.
In terms of distribution quality, it was difficult to identify the trend of distribution according to the template size. Differences based on the distribution may be observed to some extent depending on each method, but in this study, since the local template was defined where matching pairs were extracted, the tendency of the distribution difference according to the methods was not observed clearly. However, considering the extracted numbers and the distribution, we find that the distribution is poor when fewer pairs are extracted. This is confirmed by the DQ values, which tend to increase when the extracted conjugate point numbers decrease.
A comparison of the CC values shows that a significant improvement was achieved by using the coarse epipolar images compared to the original images, regardless of the template size. In terms of the time required, the MI method took the longest time, while the rest of the methods required relatively shorter times. Overall, it was verified that the time required increases with the template size. Almost similar experimental results were obtained in the case of the across-track Kompsat-3A stereo data set, as seen in Figure 9.
To clearly analyze the difference between the conjugate points extracted from the original and coarse epipolar image pairs, the performance for a fixed template size of 500 × 500 pixels with an interval of 1000 pixels in both the x and y directions was compared. A comparison of the experimental results for original and coarse epipolar Kompsat-3A along-track images ( Figure 10) showed that the coarse epipolar images had more extracted conjugate points and higher CC values. In terms of the distribution quality, the PC and SURF methods showed better distribution results when applied to the coarse epipolar images. However, it is difficult to say whether this result was derived from the properties of the PC and SURF methods. Since the matching process was performed in the local template, the distribution properties according to the matching methods were not clearly seen. There was no significant difference in acquisition times between the original and coarse epipolar images. A comparison of the matching methods showed that the area-based methods (PC and MI), extracted more conjugated points than the featured-based ones. We observed no significant difference in the CC values with the matching methods. In terms of acquisition time, the MI method took a longer time compared to the other methods. To visually check the extracted number of conjugate points and their distributions according to the matching methods, the conjugated points were overlaid with the original and coarse epipolar images ( Figure 11). Since the matching was performed based on the local template, the conjugate points were evenly distributed in all cases.
The results obtained for the across-track stereo dataset in terms of the extracted conjugate points, CC, and acquisition time were similar to those obtained for the along-track stereo dataset ( Figure 12). More conjugated points were extracted from the coarse epipolar images by the area-based methods (i.e., PC and MI), whereas fewer conjugated points were extracted from both the original and coarse epipolar images when the feature-based SURF and Harris methods were used. The conjugated points are overlaid with the original and coarse epipolar images in Figure 13 to allow visual inspection of the points according to the matching methods. For the PC and MI methods, the conjugate points were well-distributed over the entire overlapping region. However, the SURF and Harris approaches could not extract sufficient conjugate points for distribution over the entire region. Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 29   more conjugated points than the featured-based ones. We observed no significant difference in the CC values with the matching methods. In terms of acquisition time, the MI method took a longer time compared to the other methods. To visually check the extracted number of conjugate points and their distributions according to the matching methods, the conjugated points were overlaid with the original and coarse epipolar images ( Figure 11). Since the matching was performed based on the local template, the conjugate points were evenly distributed in all cases.  The results obtained for the across-track stereo dataset in terms of the extracted conjugate points, CC, and acquisition time were similar to those obtained for the along-track stereo dataset (Figure 12). More conjugated points were extracted from the coarse epipolar images by the area-based methods (i.e., PC and MI), whereas fewer conjugated points were extracted from both the original and coarse epipolar images when the feature-based SURF and Harris methods were used. The conjugated points

RPC Bias Compensation with Quasi-GCPs
We generated quasi-GCPs by restituting the conjugate points derived from the image matchings. The light rays of each conjugate point pair from the stereo data did not intersect the single ground location; hence, the most probable ground position was estimated by least square adjustment. Then, the quasi-GCPs were projected back into the image space, and the projected image coordinates were compared with the conjugate point coordinates for computing the reprojection error. The reprojection errors should be small to achieve high geometric consistency for the stereo data. Figures 14-17 show the reprojection errors of the derived conjugate points before RPC adjustment for four stereo datasets: two along-track Yangsan data (Kompsat-3 original stereo data and Kompsat-3 coarse epipolar data) and two across-track Daejeon data (Kompsat-3A original stereo data and Kompsat-3A coarse epipolar data). The plots for each dataset are presented according to the image-matching method and the matching patch sizes ranging from 100 to 1000 pixels. Note that the conjugate points may still include outliers. In Figures 14 and 15, both along-track Kompsat-3 original data and coarse epipolar data show reprojection errors of 5-10 pixels in most cases. For the original data, among the image-matching methods, the Harris approach shows the least stable reprojection error patterns for changes in patch sizes.

RPC Bias Compensation with Quasi-GCPs
We generated quasi-GCPs by restituting the conjugate points derived from the image matchings. The light rays of each conjugate point pair from the stereo data did not intersect the single ground location; hence, the most probable ground position was estimated by least square adjustment. Then, the quasi-GCPs were projected back into the image space, and the projected image coordinates were compared with the conjugate point coordinates for computing the reprojection error. The reprojection errors should be small to achieve high geometric consistency for the stereo data. Figures 14-17 show the reprojection errors of the derived conjugate points before RPC adjustment for four stereo datasets: two along-track Yangsan data (Kompsat-3 original stereo data and Kompsat-3 coarse epipolar data) and two across-track Daejeon data (Kompsat-3A original stereo data and Kompsat-3A coarse epipolar data). The plots for each dataset are presented according to the image-matching method and the matching patch sizes ranging from 100 to 1000 pixels. Note that the conjugate points may still include outliers. In Figures 14 and 15, both along-track Kompsat-3 original data and coarse epipolar data show reprojection errors of 5-10 pixels in most cases. For the original data, among the image-matching methods, the Harris approach shows the least stable reprojection error patterns for changes in patch sizes.  In Figures 16 and 17, the across-track Kompsat-3A original data show much larger reprojection errors than those in along-track Kompsat-3 data. Most cases show errors of 5-25 pixels, but some cases of Harris and SURF methods show much larger errors of more than 50 pixels for probable matching outliers. With regard to the coarse epipolar data, Harris and SURF methods provide unstable results, but MI and PC produce stable error patterns, though the reprojection errors still exceed 10 pixels. Remote Sens. 2020, 12, x FOR PEER REVIEW 20 of 29 In Figures 16 and 17, the across-track Kompsat-3A original data show much larger reprojection errors than those in along-track Kompsat-3 data. Most cases show errors of 5-25 pixels, but some cases of Harris and SURF methods show much larger errors of more than 50 pixels for probable matching outliers. With regard to the coarse epipolar data, Harris and SURF methods provide unstable results, but MI and PC produce stable error patterns, though the reprojection errors still exceed 10 pixels.
We performed RPC-compensation with outlier removal. In Figures 18-21, the terms R-adjusted and L-adjusted, respectively, indicate the reprojection errors of the right and left stereo data after RPC-compensation before outlier removal, and the terms R-filtered and L-filtered indicate the respective errors after outlier removal. In many cases, the outlier removal procedure helped reduce the errors. Harris presents the worst and most unstable results, with errors of up to 2.5 pixels, while patch sizes of 800 and 900 pixels show sub-pixel accuracy after outlier removal. SURF mostly shows sub-pixel errors after outlier removal. MI shows near-sub-pixel-level accuracy with patch sizes larger than 400 pixels. In the case of PC too, a larger patch size helps reduce the error. We performed RPC-compensation with outlier removal. In Figures 18-21, the terms R-adjusted and L-adjusted, respectively, indicate the reprojection errors of the right and left stereo data after RPC-compensation before outlier removal, and the terms R-filtered and L-filtered indicate the respective errors after outlier removal. In many cases, the outlier removal procedure helped reduce the errors. Harris presents the worst and most unstable results, with errors of up to 2.5 pixels, while patch sizes of 800 and 900 pixels show sub-pixel accuracy after outlier removal. SURF mostly shows sub-pixel errors after outlier removal. MI shows near-sub-pixel-level accuracy with patch sizes larger than 400 pixels. In the case of PC too, a larger patch size helps reduce the error. Figure 19 shows the reprojection errors for along-track Kompsat-3 coarse epipolar data. In this case too, Harris has unstable errors with changes in the patch size. The errors in SURF, MI, and PC begin to stabilize at less than the 0.5-pixel level with patch sizes larger than 200 and 400 pixels. The coarse epipolar data clearly have a lower error range and better stability than the original data.
In Figure 20, we show the experimental results for across-track Kompsat-3A stereo data. In most cases, the error range greatly exceeds that for the along-track Kompsat-3 data. Harris still shows an unstable error pattern with changes in the patch size. MI and PC show reprojection errors of 3-7 pixels, and a larger patch size does not seem to help reduce the errors. SURF provided relatively better results with errors of 1-2 pixels, except for 500-and 600-pixel patches. Figure 21 shows the reprojection errors for across-track Kompsat-3A coarse epipolar data. Overall, the errors are much less than those in the original data. Among the four matching methods, MI and PC provide the most stable accuracy with sub-pixel accuracy for patch sizes larger than 200 and 300 pixels. In the case of Harris and SURF, rather unstable reprojection errors were observed. Remote Sens. 2020, 12, x FOR PEER REVIEW 22 of 29

Fine Epipolar Image Resampling and Y-Parallax Analysis
The extracted conjugate points were transformed back to the original image space and used for RPC bias compensation, called relative orientation. In the step of the coarse epipolar image resampling, we modeled a second order polynomial for the transformation from the original image to the resampled image. By computing the inverse transformation of the second order polynomial, we could transform the conjugate points to the original image space. These conjugate points in the original image space are used for bias-compensation of RPCs.
With bias-compensated RPCs, we performed fine epipolar image resampling and analyzed the y-parallax. Figure 22 shows the y-parallax for along-track Kompsat-3 data. For all methods except Harris, the conjugate points from the coarse epipolar images helped improve the y-parallax in the fine epipolar images. MI and PC provided stable sub-pixel y-parallax for patch size larger than 400 pixels. pixels, and a larger patch size does not seem to help reduce the errors. SURF provided relatively better results with errors of 1-2 pixels, except for 500-and 600-pixel patches. Figure 21 shows the reprojection errors for across-track Kompsat-3A coarse epipolar data. Overall, the errors are much less than those in the original data. Among the four matching methods, MI and PC provide the most stable accuracy with sub-pixel accuracy for patch sizes larger than 200 and 300 pixels. In the case of Harris and SURF, rather unstable reprojection errors were observed.

Fine Epipolar Image Resampling and Y-Parallax Analysis
The extracted conjugate points were transformed back to the original image space and used for RPC bias compensation, called relative orientation. In the step of the coarse epipolar image resampling, we modeled a second order polynomial for the transformation from the original image to the resampled image. By computing the inverse transformation of the second order polynomial, we could transform the conjugate points to the original image space. These conjugate points in the original image space are used for bias-compensation of RPCs.
With bias-compensated RPCs, we performed fine epipolar image resampling and analyzed the y-parallax. Figure 22 shows the y-parallax for along-track Kompsat-3 data. For all methods except Harris, the conjugate points from the coarse epipolar images helped improve the y-parallax in the fine epipolar images. MI and PC provided stable sub-pixel y-parallax for patch size larger than 400 pixels.  Figure 23 presents the y-parallax for across-track Kompsat-3A data. Except in the case of Harris and SURF, the conjugate points from coarse epipolar images helped improve the y-parallax in the fine epipolar images. In the case of Harris and SURF methods, a rather unstable y-parallax was observed. MI and PC provided a stable y-parallax of less than 1 pixel for patch sizes larger than 400 pixels.
In terms of the comparison focusing on the datasets, the across-track dataset showed the better improvement compared to the along-track dataset by applying the proposed method. This result was revealed due to the fact that the general matching methods could work properly in the along-track dataset which shows similar properties between the images. In this case, it would be easier to extract reliable conjugate points even applying the general matching method without applying the double-epipolar resampling method. However, in the case of the across-track dataset that is more challenging to extract reliable conjugate points due to dissimilar properties between the images, the general matching methods are likely to fail to extract reliable conjugate points properly. In this case, the double-epipolar resampling can improve the matching performance significantly while reducing the search space as well as removing the rotation and scale differences that the original images involve. Figure 23 presents the y-parallax for across-track Kompsat-3A data. Except in the case of Harris and SURF, the conjugate points from coarse epipolar images helped improve the y-parallax in the fine epipolar images. In the case of Harris and SURF methods, a rather unstable y-parallax was observed. MI and PC provided a stable y-parallax of less than 1 pixel for patch sizes larger than 400 pixels. In terms of the comparison focusing on the datasets, the across-track dataset showed the better improvement compared to the along-track dataset by applying the proposed method. This result was revealed due to the fact that the general matching methods could work properly in the along-track dataset which shows similar properties between the images. In this case, it would be easier to extract reliable conjugate points even applying the general matching method without applying the doubleepipolar resampling method. However, in the case of the across-track dataset that is more challenging to extract reliable conjugate points due to dissimilar properties between the images, the general matching methods are likely to fail to extract reliable conjugate points properly. In this case, the double-epipolar resampling can improve the matching performance significantly while reducing the search space as well as removing the rotation and scale differences that the original images involve. Figure 24 shows the final anaglyph images of both data with magnified views of the samples. They can be stereo-viewed with anaglyph glasses. The y-parallax was minimized compared to that in the coarse epipolar images.

Conclusions
The accurate epipolar resampled images are important for dense matching to reconstruct and display a 3D ground surface. However, RPCs or the sensor models of the stereo data are rather erroneous and not optimal for the reconstruction. In these cases, those errors should be compensated

Conclusions
The accurate epipolar resampled images are important for dense matching to reconstruct and display a 3D ground surface. However, RPCs or the sensor models of the stereo data are rather erroneous and not optimal for the reconstruction. In these cases, those errors should be compensated through an orientation process (or bundle adjustment) which requires conjugate points over the images. Therefore, we proposed a double epipolar resampling approach for the robust conjugate point extraction that enables more accurate sensor modeling. The basic idea is that coarse epipolar images with even inaccurate RPCs show better stereo similarity than the original images, thereby allowing better conjugate point extraction. We tested along-track Kompsat-3 stereo data and across-track Kompsat-3A stereo data using four image-matching methods, namely, Harris, SURF, MI, and PC, and compared their performances between the proposed and traditional approaches. Based on the analysis, we conclude that the conjugate point extraction on coarse epipolar images produce more reliable and accurate results instead of the original image case for all the tested matching methods. The accurately extracted conjugate points enabled the fine epipolar image resampling up to sub-pixel accuracy that was not achievable through the traditional approach. The performance increase was more significant for low similarity data such as across-track data.
We drew the following inferences summarized as follows: 1.
Coarse epipolar images from Kompsat-3/3A data show a large y-parallax of up to 17-25 pixels; this parallax should be minimized for precise 3D mapping.

2.
Coarse epipolar images allow the extraction of a larger number of conjugate points compared to the original images regardless of the matching methods. Moreover, the transformation models constructed from the conjugate points extracted from coarse epipolar images showed better reliability and accuracy than the ones from the original images.

3.
Area-based local-template-matching methods, i.e., PC and MI, tended to extract more conjugate points than the feature-based local-template-matching methods such as SURF and Harris. The distribution of the conjugate points was better in the case of the former methods because of the large number of points.

4.
Given coarse epipolar images, MI and PC with larger patch sizes (>400 pixels) show stable sub-pixel accuracy and y-parallax for along-track and across-track Kompsat-3/3A data.

5.
Between PC and MI, MI provides more stable results, while PC provides faster results. 6.
Harris provided quite unstable results in all cases, but SURF provided mixed results of approximately 1-2 pixels in many cases. 7.
The coarse epipolar image is very helpful for realizing accurate conjugate point extraction. The double epipolar approach showed high effectiveness especially for across-track stereo data for which realizing high stereo consistency is a challenge because of low similarity.
The proposed method showed the advantage of more reliable conjugate point extraction but may have the disadvantage of another epipolar image processing method that requires more processing time. Future studies include further quantitative analysis of similarity increase in the case of the coarse epipolar images compared to the original images. In addition, robust conjugate point extraction methods between heterogeneous data will be investigated.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.