Automated Geo/Co-Registration of Multi-Temporal Very-High-Resolution Imagery

For time-series analysis using very-high-resolution (VHR) multi-temporal satellite images, both accurate georegistration to the map coordinates and subpixel-level co-registration among the images should be conducted. However, applying well-known matching methods, such as scale-invariant feature transform and speeded up robust features for VHR multi-temporal images, has limitations. First, they cannot be used for matching an optical image to heterogeneous non-optical data for georegistration. Second, they produce a local misalignment induced by differences in acquisition conditions, such as acquisition platform stability, the sensor’s off-nadir angle, and relief displacement of the considered scene. Therefore, this study addresses the problem by proposing an automated geo/co-registration framework for full-scene multi-temporal images acquired from a VHR optical satellite sensor. The proposed method comprises two primary steps: (1) a global georegistration process, followed by (2) a fine co-registration process. During the first step, two-dimensional multi-temporal satellite images are matched to three-dimensional topographic maps to assign the map coordinates. During the second step, a local analysis of registration noise pixels extracted between the multi-temporal images that have been mapped to the map coordinates is conducted to extract a large number of well-distributed corresponding points (CPs). The CPs are finally used to construct a non-rigid transformation function that enables minimization of the local misalignment existing among the images. Experiments conducted on five Kompsat-3 full scenes confirmed the effectiveness of the proposed framework, showing that the georegistration performance resulted in an approximately pixel-level accuracy for most of the scenes, and the co-registration performance further improved the results among all combinations of the georegistered Kompsat-3 image pairs by increasing the calculated cross-correlation values.


Introduction
Owing to the frequent accessibility of very-high-resolution (VHR) satellite images, time-series analysis using VHR multi-temporal images has been conducted for a wide range of remote-sensing applications [1][2][3][4][5][6]. For the successful applications using VHR multi-temporal satellite images, accurate georegistration to the map coordinates in approximately pixel-level accuracy should be conducted [7]. Most VHR satellite images provide rational polynomial coefficients (RPCs) that represent the ground-to-image geometry, allowing for photogrammetric processing without requiring a physical sensor model. According to the specifications of the state-of-the-art VHR satellites, the level of RPC accuracy is ±10 pixels at a circular error of 90% (CE90) [8]. To achieve higher accuracy, such as up to a one-pixel level, the bias-compensation of RPCs is often manually conducted utilizing global navigation satellite system (GNSS) survey or accurate reference data. To automate the bias-compensation procedure, it requires corresponding points (CPs) between images together with accurate extraction of three-dimensional (3D) ground control points from reference data. There are many studies focusing on extracting CPs to compensate for the bias involved in the RPCs [8][9][10][11][12][13][14].
In the literature, there are two types of methods for the extraction of CPs: area-based and feature-based [15]. The area-based method uses windows of a predefined size at each image and identifies a peak of similarity between the windows for extracting the CPs. It works well in a situation in which images do not have severe geometric differences in terms of scale, rotation, and translation. In this situation, a search space of the sliding window to identify the similarity peak associated with the CPs can be geometrically limited. The area-based method is likely to fail when the multi-temporal images show significant distortions or large geometric differences. The feature-based method extracts CPs based on representative points on the images; for example, dominant features, line intersections, or centroid pixels of close-boundary regions (i.e., segments). The extracted representative points between images are matched to one another using various descriptors or similarity measures, along with spatial relationships. The feature-based method can be applied even in the case that the images have significant distortions and geometric differences. Thus, the feature-based method is known to be more effective when working with VHR multi-temporal images when compared with the area-based method [16,17]. However, applying well-known feature-based matching methods, such as scale-invariant feature transform (SIFT) [18] and speeded up robust features (SURF) [19], for VHR multi-temporal images also has limitations, because they cannot be used for matching an optical image to heterogeneous non-optical data for georegistration. Moreover, it is not necessary to identify the CPs from the entire image even if a spatial relation between the images (e.g., RPCs or rough transformation coefficients) is known. A limited search space for finding CPs enables their successful extraction. Furthermore, linear features can be an option to perform georegistration, instead of using point-based features. For example, Oh et al. [20] proposed a relative edge cross-correlation (RECC) to extract corresponding linear features between VHR satellite images and airborne lidar data, in order to obtain 3D coordinates of the features exploited for the bias-compensation of RPCs.
Although georegistration based on the bias-compensation of RPCs is conducted, VHR multi-temporal images still show a local misalignment because of dissimilarities in the image acquisition conditions, such as the stability of the acquisition platform, the off-nadir angle of the sensor, and the relief displacement of the considered scene [21]. To solve these problems, a large number of CPs in regions where local misalignments are severe should be detected, after which non-rigid transformation models to mitigate the local distortions should be used [22][23][24][25]. However, obtaining evenly distributed CPs over the entire image is difficult, and thus these models cannot completely solve the problem. Some studies have focused on fine co-registration while minimizing the local misalignment occurring among the georegistered multi-temporal images. An intuitive approach first extracts a few CPs, but ones that are reliable, to coarsely co-register the images, and then achieves a fine co-registration with a large number of CPs extracted using a search space limited from the coarse co-registration step [22,24,25]. In some studies, residual misregistration information occurring at a local level even after co-registration, also referred to as registration noise (RN), is extracted and directly employed in the design of an approach for fine co-registration of VHR multi-temporal images [21,26]. In the case of fine co-registration of VHR images acquired from an unmanned aerial vehicle (UAV) to generate an accurate orthophoto, navigation information derived from GNSS-equipped UAVs is used for constraining the search space to extract a large number of well-distributed CPs [27,28].
Via a related literature review, for successful time-series applications using VHR multi-temporal images, it is important to register them onto the map coordinates (which is related to global georegistration) with a subpixel-level accuracy (which is related to fine co-registration). Therefore, this paper proposes an automated geo/co-registration framework for full-scene images acquired from a VHR optical satellite sensor. The proposed method consists of two primary steps: (1) global georegistration and (2) fine co-registration. During the first step, two-dimensional (2D) multi-temporal satellite images are matched to 3D topographic maps by applying the RECC to accurately assign the map coordinates. Then, dominant RN pixels are extracted between the georegistered images, after which a local analysis of the RN, termed registration noise-based cross-correlation (RNCC), is conducted for extracting a large number of well-distributed CPs. The CPs are finally used to construct a non-rigid transformation function that enables minimization of the local misalignment existing among the images. Experiments conducted on five Kompsat-3 full-scene images confirmed the effectiveness of the proposed framework.
The remainder of this paper is organized as follows. Section 2 introduces the proposed geo/co-registration technique in detail. Section 3 describes the datasets constructed and illustrates the experimental results. Section 4 presents a summary and conclusion.

Methods
The proposed automated geo/co-registration approach consists of two main steps: global georegistration and fine co-registration. During the global georegistration step, layers and digital elevation models (DEMs) are extracted from topographic maps to be used for the bias-compensation of the RPCs. The updated RPCs are then applied to ortho-rectify the VHR satellite images onto the map coordinates. During the fine co-registration step, the dominant RN pixels between the ortho-rectified images are extracted, which are used to identify a large number of well-distributed CPs over the entire overlapping region between the images. A non-linear transformation function estimated by the CPs is finally exploited to apply the fine co-registration. Figure 1 presents a flowchart of the proposed approach. A detailed explanation of each step will be provided in the following sub-sections. images, after which a local analysis of the RN, termed registration noise-based cross-correlation (RNCC), is conducted for extracting a large number of well-distributed CPs. The CPs are finally used to construct a non-rigid transformation function that enables minimization of the local misalignment existing among the images. Experiments conducted on five Kompsat-3 full-scene images confirmed the effectiveness of the proposed framework. The remainder of this paper is organized as follows. Section 2 introduces the proposed geo/coregistration technique in detail. Section 3 describes the datasets constructed and illustrates the experimental results. Section 4 presents a summary and conclusion.

Methods
The proposed automated geo/co-registration approach consists of two main steps: global georegistration and fine co-registration. During the global georegistration step, layers and digital elevation models (DEMs) are extracted from topographic maps to be used for the bias-compensation of the RPCs. The updated RPCs are then applied to ortho-rectify the VHR satellite images onto the map coordinates. During the fine co-registration step, the dominant RN pixels between the orthorectified images are extracted, which are used to identify a large number of well-distributed CPs over the entire overlapping region between the images. A non-linear transformation function estimated by the CPs is finally exploited to apply the fine co-registration. Figure 1 presents a flowchart of the proposed approach. A detailed explanation of each step will be provided in the following subsections.

Global Georegistration
Unlike medium-and low-resolution satellite images, in which 2D georegistration is largely used, VHR images require 3D control points for accurate georegistration via a bias-compensation of the RPCs. Although state-of-the-art VHR satellites show unprecedented georegistration accuracy without any ground control points, reference data are still required for better accuracy up to a onepixel level. Among many reference data, such as LiDAR and aerial images, we utilized topographic maps that contain topographic and man-made features, including streets and buildings, because they are well developed in large-scale detail and periodically updated by government agencies.
One issue is the identification of corresponding locations between VHR images and topographic maps, which is labor-intensive and time-consuming. To automate this task, edge information is extracted from the VHR images and used to match to linear features in maps. Figure 2 shows a flowchart in which multi-temporal VHR images with RPCs and topographic maps are provided. Contours and elevation points in maps enable the creation of DEMs and update elevation information

Global Georegistration
Unlike medium-and low-resolution satellite images, in which 2D georegistration is largely used, VHR images require 3D control points for accurate georegistration via a bias-compensation of the RPCs. Although state-of-the-art VHR satellites show unprecedented georegistration accuracy without any ground control points, reference data are still required for better accuracy up to a one-pixel level. Among many reference data, such as LiDAR and aerial images, we utilized topographic maps that contain topographic and man-made features, including streets and buildings, because they are well developed in large-scale detail and periodically updated by government agencies.
One issue is the identification of corresponding locations between VHR images and topographic maps, which is labor-intensive and time-consuming. To automate this task, edge information is extracted from the VHR images and used to match to linear features in maps. Figure 2 shows a flowchart in which multi-temporal VHR images with RPCs and topographic maps are provided. Contours and elevation points in maps enable the creation of DEMs and update elevation information in the selected layers of roads, land uses, and buildings that were extracted from the topographic maps. The map layers are projected into the image space and rasterized using the given RPCs that need to be bias-compensated. The VHR edge image is generated by applying Canny edge operator to the VHR image. The rasterized map layer is matched to the edge of the VHR image using RECC and the bias over the entire image can be estimated. Finally, the provided RPCs are compensated and used for ortho-rectification of the VHR images. Note that in natural or semi-natural environments where most edges are fuzzy, the applicability can be lower. in the selected layers of roads, land uses, and buildings that were extracted from the topographic maps. The map layers are projected into the image space and rasterized using the given RPCs that need to be bias-compensated. The VHR edge image is generated by applying Canny edge operator to the VHR image. The rasterized map layer is matched to the edge of the VHR image using RECC and the bias over the entire image can be estimated. Finally, the provided RPCs are compensated and used for ortho-rectification of the VHR images. Note that in natural or semi-natural environments where most edges are fuzzy, the applicability can be lower.  Edge matching is conducted using RECC and is expressed in Equation (1) [8]. It computes the number of overlapping edge pixels between the rasterized map layer and the VHR image and divides it by the total number of the edge pixels in both data. As an RECC value is relative depending on the density of edge pixels, is computed as Equation (2) to select reliable matching points. The point with a less than an established threshold is considered a reliable match.
where is a sliding window of the rasterized map layer (size: ) and is a VHR edge image (size: ). is the average distance in pixels from the maximum , to the four largest RECC pixels. The bias-compensated RPCs model is expressed as Equation (3)   Edge matching is conducted using RECC and is expressed in Equation (1) [8]. It computes the number of overlapping edge pixels between the rasterized map layer and the VHR image and divides it by the total number of the edge pixels in both data. As an RECC value is relative depending on the density of edge pixels, CV 4 is computed as Equation (2) to select reliable matching points. The point with a CV 4 less than an established threshold is considered a reliable match.
where L is a sliding window of the rasterized map layer (size: w × w) and R is a VHR edge image (size: (w + bu f f er) × (w + bu f f er)). CV 4 is the average distance in pixels from the maximum (r max , c max ) to the four largest RECC pixels. The bias-compensated RPCs model is expressed as Equation (3) by incorporating shift and drift parameters (A 0 , A 1 , . . . , B 2 ) into the RPC equation. The parameters are estimated using the RECC matching results over the entire VHR image region as follows: Sensors 2018, 18, 1599 where a, b, c, d are RPC coefficients; (φ, λ, h) are the geodetic latitude, longitude, and ellipsoidal height, respectively; (l, s) are the image row and column coordinates, respectively; (U, V, W) are the normalized ground coordinates; are the offset and scale factors, respectively, for the latitude, longitude, height, column, and row; and A 0 , A 1 , . . . , B 2 are for an affine transformation.

RN-Based Fine Co-Registration
The globally georegistered multi-temporal satellite images still have local misalignments between the data. Fine co-registration should thus be applied to minimize such misalignments for achieving accurate results of time-series analysis. To directly address this problem, as previously mentioned, RN is extracted to be exploited for applying the fine co-registration.

RN Extraction
RN mainly occurs along boundaries between objects, meaning that RN has a high probability of being detected in high-frequency content regions. Therefore, we use an edge magnitude image constructed by the difference in the Gaussian (DoG) filter to extract the dominant RN pixels [29]. Let us assume that X h is the edge magnitude image, which is defined as Equation (4) as follows: where G σ is the Gaussian filter with standard deviation σ and k is a constant multiplicative factor. The values of σ and k control the thickness and smoothness of the edges in the image. B is the number of corresponding spectral bands between the images. There are two conditions to satisfy when extracting the RN on the edge magnitude images. The first condition is that the RN occurs in the surroundings of the dominant edges in both images. Pixels along and near the dominant edges have larger edge magnitude values, and the values decrease moving toward homogeneous regions. Thus, candidate RN pixels are those having high edge magnitude in both images, as in Equation (5) as follows: where α is a normalization parameter that controls the balance of the edge magnitude between the two images. It can be derived as α = σ(X h re f )/σ X h sen , where σ(X h re f ) and σ X h sen are standard deviations of each edge magnitude image and T 1 is a threshold value.
In the case that object boundaries have strong edge magnitude in both images, candidate RN pixels, estimated using Equation (5), can be (i) precisely aligned border regions of objects or (ii) RN pixels. To discriminate between the two cases, the difference between the two edge magnitude images X h re f and X h sen is considered. The edge magnitude difference tends to be small for precisely aligned edges and increases as the level of misalignment increases. This is because high edge magnitude regions (i.e., dominant edges) in one image are compared with lower edge magnitude regions (i.e., regions in the neighborhood of dominant edges) in the other image. Accordingly, the second condition RN 2 is denoted as Equation (6) where T 2 is a threshold value that controls the sensitivity to the amount of misregistration. α = σ(X h re f )/σ X h sen is the normalization factor. The final RN map is identified as simultaneously satisfying Equations (5) and (6).

Extraction of CPs for the Reference Image
Theoretically, a large number of CPs should be extracted in regions where the misalignments are severe, in order to minimize such misalignments. In contrast, less CPs are sufficient for well-aligned regions. Based on this concept, local regions associated with the CPs distribution of the reference image are defined. Here, we use a quadtree-based segmentation approach to construct local regions by decomposing the reference image recursively into four equally sized regions. The approach begins with the entire reference image as a single region, after which the image is divided into four sub-regions according to a splitting criterion. Considering the number of extracted RN pixels in a local region compared with those in the entire image, the splitting criterion Q(s) can be defined as Equation (7) as follows: where X re f is the entire reference image, whose size is I × J; and (r, c) and (i, j) are the pixels in a given local region, whose size is w r × w c , and the reference image X re f , respectively. According to Equation (7), the left and right terms of the numerator denote the number of the RN pixels in the given local region and X re f , and those of the denominator denote the number of all pixels in the given local region and X re f , respectively. If the condition Q(s) is satisfied, the corresponding region is subdivided into four equally sized quadrant regions. This process proceeds until the region reaches the predefined minimal region size. From Equation (7), regions having a relatively large number of RN pixels, when compared with the number of RN pixels extracted over the entire image, are recursively subdivided. After finishing the segmentation, centroids of each sub-region are selected as CPs for the reference image. The CPs for the sensed image are at a position according to the consideration of the locally extracted RN amounts, as explained in the following sub-section.

RN-Based Image Matching
RN-based image matching is performed by computing the number of extracted RN pixels within a locally defined region via a quadtree-based segmentation approach. To this end, we propose the RNCC approach, denoted as Equation (8) as follows: where RN r,c is whether an RN pixel is extracted, or not, at a pixel (r, c) in a given segment, whose size is w r × w c . The RNCC within a predefined search space is calculated, and the pixel position (r min , c min ) where the extracted RN pixels are at a minimum is determined as the position of the CPs for the sensed image. Let us assume that a number of S segments are constructed over the entire overlapping region between the multi-temporal images. For each CP c s 1 (s = 1, . . . , S) in the reference image, a location of CP c s 2 (s = 1, . . . , S) in the sensed image is determined using Equation (8) accordingly. After extracting all of the CPs from each segment, we employ a piecewise linear function, which is known to be appropriate for mitigating local distortions between VHR images. The piecewise linear function divides X re f into triangular regions using Delaunay's triangulation algorithm, using CPs as the vertexes of the triangles. When the CPs in X re f are triangulated, the correspondences in X sen are triangulated accordingly. Then, each triangulated region in X sen is mapped to the corresponding region of X re f via affine transformation.

Dataset Construction
Test data are constructed from five Kompsat-3 images, each of which covers approximately 16.8 × 15 km over Daejeon in South Korea with a 0.7-m spatial resolution. The processing level is 1R and is radiometrically corrected. Kompsat-3 images have different incidence and azimuth angles that might cause geometric misalignments among them. The test scenes include various land covers, such as urban, agriculture, mountains, etc. First, the georegistration of the Kompsat-3 images to the map coordinates is conducted, and then all of the possible combinations of the georegistered Kompsat-3 pairs are considered to apply the fine co-registration approach. The detailed specifications of the scene are listed in Table 1.

Global Georegistration Results
For reference data, 1:5000 scale topographic maps were downloaded from the official website of the National Geographic Information Institute of Korea. Among the eight different map layer categories, we selected the road and land use layers for matching purposes, while the contour layers were extracted for DEM generation. Figure 3 shows an example of the selected map layers on the generated DEM.

Dataset Construction
Test data are constructed from five Kompsat-3 images, each of which covers approximately 16.8 × 15 km over Daejeon in South Korea with a 0.7-m spatial resolution. The processing level is 1R and is radiometrically corrected. Kompsat-3 images have different incidence and azimuth angles that might cause geometric misalignments among them. The test scenes include various land covers, such as urban, agriculture, mountains, etc. First, the georegistration of the Kompsat-3 images to the map coordinates is conducted, and then all of the possible combinations of the georegistered Kompsat-3 pairs are considered to apply the fine co-registration approach. The detailed specifications of the scene are listed in Table 1.

Global Georegistration Results
For reference data, 1:5000 scale topographic maps were downloaded from the official website of the National Geographic Information Institute of Korea. Among the eight different map layer categories, we selected the road and land use layers for matching purposes, while the contour layers were extracted for DEM generation. Figure 3 shows an example of the selected map layers on the generated DEM. The selected map layers, in vector form, are projected into a VHR image space and rasterized as one sample window of 500 × 500 pixels, as shown in Figure 4a. The window slides over the image region in Figure 4b to compute RECC and for the matched location. The marker and the box in Figure 4b show the location and the window boundary.  The selected map layers, in vector form, are projected into a VHR image space and rasterized as one sample window of 500 × 500 pixels, as shown in Figure 4a. The window slides over the image region in Figure 4b to compute RECC and CV 4 for the matched location. The marker and the box in Figure 4b show the location and the window boundary.  RECC window matching is conducted for the entire VHR image region, except for the very large black region that is missing because of a security issue. Figure 5 shows the distribution of the matched locations. The matched points are distributed over the entire image space, while the lack of features in some image regions, such as high mountains, hindered the rich matching point extraction. Note that the local geometric errors are estimated at the matched locations and used for the biascompensation modeling. Matching outliers are statistically detected during the bias-compensation model estimation and are marked as 'X' in the figure. The number of detected outliers is listed in Table 2. Table 2 also shows the bias-compensation precision before and after outlier removal. The matched points are used to model the affine model (Equation (3)) and the precision (fitness to the model) was at an approximately one-pixel level. RECC window matching is conducted for the entire VHR image region, except for the very large black region that is missing because of a security issue. Figure 5 shows the distribution of the matched locations. The matched points are distributed over the entire image space, while the lack of features in some image regions, such as high mountains, hindered the rich matching point extraction. Note that the local geometric errors are estimated at the matched locations and used for the bias-compensation modeling. Matching outliers are statistically detected during the bias-compensation model estimation and are marked as 'X' in the figure. The number of detected outliers is listed in Table 2. Table 2 also shows the bias-compensation precision before and after outlier removal. The matched points are used to model the affine model (Equation (3)) and the precision (fitness to the model) was at an approximately one-pixel level. Table 2. Rational polynomial coefficients (RPC) bias-compensation precision before and after outlier removal.

Scene ID Number of Outliers [Points] Bias Precision before Outlier Removal (col/row) [Pixels] Bias Precision after Outlier Removal (col/row) [Pixels]
K3 that the local geometric errors are estimated at the matched locations and used for the biascompensation modeling. Matching outliers are statistically detected during the bias-compensation model estimation and are marked as 'X' in the figure. The number of detected outliers is listed in Table 2. Table 2 also shows the bias-compensation precision before and after outlier removal. The matched points are used to model the affine model (Equation (3)) and the precision (fitness to the model) was at an approximately one-pixel level.     Figure 6 shows the overlay of map layers with the VHR image before and after the global georegistration in which the accuracy improvement can be observed.

Fine Co-Registration Results
As the RN map derived from the VHR multi-temporal images will include noise pixels because of the spatial heterogeneity in the pixels, we first generate pyramid images that resample the image at a reduced size. Applying the RN extraction approach on the pyramid images allows for extraction of reliable RN pixels, as well as improvement of efficiency. The pyramid images were reduced by four times when compared with the original images. The values of and that control the thickness of the edges were set to 1.6 and 2, respectively, which are the values recommended in Han et al. [29]. Two thresholds, and , that are related to the RN extraction were automatically selected by applying an expectation maximization method [30]. The minimal and maximal sizes of the quadtree structure for the segmentation were set to 256 and 1024, respectively. The size of the searching space for finding the CPs for the sensed image was set to a range within 12 pixels (i.e.,

Fine Co-Registration Results
As the RN map derived from the VHR multi-temporal images will include noise pixels because of the spatial heterogeneity in the pixels, we first generate pyramid images that resample the image at a reduced size. Applying the RN extraction approach on the pyramid images allows for extraction of reliable RN pixels, as well as improvement of efficiency. The pyramid images were reduced by four times when compared with the original images. The values of σ and k that control the thickness of the edges were set to 1.6 and 2, respectively, which are the values recommended in Han et al. [29]. Two thresholds, T 1 and T 2 , that are related to the RN extraction were automatically selected by applying an expectation maximization method [30]. The minimal and maximal sizes of the quadtree structure for the segmentation were set to 256 and 1024, respectively. The size of the searching space for finding the CPs for the sensed image was set to a range within 12 pixels (i.e., approximately 8.4 m, according to the spatial resolution of Kompsat-3), which is a relatively limited size owing to the impact of the global georegistration results.
An example of the extracted RN is presented in Figure 7 with a corresponding region over the georegistered multi-temporal images. Figure 7a shows the multi-temporal images with false-color composition of the reference (K3-1) and sensed (K3-2) images. The figure shows a local misalignment, which is expressed by the red color along the boundaries of objects. The extracted RN pixels, which are represented by black pixels in Figure 7b, have a tendency to correspond properly with the local misalignment occurring between the multi-temporal images. For visual assessment, georegistration and co-registration results were compared as shown in Figure 9. The inside rectangles are captured from the reference image (K3-1), and the outside rectangles are captured from the sensed image (K3-2), transformed by the RECC-based georegistration (Figure 9a) and the RNCC-based co-registration (Figure 9b), respectively. The co- The location of the CPs is determined by RNCC. Because of the limited length of this paper, only the extracted CPs from the K3-1 and K3-2 image pair are presented in Figure 8. The CPs are marked as red points with constructed corresponding segments, which are represented as white rectangles in Figure 8a. As one can see, the extracted CPs were evenly distributed over the entire scene, and a particularly larger number of CPs was extracted in built-up regions where a local misalignment was severe in general. Location differences between the CPs extracted from the reference and sensed images are indicated by arrows in Figure 8b.
For visual assessment, georegistration and co-registration results were compared as shown in Figure 9. The inside rectangles are captured from the reference image (K3-1), and the outside rectangles are captured from the sensed image (K3-2), transformed by the RECC-based georegistration (Figure 9a) and the RNCC-based co-registration (Figure 9b), respectively. The co-registration performance can thus be compared while checking the boundaries of the two rectangles (which are highlighted in cyan) and whether the lines or objects along the boundaries are properly aligned. As one can see, applying the co-registration approach improves the alignment when compared with those in which it is not applied. To evaluate the co-registration performance from a time-series point of view, a visual inspection regarding the co-registration results between the multi-temporal image pairs was conducted, and some of the same regions in Figure 9 are presented in Figure 10 (i.e., multi-temporal image pairs between K3-1 and K3-3, K3-1 and K3-4, and K3-1 and K3-5 are presented in Figure 10a-c, respectively). Regardless of the image pairs, they showed reliable geometric alignments along the rectangles. For visual assessment, georegistration and co-registration results were compared as shown in Figure 9. The inside rectangles are captured from the reference image (K3-1), and the outside rectangles are captured from the sensed image (K3-2), transformed by the RECC-based georegistration (Figure 9a) and the RNCC-based co-registration (Figure 9b), respectively. The coregistration performance can thus be compared while checking the boundaries of the two rectangles (which are highlighted in cyan) and whether the lines or objects along the boundaries are properly aligned. As one can see, applying the co-registration approach improves the alignment when compared with those in which it is not applied. To evaluate the co-registration performance from a time-series point of view, a visual inspection regarding the co-registration results between the multitemporal image pairs was conducted, and some of the same regions in Figure 9 are presented in  Figure 10a-c, respectively). Regardless of the image pairs, they showed reliable geometric alignments along the rectangles.
To perform a quantitative analysis to confirm the superiority of the co-registration performance, correlation coefficient (CC) values were calculated between the reference and sensed images. The CC values were calculated from all combinations of the tested image pairs. The calculated results are presented in Table 3, in which one can see that the proposed co-registration approach effectively improved the CC values in all cases because of the properly constructed piecewise linear function with a large number of CPs. A student t-test for the CC improvement between RECC and RNCC was performed. We confirmed that the null hypothesis was rejected at a 99% confidence level, which means that the level of improvement is statistically significant.      To perform a quantitative analysis to confirm the superiority of the co-registration performance, correlation coefficient (CC) values were calculated between the reference and sensed images. The CC values were calculated from all combinations of the tested image pairs. The calculated results are presented in Table 3, in which one can see that the proposed co-registration approach effectively improved the CC values in all cases because of the properly constructed piecewise linear function with a large number of CPs. A student t-test for the CC improvement between RECC and RNCC was performed. We confirmed that the null hypothesis was rejected at a 99% confidence level, which means that the level of improvement is statistically significant. Table 3. Accuracy assessment of georegistration and co-registration results. RECC-relative edge cross-correlation; RNCC-registration noise-based cross-correlation.

Conclusions
This study proposed an automated geo/co-registration framework for full-scene images acquired from a VHR optical satellite sensor. The proposed method first globally georegistered the images based on RECC with a reference map, and then a fine co-registration process was conducted based on RNCC between the georegistered images. Experiments on georegistration implemented using five Kompsat-3 full scenes showed RPC bias-compensation precision of approximately one-pixel level. Furthermore, experiments regarding co-registration could statistically show significant improvement in geometric alignment while reducing local misalignments between all combinations of multi-temporal image pairs. In a future study, VHR images acquired from multi-sensors, such as GeoEye and WorldView, will be tested to prove the robustness of the proposed approach.