Improved Piecewise Linear Transformation for Precise Warping of Very-High-Resolution Remote Sensing Images

: A large number of evenly distributed conjugate points (CPs) in entirely overlapping regions of the images are required to achieve successful co-registration between very-high-resolution (VHR) remote sensing images. The CPs are then used to construct a non-linear transformation model that locally warps a sensed image to a reference image’s coordinates. Piecewise linear (PL) transformation is largely exploited for warping VHR images because of its superior performance as compared to the other methods. The PL transformation constructs triangular regions on a sensed image from the CPs by applying the Delaunay algorithm, after which the corresponding triangular regions in a reference image are constructed using the same CPs on the image. Each corresponding region in the sensed image is then locally warped to the regions of the reference image through an a ﬃ ne transformation estimated from the CPs on the triangle vertices. The warping performance of the PL transformation shows reliable results, particularly in regions inside the triangles, i.e., within the convex hulls. However, the regions outside the triangles, which are warped when the extrapolated boundary planes are extended using CPs located close to the regions, incur severe geometric distortion. In this study, we propose an e ﬀ ective approach that focuses on the improvement of the warping performance of the PL transformation over the external area of the triangles. Accordingly, the proposed improved piecewise linear (IPL) transformation uses additional pseudo-CPs intentionally extracted from positions on the boundary of the sensed image. The corresponding pseudo-CPs on the reference image are determined by estimating the a ﬃ ne transformation from CPs located close to the pseudo-CPs. The latter are simultaneously used with the former to construct the triangular regions, which are enlarged accordingly. Experiments on both simulated and real datasets, constructed from Worldview-3 and Kompsat-3A satellite images, were conducted to validate the e ﬀ ectiveness of the proposed IPL transformation. That transformation was shown to outperform the existing linear / non-linear transformation models such as an a ﬃ ne, third and fourth polynomials, local weighted mean, and PL. Moreover, we demonstrated that the IPL transformation improved the warping performance over the PL transformation outside the triangular regions by increasing the correlation coe ﬃ cient values from 0.259 to 0.304, 0.603 to 0.657, and 0.180 to 0.338 in the ﬁrst, second, and third real datasets, respectively.


Introduction
The development and launch of satellites equipped with very-high-resolution (VHR) optical sensors has allowed researchers in the remote sensing field to exploit VHR multi-temporal images for a wide range of applications [1][2][3]. VHR multi-temporal images are easily acquired by unmanned aerial vehicles mounted on cameras or optical sensors while capturing images [4][5][6][7][8][9]. Image co-registration, the process of geometrically overlaying images acquired over the same areas at different times, should be carried out to exploit the images for time-series remote sensing applications [6][7][8][9][10].
A large number of evenly distributed conjugate points (CPs) over the entire overlapping region of images are required to achieve successful co-registration for the multi-temporal VHR images [11]. The CPs are then used to construct a non-linear transformation model that can minimize the local distortion between images. The sensed image is warped to the geometric location of the reference image through non-linear transformation models such as thin-plate spline, multiquadric, local weighted mean (LWM), and piecewise linear (PL) used for VHR image warping [12,13]. Among these, PL transformation is widely used because of its superior performance as compared to the other warping methods [12,14,15].
PL transformation first constructs triangular regions by applying the Delaunay algorithm to extract CPs from the sensed image, after which the corresponding triangular regions are accordingly constructed in the reference image by using the same CPs. Each triangular region in the sensed image is then locally warped to the region of the corresponding reference image via an affine transformation estimated by the CPs located on the triangle vertices. The PL transformation is well-known for its precise and reliable warping performance where the triangles are constructed, i.e., within the convex hulls of the triangular regions. However, regions outside the convex hulls in which the warping transformation is constructed through extrapolation by extending the boundary triangle planes incur severe geometric distortion. Figure 1 shows an example of this phenomenon resulting from a PL transformation. The image presented in Figure 1a is warped by the PL transformation in Figure 1b; the triangular regions are constructed through the CPs connected with white lines, and as one can see from the red circle in the warped image, severe distortion outside the triangular regions has occurred, particularly in the lower part of the image. These geometric distortions in the overlapping regions make it less feasible to use VHR images for time-series applications. The way to minimize such distortions caused by the warping process is to detect CPs along the image boundary. However, this is a difficult task when we apply well-known feature-based CP extraction approaches, e.g., scale-invariant feature transform (SIFT) [16] and speeded-up robust features (SURF) [17], which define features and their descriptors by considering the pixels near the features' center pixel. Therefore, features and CPs (portions of features that are matched to the features in another image) are extracted while theoretically excluding the region along the image boundary, which implies that the distortions in such regions are inevitable when applying the PL transformation. Therefore, most of the studies that use PL transformation for warping have considered the warping result only from the region inside the convex hulls [18][19][20][21][22]. Research on improving PL transformation by revising the triangular construction algorithm [14] or by combining it with other warping transformations [11,23] has been conducted, but these approaches have not offered a fundamental solution to the intrinsic problem of PL transformation. Therefore, research focusing on minimizing the distortion occurring outside the image boundary is necessary to effectively exploit PL transformation.
In this paper, we propose a simple but effective approach that focuses on the improvement of the warping performance of PL transformation outside of the triangular regions constructed with the CPs. To this end, the proposed improved piecewise linear (IPL) transformation uses CPs extracted using an ordinal matching point extraction approach (e.g., SIFT and SURF) together with additional CPs, called pseudo-CPs, which are intentionally extracted from positions outside the image boundary of the sensed image. The corresponding pseudo-CPs in the reference image are determined by estimating the affine transformation close to the located CPs. Accordingly, the triangular regions are enlarged while both the CPs and the pseudo-CPs are simultaneously exploited for the PL transformation-based warping. We analyzed the warping performance of the IPL transformation by changing two parameters: The extracted number of pseudo-CPs along the boundary and the number of CPs closest to the pseudo-CPs to be used for the location estimation of the pseudo-CPs on the reference image. Experiments using a simulated dataset were implemented to find the optimal range of the parameter values. Real datasets were then exploited to verify the effectiveness of the proposed IPL transformation.
The main contributions of this paper are as follows: First, as the proposed method focuses on improving the warping performance over the PL transformation, any feature-based or area-based method can be used to extract CPs. Moreover, the method can be applied in VHR image pairs irrespective of the acquisition environments such as the scene's size and the land-cover type. Second, the warping performance of the PL transformation, widely used for VHR image co-registration owing to its reliable performance as compared to the other non-rigid transformation models, is maintained in the triangular region constructed from the CPs. The pseudo-CPs are also extracted to enlarge the triangular region. Third, the proposed IPL transformation is evaluated using simulated and real datasets to investigate the reliable range of parameter values used in the IPL transformation as well as to confirm the robustness of the warping performance under various conditions.

Methodology
In this section, we describe the PL transformation procedure with its inherent drawback, after which the concept and the procedure of the proposed IPL transformation are explained.

Piecewise Linear (PL) Transformation
The procedure for the PL transformation can be summarized as follows [12]:

1.
Determine the triangulation of CPs in one image by using the Delaunay triangulation method. The triangulation of the CPs in the other image is accordingly obtained from the corresponding CPs in the images.

2.
Determine the affine transformation T that registers the two triangles for each pair of corresponding triangles in the images. Thus, the triangular regions inside the convex hulls of the CPs in the two images are registered.

3.
Determine the transformation for mapping the points outside the convex hulls by extending the boundary triangle planes and then use this plane to extrapolate the points between the convex hull and the image border. Two planes belonging to two neighboring boundary triangles intersect at a line, the projection of which in the image plane separates the points outside the convex hull and determines the triangle plane to which a point outside the convex hull should belong in the extrapolation process.
Because of the extrapolation process in step (3), severe geometric distortion can occur outside the convex hulls, as shown in Figure 1. The proposed IPL transformation focusing on improving the warping result of such regions is described in the following sub-section.

Improved Piecewise Linear (IPL) Transformation
The proposed IPL transformation focuses on the improvement of the warping performance outside the triangular regions. Assume that the number of M CPs ({C m sen , C m re f }, ∀m = 1, . . . , M) is extracted by using any feature-based matching technique (such as SURF). To determine the IPL transformation, N more CPs (i.e., pseudo-CPs), each of which is extracted along the boundary of the sensed image with the same interval, are necessary. To find the corresponding CPs in the reference image, K CPs closest to the pseudo-CPs extracted along the boundary position of the sensed image are selected. These are then used to estimate affine transformation T by the least-squares method, after which the position of the corresponding pseudo-CPs in the reference image is determined by applying T. Finally, the additional N pseudo-CPs are used together with the original M CPs to construct the PL transformation with an extensive triangular region. Thus, well-constructed triangular regions over the entire sensed image can be produced from the intended additional selection of the pseudo-CPs. The sequence of tasks for the proposed IPL transformation is shown in Figure 2, and the formula for the IPL transformation is shown in Algorithm 1.

Algorithm 1 Improved Piecewise Linear Transformation
Define pseudo-CPs C M+n sen located along the boundary of the sensed image with the same interval To find the optimal range of parameter values, we conducted a sensitivity analysis of the two parameters while changing the parameters N and K. This experiment was carried out on a simulated dataset composed of the same reference and sensed images in which geometric distortions were deliberately included. Because the two images should be perfectly identical if all the geometric distortions are removed, the correlation coefficient (CC) values were calculated to check the performance of the IPL transformation according to the parameter values.

Dataset Construction
We constructed both simulated and real datasets to prove the effectiveness of the proposed IPL transformation. The reference and sensed images of the simulated dataset were basically constructed from the same image acquired by a Worldview-3 multispectral sensor with a spatial resolution of 1.2 m. The reference image with a size of 4096 × 4096 pixels was acquired over Seoul, the capital of South Korea, on 12 February 2015. The image includes diverse land covers such as mountains, buildings, roads, and rivers. The sensed image was generated from this image but included non-linear geometric distortions to clearly check the warping performance according to the parameter values in the IPL transformation. We conducted experiments while varying the level of geometric distortion; the warping results had a similar tendency irrespective of the distortion amount. Here, we put the results from the sensed image obtained with a distortion of sinusoidal deformation in the positive horizontal direction with a 50-pixel amplitude and a 20 • period, and in the negative vertical direction with a 30-pixel amplitude and a 10 • period, only.
Three sites were constructed for the real dataset by using images acquired from Worldview-3 multispectral (site 1 and site 2) and Kompsat-3A panchromatic (site 3) sensors. The first site was the same location as in the simulated dataset, and the reference image of the first dataset was the same as the simulated dataset. The sensed image (3389 × 3406 pixels) was acquired at a spatial resolution of 1.6 m (different from the reference image) on 5 February 2018. The reference and sensed images for the second site covering built-up areas, mountains, rivers, bare soils, etc., were acquired over Gwangju, South Korea. The reference and sensed images consisting of 5833 × 5330 pixels and a spatial resolution of 1.2 m were acquired on 26 May 2017 and 4 May 2018, respectively. The reference and sensed images for the third site were constructed using Kompsat-3A panchromatic imagery with a spatial resolution of 0.55 m. The site acquired over Daejeon, South Korea, is composed primarily of built-up areas. The reference and sensed images were acquired on 28 October 2015 and 2 January 2019, respectively. The size of both the images was 3000 × 3000 pixels. Note that all the used images were geometrically corrected with a coarse digital elevation model (i.e., processing level 2A for Worldview-3 and 1G for Kompsat-3A), which implied that they had local geometric distortions as well as relative location displacements. The constructed datasets are shown in Figure 3, and their specific information is provided in Table 1.

Results with Simulated Dataset
The main purpose of the experiment implemented on the simulated dataset was to define an optimal range of the important two parameters used in the IPL transformation, i.e., the number of additional pseudo-CPs (N) extracted along the image boundary with the same interval and the number of CPs (K) closest to the pseudo-CPs used for determining the location of the pseudo-CPs in the reference image. We conducted the proposed IPL transformation while changing the parameter values. To this end, the SURF-based CP extraction approach was used because of its efficiency [17], and an outlier removal process by random sample consensus (RANSAC) was conducted. A threshold ratio between the first and the second closest distance for SURF matching was set as 0.02. This value was lower than the general range of the values (i.e., 0.6 according to the suggestion in [17]); therefore, only CPs having similar description vectors to the others were extracted. The reason for setting the strict threshold was that the reference and sensed images of the simulated dataset were constructed from the same image in which many correctly and falsely matched CPs were extracted with the general threshold value. The CP extraction process was applied to the red bands on which the seasonal dissimilarities of the radiometric properties could be minimized as compared to the other bands [24]. The sensed image was warped by the constructed transformation through the CPs with a bilinear resampling method. For the accuracy assessment, the CC value between the reference and the warped sensed images was calculated. CC estimated the covariance-based similarity between two images calculated using the following equation: where σ I ref I sen denotes the covariance between the reference and the sensed images, and σ I ref and σ I sen are the respective standard deviations associated with the images. In the case of the simulated dataset, the CC value should be one when the geometric distortions are perfectly removed through the warping process, as the reference and sensed images are originally the same. A higher value means a better warping result. The results of the SURF-based matching revealed that 1654 CPs finally remained after removing the outliers. Figure 4a shows the warping results while changing the number of the nearest CPs for the pseudo-CP location estimation (i.e., K). When N = 4, the accuracy was lower than in the other cases irrespective of the number of the nearest CPs (K). When N was too large, as shown in the graph with the dashed line in Figure 4a (N = 28), the results seemed to be unstable. Without regard to the number of the nearest points, CC had stably higher values when N = 16. In the test for the number of pseudo-CPs, illustrated in Figure 4b, the accuracy slightly improved with an increase in N. However, the improvement was not significant when N reached 8. Moreover, unstable results were observed when N > 20. When K = 7, the results had a tendency to be stable irrespective of the number of CPs. According to the sensitivity analysis, we selected the optimal parameter values of N and K as 16 and 7, respectively.

Results with Real Dataset
The aim of conducting the warping on the real dataset was to confirm the effectiveness of the proposed approach on multi-temporal image pairs captured under different acquisition conditions. The ratio of the first and the second distance for the SURF-matching was the generally recommended value of 0.6 [17]. Outliers were removed by calculating root mean squares errors (RMSEs) of the CPs relating the estimated affine transformation and eliminating CPs with RMSE values larger than 10. The sensed and reference images of the first real dataset had different spatial resolutions (i.e., 1.2 m and 1.6 m of the reference and sensed images, respectively) according to the acquisition environment, although they were both captured by the Worldview-3 multispectral sensor. Moreover, on the basis of the off-nadir angles (14.9 • and 37.9 • of the reference and sensed images, respectively) and the azimuth angles (234.3 • and 140.9 • of the reference and sensed images, respectively) of the images, the built-up areas characterized by tall buildings exhibited different magnitudes and directions of relief displacement. Therefore, general feature-based approaches such as SIFT and SURF could not guarantee the extraction of large numbers of evenly distributed CPs as compared to the simulated dataset constructed from the same image. In this case, the application of a non-linear transformation such as PL and LWM was more likely to incur severe distortions over regions where a sufficient number of CPs were not extracted. By applying the SURF-based matching method with outlier elimination, we extracted 50 CPs by using the real dataset ( Figure 5a). As expected, this represented fewer CPs than the simulated dataset. Again, the same parameter values as those used for the simulated dataset warping were set to warp the sensed image to the reference image's coordinate system. The constructed triangular regions created with the PL and IPL methods overlaid with the sensed image are compared in Figure 6. As we can see from the boundary of the images in Figure 6a) (PL transformation) and 6b (IPL transformation), the IPL transformation could effectively construct the region even along the boundary.
In the case of the second real dataset, 1102 CPs were extracted by applying the SURF-based method with the same parameter values (Figure 5b). This large number of CPs as compared to the first dataset was attributed to the fact that the captured seasons of the reference and sensed images of the second dataset were quite similar. The constructed results of the triangular region using the PL and IPL transformations are also displayed in Figure 6a,b, respectively. Because of the sufficient number of extracted CPs, the constructed triangular regions based on both PL and IPL transformations were sufficiently large to cover all the sensed images.
The third real dataset was constructed from Kompsat-3A panchromatic images with a higher spatial resolution (5.5 m) than that of the Worldview-3 multispectral images (1.2-1.6 m) used for the construction of the first and the second real datasets. The site covered primarily built-up areas with similar objects such as buildings and roads. Because of the similar descriptors used for the CP matching, these objects make it difficult to find a sufficient number of reliable CPs. Accordingly, 84 CPs, which were insufficient to construct a reliable non-rigid transformation, were extracted using the SURF-based method (Figure 5c). The relatively wide portions of regions not triangulated using the PL transformation (Figure 6a) were effectively constructed using the proposed IPL transformation (Figure 6b).  The results of warping by the IPL transformation are presented in Figure 7, in which a portion of blocks from the reference and the warped sensed images are repeatedly attached. To visually emphasize the warping results, the reference image is displayed with a false color composition (i.e., blue, green, and red bands are allocated to the RGB channels, respectively), and the sensed image is displayed with a true color composition (i.e., red, green, and blue bands are allocated to the RGB channels, respectively). As we can see from the boundary of the blocks where lines and objects align correctly, co-registration was reliably conducted by the proposed IPL transformation. For a visual comparison of the warping results by the PL and IPL transformations, some parts of the image boundary are magnified in Figure 8. In the case of the results after warping, severe distortion occurred in some regions after PL transformation (the red circles in Figure 8a), which are not present after the proposed IPL transformation (the red circles in Figure 8b).

Discussion
IPL-based warping was conducted on the basis of the predetermined parameter values, and the accuracy was assessed by calculating the CC value. For the comparative analysis, the sensed image was warped using the existing linear/non-linear transformation models that are used primarily for remote sensing image warping: Affine, third and fourth polynomials, LWM, and PL. For the application of the LWM, the number of points in the local weighted mean calculation was set at 20. The accuracy of warping on the simulated dataset is reported in Table 2. The proposed IPL method achieved the highest accuracy of 0.975. In some regions where CPs were not extracted, we observed that the LWM transformation could not properly warp the image. This resulted in the lowest accuracy. For the real datasets, the CC values were also calculated on the basis of the warping transformations and are shown in Table 3. The absolute CC values were generally lower than those in the simulated dataset. Because of the dissimilarity of the acquisition environment, the radiometric properties between the reference and the sensed images in the real datasets differed. However, similar accuracy patterns were derived with the simulated dataset. Because of the fact that the relatively few CPs (i.e., 50 CPs) could not properly construct the non-linear transformation models to correct the local geometric distortions, the non-linear transformation models (i.e., polynomial, LWM, PL, and IPL transformations) could not attain the significantly higher warping results over the affine-based linear transformation. However, the third polynomial and the proposed IPL transformations could achieve better CC values than the value acquired by the affine transformation. Comparatively, non-linear transformation models could achieve better accuracy than the affine transformation in the second dataset from which a large number of CPs were extracted (i.e., 1102 CPs). Among them, the IPL transformation achieved the best CC value of 0.675. In the third dataset, a result similar to that obtained with the first real dataset was achieved because fewer CPs were extracted (i.e., 84 CPs). The proposed IPL transformation achieved the best warping performance. This implied that it could warp the VHR image by using a limited number of CPs with the additional extracted pseudo-CPs along the image boundary. To verify the effectiveness of the IPL transformation over the PL transformation, only the outer triangular regions generated by the PL transformation were extracted to calculate the CC values. In this case, an extrapolation process was carried out in the PL transformation to warp the outer triangular regions that might cause severe geometric distortions. The results, shown in Table 4, demonstrated that the IPL transformation improved the warping performance as compared to that in the case of the PL transformation in the outer triangular regions by increasing the CC values from 0.259 to 0.304, 0.603 to 0.657, and 0.180 to 0.338 in the first, second, and third real datasets, respectively.

Conclusions
We proposed an IPL transformation for warping VHR remote sensing images by focusing on the image boundary where the triangular region was barely constructed through a general PL transformation. To this end, additional pseudo-CPs were intentionally extracted along the image boundary to construct triangular regions over the entire image. A simulated dataset constructed from Worldview-3 images was used to investigate the effects of the parameters used for applying the proposed IPL transformation. The real datasets acquired in various acquisition environments from the Worldview-3 and Kompsat-3A VHR multi-temporal images were used to evaluate the performance of the IPL transformation compared with that of other warping transformation models. We concluded that the proposed IPL transformation achieved the best and most stable warping accuracy among the warping methods considered. Moreover, it showed better alignment of objects and improved warping performance than PL transformation outside the triangular regions by increasing the CC values from 0.259 to 0.304, 0.603 to 0.657, and 0.180 to 0.338 in the first, second, and third real datasets, respectively. Therefore, the proposed IPL transformation is an effective alternative to PL transformation with higher warping accuracy.