Improved Parameter Estimation of the Line-Based Transformation Model for Remote Sensing Image Registration

Abstract: The line-based transformation model (LBTM), built upon the affine transformation, was previously proposed for image registration and image rectification. The original LBTM first uses control line features to estimate six rotation and scale parameters and subsequently uses the control point(s) to retrieve the remaining two translation parameters. Such a mechanism may propagate the error of the six rotation and scale parameters into the two translation parameters. In this study, we propose a direct method that estimates all eight transformation parameters of the LBTM simultaneously using least-squares adjustment. The improved LBTM was compared with the original LBTM using one synthetic dataset and three experimental datasets for satellite image 2D registration and 3D rectification. The experimental results demonstrated that the improved LBTM converges to a steady solution with two to three ground control points (GCPs) and five ground control lines (GCLs), whereas the original LBTM requires at least 10 GCLs to yield a stable solution.


Introduction
Remote sensing is currently undergoing an evolution toward sensor platform advancement and open source development, leading to a growth of data during the past decade. For instance, the United States Geological Survey (USGS) has publicly released the data archive of Landsat images for free usage since 2008, which facilitates a variety of applications found in spatio-temporal modelling (e.g., vegetation phenology and urban changes). On the other hand, innovative sensors, such as LiDAR and Kinect, are capable of collecting high resolution point cloud data that aids in fine-scale 3D modelling in both indoor and outdoor environments. Though such a "big data" growth provides fruitful opportunities in the application domain, it also raises certain challenges in terms of data management, data transmission and data processing [1,2]. Regardless of the application, geo-referencing is always the first key step to geometrically align different datasets in the same spatial domain before performing any feature extraction and thematic analysis, and thus such a topic has been intensively researched in the last two decades with different rigorous and approximate (empirical) sensor models [3][4][5].
In order to geometrically align different datasets under the same coordinate system, common features have to be first identified, extracted, matched and subsequently used to establish a transformation function so as to perform geo-referencing. Such a process includes 2D to 2D transformation (image registration) and 2D to 3D transformation (image rectification). In remote sensing, point features have been commonly used as control primitives since they can be easily identified within satellite images (e.g., road intersections or building corners) or measured using surveying instruments (e.g., total station or global positioning system (GPS)). Thus, most of the existing work relies on the use of control points for image registration [6] or image rectification [7], particularly in urban scenes. However, in rural environments such as rugged terrain, desert areas or cold regions, point features are usually hard to locate or measure, and thus the research community has recently investigated the use of linear features to assist, rather than to substitute, control points in the geo-referencing process.
Habib and Alruzouq [8] proposed using straight line segments as registration primitives and utilized the modified iterated Hough transform as a matching strategy for image registration between IKONOS and Kompsat, SPOT as well as Landsat. Zhang et al. [9] utilized both line and curve features for exterior orientation of stereo aerial images yielding a sub-pixel level of accuracy. Jaw and Perny [10] proposed a two-stage matching strategy for aerial photographs by first projecting 3D line segments into a 2D plane and assessing their geometric properties (i.e., angles and lengths) as initial matching. Subsequently, a second stage matching can be conducted through a geometric check of the reference points with the aid of direct linear transform (DLT). Sui et al. [11] developed an iterative line extraction and Voronoi integrated spectral point matching approach for automatic optical-to-SAR image registration. In rural areas such as desert environments, where sharp control points are hard to locate, Li et al. [12] initially proposed utilizing the centroid of bushes as control features for aerial photo to LiDAR data registration based on the use of DLT. They further improved the method by extracting the sand ridges as control line features for data registration [13]. Recently, some further attempts can be found using polygons as control features for historical map registration [14], where a framework of registration with multiple features (e.g., points, lines, curves and regions) is claimed to be suitable for all the aforementioned transformation models [15].
Previously, the line-based transformation model (LBTM) was proposed [16][17][18] to provide a satellite image registration/rectification framework using both point and linear features, and the work has laid a solid foundation for the subsequent line-based image registration and rectification research [19]. The LBTM is built upon the use of affine transformation as the core transformation function to perform image registration and image rectification, particularly for satellite images collected using a linear array sensor and narrow angle field of view [4,20,21]. The LBTM has been demonstrated with case studies using a diversity of remote sensing images such as SPOT, Landsat, ASTER and IRS. The LBTM comprises two major steps to estimate the transformation parameters, where at least three ground control lines (GCLs) are required to first estimate the six rotation and scale parameters and one ground control point (GCP) is subsequently used to retrieve the remaining two translation parameters. Though previous results demonstrated a root-mean-squared (RMS) error within one to five pixels in image registration and image rectification [16][17][18], the error distribution of the translation parameters depends on how the rotation and scale parameters are estimated.
In order to improve the robustness and accuracy of the LBTM, this paper presents a new direct estimation method to simultaneously retrieve all the transformation parameters. Unlike the original LBTM that uses GCLs to first estimate the six rotation and scale parameters and subsequently uses redundant GCP(s) to determine the two translation parameters [16][17][18], the proposed LBTM is capable of estimating all eight transformation parameters simultaneously through least-squares adjustment. Such a concept of direct estimation is similar to that implemented in other surveying engineering areas [22][23][24], where arbitrary initial values within specified lower and upper bounds of the parameters are input and mathematical optimization is used to determine the global optimal solution. The rest of the manuscript is organized as follows. Section 2 describes the mathematical models of the original LBTM and the improved version of LBTM. Section 3 describes the experimental datasets used to compare the original LBTM and the proposed LBTM. Section 4 demonstrates the results of image registration and rectification with four experimental datasets, and finally conclusions are drawn in Section 5.

The Original Two-Step Approach
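The LBTM was originally proposed by Shaker [16], Shi and Shaker [17] and Shaker [18], and comprises two major steps. Recalling the mathematical notation, let the number of line segments used for parameter estimation be denoted by $m$. For line segment $i$, the coordinates of any two points ($p_1$ and $p_2$) along the line segment in the image space are $p_1^i = (x_1^i, y_1^i)$ and $p_2^i = (x_2^i, y_2^i)$, and the coordinates of any other two points along the corresponding line in the object space are $P_1^i = (X_1^i, Y_1^i, Z_1^i)$ and $P_2^i = (X_2^i, Y_2^i, Z_2^i)$. Then the ordinary point-based eight-parameter affine transformation model can be presented for points $P_1^i$ and $P_2^i$ (where $i = 1, 2, \cdots, m$) as follows:

For point $P_1$:

$$x_1^i = C_1 X_1^i + C_2 Y_1^i + C_3 Z_1^i + C_4 \tag{1}$$

$$y_1^i = C_5 X_1^i + C_6 Y_1^i + C_7 Z_1^i + C_8 \tag{2}$$

For point $P_2$:

$$x_2^i = C_1 X_2^i + C_2 Y_2^i + C_3 Z_2^i + C_4 \tag{3}$$

$$y_2^i = C_5 X_2^i + C_6 Y_2^i + C_7 Z_2^i + C_8 \tag{4}$$

where $C_1, C_2, \cdots, C_8$ are model parameters. Subtracting Equation (1) from Equation (3) gives

$$x_2^i - x_1^i = C_1 (X_2^i - X_1^i) + C_2 (Y_2^i - Y_1^i) + C_3 (Z_2^i - Z_1^i) \tag{5}$$

Similarly, subtracting Equation (2) from Equation (4) gives

$$y_2^i - y_1^i = C_5 (X_2^i - X_1^i) + C_6 (Y_2^i - Y_1^i) + C_7 (Z_2^i - Z_1^i) \tag{6}$$

The length of the line segment on the image, $l_{12}^i$, formed by connecting $p_1^i$ and $p_2^i$ is given by

$$l_{12}^i = \sqrt{(x_2^i - x_1^i)^2 + (y_2^i - y_1^i)^2} \tag{7}$$

Similarly, the length of the corresponding line segment on the ground, $L_{12}^i$, is given by

$$L_{12}^i = \sqrt{(X_2^i - X_1^i)^2 + (Y_2^i - Y_1^i)^2 + (Z_2^i - Z_1^i)^2} \tag{8}$$

Rewriting Equations (5) and (6) by incorporating $L_{12}^i$ and $l_{12}^i$ on both sides:

$$\frac{x_2^i - x_1^i}{l_{12}^i} = \frac{L_{12}^i}{l_{12}^i}\left[ C_1 \frac{X_2^i - X_1^i}{L_{12}^i} + C_2 \frac{Y_2^i - Y_1^i}{L_{12}^i} + C_3 \frac{Z_2^i - Z_1^i}{L_{12}^i} \right] \tag{9}$$

$$\frac{y_2^i - y_1^i}{l_{12}^i} = \frac{L_{12}^i}{l_{12}^i}\left[ C_5 \frac{X_2^i - X_1^i}{L_{12}^i} + C_6 \frac{Y_2^i - Y_1^i}{L_{12}^i} + C_7 \frac{Z_2^i - Z_1^i}{L_{12}^i} \right] \tag{10}$$

The above equations can be represented by unit vector components:

$$a_x^i = \frac{1}{S_i}\left( C_1 A_X^i + C_2 A_Y^i + C_3 A_Z^i \right) \tag{11}$$

$$a_y^i = \frac{1}{S_i}\left( C_5 A_X^i + C_6 A_Y^i + C_7 A_Z^i \right) \tag{12}$$

where $a_x^i = (x_2^i - x_1^i)/l_{12}^i$ and $a_y^i = (y_2^i - y_1^i)/l_{12}^i$ are the unit vector components of the line segment connecting points $p_1^i$ and $p_2^i$ in the image space, and $A_X^i = (X_2^i - X_1^i)/L_{12}^i$, $A_Y^i = (Y_2^i - Y_1^i)/L_{12}^i$ and $A_Z^i = (Z_2^i - Z_1^i)/L_{12}^i$ are the unit vector components of the corresponding line segment in the object space. The coefficient $S_i$ is a scale factor given by $l_{12}^i/L_{12}^i$.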
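The relation between the image-space unit vector, the object-space unit vector and the scale factor can be checked numerically. The following is a minimal Python sketch assuming NumPy; the function name `line_unit_vectors` and the sample endpoints are hypothetical and chosen only for illustration, and the parameter values are those assigned in the synthetic experiment.

```python
import numpy as np

def line_unit_vectors(p1, p2, P1, P2):
    """Unit vectors and scale factor of one control line.

    p1, p2 : image-space endpoints (x, y)
    P1, P2 : object-space endpoints (X, Y, Z)
    Returns (a, A, S): image unit vector, object unit vector, S = l12 / L12.
    """
    d_img = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    d_obj = np.asarray(P2, dtype=float) - np.asarray(P1, dtype=float)
    l12 = np.linalg.norm(d_img)   # segment length in the image space
    L12 = np.linalg.norm(d_obj)   # segment length in the object space
    if l12 == 0.0 or L12 == 0.0:
        raise ValueError("line endpoints must be distinct")
    return d_img / l12, d_obj / L12, l12 / L12

# Rotation/scale part (C1, C2, C3; C5, C6, C7) and translation part (C4, C8).
M = np.array([[0.3, 0.5, 0.0],
              [0.2, 0.3, 0.0]])
t = np.array([100.0, 500.0])

# Hypothetical object-space endpoints, projected with the affine model.
P1 = np.array([100.0, 200.0, 10.0])
P2 = np.array([400.0, 600.0, 30.0])
p1, p2 = M @ P1 + t, M @ P2 + t

a, A, S = line_unit_vectors(p1, p2, P1, P2)

# The translation cancels in the subtraction, so a = (M A) / S must hold.
residual = a - (M @ A) / S
```

For noise-free endpoints `residual` vanishes up to rounding, which illustrates why the line equations carry no information about the translation parameters.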
To estimate the translation parameters $C_4$ and $C_8$, let the number of GCPs available be denoted by $n$. The coordinates of the control points in the image and the object space are denoted by $(x_j, y_j)$ and $(X_j, Y_j, Z_j)$, where $j = 1, 2, \cdots, n$. Then,

$$x_j = C_1 X_j + C_2 Y_j + C_3 Z_j + C_4 \tag{13}$$

$$y_j = C_5 X_j + C_6 Y_j + C_7 Z_j + C_8 \tag{14}$$

To estimate the model parameters, the existing method involves two steps. First, the optimal values of the rotation and scale parameters ($C_1$, $C_2$, $C_3$, $C_5$, $C_6$ and $C_7$) are determined using Equations (11) and (12) by minimizing the deviations between the observed and predicted unit vectors $a_x^i$ and $a_y^i$. Second, using the estimated values of these parameters, the translation coefficients $C_4$ and $C_8$ can then be determined using Equations (13) and (14). In the case of 2D image to image registration, the parameters $C_3$ and $C_7$ do not exist, leaving only six parameters ($C_1$, $C_2$, $C_4$, $C_5$, $C_6$ and $C_8$) to be estimated using the GCLs and GCPs provided. To estimate the eight parameters of the model without redundancy, three GCLs and one GCP are needed. The three GCLs form six equations (based on Equations (11) and (12)) and the GCP forms two equations (based on Equations (13) and (14)). The eight equations uniquely determine the model parameters. If more linear segments and GCPs are used, regression or least-squares analysis should be applied. Figure 1 shows the overall workflow of LBTM.
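The two-step procedure described above can be sketched as follows. This is an illustrative Python sketch assuming NumPy, not the authors' implementation: the function name `two_step_lbtm`, the array layouts, the random test geometry and the non-zero values chosen for C3 and C7 are all assumptions. Step 1 solves the line equations for the six rotation and scale parameters; step 2 holds them fixed and takes the least-squares translation, which for a constant offset is the mean residual over the GCPs.

```python
import numpy as np

def two_step_lbtm(img_lines, obj_lines, img_pts, obj_pts):
    """Two-step estimate of C1..C8.

    img_lines : (m, 2, 2) image endpoints   obj_lines : (m, 2, 3) object endpoints
    img_pts   : (n, 2) image GCPs           obj_pts   : (n, 3) object GCPs
    """
    img_lines = np.asarray(img_lines, dtype=float)
    obj_lines = np.asarray(obj_lines, dtype=float)
    img_pts = np.asarray(img_pts, dtype=float)
    obj_pts = np.asarray(obj_pts, dtype=float)

    d_img = img_lines[:, 1] - img_lines[:, 0]
    d_obj = obj_lines[:, 1] - obj_lines[:, 0]
    l12 = np.linalg.norm(d_img, axis=1, keepdims=True)
    L12 = np.linalg.norm(d_obj, axis=1, keepdims=True)
    a = d_img / l12          # image unit vectors (a_x, a_y)
    A = d_obj / L12          # object unit vectors (A_X, A_Y, A_Z)
    S = l12 / L12            # scale factor per line

    # Step 1: rotation and scale parameters from the line equations.
    coeffs, *_ = np.linalg.lstsq(A / S, a, rcond=None)   # shape (3, 2)
    M = coeffs.T             # rows: (C1, C2, C3) and (C5, C6, C7)

    # Step 2: translation from the GCPs with M held fixed.
    t = (img_pts - obj_pts @ M.T).mean(axis=0)           # (C4, C8)
    return np.concatenate([M[0], t[:1], M[1], t[1:]])

# Noise-free check with hypothetical parameter values and random geometry.
rng = np.random.default_rng(0)
C_true = np.array([0.3, 0.5, 0.1, 100.0, 0.2, 0.3, -0.2, 500.0])
M_true = C_true[[0, 1, 2, 4, 5, 6]].reshape(2, 3)
t_true = C_true[[3, 7]]
obj_lines = rng.uniform(0.0, 1000.0, size=(5, 2, 3))
obj_pts = rng.uniform(0.0, 1000.0, size=(2, 3))
img_lines = obj_lines @ M_true.T + t_true
img_pts = obj_pts @ M_true.T + t_true

C_est = two_step_lbtm(img_lines, obj_lines, img_pts, obj_pts)
```

Because step 2 reuses the step 1 estimate unchanged, any error in the rotation and scale parameters is passed directly into the translation estimate.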

Improved Parameter Estimation Method
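Since the original two-step method first treats the six rotation and scale parameters as fixed and subsequently retrieves the remaining two translation parameters in the second step, the solution may not produce the optimal values of all the parameters simultaneously. In addition, the error accumulated in the estimation of the six parameters may carry over to the two translation parameters. Therefore, we propose a direct method that estimates the eight parameters simultaneously in a single step using least-squares adjustment, which is expected to lead to a better solution than the two-step method. Mathematically, we combine Equations (11)-(14) into the following solution structure:

$$a_x^i = \frac{1}{S_i}\left( C_1 A_X^i + C_2 A_Y^i + C_3 A_Z^i \right) \tag{15}$$

$$a_y^i = \frac{1}{S_i}\left( C_5 A_X^i + C_6 A_Y^i + C_7 A_Z^i \right) \tag{16}$$

$$x_j = C_1 X_j + C_2 Y_j + C_3 Z_j + C_4 \tag{17}$$

$$y_j = C_5 X_j + C_6 Y_j + C_7 Z_j + C_8 \tag{18}$$

The above Equations (15)-(18) can be fitted into a matrix form:

$$\begin{bmatrix} a_x^i \\ a_y^i \\ x_j \\ y_j \end{bmatrix} = \begin{bmatrix} A_X^i/S_i & A_Y^i/S_i & A_Z^i/S_i & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & A_X^i/S_i & A_Y^i/S_i & A_Z^i/S_i & 0 \\ X_j & Y_j & Z_j & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & X_j & Y_j & Z_j & 1 \end{bmatrix} \begin{bmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \\ C_5 \\ C_6 \\ C_7 \\ C_8 \end{bmatrix} \tag{19}$$

By adding all the available GCPs and GCLs, we have:

$$\boldsymbol{\ell} = \mathbf{B}\,\mathbf{P} \tag{20}$$

where $\boldsymbol{\ell}$ stacks the $2m + 2n$ observations on the left-hand side of Equation (19), $\mathbf{B}$ stacks the corresponding coefficient rows, and $\mathbf{P} = [C_1, C_2, \cdots, C_8]^T$. The least-squares solution to estimate the transformation parameters simultaneously can be solved by:

$$\mathbf{V} = \mathbf{B}\,\mathbf{P} - \boldsymbol{\ell} \tag{21}$$

where $\mathbf{V}$ is the residual matrix. The solution to determine the transformation parameters $\mathbf{P}$ would be:

$$\mathbf{P} = \left( \mathbf{B}^{T}\mathbf{B} \right)^{-1} \mathbf{B}^{T}\boldsymbol{\ell} \tag{22}$$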
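A simultaneous adjustment of this kind can be sketched in Python with NumPy as below. The sketch stacks two rows per GCL (from Equations (11) and (12)) and two rows per GCP (from Equations (13) and (14)) and solves for all eight parameters in one least-squares step. The names `simultaneous_lbtm`, `B` and `obs`, the equal weighting of line and point rows, and the test values (including the non-zero C3 and C7) are assumptions made for illustration; the source does not state whether a weight matrix is used.

```python
import numpy as np

def simultaneous_lbtm(img_lines, obj_lines, img_pts, obj_pts):
    """One-step least-squares estimate of P = (C1..C8) and residuals V."""
    img_lines = np.asarray(img_lines, dtype=float)   # (m, 2, 2)
    obj_lines = np.asarray(obj_lines, dtype=float)   # (m, 2, 3)
    img_pts = np.asarray(img_pts, dtype=float)       # (n, 2)
    obj_pts = np.asarray(obj_pts, dtype=float)       # (n, 3)
    m, n = len(obj_lines), len(obj_pts)

    d_img = img_lines[:, 1] - img_lines[:, 0]
    d_obj = obj_lines[:, 1] - obj_lines[:, 0]
    l12 = np.linalg.norm(d_img, axis=1, keepdims=True)
    L12 = np.linalg.norm(d_obj, axis=1, keepdims=True)
    a = d_img / l12                       # observed image unit vectors
    A_over_S = (d_obj / L12) / (l12 / L12)

    B = np.zeros((2 * m + 2 * n, 8))      # stacked coefficient rows
    obs = np.zeros(2 * m + 2 * n)         # stacked observations

    # Line rows: no entries in the translation columns (C4, C8).
    B[0:2 * m:2, 0:3] = A_over_S
    B[1:2 * m:2, 4:7] = A_over_S
    obs[0:2 * m:2] = a[:, 0]
    obs[1:2 * m:2] = a[:, 1]

    # Point rows: full affine model including translation.
    B[2 * m::2, 0:3] = obj_pts
    B[2 * m::2, 3] = 1.0
    B[2 * m + 1::2, 4:7] = obj_pts
    B[2 * m + 1::2, 7] = 1.0
    obs[2 * m::2] = img_pts[:, 0]
    obs[2 * m + 1::2] = img_pts[:, 1]

    P, *_ = np.linalg.lstsq(B, obs, rcond=None)
    V = B @ P - obs
    return P, V

# Noise-free check with hypothetical parameter values and random geometry.
rng = np.random.default_rng(1)
C_true = np.array([0.3, 0.5, 0.1, 100.0, 0.2, 0.3, -0.2, 500.0])
M_true = C_true[[0, 1, 2, 4, 5, 6]].reshape(2, 3)
t_true = C_true[[3, 7]]
obj_lines = rng.uniform(0.0, 1000.0, size=(5, 2, 3))
obj_pts = rng.uniform(0.0, 1000.0, size=(2, 3))
img_lines = obj_lines @ M_true.T + t_true
img_pts = obj_pts @ M_true.T + t_true

P_est, V = simultaneous_lbtm(img_lines, obj_lines, img_pts, obj_pts)
```

The unit-vector rows are of order one while the coordinate rows are of the order of the image extent, so in practice a weighting scheme may be worth considering; `np.linalg.lstsq` is used here instead of forming the normal equations explicitly because it is numerically more stable.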

Experimental Testing
Four experiments were conducted in order to compare the performance of the original LBTM and the improved LBTM for 2D registration and 3D rectification. The first experiment aims to demonstrate the capability of both LBTMs using synthetic data that suffered from different levels of random error in the GCLs and GCPs. The second experiment aims to compare the two LBTM methods for image registration using two IKONOS images collected on different days, while the third experiment performed image registration between a Landsat 8 and a WorldView-2 satellite image. Both experiments adopted different combinations of GCPs and GCLs, starting with a minimum of one GCP and three GCLs and subsequently increasing their numbers. The last experiment demonstrated the capability of LBTM for 3D rectification of the IKONOS image by using 3D GCLs/GCPs either collected by field survey or provided by the geodetic authority. The results were assessed based on the RMS error derived from a certain number of independent checkpoints (CPs). In order to practically implement these two LBTM methods, one should bear in mind the following three issues. First, the pairwise GCLs used in the raw and reference datasets should be in the same direction (orientation) in their corresponding coordinate systems. Second, due to the lack of 3D ground coordinates, the second and third experiments were implemented to serve the purpose of 2D to 2D image registration. Thus, model parameters $C_3$ and $C_7$ were ignored as mentioned in Section 2. Third, the y-axis of the image coordinate system is usually oriented downwards in the computer system; therefore, the y-coordinates in the image space should be reversed so that the y-axis points upward, in the same direction as the object space (geo-referenced) coordinate system.
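Two of the practical points above, the reversal of the image y-axis and the checkpoint-based RMS error, can be sketched as small Python helpers. Both are hypothetical illustrations assuming NumPy: the flip assumes zero-based row indices counted from the top of an image of known height, and the RMS error is taken as the root of the mean squared planimetric distance over the checkpoints, which is one common definition and may differ from the one used in the experiments.

```python
import numpy as np

def flip_image_y(rows, image_height):
    """Convert top-down row indices into y values that increase upward."""
    return (image_height - 1) - np.asarray(rows, dtype=float)

def rms_error(predicted, observed):
    """RMS of the planimetric distances between predicted and observed CPs."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Example: rows 0 and 99 of a 100-row image, and two CPs each off by (3, 4).
flipped = flip_image_y([0, 99], image_height=100)
error = rms_error([[3.0, 4.0], [13.0, 14.0]], [[0.0, 0.0], [10.0, 10.0]])
```

The flip must be applied consistently to every GCP, GCL endpoint and CP in the image space before the parameters are estimated.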

Experiment 1: Synthetic Data
The two transformation models were first tested and evaluated by using synthetic data. A total of 30 GCPs and 30 GCLs were generated in a self-defined coordinate system with an extent ranging from 0 m to 1000 m in both x and y directions. The GCLs were digitized with different lengths and orientations, while the GCPs were evenly distributed along with the GCLs. All these datasets thus represented a set of control features in the object space. On the other hand, specific values ($C_1 = 0.3$, $C_2 = 0.5$, $C_3 = 0$, $C_4 = 100$, $C_5 = 0.2$, $C_6 = 0.3$, $C_7 = 0$ and $C_8 = 500$) were assigned to the transformation parameters in Equations (1)-(4) in order to derive a corresponding set of control features in the image space. In order to evaluate the capability of the improved LBTM solution, we intentionally assigned random noise ranging from 1 m to 10 m to the GCLs and GCPs in the image space, so as to demonstrate how the quality of the control features affects the registration results. Finally, a set of 30 CPs (both in image space and object space) were produced and used to assess the final registration results based on the RMS error. Figure 2 shows the synthetic CPs, GCLs and GCPs located in the self-defined coordinate system.
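A synthetic dataset of this kind can be generated along the following lines. This is a hedged Python sketch assuming NumPy: the features are drawn at random rather than digitized, the noise is modelled as zero-mean Gaussian with a chosen standard deviation, and the names `to_image`, `noise_sd` and the seed are hypothetical. Only the parameter values, the 0 m to 1000 m extent and the feature counts come from the description above.

```python
import numpy as np

# 2D case of the affine model: C3 = C7 = 0, so only X and Y are needed.
M = np.array([[0.3, 0.5],     # C1, C2
              [0.2, 0.3]])    # C5, C6
t = np.array([100.0, 500.0])  # C4, C8

def to_image(xy_obj, noise_sd=0.0, rng=None):
    """Project object-space XY to image space and optionally add noise."""
    xy_img = np.asarray(xy_obj, dtype=float) @ M.T + t
    if noise_sd > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        xy_img = xy_img + rng.normal(0.0, noise_sd, size=xy_img.shape)
    return xy_img

rng = np.random.default_rng(42)
gcp_obj = rng.uniform(0.0, 1000.0, size=(30, 2))      # 30 GCPs
gcl_obj = rng.uniform(0.0, 1000.0, size=(30, 2, 2))   # 30 GCLs, two endpoints each
cp_obj = rng.uniform(0.0, 1000.0, size=(30, 2))       # 30 checkpoints

gcp_img = to_image(gcp_obj, noise_sd=5.0, rng=rng)    # noisy control points
gcl_img = to_image(gcl_obj, noise_sd=5.0, rng=rng)    # noisy control lines
cp_img = to_image(cp_obj)                             # noise-free checkpoints
```

Setting `noise_sd` to zero for either the GCLs or the GCPs reproduces the cases in which noise is assigned to only one type of control feature.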

Experiment 2: IKONOS to IKONOS
The second experiment utilized two IKONOS satellite images (1 m resolution in the panchromatic band and 4 m resolution in multi-spectral bands) collected on different days in order to examine the two LBTM methods. Figure 3a shows a near-infrared band of the IKONOS image acquired on 25 June 2006, and Figure 3b shows a red band of the IKONOS image acquired on 25 October 2012. Both images cover approximately 7.5 km by 7.5 km in the northwest region of Toronto, Ontario, Canada. The IKONOS image collected in 2006 is an original raw image without any geo-referenced coordinate system, whereas the reference IKONOS image collected in 2012 has been geo-referenced into the UTM WGS84 zone 17N. A total of 30 GCPs and 30 GCLs with different orientations were manually digitized in both raw and reference images, where the GCLs are mainly located on top of the roads, highways and the airport runway and the GCPs are mainly located at the corners of pedestrian lanes and road intersections. To compare the registration accuracy, another 30 CPs were identified to compute the RMS error after the transformation parameters of both LBTM methods were estimated.

Experiment 3: WorldView-2 to Landsat 8
The third experiment utilized two different datasets for image registration. The study area is located in the west of the city of Toronto, Ontario, Canada, covering the south of the city of Mississauga and part of Lake Ontario. The area is mainly occupied by residential houses connected by local roads and highways. The first dataset is a high resolution WorldView-2 satellite image collected on 30 April 2011 in panchromatic mode of 0.5 m resolution and multi-spectral mode of 2 m resolution. The spatial extent covers approximately 5 km by 5 km of Mississauga, and the WorldView-2 image was not associated with any geo-referenced coordinate system. The second dataset is a Landsat 8 satellite image that was acquired on 9 September 2015, and the image has been geo-referenced into the UTM 17N coordinate system. The image resolutions of the panchromatic and multi-spectral bands are 10 m and 30 m, respectively. Similar to Experiment 2, 30 GCPs and 20 GCLs were collected to test the two LBTM methods, and another set of 20 CPs was used to evaluate the corresponding results. Figure 4a shows the WorldView-2 image, and Figure 4b shows the Landsat 8 image.

Experiment 4: 3D Rectification of IKONOS
The last experiment demonstrated the capability of the improved LBTM for 3D rectification of the IKONOS image with a case study on rugged terrain. The IKONOS image was acquired on 23 November 2000 in panchromatic mode with 1 m spatial resolution. The study area covers part of the Kowloon Peninsula and the Hong Kong Island in the Hong Kong Special Administrative Region (HKSAR). The spatial extent covers approximately 11.5 km by 10.2 km. The north side and south side of the image include the Lion Rock Hill and the Victoria Peak, respectively, where the elevation difference of the terrain reaches up to 500 m. Unlike the previous two experiments, all the CPs, GCLs and GCPs used in this experiment were either collected by GPS field survey or provided by the Lands Department of the HKSAR. A total of 12 GCLs and five GCPs were used to perform the experimental testing. These GCLs and GCPs are located on either flat ground or rugged terrain and are evenly distributed within the image scene, as shown in Figure 5. Finally, eight evenly distributed CPs were used to evaluate the RMS error after rectifying the IKONOS image. A detailed description of the field survey and GCP collection can be found in [25,26].

Experiment 1: Synthetic Data
Figure 6 shows the registration results derived from the original LBTM and the improved LBTM with different levels of random noise intentionally assigned to the synthetic data. Since the original LBTM is a two-step process, it first estimates the six rotation and scale parameters using the GCLs and subsequently uses the GCPs to retrieve the two translation parameters; any errors found in the GCLs lead to a loose solution in the estimation of scale and rotation, and these errors are thus carried over into the subsequent step. Such an argument is supported by Figure 6a, which shows the RMS error of image registration when the random noise was intentionally assigned to the GCLs only. As shown in Figure 6a, the original LBTM suffered from a higher RMS error compared with that of the improved LBTM. Such an error difference significantly increased when the random noise found in the GCLs increased. Nevertheless, if the random noise was only assigned to the GCPs, there was no difference between the two models in terms of the RMS error. As shown in Figure 6b, both models suffered from the same level of noise in the GCPs, which mainly affected the estimation of translation. Finally, when random noise was assigned to both GCLs and GCPs, the improved LBTM still outperformed the original LBTM, particularly when the random noise increased from 0 m to 10 m (see Figure 6c).

Experiment 2: IKONOS to IKONOS
Figure 7a-i show the RMS error (in terms of pixels) of the original LBTM (as indicated by the blue line) and the improved LBTM (as indicated by the red line) derived using the 30 CPs as mentioned in Section 3. In Figure 7a, when one GCP and three GCLs were used, the original LBTM and the improved LBTM produced an RMS error of 16.5 pixels and 5.7 pixels, respectively. Adding one more GCL led to a reduction of RMS error by 40% in both methods. After incorporating a total of five GCLs, the improved LBTM converged to a steady solution with an RMS error of less than 3 pixels. Nevertheless, the original LBTM required at least 16 GCLs to converge to a stable solution. Figure 7b shows the results of both LBTM methods with the use of two GCPs. When using two GCPs and three GCLs, the original LBTM produced an RMS error of 14.2 pixels while the improved LBTM yielded an RMS error of 3 pixels. The improved LBTM was able to achieve an RMS error of two pixels using only five GCLs, whereas the original LBTM required 18 GCLs to do so. When three or four GCPs were used, the RMS error even dropped to 1.7 pixels in the improved LBTM with the aid of four to five GCLs. Again, the original LBTM could not reach such a level of error unless more than 20 GCLs were used. After incorporating five GCPs in both LBTM methods (see Figure 7e-i), no significant difference was found in terms of the RMS error pattern. The original LBTM produced an RMS error of 13 to 15 pixels when using a minimum of three GCLs and converged to a steady solution with at least 18 GCLs. On the other hand, the improved LBTM yielded a converged solution with an RMS error ranging from 1.3 to 1.6 pixels even when three GCLs were used. Overall, the improved LBTM produced a lower RMS error with quicker convergence than the original LBTM. With a minimum of two to three GCPs and five GCLs, the improved LBTM yielded an RMS error of less than two pixels. As a baseline of the experiment, we also utilized all 30 GCPs to perform a point-based 2D affine transformation to compare the accuracy against that achieved by the LBTM. The traditional point-based 2D affine transformation yielded an RMS error of 1.4 pixels. Therefore, the improved LBTM can produce an accuracy comparable to that achieved using the traditional point-based affine transformation.
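A point-based 2D affine baseline of the kind used for comparison can be sketched as an ordinary least-squares fit of the six parameters to the GCPs. The Python sketch below assumes NumPy; the function names `fit_affine_2d` and `apply_affine_2d` and the random test points are hypothetical and are not taken from the experiments.

```python
import numpy as np

def fit_affine_2d(obj_xy, img_xy):
    """Least-squares 2D affine fit; returns rows (C1, C2, C4) and (C5, C6, C8)."""
    obj_xy = np.asarray(obj_xy, dtype=float)
    img_xy = np.asarray(img_xy, dtype=float)
    G = np.column_stack([obj_xy, np.ones(len(obj_xy))])   # columns X, Y, 1
    coeffs, *_ = np.linalg.lstsq(G, img_xy, rcond=None)   # shape (3, 2)
    return coeffs.T

def apply_affine_2d(params, obj_xy):
    """Map object-space XY to image space with the fitted parameters."""
    obj_xy = np.asarray(obj_xy, dtype=float)
    return obj_xy @ params[:, :2].T + params[:, 2]

# Noise-free check using the parameter values of the synthetic experiment.
rng = np.random.default_rng(7)
true_params = np.array([[0.3, 0.5, 100.0],
                        [0.2, 0.3, 500.0]])
gcp_obj = rng.uniform(0.0, 1000.0, size=(30, 2))
gcp_img = apply_affine_2d(true_params, gcp_obj)

fitted = fit_affine_2d(gcp_obj, gcp_img)
```

At least three non-collinear GCPs are needed for this fit, whereas the LBTM can trade most of those points for control lines.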

Experiment 3: WorldView-2 to Landsat 8
Figure 8a-i show the RMS error (in terms of pixels) of the original LBTM (as indicated by the blue line) and the improved LBTM (as indicated by the red line) derived using the 20 CPs for the third experiment. One can easily note that the error pattern of both LBTM methods is quite similar to that of the second experiment. Due to the different nature of the terrain, the RMS error was up to 55 pixels when using one GCP and three GCLs in both LBTM methods (see Figure 8a). However, the RMS error drastically dropped to 13 pixels and reached a steady state after using four GCLs in the improved LBTM. The original LBTM did not yield a stable solution until 16 GCLs were used. In Figure 8b, when two GCPs were used, the RMS error produced by the improved LBTM ranged from eight to 11 pixels regardless of the number of GCLs. On the other hand, the original LBTM produced a high RMS error of 64 pixels in the initial stage, and dropped to less than 12 pixels when 16 GCLs were incorporated in the model. When three to four GCPs were used, the improved LBTM yielded a solution with an RMS error ranging from six to eight pixels regardless of the number of GCLs used, whereas the error pattern of the original LBTM did not change significantly. When 10 to 30 GCPs were incorporated in both LBTM methods, the improved LBTM produced an RMS error of approximately five pixels, which was similar to the traditional point-based affine transformation.

Experiment 4: 3D Rectification of IKONOS
Figure 9a-e show the RMS error (in terms of pixels) of the original LBTM (as indicated by the blue line) and the improved LBTM (as indicated by the red line) derived using the eight CPs in the last experiment. The RMS error pattern behaves quite similarly to that of the previous two experiments. When using only one GCP, both the original LBTM and the improved LBTM behaved exactly the same regardless of the number of GCLs used. The RMS error reached up to 18 pixels when using a minimum of three GCLs and drastically dropped to seven pixels when one additional GCL was added. The RMS error was then reduced gradually to three pixels when eight GCLs were used. When using two to four GCPs, all the error patterns behave quite similarly. The original LBTM still produced a high RMS error (∼16 to 17 pixels) when using a minimum of three GCLs, while the improved LBTM performed slightly better with an RMS error of nine to 10 pixels. Both models performed very similarly in that the RMS error decreased progressively when adding more GCLs, and they all ended up with an RMS error close to two pixels when using more than 10 GCLs. In the scenario of using five GCPs, the improved LBTM demonstrated a robustness superior to that of the original LBTM. The RMS error of the improved LBTM was bounded by two to three pixels when using three to 13 GCLs, which is close to the point-based image registration result (∼two pixels). Nevertheless, the original LBTM still generated a high RMS error (∼15 pixels) when using three GCLs. Its RMS error dropped to a level similar to that of the improved LBTM when adding more GCLs. Due to the lack of 3D GCPs collected in the field, it is not possible to demonstrate the scenarios of using 10 to 30 GCPs or using more than 13 GCLs. However, it is believed that the improved LBTM can produce a steadier solution than the original LBTM, similar to all the scenarios presented above.

Discussion
Based on the results achieved from the four experiments, one can observe that the improved LBTM produces a robust solution of image registration with quicker convergence than the original LBTM. The first synthetic experiment showed that the original design of the LBTM propagated the error of the six rotation and scale parameters into the subsequent translation parameters, owing to the two-step parameter estimation process. Since the improved LBTM simultaneously estimates all the model parameters using a least-squares solution, such an error-accumulation phenomenon does not arise. In the real data tests, the improved LBTM only required five GCLs along with two to three GCPs to generate an RMS error of less than two to three pixels in the second experiment and five to six pixels in the third experiment. The slightly higher RMS error achieved in the third experiment can be ascribed to the nature of the terrain as well as the different sensor types being tested. However, the original LBTM required 16 to 18 GCLs along with at least one GCP in order to produce a converged solution. In the last experiment, the improved LBTM still demonstrated a quicker convergence than the original LBTM. As a baseline for result comparison, Figures 7j and 8j show the result of point-based image registration with the aid of the exact same 30 GCPs. As noted, the point-based image registration results generally followed similar trends to those generated by the improved LBTM. The point-based image registration found in Experiment 2 yielded a better solution than that of Experiment 3 with an RMS error of approximately two pixels. Therefore, the new parameter estimation method of LBTM, which simultaneously estimates all the transformation parameters, can yield a more robust solution than the previously proposed LBTM, and is comparable to (or slightly outperforms in some cases) the traditional point-based image registration solution. Figure 10a

Conclusions
This paper presents a new parameter estimation method to improve the previously proposed LBTM to aid in remote sensing image registration. The original LBTM is built upon a two-step approach that first utilizes a minimum of three GCLs to estimate the six rotation and scale parameters and subsequently uses GCP(s) to estimate the remaining two translation parameters. In order to reduce the biases of parameter estimation, we propose a new parameter estimation method that incorporates both the GCPs and GCLs into the LBTM to estimate all the transformation parameters simultaneously using least-squares adjustment. Four rounds of experiments were conducted to compare the RMS error of 2D image registration and 3D rectification using synthetic data and real-world imagery datasets. The first experiment utilized synthetic GCLs and GCPs that suffered from different levels of noise to demonstrate the impact of error propagation in the original LBTM. The second experiment performed image registration between two IKONOS satellite images while the third experiment used two different satellite images (Landsat 8 and WorldView-2). The last experiment assessed the performance of the improved LBTM with a case study of 3D rectification on rugged terrain. All the experiments assessed different combinations of GCLs and GCPs, and the derived results were evaluated with the aid of an independent set of CPs. All the experiments demonstrated a similar error pattern with respect to the number of GCPs and GCLs used in both the original LBTM and the improved LBTM. The improved LBTM only requires two to three GCPs and five GCLs in order to generate a steady solution for image registration in both image registration experiments. However, the original LBTM needs at least 10 GCLs regardless of the number of GCPs used. Therefore, the proposed parameter estimation method can provide a robust solution and a quicker convergence with a smaller number of control features.


Figure 1. The overall workflow of LBTM; in the improved method, GCLs and GCPs are used to estimate all parameters simultaneously.

Figure 2. The synthetic CPs, GCLs and GCPs used in the first experiment.

Figure 3. The GCLs and GCPs used in the second experiment; (a) Year 2006 IKONOS image; (b) Year 2012 IKONOS image.

Figure 5. The GCLs and GCPs used in the fourth experiment.

Figure 6. Analysis of the root-mean-squared (RMS) error with different levels of random noise in (a) GCLs, (b) GCPs and (c) both GCLs and GCPs.