A Two-Step Global Alignment Method for Feature-Based Image Mosaicing

Image mosaicing sits at the core of many optical mapping applications with mobile robotic platforms. As these platforms have been evolving rapidly and increasing their capabilities, the amount of data they are able to collect is increasing drastically. For this reason, the necessity for efficient methods to handle and process such big data has been rising from different scientific fields, where the optical data provides valuable information. One of the challenging steps of image mosaicing is finding the best image-to-map (or mosaic) motion (represented as a planar transformation) for each image while considering the constraints imposed by inter-image motions. This problem is referred to as Global Alignment (GA) or Global Registration, which usually requires a non-linear minimization. In this paper, following the aforementioned motivations, we propose a two-step global alignment method to obtain globally coherent mosaics with less computational cost and time. It firstly tries to estimate the scale and rotation parameters and then the translation parameters. Although it requires a non-linear minimization, Jacobians are simple to compute and do not contain the positions of correspondences. This allows for saving computational cost and time. It can be also used as a fast way to obtain an initial estimate for further usage in the Symmetric Transfer Error Minimization (STEMin) approach. We presented experimental and comparative results on different datasets obtained by robotic platforms for mapping purposes.


Introduction
Rapid progress in technology makes it possible to obtain and store a vast amount of optical data even from areas beyond human reach.The data has been used for several different purposes in the computer vision field.Image mosaicing is one of the common computer vision tools being mainly used for optical mapping and panorama creation.Image mosaicing can be defined as a process of creating a single high-resolution image from overlapping relatively low-resolution images [1], and it has been successfully used for different applications, such as video stabilization [2], mapping both aerial [3] and underwater [4], panorama creation [5][6][7], and super-resolution [8], among others.Image mosaicing is composed of two main steps, namely registration and blending.The registration step is further divided into two steps: pairwise and global registration (or GA).Pairwise registration (or image matching) consists of finding a relative motion (or transformation) between the coordinate frames of two overlapping images, while Global Alignment (GA) aims at finding the motion between image and map (or mosaic) coordinate frame.Intensity blending is applied after aligning images geometrically on mosaic frame in order to reduce the photometric inconsistences between images.
The overall image registration framework starts by detecting some salient points in the images referred to as features.Then, features are described with a vector obtained via different image gradient information.Similar descriptors are matched between images by using a distance metric (generally Euclidean distance).Due to noise, some of the descriptors might not be matched correctly.They are called Outliers, and they need to be eliminated.To do so, usually robust estimation algorithms (such as Random Sample Consensus (RANSAC) [9]) are employed.Once the outliers are removed, consistent registration parameters (transformation or motion) between images can be estimated [10] and referred to as relative transformation.Once the pairwise image registration (both time-consecutive and non-time-consecutive) is completed, a list of overlapping image pairs and relative registration parameters between them become available.The next step in the Feature-based Image Mosaicing (FIM) framework is GA to obtain mosaics.GA is the problem of finding the image-to-mosaic registration parameters that comply best with constraints imposed by all overlapping image pairs.Global projection of images (global or absolute transformation) can be calculated by successively multiplying relative transformations between time-consecutive images if one of the image frames is selected as a global frame.Usually, the first image frame is chosen as a global frame if there is no other relevant information available.Since relative transformations encapsulate some errors due to the positions of correspondences and the estimation methods, accumulation of errors yields a bigger error and results in the form of misalignment and distortions on image size.GA methods are needed to overcome this error accumulation.As GA is one of important steps in the image mosaicing pipeline, there have been several methods proposed.One of the classifications can be done according to in which domain the error is minimized, mosaic or image.The minimization on the mosaic frame has as a drawback the tendency to reduce the size of the mosaiced images, as reducing the size also decreases the error.However, this way of minimization can be done linearly up to affine motion model.While the errors defined on image frame do not suffer this scaling problem, the minimization usually requires a non-linear minimization.Sawhney et al. [11] proposed minimizing the distances between correspondences on the mosaic frame with an additional error term on the diagonals of images in order to overcome scaling problem.However, having constraints on the image size can cause some disturbance on image alignment.Capel [8] formulated the GA problem similarly to Bundle Adjustment (BA) [12] approaches in 3D.Point positions on the mosaic frame and absolute transformations are considered as unknowns.This method requires feature tracking and high-overlapping image pairs in order to have certain amount of tracked features on each image.There are also GA methods [13,14] that apply image-to-mosaic registration, known as "online mosaicing".In such cases, if there appears a failure while mapping one image onto the mosaic, future image-to-mosaic registration is likely to fail.Generally, errors (Euclidean distance between where the relative planar transformation maps a point and the correspondence of the point was originally detected.An illustration can be seen in Figure 1) defined on image frame have been preferred and widely used.However, non-linear minimization can be still regarded as a bottleneck for large-scale mapping applications due to the computational cost and prerequisite of good initial estimate.In this paper, we propose a two-step GA method aimed to be fast and less computationally demanding.The first step is to estimate scale and rotation parameters, while the second step is for estimating translational parameters.We also show that our proposal can be used in order to find an accurate initial estimate for gold-standard GA methods.[10].The error is defined as a sum of distances measured in both image frames.This error term is independent of the selected global frame m.

Nomenclature
• n is the total number of images.
• c ij is the total number of correspondences between images i and j.
• s, θ, t x , and t y represents the scale, rotation (in radians) and translation parameters (in pixels) of a similarity type planar transformation • i H j is the transformation relating image points represented in the coordinate frame image j to the coordinate frame of image i and it consists of parameters ( i s j , i θ j , i t x j , i t y j ).• The transformation from image i to the global frame m is represented with m H i .m H i is composed of parameters (s i , θ i , t x i , t y i ) similarly above.For simplicity, m is dropped in the representation of parameters.
• The first image frame is selected as a mosaic frame.Therefore, m H 1 is identity and m equals to 1.
Parameters for the first image are not considered as unknown.

Two-Step Global Alignment for Feature-Based Image Mosaicing (FIM)
Our GA approach is motivated by the constraints imposed by relative transformations between overlapping images as absolute transformations should meet the constraints introduced by them.We use similarity type planar transformations as they generally contain enough Degrees of Freedom (DOFs) for optical mapping with robotic platforms [15].A pipeline of the proposed GA method is illustrated in Figure 2. We propose a method composed of two steps instead of STEMin as a GA in the FIM pipeline.

Scale and Rotation Estimation
First, we try to obtain the optimum scale and rotation parameters for each image by minimizing the error given in Equation ( 1) over scale and rotation without taking into account the translation parameters: where i and j are the indices of the images that were successfully matched overlapping image pairs.Without taking into translation parts, Equation ( 1) can be rewritten as follows: It should be noted that the E 1 is written as sum of errors in scale and rotation.As there is no dependency between parameters, minimization can be applied separately: where and E 12 can be minimized linearly as similarly done in [16].Operating on Euler angles may suffer from singularities; therefore, we apply non-linear minimization using trigonometric functions coming from the equation i R j − R −1 i • R j , where R represents the two-dimensional rotation matrix.For non-linear minimization, the residual vector is written of as follows: Jacobians of the cost functions E 11 and E 12 are rather simple and can be calculated in block form using the following structural elements: As it can be noted, the following relation between Jacobians hold: This allows for fast computation for the Jacobian blocks.

Translation Estimation
After estimating the scale and rotation parameters by minimizing the error term E 1 = E 11 + E 12 , we minimize STE given in Equation ( 4) over only translation parameters by using the obtained scale and angle parameters: where . j p k = ( j x k , j y k , 1) and i p k = ( i x k , i y k , 1) are the corresponding feature points in overlapping images i and j.Coefficients a and b are as follows: As the error term E 2 is minimized over translation parameters, its Jacobian matrices are also easy to compute: The following relation between Jacobians can be seen easily.
. This relation further reduces the computational cost of Jacobian computation.It can also be noted that Jacobian matrices are free of any feature point positions.This allows Jacobian matrices to be computed fast.

Experimental Results
We have tested our proposal on several datasets obtained by different underwater robotic platforms.Main characteristics of the datasets are summarized in Table 1.Datasets I, III, and VI were obtained using a Flea digital camera (Point Grey, Vancouver, BC, Canada) carried by a Phantom XTL Remotely Operated Vehicle (ROV) (Deep Ocean Engineering, Inc., San José, CA, USA) during a survey of a patch of reef located in the Florida Reef Tract [18].Dataset IV was gathered by the ICTI NEU AUV [17] underwater robot during sea experiments on the Mediterranean coast of Spain.The Dataset VII was acquired by the Girona500 AUV [19] during its operational tests in the pool of the Underwater Robotics Center at the University of Girona.The floor of the pool was covered by a big poster in order to simulate realistic environment.The robot was equipped with different navigation sensors that provide pose information as rotation and translation in 3D.By combining pose with camera calibration information, planar transformations for each image can be computed.These transformations are 8-DOF full projective type and obtained results are presented as Sensor 8-DOFs in Table 2. Furthermore, we registered individual images with respect to the image of the poster and obtained results are presented as Sensor 4-DOFs.Although originally the dataset has 286 images, some of them were discarded due to not having sensor readings and not having sufficient correspondences for successful registration as they were acquired close to the borders.Datasets I, II, V, and VI have some (namely, 6, 15, 5, and 42) time-consecutive images that do not have overlapping areas.This causes accumulation of the motion between time-consecutive images that fails to provide an initial estimate of the trajectory.Range for scale and rotation parameters between overlapping image pairs are also presented along with the characteristics.These values are extracted from the planar transformations, which were computed through image registration process.Angles are computed with atan2 function providing values in [−π/2, π/2].Scale Invariant Feature Transform (SIFT) [20] is employed for feature detection and description while RANSAC is used for outlier rejection and transformation estimation.If there are at least 20 remaining correspondences after outlier rejection, image pairs are counted as successfully matched and included as overlapping image pairs.All the tests were performed using a desktop computer with an Intel Xeon E5-1650 TM 3.5 Ghz processor (Intel Corporation, Santa Clara, CA, USA) with a 64-bit operating system and running MATLAB TM on the CPU.Error minimization is carried out using a Levenberg-Marquadt algorithm through lsqnonlin function.We have provided analytic expressions for computing the Jacobian matrix.For comparison, we minimized the STE (Equation ( 4)) over all parameters for each transformation.STE is also chosen for comparing the results obtained as it is independent of the chosen global frame.Results are summarized in Table 2.The second column shows the tested methods: the proposed method and the standard minimization of STE denoted as STEMin.The third, fourth and fifth columns correspond to the average STE, the standard deviation, and maximum error calculated using all the correspondences over all overlapping image pairs.The last column shows the total time spent for error minimization.
For the proposed method, the time column provides the sum of the time spent by both E 1 and E 2 minimizations.Initial estimates for absolute transformations are provided as identity mappings.They are included as a comparison baseline.We also present the basic statistical measures on absolute differences between scale and rotation parameters estimated by the proposed method and STEMin.
Results are summarized in Table 3. From the results, it can be seen that our proposal was able to obtain similar trajectory accuracy with less computational efforts.This is favorable especially on large datasets and/or mapping applications where computational limits may exist.It should be noted that our proposal also makes use of non-linear minimization, which generally requires a good initial estimate for better and quick convergence.Since our proposal performs relatively fast and requires less computational resources, its result can be also used as an initial estimate for minimizing STE, especially when there is no good initial estimate.From the table, it can be seen that running our proposal and using its result as an initial value for STEMin (combined strategy) provides the same level of accuracy with running minimization directly.It should be also noted that using our proposal accelerates the convergence and this yields reduction of the total computational time.If an initial estimate can be obtained by using some other sensor data (e.g., Global Positioning System (GPS), Doppler Velocity Log (DVL),Ultra Short Base Line (USBL)) and/or through relative transformations, this might improve the final convergence.Although symmetric transfer error helps to infer accuracy about the trajectory, it does not always represent the visual quality of the final mosaic.We use invariant color histograms [22] as a visual comparison of final mosaics as done in [23].To compare histograms of two images a and b, we use the same metric in [22] and the computation of differences between their histograms is given in Equation ( 5): where h c denotes the histogram value for color channel c and computed as follows: where f and g denote derivatives in two color channels [22].
For each dataset, we rendered mosaics using last-on-top strategy.Invariant histograms of rendered mosaics were computed and differences between them were calculated using the Equation ( 5).Mosaics obtained with STEMin are used as comparison baseline (i.e., histogram b in Equation ( 5)).Results are presented in Table 4. Dataset IV was excluded for this comparison as it was composed of grayscale images.Final mosaics for Dataset IV and Dataset VI are given in Figures 3 and 4

Conclusions
Large-area optical mapping through image mosaicing has been in demand from different science communities.One the most challenging steps in the Feature-Based Image Mosaicing pipeline is GA, which requires non-linear minimization.In this paper, we present a two-step GA method aimed to be fast and less computationally demanding.The first step is for estimating scale and rotation parameters, while the second step is to obtain optimum translation parameters.We present experimental and comparative results over different challenging datasets obtained with robotic platforms aimed for optical mapping.Our proposal can be used as a standalone GA method and can also be used as a better initial estimate provider to STE minimization for quicker convergence.Our proposal is suitable for typical surveys with robotic platforms.Experiments on seven challenging different underwater datasets have been reported and have showed the efficiency of the proposed approach.

Figure 2 .
Figure 2. Pipeline for FIM with Two-Step GA method on top and FIM with STEMin on bottom.We propose a method composed of two steps instead of STEMin as a GA in the FIM pipeline. .

Figure 3 .
Figure 3. Final mosaics of Dataset IV.(a) the mosaic (3187 × 2602 pixels) obtained with proposed method; (b) the mosaic (3295 × 2674 pixels) with STEMin.Mosaics were blended using a combination of gradient domain imaging and graph cut algorithms [24].

Table 1 .
Properties of datasets used in experiments.The numbers reported in this column are computed over overlapping image pairs given in the fifth column of the table.

Table 2 .
Summary of results.STEMin represents direct minimization of STE.Strategy Combined denotes STEMin using the results of the proposed method as an initial estimate.

Table 3 .
Differences between scale and rotation parameters estimated by the proposed method and STEMin.

Table 4 .
Summary of the invariant histogram comparisons.