Robust Mosaicking of Lightweight UAV Images Using Hybrid Image Transformation Modeling

This paper proposes a robust feature-based mosaicking method for images obtained by lightweight unmanned aerial vehicles (UAVs). The imaging geometry of small UAVs is characterized by unstable flight attitudes and low flight altitudes. These can degrade mosaicking performance by causing insufficient overlaps, tilted images, and biased tiepoint distributions. To address these problems, we introduce the tiepoint area ratio (TAR) as a geometric stability indicator and orthogonality as an image deformation indicator. The proposed method estimates pairwise transformations between adjacent images using optimal transformation models selected by geometric stability analysis. It then estimates global transformations from the optimal pairwise transformations so as to maximize geometric stability between adjacent images and minimize mosaic deformation. Experiments with two independent image datasets showed that a TAR value of about 0.3 is a valid criterion for selecting the optimal transformation model. A performance evaluation confirmed that the problems caused by the imaging geometry of small UAVs do occur in real image datasets, and showed that the proposed method reliably produces image mosaics for datasets obtained in both general and extreme imaging environments.


Introduction
Lightweight unmanned aerial vehicles (UAVs) are widely used as remote sensing platforms for acquiring high-spatial-resolution images. Compared with conventional remote sensing platforms, such as aircraft and satellites, the most distinctive features of UAVs are low-altitude flight and ease of control. However, higher spatial resolution comes at the cost of a smaller ground area covered by each image, so many UAV images are needed to analyze a wide target area. As a result, image mosaicking is regarded as an essential task in UAV applications.
Image mosaicking methods can be classified into spatial-data-based methods and feature-based methods. The former generate mosaic images using digital surface models (DSMs) or ground control points (GCPs) [1][2][3][4][5][6], whereas the latter generate mosaic images from tiepoints between adjacent images [7][8][9][10]. In many remote sensing applications, spatial-data-based methods are preferred because they can produce georeferenced or orthorectified mosaic images. However, feature-based methods can also be used effectively to investigate disaster regions, such as fire, flood, and earthquake zones, and polar regions, such as icebergs, glaciers, and sea ice. In these cases, it is of paramount importance to report on-site situations to decision-makers quickly, and spatial-data-based methods require excessive time to construct DSMs or acquire GCPs [11,12].
In this paper, we present a feature-based mosaicking method to further improve the utilization of lightweight UAVs in extreme environments. Existing studies have sought to improve both the speed and the accuracy of image mosaicking. Mosaicking speed has been improved by reducing the processing time of tiepoint extraction, which is the most computationally demanding step. Moussa and El-Sheimy [13] minimized the number of matching image pairs by structuring UAV images through Delaunay triangulation. Mehrdad et al. [14] reduced tiepoint extraction regions using epipolar geometry established from initial camera parameters. Faraji et al. [15] minimized the computation of tiepoint extraction using reference images of the target areas. Mosaicking accuracy, on the other hand, has been improved by minimizing the accumulated errors and distortions that are likely to occur in image transformation estimation. Moussa and El-Sheimy [13] minimized accumulated errors using the proximity among images. Xu et al. [9] minimized image distortions by keeping image transformations as close as possible to rigid transformations. Mehrdad et al. [14] mitigated accumulated errors by estimating image transformations that minimize the reprojection errors of tiepoints.
The studies described above have contributed to improving the performance of image mosaicking. However, poor imaging environments have not been fully considered. The existing methods assume high overlaps and well-distributed tiepoints between adjacent images when establishing image transformations. Conventional remote sensing platforms can easily meet these requirements, but lightweight UAVs might not. Because UAVs are sensitive to changes in wind direction and speed, overlaps between adjacent images taken during a flight may not be well maintained [16,17]. In addition, because UAVs fly at low altitudes, tiepoint distributions may be biased, especially over low-textured surfaces [16]. In this situation, even if overlaps are sufficient, the accuracy of transformations may be reduced by the biased tiepoint distributions. Therefore, in this paper, we investigate a new, robust mosaicking method that can handle the problems caused by the imaging geometry characteristics of lightweight UAVs. In one of our previous studies, we found that applying a simple transformation model, such as an affine model, could yield better results for narrowly overlapping image pairs than applying a more sophisticated model, such as a homography model [18]. In a subsequent study, we examined the possibility of using transformation models selectively [19]. Based on these findings, we propose an image mosaicking method that establishes optimal transformations while minimizing mosaic deformation. In addition, we experimentally demonstrate these problems and analyze their effects on mosaicking results. The proposed method estimates pairwise transformations between adjacent images. The optimal transformation model for each pair is derived from a geometric stability indicator that considers both overlap and tiepoint distribution simultaneously.
The proposed method then estimates global transformations from optimal pairwise transformations that maximize geometric stability between adjacent images and minimize mosaic deformation. The criterion for assessing geometric stability in selecting an optimal transformation model was determined through experiments using two independent image datasets. Performance evaluations were conducted using a highly overlapping image dataset and an inconsistently overlapping image dataset.

Dataset
Three image datasets were used to develop and evaluate the proposed method: Dataset-1A, Dataset-1B, and Dataset-2. These datasets were obtained for research purposes by experts holding flight authorizations. Dataset-1A and Dataset-1B each consist of a single strip of images acquired by a small drone, a DJI S900 (DJI, Shenzhen, China), with a total weight of 3.3 kg and a maximum flight time of 18 min. The images in Dataset-1A and Dataset-1B were used to determine the criterion of geometric stability for selecting optimal transformation models between adjacent images. Figure 1 and Table 1 show the image acquisition information for these two datasets. Dataset-2 consists of multistrip images obtained by a small drone, SmartOne (SmartPlanes, Skellefteå, Sweden), with a total weight of 1.5 kg and a maximum flight time of 50 min. The images in Dataset-2 were used to evaluate the proposed method. Figure 2 and Table 2 show the detailed information for Dataset-2. Performance evaluations were conducted for both general and poor imaging environments. For a general imaging environment, all images in Dataset-2 were used, as seen in Figure 2a. For an extreme imaging environment, a subset of Dataset-2 with inconsistent overlaps was used, as seen in Figure 2b.
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 18

Proposed Method
The proposed mosaicking method consists of two parts: tiepoint extraction and hybrid transformation modeling. Conventional mosaicking methods often include an image-blending process to mitigate discrepancies between pixel values in overlapping image regions. However, because we focus on robustness to poor imaging environments, this additional process is not considered here. Figure 3 shows the workflow of the proposed method. The entire process for image mosaicking and evaluation was implemented in C++ using the OpenCV library (ver. 2.4.9). Microsoft Office 2016 was used for statistics and graph visualization.

In the first part of our method, tiepoints were extracted to estimate pairwise transformations between adjacent images. This process starts by determining the matching image pairs from which tiepoints are extracted, which avoids unnecessary computation on nonoverlapping image pairs. For this purpose, exterior orientation parameter (EOP)-based methods [14,20], a Delaunay triangulation-based method [13], and a graph-based method [21] can be used. Among them, we employed the EOP-based method using data obtained by a global positioning system/inertial navigation system mounted on the UAV. After the matching pairs were determined, tiepoints were extracted using a feature-based method. Because tiepoint extraction for UAV images must be fast enough to process a large number of images and robust to rotation and scale changes between images, we adopted the fast retina keypoint (FREAK) method, a binary descriptor method known to be invariant to rotation and scale changes [22].
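FREAK produces binary descriptors, which are compared with the Hamming distance (the number of differing bits). The C++ sketch below illustrates brute-force mutual nearest-neighbour matching of such descriptors; it is a minimal illustration under our own assumptions, not the authors' implementation. The 512-bit descriptor length, the helper names `crossCheckMatch` and `nearestIndex`, and the `maxDist` threshold are hypothetical choices.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Assumption: 512-bit binary descriptors, as produced by FREAK's default configuration.
using BinaryDescriptor = std::bitset<512>;
using Match = std::pair<std::size_t, std::size_t>;  // (index in A, index in B)

// Hamming distance: the number of bits in which two descriptors differ.
inline std::size_t hamming(const BinaryDescriptor& a, const BinaryDescriptor& b) {
    return (a ^ b).count();
}

// Brute-force search for the descriptor in `set` closest to `query`.
inline std::size_t nearestIndex(const BinaryDescriptor& query,
                                const std::vector<BinaryDescriptor>& set) {
    assert(!set.empty());
    std::size_t best = 0;
    std::size_t bestDist = std::numeric_limits<std::size_t>::max();
    for (std::size_t j = 0; j < set.size(); ++j) {
        const std::size_t d = hamming(query, set[j]);
        if (d < bestDist) {
            bestDist = d;
            best = j;
        }
    }
    return best;
}

// Keep a candidate tiepoint only if A[i] and B[j] are each other's nearest
// neighbour (cross-check) and their distance does not exceed maxDist.
inline std::vector<Match> crossCheckMatch(const std::vector<BinaryDescriptor>& descA,
                                          const std::vector<BinaryDescriptor>& descB,
                                          std::size_t maxDist) {
    std::vector<Match> matches;
    if (descA.empty() || descB.empty()) return matches;
    for (std::size_t i = 0; i < descA.size(); ++i) {
        const std::size_t j = nearestIndex(descA[i], descB);
        if (nearestIndex(descB[j], descA) == i && hamming(descA[i], descB[j]) <= maxDist) {
            matches.emplace_back(i, j);
        }
    }
    return matches;
}
```

Cross-checking is one simple way to suppress ambiguous matches; in an OpenCV pipeline, `cv::BFMatcher` with `cv::NORM_HAMMING` and its `crossCheck` flag provides the same behaviour without hand-written loops.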
After tiepoint extraction, image transformations were established via hybrid transformation modeling. We divided image transformations into pairwise (image-to-image) transformations and global (image-to-mosaic) transformations. Because global transformations are derived from pairwise transformations, the performance of image mosaicking depends on the accuracy of the pairwise transformations [7][8][9]. In the proposed method, each pairwise transformation is established using one of two transformation models: the affine transformation model or the homography model.
An affine transformation with six degrees of freedom (DOF) can describe scale, rotation, translation, and skew between two image planes in two-dimensional (2D) space as follows:

$$x' = a_1 x + a_2 y + a_3, \qquad y' = a_4 x + a_5 y + a_6$$

where (x, y) and (x', y') are the image coordinates and their transformed coordinates, respectively, and $a_1, \ldots, a_6$ are the affine parameters. A homography with eight DOF, in contrast, can explain general motions between two image planes in 3D space as follows:

$$x' = \frac{h_1 x + h_2 y + h_3}{h_7 x + h_8 y + 1}, \qquad y' = \frac{h_4 x + h_5 y + h_6}{h_7 x + h_8 y + 1}$$

where $h_1, \ldots, h_8$ are the homography parameters. Because it can model these more general motions, the homography model is generally considered more appropriate than the affine transformation model for estimating pairwise transformations [7,9]. However, this presupposes high overlaps and well-distributed tiepoints between adjacent images, whereas small UAVs may not meet these requirements because of unstable flight attitudes and low flight altitudes. In such cases, conventional methods for estimating pairwise transformations may fail to produce reliable results. Figure 4 shows an example in which the tiepoint distribution becomes biased as the flight altitude decreases: the features in the aircraft image are evenly distributed (Figure 4a), whereas the features in the UAV image are concentrated in some areas (Figure 4b). Features are usually extracted from textured surfaces, and tiepoints between adjacent images are determined by matching the features of each image. Therefore, if textures are nonuniformly distributed within the overlapping areas, the tiepoint distributions may also be biased. In this regard, the aircraft image would have well-distributed tiepoints in any overlapping area, provided that it has sufficient overlaps with other images. In contrast, the UAV image may have biased tiepoint distributions if its overlapping areas fall on the left side, which has low-textured surfaces. This suggests that, because of low flight altitudes, overlapping areas in UAV images may contain a relatively high proportion of low-textured surfaces, and thus tiepoint distributions may be more biased.
Strongly biased tiepoint distributions can lead to transformations that are overfitted to some areas.
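To make the relationship between the two models concrete, the following minimal C++ sketch applies each model to a point. The struct and function names are hypothetical, and the homography is stored as a row-major 3 x 3 matrix with its last element normally fixed to 1. An affine transformation is the special case of a homography whose last row is (0, 0, 1), so the homography's two extra DOF are exactly the ones that capture perspective effects.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point2 {
    double x;
    double y;
};

// 6-DOF affine: x' = a[0] x + a[1] y + a[2], y' = a[3] x + a[4] y + a[5].
using Affine = std::array<double, 6>;
// Row-major 3x3 homography; with h[8] = 1 it has 8 DOF.
using Homography = std::array<double, 9>;

inline Point2 applyAffine(const Affine& a, Point2 p) {
    return {a[0] * p.x + a[1] * p.y + a[2], a[3] * p.x + a[4] * p.y + a[5]};
}

// Projective mapping: divide by the third homogeneous coordinate w.
inline Point2 applyHomography(const Homography& h, Point2 p) {
    const double w = h[6] * p.x + h[7] * p.y + h[8];
    assert(std::fabs(w) > 1e-12 && "point maps to infinity");
    return {(h[0] * p.x + h[1] * p.y + h[2]) / w,
            (h[3] * p.x + h[4] * p.y + h[5]) / w};
}

// Embed an affine transformation as a homography with last row (0, 0, 1).
inline Homography affineAsHomography(const Affine& a) {
    return {a[0], a[1], a[2], a[3], a[4], a[5], 0.0, 0.0, 1.0};
}
```

For any point, `applyAffine(a, p)` and `applyHomography(affineAsHomography(a), p)` give the same result, whereas a nonzero perspective term (h[6] or h[7]) makes the mapping position-dependent.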
For these reasons, in our method, the pairwise transformation between two images is established with an optimal model selected by geometric stability analysis. For high geometric stability, homography is applied as a precision model; for low geometric stability, affine transformation is applied as a robust model. The overlapping area ratio (OAR) or the number of tiepoints (NoT) can be used as a confidence indicator for geometric stability analysis. However, these indicators cannot represent tiepoint distributions. Thus, we introduced a new geometric stability indicator, the tiepoint area ratio (TAR), which considers both overlap and tiepoint distribution simultaneously. This indicator is defined as the ratio of the tiepoint area to the entire image area, as shown in Figure 5. The tiepoint area is the overlapping region covered by tiepoints and consists of the Delaunay triangles formed by the tiepoints. Thus, the TAR is formulated as

$$\mathrm{TAR} = \frac{1}{WH}\sum_{i=1}^{N}\frac{1}{2}\left| x_1^i\left(y_2^i - y_3^i\right) + x_2^i\left(y_3^i - y_1^i\right) + x_3^i\left(y_1^i - y_2^i\right) \right|$$

where W and H are the width and height of the image, respectively, N is the number of Delaunay triangles, and $(x_1^i, y_1^i)$, $(x_2^i, y_2^i)$, and $(x_3^i, y_3^i)$ are the image coordinates of the three vertices of the ith triangle.
The criterion for the TAR in selecting an optimal transformation model was determined by correlation analysis between transformation errors and the TAR values. The experiments are covered in Section 3.1.
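The TAR computation can be sketched in C++ as follows, assuming the Delaunay triangles of the tiepoints are already available (one possible source is OpenCV's `cv::Subdiv2D::getTriangleList`, after discarding any triangles that touch its virtual outer vertices). The type and function names are hypothetical. Each triangle area is obtained with the shoelace formula, and the total is divided by the image area W x H.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt {
    double x;
    double y;
};

struct Triangle {
    Pt v1, v2, v3;  // vertex image coordinates (tiepoints)
};

// Triangle area from its vertex coordinates (shoelace formula).
inline double triangleArea(const Triangle& t) {
    return 0.5 * std::fabs(t.v1.x * (t.v2.y - t.v3.y) +
                           t.v2.x * (t.v3.y - t.v1.y) +
                           t.v3.x * (t.v1.y - t.v2.y));
}

// Tiepoint area ratio: summed Delaunay-triangle area over the image area W * H.
inline double tiepointAreaRatio(const std::vector<Triangle>& tris, double W, double H) {
    assert(W > 0.0 && H > 0.0);
    double area = 0.0;
    for (const Triangle& t : tris) area += triangleArea(t);
    return area / (W * H);
}
```

For example, two triangles that tile a 100 x 50 pixel block inside a 200 x 100 pixel image give a TAR of 0.25, regardless of how many tiepoints lie inside that block, which is how the indicator captures spatial coverage rather than tiepoint count.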
Global transformations of individual images can be established by concatenating pairwise transformations between adjacent images. Because errors in the pairwise transformations propagate through this concatenation, an optimization method is required to minimize error accumulation. To this end, graph methods [23][24][25] and bundle adjustment methods [7,9,15,26,27] have been proposed. We adopted a modified graph method to ensure efficiency and robustness in image mosaicking. Graph methods generally consist of maximum spanning tree (MST) generation and mosaic plane selection. An MST identifies the optimal image pairs that minimize error accumulation when pairwise transformations are concatenated, and it is generally derived using NoT as the weight [21,28]. The mosaic plane is a common 2D plane onto which the raw images are reprojected, and it is determined by the root image that minimizes the depth of the derived MST [23][24][25]. In general cases, graph methods lead to satisfactory mosaicking results. However, in extreme imaging environments, they may not guarantee acceptable performance. Thus, we modified an existing graph method to account for the imaging geometry characteristics of small UAVs.
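MST generation can be sketched with Kruskal's algorithm: candidate image pairs are sorted by weight in descending order and accepted whenever they connect two previously unconnected groups of images. This is a minimal illustration under our own assumptions, not necessarily the exact graph method of [21,28]; `PairEdge`, `DisjointSet`, and `maximumSpanningTree` are hypothetical names, and the per-pair weight may be NoT or the TAR.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// One candidate image pair with its geometric-stability weight.
struct PairEdge {
    int a;
    int b;
    double weight;
};

// Union-find with path halving, used to detect cycles.
class DisjointSet {
public:
    explicit DisjointSet(int n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), 0);
    }
    int find(int v) {
        while (parent_[v] != v) {
            parent_[v] = parent_[parent_[v]];
            v = parent_[v];
        }
        return v;
    }
    bool unite(int u, int v) {
        u = find(u);
        v = find(v);
        if (u == v) return false;  // already connected: edge would form a cycle
        parent_[u] = v;
        return true;
    }

private:
    std::vector<int> parent_;
};

// Kruskal's algorithm on descending weights yields a maximum spanning tree
// (or a maximum spanning forest if the pair graph is disconnected).
inline std::vector<PairEdge> maximumSpanningTree(int numImages, std::vector<PairEdge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const PairEdge& l, const PairEdge& r) { return l.weight > r.weight; });
    DisjointSet ds(numImages);
    std::vector<PairEdge> tree;
    for (const PairEdge& e : edges) {
        assert(e.a >= 0 && e.a < numImages && e.b >= 0 && e.b < numImages);
        if (ds.unite(e.a, e.b)) tree.push_back(e);
    }
    return tree;
}
```

Because only the ranking of weights matters to Kruskal's algorithm, swapping NoT for another indicator changes which pairs survive without changing the algorithm itself.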
For MST generation, the TAR was used as the weight instead of NoT or OAR. This prevents unreliable pairwise transformations from being involved in estimating global transformations. For mosaic plane selection, the image that minimizes the deformation of the reprojected images was used as the mosaic plane. This avoids the unnecessary mosaic deformation that can occur when a strongly tilted image is selected as the mosaic plane. Mosaic deformation is calculated from the orthogonality $\theta'_i$ of each transformed image i = 1, …, N, where N is the number of images. Orthogonality is the angle between the two image axes, as seen in Figure 6 [29]. Figure 7a,b shows the mosaicking results when the mosaic plane is properly and improperly selected, respectively.
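The following C++ sketch illustrates orthogonality-based mosaic plane selection under two explicit assumptions of ours, not the paper's: the transformed image axes are taken from the transformed image centre to the transformed midpoints of the right and bottom edges, and mosaic deformation is aggregated as the mean absolute deviation of θ' from 90°. For each candidate root image, the orthogonalities of all images would be computed from their global transformations, and the candidate with the smallest deformation would be chosen. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using H3 = std::array<double, 9>;  // row-major 3x3 global transformation
struct P2 {
    double x;
    double y;
};

inline P2 project(const H3& h, double x, double y) {
    const double w = h[6] * x + h[7] * y + h[8];
    return {(h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w};
}

// Angle (degrees) between the transformed image x- and y-axes, with the axes
// assumed to run from the image centre to the right- and bottom-edge midpoints.
inline double orthogonalityDeg(const H3& h, double W, double H) {
    const double kPi = 3.14159265358979323846;
    const P2 c = project(h, W / 2, H / 2);
    const P2 r = project(h, W, H / 2);
    const P2 b = project(h, W / 2, H);
    const double ux = r.x - c.x, uy = r.y - c.y;
    const double vx = b.x - c.x, vy = b.y - c.y;
    const double cosAng = (ux * vx + uy * vy) / (std::hypot(ux, uy) * std::hypot(vx, vy));
    return std::acos(std::fmax(-1.0, std::fmin(1.0, cosAng))) * 180.0 / kPi;
}

// Assumed aggregate: mean absolute deviation of orthogonality from 90 degrees.
inline double mosaicDeformation(const std::vector<double>& thetasDeg) {
    assert(!thetasDeg.empty());
    double sum = 0.0;
    for (double t : thetasDeg) sum += std::fabs(90.0 - t);
    return sum / static_cast<double>(thetasDeg.size());
}

// Index of the candidate root image whose mosaic plane gives the least deformation.
inline std::size_t selectMosaicPlane(const std::vector<std::vector<double>>& thetasPerRoot) {
    assert(!thetasPerRoot.empty());
    std::size_t best = 0;
    for (std::size_t k = 1; k < thetasPerRoot.size(); ++k) {
        if (mosaicDeformation(thetasPerRoot[k]) < mosaicDeformation(thetasPerRoot[best])) best = k;
    }
    return best;
}
```

An identity transformation yields 90°, while a pure shear such as x' = x + y yields 45°, so a heavily tilted root image shows up directly as a larger deformation score for the whole mosaic.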

Evaluation indicators
The proposed method was evaluated in terms of mosaicking errors and distortions. Mosaicking errors were measured for both pairwise and global transformations. Because the proposed method generates image mosaics using only tiepoints between adjacent images, the mosaicking errors used to evaluate geometric performance are calculated from reprojection errors between adjacent images [21,24,25]. The reprojection error is computed from the Euclidean distances between the observed tiepoints $(x_n, y_n)$ and the corresponding calculated tiepoints $(\hat{x}_n, \hat{y}_n)$, n = 1, …, N, where N is the total number of tiepoints.

Mosaicking distortions were measured as orthogonality differences with respect to reference images derived from the camera parameters of the images. The camera parameters, including the interior parameters (focal length, pixel size, principal points, and lens distortion coefficients) and the exterior parameters (positions and orientations), were obtained using the commercial software Pix4D (ver. 4.4.12). Figure 8 shows the mosaic generated from the reference images; the red lines indicate the boundaries of the individual reference images.
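The sketch below computes the per-tiepoint Euclidean distances and reports two common aggregates of them, the mean distance and the root-mean-square error (RMSE). Treating either aggregate as the exact definition used in this evaluation is an assumption; `Obs`, `ReprojectionError`, and `reprojectionError` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Obs {
    double x;
    double y;
};

struct ReprojectionError {
    double meanDistance;  // mean of the per-tiepoint Euclidean distances
    double rmse;          // root mean square of the same distances
};

// observed[n] holds (x_n, y_n); calculated[n] holds the transformed estimate.
inline ReprojectionError reprojectionError(const std::vector<Obs>& observed,
                                           const std::vector<Obs>& calculated) {
    assert(!observed.empty() && observed.size() == calculated.size());
    double sumDist = 0.0;
    double sumSq = 0.0;
    for (std::size_t n = 0; n < observed.size(); ++n) {
        const double dx = observed[n].x - calculated[n].x;
        const double dy = observed[n].y - calculated[n].y;
        sumDist += std::hypot(dx, dy);
        sumSq += dx * dx + dy * dy;
    }
    const double N = static_cast<double>(observed.size());
    return {sumDist / N, std::sqrt(sumSq / N)};
}
```

The RMSE weights large residuals more heavily than the mean distance, so the two can rank transformations differently when a few tiepoints are badly fitted.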

Criterion Determination for Hybrid Transformation Modeling
The criterion for the TAR in selecting an optimal transformation model was determined by correlation analysis using two independent single image strips, Dataset-1A and Dataset-1B. Correlation analysis between pairwise transformation errors and TAR values requires many image pairs with different overlaps and tiepoint distributions. To obtain them, we created additional image pairs with different conditions from the raw image datasets through overlap adjustment. Overlap adjustment was performed by removing the outer parts of the images, which preserves the perspective property of the frame images. As a result, 710 and 768 image pairs were produced from Dataset-1A and Dataset-1B, respectively. Model tiepoints for estimating pairwise transformations were automatically extracted from the raw image pairs using the FREAK algorithm, whereas check tiepoints for evaluating the transformations were obtained manually. Table 3 shows the number of tiepoints extracted from the raw image pairs, and Figure 9 illustrates the number of tiepoints used for estimating the pairwise transformations of the overlap-adjusted image pairs. Transformations for the overlap-adjusted image pairs were estimated with the affine transformation and homography models from the model tiepoints within their overlapping areas. In contrast, the transformations were evaluated on all check tiepoints, both in the actually overlapping areas and in the truncated areas, so that transformation errors could be analyzed consistently for all overlap-adjusted image pairs. Transformation errors were measured as reprojection errors, computed from the Euclidean distances between the observed check tiepoints $(x_n, y_n)$ and the corresponding calculated check tiepoints $(\hat{x}_n, \hat{y}_n)$, n = 1, …, N, where N is the total number of check tiepoints.

[Table 3: Number of tiepoints extracted from raw image pairs. Columns: Model Tiepoints, Check Tiepoints.]

The results of the correlation analysis between transformation errors and the values of the geometric stability indicators NoT, OAR, and TAR are summarized in Table 4. Reprojection errors increased rapidly as the indicator values decreased, so their relationships could be modeled as power functions. Many previous studies have used NoT or OAR to evaluate the geometric stability between adjacent images [21,23,28,29,30]. However, as seen in Figure 10, NoT and OAR showed relatively large uncertainties in the low geometric stability range (i.e., adjacent images with few tiepoints or narrow overlaps). These results indicate that NoT and OAR may not reliably evaluate geometric stability, especially between UAV images. In contrast, TAR, which considers both overlap and tiepoint distribution simultaneously, showed the highest correlation with reprojection errors. In addition, TAR appropriately reflected changes in reprojection errors even in the low geometric stability range. These results demonstrate that TAR can be used effectively as a geometric stability indicator for estimating pairwise transformations of UAV images. The criterion for the TAR was then determined by comparing the two regression models for affine transformation and homography. As seen in Figure 11, homography-based transformations had smaller reprojection errors than affine-based transformations in the high TAR range, whereas affine-based transformations had smaller reprojection errors in the low TAR range. The TAR value at the reversal point of the reprojection errors was about 0.3 for both Dataset-1A and Dataset-1B. Based on these results, we determined that a TAR value of 0.3 is a reliable criterion for selecting an optimal transformation model.
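The reversal-point analysis can be outlined as a short derivation: fit e = a·t^b for each model, where e is the reprojection error and t the TAR, then solve a_1·t^(b_1) = a_2·t^(b_2), which gives t = (a_2/a_1)^(1/(b_1 - b_2)). The C++ sketch below assumes an ordinary least-squares fit in log-log space, which may differ from the regression actually used; the names are hypothetical, and the default criterion of 0.3 is the value reported above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Power-law model of reprojection error versus TAR: e = a * t^b.
struct PowerLaw {
    double a;
    double b;
};

// Least-squares fit of log(e) = log(a) + b * log(t); requires t, e > 0.
inline PowerLaw fitPowerLaw(const std::vector<double>& t, const std::vector<double>& e) {
    assert(t.size() == e.size() && t.size() >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i) {
        assert(t[i] > 0.0 && e[i] > 0.0);
        const double x = std::log(t[i]);
        const double y = std::log(e[i]);
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    const double n = static_cast<double>(t.size());
    const double denom = n * sxx - sx * sx;
    assert(denom > 0.0 && "TAR values must not all be equal");
    const double b = (n * sxy - sx * sy) / denom;
    return {std::exp((sy - b * sx) / n), b};
}

// TAR at which the two models predict equal error (requires b1 != b2).
inline double crossoverTar(const PowerLaw& m1, const PowerLaw& m2) {
    assert(m1.b != m2.b);
    return std::pow(m2.a / m1.a, 1.0 / (m1.b - m2.b));
}

enum class Model { Affine, Homography };

// Selection rule: homography at or above the criterion, affine below it.
inline Model selectModel(double tar, double criterion = 0.3) {
    return tar >= criterion ? Model::Homography : Model::Affine;
}
```

With made-up models e = 2·t^(-0.5) for affine and e = 0.5·t^(-1.5) for homography (illustrative numbers, not the fitted values behind Table 4), the crossover is t = 0.25: homography predicts the smaller error above it and affine below it, mirroring the qualitative pattern in Figure 11.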
Table 4. Correlation analysis results from the number of tiepoints (NoT), overlapping area ratio (OAR), and tiepoint area ratio (TAR) for Dataset-1A and Dataset-1B.

Mosaicking Performance Evaluation
We analyzed mosaicking performance for both the general and the poor imaging environments shown in Figure 2. In addition, we compared the proposed method with traditional affine transformation-based and homography-based mosaicking methods. In these comparative methods, MSTs were generated using either NoT or OAR as the edge weight, and mosaic planes were determined by the root images that minimize the depth of the MSTs. We also implemented these comparative methods ourselves in C++ with the OpenCV library (ver. 2.4.9).
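The MST step of the comparative methods can be sketched as follows. Because larger NoT or OAR values indicate a more stable pair, the sketch builds a maximum-weight spanning tree with Kruskal's algorithm (equivalent to an MST on negated weights) and then selects as mosaic plane the root image that gives the smallest tree depth. The maximum-weight formulation, the tie-breaking rule, and all function names are our assumptions for illustration; this Python code is not the authors' C++/OpenCV implementation.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[float, int, int]  # (stability weight such as NoT or OAR, image i, image j)


def max_spanning_tree(num_images: int, edges: List[Edge]) -> Dict[int, List[int]]:
    """Kruskal's algorithm, taking the most stable image pairs first."""
    parent = list(range(num_images))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree: Dict[int, List[int]] = {i: [] for i in range(num_images)}
    for _, i, j in sorted(edges, key=lambda e: e[0], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # skip pairs that would close a cycle
            parent[ri] = rj
            tree[i].append(j)
            tree[j].append(i)
    return tree


def tree_depth(tree: Dict[int, List[int]], root: int) -> int:
    """Largest number of edges between the root and any reachable image (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in tree[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return max(depth.values())


def min_depth_root(tree: Dict[int, List[int]]) -> int:
    """Root image that minimizes tree depth; ties go to the smallest index."""
    return min(tree, key=lambda r: (tree_depth(tree, r), r))


if __name__ == "__main__":
    # Hypothetical weights for a strip of four images, not measured values.
    edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 2, 3), (0.2, 0, 2), (0.1, 0, 3)]
    tree = max_spanning_tree(4, edges)
    print(tree)                  # {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(min_depth_root(tree))  # 1
```

Note that this root rule looks only at tree topology, so nothing in it prevents a tilted image from becoming the mosaic plane.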
Model tiepoints for estimating pairwise transformations were acquired using the FREAK algorithm, as in the experiment described above. Check tiepoints for performance evaluation, on the other hand, were extracted in two steps. We first extracted initial tiepoints using the scale-invariant feature transform (SIFT) algorithm [31], which takes more processing time but extracts tiepoints more accurately. We then selected the tiepoints observed in three or more images as check tiepoints. Although this procedure cannot obtain a large number of tiepoints, it secures reliable ones. Table 5 shows the results of tiepoint extraction. The evaluation results for the general imaging environment are summarized in Table 6, where the ranges of NoT, OAR, and TAR were calculated for the optimal image pairs and the mosaicking errors were calculated for all adjacent image pairs. The reprojection errors in these results may appear relatively large compared with those reported in existing studies [21,25]. However, they are due to large relief displacements caused by high elevation changes and low flight altitudes; note that the existing studies mostly used images taken at high altitudes over flat areas. In this experiment, the homography model was more effective than the affine transformation model for pairwise transformation modeling, as reported in existing studies [7,9], and OAR was more appropriate than NoT for MST generation and mosaic plane selection. Consequently, among the comparative methods, the mosaicking method using the homography model and OAR showed the best performance.
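Selecting tiepoints observed in three or more images amounts to chaining pairwise SIFT matches into multi-image tracks. A minimal union-find sketch follows. The (image index, keypoint index) tuple layout, the rule that drops tracks hitting the same image twice, and the function names are illustrative assumptions, not details given in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Observation = Tuple[int, int]  # (image index, keypoint index), hypothetical layout


def build_tracks(matches: List[Tuple[Observation, Observation]]) -> List[List[Observation]]:
    """Chain pairwise matches into multi-image tracks with union-find."""
    parent: Dict[Observation, Observation] = {}

    def find(x: Observation) -> Observation:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups: Dict[Observation, List[Observation]] = defaultdict(list)
    for obs in list(parent):
        groups[find(obs)].append(obs)
    return list(groups.values())


def select_check_tiepoints(matches: List[Tuple[Observation, Observation]],
                           min_images: int = 3) -> List[List[Observation]]:
    """Keep tracks observed in at least `min_images` distinct images."""
    selected = []
    for track in build_tracks(matches):
        images = [img for img, _ in track]
        # Assumption: a track containing two keypoints of the same image is
        # inconsistent (likely a false match), so it is discarded.
        if len(set(images)) != len(images):
            continue
        if len(images) >= min_images:
            selected.append(sorted(track))
    return selected


if __name__ == "__main__":
    matches = [((0, 5), (1, 7)), ((1, 7), (2, 3)), ((0, 1), (1, 2))]
    print(select_check_tiepoints(matches))  # [[(0, 5), (1, 7), (2, 3)]]
```

Only the first track spans three images; the second is seen in two images and is rejected, which is why this procedure yields fewer but more reliable check tiepoints.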
Meanwhile, the proposed method performed about two times better than the best of the comparative methods. In this experiment, the proposed method established all pairwise transformations of the optimal image pairs with the homography model, because all TAR values derived for the optimal image pairs were higher than the criterion for hybrid transformation modeling (i.e., a TAR value of 0.3). This means that the performance gain of the proposed method came from global transformation modeling, and it shows that mosaicking accuracy can vary greatly depending on how the optimal image pairs are constructed. If image pairs with large errors are included among the derived optimal image pairs, their errors propagate to all the images connected to them [32]. In fact, all cases with the same transformation model yielded the same pairwise errors while producing different global errors. Note that the relatively large pairwise error of the proposed method was due to the affine-based transformations, which were excluded in MST generation. These results therefore show that the proposed TAR realistically reflects the geometric stability between adjacent images in MST generation. The scatter plot in Figure 12 also reveals many image pairs with high OAR but low TAR values. This indicates that many image pairs actually have tiepoint distributions biased within a wide overlapping region [16]. In addition, the NoT-based optimal image pairs fell in the low OAR range and the OAR-based optimal image pairs fell in the low NoT range, whereas the TAR-based optimal image pairs showed a balance between NoT and OAR. These results confirm our assumption that TAR can simultaneously consider both overlap and tiepoint distribution.
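The error propagation mentioned above follows from how global transformations are built from the optimal pairs: the transformation of each image to the mosaic plane is the product of the pairwise transformations along its MST path to the root. The Python sketch below assumes 3×3 matrices that map child-image coordinates into parent-image coordinates (a hypothetical convention and data layout) and shows that a 2-pixel bias in one pair shifts every image downstream of that pair by the same amount.

```python
from collections import deque
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def global_transforms(tree: Dict[int, List[int]], root: int,
                      pairwise: Dict[Tuple[int, int], Matrix]) -> Dict[int, Matrix]:
    """Chain pairwise transforms from every image up to the root (mosaic plane).

    pairwise[(child, parent)] maps child-image coordinates into parent-image
    coordinates, so G_child = G_parent @ H_child->parent.
    """
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    result = {root: identity}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in tree[parent]:
            if child not in result:
                result[child] = matmul3(result[parent], pairwise[(child, parent)])
                queue.append(child)
    return result


def shift(dx: float) -> Matrix:
    """Pure x-translation, used only for the toy example."""
    return [[1.0, 0.0, dx], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    tree = {0: [1], 1: [0, 2], 2: [1]}  # strip 0-1-2 with image 0 as mosaic plane
    exact = {(1, 0): shift(100.0), (2, 1): shift(100.0)}
    biased = {(1, 0): shift(102.0), (2, 1): shift(100.0)}  # 2-pixel error on pair (1, 0)
    for name, pw in (("exact", exact), ("biased", biased)):
        g = global_transforms(tree, 0, pw)
        print(name, g[1][0][2], g[2][0][2])
    # exact 100.0 200.0
    # biased 102.0 202.0
```

Image 2 inherits the error of pair (1, 0) even though its own pairwise transformation is exact, which is why a poorly chosen optimal pair degrades every image connected through it.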
These results were also confirmed visually by the distance errors shown in Figure 13, where only distance errors larger than 50 pixels are displayed.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 18
The proposed method also showed the best performance in terms of mosaic distortion. Compared with the reference result in Figure 11, the mosaic produced by the proposed method had the smallest deformation, which demonstrates the effectiveness of the proposed method in mosaic plane selection. The comparative methods generated MSTs with NoT or OAR as a weight and then determined mosaic planes from the root images that minimize the depth of the generated MSTs [21,23,28-30]. This approach may be reasonable, in that the image with the highest geometric stability with respect to its adjacent images is set as the mosaic plane [21]. However, it does not take into account the imaging characteristics of small UAVs, which are sensitive to environmental changes. Accordingly, the existing methods may select a relatively tilted image as the mosaic plane [9,13]. This concern was actually realized, as shown in Figure 13a,b.
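A tilted mosaic plane shows up as distortion of the other images once they are warped into it. As a rough, hypothetical proxy (not necessarily the orthogonality indicator defined in the paper's method section), the Python sketch below warps the four corners of an image with a 3×3 transformation and reports the largest departure of the resulting corner angles from 90°; a smaller value indicates a less deformed frame. The function names and the use of corner angles are our assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def warp_point(h: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through a 3x3 planar transform in homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def max_corner_angle_deviation(h: Matrix, width: float, height: float) -> float:
    """Largest |angle - 90 deg| over the four corners of the warped image frame."""
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    warped = [warp_point(h, x, y) for x, y in corners]
    worst = 0.0
    for i, (cx, cy) in enumerate(warped):
        px, py = warped[i - 1]        # previous corner (wraps around for i == 0)
        nx, ny = warped[(i + 1) % 4]  # next corner
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        worst = max(worst, abs(angle - 90.0))
    return worst


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shear = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(round(max_corner_angle_deviation(identity, 100, 100), 6))  # 0.0
    print(round(max_corner_angle_deviation(shear, 100, 100), 6))     # 45.0
```

A candidate mosaic plane could then be compared, for example, by the total deviation of all images warped into it, although how the paper aggregates its indicator is not shown in this section.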
The evaluation results for the poor imaging environment are summarized in Table 7. In this experiment, all methods generated the same MSTs because the overlaps among the images were generally small; this can be seen from the fact that cases with the same transformation model yielded the same global error. Therefore, this experiment focused on the performance of pairwise transformation modeling. The results showed that the affine transformation model can provide better performance than the homography model. This contrasts with the results for the general imaging environment and supports our assumption that a simple transformation model yields better results than a precise transformation model for images with poor geometric stability. The proposed method again produced the best performance. This means that the proposed method could adaptively apply optimal transformation models through hybrid transformation modeling and that the derived TAR value of 0.3 is valid as a criterion for optimal transformation model selection. Because this criterion was derived from two independent image datasets, we expect it to be applicable in general.
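The hybrid selection rule reduces to a threshold test on TAR. The Python sketch below assumes, for illustration only, that TAR is the area of the convex hull of a pair's tiepoints divided by the image area; TAR is defined in the paper's method section, and this exact form is our assumption. The hull is computed with Andrew's monotone chain, its area with the shoelace formula, and a TAR at or above 0.3 selects the homography (treating the boundary value this way is also an assumption).

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; degenerate polygons have zero area."""
    if len(poly) < 3:
        return 0.0
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def tiepoint_area_ratio(tiepoints: List[Point], width: float, height: float) -> float:
    """Assumed TAR: convex-hull area of the tiepoints over the image area."""
    return polygon_area(convex_hull(tiepoints)) / (width * height)


def select_model(tar: float, criterion: float = 0.3) -> str:
    """Hybrid modeling: precise model for stable pairs, simple model otherwise."""
    return "homography" if tar >= criterion else "affine"


if __name__ == "__main__":
    clustered = [(0, 0), (50, 0), (50, 50), (0, 50), (25, 25)]
    tar = tiepoint_area_ratio(clustered, 100, 100)
    print(tar, select_model(tar))  # 0.25 affine
```

Under this assumed definition, tiepoints clustered in one quarter of a 100 × 100 image give TAR = 0.25, so the pair falls back to the affine model even if its overlap is wide.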
The mosaic produced by the proposed method was better than those of the comparative methods, both quantitatively and qualitatively. The homography-based results showed large distortions and inconsistencies in the outer images with small overlaps, as seen in the red circles in Figure 14b, and the affine transformation-based results showed some inconsistencies in regions where multiple images overlap, as seen in the red circle in Figure 14a.

Conclusions
We developed a robust image mosaicking method that can handle problems caused by the imaging-geometry characteristics of small UAVs. In this paper, these problems were defined as insufficient overlaps and tilted images caused by unstable flight attitudes, and biased tiepoint distributions caused by low-altitude flights. The proposed method estimated pairwise transformations with optimal transformation models selected by geometric stability analysis between adjacent images. As a geometric stability indicator, TAR was introduced to consider both overlap and tiepoint distribution simultaneously. The valid criterion for TAR was found to be about 0.3, based on experiments with two independent image datasets. After pairwise transformation modeling between adjacent images, the proposed method estimated global transformations from the MST generated by TAR analysis and the mosaic plane selected by orthogonality analysis. The experimental results showed that the problems raised in this paper can actually occur in image datasets obtained by small UAVs and that the proposed method can reliably produce image mosaics for image datasets obtained in both general and extreme imaging environments.
The proposed method requires neither prerequisites in image acquisition nor user intervention in image mosaicking. These advantages would make it possible even to mosaic UAV images obtained from a manual flight without support from a global navigation satellite system. Accordingly, the proposed method could be widely used to quickly and correctly assess situations at sites where the use of existing spatial data and direct access are limited, such as disaster areas and polar regions. In addition, TAR as proposed in this paper was found to be very effective in evaluating geometric stability between adjacent images. Identifying geometric stability is an important issue in many multi-image processing techniques, such as structure-from-motion (SfM). Thus, TAR itself also has significant potential for improving many applications in photogrammetry and computer vision.