Image Mosaicing Applied on UAVs Survey

The use of UAV (unmanned aerial vehicle) technology has allowed for advances in the area of robotics in control processes and application development. Such is the case of image processing, in which aerial photographs taken by these aircraft make it possible to perform surveillance and monitoring tasks. As an example, we can mention the use of aerial photographs for the generation of panoramic images through the process of stitching images without losing image resolution. Some applications are photogrammetry and mapping, where the main problems to be solved are image alignment and ghosting, for which different stitching techniques can be applied. These methodologies can be categorized into direct methods or feature-based methods. This paper aims to give an overview of the mosaicing techniques most frequently applied in UAVs, providing an introduction for those interested in developing in this area. For this purpose, a summary of the most applied techniques and their applications is given, showing the trend of the research field and the contribution of different countries over time.


Introduction
The aerial mosaic has different applications, such as surveillance mapping and tracking [1][2][3][4][5], search and rescue [6,7], 3D scene reconstruction [8,9], inspection in heritage and archeological applications [10][11][12], and vegetation and forest surveillance [13][14][15]. For these applications, aerial mosaic panorama generation is applied to stitch multiple images into a single image based upon overlapped regions [16,17]. Different approaches have been developed for the stitching process, for example, direct methods (pixel-based) [18][19][20] and feature-based methods [21], or mosaicing based on registration and mosaicing based on blending [22]. In aerial panoramas, image acquisition can be performed by satellites or UAV systems. Satellite technology provides a higher coverage area than that of other systems such as UAVs [23], and satellite image acquisition is faster than that of UAVs [24]. However, there are important factors to evaluate for the use of satellite image acquisition. Firstly, if an analysis of a specific section is required, it is necessary to check whether any satellite is available for the specific coordinates or has recent information on the area of interest. Additionally, if the area is small, the resolution obtained when zooming in on it will be lower than that of a medium-size UAV camera. A further consideration is the dependence of the acquisition on cloud conditions, even though satellites, unlike UAVs, are not exposed to inclement weather such as storms. For satellite aerial panoramas, the methods for mosaicing images are based on cross-correlation, Fourier-based, phase correlation, and area-based approaches [25][26][27][28][29]. In the case of UAVs, aerial image panoramas are mainly based on feature-based methods [1,3], due to their flexibility to fly in a specific

Panorama Generation
The basis for image stitching is to relate two images using a geometric model that describes the motion from one image to another; the motion that best fits this relation is the projective transformation, also called the homography matrix [36], which gives an eight-parameter alignment model that preserves straight lines [37,38]. For feature-based methods, the most acknowledged approaches are global single transformation and local hybrid transformation [39]. The sequence followed by these techniques to generate a mosaic is shown in Figure 1. The first stage is image acquisition. This can be achieved by using one camera for translational or rotational acquisition, as shown in Figure 2. This task can be performed in different ways: by using a moving camera, by using more than one camera [40][41][42] fixed on a frame to acquire multiple images at once from different angles, or by using a video camera sequence [43]. To relate the images, it is important to obtain the camera parameters, such as focal length, that are used in the perspective and projection algorithms [44,45]. The second stage is feature registration, where different features are detected and matched. These features can be points, lines, or their combinations in general [46]. The third stage is transformation estimation. Once features are established, a registration of both images is created from the features detected. Some cases lead to a mismatch between the key points. For this reason, different algorithms are used to search for the features with the closest distances between the images, such as KD-tree, k-nearest neighbor (KNN) pattern classification, and Hamming distance [2][47][48][49].
The homography maps a point $x$ in one image to its corresponding point $x'$ in the other image as $\tilde{x}' = H\tilde{x}$, where $\tilde{x}$ is $x$ in homogeneous coordinates and $H \in \mathbb{R}^{3\times 3}$ defines the homography [49]. Different techniques are proposed to calculate the homography. In practice, robust statistical techniques are employed on a large number of matching points or lines after normalizing the data; these techniques reduce the adverse effects of noise by using the sum of squared differences method or an estimation model, such as RANSAC (random sample consensus) [50], PROSAC (progressive sample consensus) [51], or direct linear transformation (DLT), to relate the features and reduce the matching points. For feature-based methods, the most used techniques are DLT and RANSAC for their performance and robustness [52]. RANSAC uses the smallest data set possible and proceeds to enlarge this set with consistent data points [53]. The goal is to determine a set of inliers from the presented correspondences so that the homography can be estimated optimally from these inliers [52]. The fourth stage is the warping or stitching phase, where the images are overlapped and stitched together as one. After the matching, the nonoverlapped region has reprojection errors. In order to solve this problem, bundle adjustment is used. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal 3D structure and viewing parameter (camera pose and/or calibration) estimates. Optimal means that the parameter estimates are found by minimizing some cost function that quantifies the model fitting error, and jointly means that the solution is simultaneously optimal with respect to both structure and camera variations [54]. This optimization problem is usually formulated as a nonlinear least squares problem, where the error is the squared $L_2$ norm of the difference between the observed feature location and the projection of the corresponding 3D point on the image plane of the camera [55].
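As an illustration of the transformation estimation stage, the following is a minimal sketch of DLT homography estimation wrapped in a basic RANSAC loop. It assumes NumPy and noise-free toy data; the function names (`dlt_homography`, `project`, `ransac_homography`), the iteration count, and the pixel threshold are illustrative choices, not taken from the cited works. The data normalization mentioned above is omitted for brevity.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H (up to scale) with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def project(H, pts):
    """Apply H to an (N, 2) array of points via homogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def ransac_homography(src, dst, n_iter=500, thresh=2.0, seed=0):
    """Keep the 4-point model with the most inliers, then refit on them."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        with np.errstate(all="ignore"):  # degenerate samples may divide by zero
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh  # NaN errors compare as False
        if inliers.sum() > best.sum():
            best = inliers
    return dlt_homography(src[best], dst[best]), best
```

The returned matrix is defined only up to scale, so it is usually divided by its bottom-right entry before being compared or stored. A production pipeline would typically add point normalization and an adaptive number of iterations.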
Image composition is the last stage. Since the illumination and brightness of the stitched images may not be continuous, different algorithms can be applied to postprocess the images and blend them into one mosaic. A method based on gain compensation and multiband blending is proposed in [33]. Gain compensation adjusts the intensity of the mosaic by computing the local mean brightness of the image. Nevertheless, simply adjusting the gain to give all regions the same mean intensity will tend to reduce the intensity in regions with high brightness and increase it in dark or low-intensity regions [56]. Multiband image blending is proposed in [57], and it is one of the most popular approaches for image fusion due to its easy implementation and its insensitivity to misalignment. The basic idea of this process is to decompose the original images into a pyramidal representation and blend the images at each level [58,59]. Another approach is presented in [60], which uses a variant of the Gaussian function as the weighting function and proposes an improved implementation of the weighted mean method to eliminate the edges.
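The pyramid idea behind multiband blending can be sketched as follows. This is a simplified illustration, not the implementation of [57]: it assumes NumPy, grayscale images whose sides are divisible by 2 to the power of `levels`, and a 2x2 box filter in place of the usual Gaussian kernel. All function names are hypothetical.

```python
import numpy as np

def down(img):
    """Halve the resolution by averaging 2x2 blocks (box filter assumption)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    """Double the resolution by pixel repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    """Band-pass layers followed by the coarsest low-pass image."""
    pyr, cur = [], np.asarray(img, dtype=float)
    for _ in range(levels):
        small = down(cur)
        pyr.append(cur - up(small))
        cur = small
    pyr.append(cur)
    return pyr

def multiband_blend(a, b, mask, levels=3):
    """Blend a (where mask is 1) and b (where mask is 0) band by band."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    masks = [np.asarray(mask, dtype=float)]
    for _ in range(levels):
        masks.append(down(masks[-1]))  # mask gets smoother at coarser levels
    bands = [m * x + (1.0 - m) * y for m, x, y in zip(masks, la, lb)]
    out = bands[-1]
    for band in reversed(bands[:-1]):
        out = up(out) + band  # collapse the pyramid
    return out
```

Because each band is blended with a mask of matching resolution, low frequencies are mixed over a wide area while fine detail is mixed over a narrow one, which is what hides the seam.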

Stitching Methods
Feature-based methods are algorithms that extract common features or descriptors that characterize an image; the most commonly used features are points, lines, edges, corners, pixels, colors, histograms, or geometric entities [61]. The extracted features are then compared and matched according to their characteristics. These methods have a significant advantage over direct pixel-by-pixel methods, in which the relation is determined by directly minimizing pixel-to-pixel dissimilarities [21]. The feature-based methods can be divided into two categories: the global single transformation, where the main processes are feature detection and registration to perform the global projective transformation, and the local hybrid transformation.

Feature-Based: Global Single Transformation
The feature descriptors must be distinctive and must be found throughout the image so that the points of coincidence in both images can be distinguished. There must also be a high number of descriptors so that, in case of geometric changes, the images can still be related efficiently. Among the most used feature algorithms are the Harris Corner Detector [62], FAST [63], ORB [64], BRIEF [65], BRISK [66], SIFT [67], and SURF [32].

Harris Corner
The Harris Corner Detector [62] was one of the first feature detection methods and it is based on the Moravec Corner Detector. This method shifts a small window in different directions and measures the change in the average intensity of the image; when the change is significant, the center point of the window is extracted as a corner point. In a flat region, there will be no change of intensity in any direction. In an edge region, there will be no change of intensity along the edge direction. However, at a corner, there will be a significant change of intensity in all directions [68].
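For a window shift $(u, v)$, the intensity change can be approximated as $E(u, v) \approx \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}$, with $M = \sum_{x,y} w(x, y) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$, where $w$ is the window function and $I_x$ and $I_y$ are the image derivatives. The eigenvectors of $M$ give the directions of the largest and smallest intensity increase, and the corresponding eigenvalues provide the actual amounts of these increases. $\lambda_1$ and $\lambda_2$ are the eigenvalues of matrix $M$. Then, the corner, edge, and flat area of the image can be identified from the eigenvalues as follows:
• Flat area: both $\lambda_1$ and $\lambda_2$ are very small.
• Edge: one of $\lambda_1$ and $\lambda_2$ is small and the other is large.
• Corner: both $\lambda_1$ and $\lambda_2$ are large and nearly equal.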
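The eigenvalue rule above can be sketched in a few lines. This is a minimal illustration assuming NumPy, a uniform (box) window instead of a Gaussian one, and an arbitrary threshold `small`; the function names are hypothetical, and the "nearly equal" condition for corners is not checked.

```python
import numpy as np

def window_sum(a, r=1):
    """Sum a over a (2r+1) x (2r+1) neighbourhood (uniform window, zero padding)."""
    p = np.pad(a, r)
    h, w = a.shape
    return sum(p[dy:dy + h, dx:dx + w]
               for dy in range(2 * r + 1) for dx in range(2 * r + 1))

def structure_eigenvalues(img, r=1):
    """Per-pixel eigenvalues (lam1 >= lam2) of the 2x2 matrix M."""
    iy, ix = np.gradient(np.asarray(img, dtype=float))  # axis 0 is y, axis 1 is x
    sxx = window_sum(ix * ix, r)
    syy = window_sum(iy * iy, r)
    sxy = window_sum(ix * iy, r)
    trace = sxx + syy
    gap = np.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2)
    return (trace + gap) / 2.0, (trace - gap) / 2.0

def classify(lam1, lam2, small=0.1):
    """Label one pixel from its eigenvalues; `small` is an assumed threshold."""
    if lam1 < small:
        return "flat"    # both eigenvalues small
    if lam2 < small:
        return "edge"    # one large, one small
    return "corner"      # both large
```

In practice, the Harris detector avoids the explicit eigenvalue computation by using a response built from the determinant and trace of $M$; the closed-form eigenvalues are used here only to mirror the three cases in the text.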

SIFT
One of the feature methodologies most widely used for its performance is SIFT (Scale Invariant Feature Transform) [67]. This low-level feature methodology has the advantage of being robust to occlusion, clutter, and noise, with a good quantity of key points generated even for small objects [69]. SIFT uses a sequence of four stages (scale-space extrema detection, key point localization, orientation assignment, and key point description), of which the first is described here. An image pyramid is constructed by repeatedly convolving the input image with Gaussians, producing a set of scale-space images, and subtracting adjacent Gaussian images to produce a difference-of-Gaussian (DoG) pyramid. The scale space is constructed by convolving an image repeatedly using a Gaussian filter, which changes the scales and groups the outputs into octaves [67,68]. After the scale-space construction is complete, DoG images are computed from adjacent Gaussian-blurred images in each octave [21].
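The scale-space construction can be sketched as below. This is a simplified illustration assuming NumPy: every level is blurred directly from the base image of its octave rather than incrementally, and the values `sigma0=1.6` and `scales=3` are common defaults assumed here, not parameters stated in this survey. Function names are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    p = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dog_octave(img, sigma0=1.6, scales=3):
    """Gaussian images of one octave and their adjacent differences."""
    step = 2.0 ** (1.0 / scales)
    gauss = [gaussian_blur(img, sigma0 * step ** i) for i in range(scales + 3)]
    dog = [b - a for a, b in zip(gauss[:-1], gauss[1:])]
    return gauss, dog

def dog_pyramid(img, octaves=3, sigma0=1.6, scales=3):
    """List of octaves, each a list of DoG images."""
    pyramid = []
    for _ in range(octaves):
        gauss, dog = dog_octave(img, sigma0, scales)
        pyramid.append(dog)
        # The image blurred with twice sigma0 seeds the next octave at half size.
        img = gauss[scales][::2, ::2]
    return pyramid
```

Key point candidates would then be taken as local extrema of the DoG images across position and scale, which is the part of the first stage not shown here.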

FAST
The Features from Accelerated Segment Test (FAST) [63,70] is a corner detection method which can be used to extract feature points that are later used to track and map objects in many computer vision tasks. A corner detector should satisfy the following criteria: it should be consistent, insensitive to noise, accurate (corners detected as close as possible to the correct positions), and fast enough [69]. The segment test criterion operates by considering a circle of sixteen pixels around the corner candidate p. The original detector classifies p as a corner if there is a set of n contiguous pixels in the circle which are all brighter than the intensity of the candidate pixel plus a threshold, $I_p + t$, or all darker than $I_p - t$ [71].
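The segment test can be written directly from this definition. The sketch below is plain Python operating on a 2D list (or array) of intensities; the circle offsets are the usual radius-3 ring, while the default values `t=20` and `n=12` and the function name are assumptions made for illustration. The high-speed pretest and the machine-learned decision tree of later FAST versions are not included.

```python
# (dx, dy) offsets of the 16 pixels on a radius-3 circle, in clockwise order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=20, n=12):
    """Segment test: n contiguous ring pixels all brighter than I_p + t
    or all darker than I_p - t. The pixel (x, y) must be at least 3 pixels
    away from the image border."""
    ip = int(img[y][x])  # int() avoids wrap-around with unsigned pixel types
    ring = [int(img[y + dy][x + dx]) for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1 tests "brighter", -1 tests "darker"
        flags = [sign * (v - ip) > t for v in ring]
        run = best = 0
        for f in flags + flags:  # doubled list handles runs that wrap around
            run = run + 1 if f else 0
            best = max(best, run)
        if min(best, len(CIRCLE)) >= n:
            return True
    return False
```

Scanning an image amounts to calling `is_fast_corner` for every pixel inside a 3-pixel margin, usually followed by nonmaximal suppression.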

ORB
The ORB (Oriented FAST and Rotated BRIEF) algorithm is a feature detection and description method comparable to SIFT, with low cost and high speed; it is based on BRIEF (Binary Robust Independent Elementary Features) and FAST. Two disadvantages of FAST are its lack of an orientation component and of multiscale features. To address the latter, ORB uses a multiscale image pyramid that consists of a sequence of images with different resolutions. After locating the key points, ORB assigns an orientation to each key point based on the intensity distribution around it (the intensity centroid). BRIEF takes all key points found by the FAST algorithm and converts them into binary feature vectors so that together they can represent an object. A binary feature vector (also known as a binary feature descriptor) is a feature vector that only contains 1 and 0. To sum up, each key point is described by a feature vector which is a 128-512 bit string [64,65].
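To make the binary descriptor concrete, the sketch below builds a BRIEF-like bit string from random pixel-pair comparisons and matches descriptors by Hamming distance. It is an illustration only: the pair sampling, patch size, and function names are assumptions, and the patch smoothing and orientation steering used by ORB are left out.

```python
import random

def make_pairs(n_bits=256, patch=15, seed=0):
    """Fixed random test locations (x1, y1, x2, y2) inside a patch."""
    rng = random.Random(seed)
    h = patch // 2
    return [tuple(rng.randint(-h, h) for _ in range(4)) for _ in range(n_bits)]

def brief_descriptor(img, x, y, pairs):
    """Pack one intensity comparison per pair into an integer bit string."""
    bits = 0
    for x1, y1, x2, y2 in pairs:
        bit = img[y + y1][x + x1] < img[y + y2][x + x2]
        bits = (bits << 1) | int(bit)
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

def match(desc_a, desc_b):
    """For each descriptor in desc_a, index of its nearest one in desc_b."""
    return [min(range(len(desc_b)), key=lambda j: hamming(d, desc_b[j]))
            for d in desc_a]
```

Since the distance is an XOR followed by a bit count, matching binary descriptors is considerably cheaper than comparing floating-point descriptors with the Euclidean distance.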

SURF
Speeded Up Robust Features (SURF) is a scale- and rotation-invariant interest point detector and descriptor proposed in [32]. This algorithm has advantages over previous systems, such as SIFT, because it presents similar matching results, but its calculations are faster. The approach for interest point detection uses a basic Hessian matrix approximation and relies on integral images for image convolutions. The Hessian matrix at point $x$ and scale $\sigma$ is defined as $\mathcal{H}(x, \sigma) = \begin{pmatrix} L_{xx}(x, \sigma) & L_{xy}(x, \sigma) \\ L_{xy}(x, \sigma) & L_{yy}(x, \sigma) \end{pmatrix}$, where $L_{xx}(x, \sigma)$ is the convolution of the Gaussian second-order derivative with the image $I$ at point $x$, and similarly for $L_{xy}(x, \sigma)$ and $L_{yy}(x, \sigma)$; the determinant of the Hessian matrix is then calculated from these terms. The approximate second-order Gaussian derivatives are evaluated at a very low computational cost using integral images, which allow fast calculation regardless of filter size.
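The reason integral images make the box-filter approximation cheap can be shown with a short sketch. It assumes NumPy; the function names are hypothetical, and the layout of the actual box filters that approximate the second derivatives is not reproduced. The relative weight of 0.9 in the determinant is the value given in the SURF paper.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] holds the sum of img[:r, :c] (one extra row and column of zeros)."""
    img = np.asarray(img, dtype=float)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] using four lookups, for any box size."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def hessian_response(dxx, dyy, dxy, w=0.9):
    """Approximate determinant of the Hessian from box-filter responses."""
    return dxx * dyy - (w * dxy) ** 2
```

Each box-filter response `dxx`, `dyy`, or `dxy` is a weighted combination of a few `box_sum` calls, so its cost does not grow with the filter size, which is why larger scales can be analyzed by enlarging the filter instead of shrinking the image.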

BRISK
The Binary Robust Invariant Scalable Keypoints (BRISK) algorithm [66] is a feature point detection and description algorithm with scale invariance and rotation invariance. It constructs the feature descriptor of the local image through the grayscale relationship of point pairs in the neighborhood of the key point and obtains a binary feature descriptor. The key concept of the BRISK descriptor is the use of a pattern for sampling the neighborhood of the key point. Two subsets of distance pairings are defined: one each for the short-distance and long-distance pairings, S and E, respectively. BRISK discards the color information of the image, which could provide more key points for matching. For this reason, a CBRISK algorithm is proposed to maintain the information of the RGB color channels [72]. To decrease computation time, the SBRISK development shifts the binary vector rather than rotating the image pattern or constellation, as many other descriptors do [73].

Feature-Based: Local Hybrid Transformation
Feature-based panorama generation based on global single transformation has shown good results for purely rotational motion and planar scenes, but in real practice, this condition is rarely satisfied due to the movement of the UAV, as shown in Figure 3 [41]. Therefore, ghosting effects frequently occur when the images are aligned. Moreover, the parallax problem remains due to the movement of the optical center [74]. Among the local hybrid transformations, mesh-based alignment is reviewed here, since it is complemented by the other methodologies [61]. Mesh-based alignment divides images into uniform meshes. A local homography model is estimated for each mesh cell, and two regions are distinguished: the overlapped region, which is aligned by the projective transformation, and the nonoverlapped region, which is generally warped by using a similarity transformation to avoid potential distortions.

APAP
One mesh-based algorithm is proposed by Zaragoza et al. [74]. Their algorithm, named As Projective As Possible (APAP), is based on the DLT used to calculate the global homography. Instead of a single global homography, they calculate a location-dependent homography (local homography) using moving DLT (MDLT); this produces flexible warps but also maintains the global homography as much as possible. Given the estimated $H$ to align the images, an arbitrary pixel at position $x_*$ in the source image $I$ is warped to the position $x'_*$ in the target image $I'$ by $\tilde{x}'_* = H\tilde{x}_*$. The result shows an overlapped mesh in which the horizontal lines are preserved, reducing the parallax error.
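The difference between DLT and moving DLT can be sketched as follows: the same linear system is solved, but each correspondence is weighted by its distance to the location $x_*$ being warped. This is a minimal illustration assuming NumPy; the Gaussian weighting with a lower bound follows the APAP formulation, while the values of `sigma` and `gamma` and the function names are illustrative assumptions.

```python
import numpy as np

def dlt_rows(src, dst):
    """Stack the two DLT equations contributed by each correspondence."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    return np.asarray(rows, dtype=float)

def moving_dlt(src, dst, x_star, sigma=8.0, gamma=0.05):
    """Local homography (up to scale) for the location x_star."""
    a = dlt_rows(src, dst)
    d2 = np.sum((np.asarray(src, dtype=float) - x_star) ** 2, axis=1)
    # Nearby matches dominate; gamma keeps far matches from vanishing entirely.
    w = np.maximum(np.exp(-d2 / sigma ** 2), gamma)
    _, _, vt = np.linalg.svd(np.repeat(w, 2)[:, None] * a)
    return vt[-1].reshape(3, 3)

def warp_point(H, x):
    """Map a 2D point through H using homogeneous coordinates."""
    p = H @ np.array([x[0], x[1], 1.0])
    return p[:2] / p[2]
```

In a mesh-based warp, `moving_dlt` would be evaluated once per mesh cell center rather than per pixel. When all matches agree with one global homography, every local estimate reduces to that homography, which is the "as projective as possible" behavior.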

SPHP
As previously presented, APAP extrapolates a projective warp into the nonoverlapping area, with the problem of shape/area distortion: part of the image is stretched and nonuniformly enlarged. This problem arises from using a single perspective with a wide field of view (FOV); for this reason, a multiperspective warp is employed in [75]. Based on a projective warp for the overlapped areas and a similarity warp for the nonoverlapped section, we have the shape-preserving half-projective warp (SPHP).
In the region $R_L$, the transformation changes from the projective transform $H(u, v)$ to the similarity transform $S(u, v)$, which reduces the distortion generated by the projective transform.

AANAP
The global similarity transform performed by SPHP may result in a mismatch if the overlapped region contains distinct image planes, because all points are used to obtain the similarity transform. For this reason, an optimal similarity transformation is proposed in [76]. The process begins with the feature point matches between the target and the reference images, followed by extrapolation into the nonoverlapping areas using homography linearization. The resultant image has fewer perspective distortions than the result using APAP. Once the global similarity transform is calculated, it is used as a warp on the target image to mitigate the perspective distortion.
The updated local transformation is computed as $\hat{H}_i = \mu_h H_i + \mu_s S$, where $H_i$ represents the $i$-th local homography, $\hat{H}_i$ represents the updated local transformation, $S$ is the global similarity transform, and $\mu_h$ and $\mu_s$ are weighting coefficients with $\mu_h + \mu_s = 1$.

Aerial Panorama Applications
In the previous section, an introduction to the most used feature-based algorithms was given. In this section, a summary of aerial panoramic applications is presented; these applications were developed either to generate aerial panoramas as the principal task or to use the stitching methodology as a complementary method for a different application.

Harris Corner
Harris Corner is still a widely used method because of its low computational cost. New improvements have been proposed, such as prefiltering the characteristic points detected using Harris Corner to reduce the ghosting and luminance problems [77]. Another proposal replaces the Gaussian window function with a B-spline function; then, the corner points are preselected to obtain candidate corners, and an autoadaptive threshold method improves the adaptability of the algorithm. Another Harris Corner improvement applies an adaptive nonmaximal corner suppression algorithm to discard the pixels that cannot be corners. The local representative corners are retained, which reduces the corner detection time by 30.2%, improving the stitching speed [78]. The use of distinct algorithms in the matching process enhances the methodology: as an example, by applying Harris Corner with a correlation constraint in the registration, the accuracy and robustness of aerial panoramas are improved [79]. Another proposal is to combine it with another feature algorithm, such as SURF, in one process, making it possible to achieve a more robust algorithm than a simple Harris Corner, as proposed in [80]. In Table 2, a summary of Harris Corner applied on UAV examples is presented.

[79] Efficiency and accuracy are improved by registration constraint.

SIFT
SIFT is one of the most used scale-invariant detectors. Although it is efficient at detecting matching points, its disadvantage is the time needed to compute its operations. To improve the processing time, different approaches are proposed, such as the one presented in [81], which proposes a binary local image descriptor based on SIFT, reducing the complex operations and running almost 50% faster than the original algorithm. An improved SIFT method called AH-SIFT is proposed in [29]. In this method, the descriptor performs more efficiently than the original SIFT under various levels of geometric and photometric transformations. An optimized projective transformation method to improve SIFT stitching of thermal infrared images is proposed in [82], using M-least squares to join images obtained by uncooled thermal infrared video. An approach presented in [83] enhances the speed estimation of a drone and adjusts the velocity of image acquisition, reducing the ghosting effects. A global motion model is used to predict the overlapped region based on the world coordinate frame. Then, SIFT stitching is applied and image quality is evaluated based on gray relational analysis, improving the accuracy [84]. Using a graphics processing unit (GPU), an implementation called CUDA-SIFT (Compute Unified Device Architecture) [85] achieves real-time mosaic generation and tracking. Another approach presented by [82] is a SIFT stitching process based on a random M-least squares algorithm and super-resolution processes. Some applications use the SIFT process for an earthquake rescue system, where the image mosaicing is used for an image earthquake damage degree (EDD) analysis. This is performed by evaluating the gray level co-occurrence matrix (GLCM) features along with coarseness, contrast, metric, and filters to analyze the EDD [86].
Also in disaster evaluation, a methodology to evaluate open-source systems for Urban Search and Rescue (USaR) is developed in [87]; it is used to determine the location of possible trapped victims through fast 3D modeling of fully or partially collapsed buildings using images from UAVs. In the inspection area, some applications use the SIFT mosaicing approach for inspecting photovoltaic systems (PVs) using UAVs with thermal cameras to record videos, using GPS for the trajectory. These images are used to generate a high-resolution image of a PV zone by using SIFT [88]. A measurable aerial panorama based on panoramic images and multiview oblique images is proposed in [89], and it is divided into three major stages: projection, matching, and back projection. The SIFT methodology is applied to stitch the projected aerial panorama with a down-looking oblique image and the aerial panoramic image after matching the images by their proposed method. Table 3 shows some of the most recent implementations of the SIFT algorithm in UAVs.

A fast 3D modeling of fully or partially collapsed buildings using images from UAVs for the Urban Search and Rescue task is proposed.

FAST
In comparison with Harris Corner, the FAST algorithm can detect more features in the same time, yet, compared with SIFT, the number of features detected is less than half. This can lead to the wrong assumption that SIFT is better than FAST. Nevertheless, FAST accomplishes feature detection with simple operations that make it faster than SIFT in processing time [90]. This has to be considered when choosing the best-fit implementation for the application. A SIFT variation using FAST in each pyramid level instead of DoG is proposed in [91]. Such an approach achieves a robust algorithm with more features than those obtained using FAST alone and faster than SIFT. As mentioned above, FAST has the advantage of computation speed; accordingly, [92] proposes a real-time application using the FAST feature detector with the Bag-of-Words (BoW) correspondence algorithm to reduce the correspondence time compared to the brute-force matching algorithm. Sometimes the stitching process uses multitemporal images, which present larger changes in lighting and contrast; when any of the feature detection methods is applied to them, the process will have errors due to the change in the grayscale. Reference [93] applies phase congruency (PC) to maintain the image structure regardless of the change in the grayscale, and the features are detected once the PC images are obtained. A crowd density estimation by joint clustering analysis is presented in [94], where two versions of FAST are tested to detect the crowd features. Filtering procedures are used to eliminate the feature points which do not belong to crowd features. Some applications of the algorithm used with drone images are presented in Table 4.

Table 4. FAST applied on UAVs.

Author Advantage
T. Botterill, S. Mills, R. Green [92] Images are registered and stitched together seamlessly in real time.
X. Zhang, Q. Hu, M. Ai et al. [93] By applying phase congruence, the images are stitched evenly with color changes and illumination.
Ali Almagbile [94] Accuracy of FAST-9 and FAST-12 methodology, compared in terms of completeness and correctness, is improved.

ORB
As previously stated, the ORB methodology has the advantage of faster computation compared to most of the feature-based methodologies. An example is the aerial image mosaicing process based on ORB presented in [95], which removes the mismatches from thousands of putative correspondences by applying locality-preserving matching (LPM). Another approach, based on a Bayesian framework, formulates the matching as a maximum likelihood problem and solves it using the expectation maximization (EM) algorithm. To reduce the matching process, principal component analysis (PCA) is used, reducing dimensions and facilitating the feature extraction process without compromising accuracy, as shown in the root mean square error (RMSE) results [96]; the processing time is further improved by using a GPU with CUDA, obtaining a faster matching process compared to SIFT and SURF. Other developments in the ORB methodology focus on the techniques used to relate the features; in the case of [97], a phase correlation preprocessing method is used to obtain the overlapping area between the to-be-stitched image and the reference image, reducing the feature calculation. Then, using Hamming distance, the relation between the image matching points is improved compared to the classical ORB methodology. Similarly, in [98], a mask is used to register local clustered ORB features and nonmaximal suppression is used to remove clustered points, so that only the feature point with the largest response value is retained. Hamming distance is used for the matching step, and finally, PROSAC is applied to eliminate the wrong matches and calculate the transformation matrix between images. The result is an improvement in the number of correct matching points, slightly fewer than those of SIFT, at almost the same speed as classic ORB. Table 5 presents a summary of the implementations using ORB.
The methodology reduces the calculation time of completing the reconstruction of the panorama compared to SIFT and classic ORB.

SURF
SURF-based aerial panoramas are attractive for their accuracy, comparable to that of SIFT, in a shorter period of time. An example is the implementation of the process with workflow technology and a geoprocessing workflow tool called GeoJModelBuilder as a four-step process [99]: first, features are detected and registered; second, KNN is used for matching points; third, RANSAC is used for transformation estimation; and finally, all images are warped to the same coordinate system. The workflow approach is proposed to provide users with a flexible way to create a workflow that fulfills their needs. The workflows can be bound to different algorithms for better results or less time consumption. Tests of the SURF algorithm with fast approximate nearest neighbor search (FANN) feature matching [100] were carried out through ROS using Google Maps for the simulation of the panoramic images. For object tracking, the methodology can be used with a Kanade-Lucas-Tomasi tracker (KLT) to track a region of interest [101]. The stitching process can also be used for position estimation, as presented in [102], where a position estimation methodology is applied for path planning and distance calculation by the triangle similarity principle and image fusion. In Table 6 these implementations are presented.

The algorithm is relatively fast compared to alignment algorithms based on SIFT feature matching with a high-quality alignment.

M. Yue, Q. Yan [102] A real-time reconnaissance and monitoring application can achieve an accurate positioning without the need of increasing the camera accuracy.
A. Micheal, K. Vani [101] Implementing a semiautomatic object tracking method using SIFT or SURF with a high detection rate, the region of interest is specified by the user.
Z. Wu, P. Yue, M. Zhang et al. [99] The workflow approach generates an automatic mosaic of UAV images with the flexibility to edit the workflow depending on the user needs.

BRISK
Comparable with BRIEF applied in ORB, BRISK methodology outperforms SIFT and SURF in speed. With similar results and low calculation cost, this makes it ideal for UAV aerial panoramas, as shown with previous methods. An improved BRISK methodology developed to acquire reliable control points for image registration is presented in [103].
The spatial relationship is analyzed, with the key points derived from the coincidence of descriptors to eliminate the corresponding false points. This methodology proves to be 4.7 times faster than classic BRISK. The use of ground control points allows a more accurate position, as shown in [15], where ground control points are used for thermal orthomosaics generated by BRISK and an RGB camera analyzes the blooming of flowers for their apple orchard management system. This information is summarized and presented in Table 7.

Table 7. BRISK applied on UAVs.

Author Advantage
C. Tsai, Y. Lin [103] The positional accuracy of the UAV orthoimage by applying the proposed image registration scheme improves the correctness of the process.
W.Yuan, D. Choi [15] The stitching speed of 100 thermal images within 30 s and RGB correlation and classification are improved.

Feature-Based: Local Hybrid Transformation
Among the feature-based methodologies, the most accurate mesh analyses are based on SIFT and SURF features. Some methods focus on image compositing. Once the UAV obtains the aerial images, mesh-based stitching and blending methods are applied to improve the panoramic result, as presented in [104], where color blending based on superpixels is proposed, using simple linear iterative clustering (SLIC) after generating the SPHP panoramic image. Since the number of superpixels is much smaller than that of individual pixels, this improvement reduces the computational complexity and processing time compared to multiband blending, color transfer based on image gradients, and color matching blending. Another approach is suggested in [105], which calculates SURF and Harris Corner features to obtain the global homography. After applying PROSAC and KNN, it fuses the result with MDLT to improve the SPHP algorithm, reducing the ghosting in the overlapped image. An improvement on AANAP is proposed by using the superpixel methodology to improve image compositing. After relating both images, AANAP improves the alignment accuracy and reduces the perspective distortion [15]; then, seam cutting is applied with superpixel segmentation to reduce ghosting, and image color blending is finally applied. To reduce the distortion generated by the global homography, the AANAP and SPHP algorithms use the similarity transform. However, in urban scenes, these algorithms cannot preserve the building lines. New developments propose combining features with inertial navigation systems (INS) in order to improve efficiency or processing time. An indoor application of SIFT and INS is proposed by [106] for camera pose estimation, improving the stitching of drone-captured indoor video frames. Pose estimation can be achieved by INS to calculate the relation between image frames captured by the UAV, select the most related ones, and reduce the number of image stitching processes.
Another option is the use of SIFT to estimate the global transformation parameters. Used alone, this accumulates registration errors and disregards multiple constraints between images, which [107] addresses to improve the stitching performance. A shape-preserving transform is used to preserve the geometric similarity before reprojecting, attempting to retain the shapes of local regions, and multiband fusion with gain compensation is used to obtain a natural-looking panoramic image. A matching improvement is proposed in [108] by using the grid-based motion statistics (GMS) algorithm, which encapsulates motion smoothness as the statistical likelihood of having a certain number of feature matches between a region pair, to remove the mismatches before applying RANSAC. A region-based methodology uses SIFT as the first step to obtain the global transformation; the overlapped region is then divided into small regions, which have different weights depending on the local homography [109]. Then, RANSAC is used to reduce the outliers, and the results are compared to SIFT, APAP, and AANAP. After the global projection estimation, robust elastic warping (REW) [110] formulates the image deformation using a thin-plate spline (TPS) with a simple radial basis function, chosen for its good performance in both alignment quality and efficiency. REW can be regarded as a combination of the mesh-based model and the direct deformation strategy to remove mismatches. The radial distortion function allows an accurate reconstruction owing to its good alignment quality and efficiency [111], and then, by applying global homography, a good effect can be obtained on the nonoverlapping regions of the target image. Table 8 presents the summary of these implementations.

Table 8. Mesh-based methods applied on UAV.

Author | Advantage
F. Fang et al. [104] | A superpixel image is generated, improving the efficiency and flexibility of the target image to reduce the color differences between the two input images.
J. Leng, S. Wang [105] | The SPHP algorithm is improved, removing the ghost image of the stitched image and generating better stitching results.
Y. Zhou et al. (2019) [112] | Image stitching from the captured video is improved by eliminating the ghosts caused by moving objects, and the object detection module provides high detection accuracy.
Y. Yuan et al. [15] | The SLIC algorithm is used to generate superpixels in the seam cutting and color blending stages, affording spatial coherency and improving the efficiency.
Q. Wan et al. [107] | The local alignment model introduces parallax errors as a constraint term into the minimum energy function and uses the mesh-based deformation to accelerate the calculation.
L. Luo, Q. Wan, J. Chen et al. [113] | The errors are compared using RMS, and the method shows an improvement in processing time compared to APAP, SPHP, and REW.
Q. Xu, L. Luo, J. Chen et al. [109] | The accuracy of the method is improved compared to the most used mesh-based analyses, and the computational cost is comparable to that of AANAP.
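
Several of the methods above filter feature matches with RANSAC before estimating the global transformation. A minimal sketch of that filtering step is given below. To keep it short and dependency-free, it fits a 2D similarity transform (rotation, scale, and translation) instead of the full homography used in the cited works, and it represents each point as a complex number `x + 1j*y`. The names `Match`, `fit_similarity`, and `ransac_similarity`, and the default threshold and iteration count, are hypothetical choices for illustration.

```python
import random
from typing import List, Tuple

# (point in image A, point in image B), each stored as x + 1j*y
Match = Tuple[complex, complex]


def fit_similarity(m1: Match, m2: Match) -> Tuple[complex, complex]:
    """Solve q = a * p + b from two matches; a encodes rotation and scale."""
    (p1, q1), (p2, q2) = m1, m2
    a = (q2 - q1) / (p2 - p1)
    return a, q1 - a * p1


def ransac_similarity(matches: List[Match], threshold: float = 2.0,
                      iterations: int = 200, seed: int = 0) -> List[int]:
    """Return the indices of the matches consistent with the best model found."""
    if len(matches) < 2:
        return []
    rng = random.Random(seed)
    best: List[int] = []
    for _ in range(iterations):
        m1, m2 = rng.sample(matches, 2)
        if m1[0] == m2[0]:
            continue  # degenerate sample: identical source points
        a, b = fit_similarity(m1, m2)
        # A match is an inlier when its reprojection error is below the threshold.
        inliers = [i for i, (p, q) in enumerate(matches)
                   if abs(a * p + b - q) < threshold]
        if len(inliers) > len(best):
            best = inliers
    return best
```

The matches outside the returned index list are treated as mismatches and discarded. In practice, the final transform would be re-estimated from all inliers, and a homography model would replace the similarity model when perspective effects matter.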

Discussion
The generation of aerial panoramas and mosaicing are very active fields, with new approaches each year. The trend of this research field can be seen in Figure 4: the period from 2017 to 2019 saw a significant increase in publications, whereas in 2020 and the first half of 2021, the number of articles was nearly half that of the previous period. It can be assumed that the interest in this field could increase, given the effort these techniques devote to solving some of the main problems of stitching methods and the development of different implementations for new UAV applications or improvements in their processes. Some of the issues that are addressed are processing time, matching relation, hybrid transformation to avoid the most common errors of parallax and ghosting images, and image composition by applying different methodologies and techniques based on feature registration mosaicing methods. As can be observed in Figure 5, the implementation of mosaicing methodologies increased by almost 46% between 2010 and 2015. In that way, the process has changed from applying the stitching technique on aerial images to implementing it on UAVs, focusing on solving the main problems related to image acquisition from a moving camera. New solutions use preprocessing filters to improve the feature detection, thus ensuring the relation for the stitching process, or perform gain compensation techniques on the resulting mosaic image. This study found that the main contribution to this area comes from the Asian continent, which makes it a zone of interest in this research field. For this reason, it is important to follow the progress generated in this region closely, as shown in Figure 6. Another region whose progress is important to follow is the American continent, which has remained active in this research field.
From these regions, the countries that have contributed the most are presented in Figure 7. The first place is China, followed by India, Taiwan, and Korea from Asia. In the case of America, the USA is the country that has contributed the most in this region. This may be due to the development of new applications based on the mosaic of aerial images applied in a wide variety of areas. China has been one of the main countries involved in the growth and development of these technologies and is also one of the largest producers of drones in the industry. It has innovated in introducing this system to daily activities, where these techniques are applied in tasks such as surveillance and monitoring in large areas, but this process can be used for more applications, such as photogrammetry for archaeology and heritage maintenance, agricultural and forestry surveillance, civil engineering, digital elevation models and 3D mapping, rural roads, geological infrastructure, road information, urban terrain reconstruction, air analysis and pollution (for environmental awareness), urban configuration, and environmental monitoring, among others, as shown in Figure 8. One of the main objectives of this review is to present the most widely applied mosaic techniques of aerial panoramas in drones and, as presented in the previous sections, the feature-based approaches are the most implemented methodology, as shown in Figure 9. More specifically, the approaches based on the estimation of global single transform features are implemented more than the local hybrid transformation approaches, even though the latter are more recent methods with some advantages over the earlier methods. This may be related to the fact that the global methodologies are faster, better documented, and, in some cases, have a low computation cost compared to the local hybrid transformation approaches.

[Figure: Global Single Transform 80%, Local Single Transform 20%.]

From the revision of global single transform features, the most implemented methodologies in UAVs are the Harris corner, SIFT, FAST, ORB, SURF, and BRISK. The comparison made between these methods shows that the classic ORB is one of the most applied methodologies because it is faster than other classic methodologies, which makes it a good choice for real-time applications. However, a disadvantage of this method is its low accuracy compared to that of other methods, such as FAST, on which it is based, and BRISK, which outperforms it in cases of rotation and fast scale changes. Even so, the SURF technique is implemented more often because of its performance compared to that of SIFT and its speed close to that of ORB; however, if speed is not a key factor but accuracy and robustness are desired, the most applied method is SIFT, as shown in Figure 10, and new approaches based on GPU and computing algorithms show an improvement in the speed of SIFT. In the local hybrid transformation area, the most notable algorithms are SPHP and APAP, of which SPHP has the advantage of applying local hybrid transformation with similarity transformation to reduce the distortions and preserve the similarity constraints (Figure 11). New approaches based on improvements in the matching and blending methods were considered within the aforementioned categories. As can be observed from the tables, feature registration in recent developments is most often implemented with SIFT, SURF, and ORB. The same applies to the SPHP and AANAP methodologies. Figure 12 presents a plot of the principal feature-based transform methodologies implemented on UAVs.
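
One reason ORB and BRISK suit real-time use, as noted above, is that they produce binary descriptors, which are compared with the Hamming distance (a bitwise XOR followed by a bit count) instead of the floating-point distances used for SIFT and SURF. The sketch below illustrates brute-force matching with a mutual nearest-neighbour check. It assumes that each descriptor has already been packed into a Python integer; the function names `hamming` and `match_cross_check` are hypothetical and do not belong to any library.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def match_cross_check(desc_a: List[int], desc_b: List[int]) -> List[Tuple[int, int]]:
    """Brute-force matching that keeps only mutual nearest neighbours."""
    if not desc_a or not desc_b:
        return []
    # Nearest descriptor in B for every descriptor in A, and vice versa.
    best_ab = [min(range(len(desc_b)), key=lambda j: hamming(d, desc_b[j]))
               for d in desc_a]
    best_ba = [min(range(len(desc_a)), key=lambda i: hamming(d, desc_a[i]))
               for d in desc_b]
    # Keep (i, j) only when i and j choose each other.
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]
```

The surviving pairs would then be passed to an outlier-rejection stage such as RANSAC before the transformation is estimated.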
As has been presented, since 2017, there has been an increase in this research area, mainly undertaken by universities and research centers, and, it should be pointed out, the main works are carried out in Asia. Nonetheless, many countries have joined this research field, in which new proposals explore the use of machine learning-based systems or artificial intelligence (AI)-based techniques. This does not mean that classic methodologies are outdated, since new approaches propose the combination of feature-based techniques with different algorithms, such as LPM, SR, PCA, and REW, which provide a more robust and efficient methodology. It is interesting to think about the new developments that can be generated in combination with different areas of image processing.

Conclusions
As shown in this work, aerial images have been used in many fields, and in the last decade, new methods have been developed, delivering great progress in the panoramic image field. New approaches and methodologies for different applications were presented in this work. The evolution of this field has attracted the attention of different researchers, with China being one of the countries that has contributed most heavily. More countries are joining in the development of new mosaic-based techniques, improving panoramic aerial images by exploring different approaches. These improvements can make the generation of aerial images faster and more accurate or enable more complex applications, such as surveillance and tracking systems focused on solving specific tasks, by applying mosaic generation together with other algorithms. The results presented here show the trend of feature-based algorithms. Recent proposals combine machine learning and AI with these methods in order to improve the generation of image mosaics by correcting the errors generated when joining images. The area of drone applications has allowed image mosaicing to gain more attention, which encourages further development in the mosaicing of images.