A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames

Wu, Xiuzhen; Qi, Yahui; Qin, Liang; Yan, Shi; Zhang, Jianxiu

doi:10.3390/automation7020043

Open AccessArticle

A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames

by

Xiuzhen Wu

^†,

Yahui Qi

^†

,

Liang Qin

^*,

Shi Yan

^* and

Jianxiu Zhang

^*

The Third School, Naval Aeronautical University, Yantai 264000, China

^*

Authors to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Automation 2026, 7(2), 43; https://doi.org/10.3390/automation7020043

Submission received: 21 January 2026 / Revised: 28 February 2026 / Accepted: 3 March 2026 / Published: 5 March 2026

(This article belongs to the Special Issue Inventory Monitoring and Control Through High-Level Coordination of Drone Swarms)

Download

Browse Figures

Versions Notes

Abstract

Due to their advantages of being low-cost, lightweight and flexible, and having wide shooting coverage, UAVs have played an important role in situational awareness in the fields of disaster prevention and mitigation, urban planning and management, etc. In these applications, UAV aerial photography is limited by the field of view, and high-definition panoramic images of the complete target area cannot be obtained. Image mosaic technology is essential, but an image mosaic using only a single UAV cannot meet the high real-time requirements for situational awareness. In response to the above problems, this paper proposes a multi-UAV fast aerial image mosaic method based on key frames. First, the multi-UAV area coverage flight strategy is determined according to the size of the task area and the UAV flight parameters; then, the field of view of the pod, the flight speed, and the flight altitude are used to determine the key frame extraction time period during the UAV aerial photography process. The image matching-rate calculation method is designed and the key frames are extracted during the extraction time period, and the key frames are returned to the ground visual puzzle system; in the ground visual puzzle system, the improved Laplacian pyramid method is used to quickly fuse and stitch the key frames extracted by each UAV to obtain a panoramic stitched map. The experiment shows that the method can quickly obtain high-precision real-scene map information of the task area. Compared with the single-UAV method and the multi-UAV full video stream-splicing method, this method greatly reduces the consumption of computing power and the requirements of communication bandwidth and improves the efficiency and real-time performance of panoramic map acquisition.

Keywords:

multi-UAV; regional coverage; key frame; image mosaic; real scene map

1. Introduction

An Unmanned Aerial Vehicle (UAV) is a kind of unmanned aerial platform managed by a remote control or pre-programmed flight control system. With the advantages of convenient operation, strong maneuverability, economic efficiency and rapid response [1,2,3], UAVs have been widely used in many military and civilian fields [4,5,6]. Their mission scenarios include border reconnaissance, agricultural monitoring, urban surveying, post-disaster rescue, environmental protection supervision, and infrastructure inspection [7]. Remote sensing technology based on UAV platforms can quickly obtain high-resolution images of the target area, providing an intuitive visual reference for relevant decision-making [8]. Compared with traditional satellite remote sensing, UAV remote sensing is less constrained by meteorological conditions, responds more quickly, and obtains images of a higher spatial resolution. It has become a practical technology closely related to production and life [9,10].

However, due to the limited field of view of the airborne camera and the operating height of the UAV, the range covered by a single UAV image is limited, and it is difficult to present the overall information of the target area. Therefore, in order to obtain a panoramic view with a large field of view while maintaining a high resolution, aerial image mosaic technology is required to seamlessly integrate two or more UAV remote sensing images with overlapping areas [11,12].

To construct a panoramic image of the task area, it is usually necessary to plan the coverage flight path of the UAV and implement an image mosaic. The traditional single-UAV area coverage mode is only suitable for small-scale detection tasks. When facing a wide area, if a single UAV is used to obtain the coverage, there are obvious shortcomings; the process is time-consuming and inefficient [13]. Multi-UAV collaborative area coverage refers to the use of multiple UAVs working in parallel to jointly complete the area coverage detection. Compared with the single-machine mode, multi-machine collaboration can significantly shorten the task time and improve the operation efficiency.

However, when using multi-UAV systems for coverage flight and image mosaic, there are also challenges such as the high bandwidth requirements for multi-channel video stream data transmission and large consumption of computing resources for stitching algorithms. To this end, this paper proposes a multi-UAV fast aerial image mosaic method based on key frame extraction. Our method aims to alleviate the computing power pressure and transmission bandwidth bottleneck in the multi-channel image mosaic process through an efficient key frame screening mechanism so as to provide a practical technical strategy for using multi-UAV systems to quickly construct panoramic images of large areas. Finally, the feasibility and effectiveness of the proposed method are verified through experiments. The main innovation of this paper lies in the proposed architecture and method for a fast UAV aerial image stitching system based on keyframes, which mainly includes a regional coverage strategy, a keyframe extraction algorithm, and an image fusion method. It focuses more on application innovation and engineering implementation innovation.

2. Related Work

At present, UAV aerial image mosaic technology usually includes three core steps: image preprocessing, image registration and image fusion. The academic community has carried out extensive research on this topic. Early image registration methods, such as the scheme implemented by Reddy et al. based on fast Fourier transform, can handle simple translation, rotation and scale transformation, but are not suitable for scenes with complex motion and a low overlap rate [14]. Sawhney et al. proposed a corresponding global optimization stitching strategy based on the image topology structure [15]. In terms of feature detection algorithms, the Harris corner detection algorithm has attracted attention due to its small amount of calculation and its selectivity of feature points, but its scale invariance is poor [16]. To overcome this defect, Lowe proposed the scale-invariant feature transform algorithm, which shows good robustness to the translation, rotation, scaling and brightness changes of the image. Subsequently, a fully automatic image mosaic algorithm based on SIFT (scale-invariant feature transform) was proposed, which improved the level of automation by sorting images and removing irrelevant images through a probabilistic model [17,18]. He Jing applied the SIFT algorithm to an aerial image mosaic and adopted a block strategy to reduce the cumulative error. In order to further improve the speed, Bay et al. proposed the Speeded-Up Robust Features (SURFs) algorithm based on SIFT, which uses integral images and Hessian matrix to accelerate the calculation, and significantly improves the calculation efficiency while maintaining strong robustness [19]. Zhang et al. combined the SURF algorithm and designed a fast stitching method and global optimization strategy for low-altitude remote sensing images of UAVs, which effectively reduced the cumulative error [20].

Another representative algorithm is the ORB algorithm proposed by Rublee et al. [21], which integrates direction information to achieve fast feature extraction, but its feature detection is based on fast corner detection, and there are certain limitations in terms of its robustness [22]. Liu et al. used the ORB algorithm for UAV aerial image mosaic, which improved the speed, but the stitching accuracy was lost [23]. In addition, Xu Yaming et al. optimized the stitching effect of aerial images by improving the traditional method based on stitching lines.

In terms of using auxiliary information, Bond et al. proposed a preprocessing method based on the error estimation of flight attitude parameters and optimized the transformation function for each image through pattern search to assist registration [24]. Wang Xiaoli et al. used multi-resolution technology to perform color equalization processing on images with large exposure differences to improve the fusion effect. Deng Tao et al. tried to combine the advantages of SIFT and Harris algorithms in their research. Li Wenwen et al. realized an image mosaic based on state data by correcting the flight state data of the UAV.

In recent years, researchers have also been committed to improving the robustness and efficiency of the algorithm. Lati et al. proposed a robust stitching algorithm that is insensitive to scale and illumination changes and suppresses outliers based on fuzzy theory [25]. Rui et al. accelerated the SIFT algorithm for registration and combined image segmentation technology to eliminate motion ghosting so as to achieve fast and high-quality stitching [26]. In addition, the technology has also been commercialized, and well-known software products such as Pix4UAV (Version 4.5.6) and ICE have emerged, but their core codes are usually not open source, which limits further customized development.

After decades of development, the current technology has become more mature in the precise registration and natural fusion of two images. However, there are relatively few studies on aerial image mosaic for large-scale areas, and there are even fewer studies that combine it with multi-UAV collaborative coverage flight. This study starts by addressing the key issues such as multi-UAV collaborative coverage path planning, multi-channel video stream data processing, and efficient stitching task execution; carries out relevant research; and proposes the aforementioned fast stitching method based on key frames; finally, it verifies the effectiveness of the scheme through actual flight experiments.

3. Proposed Methodology

As shown in Figure 1, the multi-UAV rapid image mosaic method based on key frames proposed in this paper is described as follows: the number of UAVs is determined according to the size of the task area and the size of the UAV pod field of view, and then the UAV flight path is planned at the ground station to form a regional coverage strategy. The task area is covered by formation flight. During the UAV flight, the UAV determines the key frame extraction time period during the UAV aerial photography according to the size of the pod field of view, the flight speed and the flight altitude. During the extraction time period, the image matching-rate calculation method is designed to extract the key frames, and the key frames are returned to the ground image mosaic system. The ground image mosaic system uses the improved Laplacian pyramid method to perform image fusion and mosaic on the key frames and quickly forms a panoramic map and conducts a real-time assessment of the task area.

3.1. Regional Coverage Strategy

As shown in Figure 2, assuming that the flight altitude of the UAV is

h

, the horizontal field of view angle of the UAV pod image is

α

, and the vertical field of view angle is

β

, then the distance of the horizontal coverage area of the image field of view is:

k = 2 h \tan \frac{α}{2}

(1)

Suppose the length of the image mosaic task area is

L

and the width is

K

. In order to ensure that the image mosaic can be realized

τ

, the overlap rate of two adjacent UAV images is set so that if the number of UAVs is

n (n \geq 1)

, then the following should be satisfied:

n \cdot k - (n - 1) \cdot k \cdot τ \geq K

(2)

From Equation (2), we can get:

n \geq \frac{K - k τ}{(1 - τ) k}

(3)

Substituting Formula (1) into Formula (3), we get:

n = ⌈\frac{K - 2 h τ \tan \frac{α}{2}}{2 h (1 - τ) \tan \frac{α}{2}}⌉

(4)

After the number of drones has been determined, the drone route can be planned at the ground station to form a regional coverage flight strategy, and the task area can be covered by formation flight. During the flight, each drone extracts key frames.

3.2. Key Frame Extraction Method

During the coverage flight of the task area by the UAV formation, the UAV pod collects image information in real time. If the video stream is not processed and directly returned to the ground mosaic system for image mosaic, there will be two problems that lead to the failure of the task: one is that the communication bandwidth is limited, and the other is that the image mosaic computing power of the ground mosaic system is limited. Therefore, the video stream information must be preprocessed at the UAV sky end. The method proposed in this paper is to extract the key frames of the video stream information through the airborne processor and return the key frames to the ground mosaic system for image mosaic. The main idea of key frame extraction is that, taking the collection time of the previous key frame image as the initial time, according to the flight altitude, flight speed, pod field of view angle, the maximum overlap and minimum overlap required for the mosaic of two images, the initial time and end time of the new key frame extraction are calculated. Within the initial and end time periods, the matching-rate detection is performed on the pod image and the previous key frame image, and the matching rate that meets the requirements is recorded as a new key frame.

3.2.1. Key Frame Extraction Time Determination Method Based on Overlap

As shown in Figure 2, the distance of the area covered by the vertical direction of the UAV pod image field of view is:

l = 2 h \tan \frac{β}{2}

(5)

Assuming that the flight speed of the UAV is

v

, the maximum overlap required for the stitching of two images is

ε_{\max}

, the minimum overlap is

ε_{\min}

, and the time at which the previous key frame image is determined is recorded as the zero time, then the starting time of the new key frame extraction is:

t_{s t a r t} = \frac{l (1 - ε_{\max})}{v}

(6)

The end time of the new keyframe extraction is:

t_{e n d} = \frac{l (1 - ε_{\min})}{v}

(7)

3.2.2. Key Frame Selection Method Based on Matching-Rate Detection

After the start time

t_{s t a r t}

and end time

t_{e n d}

of the new key frame extraction are determined, the pod image frame can be read in the time period

[t_{s t a r t} t_{e n d}]

, and the feature points are matched with the previous key frame. The size relationship between the feature point matching rate

λ

of the two images and the threshold

\bar{λ}

determines whether the image frame is a key frame.

The previous key frame image is recorded as

I_{l a s t}

, and the determined moment

I_{l a s t}

is the zero moment

t_{0}

. The specific steps for determining the new key frame

I_{n e w}

are as follows:

Step 1: Read the image frame of the pod, which is recorded as

M_{i} (i = 1, 2, \dots)

, and the time is recorded as

t_{i}

;

Step 2: If

t_{i} < t_{s t a r t}

,

i = i + 1

, repeat Step1; if

t_{i} > t_{e n d}

and

i > 1

,

K_{n e w} = M_{i - 1}

,

K_{l a s t} = K_{n e w}

, jump to Step 5, the loop ends; if

t_{i} \geq t_{s t a r t}

and

i > 1

, jump to Step 3.

Step 3: Extract and match feature points of the image

M_{i}

and

K_{l a s t}

to obtain the matching rate

λ

;

Step 4: If

λ \geq \bar{λ}

,

K_{n e w} = M_{i}

,

K_{l a s t} = K_{n e w}

, jump to Step 5, the loop ends; if

λ < \bar{λ}

,

i = i + 1

, repeat Step 1;

Step 5: End of loop.

The flowchart of key frame selection based on matching-rate detection is shown in Figure 3.

3.2.3. Calculation of Matching Rate λ Based on ORB Feature Points

The feature-matching ORB (Oriented FAST and Rotated BRIEF) algorithm is a descriptor method comparable to SIFT, with low cost and high speed; it is based on BRIEFs (Binary Robust Independent Elementary Features) and FAST. One disadvantage of FAST is its lack of an orientation component. For this, ORB uses a multiscale image pyramid that consists of a sequence of images with different resolutions [27].

For images

M_{i}

and

K_{l a s t}

, the ORB feature points of each image are extracted, and the number of feature points is recorded as

N_{1}

and

N_{2}

, respectively. The matching feature point pairs of the two images are found by calculating the Hamming distance of the feature points.

ORB feature points are represented by the binary descriptor BRIEF. Suppose the BRIEF descriptors of two feature points are represented by

b_{1} = [b_{11}, b_{12}, \dots, b_{1 n}]

and

b_{2} = [b_{21}, b_{22}, \dots, b_{2 n}]

respectively (

n = 256

, is the descriptor length), then the Hamming distance between the two feature points is:

d_{H a m m i n g} (b_{1}, b_{2}) = \sum_{i = 1}^{n} |b_{1 i} - b_{2 i}|

(8)

The smaller the Hamming distance, the higher the degree of matching of the feature points. Set a threshold

{\bar{d}}_{H}

. If

d_{H a m m i n g} (b_{1}, b_{2}) < {\bar{d}}_{H}

, the feature points

d_{1}

and

d_{2}

matching are successful.

Through the region search and Hamming distance threshold screening, the matching feature point pairs between the two images can be obtained, and are recorded as

(p_{j}, p_{j}^{'})

,

j = 1, 2, \dots, M

,

M

which represents the number of matching feature point pairs of the two images, and the matching rate of the two images is recorded as:

λ = \frac{M}{\min {N_{1}, N_{2}}}

(9)

3.3. Improved Laplacian Pyramid Image Fusion Method

As shown in Figure 4, the key frames obtained by the airborne processor of each UAV are transmitted back to the ground visual puzzle system for image mosaic, and the panoramic image can be obtained after the key frames are stitched and fused.

In the process of image mosaic of key frames, the key step is to deal with the seam of the overlapping area of the two images. The core of eliminating the seam is to make the overlapping area transition smoothly through the fusion strategy. The methods of eliminating the seam can be divided into traditional fusion methods and deep learning-based fusion methods. The traditional fusion methods include weighted average, Laplacian pyramid, Poisson fusion, optimal seam line, etc. The deep learning-based fusion methods include generative adversarial network (GAN) fusion, attention mechanism fusion, end-to-end stitching and fusion, etc. The advantage of the traditional method is that the calculation amount is small and the real-time performance is good. The advantage of the deep learning-based method is that the seam stitching effect is good, but the calculation amount is large and the real-time performance is poor. In order to ensure the real-time performance of multi-UAV image mosaic and improve the image seam elimination quality, this paper proposes an improved Laplacian pyramid method, which greatly reduces the calculation amount by reducing the number of pyramid layers and simplifying the decomposition and reconstruction steps while ensuring a certain fusion effect.

Suppose that in the visual puzzle system, two key frame images

I_{1}

and

I_{2}

have been aligned through feature point matching and homography matrix solving, and the overlapping area between them is

Ω

. The image fusion steps of the improved Laplacian pyramid method are as follows:

Step 1: Construct the 0th layer (high-frequency layer) of the pyramid:

G_{0}^{1} = I_{1}

,

G_{0}^{2} = I_{2}

;

Step 2: Build the first layer (low-frequency layer) of the pyramid: Perform Gaussian blur on the original image once. The blur kernel adopts Gaussian convolution, with

G_{1}^{1} = G a u s s i a n B l u r (G_{0}^{1}, k s i z e = 3)

,

G_{1}^{2} = G a u s s i a n B l u r (G_{0}^{2}, k s i z e = 3)

;

Step 3: Generate fusion mask: In the overlapping area

Ω

, design the mask to control the fusion weight of the two key frame images. For the key frame image

I_{1}

, the designed mask is

M (d) = \frac{1}{1 + e^{- k (\frac{d}{W} - 0.5)}}

, where

W

is the width of the overlapping area,

d

is the distance from a point to the right edge of

I_{1}

,

0 \leq d \leq W

, and

k

is the steepness parameter; for the key frame image

I_{2}

, the designed mask is

1 - M (d)

;

Step 4: Layer-by-layer fusion: the high-frequency layer image is fused into

F_{0} = M \cdot G_{0}^{1} + (1 - M) \cdot G_{0}^{2}

, and the low-frequency layer image is fused into

F_{1} = M \cdot G_{1}^{1} + (1 - M) \cdot G_{1}^{2}

;

Step 5: Pyramid reconstruction: The fused high-frequency layer image and low-frequency layer image are superimposed to obtain the final image

I_{f u s i o n} = F_{0} + F_{1} - G a u s s i a n B l u r (F_{0}, k s i z e = 3)

.

4. Experimental Verification

4.1. Experimental System

The hardware involved in the UAV experimental system mainly includes the body, flight control, pod, airborne controller, data transmission, ground station system, etc. The ground visual puzzle system software (Version 1.1.0) and flight control ground station software (QGC, Version 1.1.0) are installed on the ground station system computer. The main hardware parameters of the UAV experimental system are shown in Table 1.

The equipment in the UAV experimental system is divided into airborne equipment and ground equipment. The airborne equipment includes the flight control system, airborne controller, pod, and airborne data link. The ground equipment includes the ground station system and ground data link.

The main function of the UAV airborne controller is to run the keyframe extraction algorithm, convert the video stream from the pod into keyframes, and transmit them back to the ground station system via the data link. The ground station system runs the ground visual mosaicking software (Version 1.1.0), which performs image stitching on the keyframes returned by each UAV to generate a panoramic image. All devices communicate with each other through serial communication or Ethernet communication. The data communication relationships between devices are shown in Figure 5.

The experimental system is shown in Figure 6.

4.2. Experimental Protocol

4.2.1. Experimental Area

The experimental area is located in the 1 km × 1 km area ABCD formed by the four coordinate points A, B, C and D on the east side of Yangma Island in Muping District, Yantai City. The length in the east–west direction is 1 km, and the length in the north-south direction is 1 km. As shown in Figure 7, the coordinates of the four points are:

A (121.66481° E, 37.45497° N) B (121.67657°E, 37.45497° N)

C (121.67657° E, 37.44561° N) D (121.66481°E, 37.44561° N)

4.2.2. Regional Coverage Strategy

The horizontal field of view of the drone pod image is

α = 80 °

. The vertical field of view is

β = 46 °

. For keyframe extraction, the Hamming distance threshold of feature points is

{\bar{d}}_{H} = 80

, the matching ratio threshold is

\bar{λ} = 0.8

, and the maximum and minimum overlaps required for image stitching are

ε_{\max} = 0.8

and

ε_{\min} = 0.2

, respectively. The drone flight altitude is set to

h = 200 m

, and the overlap rate of two adjacent drone images is set to

τ = 0.2

. Substituting into Formula (4) can allow us to obtain the number of drones

n = 4

. Therefore, the scheme of four drones flying in formation from east to west is adopted. The coverage flight strategy scheme of four drones for the 1 km × 1 km area is shown in Figure 8.

In Figure 8, the field-of-view coverage of each UAV is 336 m × 170 m, the field-of-view overlap distance between two UAVs is 115 m, and the overlap rate is 0.34, which meets the requirements of image mosaic

τ = 0.2

. If the UAV’s flight speed is 20 m/s, it will take 50 s to complete the coverage flight of the 1 km × 1 km area.

The number of UAVs should be determined comprehensively according to the UAV flight altitude, camera field of view, image overlap rate requirements, mission time requirements, communication bandwidth, and hardware cost. In this experiment, four UAVs represent the minimum number that meets the image overlap rate requirement. Increasing the number of UAVs can further shorten the mission completion time, but will raise the communication bandwidth and hardware cost. If a mission time of 50 s is acceptable, four UAVs are optimal.

4.2.3. Experimental Procedure

The specific experimental process is divided into the following steps: equipment deployment, power-on self-test → UAVs take off in turn → formation → cover the task area for flight → key frame extraction → image mosaic. The experimental process is shown in Figure 9.

4.3. Experimental Results

After the four UAVs created a formation and entered the mission area, multi-UAV image mosaic (Version 1.1.0) software was designed at the ground station to receive the key frame images returned from the UAVs. The improved Laplacian pyramid method image fusion algorithm was used to stitch the key frame images. When the UAVs completed the 1 km × 1 km area coverage flight, a panoramic stitching map of the mission area could be obtained at the ground station.

The operation process of the multi-UAV image mosaic (Version 1.1.0) software is shown in Figure 10.

After the area is covered by the flight, the comparison between the panoramic mosaic map obtained by the multi-UAV image mosaic software and the Baidu map of the same area is shown in Figure 11.

As can be seen from the analysis of Figure 11, for the 1 km × 1 km area, the size of the ground image pixels obtained by the method proposed in this paper is 10,928 × 11,651, and the Baidu satellite map of the same area is 1067 × 1127 pixels, and the resolution is increased by more than 10 times.

In order to better observe the clarity of the map, we laid the digital target 1–9 with a length of 60 cm and a width of 40 cm on the ground, as shown in Figure 12. The digital target was laid on the side of the road before the UAV took off.

The obtained panoramic mosaic map was enlarged and processed to find the digital target laid on one side of the road, as shown in Figure 13.

The panoramic mosaic map was further enlarged to obtain the pixel size of the ground digital target in the mosaic image, as shown in Figure 14.

As shown in Figure 14, the ground digital target occupied about 9 × 12 (pixels) in the mosaic image, and the actual distance covered by 12 pixels was 60 cm. The resolution of the mosaic image was 5 cm/pixel (based on the ground sampling distance, a flight altitude of 200 m, and gimbal parameters of 1080p + FOV-80° × 46°), which reaches high legibility.

4.4. Comparative Experiments

In order to verify the method proposed in this paper, a comparative experiment was designed. With the premise of ensuring the experimental equipment, experimental area and experimental process were kept consistent, the single-UAV full video stream image mosaic experiment, the single-UAV image mosaic experiment based on key frames, and the multi-UAV full video stream image mosaic experiment were carried out, respectively. The comparison was made based on three aspects: coverage time, the number of stitched image frames, and transmission bandwidth. The results are shown in Table 2.

5. Conclusions

As shown in Table 2, compared with the single UAV stitching method, the multi-UAV fast aerial image mosaic method based on key frames proposed in this paper reduces the generation time of the panoramic stitching map by 75%, which greatly improves the real-time performance and efficiency of panoramic map acquisition; compared with the multi-UAV full video stream image mosaic method, this method reduces the number of frames of the stitched image by two orders of magnitude and the bandwidth requirement by one order of magnitude, which significantly reduces the computing power consumption and hardware cost. In summary, the multi-UAV fast aerial image mosaic method based on key frames proposed in this paper has faster speed and higher efficiency in the construction of panoramic images in the task area, and can meet the task requirements in the fields of disaster prevention and mitigation, urban planning and management, etc. with high real-time requirements.

It should be noted that the number of UAVs used for image mosaic is not simply a case of the more, the better. As the number of UAVs increases, communication bandwidth will face greater pressure and hardware costs will also rise. The optimal number of UAVs is determined comprehensively based on flight altitude, camera field of view, image overlap rate requirements, mission time requirements, communication bandwidth, and hardware cost. In practical applications, trade-offs should be made through comprehensive consideration. Based on the premise that communication bandwidth meets the requirements and hardware cost is within a controllable range, using as many UAVs as possible can greatly shorten the time required to obtain the panoramic stitched map and improve mission execution efficiency.

There are two directions for future research in this paper:

First, improve the anti-interference capability of the system. When the UAV formation performs regional coverage, it will inevitably be affected by external disturbances such as wind, turbulence, and navigation errors. These disturbances will reduce the formation accuracy and lead to changes in the image overlap rate. Although image mosaic has a certain redundancy for the image overlap rate, it is still necessary to study how to minimize the impact of external disturbances on image mosaic from the aspects of cooperative path planning, adaptive routing, and optimization among UAVs [28,29].

Second, it is important to improve the accuracy of the image mosaic. To achieve high mosaic efficiency, this paper adopted a relatively simple improved Laplacian pyramid method. In future work, the image mosaic algorithm needs to be further refined and optimized to improve mosaic accuracy while ensuring mosaic efficiency.

Author Contributions

Conceptualization, X.W. and Y.Q.; methodology, X.W.; software, Y.Q.; validation, L.Q., S.Y. and J.Z.; formal analysis, L.Q.; investigation, Y.Q.; resources, S.Y.; data curation, S.Y.; writing—original draft preparation, X.W.; writing—review and editing, X.W.; visualization, Y.Q.; supervision, S.Y.; project administration, J.Z.; funding acquisition, L.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Atik, M.E.; Arkali, M. Comparative Assessment of the Effect of Positioning Techniques and Ground Control Point Distribution Models on the Accuracy of UAV-Based Photogrammetric Production. Drones 2025, 9, 15. [Google Scholar] [CrossRef]
Jiang, X.W.; Wu, Y.Q. Research progress of UAV aerial image mosaic methods. Acta Aeronaut. Astronaut. Sin. 2025, 46, 331799. [Google Scholar]
Jiang, B.; Qu, R.K.; Li, Y.D.; Li, C. Object detection in UAV imagery based on deep learning: Review. Acta Aeronaut. Et Astronaut. Sin. 2021, 42, 524519. [Google Scholar]
Gómez-Reyes, J.K.; Benítez-Rangel, J.P.; Morales-Hernández, L.A.; Resendiz-Ochoa, E.; Camarillo-Gomez, K.A. Image Mosaicing Applied on UAVs Survey. Appl. Sci. 2022, 12, 2729. [Google Scholar] [CrossRef]
Kim, S.; Kim, T. Robust UAV Image Mosaicking Using SIFT and LightGlue. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2025, 48, 169–175. [Google Scholar] [CrossRef]
Zheng, H.; Chang, Z.; Li, Y.; Zhu, J.; Wang, W.; Yang, Q.; Xie, C.; Zhang, J.; Liu, J. An Efficient and Fast Image Mosaic Approach for Highway Panoramic UAV Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10454–10467. [Google Scholar] [CrossRef]
Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Deep learning techniques to classify agricultural crops through UAV imagery: A review. Neural Comput. Appl. 2022, 34, 9511–9536. [Google Scholar] [CrossRef] [PubMed]
Tan, X.; Mao, H.Y.; Zhi, X.D.; Hu, X.B.; Ma, A.N.; Yan, L. Research on image data matching method based on infrared spectrum technology of UAV. Spectrosc. Spectr. Anal. 2018, 38, 413–417. [Google Scholar]
Zhou, Y.; Yan, L.; Zhao, Y.; Shu, S.; Han, Y.; Xie, H. EAME: Element-Aware Multi-UAV distributed autonomous exploration for efficient and complete under-canopy measurements. ISPRS J. Photogramm. Remote Sens. 2026, 231, 664–678. [Google Scholar] [CrossRef]
Tian, P.; Wang, Z.; Cheng, P.; Wang, Y.; Wang, Z.; Zhao, L.; Yan, M.; Yang, X.; Sun, X. UCDNet: Multi-UAV Collaborative 3-D Object Detection Network by Reliable Feature Mapping. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5602016. [Google Scholar] [CrossRef]
Zhang, T.; Shen, H.; Yin, Y.; Xu, J.; Yu, J.; Pan, Y. LECES: A Low-Bandwidth and Efficient Collaborative Exploration System With Distributed Multi-UAV. IEEE Robot. Autom. Lett. 2024, 9, 7795–7802. [Google Scholar] [CrossRef]
Wan, Y.; Tang, J.; Zhao, Z.; Chen, X. Distributed Vision-Only Cooperative Flight of Multiple Quadrotors in Unknown Cramped Environments. IEEE Trans. Intell. Veh. 2025, 10, 3902–3915. [Google Scholar] [CrossRef]
Li, J.; Wang, Z.M.; Lai, S.M.; Zhai, Y.; Zhang, M. Parallax-tolerant image mosaic based on robust elastic warping. IEEE Trans. Multimed. 2018, 20, 1672–1687. [Google Scholar] [CrossRef]
Reddy, B.S.; Chatterji, B.N. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Process. 1996, 5, 1266–1271. [Google Scholar] [CrossRef]
Sawhney, H.S.; Hsu, S.; Kumar, R. Robust video mosaicing through topology inference and local to global alignment. In Proceedings of the European Conference on Computer Vision, Freiburg, Germany, 2–6 June 1998; pp. 103–119. [Google Scholar]
Harris, C.G.; Stephens, M. A combined corner and edge detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988; pp. 147–152. [Google Scholar]
Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157. [Google Scholar]
Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417. [Google Scholar]
Zhang, W.; Li, X.; Yu, J.; Kumar, M.; Mao, Y. Remote sensing image mosaic technology based on SURF algorithm in agriculture. EURASIP J. Image Video Process. 2018, 2018, 85. [Google Scholar] [CrossRef]
Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
Rosten, E.; Drummond, T. Machine learning for high-speed corner detection. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 430–443. [Google Scholar]
Liu, T.; Zhang, J. Improved image mosaic algorithm based on ORB features by UAV remote sensing. Comput. Eng. Appl. 2018, 54, 193–197. [Google Scholar]
Bond, G.; Seyfarth, R. Improving image alignment in aerial image mosaics via error estimation of flight attitude parameters. Comput. Sci. Eng. 2012, 2, 86–91. [Google Scholar] [CrossRef][Green Version]
Lati, A.; Belhocine, M.; Achour, N. Robust aerial image mosaicing algorithm based on fuzzy outliers rejection. Evol. Syst. 2020, 11, 717–729. [Google Scholar] [CrossRef]
Rui, T.; Hu, Y.; Yang, C.; Wang, D.; Liu, X. Research on fast natural aerial image mosaic. Comput. Electr. Eng. 2021, 90, 107007. [Google Scholar] [CrossRef]
Zhang, X.; Tian, B.; Lu, H.; Shen, H.; Lu, J. DEMO-PAST: A Decentralized Multi-MAV Online Navigation System Using Parallel Strategy Acceleration. IEEE Trans. Intell. Veh. 2025, 10, 2115–2126. [Google Scholar] [CrossRef]
Kurdi, S.T.; Al-Haddad, L.A.; Ogaili, A.A.F. Path Optimization for Aircraft Based on Geographic Information Systems and Deep Learning. Automation 2026, 7, 12. [Google Scholar] [CrossRef]
El-Sawi, A.R.; Almslmany, A.; Adel, A.; Saleh, A.I.; Ali, H.A.; Abdelsalam, M.M. Detecting Low-Orbit Satellites via Adaptive Optics Based on Deep Learning Algorithms. Automation 2026, 7, 14. [Google Scholar] [CrossRef]

Figure 1. Architecture of a multi-drone rapid aerial image mosaic system based on key frames.

Figure 2. Schematic diagram of drone field of view.

Figure 3. Key frame selection process based on matching-rate detection.

Figure 4. Keyframe-based visual puzzle system.

Figure 5. Data communication relationships between devices.

Figure 6. The experimental system.

Figure 7. Experimental area.

Figure 8. Multi-drone area coverage flight plan.

Figure 9. Experimental procedure.

Figure 10. Operating interface of multi-drone image mosaic (Version 1.1.0) software.

Figure 11. Comparison between the map obtained by the method in this paper and the Baidu satellite map.

Figure 12. Ground digital target.

Figure 13. Digital targets in panoramic mosaic map.

Figure 14. Pixels occupied by ground digital targets.

Table 1. Experimental system.

No.	Name	Model/Parameter
1	Airframe	Take-off mode: Vertical take-off and landing Wingspan: 2180 mm Fuselage length: 1140 mm Maximum flight speed: 125 KM/H Maximum payload: 1 kg Endurance: ≥60 min
2	Flight Control	PX4
3	Pod	A8 mini 3-axis stabilization 1080p 30fps FOV: 80° × 46°
4	Airborne Controller	NVIDIA Jetson NX
5	Data Transmission	VPA15A Communication distance:15 km Bandwidth: 30 Mbps at most
6	Ground Station System	Processor: i7-9750H + GTX1650 4G Discrete Graphics

Table 2. Comparative experimental results.

No.	Method	Real-Time Performance	Number of Stitched Image Frames	Transmission Bandwidth Requirements
1	Single-UAV full video stream image mosaic	≮200 s	≮6000	3–6 Mbps
2	Single-UAV image mosaic Based on Key Frames	≮200 s	≮50	0.025–0.05 Mbps $(ε_{\max} = 0.8$ $, ε_{\min} = 0.2$ $, \bar{ε} = 0.5$ , Each drone produces one keyframe every 4 s on average)
3	Multi-UAV full video stream image mosaic method	≮50 s	≮6000	12–24 Mbps
4	The multi-UAV image mosaic method based on key frames proposed in this paper	≮50 s	≮50	0.1–0.2 Mbps $(ε_{\max} = 0.8$ $, ε_{\min} = 0.2$ $, \bar{ε} = 0.5$ , Each drone produces one keyframe every 4 s on average)

Note: ① Real-time performance: the total latency from image acquisition to panoramic map display, which is the sum of the coverage time, transmission time, and mosaicking computation time. Compared with the coverage time, the transmission time is negligible. Since an incremental stitching strategy is adopted for image mosaicking, where each new frame is fused with the previously stitched result, the mosaicking computation time is also relatively small compared with the coverage time. ② Transmission bandwidth: The gimbal camera used on the UAV has a resolution of 1080p and a frame rate of 30 fps. For video transmission, YUV420 8-bit + H.265 encoding is adopted, and the data transmission bandwidth for a single gimbal is approximately 3–6 Mbps.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, X.; Qi, Y.; Qin, L.; Yan, S.; Zhang, J. A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames. Automation 2026, 7, 43. https://doi.org/10.3390/automation7020043

AMA Style

Wu X, Qi Y, Qin L, Yan S, Zhang J. A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames. Automation. 2026; 7(2):43. https://doi.org/10.3390/automation7020043

Chicago/Turabian Style

Wu, Xiuzhen, Yahui Qi, Liang Qin, Shi Yan, and Jianxiu Zhang. 2026. "A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames" Automation 7, no. 2: 43. https://doi.org/10.3390/automation7020043

APA Style

Wu, X., Qi, Y., Qin, L., Yan, S., & Zhang, J. (2026). A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames. Automation, 7(2), 43. https://doi.org/10.3390/automation7020043

Article Menu

A Rapid Aerial Image Mosaic Method for Multiple Drones Based on Key Frames

Abstract

1. Introduction

2. Related Work

3. Proposed Methodology

3.1. Regional Coverage Strategy

3.2. Key Frame Extraction Method

3.2.1. Key Frame Extraction Time Determination Method Based on Overlap

3.2.2. Key Frame Selection Method Based on Matching-Rate Detection

3.2.3. Calculation of Matching Rate λ Based on ORB Feature Points

3.3. Improved Laplacian Pyramid Image Fusion Method

4. Experimental Verification

4.1. Experimental System

4.2. Experimental Protocol

4.2.1. Experimental Area

4.2.2. Regional Coverage Strategy

4.2.3. Experimental Procedure

4.3. Experimental Results

4.4. Comparative Experiments

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI