An Edge Computing-Enabled UAV-Based Image Mosaicing System Using a Novel B-SIFT-ILS Algorithm

Wang, Linhui; Liu, Zhizhuang; Yang, Yu; Chen, Lizhi; Zhou, Zhenqi; Zeng, Mengyu; Tan, Yonghong

doi:10.3390/a19060489


by Linhui Wang ^1,2,3, Zhizhuang Liu ^1,2,4, Yu Yang ^1,2, Lizhi Chen ^1,2, Zhenqi Zhou ^1,2, Mengyu Zeng ^1,2 and Yonghong Tan ^1,2,*

¹ School of Intelligent Manufacturing, Hunan University of Science and Engineering, Yongzhou 425199, China

² Hunan Engineering Research Center for Smart Agriculture (Fruits and Vegetables) Information Perception and Early Warning, Yongzhou 425199, China

³ College of Agricultural Unmanned Systems, China Agricultural University, Beijing 100091, China

⁴ Hunan Golden Ant Intelligent Equipment Co., Ltd., Yongzhou 425100, China

^* Author to whom correspondence should be addressed.

Algorithms 2026, 19(6), 489; https://doi.org/10.3390/a19060489

Submission received: 29 April 2026 / Revised: 8 June 2026 / Accepted: 15 June 2026 / Published: 18 June 2026

(This article belongs to the Special Issue AI-Driven Optimization for Sustainable Edge-Cloud Continuum)


Abstract

In UAV-based remote sensing, accurate and efficient image mosaicing is crucial for achieving real-time monitoring. Traditional cloud-centric processing paradigms, however, face core scientific challenges such as high latency, bandwidth bottlenecks, and limited autonomy, making them inadequate for dynamic, real-time scenarios. To address these issues, this paper proposes an edge-computing-enabled UAV image mosaicing system. The system consists of a UAV remote sensing platform and an edge computing terminal, with our novel B-SIFT-ILS algorithm at its core. The algorithm first uses geographic coordinates for unified registration, constructs a Gaussian scale space for multi-resolution representation, and then precisely locates extrema in the Difference of Gaussian (DoG) space using a 3D quadratic function. A BANSAC algorithm is subsequently employed to refine feature points and extract stable SIFT features, and finally, Iterative Least Squares (ILS) is used to achieve seamless mosaicing. Experimental results demonstrate that, compared with classical RANSAC, the proposed method achieves superior feature sampling accuracy (rotation: 0.879, translation: 0.877) and lower latency. The ILS-based smoothing stage effectively eliminates noise and ghosting without introducing gradient reversal, performing comparably to deep learning methods while significantly outperforming direct averaging and Gaussian approaches. On the NVIDIA Jetson Orin NX edge terminal, a single processing instance requires only 1124 ms, highlighting its strong potential for real-time, low-latency, and autonomous mosaicing tasks. Future research will focus on extending the approach to non-planar terrains and implementing adaptive parameter tuning for the BANSAC algorithm.

Keywords: edge computing; UAVs; image mosaicing; B-SIFT-ILS

1. Introduction

The concept of image mosaicing involves merging multiple images to produce a cohesive, scalable, and seamless composite image. Such composite images often provide a more comprehensive description, both in space and time, than different individual images. In precision agriculture, timely crop monitoring is essential for effective pest management [1]. Consider a farmer who requires real-time crop condition maps during a pest outbreak to enable targeted spraying operations. In this scenario, the farmer deploys a camera-equipped UAV to survey the farmland. The UAV captures video imagery, which must be seamlessly stitched into a composite map to support decision-making.

In image mosaicing tasks, particularly those involving real-time UAV remote sensing applications, the limitations of the traditional cloud-centric computing model are becoming increasingly evident. First, transmitting the massive volume of raw image data generated at the edge to the cloud imposes prohibitive bandwidth pressure and storage costs [2]. Second, the round-trip latency between edge cameras and cloud data centers often fails to meet the stringent real-time requirements of applications such as disaster response, agricultural monitoring, and autonomous navigation, where image stitching must be completed within milliseconds to support timely decision-making [3]. Third, intermittent or constrained network connectivity in field environments can render cloud-dependent solutions infeasible in practical deployments [4]. Edge computing [5] addresses these challenges by bringing computational tasks closer to the data sources. Unlike fog computing [6,7] and cloudlets [8,9], edge computing enables online image stitching to be performed locally, directly on embedded edge terminal systems. This paradigm significantly reduces data transmission overhead, minimizes latency, and preserves the operational autonomy of the system. Therefore, introducing edge computing to image stitching is not merely an optimization but an essential requirement for achieving real-time, resource-efficient, and reliable remote sensing systems in dynamic field environments.

The advent of cost-effective microcomputing platforms like the NVIDIA Jetson series has revolutionized edge computing, enabling energy-efficient AI solutions such as neural networks to flourish. Edge computing is a distributed architecture that positions computational resources near data sources. By employing machine learning, edge AI processes locally captured sensor data directly on the device. This method eliminates the need for cloud dependency, offering lower latency, reduced bandwidth consumption, and enhanced real-time decision-making. The integration of data acquisition and high-performance processing ensures rapid analysis, improved responsiveness, and greater bandwidth efficiency—critical for autonomous operations. Benchmark tests confirm these compact systems’ capability to handle real-time vision-based applications, including self-driving vehicles [10] and monitoring driver alertness [11]. Additionally, combining edge hardware with deep learning has shown promising results in classifying 3D point clouds and hyperspectral imagery, especially in agricultural drone systems [12].

The key to all mosaicing algorithms is determining the alignment of these images. Scholarly works often categorize image registration strategies into two core groups: pixel-level dense algorithms and sparse keypoint-based algorithms, corresponding to direct processing and feature-dependent methodologies [13]. Direct approaches utilize the complete image dataset instead of depending on sparse feature extraction. These techniques concurrently compute transformation parameters and pixel correspondences, achieving greater precision than feature-based methods by leveraging the full image data throughout the estimation procedure. Although direct techniques enhance precision, they demand that initial parameter estimates closely approximate the actual solution and depend on substantial image overlap for convergence. The foundational research in this field was introduced by Lucas and Kanade [14], and Baker’s comprehensive review provides valuable insights into the historical progress and expansion within this framework [15]. Feature-based image alignment methods rely on the detection and matching of salient visual features, including SIFT descriptors, SURF features, and affine-invariant regions, rather than directly comparing image intensities. Due to their sparse distribution, these features enhance computational efficiency, enabling real-time performance.

One of the most popular mosaicing techniques is a graph-based method. In the field of image registration, Kang et al. [16] advocate a new approach utilizing frame diagrams. Their approach requires traversing the frame graph to determine the optimal path and subsequently facilitates the seamless construction of the final mosaic by merging related images. Similarly, Fusiello et al. [17] delve into graph-based global registration strategies, although the focus is on nonlinear Least Squares optimization to generate super-resolution mosaics. On a different tangent, Pfingsthorn et al. [18] introduced an image registration technique utilizing spectral analysis with a phase-oriented matching filter (POMF). The outputs derived from the POMF are then incorporated into pose graph refinement, leading to the generation of intricate mosaic composition.

Significant progress has been made in airborne image mosaicing, particularly for UAV applications. Existing methods have explored SLAM-based registration, reference-image alignment, pairwise image registration, and global homography optimization to improve stitching accuracy and reduce error accumulation [19,20,21,22]. While these approaches have demonstrated satisfactory performance in aerial image registration, they generally rely on computationally intensive feature matching, geometric transformation, and iterative optimization procedures. Furthermore, most were developed for offline processing or cloud-assisted computing environments, where computational resources are relatively abundant.

Despite significant advances in image mosaicing algorithms, including feature-based methods such as SIFT and graph-based global registration techniques, a critical scientific gap remains. The existing literature lacks a real-time, edge-deployable image mosaicing algorithm that simultaneously achieves three often-conflicting objectives: high feature extraction accuracy under varying field conditions, efficient processing performance within the strict computational and energy constraints of embedded edge terminals, and seamless image smoothing that eliminates seams and ghosting artifacts without introducing gradient reversals. While individual techniques exist for feature matching, homography estimation, and image fusion, their integration into a cohesive, edge-native pipeline that respects the latency, power, and memory limitations of UAV-edge systems remains an open problem. This paper addresses two specific objectives: first, to identify a novel image mosaicing algorithm suitable for processing unordered aerial images; second, to develop a low-cost edge computing terminal system that enables real-time automatic mosaicing without cloud dependency.

Therefore, the aim of this paper is not merely to propose a new SIFT variant, but to develop a complete, edge-native image mosaicing pipeline that operates from feature extraction to smooth blending entirely within the computational and energy constraints of a UAV edge terminal. The main contributions of this work are as follows:

We propose an improved homography estimation method. By optimizing the homography estimation process with the Normalized Direct Linear Transformation (NDLT) algorithm and directly calculating the homography between the new image and the reference image, rather than relying on the accumulation of pairwise homography estimates, we effectively reduce error accumulation. This approach minimizes image drift and improves mosaicing accuracy;
We introduce an improved SIFT algorithm for feature extraction and image registration. Although the conventional SIFT method effectively identifies features that are invariant to scale changes, it may be affected by noise and complex scenes when processing UAV remote sensing images. By incorporating BANSAC, we enhance the accuracy and robustness of feature point extraction in the SIFT algorithm. Our experiments demonstrate that the improved SIFT algorithm achieves rotation and translation sampling accuracies of 0.879 and 0.877, respectively, outperforming the traditional RANSAC algorithm;
For image smoothing, we propose an Iterative Least Squares (ILS) method that minimizes the objective function to achieve fast global optimization, leveraging the computational advantages of Least Squares in image gradient calculation. This method achieves a significant reduction in the objective function energy within relatively few iterations, with a maximum decrease of up to 72%. The smoothing eliminates noise and ghosting artifacts without any gradient reversal. The proposed method outperforms the traditional direct averaging and Gaussian distribution methods across key metrics including information entropy, standard deviation, spatial frequency, average gradient, signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). Furthermore, it outperforms deep learning approaches in terms of computational speed;
We implement real-time image mosaicing using edge computing systems. By deploying our novel B-SIFT-ILS algorithm on the NVIDIA Jetson Orin NX, we enable real-time creation of high-resolution stitched images from unordered aerial footage.

2. Materials and Methods

2.1. System Setup

The real-time mosaicing system for UAV remote sensing images primarily consists of a UAV, an onboard image acquisition system, and a ground edge computing terminal, as illustrated in Figure 1. The onboard image acquisition system is composed of a K510 control board integrated with built-in OV7725 cameras (Canaan Inc., Beijing, China) and a 5G communication module. The ground edge computing terminal is primarily built around an NVIDIA Jetson Orin NX processor (NVIDIA Corporation, Santa Clara, CA, USA), which is equipped with a GPU module to enhance digital image processing capabilities and is responsible for receiving RGB images and performing mosaicing operations.

The key parameters of the K510 control board and the NVIDIA Jetson Orin NX processor are listed in Table 1.

2.2. Data Collection

This remote sensing dataset comprises images of orchards and cornfields captured using a UAV-based remote sensing system; the drone flew at an altitude of approximately 25 m while maintaining a constant speed of about 1 m per second. The captured imagery is in MP4 format, with a frame rate of 29 frames per second and a resolution of 1920 × 1080. From this video footage, one frame was extracted every five frames, ultimately generating a sequence containing 3500 images. A portion of the dataset is available for download on Zenodo (https://doi.org/10.5281/zenodo.20515165, accessed on 8 June 2026).
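The sampling figures above imply an effective frame rate and a ground spacing between consecutive extracted frames. The short sketch below works these out from the quoted values only; the function and variable names are illustrative, and the flight-track length assumes the nominal 1 m/s speed held throughout.

```python
# Figures quoted in Section 2.2; the derived quantities are plain arithmetic.
FPS = 29                # video frame rate (frames per second)
FRAME_STEP = 5          # one frame kept out of every five
SPEED_M_PER_S = 1.0     # approximate UAV ground speed
NUM_IMAGES = 3500       # size of the extracted sequence


def sampling_summary(fps, step, speed, num_images):
    """Return effective frame rate, frame spacing, and sequence coverage."""
    kept_fps = fps / step                   # frames kept per second of video
    spacing_m = speed / kept_fps            # ground distance between kept frames
    duration_s = num_images * step / fps    # video time spanned by the sequence
    track_m = duration_s * speed            # approximate flight-track length
    return kept_fps, spacing_m, duration_s, track_m


kept_fps, spacing_m, duration_s, track_m = sampling_summary(
    FPS, FRAME_STEP, SPEED_M_PER_S, NUM_IMAGES)
print(f"{kept_fps:.1f} frames/s kept, {spacing_m:.3f} m between frames")
print(f"about {duration_s:.0f} s of video, about {track_m:.0f} m of track")
```

Under these assumptions, consecutive extracted frames are roughly 0.17 m apart on the ground, which suggests a very high overlap between neighbouring images.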

2.3. Image Mosaicing

2.3.1. Reference-Based Homography Estimation for UAV Image Mosaicing

Image mosaicing involves combining multiple photos captured from different viewpoints to create a unified, continuous scene. This involves aligning the images onto a common plane, typically referred to as the mosaic plane or reference frame. One common technique for aligning sequential images, particularly those captured by UAVs, is through homography estimation, where pairwise alignment is performed between consecutive images. The goal is to ensure that all images are registered and aligned with respect to a reference image, thereby enabling the creation of a composite mosaic. Let $I_{c}$ be our reference image. For a planar scene with $n$ images $I_{0}, I_{1}, I_{2}, \dots, I_{n-1}$, the pairwise homographies $H_{1,0}, H_{2,1}, H_{3,2}, \dots, H_{n,n-1}$ are known, where $H_{i,i-1}$ represents the transformation mapping $I_{i}$ to $I_{i-1}$. Using these, the homography between a new image $I_{n}$ and the reference frame $I_{0}$ can be computed as in

$$H_{n,c} = H_{n,n-1} H_{n-1,n-2} \cdots H_{c+1,c} \tag{1}$$

While this method appears simple, its multiplicative nature leads to rapidly accumulating inaccuracies, causing noticeable misalignment in the final stitched image. Figure 2 illustrates this positional deviation when UAVs revisit the same location and recapture the scene from their starting point. The actual and calculated flight paths are represented by red and green dotted lines, respectively.
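The drift caused by chaining can be illustrated with a small numerical sketch. It is not the authors' experiment: the pairwise motion (a 40-pixel translation), the per-step estimation error (a 0.05° rotation), and all function names are assumptions chosen for illustration, and the composition uses the column-vector convention, in which the factors appear in the reverse order of Equation (1).

```python
import numpy as np


def translation(tx, ty):
    """3x3 homography for a pure image-plane translation."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])


def rotation(deg):
    """3x3 homography for an in-plane rotation about the origin."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])


def chain(pairwise):
    """Compose pairwise[0] = H_{1,0}, pairwise[1] = H_{2,1}, ... into H_{n,0}.

    Column-vector convention: x_0 = H_{1,0} H_{2,1} ... H_{n,n-1} x_n.
    """
    total = np.eye(3)
    for h in pairwise:
        total = total @ h
    return total


def project(h, point):
    """Apply a homography to a 2D point."""
    x, y, w = h @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])


def drift(n, err_deg=0.05):
    """Pixel error at the image centre after chaining n slightly wrong estimates."""
    true_pair = translation(40.0, 0.0)
    noisy_pair = translation(40.0, 0.0) @ rotation(err_deg)  # small estimation error
    centre = (320.0, 240.0)
    exact = project(chain([true_pair] * n), centre)
    approx = project(chain([noisy_pair] * n), centre)
    return float(np.linalg.norm(exact - approx))


for n in (1, 10, 50):
    print(n, round(drift(n), 2))
```

Even with an identical, sub-pixel error in every pairwise estimate, the misplacement of the image centre grows with the chain length, which is the behaviour the reference-based formulation is meant to avoid.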

The use of the Normalized Direct Linear Transformation (NDLT) method when calculating pairwise homographies helps significantly reduce geometric distortions, as demonstrated in prior studies [8]. Consequently, the formulation of the cost function can be expressed as in

$$E(H_{i+1,i}) = \sum_{i} \left( I_{i+1} \alpha - I_{c} H_{i+1} \alpha_{i} \right)^{2}, \tag{2}$$

where $\alpha_{i}$ is the matching feature between $I_{i}$ and $I_{i-1}$. Note that the concept of error refers to the image $I_{c}$. However, when aligning $I_{i+1}$ and $I_{i}$ to the mosaic, the homography relating them no longer guarantees minimal error. This happens because the residual error vectors left by the pairwise homography estimation are themselves transformed during alignment. A different method involves calculating the homography directly for the newly captured image in relation to the existing reference mosaic [23]. In this approach, distinctive features of the image $I_{i}$ are extracted and matched against those of $I_{i-1}$. Subsequently, the matched features from $I_{i-1}$ are aligned to the mosaic via $H_{i-1,c}$, and $H_{i,c}$ is computed using the aligned $I_{i-1}$. The cost function utilized in this calculation is modified in the following manner:

$$E'(H_{i,c}) = \sum_{i} \left( H_{i-1,c} \alpha_{i-1} - H_{i,c} \alpha_{i} \right)^{2}. \tag{3}$$

This approach provides clear benefits by allowing direct calculations on the reference image. Our analysis is based on this technique. Since every image is aligned to a shared coordinate system, one might wonder whether the selection of an alternative baseline would affect the outcome. A homography transforms the position of a scene point between different camera perspectives, with the reference coordinates obtained through a global homographic adjustment. Consequently, the resulting stitched panorama emulates a unified shot taken from the reference point. However, when the dominant plane of the scene deviates from the reference image plane, perspective warping can emerge in the mosaic, influenced by the degree of misalignment. We integrate homography calculations based on a chosen reference frame $I_{c}$. The precision of these estimates can vary depending on the selected reference. If the reference image’s plane is not parallel to the scene plane, feature reprojection errors may fluctuate, showing patterns of expansion or compression. Consequently, during optimization, errors from points near the image plane can dominate, negatively influencing the final estimation. The optimal reference image should be orthogonal to the scene, with key elements aligned parallel to its primary geometry. Since aerial imagery from UAVs typically meets this condition, our approach maintains reliability in most situations.
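For readers unfamiliar with the NDLT step mentioned above, the following is a minimal sketch of a normalized Direct Linear Transformation for a single homography. It follows the textbook formulation (Hartley-style point normalization followed by an SVD solve) and is not the authors' implementation; the function names and the synthetic test homography are assumptions.

```python
import numpy as np


def normalise(points):
    """Shift points to zero centroid and scale to mean distance sqrt(2)."""
    centroid = points.mean(axis=0)
    mean_dist = np.linalg.norm(points - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    t = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([points, np.ones(len(points))])
    return (t @ homog.T).T[:, :2], t


def ndlt_homography(src, dst):
    """Estimate H such that dst ~ H @ src from at least 4 correspondences."""
    src_n, t_src = normalise(src)
    dst_n, t_dst = normalise(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h_norm = vt[-1].reshape(3, 3)
    h = np.linalg.inv(t_dst) @ h_norm @ t_src   # undo the normalisation
    return h / h[2, 2]


def apply_h(h, pts):
    """Apply a homography to an (N, 2) array of points."""
    homog = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


# Synthetic check with a made-up homography and noise-free correspondences.
h_true = np.array([[1.02, 0.03, 15.0], [-0.02, 0.98, -8.0], [1e-5, 2e-5, 1.0]])
src = np.array([[10, 20], [600, 30], [620, 460], [15, 470], [300, 250], [450, 120]],
               dtype=float)
dst = apply_h(h_true, src)
h_est = ndlt_homography(src, dst)
print(np.abs(h_est - h_true).max())
```

In the reference-based scheme of Equation (3), `dst` would hold the features of the previous image after they have been warped into the mosaic frame, so that the estimate relates the new image to the reference directly.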

2.3.2. Reproducibility Protocol

To guarantee reproducibility, the following data processing protocol was strictly followed:

Step 1: Frame Extraction. Every fifth frame was extracted from raw MP4 footage (29 fps, 1920 × 1080) using FFmpeg v4.4, with the command flag ‘-vf select=not(mod(n,5))’.
Step 2: Preprocessing. Each extracted frame was converted to grayscale and resized to 640 × 480 pixels using the ‘cv2.resize()’ function with the ‘INTER_LINEAR’ interpolation flag.
Step 3: Calculation of geographical overlap. For each pair of images, the geographic overlap was calculated based on the GPS coordinates stored in the image metadata.
Step 4: Matching of features with BANSAC. BANSAC was applied using the following fixed parameters: max_iterations = 4000, confidence = 0.99, and threshold = 3.0 pixels.
Step 5: Sequential processing pipeline. Further processing, including homography estimation, feature refinement, and iterative least squares smoothing, was conducted exactly as described in Section 2.4, ensuring that each step can be independently repeated.
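The fixed values in the protocol above can be gathered in one place, and the frame-selection rule of Step 1 can be checked on a single second of video. The sketch below is only a convenience: the parameter values are those listed in Steps 1 to 4, while the dictionary layout and helper name are illustrative and not taken from the authors' code.

```python
# Values come from Steps 1-4 of the protocol; key names are hypothetical.
PROTOCOL = {
    "frame_step": 5,
    "resize_to": (640, 480),   # width, height in pixels
    "bansac": {"max_iterations": 4000, "confidence": 0.99, "threshold_px": 3.0},
}


def selected_frames(total_frames, step):
    """Indices n kept by a 'not(mod(n, step))' selection, i.e. n % step == 0."""
    return [n for n in range(total_frames) if n % step == 0]


one_second = selected_frames(29, PROTOCOL["frame_step"])
print(one_second)  # [0, 5, 10, 15, 20, 25]
```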

2.4. Proposed Mosaicing Approach

2.4.1. Geometric Correction of Image Distortion

Geometric correction of remote sensing imagery is typically accomplished using collinear equations or polynomial fitting, both of which require a sufficient number of evenly distributed ground control points (GCPs). Yet, in applications like surveillance and reconnaissance, acquiring such GCPs is often challenging [24]. Consequently, there is a need to develop a UAV-based remote sensing geometric correction model that operates without relying on ground control points. The geometric distortion in UAV remote sensing imagery primarily stems from sensor inaccuracies, external orientation changes, and underlying geophysical properties [25]. The key to the geometric correction of UAV remote sensing images is to eliminate the external orientation change elements without considering the influence of physical characteristics such as the Earth’s rotation and curvature. Therefore, to attain rapid geometric rectification and produce an efficient orthogonal projection, the correction model developed in this study focuses solely on variations in flight attitude parameters.

During UAV operation, key orientation metrics such as pitch ($\alpha$), roll ($\beta$), and yaw ($\gamma$) are logged. Variations in these angles enable geometric image rectification through coordinate-system transformations, allowing spatial adjustments to the initial imagery [26]. The rectification approach applied in this study is derived from geometric modeling and expressed as in

$$\begin{bmatrix} x \\ y \\ -k \end{bmatrix}_{s} = R(H) R(\gamma) R(\beta) R(\alpha) \begin{bmatrix} x' \\ y' \\ -k \end{bmatrix}_{s}, \tag{4}$$

In this formulation, $(x, y)$ represents the location of a pixel in the source image, while $(x', y')$ indicates the corresponding position in the corrected image. $R$ refers to the parameter matrix, and $H$ indicates the camera installation offset. The matrix $k$ represents the intrinsic parameters of the camera, including the focal length $f$ and the principal point $c$, expressed as in

$$k = \begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix}, \tag{5}$$

where $f_{x}$ and $f_{y}$ denote the focal lengths along the horizontal and vertical image axes, respectively, expressed in pixel units. $c_{x}$ and $c_{y}$ represent the coordinates of the principal point in the image plane, corresponding to the horizontal and vertical principal point offsets, respectively.
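A minimal sketch of how Equations (4) and (5) might be evaluated is given below. Several choices are assumptions not stated in the text: pitch, roll, and yaw are taken as rotations about the x, y, and z axes; the installation offset $R(H)$ is taken as the identity; the third vector component is taken as the negative focal length in pixels; coordinates are measured from the principal point; and the rotated vector is rescaled so that its third component returns to that value.

```python
import numpy as np


def rot_x(a):
    """Rotation about the x axis (assumed here to carry pitch)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)


def rot_y(a):
    """Rotation about the y axis (assumed here to carry roll)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)


def rot_z(a):
    """Rotation about the z axis (assumed here to carry yaw)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)


def attitude_matrix(pitch, roll, yaw):
    """R(yaw) R(roll) R(pitch), the factor order used in Equation (4)."""
    return rot_z(yaw) @ rot_y(roll) @ rot_x(pitch)


def intrinsics(fx, fy, cx, cy):
    """Intrinsic matrix of Equation (5)."""
    return np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=float)


def correct_point(xp, yp, focal, pitch, roll, yaw):
    """Map corrected-image coordinates (x', y') to source coordinates (x, y)."""
    v = attitude_matrix(pitch, roll, yaw) @ np.array([xp, yp, -focal])
    scale = -focal / v[2]   # bring the third component back to -focal
    return v[0] * scale, v[1] * scale


# Angles in radians; the numbers are made up for illustration.
print(correct_point(100.0, 50.0, 800.0, 0.02, -0.01, 0.3))
print(intrinsics(800.0, 800.0, 320.0, 240.0))
```

With all three angles set to zero the mapping reduces to the identity, which is a quick sanity check on any chosen axis convention.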

2.4.2. Image Matching

The SIFT algorithm derives distinctive local descriptors that maintain invariance against rotational transformations, scaling variations, and lighting condition alterations in images. Therefore, the SIFT algorithm is well suited to UAV remote sensing video images. This paper proposes an improved SIFT feature-matching algorithm. The algorithm flow is shown in Figure 3.

Firstly, the geographic coordinates of each image’s central point were employed to align both original images within a unified spatial reference frame. This alignment enabled direct identification of the overlapping regions between the two images, as illustrated in Figure 3. Here, $\theta$ denotes the disparity in yaw angles, while $r$ represents the linear separation between the central points $S$ and $S'$.

Secondly, the image scale space was established. The original image was smoothed with Gaussian filters of increasing scale, generating a multi-scale representation in scale space. Subsequently, robust keypoints were identified from this representation. Reference [27] demonstrates that, under a variety of reasonable assumptions, the Gaussian function is the only viable scale-space kernel. Consequently, the scale-space representation of an image can be formulated as in

$$L(x, y, \rho) = G(x, y, \rho) * I(x, y), \tag{6}$$

where $*$ denotes convolution. The scale space $L(x, y, \rho)$ is determined by the image coordinates $(x, y)$, the scaling parameter $\rho$, the original image $I(x, y)$, and the Gaussian convolution kernel $G(x, y, \rho)$:

$$G(x, y, \rho) = \frac{1}{2 \pi \rho^{2}} e^{-\frac{x^{2} + y^{2}}{2 \rho^{2}}}, \tag{7}$$

To enhance the reliability of feature point detection, the Difference of Gaussian (DoG) scale space was established by computing the difference between versions of the source image filtered with Gaussian kernels at adjacent scales.

$$\begin{aligned} DoG(x, y, \rho) &= \left( G(x, y, k\rho) - G(x, y, \rho) \right) * I(x, y) \\ &= L(x, y, k\rho) - L(x, y, \rho), \end{aligned} \tag{8}$$

where $k$ is a constant.
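Equations (7) and (8) can be sketched in a few lines of NumPy. The code below is illustrative only: the kernel is truncated at three standard deviations and renormalized, borders are handled by edge replication, and the scale ratio $k = \sqrt{2}$ and base scale 1.6 are assumed values, since the text only states that $k$ is a constant.

```python
import numpy as np


def gaussian_kernel(rho, radius=None):
    """Sampled 2D Gaussian of Equation (7), renormalised to sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * rho))   # truncation radius (assumption)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * rho**2)) / (2.0 * np.pi * rho**2)
    return g / g.sum()


def convolve_same(image, kernel):
    """Direct same-size filtering with edge replication.

    The kernel is symmetric, so correlation and convolution coincide.
    """
    r = kernel.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    out = np.zeros_like(image, dtype=float)
    rows, cols = image.shape
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def dog(image, rho, k=np.sqrt(2.0)):
    """Equation (8): L(x, y, k * rho) - L(x, y, rho)."""
    return (convolve_same(image, gaussian_kernel(k * rho))
            - convolve_same(image, gaussian_kernel(rho)))


# A single bright pixel gives a negative DoG response at its own location.
impulse = np.zeros((17, 17))
impulse[8, 8] = 1.0
response = dog(impulse, 1.6)
print(response[8, 8])
```

The explicit double loop is slow but keeps the example dependency-free; a practical implementation would use separable or library convolutions.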

Thirdly, feature points were identified. Candidate extremum locations and their scales are established by evaluating each point against its 26 surrounding samples (8 in the same scale and 18 across the neighboring scales above and below). Given the DoG operator’s heightened sensitivity to noise and edges, low-contrast points and unstable edge responses must be filtered out. Subsequently, precise extremum coordinates and DoG scale values are derived using a 3D quadratic interpolation method, enhancing both localization precision and robustness against noise.
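The 26-neighbour comparison can be written compactly once the DoG images of adjacent scales are stacked into one array. The sketch below shows only the extremum test; contrast and edge filtering and the quadratic refinement are omitted, and the array layout (scale, row, column) is an assumption.

```python
import numpy as np


def is_extremum(dog_stack, s, y, x):
    """True if dog_stack[s, y, x] is strictly above or below all 26 neighbours."""
    cube = dog_stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    centre = cube[1, 1, 1]
    others = np.delete(cube.ravel(), 13)   # index 13 is the centre of the 27 samples
    return bool(centre > others.max() or centre < others.min())


# Toy stack of three 5x5 DoG layers with a single peak in the middle layer.
stack = np.zeros((3, 5, 5))
stack[1, 2, 2] = 1.0
print(is_extremum(stack, 1, 2, 2))  # True
print(is_extremum(stack, 1, 2, 3))  # False
```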

Alignment estimation: Assuming $n$ overlapping images exist with the new input, the relative positioning between the new image and the reference is computed by minimizing a cost function defined in Equation (9):

$$E'(H_{i,c}) = \sum_{i=1}^{n-1} \left( H_{i-1,c} \alpha_{i-1} - H_{n,c} \alpha_{n} \right)^{2}, \tag{9}$$

At the same time, to improve the correct matching rate and computational efficiency during the estimation process, an adaptive sampling algorithm, Bayesian Network for Adaptive Sample Consensus (BANSAC), is employed to refine the coarse feature point data [28]. BANSAC is an adaptive sampling RANSAC algorithm based on dynamic Bayesian networks, aiming to enhance the efficiency and accuracy of the RANSAC algorithm [29]. The BANSAC algorithm is shown in Algorithm 1.

Algorithm 1 BANSAC algorithm.

Input: Matched feature points (R_X, R_Y), maximum iteration number K, initial conditional probability table $CPT^{0}$.

Output: Optimal homography model $\theta^{*}$, and optimal inlier set $C^{*}$.

Initialize k ← 1, $CPT^{0}$ ← 0.5, $\theta^{*}$ ← NULL, $C^{*}$ ← NULL;
while k < K do
    $S^{k}$ ← weight_sampling((R_X_k, R_Y_k), $CPT^{k-1}$);
    $\theta^{k}$ ← hypothesis($S^{k}$);
    $C^{k}$ ← model_evaluation((R_X_k, R_Y_k), $\theta^{k}$);
    if $C^{k}$ better than previous best:
        $\theta^{*}$, $C^{*}$ ← best_model($\theta^{k}$, $C^{k}$);
    $CPT^{k}$ ← update_probabilities($C^{k}$, (R_X_k, R_Y_k)^{0:k−1});
    if stopping_criteria($CPT^{k}$)
        break;
    k ← k + 1;
end

The input of the BANSAC algorithm is the feature point subset (R_X, R_Y), and the output is the optimal model and the corresponding inlier subset. We first set the initial value of inlier probability $CPT^{0}$ to the predefined 0.5 before the RANSAC iteration. During each iteration, a minimal set $S^{k}$ ∈ (R_X, R_Y) is generated through weighted sampling, using $CPT^{k-1}$ as the weights. Then, we calculate the hypothetical model $\theta^{k}$, conduct inlier counting, and update the optimal model if necessary. Next, the probability $CPT^{k}$ for the next iteration is updated. After each iteration, the conditional probability table (CPT) is updated according to the observed classification result of each feature point. Let $P_{k}(\mathrm{Inlier})$ denote the probability that a feature point belongs to the inlier set after the $k$-th iteration. Given the observation $O_{k}$, the posterior probability is updated according to Bayes’ theorem:

$$P_{k}(\mathrm{Inlier} \mid O_{k}) = \frac{P(O_{k} \mid \mathrm{Inlier}) \, P_{k-1}(\mathrm{Inlier})}{P(O_{k})}, \tag{10}$$

where

$$P(O_{k}) = P(O_{k} \mid \mathrm{Inlier}) \, P_{k-1}(\mathrm{Inlier}) + P(O_{k} \mid \mathrm{Outlier}) \, P_{k-1}(\mathrm{Outlier}), \tag{11}$$

is the marginal probability of the observation.

The probability that the feature point belongs to the outlier set is then computed as

$$P_{k}(\mathrm{Outlier}) = 1 - P_{k}(\mathrm{Inlier}), \tag{12}$$

The updated posterior probabilities are stored in the $CPT^{k}$ and are subsequently used to determine the sampling weights of feature points in the next iteration. The sampling weight of the $i$-th feature point is defined as

$$\omega_{i}^{(k)} = \frac{P_{k}(\mathrm{Inlier}_{i})}{\sum_{j=1}^{N} P_{k}(\mathrm{Inlier}_{j})}, \tag{13}$$

where $N$ denotes the total number of candidate feature points.
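The update in Equations (10) to (13) can be traced on a toy example. The sketch below covers only the probability update and weight normalization, not the full BANSAC procedure of [28]; the two likelihood values (0.9 and 0.3) and all names are hypothetical, chosen to make the arithmetic easy to follow.

```python
def posterior_inlier(prior, p_obs_given_inlier, p_obs_given_outlier):
    """Equations (10)-(11): posterior inlier probability after one observation."""
    evidence = p_obs_given_inlier * prior + p_obs_given_outlier * (1.0 - prior)
    return p_obs_given_inlier * prior / evidence


def sampling_weights(inlier_probs):
    """Equation (13): normalise inlier probabilities into sampling weights."""
    total = sum(inlier_probs)
    return [p / total for p in inlier_probs]


# Hypothetical likelihoods: probability that a point is classified as an
# inlier by the current model, given that it truly is an inlier / an outlier.
P_HIT_INLIER, P_HIT_OUTLIER = 0.9, 0.3

probs = [0.5, 0.5, 0.5]                    # CPT^0 initialised to 0.5
observed_as_inlier = [True, True, False]   # classification in this iteration
probs = [
    posterior_inlier(p, P_HIT_INLIER, P_HIT_OUTLIER) if hit
    else posterior_inlier(p, 1.0 - P_HIT_INLIER, 1.0 - P_HIT_OUTLIER)
    for p, hit in zip(probs, observed_as_inlier)
]
weights = sampling_weights(probs)
print(probs)
print(weights)
```

After one iteration the two points classified as inliers rise from 0.5 to 0.75, while the third drops to 0.125, so it receives a much smaller share of the sampling weight in the next round.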

Consequently, feature points with higher posterior inlier probabilities are assigned larger sampling weights and are therefore more likely to be selected in subsequent iterations. This adaptive sampling strategy improves both the convergence efficiency and the robustness of the homography estimation process. Finally, it is determined whether the set stopping criterion is met; if so, the loop is exited. The iterative process terminates when the number of detected inliers reaches or exceeds the threshold value $CPT^{k}$.

Fourth, the orientation of each feature point is determined. During computation, sampling is performed in a neighborhood window centered on the key point. The gradient directions of the surrounding pixels are accumulated in a gradient histogram with 36 bins. The histogram’s peak indicates the dominant gradient direction around the key point, defining its principal orientation. Additionally, if any other histogram value reaches 80% of the main peak, it is designated as a secondary direction for the key point. By assigning each key point’s orientation based on gradient data, the operator achieves rotational invariance. The gradient modulus and direction are computed as in

$$\begin{cases} m(x, y) = \sqrt{\left( T(x+1, y) - T(x-1, y) \right)^{2} + \left( T(x, y+1) - T(x, y-1) \right)^{2}}, \\ \vartheta(x, y) = \tan^{-1} \left( \dfrac{T(x, y+1) - T(x, y-1)}{T(x+1, y) - T(x-1, y)} \right) \end{cases} \tag{14}$$

where $T(x, y)$ is the smoothed image value at pixel $(x, y)$ (the scale-space image $L$ of Equation (6)), and $m(x, y)$ and $\vartheta(x, y)$ represent the gradient modulus and gradient direction of the feature point $(x, y)$, respectively.
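The orientation assignment can be sketched as follows. The gradient modulus is computed as the Euclidean norm of the two central differences of Equation (14), `arctan2` replaces the plain inverse tangent so that the full 0° to 360° range is resolved, and the Gaussian weighting of the neighbourhood is left out for brevity; these are simplifying assumptions, and the function names are illustrative.

```python
import numpy as np


def gradient_polar(img):
    """Central-difference gradient modulus and direction for interior pixels.

    img is indexed as img[y, x]; directions are in degrees within [0, 360).
    """
    dx = img[1:-1, 2:] - img[1:-1, :-2]    # T(x+1, y) - T(x-1, y)
    dy = img[2:, 1:-1] - img[:-2, 1:-1]    # T(x, y+1) - T(x, y-1)
    magnitude = np.sqrt(dx**2 + dy**2)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    return magnitude, angle


def dominant_orientations(magnitude, angle, bins=36, secondary_ratio=0.8):
    """Magnitude-weighted histogram; keep bins reaching 80% of the peak."""
    hist, edges = np.histogram(angle, bins=bins, range=(0.0, 360.0),
                               weights=magnitude)
    keep = np.flatnonzero(hist >= secondary_ratio * hist.max())
    centres = (edges[:-1] + edges[1:]) / 2.0
    return hist, centres[keep]


# Horizontal ramp: intensity grows with x, so every gradient points along +x.
ramp = np.tile(np.arange(8.0), (8, 1))
mag, ang = gradient_polar(ramp)
hist, orientations = dominant_orientations(mag, ang)
print(orientations)  # centre of the first 10-degree bin
```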

Fifth, feature point descriptors are generated through a series of steps designed to mitigate illumination variations and geometric distortions. Initially, to maintain rotation invariance, the coordinate axis is aligned with the feature point’s orientation. Using the feature point as the center, a Gaussian-weighted circular window is applied to adjust the axis accordingly. Then, we extracted the center of the 4 × 4 window around the feature point, divided the window into 2 × 2 sub-windows, and counted the gradients in 16 directions in each sub-window through the gradient histogram to form a 32-dimensional feature vector with good robustness, as shown in Figure 4.

Figure 4 illustrates a grid-based representation of pixels surrounding a feature point within its scale space. Arrow directions denote pixel gradient orientations, while their lengths correspond to gradient magnitudes. A 16-direction gradient histogram was computed for every $4 \times 4$ grid segment, with seed points derived from cumulative gradient values. This process yields a 32-dimensional SIFT descriptor, renowned for its robustness against noise and high fault tolerance.

To assess the similarity between two SIFT descriptors, the Euclidean distance between their keypoint vectors is computed. For a keypoint in the reference image, the two nearest matches in the target image are identified. A successful match is confirmed if the ratio of the second-closest to the closest distance exceeds a dynamically adjusted threshold.
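A minimal sketch of this nearest-neighbour ratio test is shown below. The text does not specify how the threshold is adjusted, so a fixed value of 1.25 is assumed here purely for illustration; the descriptors are toy 4-dimensional vectors rather than real SIFT output.

```python
import numpy as np


def ratio_test_matches(ref_desc, tgt_desc, threshold=1.25):
    """Return (ref_index, tgt_index) pairs passing the distance-ratio test.

    A match is accepted when second-closest / closest distance > threshold.
    """
    matches = []
    for i, d in enumerate(ref_desc):
        dists = np.linalg.norm(tgt_desc - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest == 0.0 or second / nearest > threshold:
            matches.append((i, int(order[0])))
    return matches


ref = np.array([[1.0, 0.0, 0.0, 0.0],    # has one clearly closest target
                [0.0, 0.0, 0.0, 1.0]])   # roughly equidistant to all targets
tgt = np.array([[0.9, 0.1, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.5, 0.5, 0.0, 0.0]])
print(ratio_test_matches(ref, tgt))  # [(0, 0)]
```

The second reference descriptor is rejected because its two nearest candidates are almost equally far away, which is exactly the ambiguous case the ratio test is designed to filter out.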

2.4.3. Image Smoothing

During the remote sensing image acquisition process, variations in lighting and imaging angles could lead to misalignment between two source images, resulting in uneven seams. To address this issue, pre-smoothing was performed on the images. Commonly used smooth fusion techniques are categorized into three types: the weighted average filtering method, the Gaussian distribution method, the Least Squares method, and the recently popular deep learning-based approaches. Among these, the weighted average method often suffered from over-smoothing in edge processing, leading to halo artifacts and gradient reversal artifacts [30]. The Gaussian distribution method was also prone to producing artifacts in the results and was highly sensitive to parameter settings, where smaller parameter values could significantly increase computational costs [31]. Deep learning-based methods employed different deep neural network architectures to mimic the smoothing effects of existing filters, but they typically required separate training of different models for distinct parameter configurations, resulting in poor generalization in parameter adaptability [32]. The Least Squares method achieved image gradient computation by calculating gradient values along different coordinate axes using discrete differential operators, offering computational simplicity and efficient edge smoothing. Therefore, this study proposed an ILS method to minimize the objective function by computing image gradient values iteratively, achieving global optimization. The minimization objective function of the ILS method was as in

F(O, I) = \sum_{s} \left( (O_{s} - I_{s})^{2} + λ \sum_{* \in \{x, y\}} ℶ_{p}(\nabla O_{*, s}) \right),

(15)

where I and O represent the original input image and the smoothed output image, respectively, s denotes the pixel position, λ is a regularization parameter (gradient weight), \nabla O_{*, s} is the gradient value of the output image at pixel s, and ℶ_{p} is the penalty function, which is defined as

ℶ_{p}(x) = (x^{2} + τ)^{p / 2},

(16)

where τ is a fixed constant, typically set as τ = 0.0001. The norm power p ∈ (0, 1) serves as an edge smoothness adjustment parameter.

The ILS solution to Equation (15) is obtained using the additive half-quadratic minimization method proposed by Geman and Yang [33]. Following the optimization process of this method, the output image after the (n + 1)-th iteration is given by

O^{n + 1} = \arg\min_{O} F(O, I, O_{x}^{n}, O_{y}^{n}),

(17)

where O_{x}^{n} and O_{y}^{n} represent the optimal conditions at the n-th iteration. At each pixel s, they are defined as the scaled gradient values of the n-th iterate O^{n} minus the derivative of the penalty function ℶ_{p}:

\left\{ \begin{matrix} O_{x, s}^{n} = c \nabla O_{x, s}^{n} - ℶ_{p}'(\nabla O_{x, s}^{n}), \\ O_{y, s}^{n} = c \nabla O_{y, s}^{n} - ℶ_{p}'(\nabla O_{y, s}^{n}). \end{matrix} \right.

(18)

The constant c = p τ^{\frac{p}{2} - 1} is a positive value. Substituting Equation (18) into Equation (17), the optimization problem can be rewritten as

O^{n + 1} = \arg\min_{O} \sum_{s} \left( (O_{s} - I_{s})^{2} + λ \sum_{* \in \{x, y\}} \frac{1}{2} \left( \sqrt{c} \nabla O_{*, s} - \frac{1}{\sqrt{c}} O_{*, s}^{n} \right)^{2} \right).

(19)

Equation (19) is referred to as the Iterative Least Squares (ILS) formulation. In our implementation, the iteration number n ranges over 0, 1, 2, \dots, N, with N = 12 as determined empirically (see Section 3.2). The regularization parameter λ is set to 1.0 to achieve rapid energy reduction, and the norm power p is set to 0.8 for optimal edge preservation.
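The iteration in Equations (15)–(19) can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions, not the authors' code: gradients are taken as forward differences with periodic wrap-around, each quadratic subproblem of Equation (19) is solved in closed form in the Fourier domain, a single-channel image scaled to [0, 1] is assumed, and the names `ils_smooth`, `energy`, `gradients`, and `penalty` are hypothetical. Setting the derivative of Equation (19) to zero gives (2 + λc Σ ∇ᵀ∇) O = 2I + λ Σ ∇ᵀ O_*^n, which is what the loop evaluates.

```python
import numpy as np

def penalty(x, p, tau):
    """Penalty function of Equation (16): (x^2 + tau)^(p/2)."""
    return (x ** 2 + tau) ** (p / 2.0)

def gradients(img):
    """Forward differences with periodic wrap-around (an assumption)."""
    gx = np.roll(img, -1, axis=1) - img
    gy = np.roll(img, -1, axis=0) - img
    return gx, gy

def energy(out, inp, lam, p, tau):
    """Objective function of Equation (15)."""
    gx, gy = gradients(out)
    data = np.sum((out - inp) ** 2)
    smooth = np.sum(penalty(gx, p, tau) + penalty(gy, p, tau))
    return float(data + lam * smooth)

def ils_smooth(image, lam=1.0, p=0.8, tau=1e-4, n_iter=12):
    """Return the smoothed image and the energy after each iteration."""
    inp = np.asarray(image, dtype=np.float64)
    h, w = inp.shape
    c = p * tau ** (p / 2.0 - 1.0)  # constant c defined after Equation (18)
    # Frequency responses of the periodic forward-difference operators.
    fx = np.exp(2j * np.pi * np.fft.fftfreq(w))[None, :] - 1.0
    fy = np.exp(2j * np.pi * np.fft.fftfreq(h))[:, None] - 1.0
    denom = 2.0 + lam * c * (np.abs(fx) ** 2 + np.abs(fy) ** 2)
    f_inp = np.fft.fft2(inp)
    out = inp.copy()
    energies = [energy(out, inp, lam, p, tau)]
    for _ in range(n_iter):
        gx, gy = gradients(out)
        # Equation (18): c * gradient minus the derivative of the penalty.
        mu_x = c * gx - p * gx * (gx ** 2 + tau) ** (p / 2.0 - 1.0)
        mu_y = c * gy - p * gy * (gy ** 2 + tau) ** (p / 2.0 - 1.0)
        # Closed-form minimizer of the quadratic problem in Equation (19).
        numer = 2.0 * f_inp + lam * (np.conj(fx) * np.fft.fft2(mu_x)
                                     + np.conj(fy) * np.fft.fft2(mu_y))
        out = np.real(np.fft.ifft2(numer / denom))
        energies.append(energy(out, inp, lam, p, tau))
    return out, energies
```

Because the denominator does not change between iterations, it is computed once, and each iteration costs three FFTs. The returned `energies` list makes it possible to inspect how quickly the objective of Equation (15) falls for a given λ and p.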

2.5. Edge Computing Deployment

Edge mosaicing refers to real-time data processing on embedded systems, where models are implemented directly on the hardware for immediate computation. In this work, the K510 processor mounted on the UAV is primarily responsible for capturing video data and extracting image frames. Five consecutive images were captured at 1 s intervals and processed by each model. The NVIDIA Jetson Orin NX T801 is adopted as the processor for the training and inference tasks of the proposed B-SIFT-ILS algorithm. The detailed specifications of both processors are listed in Table 1.

3. Results

3.1. SIFT Feature Extraction Results in Image Overlap Region

The experimental images in this work were acquired by actual UAV aerial photography, and the proposed real-time image mosaicing algorithm was implemented in Python 2.7 with the OpenCV 4.0 and NumPy 1.16.6 libraries. Figure 5 displays the SIFT feature extraction results within the overlapping regions of the source images. As illustrated, the detected features are predominantly clustered in these shared regions.

To demonstrate the advantage of BANSAC over RANSAC in both precision and computational speed, we designed a homography estimation experiment using the mean Average Accuracy (mAA) and execution time as evaluation metrics. With the stopping criterion disabled, the number of iterations was fixed at values ranging from 0 to 10,000, and the rotation accuracy, translation accuracy, and execution time were measured at each setting. The experimental outcomes are presented in Figure 6.

In Figure 6, the accuracy of both methods improves overall as the number of iterations increases, but BANSAC performs better, with the highest rotation accuracy reaching 0.879 and the highest translation accuracy reaching 0.877. BANSAC is also faster: unlike RANSAC, it does not need to traverse all data points to update the score in each iteration, and this advantage becomes more pronounced as the number of iterations grows.
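For reference, the RANSAC baseline with a fixed iteration budget and no stopping criterion can be sketched as below. This illustrates only the classical baseline, not BANSAC, whose Bayesian score updates are not reproduced here. The function names, the 3-pixel inlier threshold, and the unnormalized direct linear transform (DLT) are assumptions made for the sketch. Note that every iteration scores all correspondences, which is the per-iteration cost referred to above.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT estimate of H mapping src (N, 2) to dst (N, 2), N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)
    if abs(h[2, 2]) < 1e-12:  # degenerate sample, e.g. collinear points
        return None
    return h / h[2, 2]

def reprojection_error(h, src, dst):
    """Per-point distance between projected src points and dst points."""
    pts = np.hstack([src, np.ones((len(src), 1))]) @ h.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = pts[:, :2] / pts[:, 2:3]
    return np.linalg.norm(proj - dst, axis=1)

def ransac_homography(src, dst, n_iter=1000, thresh=3.0, seed=0):
    """RANSAC with a fixed number of iterations and no early stopping."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    rng = np.random.default_rng(seed)
    best_h = None
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)
        h = fit_homography(src[idx], dst[idx])
        if h is None:
            continue
        # Every correspondence is scored in every iteration.
        inliers = reprojection_error(h, src, dst) < thresh
        if inliers.sum() > best_inliers.sum():
            best_h, best_inliers = h, inliers
    if best_inliers.sum() >= 4:
        # Refit on the full consensus set.
        refit = fit_homography(src[best_inliers], dst[best_inliers])
        if refit is not None:
            best_h = refit
    return best_h, best_inliers
```

Practical implementations usually normalize the point coordinates before the DLT step to improve numerical conditioning. That step is omitted here for brevity.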

Figure 7 shows the inlier matching results for the two types of images after 4000 iterations of the BANSAC algorithm, at which point the update probability is approximately 0.087.

The matched features are then used for image fusion, and the multi-frame mosaicing results obtained with the proposed method are shown in Figure 8.

3.2. Analysis of ILS-Based Image Smoothing Iteration Results

To demonstrate that ILS can serve as an effective foundational tool for edge smoothing, we analyzed how the energy of the objective function in Equation (15) descends over the iterations under norm powers p = 0.2, 0.5, 0.8, 1.0 and weights ω = 0.1, 0.5, 1.0, 5.0, 10.0, where the weight ω corresponds to the regularization parameter λ in Equation (15). The results are illustrated in Figure 9.

As can be seen from Figure 9, when the parameters ω and p were set to relatively large values, the objective function energy declined significantly after only 10 iterations (N = 10) of Equation (19). For instance, at N = 8, the total energy decreased to 21–61% of its initial value. When N = 10, ω = 1, and p = 0.8, the reduction reached 72%. The energy stabilized after 12 iterations. Therefore, in subsequent calculations, the number of iterations N was fixed at 12, and the smoothing strength of ILS could be controlled by adjusting ω. However, to achieve rapid energy reduction of the objective function, ω was directly set to 1.0 and p to 0.8.

3.3. Comparative Analysis of Different Smoothing Algorithms

To assess the effectiveness of the proposed ILS method for smoothing remote sensing image mosaics, we conducted a comparative analysis against Direct Averaging (DA) [34], Gaussian Distribution (GD) [35], and a DL-based baseline method [36]. The DL-based baseline was implemented using a U-Net architecture comprising an encoder–decoder framework with skip connections to facilitate multi-scale feature fusion. The network contains approximately 7.8 million trainable parameters and was deployed on the NVIDIA Jetson Orin NX platform using TensorRT 10.0 with FP16 precision inference.

The smoothing results, using orchard remote sensing images as an example, are illustrated in Figure 10, which presents the detail-enhanced results generated by the different methods. The parameters of all compared methods were carefully tuned to ensure optimal smoothing performance on the input images. As shown in Figure 10b, the direct averaging method performed poorly in noise removal, exhibiting noticeable halo artifacts. Additionally, it required continuous parameter adjustments during testing to balance smoothing capability and edge preservation, resulting in unsatisfactory outcomes. The Gaussian distribution method, depicted in Figure 10c, suffered from over-smoothing, leading to overall image blurring, while its noise suppression effect remained mediocre. In contrast, the proposed ILS and the deep learning-based method, shown in Figure 10d and Figure 10e, respectively, successfully eliminated noise in the images and effectively smoothed the ghosting artifacts in the vehicle regions of the input images. Almost no visible gradient reversal or halo effects were observed, demonstrating superior performance in both cases.

We evaluated these smoothing algorithms using six metrics: information entropy, standard deviation, spatial frequency, average gradient, signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR). For a precise comparison, the algorithms were tested on the mosaicing of image 001_IMG and image 002_IMG, with results averaged to derive statistical properties. The findings are summarized in Table 2.
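The six metrics can be computed with a few NumPy helpers. The sketch below uses commonly used definitions that are assumed here, since the exact formulas and the reference image behind SNR and PSNR are not stated in this section: Shannon entropy of the 8-bit histogram, spatial frequency as the root of the summed mean squared row and column differences, and average gradient as the mean of sqrt((gx² + gy²)/2). All function names are hypothetical.

```python
import numpy as np

def information_entropy(img):
    """Shannon entropy (bits) of the 8-bit grey-level histogram."""
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    prob = hist[hist > 0] / hist.sum()
    return float(-(prob * np.log2(prob)).sum())

def standard_deviation(img):
    """Standard deviation of the pixel intensities."""
    return float(np.std(np.asarray(img, dtype=np.float64)))

def spatial_frequency(img):
    """sqrt(RF^2 + CF^2) from horizontal and vertical differences."""
    f = np.asarray(img, dtype=np.float64)
    rf2 = np.mean((f[:, 1:] - f[:, :-1]) ** 2)
    cf2 = np.mean((f[1:, :] - f[:-1, :]) ** 2)
    return float(np.sqrt(rf2 + cf2))

def average_gradient(img):
    """Mean of sqrt((gx^2 + gy^2) / 2) over the image interior."""
    f = np.asarray(img, dtype=np.float64)
    gx = f[:-1, 1:] - f[:-1, :-1]
    gy = f[1:, :-1] - f[:-1, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def snr_db(ref, img):
    """Signal-to-noise ratio in dB relative to a reference image."""
    r = np.asarray(ref, dtype=np.float64)
    f = np.asarray(img, dtype=np.float64)
    return float(10.0 * np.log10(np.sum(r ** 2) / np.sum((r - f) ** 2)))

def psnr_db(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB for the given peak value."""
    r = np.asarray(ref, dtype=np.float64)
    f = np.asarray(img, dtype=np.float64)
    mse = np.mean((r - f) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```

The SNR and PSNR helpers divide by the error energy, so they are undefined when the two images are identical. Callers should handle that case separately.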

As shown in Table 2, the ILS algorithm used in our approach outperforms the DA and GD smoothing algorithms on standard deviation, average gradient, SNR, and PSNR. This is mainly because direct average fusion and Gaussian distribution fusion leave obvious gaps in the mosaicing process, so that the scale and illumination of local areas of the stitched image change drastically. Compared with the deep learning baseline, the ILS algorithm performs relatively well only on the standard deviation metric. In summary, the ILS algorithm can effectively overcome the effects of image rotation, scale, and illumination changes.

To demonstrate the advantages of the proposed B-SIFT-ILS algorithm within the constructed remote sensing image stitching system, we tested the performance of different smoothing methods on the NVIDIA Jetson Orin NX platform, with both the feature extraction and matching stages based on the B-SIFT algorithm. The execution speeds varied with the smoothing algorithm used (see Table 3).

As shown in Table 3, the RMS of the ILS was 2.3478 pix, significantly lower than that of the DA algorithm (4.2301 pix) and the GD algorithm (3.4859 pix). This demonstrates that the B-SIFT-ILS achieved superior performance in terms of mosaicing accuracy, effectively minimizing errors and seams in the mosaiced images. In terms of real-time performance, the mosaicing time of the B-SIFT-ILS on the NVIDIA Jetson Orin NX was 1124 ms, approximately 34% faster than the B-SIFT-DA and 54% faster than the B-SIFT-GD. Compared with the DL approach, the B-SIFT-ILS exhibited a slightly higher RMS than the B-SIFT-DL. However, since DL relies on a simulated smoothing strategy requiring complex network architectures, its execution time on the NVIDIA Jetson Orin NX platform reached 3165 ms, far exceeding the requirements for real-time rapid mosaicing. This indicated that in practical applications, the B-SIFT-ILS could accomplish the image mosaicing task within a significantly shorter timeframe. Overall, the proposed B-SIFT-ILS achieved an optimal balance between mosaicing accuracy and computational efficiency, demonstrating strong practical applicability.
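The relative time savings can be recomputed directly from the mosaic times listed in Table 3. The short sketch below uses the table entries as printed (1015 ms for ILS); the dictionary and helper names are illustrative only.

```python
# Mosaic times on the NVIDIA Jetson Orin NX, copied from Table 3 (ms).
mosaic_time_ms = {"ILS": 1015, "DA": 1550, "GD": 2224, "DL": 3165}

def relative_reduction(t_method, t_baseline):
    """Fraction of the baseline time saved by the faster method."""
    return (t_baseline - t_method) / t_baseline

reductions = {
    name: relative_reduction(mosaic_time_ms["ILS"], t)
    for name, t in mosaic_time_ms.items()
    if name != "ILS"
}

for name, value in reductions.items():
    print(f"ILS vs {name}: {100.0 * value:.1f}% less time")
```

With these entries the reductions are about 34.5% relative to DA and 54.4% relative to GD, so the 34% and 54% figures correspond to the 1015 ms entry in Table 3.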

4. Discussion

This study proposed a UAV-based image mosaicing system that integrates an edge computing platform with the B-SIFT-ILS algorithm. Experimental results demonstrated that the proposed framework improves both feature matching accuracy and image stitching efficiency compared with conventional SIFT-RANSAC approaches, while enabling real-time deployment on resource-constrained edge devices.

The proposed system achieved a processing time of 1124 ms on the NVIDIA Jetson Orin NX platform, satisfying the sub-2 s latency requirement for real-time UAV applications. This latency is compatible with the operational needs of agricultural monitoring tasks, such as crop health assessment, pest detection, irrigation management, and field boundary mapping, which generally require response times on the order of seconds rather than milliseconds. Consequently, image mosaicing can be completed during the same flight mission, enabling near-real-time decision-making without interrupting UAV operations. These findings support the observations of Satyanarayanan [5] and Zhou et al. [4], who emphasized that low-latency edge computing is essential for time-sensitive applications. By performing image processing directly at the edge, the proposed framework effectively avoids the latency, bandwidth, and connectivity limitations associated with cloud-based solutions.

The improved registration performance is primarily attributed to the BANSAC-based feature purification strategy. Originally proposed by Piedade and Miraldo [28], BANSAC employs Bayesian inference to guide feature sampling. Our implementation extends this approach by integrating BANSAC into a UAV-oriented SIFT pipeline and incorporating geographic registration constraints. As a result, the proposed method achieved a rotation accuracy of 0.879 and a translation accuracy of 0.877, outperforming conventional RANSAC in both accuracy and computational efficiency.

The proposed ILS smoothing method builds upon the half-quadratic regularization framework of Geman and Yang [33]. Through iterative optimization and a robust penalty function, the method effectively suppresses noise while preserving image structures. The objective function energy decreased by 72% within 12 iterations, indicating rapid convergence and stable optimization behavior. Although the deep learning baseline achieved slightly higher PSNR and SNR values, its processing time (3165 ms) was nearly three times longer than that of the proposed ILS method. This result highlights an important trade-off between image quality and computational cost. For edge computing scenarios where energy efficiency and low latency are critical, the proposed ILS approach provides a more practical balance between stitching quality and processing performance.

Beyond its technical contributions, the proposed framework offers significant practical value. The ability to complete UAV image mosaicing within approximately 1.1 s on a low-power edge device enables crop condition assessments to be performed during the same flight mission without cloud connectivity. This capability reduces storage and communication costs while supporting rapid responses to agricultural challenges such as pest infestations, irrigation failures, and nutrient deficiencies. Furthermore, the low-cost edge architecture makes precision agriculture more accessible in regions with limited network infrastructure, promoting broader adoption of remote sensing technologies and more sustainable agricultural practices.

Despite these promising results, several limitations remain. The current system assumes relatively stable flight conditions and predominantly planar scenes, and its performance under adverse weather, low-light environments, and non-planar terrain has not yet been fully evaluated. In addition, the BANSAC algorithm still relies on manual parameter tuning. Future work will focus on adaptive parameter optimization, multi-plane homography estimation, lightweight deep learning integration, and validation under a wider range of agricultural and environmental conditions.

5. Conclusions

Cloud-centric UAV image mosaicing remains impractical for real-time field applications due to communication latency, bandwidth constraints, and dependence on network connectivity. To address these limitations, this paper developed an edge-native UAV image mosaicing system based on the proposed B-SIFT-ILS algorithm, which integrates feature extraction, homography estimation, image registration, and seamless image blending within a resource-constrained edge computing environment.

The proposed framework contributes to the field in three key aspects. First, a BANSAC-based feature purification strategy was integrated into the SIFT feature extraction process, improving the robustness of feature matching for UAV remote sensing imagery. Experimental results demonstrated a rotation accuracy of 0.879 and a translation accuracy of 0.877, outperforming conventional RANSAC-based approaches. Second, an Iterative Least Squares (ILS) smoothing method was developed to achieve seamless image blending. The proposed optimization strategy reduced the objective function energy by up to 72% within 12 iterations while effectively suppressing ghosting artifacts without introducing gradient reversal. Third, the complete B-SIFT-ILS pipeline was successfully deployed on an NVIDIA Jetson Orin NX edge computing platform, achieving a processing time of 1124 ms per image pair. This latency satisfies the sub-2 s requirement commonly associated with real-time UAV monitoring tasks and demonstrates the feasibility of performing image mosaicing directly at the network edge.

The results further indicate that the proposed framework achieves a favorable balance between stitching quality and computational efficiency. Although deep learning-based methods produced slightly higher PSNR and SNR values, their substantially longer processing time makes them less suitable for resource-constrained edge platforms. In contrast, the proposed B-SIFT-ILS framework provides competitive image quality while maintaining the low latency required for real-time deployment.

Beyond its technical contributions, the proposed system offers practical value for a wide range of low-latency remote sensing applications, including crop monitoring, forest boundary mapping, disaster damage assessment, and other time-sensitive UAV missions. By eliminating the need for cloud-based processing, the framework reduces communication overhead and enables near-real-time aerial image analysis during the same flight mission. Furthermore, the reproducibility protocol presented in Section 2, together with the algorithm descriptions and pseudocode, provides a clear implementation pathway for researchers and practitioners seeking to deploy edge-native image mosaicing systems.

Several limitations remain. Although the reference-based homography estimation framework and BANSAC-enhanced feature matching provide robustness against moderate image drift and flight perturbations, the system has not yet been systematically evaluated under severe wind disturbances, large attitude variations, adverse weather conditions, or highly non-planar terrains. In addition, BANSAC currently requires manual parameter configuration. Future work will focus on adaptive parameter optimization, IMU-assisted motion compensation, multi-plane homography estimation, and the integration of lightweight deep learning models optimized for edge computing platforms. The dataset and source code will be made publicly available through the provided repository prior to final publication to further support reproducibility and future research.

Author Contributions

Conceptualization, L.W. and Z.L.; methodology, L.W.; software and algorithm, L.W. and Y.Y.; validation, L.W. and L.C.; formal analysis, L.W.; investigation, L.C., M.Z. and Z.Z.; resources, Y.T. and Z.L.; data curation, L.W.; writing—original draft preparation, L.W.; writing—review and editing, L.W., Y.T. and Z.L.; visualization, Y.Y.; supervision, L.W.; project administration, Y.T.; funding acquisition, L.W. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the SCIENTIFIC RESEARCH PROJECT OF THE NATURAL SCIENCE FOUNDATION OF HUNAN PROVINCE, Grant Number 2024JJ6226. This research was also partly supported by the SCIENTIFIC RESEARCH FUND OF HUNAN PROVINCIAL EDUCATION DEPARTMENT, Grant Numbers 25A0593 and 23B0761. Yonghong Tan is the corresponding author.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available on GitHub at https://github.com/WLH-SCAU/UAV-based-Image-Mosaicing, accessed on 16 June 2026.

Acknowledgments

Special thanks are extended to the Hunan Engineering Research Center for Smart Agriculture (Fruits and Vegetables) Information Perception and Early Warning for their provision of equipment support.

Conflicts of Interest

Author Zhizhuang Liu was employed by the company Hunan Golden Ant Intelligent Equipment Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Mamabolo, E.; Mashala, M.J.; Mugari, E.; Mogale, T.E.; Mathebula, N.; Mabitsela, K.; Ayisi, K.K. Application of precision agriculture technologies for crop protection and soil health. Smart Agric. Technol. 2025, 12, 101270.
2. Bablu, T.A.; Rashid, M.T. Edge computing and its impact on real-time data processing for IoT-driven applications. J. Adv. Comput. Syst. 2025, 5, 26–43.
3. Zhou, Z.; Abawajy, J.; Chowdhury, M.; Hu, Z.; Li, K.; Cheng, H.; Alelaiwi, A.A.; Li, F. Minimizing SLA violation and power consumption in Cloud data centers using adaptive energy-aware algorithms. Future Gener. Comput. Syst. 2018, 86, 836–850.
4. Zhou, Z.; Shojafar, M.; Alazab, M.; Abawajy, J.; Li, F. AFED-EF: An energy-efficient VM allocation algorithm for IoT applications in a cloud data center. IEEE Trans. Green Commun. Netw. 2021, 5, 658–669.
5. Satyanarayanan, M. The emergence of edge computing. Computer 2017, 50, 30–39.
6. Bonomi, F.; Milito, R.; Zhu, J.; Addepalli, S. Fog computing and its role in the internet of things. In MCC ’12: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing; ACM Digital Library: New York, NY, USA, 2012; pp. 13–16.
7. Yi, S.; Li, C.; Li, Q. A survey of fog computing: Concepts, applications and issues. In Mobidata ’15: Proceedings of the 2015 Workshop on Mobile Big Data; ACM Digital Library: New York, NY, USA, 2015; pp. 37–42.
8. Gai, K.; Qiu, M.; Zhao, H.; Tao, L.; Zong, Z. Dynamic energy-aware cloudlet-based mobile cloud computing model for green computing. J. Netw. Comput. Appl. 2016, 59, 46–54.
9. Zhou, Z.; Abawajy, J. Reinforcement learning-based edge server placement in the intelligent internet of vehicles environment. In IEEE Transactions on Intelligent Transportation Systems; IEEE: New York, NY, USA, 2025.
10. Otterness, N.; Yang, M.; Rust, S.; Park, E.; Anderson, J.H.; Smith, F.D.; Berg, A.; Wang, S. An evaluation of the NVIDIA TX1 for supporting real-time computer-vision workloads. In 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS); IEEE: New York, NY, USA, 2017; pp. 353–364.
11. Reddy, B.; Kim, Y.H.; Yun, S.; Seo, C.; Jang, J. Real-time Driver drowsiness detection for embedded system using model compression of deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition Workshops; IEEE: New York, NY, USA, 2017; pp. 121–128.
12. Kekec, T.; Yildirim, A.; Unel, M. A new approach to real-time mosaicing of aerial images. Robot. Auton. Syst. 2014, 62, 1755–1767.
13. Patil, H.; Sharma, S.; Biswas, S. A Robust Image Mosaicing Using Improved SIFT Technique. In International Conference on Advances in Data-Driven Computing and Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2023; pp. 457–468.
14. Lucas, B.D.; Kanade, T. An iterative image registration technique with an application to stereo vision. In IJCAI’81: 7th International Joint Conference on Artificial Intelligence; ACM Digital Library: New York, NY, USA, 1981; Volume 2, pp. 674–679.
15. Baker, S.; Matthews, I. Lucas-kanade 20 years on: A unifying framework. Int. J. Comput. Vis. 2004, 56, 221–255.
16. Kang, E.Y.; Cohen, I.; Medioni, G. A graph-based global registration for 2d mosaics. In 15th International Conference on Pattern Recognition. ICPR-2000; IEEE: New York, NY, USA, 2000; Volume 1, pp. 257–260.
17. Fusiello, A.; Aprile, M.; Marzotto, R.; Murino, V. Mosaic of a video shot with multiple moving objects. In 2003 International Conference on Image Processing (Cat. No. 03CH37429); IEEE: New York, NY, USA, 2003; Volume 2, p. II-307.
18. Pfingsthorn, M.; Birk, A.; Schwertfeger, S.; Bülow, H.; Pathak, K. Maximum likelihood mapping with spectral image registration. In 2010 IEEE International Conference on Robotics and Automation; IEEE: New York, NY, USA, 2010; pp. 4282–4287.
19. Steder, B.; Grisetti, G.; Stachniss, C.; Burgard, W. Visual SLAM for flying vehicles. IEEE Trans. Robot. 2008, 24, 1088–1093.
20. Botterill, T.; Mills, S.; Green, R. Real-time aerial image mosaicing. In 2010 25th International Conference of Image and Vision Computing New Zealand; IEEE: New York, NY, USA, 2010; pp. 1–8.
21. Lin, Y.; Medioni, G. Map-enhanced UAV image sequence registration and synchronization of multiple image sequences. In 2007 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2007; pp. 1–7.
22. Marburg, A.; Hayes, M.P. Smartpig: Simultaneous mosaicking and resectioning through planar image graphs. In 2015 IEEE International Conference on Robotics and Automation (ICRA); IEEE: New York, NY, USA, 2015; pp. 5767–5774.
23. Ullah, S.; Kim, D.H. Benchmarking Jetson platform for 3D point-cloud and hyper-spectral image classification. In 2020 IEEE International Conference on Big Data and Smart Computing (BigComp); IEEE: New York, NY, USA, 2020; pp. 477–482.
24. Li, L.; Hong, H. SAR image geometry correction technology based on block parallel signal processing. In SSPS ’23: Proceedings of the 2023 5th International Symposium on Signal Processing Systems; ACM Digital Library: New York, NY, USA, 2023; pp. 38–43.
25. Wang, L.; Yue, X.; Wang, H.; Ling, K.; Liu, Y.; Wang, J.; Hong, J.; Pen, W.; Song, H. Dynamic inversion of inland aquaculture water quality based on UAVs-WSN spectral analysis. Remote Sens. 2020, 12, 402.
26. Gong, L.H.; Luo, H.X. Dual color images watermarking scheme with geometric correction based on quaternion FrOOFMMs and LS-SVR. Opt. Laser Technol. 2023, 167, 109665.
27. Yuan, Y.; Chu, J.; Leng, L.; Miao, J.; Kim, B.G. A scale-adaptive object-tracking algorithm with occlusion detection. EURASIP J. Image Video Process. 2020, 2020, 7.
28. Piedade, V.; Miraldo, P. Bansac: A dynamic bayesian network for adaptive sample consensus. In IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2023; pp. 3738–3747.
29. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395.
30. Gastal, E.S.; Oliveira, M.M. Adaptive manifolds for real-time high-dimensional filtering. ACM Trans. Graph. (TOG) 2012, 31, 1–13.
31. Mazumdar, A.; Alaghi, A.; Barron, J.T.; Gallup, D.; Ceze, L.; Oskin, M.; Seitz, S.M. A hardware-friendly bilateral solver for real-time virtual reality video. In HPG ’17: Proceedings of High Performance Graphics; ACM Digital Library: New York, NY, USA, 2017; pp. 1–10.
32. Gharbi, M.; Chen, J.; Barron, J.T.; Hasinoff, S.W.; Durand, F. Deep bilateral learning for real-time image enhancement. ACM Trans. Graph. (TOG) 2017, 36, 1–12.
33. Geman, D.; Yang, C. Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Process. 1995, 4, 932–946.
34. Rabbani, M.B.A.; Musarat, M.A.; Alaloul, W.S.; Rabbani, M.S.; Maqsoom, A.; Ayub, S.; Bukhari, H.; Altaf, M. A comparison between seasonal autoregressive integrated moving average (SARIMA) and exponential smoothing (ES) based on time series model for forecasting road accidents. Arab. J. Sci. Eng. 2021, 46, 11113–11138.
35. He, J.; Peng, B.; Feng, Z.; Zhong, S.; He, B.; Wang, G. A Gaussian mixture unscented Rauch–Tung–Striebel smoothing framework for trajectory reconstruction. IEEE Trans. Ind. Inform. 2024, 20, 7481–7491.
36. Liu, W.; Zhang, Z.; Li, X.; Hu, J.; Luo, Y.; Du, J. Enhancing recommendation systems with GNNs and addressing over-smoothing. In 2024 4th International Conference on Electronic Information Engineering and Computer Communication (EIECC); IEEE: New York, NY, USA, 2024; pp. 1184–1189.

Figure 1. Real-time mosaicing system for UAV remote sensing images.

Figure 2. Drift caused by estimation errors.

Figure 3. Image matching flowchart.

Figure 4. Feature vector generation from keypoint neighborhood gradient information.

Figure 5. Results of extracting SIFT feature vectors from overlapping regions: (a) Orchard image 001_IMG overlapping extraction; (b) Orchard image 002_IMG overlapping extraction; (c) Cornfield image 001_IMG overlapping extraction; (d) Cornfield image 002_IMG overlapping extraction.

Figure 6. Results for a fixed number of iterations, i.e., without stopping criterion.

Figure 7. Remote sensing image matching results: (a) Orchard image matching results; (b) Cornfield image matching results.

Figure 8. Multi-frame image mosaicing results: (a) Orchard mosaicing image; (b) Cornfield mosaicing image.

Figure 9. Objective function energy plots of Equation (15) with respect to the iteration number of Equation (19): (a) ω = 1; (b) p = 0.8.

Figure 10. Visual comparison of image detail enhancement in terms of gradient reversals: (a) Input; (b) DA; (c) GD; (d) ILS; (e) DL. The yellow and red boxes represent fruit-tree and vehicle feature regions, respectively.

Table 1. Parameters of the K510 and NVIDIA Jetson Orin NX.

Parameter	K510	NVIDIA Jetson Orin NX
CPU	64-bit RISC-V	64-bit Cortex-A78AE
GPU	-	1024-core with 1024 CUDA cores
Peak computing capacity	2.5 TFLOPS	157 TFLOPS
Memory	512 MB	16 GB
Storage	8 GB	64 GB
Power	1.8 W	7.5 W/15 W
OS	FreeRTOS	Linux
Camera	OV7725	-

Table 2. Comparison of statistical results of smoothing algorithms.

Index	ILS	DA	GD	DL
Information entropy	5.7005	5.8641	5.8922	5.6142
Standard deviation	34.1321	29.4862	29.8901	33.9875
Spatial frequency (Hz)	3.6358	2.9841	3.8672	4.2153
Average gradient	7.5102	5.7823	6.0125	7.8542
SNR (dB)	37.5684	21.5846	31.6249	39.1524
Peak SNR (dB)	46.5461	44.5713	44.5712	46.9852

Table 3. Computational time comparison of different B-SIFT variants on the NVIDIA Jetson Orin NX platform.

Index	ILS	DA	GD	DL
Total matches	610,150	610,150	610,150	610,150
RMS (pix)	2.3478	4.2301	3.4859	2.0055
Mosaic time on NVIDIA Jetson Orin NX (ms)	1015	1550	2224	3165

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
