Article

LargeStitch: Efficient Seamless Stitching of Large-Size Aerial Images via Deep Matching and Seam-Band Fusion

1 School of Oceanography, Shanghai Jiao Tong University, Shanghai 200030, China
2 State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(10), 1481; https://doi.org/10.3390/rs18101481
Submission received: 16 December 2025 / Revised: 16 February 2026 / Accepted: 5 March 2026 / Published: 9 May 2026

Highlights

We propose an efficient and seamless stitching method for large-size aerial imagery. By leveraging a deep learning-based dense matching approach, accurate alignment is achieved, while a Seam-band fusion strategy is designed to eliminate ghosting and misalignment. Compared with existing methods, our approach significantly enhances processing efficiency while maintaining high-quality stitching performance.
What are the main findings?
  • Superior Alignment Robustness: The framework demonstrates high precision in challenging scenarios involving significant rotation, size variation, low overlap, and textureless regions.
  • Optimized Computational Efficiency: Significant speedup is achieved via a pre-stitching filtering strategy and a Seam-band fusion approach that avoids the heavy overhead of traditional Seam-driven optimization.
  • Seamless Visual Reconstruction: The method transitions from pixel-level blending to feature-level content coordination, ensuring ghosting-free, seamless panoramas for large-size imagery.
What is the implication of the main finding?
  • Practical Solutions for Large-size Monitoring: The proposed method provides a high-speed stitching solution for applications such as environmental monitoring, enabling the rapid acquisition of refined and comprehensive spatial intelligence for target areas.

Abstract

High-resolution panoramas generated by UAV image stitching are indispensable resources for remote sensing applications. However, most existing stitching methods are designed for small-size images and struggle to process large-size images, leading to problems such as feature misalignment and low generation efficiency. This paper presents LargeStitch, a novel batch stitching method for large-size UAV images. The method introduces advanced image matching and alignment strategies based on deep learning to achieve efficient extraction and accurate alignment of dense features. To further improve the stitching result, this paper also proposes a seamless fusion method based on the Seam-band, which effectively resolves ghosting and misalignment in the overlapping regions of large-size images. In addition, we design a mask-based pre-stitching image filtering strategy that optimizes the selection of candidate images to reduce content redundancy, thereby avoiding unnecessary computational overhead and time consumption. Experimental results show that LargeStitch not only achieves fast stitching of high-precision, large-size aerial images but also significantly outperforms existing methods in both stitching quality and processing efficiency, making it a practical solution for efficient, seamless aerial image stitching.

1. Introduction

With the rapid advancement of remote sensing technologies, unmanned aerial vehicles (UAVs) have become essential tools for high-resolution geographic data acquisition [1,2,3,4]. Owing to their flexibility, cost-effectiveness, and ability to acquire ultra-high-resolution imagery, UAV-based data are widely applied in urban modeling [5], disaster and hazard monitoring [6,7,8], geological and environmental mapping, including seismic and tectonic analyses [9], and precision agriculture [10]. However, a single UAV image is often limited by the camera’s field of view, making it insufficient to cover larger geographic areas. This constraint means that a single image cannot meet the demand for high-precision remote sensing applications for large-size, detail-rich geographic information [11,12]. Consequently, the stitching of multiple UAV images together to create high-resolution panoramic images that cover larger areas has become a key technique to enhance the quality and depth of remote sensing data [13].
Image stitching generally involves two steps: image alignment and image composition [14,15]. Image alignment relies on extracting manual features from image pairs to warp images with different positions and orientations onto a common plane, aiming to align as much of the overlapping region as possible. Perfectly aligning every pixel is a challenging task, often resulting in artifacts and unnatural transitions in the stitched image [16,17,18]. As a result, image composition is essential for improving stitching quality. Two commonly used composition techniques are optimal Seam detection and pixel-based blending in the overlapping regions [19]. The optimal Seam refers to a set of pixels that minimizes both intensity and geometric differences between adjacent images, effectively hiding visible artifacts by selecting the Seam in the region with the least misalignment. Pixel-based blending is employed to mitigate significant color discrepancies between neighboring images, which are often caused by variations in lighting, weather, or camera exposure.
Currently, numerous image stitching methods have been proposed to generate panoramic results [3,20,21,22,23,24,25]. These methods are primarily designed for small-size images (<10 megapixels) and are not directly applicable to seamless stitching of large-size aerial images. In contrast to smaller images, high-resolution aerial images require more stringent alignment accuracy to prevent misalignment and artifacts [26]. Additionally, the computational cost of keypoint matching and optimal Seam detection grows steeply as image resolution rises [27,28]. Moreover, UAV imagery poses inherent challenges such as large numbers of images, limited per-image coverage, and significant overlap, owing to constraints on flight altitude and camera focal length [18]. These factors make stitching large-size images more complex, with notable bottlenecks in robustness, processing efficiency, and stitching accuracy. In practical applications, some traditional methods may take several hours or even days to complete the stitching and scene reconstruction of large-size images [29]. Such inefficiencies clearly fall short of the demand for fast and accurate real-time scene distribution of target areas.
In recent years, researchers have proposed various improvements to address the challenges of stitching large-size images, including feature point optimization [30,31], image partitioning [32,33], and non-feature-based pose transformation [16,29]. While these methods have made progress in improving accuracy and efficiency, they still have limitations. Feature matching optimization improves image alignment accuracy but increases computational demands. Image partitioning alleviates memory pressure but introduces additional processing time. Parallel computing and non-feature-based pose transformations accelerate the image transformation process but sacrifice stitching accuracy. Overall, existing methods struggle to balance the accuracy, efficiency, and robustness required for large-size image stitching, highlighting the need for a more comprehensive solution.
To address the challenges of fast, high-quality stitching for large-size aerial images, we propose a new efficient and seamless stitching method named LargeStitch. The method targets the stitching of large-size aerial images while improving both stitching efficiency and quality. By incorporating deep learning techniques, we achieve more precise and efficient image matching and registration. Additionally, we propose a Seam-band image fusion method that effectively addresses misalignment and ghosting in overlapping image regions. Furthermore, a mask-based pre-selection framework is designed to optimize the selection of images for stitching, reducing redundant content and shortening the stitching process. Experimental results on real UAV data demonstrate that the proposed method outperforms several state-of-the-art stitching methods in both objective and subjective evaluations.
The contributions of this work include the following:
  • We propose a novel robust stitching framework. To the best of our knowledge, it is the first method integrating deep learning for efficient, seamless stitching of large-size images.
  • We propose a novel Seam-band fusion method that transforms the traditional pixel-level composition problem into an image content harmonization task, effectively reducing misalignment and artifacts.
  • We design a mask-based image filtering strategy to reduce redundant images, optimize computational resources, and minimize cumulative errors during stitching.
  • Extensive experiments demonstrate that the proposed LargeStitch method outperforms several state-of-the-art stitching techniques in both qualitative analysis and quantitative metrics.
The remainder of this paper is organized as follows: Section 2 provides a brief review of related image stitching methods and their limitations; Section 3 details the technical aspects of the LargeStitch method; Section 4 presents the experimental results and analysis; Section 5 discusses future research directions; and Section 6 concludes the paper.

2. Related Work

Image stitching methods are generally categorized into three main types based on the alignment approach: traditional feature-based methods, georeferencing-based methods, and deep learning-based methods. In this section, we will review these methods and discuss the challenges associated with large-size image stitching.

2.1. Traditional Feature-Based Image Stitching Methods

Classic feature extraction methods, such as SIFT [34], SURF [35], and ORB [36], generate key points and corresponding descriptors to identify correspondences between images. By detecting and matching these key points, images with different disparities can be aligned onto a common reference plane [37].
Adaptive Warping Methods: These methods align images by applying global geometric transformations or local spatial warping to overlapping regions. Global warping methods [16,38,39] use homography transformations to estimate the geometric distortion of the image overlap. In contrast, local spatial warping methods [20,21,25] divide overlapping regions into multiple planes, using local homography or grid deformations to map feature points topologically, achieving more flexible alignment. This approach is particularly robust when handling scene variations and non-uniform geometries. However, as image resolution and the number of images increase, the computational cost of traditional feature extraction methods escalates, leading to issues such as mismatches and insufficient feature points, resulting in suboptimal alignment [40]. Additionally, local warping methods face the challenge of maintaining topological consistency, which further increases computational load [41] and leads to projection distortions in non-overlapping regions, causing visible artifacts and deformations in large-size images [42].
Seam-Driven Methods: Unlike adaptive warping methods, which focus on geometric alignment, Seam-driven methods [19,43,44,45,46,47,48,49] aim to eliminate visible artifacts at image boundaries by optimizing the Seam placement in overlapping regions, enhancing the visual quality of the stitched result. These methods rely on coarse alignment through feature matching, followed by energy minimization (based on color differences, gradients, or exposure variations) to determine the optimal Seam. While effective in improving visual quality, Seam-driven methods are computationally expensive, even for small images, as finding the optimal Seam is an iterative process that is time-consuming [50]. This high computational cost becomes a significant bottleneck for their rapid application in large-size image stitching.

2.2. Deep Learning-Based Image Stitching Methods

In recent years, deep learning-based methods have emerged as a new research trend in image stitching, gaining widespread attention and application. Shi et al. [51] proposed incorporating Convolutional Neural Networks (CNNs) into the feature detection phase, allowing the model to learn features directly from data and eliminating the need for traditional hand-crafted feature extraction. Nie et al. [52] introduced the first complete learning-based framework for stitching images from free viewpoints, which was later expanded into an unsupervised seamless stitching method [37,53]. However, as image size increases, maintaining accurate alignment and avoiding artifacts becomes increasingly difficult. Additionally, due to GPU memory limitations, high-resolution input images are challenging to process directly. To address these limitations, Mo et al. [30] proposed a novel deep learning-based matching framework that alleviates the resolution constraints, enabling the stitching of hyperspectral images (HSIs). However, artifacts such as Seams still occur, especially when the overlap is low or when the method is applied to large-size images with complex structures. In video stitching, Lai [54] leveraged the spatiotemporal relationships between image sequences, achieving success in synchronizing frames. However, these methods have not yet been extended to large-size or high-resolution image stitching applications.
Compared to traditional feature-based methods, deep learning-based stitching approaches overcome the need for endless geometric feature design, making them more robust to challenging scenes [55]. These methods have shown impressive results in terms of stitching quality and demonstrate significant potential in overcoming the limitations of conventional techniques [53].

2.3. Georeferencing-Based Image Stitching Method

Georeferencing-based image stitching methods primarily rely on Global Navigation Satellite System (GNSS) and Inertial Measurement Unit (IMU) data, or ground control points, to estimate the transformation matrices between image views. These methods [15,56,57] are generally fast and resolution-independent, as they eliminate the need for complex geometric feature extraction. However, they depend heavily on the accuracy and sensitivity of onboard sensors, requiring high-cost, high-precision GNSS and IMU devices [15,50] as well as precise timestamp alignment, since even minor pose variations within short time intervals can affect the results.
Georeferencing-based methods typically struggle to achieve optimal alignment. To improve accuracy, these methods are often combined with feature-based approaches. Specifically, geospatial attitude information is used for initial registration, followed by finer alignment through feature matching. This hybrid approach is commonly found in commercial software like Agisoft Metashape [58] and Pix4Dmapper [59].
Overall, current methods still face challenges in large-size image stitching, particularly in balancing stitching quality and computational efficiency. In terms of stitching quality, image alignment is highly sensitive to feature-matching accuracy, a challenge that becomes more pronounced with larger images. Larger image areas require precise view transformations; otherwise, misalignments and Seams become noticeable. On the computational efficiency side, large images significantly increase the computational burden, making fast stitching difficult. Therefore, there is an urgent need for an efficient, accurate, and robust solution that can make large-size aerial image stitching as straightforward and effective as stitching small images.

3. Materials and Methods

Inspired by the classic Seam-driven image stitching pipeline, a novel deep learning-based method, LargeStitch, is proposed for fast and accurate stitching of large UAV image sequences. Figure 1 illustrates the workflow of LargeStitch, and Figure 2 describes the specific implementation details of the key stitching process. The LargeStitch framework consists of three main stages: (1) Mask-based pre-stitching filtering: Considering the significant overlap in UAV image sequences, a pre-stitching filtering strategy is proposed. Through downsampling and generating alignment masks, the minimal set of images required for constructing the final mosaic is identified, thus reducing redundancy in the stitching process. (2) Deep feature matching and alignment: Advanced deep learning-based feature matching techniques are employed for high-precision keypoint detection. The Graph-Cut RANSAC method is utilized to eliminate mismatches, ensuring robust alignment and minimizing pixel-level misalignments. (3) Image fusion based on the Seam-band: The boundary area between the current image and the next image to be stitched is defined as the Seam-band, and this region is further regarded as the foreground and embedded into the reference image. This approach transforms the pixel blending task into a region-specific harmonization process, preserving the texture details of the reference image.
LargeStitch is designed to provide efficient and accurate stitching for UAV image sequences and represents the first deep learning-based method capable of achieving seamless stitching for large-size aerial image datasets.

3.1. Deep Feature Matching

Feature matching is a crucial step in the geometric estimation of stereo images [50]. Accurate matching ensures precise keypoint detection and reliable point correspondences, which are essential for achieving high-quality image stitching, even in complex environments [60].
Figure 1. The framework of the proposed LargeStitch method.
Figure 2. The specific implementation details of the proposed LargeStitch method (the visual examples used in this flow diagram are obtained from the Dataset_village dataset).
In this study, GIM-DKM [61] (Generalizable Image Matcher for Dense Kernelized Feature Matching), a state-of-the-art deep learning-based dense feature matching method, is employed to replace traditional feature detection techniques. This method combines powerful dense deterministic estimation with a balanced distortion sampling mechanism to achieve superior global matching. Compared to sparse or semi-sparse feature detectors [62,63], GIM-DKM generates a significantly higher number of accurate keypoint correspondences in overlapping regions, which greatly enhances the homography estimation between image pairs [64]. Because the original training dataset of GIM-DKM [61] already encompasses high-resolution video frames from various complex scenarios, the method can handle diverse image transformations, such as scale variations, irregular rotations, and other extreme conditions. To ensure the broad applicability of the proposed stitching framework, we directly use the pre-trained weights provided by GIM-DKM [61] without any additional fine-tuning on UAV-specific datasets, which provides a fair and rigorous demonstration of the robustness of our method. Notably, the GIM-DKM matching process runs on GPUs, enabling efficient inference that fully utilizes hardware capabilities. More importantly, it is more accurate and places lower demands on the resolution of the input images than other deep feature matching methods, which is essential for the fast alignment of large-size image pairs.
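To make the role of the dense matcher concrete, the following Python sketch shows how such a matcher can be wrapped for the stitching pipeline. The `matcher.match` interface, the confidence threshold, and the keypoint cap are illustrative assumptions, not the actual GIM-DKM API; the official GIM repository defines the real loading and inference code.

```python
import cv2
import numpy as np

def dense_match(matcher, img0_path, img1_path, conf_thresh=0.8, max_kpts=5000):
    """Run a dense deep matcher on an image pair and return filtered keypoints.

    Assumes a hypothetical interface: matcher.match(img0, img1) returns two
    (M, 2) arrays of pixel coordinates plus a per-match confidence in [0, 1].
    """
    img0, img1 = cv2.imread(img0_path), cv2.imread(img1_path)
    kpts0, kpts1, conf = matcher.match(img0, img1)
    keep = conf > conf_thresh                 # discard low-confidence matches
    kpts0, kpts1, conf = kpts0[keep], kpts1[keep], conf[keep]
    if len(kpts0) > max_kpts:                 # cap the correspondence count to
        order = np.argsort(-conf)[:max_kpts]  # bound downstream RANSAC cost
        kpts0, kpts1 = kpts0[order], kpts1[order]
    return kpts0, kpts1
```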

3.2. Graph-Cut RANSAC for Robust Outlier Removal

Outliers and mismatches are inevitable during feature matching between image pairs. These issues can be mitigated through robust estimation, typically optimized using the classical RANSAC algorithm [65]. More advanced estimators can improve registration accuracy and reduce error accumulation. However, this step is often overlooked in the field of image stitching.
In this paper, the Graph-Cut RANSAC (GC-RANSAC) algorithm [66] is employed to optimize feature registration and estimate the global homography transformation matrix between image pairs. Compared to standard RANSAC, GC-RANSAC is more effective at handling complex geometric relationships and improves the accuracy of outlier detection. Experimental results demonstrate that, on a dataset consisting of various complex scenes, GC-RANSAC outperforms most other estimators [67]. Therefore, GC-RANSAC is chosen as the robust estimator for image pair registration and alignment in this study.
Given a set of matched point pairs $\{(p_i, p_j)\}$, the transformation is estimated through the homography matrix $H$. The model error can be expressed as:

$d_i = \lVert H p_i - p_j \rVert,$  (1)

where $d_i$ is the reprojection error for points $p_i$ and $p_j$ under the model $H$.
For each model generated in the iteration, the reprojection error is computed for each data point, and the inlier set is defined as follows:

$I = \{\, i \mid d_i < \tau \,\},$  (2)

where points with an error below a threshold $\tau$ are considered inliers, and $I$ represents the set of inliers with a reprojection error below the threshold.
In the GC-RANSAC algorithm, Graph-Cut techniques are integrated to optimize inlier selection and model estimation. By constructing a graph $G = (V, E)$, where $V$ is the set of points and $E$ is the set of edges connecting matched points, each edge is assigned a weight $w_{ij}$ based on the error difference between neighboring point pairs:

$w_{ij} = \exp\left( -\dfrac{|d_i - d_j|^2}{2\sigma^2} \right),$  (3)

where $\sigma$ is a scale parameter for error differences. By performing Graph-Cut on this constructed graph, GC-RANSAC effectively retains a locally consistent inlier set.
To identify the optimal model, GC-RANSAC maximizes the inlier count while minimizing the Graph-Cut energy. The objective function $E(H)$ is defined as:

$E(H) = \sum_{i \in I} d_i + \lambda \sum_{(i,j) \in E} w_{ij},$  (4)

where $\lambda$ is a parameter that balances the number of inliers and the Graph-Cut energy. The goal is to minimize $E(H)$ over all candidate models $H$.
The final model $H^{*}$ chosen by GC-RANSAC is the one that minimizes the objective function $E(H)$:

$H^{*} = \arg\min_{H} E(H).$  (5)
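As a sketch of this stage, OpenCV's USAC framework exposes a GC-RANSAC-style estimator through the `USAC_ACCURATE` flag (to our knowledge it includes graph-cut local optimization); using it here is our substitution for illustration, not necessarily the paper's exact implementation. The 1-pixel inlier threshold follows Algorithm 1.

```python
import cv2
import numpy as np

def estimate_homography(kpts0, kpts1, thresh=1.0):
    """Estimate H with a graph-cut-refined RANSAC variant (Equation (5)).

    kpts0/kpts1: (M, 2) arrays of matched pixel coordinates; thresh is the
    inlier reprojection threshold tau in pixels (Algorithm 1 uses 1).
    """
    H, mask = cv2.findHomography(
        np.asarray(kpts0, dtype=np.float64),
        np.asarray(kpts1, dtype=np.float64),
        method=cv2.USAC_ACCURATE,          # GC-RANSAC-style USAC pipeline
        ransacReprojThreshold=thresh,
        maxIters=10000,
        confidence=0.999,
    )
    return H, mask.ravel().astype(bool)    # H maps kpts0 onto kpts1
```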

3.3. Image Alignment

The purpose of homography transformation in image alignment is to unify the projection planes of two images taken from different perspectives onto a common reference plane [50]. Figure 3 illustrates the homography transformation between two views. Given that aerial images captured by UAVs are taken from a significant altitude, variations in ground elevation can generally be ignored, allowing the scene within the images to be treated as lying on a common projection plane $\pi$, defined by

$n^{T} P + d = 0,$  (6)

where $n$ is the normal vector of the plane, $P$ is a point on the plane, and $d$ is the distance from the origin to the plane.
Therefore, two images with overlapping regions can be accurately aligned using the homography matrix $H$, as described by the following equation:

$H = K \left( R + \dfrac{t\, n^{T}}{d} \right) K^{-1},$  (7)

where $K$ is the camera intrinsic matrix, $R$ is the rotation matrix between the two views, and $t$ is the translation vector between the two views.
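A direct NumPy transcription of Equation (7), useful for sanity-checking an estimated homography against known camera geometry (a minimal sketch; the expected array shapes are stated in the docstring):

```python
import numpy as np

def plane_induced_homography(K, R, t, n, d):
    """Equation (7): H = K (R + t n^T / d) K^{-1}.

    K: 3x3 intrinsics, R: 3x3 rotation, t: (3,) translation,
    n: (3,) plane normal, d: scalar distance from the origin to the plane.
    """
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]  # scale-normalize so that H[2, 2] = 1
```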
The homography matrix $H$ enables the projection of points from one image to another when the target points lie on the same real-world plane. The relationship between a pixel in image $I_1$ and its corresponding pixel in image $I_2$ can be described using the homography matrix $H$. For a point $p_1 = (x_1, y_1, 1)^{T}$ in image $I_1$, its corresponding point $p_2 = (x_2, y_2, 1)^{T}$ in image $I_2$ can be expressed as:

$p_2 = H_{1,2}\, p_1,$  (8)

where $H_{1,2}$ is the homography matrix that maps points from image $I_1$ to image $I_2$.
The root mean square error (RMSE) of the reprojection measures the deviation between a pixel's location in $I_1$ and its mapped location in $I_2$ under the homography transformation. If $(x_i', y_i')$ is the reprojected location of $(x_i, y_i)$ based on $H$, the RMSE is defined as:

$\mathrm{RMSE} = \sqrt{ \dfrac{1}{N} \sum_{i=1}^{N} \left[ (x_i - x_i')^2 + (y_i - y_i')^2 \right] },$  (9)

where $N$ is the total number of matching points, $(x_i, y_i)$ are the original pixel coordinates in $I_1$, and $(x_i', y_i')$ are the corresponding reprojected coordinates in $I_2$.
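Equation (9) can be computed directly from the matched points and the estimated homography. In this sketch we take the reprojected coordinates to be the $H$-mapped points from $I_1$ and compare them with the matched coordinates in $I_2$, which is one reading of the definition above:

```python
import numpy as np

def reprojection_rmse(H, pts_i1, pts_i2):
    """RMSE of Equation (9) for (N, 2) arrays of corresponding pixels."""
    ones = np.ones((len(pts_i1), 1))
    proj = (H @ np.hstack([pts_i1, ones]).T).T   # project I1 points with H
    proj = proj[:, :2] / proj[:, 2:3]            # dehomogenize
    return float(np.sqrt(np.mean(np.sum((proj - pts_i2) ** 2, axis=1))))
```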
To compute the homography matrix between each image in a sequence and the first image, consider a sequence of images $I_1, I_2, \ldots, I_n$. Let $H_{i,i+1}$ be the homography matrix that maps image $I_i$ to $I_{i+1}$. The homography matrix $H_{1,i}$, which maps image $I_i$ back to the first image $I_1$, can be computed by composing the intermediate homographies as follows:

$H_{1,i} = H_{1,2}\, H_{2,3} \cdots H_{i-1,i},$  (10)

where $H_{1,i}$ represents the cumulative homography that relates the $i$th image to the first image in the sequence.
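Following the composition order of Equation (10), the pairwise homographies can be accumulated with plain matrix products (a sketch; depending on the mapping convention in use, the inverse product may be required instead):

```python
import numpy as np

def accumulate_homographies(pairwise):
    """Compose pairwise homographies H_{i,i+1} into cumulative H_{1,i}.

    pairwise: list of 3x3 arrays, pairwise[k] relating image k+1 to image k+2.
    Returns one cumulative matrix per image, the first being the identity.
    """
    cumulative = [np.eye(3)]
    for H in pairwise:
        H_next = cumulative[-1] @ H          # Equation (10)
        cumulative.append(H_next / H_next[2, 2])
    return cumulative
```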
By integrating state-of-the-art feature matching and geometric estimation algorithms, this study ensures precise alignment of overlapping regions in the early stages of the image stitching process, thereby enhancing the quality of the resulting panoramic image and effectively reducing the error accumulation caused by multiple image operations.

3.4. Image Harmonization Based on Seam-Band

Accurate feature matching and geometric estimation allow two images with parallax to be effectively aligned under the planar assumption [39]. After transformation, the images are mapped into a unified coordinate system. However, due to potential brightness, color differences, or slight misalignments between the floating image and the reference image, direct blending using methods such as average blending or weighted transition often fails to achieve smooth visual effects in the overlapping regions [44].
Since perfect pixel-to-pixel alignment is unattainable, and height variations of objects within the images introduce additional misalignments, we propose an image harmonization strategy based on the Seam-band to achieve high-quality fusion while avoiding projection distortion. As shown in Figure 4, the Seam-band region is defined along the overlapping boundary between the reference image $I_1$ and the floating image $I_2$. The non-Seam regions, after transformation, are aligned to the same coordinate system as the background image $S$. In contrast, the Seam-band is treated as the foreground and harmonized with the background image to ensure visual consistency.
The novelty of this method lies in converting the pixel-level blending process into a feature-level image harmonization task. It exploits the precise geometric alignment provided by advanced feature matching and resolves, through feature gradient transitions, the projection distortion and blending artifacts that usually appear in the overlapping area, ultimately achieving seamless stitching. Compared to local warping or Seam-driven blending techniques, this method offers reduced computational complexity and supports multi-image stitching.
To achieve harmonious fusion between the floating image and the reference image within the Seam-band, Poisson fusion [68] is adopted for the gradient-domain transition, where the problem is formulated as solving the Poisson equation on the target region $\Omega$ under specified boundary conditions. Given a target region $\Omega$ in the overlapping area and its boundary $\partial\Omega$, Poisson fusion seeks an intensity function $f$ over $\Omega$ that minimizes the difference in gradients between the source image $g$ and the target image $f$. For color images, each channel $c \in \{R, G, B\}$ is fused independently by solving a separate Poisson equation:

$\Delta f_c = \operatorname{div} \mathbf{v}_c \quad \text{in } \Omega,$  (11)

where $f_c$ is the intensity function for color channel $c$ in the target image, and $\mathbf{v}_c = \nabla g_c$ represents the gradient field of the corresponding channel in the source image.
The boundary condition for each channel $c$ is defined similarly:

$f_c \big|_{\partial\Omega} = f_c^{*},$  (12)

where $f_c^{*}$ represents the boundary values for channel $c$ from the reference image.
To formalize the harmonization process, the objective is to minimize the gradient difference over $\Omega$. This can be expressed as minimizing the following energy function:

$E(f) = \int_{\Omega} \lVert \nabla f - \mathbf{v} \rVert^{2} \, d\Omega,$  (13)

where $E(f)$ represents the energy to be minimized, ensuring that $f$ closely follows the gradient field $\mathbf{v}$, thus achieving seamless fusion between the floating and reference images.
By solving these equations for each channel, we ensure that color consistency is maintained across the Seam region, resulting in a visually seamless harmonization.
The proposed Seam-band harmonization strategy has several notable advantages over traditional blending methods. First, by isolating the Seam-band region for harmonization, the approach minimizes the introduction of color and brightness discrepancies across the entire image. Additionally, the computational complexity is reduced, as only a localized region requires seamless harmonization, making the approach efficient for large-size multi-image stitching applications. This process effectively resolves common alignment and blending issues in overlapping regions, enabling high-quality and computationally efficient image mosaicking.
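Since Algorithm 1 applies Poisson fusion with the NORMAL_CLONE mode, OpenCV's `seamlessClone` is a natural fit for this step. The band construction below (a fixed-width strip around the overlap boundary, with width 50 following the sensitivity analysis in Section 4.2) is our reading of the paper's definition, sketched for illustration:

```python
import cv2
import numpy as np

def seam_band_mask(overlap_mask, width=50):
    """Approximate the Seam-band as a strip of the given width (in pixels)
    around the boundary of the binary overlap mask (values 0/255)."""
    boundary = cv2.morphologyEx(overlap_mask, cv2.MORPH_GRADIENT,
                                np.ones((3, 3), np.uint8))
    return cv2.dilate(boundary, np.ones((width, width), np.uint8))

def harmonize_seam_band(panorama, warped, band_mask):
    """Poisson-fuse the Seam-band of the warped image into the panorama.

    panorama and warped must share the same canvas size; band_mask is a
    uint8 mask with 255 inside the Seam-band.
    """
    ys, xs = np.nonzero(band_mask)
    center = (int(xs.mean()), int(ys.mean()))   # anchor point required by OpenCV
    return cv2.seamlessClone(warped, panorama, band_mask,
                             center, cv2.NORMAL_CLONE)
```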

3.5. Mask-Based Pre-Stitching Image Filtering Strategy

In panoramic stitching, redundant images increase computation time and lead to alignment errors. To address this, a mask-based pre-stitching strategy with downsampled images is used to improve efficiency by excluding images with excessive overlap, as shown in Figure 5.
All input images are first downsampled to reduce resolution while retaining essential structural features. Let $I$ represent an input image, $I_d$ denote the downsampled version of $I$, and $M$ be the corresponding binary mask generated for pre-stitching purposes:

$I_d = \mathrm{Downsample}(I),$  (14)

$M = \mathrm{GenerateMask}(I_d).$  (15)

Next, the real-time and robust AKAZE feature matching method [69] is applied to the downsampled images for coarse alignment. Given a pair of downsampled images $I_d^{i}$ and $I_d^{i+1}$, the AKAZE matching process generates a set of matched keypoints $\{p_{i,i+1}\}$:

$\{p_{i,i+1}\} = \mathrm{AKAZE}(I_d^{i}, I_d^{i+1}).$  (16)

Based on these matched keypoints, a rough transformation $T_{i,i+1}$ is estimated to align image $I_d^{i}$ with $I_d^{i+1}$:

$T_{i,i+1} = \mathrm{EstimateTransform}(\{p_{i,i+1}\}).$  (17)

Using these transformations, each mask $M_i$ is aligned to the reference frame, and all aligned masks are merged to form a composite mask $M_c$:

$M_c = \bigcup_{i=1}^{N} T_i M_i.$  (18)

Here, $T_i$ represents the cumulative transformation that aligns mask $M_i$ to the panoramic reference frame.
The composite mask $M_c$ is then compared with the current panoramic mask $M_p$ to identify and filter redundant images that do not add new coverage. The filtering function $F$ selects only the essential images required for the panorama:

$\{I_{\mathrm{selected}}\} = F(M_c, M_p).$  (19)
Finally, to ensure adequate overlap between adjacent images, interpolation is applied to fill any gaps in the filtered sequence. This mask-based strategy minimizes the number of processed images, reduces cumulative errors, and improves computational efficiency in large-size panoramic stitching.
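A compact sketch of this filtering stage follows: AKAZE coarse matching on downsampled images (Equations (14), (16), and (17)) and a simple redundancy test comparing new footprint coverage against the current panoramic mask. The downsampling factor and the coverage threshold are illustrative assumptions:

```python
import cv2
import numpy as np

def coarse_pair_transform(img_a, img_b, scale=0.25):
    """Estimate a rough homography between two images via AKAZE matching
    on downsampled copies (Equations (14), (16), and (17))."""
    a = cv2.resize(img_a, None, fx=scale, fy=scale)
    b = cv2.resize(img_b, None, fx=scale, fy=scale)
    akaze = cv2.AKAZE_create()
    kpa, da = akaze.detectAndCompute(cv2.cvtColor(a, cv2.COLOR_BGR2GRAY), None)
    kpb, db = akaze.detectAndCompute(cv2.cvtColor(b, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(da, db)
    pa = np.float32([kpa[m.queryIdx].pt for m in matches])
    pb = np.float32([kpb[m.trainIdx].pt for m in matches])
    T, _ = cv2.findHomography(pa, pb, cv2.RANSAC, 3.0)
    return T

def is_redundant(aligned_mask, pano_mask, new_area_thresh=0.05):
    """Flag an image as redundant if its aligned footprint contributes less
    than new_area_thresh of fresh coverage (threshold is an assumption)."""
    new_pixels = np.logical_and(aligned_mask > 0, pano_mask == 0).sum()
    return new_pixels < new_area_thresh * max((aligned_mask > 0).sum(), 1)
```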

3.6. Algorithm

Algorithm 1 introduces the LargeStitch framework, designed for high-quality, efficient stitching of large-size UAV images. This framework integrates deep feature extraction, robust alignment, and seamless blending in three streamlined steps, maximizing alignment accuracy, computational efficiency, and visual coherence. The Seam-band fusion technique further enhances detail retention and ensures smooth transitions, effectively leveraging deep feature alignment for seamless panoramic results.
Algorithm 1 LargeStitch: deep learning-based aerial image stitching framework
1: Input: UAV aerial images $I = \{I_1, I_2, \ldots, I_N\}$
2: Step 1: Pre-Stitching Mask Filtering (Optional)
3: if high sampling frequency and large overlap then
4:   Downsample images $I$ and perform coarse matching
5:   Estimate transformations $T_{i,i+1}$ and generate masks $M_i$
6:   Merge aligned masks to form composite mask $M_c$, then filter redundant images by Equation (18)
7:   return $I = I_{\mathrm{selected}}$ by Equation (19)
8: Step 2: Deep Feature Matching and Alignment
9: Initialize input sequence $\{I_{\mathrm{input}}\}$ based on Step 1 results
10: for each image pair $(I_i, I_{i+1}) \in \{I_{\mathrm{input}}\}$ do
11:   Extract dense features with GIM-DKM and match keypoints $\{p_i, p_{i+1}\}$
12:   Filter matches with Graph-Cut RANSAC (threshold = 1) and estimate $H_{i,i+1}$ by Equation (5)
13: Step 3: Image Harmonization Based on the Seam-Band
14: Initialize reference image $I_{\mathrm{pan}} = I_1$ ($i = 1$)
15: for each aligned image $I_{i+1}$ do
16:   Align $I_{\mathrm{pan}}$ and $I_{i+1}$ using $H_{1,i+1}$ by Equation (10)
17:   Incrementally stitch the non-overlapping regions of $I_{i+1}$ onto $I_{\mathrm{pan}}$; update $I_{\mathrm{pan}}$
18:   Define the Seam-band in the overlap of $I_{\mathrm{pan}}$ and $I_{i+1}$
19:   Apply Poisson fusion (mode set to NORMAL_CLONE) in the Seam-band region and update $I_{\mathrm{pan}}$
20: end for
21: Output: Stitched panoramic image $I_{\mathrm{pan}}$
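Tying the sketches together, the following schematic driver mirrors Algorithm 1 using the helper functions introduced above (`dense_match`, `estimate_homography`, `accumulate_homographies`, `seam_band_mask`, `harmonize_seam_band`). It is deliberately simplified and is not the authors' code: the optional Step 1 filtering is omitted, and the canvas is fixed to the size of the first image, so the translation and canvas-growing bookkeeping of a real implementation is not shown.

```python
import cv2
import numpy as np

def largestitch(image_paths, matcher):
    """Schematic end-to-end driver mirroring Algorithm 1 (simplified sketch)."""
    imgs = [cv2.imread(p) for p in image_paths]
    # Step 2: pairwise dense matching and robust homography estimation.
    pairwise = []
    for a, b in zip(image_paths[:-1], image_paths[1:]):
        k0, k1 = dense_match(matcher, a, b)
        H, _ = estimate_homography(k0, k1, thresh=1.0)
        pairwise.append(H)
    cumulative = accumulate_homographies(pairwise)
    # Step 3: incremental stitching with Seam-band harmonization.
    h, w = imgs[0].shape[:2]
    pano = imgs[0].copy()
    pano_mask = np.full((h, w), 255, np.uint8)
    for img, H in zip(imgs[1:], cumulative[1:]):
        # Warp into the frame of I_1 (whether inv(H) is needed depends on
        # the convention with which the pairwise homographies were fitted).
        W = np.linalg.inv(H)
        warped = cv2.warpPerspective(img, W, (w, h))
        warped_mask = cv2.warpPerspective(
            np.full(img.shape[:2], 255, np.uint8), W, (w, h))
        new_region = cv2.bitwise_and(warped_mask, cv2.bitwise_not(pano_mask))
        pano[new_region > 0] = warped[new_region > 0]   # paste non-overlap
        overlap = cv2.bitwise_and(pano_mask, warped_mask)
        band = seam_band_mask(overlap, width=50)
        band = cv2.bitwise_and(band, warped_mask)       # keep band on valid pixels
        if band.any():
            pano = harmonize_seam_band(pano, warped, band)
        pano_mask = cv2.bitwise_or(pano_mask, warped_mask)
    return pano
```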

4. Results

In this section, extensive experiments are conducted to validate the effectiveness of the proposed LargeStitch method on large-size aerial images.

4.1. Dataset and Implementation Details

To evaluate the effectiveness and robustness of the proposed stitching method, experiments were conducted on four aerial image datasets: the UAV-AIRPAI dataset [16], the UAV-VisLoc dataset [70], the Switzerland dataset, and the Loubiere dataset [59]. These datasets cover diverse scenarios, providing a comprehensive benchmark for testing the method’s performance in complex environments.
As shown in Table 1, we have constructed four scene-specific datasets using representative aerial image sequences, covering major backgrounds such as grassland, villages, farmland, and buildings. These four types of environmental contexts encompass challenging scenarios, including illumination variations, low texture, and repetitive patterns, which effectively validate the robustness and applicability of the proposed method across different scenes.
This paper comprehensively evaluates the performance of the proposed method against other techniques, considering both image-pair and multi-image stitching scenarios. For multi-image panoramic stitching, the proposed method is compared with a SIFT + RANSAC baseline [71], UDIS++ [53], and classical commercial tools such as Autostitch [39] and Agisoft Metashape [58], with a qualitative analysis of the visual quality. In image-pair stitching, the proposed method is compared with seven state-of-the-art stitching techniques using a quantitative comparison of three image quality metrics. These methods cover a variety of strategies, including traditional global transformations, local transformations, deep learning-based methods, and georeferencing techniques with feature optimization, allowing for a multi-faceted assessment of the advantages of the proposed approach.
All experiments were performed on a system equipped with an Intel i7-14700KF CPU (3.6 GHz, 64 GB RAM) and an NVIDIA RTX 3090 GPU, with the deep learning environment configured using CUDA 12.4 and PyTorch 2.3.1.

4.2. Parameter Sensitivity Analysis

In order to evaluate the performance of image stitching methods, commonly used metrics such as the PSNR [72], the SSIM [73], and LPIPS [74] were employed. These metrics provide comprehensive evaluations from pixel differences, signal-to-noise ratio, structural consistency, and perceptual quality perspectives.

4.2.1. PSNR (Peak Signal-to-Noise Ratio)

The PSNR evaluates the visual quality of the reconstructed image, with higher values indicating better quality. The formula is:
$\mathrm{PSNR} = 10 \cdot \log_{10} \dfrac{L^{2}}{\mathrm{MSE}},$  (20)

where $L$ is the maximum pixel intensity (typically 255), and $\mathrm{MSE}$ is the mean squared error.

4.2.2. SSIM (Structural Similarity Index)

The SSIM measures the similarity of the structure, luminance, and contrast between two images. Its value lies in $[0, 1]$, with higher values indicating greater structural consistency. It is defined as:

$\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$  (21)

where $\mu_x$ and $\mu_y$ are local means, $\sigma_x^2$ and $\sigma_y^2$ are local variances, $\sigma_{xy}$ is the covariance, and $C_1$ and $C_2$ are stabilization constants.

4.2.3. LPIPS (Learned Perceptual Image Patch Similarity)

LPIPS is a perceptual metric based on deep learning, measuring perceptual differences between images. Lower LPIPS values indicate better quality. It is expressed as:
$\mathrm{LPIPS} = \sum_{l} w_l \cdot \lVert \phi_l(I_{\mathrm{stitch}}) - \phi_l(I_{\mathrm{ref}}) \rVert_2,$  (22)

where $\phi_l$ represents the feature map from layer $l$ of a deep neural network, and $w_l$ denotes the layer's weight.
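All three metrics can be computed with standard packages: scikit-image for PSNR/SSIM and the `lpips` package for Equation (22). Following the evaluation protocol in Section 4.4, PSNR and SSIM are computed on the luminance (Y) channel; the sketch below assumes same-sized HxWx3 uint8 RGB inputs.

```python
import cv2
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_model = lpips.LPIPS(net='alex')   # AlexNet backbone, as in [74]

def evaluate_pair(stitched, reference):
    """PSNR/SSIM on the Y channel and LPIPS on RGB for HxWx3 uint8 arrays."""
    y_s = cv2.cvtColor(stitched, cv2.COLOR_RGB2YUV)[..., 0]
    y_r = cv2.cvtColor(reference, cv2.COLOR_RGB2YUV)[..., 0]
    psnr = peak_signal_noise_ratio(y_r, y_s, data_range=255)
    ssim = structural_similarity(y_r, y_s, data_range=255)
    # lpips expects NCHW float tensors scaled to [-1, 1].
    def to_tensor(im):
        return torch.from_numpy(im).permute(2, 0, 1)[None].float() / 127.5 - 1.0
    with torch.no_grad():
        lp = lpips_model(to_tensor(stitched), to_tensor(reference)).item()
    return psnr, ssim, lp
```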
To evaluate the impact of the proposed Seam-band fusion method on stitching quality, experiments were conducted with different Seam-band widths. Figure 6 shows the performance of this parameter across the four image datasets. The best overall performance for the PSNR, SSIM, and LPIPS metrics is achieved at a Seam-band width of around 50. As the width continues to increase, the metrics gradually decline and stabilize around a width of 300. Based on this sensitivity analysis, the Seam-band width is set to 50.

4.3. Subjective Visual Quality Qualitative Comparison

4.3.1. Results of Multi-Image Panoramic Stitching

The visual quality of the stitched panoramic images was evaluated, and representative panorama results from the four datasets are shown in Figure 7, Figure 8, Figure 9 and Figure 10, which give the panoramas yielded by Hossein’s method [71], the Autostitch method [39], the Agisoft Metashape method [58], Peng’s method [19] and the proposed method, respectively. Given the substantial computational demands of the Seam-driven method, we downsampled the image datasets by a factor of three when implementing the method by Peng [19] to maintain an acceptable processing time. Despite this adjustment, Peng’s method [19] failed to handle the full image sequences, successfully processing only 30, 12, and 11 images for Dataset_village, Dataset_field, and Dataset_building, respectively. Consequently, this led to partial content omissions in the final panoramic stitch results.
Figure 6. Sensitivity analysis of the Seam-band width parameter on PSNR, SSIM, and LPIPS.
Figure 7. ‘Dataset_grass’ stitching results. (a) Hossein’s method [71], (b) Autostitch [39], (c) Metashape [58], (d) Peng’s method [19], and (e) proposed method. Magnified local details are shown in rectangles of different colors.
Figure 8. ‘Dataset_village’ stitching results. (a) Hossein’s method [71], (b) Autostitch [39], (c) Metashape [58], (d) Peng’s method [19], and (e) proposed method. Magnified local details are shown in rectangles of different colors.
Figure 9. ‘Dataset_field’ stitching results. (a) Hossein’s method [71], (b) Autostitch [39], (c) Metashape [58], (d) Peng’s method [19], and (e) proposed method. Magnified local details are shown in rectangles of different colors.
Figure 10. ‘Dataset_building’ stitching results. (a) Hossein’s method [71], (b) Autostitch [39], (c) Metashape [58], (d) Peng’s method [19], and (e) proposed method. Magnified local details are shown in rectangles of different colors.
The experimental results reveal significant differences in the visual quality of the panoramic stitching achieved by the various methods. On Dataset_grass, the baseline method exhibits noticeable Seams and artifacts due to its simple pixel-averaging blending strategy, and enlarged views further reveal evident misalignment and blurring. The Autostitch method produces a more harmonious overall panorama but still suffers from minor ghosting and unnatural transitions. Agisoft Metashape demonstrates good sharpness but suffers from detail loss and localized omissions. Peng’s method demonstrates good overall visual quality on this dataset and satisfactory alignment in the magnified details; however, conspicuous Seams are present in the panoramic result. In contrast, the proposed method preserves more image detail and clarity, achieving seamless stitching without blurring or artifacts in overlapping regions.
On Dataset_village, the baseline method suffers severe cumulative errors, leading to significant misalignment and distortion in the upper regions of the panorama. The Autostitch method achieves better global alignment but exhibits slight blurring in overlapping areas due to pixel color blending. Agisoft Metashape continues to show localized omissions. Peng’s method suffers from significant content loss on this dataset; the failure occurs when the flight path begins its turnaround, suggesting that the method struggles with rotation-involved registration. The proposed method avoids these issues, producing a more natural and stable stitching result.
On Dataset_field, the baseline method demonstrates robust alignment in low-texture regions through SIFT feature matching but suffers from noticeable ghosting and Seam artifacts in enlarged views. The Autostitch method delivers good overall visual quality but introduces distortion during warping, turning straight field boundaries into curves, alongside visible misalignment in certain areas. Agisoft Metashape achieves satisfactory alignment but produces darker color contrasts that are inconsistent with the original images. Peng’s method exhibits partial content omission due to its inability to stitch the full sequence, yet it shows competent alignment in the magnified local regions. The proposed method effectively mitigates these issues, providing more realistic and consistent reconstruction results.
On Dataset_building, the baseline, Autostitch, and Agisoft Metashape methods exhibit varying degrees of misalignment, artifacts, and blurring. Peng’s method achieves favorable alignment on this dataset, with no apparent Seams or artifacts, indicating that Seam-driven methods are robust to misalignments caused by altitude-induced parallax of ground targets. Nevertheless, because it fails to stitch the complete image sequence, content is lost on the left side of the panorama. The overall alignment and visual quality of these methods are inferior to those of the proposed LargeStitch method, which produces more natural and cohesive results.
In general, the proposed method demonstrates superior stitching performance across all datasets. Adopting the Seam-band fusion strategy avoids direct pixel blending in overlapping regions, significantly reducing issues such as blurring, artifacts, and Seams. Consequently, the generated panoramas exhibit higher visual quality and more harmonious alignment. These results demonstrate the effectiveness of LargeStitch in handling complex scenes with varying challenges such as low texture, large overlaps, and significant illumination differences.

4.3.2. Running Time

The panorama stitching times for the four datasets are presented in Table 2. Compared to the other methods, the proposed LargeStitch method achieves the fastest stitching speed on three datasets and ranks second on the remaining one; there, the baseline method holds only a marginal speed advantage while LargeStitch significantly surpasses it in alignment quality. Notably, on the Switzerland_field dataset, which features extremely low-texture scenes, the other methods require considerably more computation time. Although Agisoft Metashape employs georeferenced images for coarse alignment, it still demands approximately three times the runtime of the proposed method.
In challenging scenarios, such as complex environments and low texture regions, LargeStitch not only significantly improves stitching efficiency but also maintains high-quality panoramic results. These findings underscore its superior effectiveness and performance for real-world applications.

4.3.3. Robustness

To demonstrate the adaptability and effectiveness of the proposed stitching algorithm under challenging conditions, we conducted a visual evaluation across four representative scenarios: low overlap, rotation, low texture, and scale variation. These scenarios are commonly encountered in real-world UAV remote sensing tasks and pose significant challenges for stitching algorithms.
As shown in Figure 11, the stitching results under these four scenarios are shown below, highlighting the robustness and feasibility of the proposed algorithm in complex environments.
The proposed stitching method accurately aligns overlapping regions and achieves stable results even under low-overlap conditions, a capability that many fast stitching methods lack. Furthermore, LargeStitch exhibits excellent rotation and scale invariance as well as reliable feature recognition in low-texture regions, handling up to 180 degrees of rotational parallax. This flexibility significantly enhances the freedom and adaptability of UAV image acquisition tasks, relaxing the operational requirements on flight altitude and attitude.
To further demonstrate the robustness of the proposed method in natural scenes characterized by a large parallax, we selected six sets of images from the LPC dataset for evaluation. These images encompass diverse scenarios, including parks, buildings, and sky backgrounds. Compared to aerial imagery, the primary challenge in these scenes lies in the presence of multiple depth planes; specifically, the field of view no longer satisfies the simple planar scene assumption.
As illustrated in Figure 12, our method maintains robust visual stitch performance in these challenging natural scenarios thanks to the Seam-band fusion strategy, even though it relies on a global homography transformation. This performance is comparable to multi-plane methods such as APAP [20], SPHP [21], and LPC [75], which incorporate adaptive warping. Notably, our approach simultaneously eliminates artifacts and Seams, yielding a more natural visual output.

4.4. Objective Quantitative Evaluation Metric Comparison

Seam-driven methods consume significant computational resources to optimize stitching quality and typically achieve the best results; Liao’s method [76] is therefore used as the ground-truth reference for this analysis. Since Seam-driven stitching is computationally very expensive, adjacent image pairs are stitched for the quantitative comparison.
Table 3 presents the PSNR, SSIM, and LPIPS values for the four datasets. Images are converted to the YUV color space, and PSNR and SSIM are evaluated uniformly on the luminance (Y) channel. Some methods failed to perform stitching on Dataset_building due to insufficient overlap between adjacent images, whereas our method successfully stitched all sequences across all datasets. To ensure statistical consistency, cases in which other methods failed were excluded from the calculations in Table 3.
Compared to the other methods, the proposed LargeStitch method achieves the best performance across the PSNR, SSIM, and LPIPS metrics, highlighting its advantages in large-size image stitching, particularly in image alignment, structural similarity, and perceptual quality. This image quality and perceptual fidelity further affirm LargeStitch as a robust and efficient solution for large-size image stitching tasks.

4.5. Ablation Experiments

In this paper, ablation experiments were conducted to assess the effectiveness of the components of the proposed method. Hossein’s method [71], which uses SIFT feature matching, serves as the baseline.

4.5.1. Effectiveness of Deep Feature Matching Algorithm

We analyzed the effect of the deep feature matching algorithm on homography matrix estimation. The ablation experiments for the deep matching component were conducted on the Dataset_grass dataset, as this dataset provides partial ground-truth labeling.
Figure 13 illustrates the matching results obtained with the SIFT method and the GIM-DKM approach, where matching points between the two images are connected by lines. The results indicate that, in certain challenging scenarios, the traditional SIFT method yields a limited number of matches and contains several mismatches, which adversely affect the accuracy of image alignment. In contrast, the GIM-DKM component employed in our framework demonstrates superior performance in both the quantity and precision of matches, providing a more robust input for the subsequent stitching process.
To measure the registration accuracy, the registration error (RE) was adopted, defined as follows:

$\mathrm{RE} = \dfrac{1}{N} \sum_{i=1}^{N} \lVert H p_i - q_i \rVert_2,$  (23)

where $H$ is the estimated homography matrix, and $p_i$ and $q_i$ represent the ground-truth pixel in the reference image and the corresponding transformed pixel in the target image, respectively.
Figure 14 shows the registration error (RE) results for the two feature-matching algorithms. The results indicate that the proposed deep feature matching method, GIM-DKM, achieves lower RE values, demonstrating its superior accuracy in homography estimation and suggesting that the detected keypoints are mapped more accurately to their true pixel coordinates.
The RMSE measures the pixel-level difference between the stitched image $I_{\mathrm{stitch}}$ and the reference image $I_{\mathrm{ref}}$. A lower RMSE means a smaller reprojection error for the feature pairs; it does not directly reflect the stitching alignment quality but may indicate that fewer feature mismatches were detected.
Moreover, Table 4 highlights the impact of the deep feature matching algorithm on the RMSE metric. Compared to traditional SIFT-based matching, our method reduces the RMSE by 62.7% while requiring only 1/7 of the processing time. These results demonstrate that employing GIM-DKM significantly improves image matching quality and efficiency, validating the effectiveness and robustness of the deep feature matching approach in enhancing alignment precision.

4.5.2. Seam-Band Fusion of Image Harmonization

Stitching results without the Seam-band method exhibit noticeable Seam artifacts and ghosting, particularly in overlapping regions with abrupt color and texture transitions. In contrast, results using the Seam-band method demonstrate smoother transitions in the Seam regions, with improved preservation of texture and color details, significantly enhancing visual consistency. To validate the effectiveness of the Seam-band method in image fusion, a comparative experiment was conducted between the Seam-band method and traditional weighted average fusion.
Table 5 shows the impact of the Seam-band fusion strategy on the objective metrics. For both the baseline and the proposed LargeStitch method, adopting the Seam-band image harmonization strategy yields notable quality improvements. Specifically, the strategy improves the baseline method’s PSNR by 0.02 dB, and its SSIM and LPIPS by 0.71% and 15.82%, respectively. The corresponding metrics of the proposed method improve by 0.1 dB, 0.04%, and 1.82%, respectively. These improvements collectively validate the effectiveness of the Seam-band fusion strategy in achieving high-quality, seamless image stitching with minimal artifacts and distortion.

4.5.3. Mask-Based Pre-Stitching Strategy

We conducted comparative experiments to evaluate the effectiveness of the mask-based pre-stitching filtering strategy in improving stitching efficiency and reducing time consumption. Table 6 summarizes the time metrics for scenarios with and without the filtering strategy.
The mask-based pre-stitching strategy effectively filters out redundant images, reducing computational overhead. While the strategy itself incurs approximately 30 s of processing time, it significantly lowers the number of images to be processed, resulting in a shorter total stitching duration. Specifically, for the baseline method, the total stitching time is reduced from 612.75 s to 389.99 s, saving approximately 222 s. For the proposed LargeStitch method, the stitching time is reduced by 104 s, nearly one-third of the original duration. The integration of this strategy with the proposed method achieves a substantially lower stitching time, demonstrating its applicability to large-size UAV image stitching tasks.

5. Discussion

The proposed LargeStitch method provides an efficient and reliable framework for high-resolution UAV image stitching, with its core contribution being the significant reduction in computational overhead when processing large-size aerial datasets. By eliminating redundant image content prior to the computationally intensive warping phase, this method minimizes unnecessary processing while maintaining the fidelity of the final panorama. Furthermore, the deep learning-driven dense matching technology significantly enhances alignment robustness under conditions of scene rotation and scale variation, where traditional local feature descriptors often fail due to insufficient correspondence density.
However, the adoption of a global homography model to ensure processing speed involves a known trade-off: it may not fully account for local non-rigid deformations in complex areas with significant relief displacement. Although the Seam-band fusion strategy effectively suppresses ghosting and visual misalignment at the boundaries, minor geometric inconsistencies may persist in extreme non-coplanar environments. While the proposed filtering strategy partially mitigates the error propagation observed in long image sequences, it nonetheless highlights the inherent limitations of iterative geometric transformations. Future research will explore the potential of integrating generative models to reconstruct occluded areas using overlapping priors, which may offer a more robust alternative to traditional techniques.

6. Conclusions

In this study, we introduce LargeStitch, an innovative framework specifically designed to address the challenges associated with stitching large-size aerial image sequences. With advanced deep feature matching to improve image alignment, seamless stitching by the Seam-band fusion strategy, and a mask-based pre-stitching strategy to improve the efficiency of multi-image stitching, LargeStitch not only achieves high-quality seamless stitching results but also significantly improves computational efficiency.
Extensive experiments on four UAV datasets demonstrated the superiority of LargeStitch over traditional geometric transformation methods, deep learning-based approaches, and commercial tools. The proposed method consistently achieved better alignment quality and visual fidelity, excelling in challenging scenarios such as sparse textures and illumination variations. Quantitatively, LargeStitch outperformed competing methods in PSNR, SSIM, and LPIPS metrics, showcasing its robustness and precision. Furthermore, ablation studies confirmed the independent contributions of LargeStitch’s three core components, each enhancing either computational efficiency or stitching quality.
In conclusion, LargeStitch combines cutting-edge feature matching, innovative fusion strategies, and computational optimization to achieve efficient and robust stitching for large-size UAV images. Future work will focus on enhancing the global optimization capabilities to improve stitching quality.

Author Contributions

Conceptualization, J.Z. and Z.W.; Methodology, J.Z.; Formal Analysis, X.H. and Y.Z.; Data Curation, J.Z. and Z.W.; Writing—Original Draft Preparation, J.Z.; Writing—Review and Editing, Z.W., X.H. and Y.Z.; Project Administration, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Oceanic Interdisciplinary Program of Shanghai Jiao Tong University (grant number SL2022ZD204) and the National Natural Science Foundation of China (grant number 52371327).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Trinh, H.L.; Kieu, H.T.; Pak, H.Y.; Pang, D.S.C.; Cokro, A.A.; Law, A.W.-K. A framework for survey planning using portable Unmanned Aerial Vehicles (pUAVs) in coastal hydro-environment. Remote Sens. 2022, 14, 2283. [Google Scholar] [CrossRef]
  2. Xu, Q.; Chen, J.; Luo, L.; Gong, W.; Wang, Y. UAV image stitching based on mesh-guided deformation and ground constraint. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4465–4475. [Google Scholar]
  3. Cai, W.; Du, S.; Yang, W. UAV image stitching by estimating orthograph with RGB cameras. J. Vis. Commun. Image Represent. 2023, 94, 103835. [Google Scholar] [CrossRef]
  4. Alphen, R.V.; Rodgers, M.; Malservisi, R.; Wang, P.; Cheng, J.; Valleé, M. Application of UAV Structure-From-Motion Photogrammetry to a Nourished Beach for Assessment of Storm Surge Impacts, Pinellas County, Florida. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4409812. [Google Scholar]
  5. Wei, C.; Xia, H.; Qiao, Y. Fast unmanned aerial vehicle image matching combining geometric information and feature similarity. IEEE Geosci. Remote Sens. Lett. 2020, 18, 1731–1735. [Google Scholar]
  6. Gomez, C.; Purdie, H. UAV-based photogrammetry and geocomputing for hazards and disaster risk monitoring—A review. Geoenviron. Disasters 2016, 3, 23. [Google Scholar] [CrossRef]
  7. Dolcetti, G.; Krynkin, A.; Alkmim, M.; Cuenca, J.; Ryck, L.D.; Sailor, G.; Muraro, F.; Tait, S.J.; Horoshenkov, K.V. Reconstruction of the frequency-wavenumber spectrum of water waves with an airborne acoustic Doppler array for non-contact river monitoring. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4202214. [Google Scholar] [CrossRef]
  8. Acharya, B.; Barber, M.E. Post-Fire Streamflow Prediction: Remote Sensing Insights from Landsat and an Unmanned Aerial Vehicle. Remote Sens. 2025, 17, 3690. [Google Scholar] [CrossRef]
  9. Cirillo, D.; Tangari, A.C.; Scarciglia, F.; Lavecchia, G.; Brozzetti, F. UAV-PPK photogrammetry, GIS, and soil analysis to estimate long-term slip rates on active faults in a seismic gap of Northern Calabria (Southern Italy). Remote Sens. 2025, 17, 3366. [Google Scholar] [CrossRef]
  10. Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of remote sensing in precision agriculture: A review. Remote Sens. 2020, 12, 3136. [Google Scholar] [CrossRef]
  11. Ruiz, J.; Caballero, F.; Merino, L. MGRAPH: A multigraph homography method to generate incremental mosaics in real-time from UAV swarms. IEEE Robot. Autom. Lett. 2018, 3, 2838–2845. [Google Scholar]
  12. Luo, X.; Zhao, H.; Liu, Y.; Liu, N.; Chen, J.; Yang, H. A High-Precision Virtual Central Projection Image Generation Method for an Aerial Dual-Camera. Remote Sens. 2025, 17, 683. [Google Scholar] [CrossRef]
  13. Chen, J.; Wan, Q.; Luo, L.; Wang, Y.; Luo, D. Drone image stitching based on compactly supported radial basis function. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 4634–4643. [Google Scholar] [CrossRef]
  14. Zhou, H.; Yu, W.; Zhang, J.; Liu, X. Seamless stitching of large area UAV images using modified camera matrix. In Proceedings of the 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), Angkor Wat, Cambodia, 6–10 June 2016; IEEE: New York, NY, USA, 2016; pp. 561–566. [Google Scholar]
  15. Liu, J.; Wang, L.; Chen, X.; Zhao, M. A novel adjustment model for mosaicking low-overlap sweeping images. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4089–4097. [Google Scholar]
  16. Ren, M.; Li, J.; Song, L.; Li, H.; Xu, T. MLP-based efficient stitching method for UAV images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 2503305. [Google Scholar] [CrossRef]
  17. Mehrdad, S.; Khosravi, M.; Mohammadi, A.; Karami, E. Toward real time UAVs’ image mosaicking. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 941–946. [Google Scholar]
  18. Zhang, F.; Xu, Q.; Wu, H.; Li, G. Image-only real-time incremental UAV image mosaic for multi-strip flight. IEEE Trans. Multimed. 2020, 23, 1410–1425. [Google Scholar] [CrossRef]
  19. Peng, Z.; Zhang, X.; Liu, H.; Wang, Y. Seamless UAV hyperspectral image stitching using optimal seamline detection via graph cuts. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar]
  20. Zaragoza, J.; Chin, T.-J.; Tran, Q.-H.; Brown, M.; Suter, D. As-projective-as-possible image stitching with moving DLT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; IEEE: New York, NY, USA, 2013; pp. 2339–2346. [Google Scholar]
  21. Chang, C.; Sato, Y.; Chuang, Y. Shape-preserving half-projective warps for image stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; IEEE: New York, NY, USA, 2014; pp. 3254–3261. [Google Scholar]
  22. Li, R.; Gao, P.; Cai, X.; Chen, X.; Wei, J.; Cheng, Y.; Zhao, H. A real-time incremental video mosaic framework for UAV remote sensing. Remote Sens. 2023, 15, 2127. [Google Scholar] [CrossRef]
  23. Wei, Z.; Lan, C.; Xu, Q.; Wang, L.; Gao, T.; Yao, F.; Hou, H. SatellStitch: Satellite imagery-assisted UAV image seamless stitching for emergency response without GCP and GNSS. Remote Sens. 2024, 16, 309. [Google Scholar]
  24. Chen, J.; Luo, Y.; Wang, J.; Tang, H.; Tang, Y.; Li, J. Elimination of irregular boundaries and seams for UAV image stitching with a diffusion model. Remote Sens. 2024, 16, 1483. [Google Scholar] [CrossRef]
  25. Xiang, T.; Liu, Q.; Zhang, D.; Chen, J. Image stitching by line-guided local warping with global similarity constraint. Pattern Recognit. 2018, 83, 481–497. [Google Scholar] [CrossRef]
  26. Xu, Y.; Zhang, L.; Li, P.; Huang, B. Mosaicking of unmanned aerial vehicle imagery in the absence of camera poses. Remote Sens. 2016, 8, 204. [Google Scholar] [CrossRef]
  27. Botterill, T.; Mills, S.; Green, R. Real-time aerial image mosaicing. In Proceedings of the 2010 25th International Conference of Image and Vision Computing New Zealand (IVCNZ), Queenstown, New Zealand, 8–10 November 2010; IEEE: New York, NY, USA, 2010; pp. 1–8. [Google Scholar]
  28. Kekec, T.; Yildirim, A.; Unel, M. A new approach to real-time mosaicing of aerial images. Robot. Auton. Syst. 2014, 62, 1755–1767. [Google Scholar] [CrossRef]
  29. Zhao, Y.; Li, F.; Chen, X.; Wu, L. RTSfM: Real-time structure from motion for mosaicing and DSM mapping of sequential aerial images with low overlap. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15. [Google Scholar] [CrossRef]
  30. Mo, Y.; Zhang, H.; Wang, L.; Chen, J. A robust UAV hyperspectral image stitching method based on deep feature matching. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [Google Scholar] [CrossRef]
  31. Li, W. SuperGlue-Based Deep Learning Method for Image Matching from Multiple Viewpoints. In Proceedings of the 2023 8th International Conference on Mathematics and Artificial Intelligence, Tokyo, Japan, 20–23 March 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 53–58. [Google Scholar]
  32. Yuan, X.; Liu, Y.; Zhang, Q.; Sun, P. Large aerial image tie point matching in real and difficult survey areas via deep learning method. Remote Sens. 2022, 14, 3907. [Google Scholar] [CrossRef]
  33. Pan, W.; Li, A.; Liu, X.; Deng, Z. Unmanned aerial vehicle image stitching based on multi-region segmentation. IET Image Process. 2024, 18, 4607–4622. [Google Scholar] [CrossRef]
  34. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  35. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  36. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; IEEE: New York, NY, USA, 2011; pp. 2564–2571. [Google Scholar]
  37. Nie, L.; Zhang, F.; Xu, H.; Yang, Y. Unsupervised deep image stitching: Reconstructing stitched features to images. IEEE Trans. Image Process. 2021, 30, 6184–6197. [Google Scholar] [CrossRef]
  38. Sun, Y.; Zhang, Z.; Liu, H.; Li, F. Image stitching method of aerial image based on feature matching and iterative optimization. In Proceedings of the 2021 40th Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; IEEE: New York, NY, USA, 2021; pp. 3024–3029. [Google Scholar]
  39. Brown, M.; Lowe, D. Automatic panoramic image stitching using invariant features. Int. J. Comput. Vis. 2007, 74, 59–73. [Google Scholar]
  40. Garcia-Fidalgo, E.; Ortiz, A.; Ponsa, D.; Andrade-Cetto, J. Fast image mosaicking using incremental bags of binary words. In Methods for Appearance-Based Loop Closure Detection: Applications to Topological Mapping and Image Mosaicking; Springer: Berlin, Germany, 2018; pp. 141–156. [Google Scholar]
  41. Liu, B.; Zhang, J.; Li, Z. An improved APAP algorithm via line segment correction for UAV multispectral image stitching. In Proceedings of the IGARSS 2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 6057–6060. [Google Scholar]
  42. He, L.; Zhao, Q.; Liu, S.; Wang, J. VSP-based warping for stitching many UAV images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5624717. [Google Scholar] [CrossRef]
  43. Gao, J.; Li, S.; Kim, S.; Brown, M. Seam-driven image stitching. In Proceedings of Eurographics (Short Papers), Girona, Spain, 6–10 May 2013; The Eurographics Association: Eindhoven, The Netherlands, 2013; pp. 45–48. [Google Scholar]
  44. Zomet, A.; Levin, A.; Peleg, S.; Weiss, Y. Seamless image stitching by minimizing false edges. IEEE Trans. Image Process. 2006, 15, 969–977. [Google Scholar] [CrossRef] [PubMed]
  45. Li, L.; Zhang, W.; Liu, Y.; Wu, X. Optimal seamline detection for multiple image mosaicking via graph cuts. ISPRS J. Photogramm. Remote Sens. 2016, 113, 1–16. [Google Scholar]
  46. Chen, Q.; Xu, W.; Zhang, H.; Li, Y. Automatic seamline network generation for urban orthophoto mosaicking with the use of a digital surface model. Remote Sens. 2014, 6, 12334–12359. [Google Scholar]
  47. Pan, J.; Zhou, Q.; Wang, M. Seamline determination based on segmentation for urban image mosaicking. IEEE Geosci. Remote Sens. Lett. 2013, 11, 1335–1339. [Google Scholar] [CrossRef]
  48. Laaroussi, S.; Benjdira, B.; Koubaa, A.; Ammar, A. Dynamic mosaicking: Region-based method using edge detection for an optimal seamline. Multimed. Tools Appl. 2019, 78, 23225–23253. [Google Scholar]
  49. Yu, L.; Chen, J.; Wang, T.; Huang, X. Towards the automatic selection of optimal seam line locations when merging optical remote-sensing images. Int. J. Remote Sens. 2012, 33, 1000–1014. [Google Scholar]
  50. Szeliski, R.; Kang, S.B.; Uyttendaele, M. Image alignment and stitching: A tutorial. In Foundations and Trends in Computer Graphics and Vision; Now Publishers Inc.: Hanover, MA, USA, 2007; Volume 2, pp. 1–104. [Google Scholar]
  51. Shi, Z.; Liu, R.; Tang, L.; Xu, X. An image mosaic method based on convolutional neural network semantic features extraction. J. Signal Process. Syst. 2020, 92, 435–444. [Google Scholar]
  52. Nie, L.; Zhang, F.; Xu, H.; Yang, Y. A view-free image stitching network based on global homography. J. Vis. Commun. Image Represent. 2020, 73, 102950. [Google Scholar] [CrossRef]
  53. Nie, L.; Zhang, F.; Xu, H.; Yang, Y. Parallax-tolerant unsupervised deep image stitching. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; IEEE: New York, NY, USA, 2023; pp. 7399–7408. [Google Scholar]
  54. Lai, W.-S.; Gallo, O.; Gu, J.; Sun, D.; Yang, M.-H.; Kautz, J. Video stitching for linear camera arrays. arXiv 2019, arXiv:1907.13622. [Google Scholar] [CrossRef]
  55. Nie, L.; Zhang, F.; Xu, H.; Yang, Y. Eliminating warping shakes for unsupervised online video stitching. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2025; pp. 390–407. [Google Scholar]
  56. Rahaman, H.; Champion, E. To 3D or not 3D: Choosing a photogrammetry workflow for cultural heritage groups. Heritage 2019, 2, 1835–1851. [Google Scholar] [CrossRef]
  57. Santana, L.S.; Ferreira, R.; Almeida, J.; Gonçalves, G. Influence of flight altitude and control points in the georeferencing of images obtained by unmanned aerial vehicle. Eur. J. Remote Sens. 2021, 54, 59–71. [Google Scholar] [CrossRef]
  58. Agisoft LLC. Agisoft Metashape. Available online: https://www.agisoft.com/ (accessed on 16 February 2026).
  59. Pix4D. The Datasets of Agriculture Field and Building. Available online: https://support.pix4d.com/hc/en-us/articles/360000235126 (accessed on 16 February 2026).
  60. Chen, J.; Wang, Y.; Li, B.; Zhou, X. UAV image stitching based on optimal seam and half-projective warp. Remote Sens. 2022, 14, 1068. [Google Scholar] [CrossRef]
  61. Shen, X.; Cai, Z.; Yin, W.; Müller, M.; Li, Z.; Wang, K.; Chen, X.; Wang, C. GIM: Learning generalizable image matcher from internet videos. arXiv 2024, arXiv:2402.11095. [Google Scholar] [CrossRef]
  62. Sarlin, P.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 4938–4947. [Google Scholar]
  63. Sun, J.; Shen, Z.; Wang, Y.; Bao, H. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 8922–8931. [Google Scholar]
  64. Hedborg, J.; Forssén, P.; Felsberg, M. Fast and accurate structure and motion estimation. In Advances in Visual Computing: 5th International Symposium (ISVC 2009), Las Vegas, NV, USA, 30 November–2 December 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 211–222. [Google Scholar]
  65. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  66. Barath, D.; Matas, J. Graph-cut RANSAC. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 6733–6741. [Google Scholar]
  67. Edstedt, J.; Kjellström, H.; Felsberg, M.; Danelljan, M. DKM: Dense kernelized feature matching for geometry estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; IEEE: New York, NY, USA, 2023; pp. 17765–17775. [Google Scholar]
  68. Perez, P.; Gangnet, M.; Blake, A. Poisson image editing. In Proceedings of ACM SIGGRAPH 2003 Papers, San Diego, CA, USA, 27–31 July 2003; Association for Computing Machinery: New York, NY, USA, 2003; pp. 313–318. [Google Scholar]
  69. Alcantarilla, P.F.; Nuevo, J.; Bartoli, A. Fast explicit diffusion for accelerated features in nonlinear scale spaces. In Proceedings of the British Machine Vision Conference (BMVC), Bristol, UK, 9–13 September 2013. [Google Scholar]
  70. Xu, W.; Zhang, L.; Liu, H.; Wang, X. UAV-VisLoc: A large-scale dataset for UAV visual localization. arXiv 2024, arXiv:2405.11936. [Google Scholar]
  71. Hossein-Nejad, Z.; Nasri, M. Natural image mosaicing based on redundant keypoint elimination method in SIFT algorithm and adaptive RANSAC method. Signal Data Process. 2021, 18, 147–162. [Google Scholar] [CrossRef]
  72. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801. [Google Scholar] [CrossRef]
  73. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  74. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018; pp. 586–595. [Google Scholar]
  75. Jia, Q.; Li, Z.; Fan, X.; Zhao, H.; Teng, S.; Ye, X.; Latecki, L.J. Leveraging Line-Point Consistence to Preserve Structures for Wide Parallax Image Stitching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 12186–12195. [Google Scholar]
  76. Liao, T.; Chen, J.; Xu, Y. Quality evaluation-based iterative seam estimation for image stitching. Signal Image Video Process. 2019, 13, 1199–1206. [Google Scholar] [CrossRef]
  77. Zarei, A.; Ghaffari, M.; Mahdianpari, M.; Homayouni, S. MegaStitch: Robust large-scale image stitching. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4408309. [Google Scholar] [CrossRef]
Figure 3. The dual-view model of the homography transformation.
Figure 4. Image harmonization strategy based on Seam-band fusion.
Figure 5. Pre-stitching after aligning downsampled masks without retaining redundant images.
Figure 11. Robustness performance of the proposed LargeStitch method for complex scenes: (a) low overlap, (b) low texture scenarios, (c) rotation and (d) scale variation. The flight altitudes in (d) are 1970 m and 2570 m, respectively.
Figure 12. Stitching results on six sets of natural scenes with varying depths of field. The first row shows the input image pairs, while the subsequent rows present the results of different methods: (a) APAP [20], (b) SPHP [21], (c) LPC [75], and (d) the proposed method.
Figure 13. Visualization of matching results. Column 1 presents the pair of input images; red and green lines represent the matched pairs detected by the two different methods. (a) Input images, (b) SIFT, and (c) GIM_dkm.
Figure 14. Distribution of average RE between the baseline method and the proposed method on Dataset_grass.
Table 1. Dataset introduction.
Dataset Name | Resolution | Num. | Height | Location (Coordinates) | Description of the Source
Dataset_grass | 5472 × 3648 | 55 | 150 m | Qingdao, China (36°16′N, 120°16′E) | UAV-AIRPAI dataset [16]
Dataset_village | 3976 × 2652 | 52 | 460 m | Taizhou, China (32°18′N, 119°54′E) | UAV-VisLoc dataset [70]
Dataset_field | 6000 × 4000 | 25 | 650 m | Lausanne, Switzerland (46°38′N, 6°36′E) | Switzerland dataset [59]
Dataset_building | 5472 × 3648 | 26 | 230 m | Roseau, Dominica (15°17′N, 61°22′W) | Loubiere dataset [59]
Table 2. Quantitative consistency analysis for runtime (s) on four different datasets. Bold indicates the best.
Method | Dataset_Grass | Dataset_Village | Dataset_Field | Dataset_Building
Hossein's [71] | 612.75 | 160.62 | 552.85 | 163.65
Autostitch [39] | 988.00 | 592.00 | 651.00 | 496.00
Metashape [58] | 472.00 | 343.00 | 332.00 | 321.00
Peng's [19] | 6954.00 | 9749.00 | 7429.00 | 2160.00
Proposed | 212.92 | 259.67 | 97.46 | 142.12
Table 3. Quantitative consistency comparison for PSNR, SSIM, and LPIPS on four different datasets. The symbols ↑ and ↓ indicate that higher and lower values represent better performance, respectively.
Method | Dataset_Grass | Dataset_Village | Dataset_Field | Dataset_Building
PSNR (↑):
Hossein's [71] | 29.68 | 33.56 | 29.91 | 29.38
UDIS++ [53] | 28.74 | 30.43 | 28.89 | Failed
StableStitch2 [55] | 28.26 | 29.30 | 28.41 | 28.17
Autostitch [39] | 28.65 | 30.38 | 28.44 | 28.30
SPHP [21] | 29.17 | 31.26 | 29.02 | 28.73
MGRAPH [11] | 28.97 | 31.61 | 28.95 | Failed
MegaStitch [77] | 29.26 | 31.23 | 28.91 | 28.66
Proposed | 29.81 | 34.38 | 30.03 | 29.46
SSIM (↑):
Hossein's [71] | 0.8031 | 0.9157 | 0.7776 | 0.8214
UDIS++ [53] | 0.7329 | 0.8645 | 0.7114 | Failed
StableStitch2 [55] | 0.7049 | 0.8313 | 0.7048 | 0.7236
Autostitch [39] | 0.7258 | 0.8584 | 0.6811 | 0.7422
SPHP [21] | 0.7510 | 0.8682 | 0.7035 | 0.7812
MGRAPH [11] | 0.7418 | 0.8832 | 0.6919 | Failed
MegaStitch [77] | 0.7587 | 0.8772 | 0.6889 | 0.7805
Proposed | 0.8146 | 0.9353 | 0.7894 | 0.8216
LPIPS (↓):
Hossein's [71] | 0.0834 | 0.0539 | 0.0624 | 0.1767
UDIS++ [53] | 0.3160 | 0.3932 | 0.3917 | Failed
StableStitch2 [55] | 0.6675 | 0.5570 | 0.5809 | 0.6736
Autostitch [39] | 0.3535 | 0.3608 | 0.5432 | 0.5127
SPHP [21] | 0.1518 | 0.2485 | 0.2980 | 0.2793
MGRAPH [11] | 0.2086 | 0.2194 | 0.2939 | Failed
MegaStitch [77] | 0.1183 | 0.2198 | 0.2970 | 0.2590
Proposed | 0.0701 | 0.0310 | 0.0566 | 0.1560
Table 4. Time comparison of the deep feature matching and traditional matching algorithm on Dataset_grass. Bold indicates the best.
Match Method | Homography Estimation Time (s) | RMSE (↓)
GIM_dkm | 76.833 | 2.738
SIFT | 546.488 | 7.85
Table 5. Ablation analysis of Seam-band fusion strategy on Dataset_grass. Bold indicates the best, and the symbols ↑ and ↓ indicate that higher and lower values represent better performance, respectively.
Method | Seam-Band | PSNR (↑) | SSIM (↑) | LPIPS (↓)
Proposed | ✓ | 29.81 | 0.8146 | 0.0701
Proposed | × | 29.71 | 0.8143 | 0.0714
Baseline [71] | ✓ | 29.70 | 0.8088 | 0.0702
Baseline [71] | × | 29.68 | 0.8031 | 0.0834
Table 6. Time comparison of mask-based pre-stitching filtering strategy on Dataset_grass. Bold indicates the best.
Method | Pre-Stitching | Filter Time (s) | Match Time (s) | Total Time (s)
Proposed | ✓ | 31.2 | 42.28 | 212.92
Proposed | × | / | 78.81 | 317.58
Baseline [71] | ✓ | 31.2 | 314.49 | 389.99
Baseline [71] | × | / | 540.61 | 612.75
