SC-AOF: A Sliding Camera and Asymmetric Optical-Flow-Based Blending Method for Image Stitching

Parallax processing and structure preservation have long been important and challenging tasks in image stitching. In this paper, an image stitching method based on a sliding camera to eliminate perspective deformation and asymmetric optical flow to resolve parallax is proposed. By maintaining the viewpoints of the two input images in the non-overlapping areas of the mosaic and creating a virtual camera by interpolation in the overlapping area, the viewpoint is gradually transformed from one to the other, completing a smooth transition between the two image viewpoints and reducing perspective deformation. Two coarsely aligned warped images are generated with the help of a global projection plane. After that, optical flow propagation and a gradient descent method are used to quickly calculate the bidirectional asymmetric optical flow between the two warped images, and an optical-flow-based method is used to further align the two warped images and reduce parallax. In the image blending, the softmax function and the registration error are used to adjust the width of the blending area, further eliminating ghosting and reducing parallax. Finally, by comparing our method with APAP, AANAP, SPHP, SPW, TFT, and REW, it is shown that our method not only effectively resolves perspective deformation, but also gives more natural transitions between images. At the same time, our method can robustly reduce local misalignment in various scenarios, with a higher structural similarity index. A scoring method combining subjective and objective evaluations of perspective deformation, local alignment and runtime is defined and used to rate all methods, and our method ranks first.


Introduction
Image stitching is a technology that can align and blend multiple images to generate a high-resolution, wide field-of-view and artifact-free mosaic. It has broad and promising applications in many fields such as virtual reality, remote sensing mapping, and urban modeling. The calculation of the global homography, as an important step in image stitching [1,2], directly determines the image alignment accuracy and the final user experience. However, global homography only works for planar scenes or rotation-only camera motions. For non-planar scenes or when the optical centers of the cameras do not coincide, homography tends to cause misalignment, resulting in blurring and ghosting in the mosaic. It can also cause perspective deformation, making the final mosaic blurred and severely stretched at the edges. Many solutions have been proposed to solve the problems of parallax and perspective deformation in image stitching, so as to improve the quality of stitched images. But most state-of-the-art mesh-based [3][4][5] and multi-plane [6][7][8] methods are time-consuming and vulnerable to false matches.
In this work, an innovative image stitching method combining a sliding camera (SC) and asymmetric optical flow (AOF), referred to as the SC-AOF method, is proposed to reduce both perspective deformation and alignment error. In the non-overlapping areas of the mosaic, the SC-AOF method keeps the viewpoint of the mosaic the same as, or one rotation around the camera Z axis away from, those of the input images. In the overlapping area of the mosaic, the viewpoint is changed from one input image viewpoint to the other, which effectively resolves the perspective deformation at the edges. A global projection plane is estimated to project the input images onto the mosaic. After that, an asymmetric optical flow method is employed to further align the images. In the blending, the softmax function and the alignment error are used to dynamically adjust the width of the blending area to further eliminate ghosting and improve the mosaic quality. This paper makes the following contributions:
• The SC-AOF method innovatively uses an approach based on a sliding camera to reduce perspective deformation. Combined with either a global projection model or a local projection model, this method can effectively reduce perspective deformation.
• An optical-flow-based image alignment and blending method is adopted to further mitigate misalignment and improve the stitching quality of the mosaic generated by a global projection model.
• Each step in the SC-AOF method can be combined with other methods to improve their stitching quality.
This article is organized as follows. Section 2 presents the related works. Section 3 first introduces the overall method of this article; then an edge stretching reduction method based on a sliding camera and a local misalignment reduction method based on asymmetric optical flow are elaborated in detail. Section 4 presents our qualitative and quantitative experimental results compared with other methods. Finally, Section 5 summarizes our method.

Related Works
For local alignment, APAP (as-projective-as-possible) [8,9] uses the weighted DLT (direct linear transform) method to estimate a location-dependent homography and thereby eliminate misalignment. However, if some key points match incorrectly, the image areas near these key points may have incorrect homographies, resulting in serious alignment errors and distortion. APAP needs to estimate a homography using DLT for each image cell, and therefore runs much slower than global homography warping. REW (robust elastic warping) [10,11] uses TPS (thin-plate spline) interpolation to convert discrete matched feature points into a deformation field, which is used to warp the image and achieve accurate local alignment. The estimation of the TPS parameters and the deformation field is fast, so REW has excellent running efficiency. TFT (triangular facet approximation) [6] uses the Delaunay triangulation method and the matched feature points to triangulate the mosaic canvas; the warping inside each triangle is determined by the homography calculated from the three triangle vertices, so false matches lead to serious misalignment. TFT estimates a plane for every triangle instead of a homography for every cell, so its efficiency depends on the number of triangular facets, and it generally runs faster than APAP. The warping-residual-based image stitching method [7] first estimates multiple homography matrices and calculates the warping residuals of each matched feature point using them. The homography of each region is estimated using moving DLT with the differences of the warping residuals as weights, which means the method can handle larger parallax than APAP, but it is less robust to incorrectly estimated homographies and runs slower than APAP. The NIS (natural image stitching) [12] method estimates a pixel-to-pixel transformation based on feature matches and the depth map to achieve accurate local alignment. In [13], by increasing feature
correspondences and optimizing hybrid terms, sufficient correct feature correspondences are obtained in low-texture areas to eliminate misalignment. These two methods require additional runtime to enhance robustness, and are also susceptible to the uneven distribution and false matching of feature points. For perspective deformation, SPHP (shape-preserving half-projective) [14,15] spatially combines a perspective transformation and a similarity transformation to reduce deformation. The perspective transformation better aligns pixels in overlapping areas, and the similarity transformation preserves the viewpoint of the original image in non-overlapping areas. AANAP (adaptive as-natural-as-possible) [16] derives the appropriate similarity transform directly from matched feature points, and uses weights to transition gradually from the perspective transform to the similarity transform. The transitions from the homography of the overlapping area to the similarity matrix of the non-overlapping area adopted by SPHP and AANAP are artificial and unnatural, and can generate some "strange" homography matrices, causing significant distortion in the overlapping area. Both SPHP and AANAP require the estimation of homography or similarity matrices for each cell, and thus have the same efficiency issue as APAP. GSP (global similarity prior) [17,18] adds a global similarity prior to constrain the warping of each image so that it resembles a similarity transformation as a whole and avoids large perspective distortion. In SPW (single-projective warp) [19], the quasi-homography warp [20] is adopted to mitigate projective distortion and preserve a single perspective. SPSO (structure preservation and seam optimization) [4] uses a hybrid warping model based on multi-homography and mesh-based warps to obtain precise alignment of areas at different depths while preserving local and global image structures. GES-GSP (geometric structure preserving-global similarity prior) [21] employs deep learning-based edge detection to extract various types of large-scale edges, and further introduces large-scale geometric structure preservation into GSP to preserve the curves in images and protect them from distortion. GSP, SPW, SPSO and GES-GSP are based on content-preserving warping and require constructing and solving a linear system with m variables and n equations to acquire the corresponding coordinates after mesh warping, where m is the number of cell vertices multiplied by 2, and n is the number of alignment constraints, structural preservation constraints, and other constraints. Both m and n are generally large, so more runtime is required.
Based on the above analysis, generating a natural mosaic quickly and robustly remains a challenging task.


Methodology
The flow chart of the SC-AOF algorithm is illustrated in Figure 1.The details on each of its steps are described below.
Figure 1. Flow chart of the SC-AOF method. After the detection and matching of feature points, the camera parameters are obtained in advance or estimated. Then the two warped images are calculated using the SC method, and a coarsely aligned mosaic can be obtained. Finally, the AOF method is used to further align the two warped images to generate a blended mosaic with higher alignment accuracy.

Step 1: Feature point detection and matching. SIFT (scale-invariant feature transform) and SURF (speeded-up robust features) methods are generally used to detect and describe key points in the two input images. Using the KNN (k-nearest neighbors) method, a group of matched points is extracted from the key points and used for camera parameter estimation in step 2 and global projection plane calculation in step 3.
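The matching step can be sketched as brute-force 2-NN matching with Lowe's ratio test; a minimal NumPy sketch operating on already-extracted descriptor arrays (the helper name `knn_ratio_match` is illustrative, not from the paper):

```python
import numpy as np

def knn_ratio_match(desc1, desc2, ratio=0.75):
    """Brute-force 2-NN matching with Lowe's ratio test.

    desc1, desc2: (N, D) and (M, D) float descriptor arrays.
    Returns a list of (i, j) index pairs of putative matches.
    """
    matches = []
    for i, d in enumerate(desc1):
        # Squared Euclidean distance from descriptor i to every desc2 row.
        dists = np.sum((desc2 - d) ** 2, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        # Accept only if the best match is clearly better than the runner-up.
        if dists[j1] < (ratio ** 2) * dists[j2]:
            matches.append((i, j1))
    return matches
```

In practice the descriptors would come from a SIFT or SURF detector, and the surviving pairs feed the parameter estimation and plane fitting of the later steps.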
Step 2: Camera parameter estimation. The intrinsic and extrinsic camera parameters are the basis of the SC method, and can be obtained in advance or estimated. When the camera parameters are known, we can skip this step and proceed directly to step 3. When they are unknown, they can be estimated by minimizing the epipolar and planar errors, as described in Section 3.3.
Step 3: Sliding camera-based image projection. In this step, we first estimate the global projection plane, then adjust the camera projection matrix and generate a virtual camera in the overlapping area by interpolation, and obtain the warped images by global planar projection, as detailed in Section 3.1. Misalignment can still be found in the two warped images obtained in this step. Therefore, the AOF method of step 4 is used to further improve the alignment accuracy.
Step 4: Flow-based image blending.In this step, we first calculate the bidirectional asymmetric optical flow between the two warped images, then further align and blend the two warped images to generate a mosaic using the optical flow (see Section 3.2 for more details).

SC: Viewpoint Preservation Based on Sliding Camera
The sliding camera (SC) method is proposed here for the first time to address perspective deformation, and it is the first step of the SC-AOF method. This section therefore first introduces the stitching process of the method, and then details how to calculate the global projection plane and the sliding projection matrix it requires.

SC Stitching Process
In order to ensure that the mosaic maintains the perspective of the two input images, the SC method is used. That is, in the non-overlapping areas, the viewpoints of the two input images are preserved; in the overlapping area, the viewpoint of the camera is gradually transformed from that of I1 to that of I2.
As shown in Figure 2, the images I1 and I2 are back-projected onto the projection surface n, so that the corresponding non-overlapping areas Ω1, Ω2 and overlapping area Ωo are obtained. Assume that the pixels in the mosaic I are u1, u2, ..., u8, which correspond to the sampling points S1, S2, ..., S8 on the projection surface n. When the sampling points are within the projection area Ω1 of image I1, the mosaic is generated from the viewpoint of I1. S1, S2, S3 are the intersection points of the back-projection lines of u1, u2, u3 in I1 with the projection surface n. Therefore, ui = P1·Si (i = 1, 2, 3), where P1 is the projection matrix of I1. When the sampling points are within the projection area Ω2 of image I2, the mosaic is generated from the camera viewpoint of I2. Similarly, we obtain Si and ui = P2·Si, where i = 6, 7, 8. In the overlapping area Ωo of I1 and I2, the SC method is used to generate a virtual camera, whose viewpoint gradually transitions from the viewpoint of I1 to that of I2. S4 and S5 are the intersection points of the back-projection lines of u4, u5 in the virtual camera with the projection plane n, respectively. The virtual camera's image is generated from images I1 and I2 using perspective transformation. For example, pixel u4 of the virtual camera corresponds to pixel u4,1 in I1 and pixel u4,2 in I2, and is generated by blending these two pixels.

Figure 2.
Image stitching based on sliding cameras. n is the projection surface, which is fitted to the scene points p1, p2, ..., p6. The stitched image I can be generated by projection of the sampling points S1, S2, ..., S8. The points S1, S2, S3 in the area Ω1 are generated by back-projection of pixels in I1. Similarly, the points S6, S7, S8 in the area Ω2 are generated by back-projection of pixels in I2. The points S4, S5 in the area Ωo are generated by back-projection of pixels in virtual cameras; their pixel values correspond to the fused pixel values of their projections in I1 and I2. P1 and P2 are the camera projection matrices of images I1 and I2. To unify the pixel coordinates of I1 and I2, P2 is adjusted to P2′ using the method in Section 3.1.3.

Global projection surface calculation. In order to match the corresponding pixels u4,1 of I1 and u4,2 of I2, the projection surface n needs to be as close as possible to the real scene points; we can use the moving plane method [7][8][9] or the triangulation method [6] to obtain a more accurate scene surface. Since the SC-AOF method will use the optical flow to further align the images, for stitching speed and stability, only a global plane is calculated as the projection surface. Section 3.1.2 will calculate the optimal global projection surface using the matched points.

Sliding camera generation. Generally, since the pixel coordinates of I1 and I2 are not uniform, in the mosaic I, when I(ũ) = I1(P1S) holds in the non-overlapping area of I1, I(ũ) = I2(P2S) is false in the non-overlapping area of I2, where S is the sampling point on the projection surface. It is necessary to adjust the projection matrix of I2 to P2′, so that I(ũ) = I2(P2′S). The adjusted camera is shown in red in Figure 2. Section 3.1.3 will derive the adjustment method of the camera projection matrix, interpolate in the overlapping area to generate a sliding camera, and obtain the warped images of I1 and I2.

Global Projection Surface Calculation
The projection matrices of cameras C1 and C2 corresponding to images I1 and I2 are: where K1 and K2 are the intrinsic parameter matrices of C1 and C2, respectively; R is the inter-camera rotation matrix; and t is the location of the optical center of C2 in the coordinate system of C1.
The relationship between the projection u1 in I1 and the projection u2 in I2 of a 3D point p on plane n is given by the plane-induced homography. We use Equation (3) for all matched points to construct an overdetermined system and obtain the fitted global plane n by solving it. Since the optical-flow-based stitching method will be used to further align the images, the RANSAC method is not used here to calculate the plane with the most inliers. Instead, the global plane that fits all feature points as closely as possible is selected; the misalignment caused by the global plane projection will be better resolved during optical flow blending.
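The plane-induced relation has a standard closed form; the following is a sketch under assumed conventions (a point p in the C1 frame maps to Rp + t in the C2 frame, and the plane is parameterized so that nᵀp = 1 on it), which may differ in sign or parameterization from the paper's exact equations:

```latex
\tilde{u}_1 \sim K_1\, p , \qquad \tilde{u}_2 \sim K_2\,(R\,p + t)
% For p on the plane, n^{\top} p = 1, so t = t\,(n^{\top} p) and
\tilde{u}_2 \sim K_2\,\bigl(R + t\, n^{\top}\bigr)\, p
       \;\sim\; K_2\,\bigl(R + t\, n^{\top}\bigr)\, K_1^{-1}\, \tilde{u}_1
% Each matched pair (\tilde{u}_1^{\,j}, \tilde{u}_2^{\,j}) yields, after
% eliminating the unknown scale, linear constraints on n; stacking them
% over all matches gives the overdetermined system whose least-squares
% solution is the fitted global plane n.
```

This also makes clear why a least-squares fit over all matches (rather than a RANSAC consensus plane) is acceptable here: residual misalignment from an imperfect plane is absorbed by the subsequent optical-flow alignment.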

Projection Matrix Adjustment and Sliding Camera Generation
To preserve the viewpoint in the non-blending area of I2, it is only required to satisfy ũ = Nũ2, where ũ is the homogeneous coordinate of a pixel in the mosaic, ũ2 is the homogeneous coordinate of a pixel in I2, and N is a similarity transformation between I2 and I, which can be obtained by fitting the matched feature points, where ũj1 and ũj2 are the homogeneous coordinates of matched pixels in I1 and I2 respectively. Therefore, in the non-overlapping area of I2, I(ũ) = I2(P2S), where S is the corresponding 3D point of ũ2 on plane n, and we get the adjusted projection matrix P2′ = NP2. By RQ decomposition, the intrinsic parameter matrix K2′ and rotation R2′ are extracted from P2′, where K2′ and R2′ are an upper triangular matrix and a rotation matrix respectively, and the third row of both matrices is (0, 0, 1).
Compared with P2, P2′ has a different intrinsic parameter matrix, its rotation matrix differs only by a rotation around the Z axis, and its optical center t is unchanged.
The interpolated rotation is obtained by quaternion spherical linear interpolation: qm = [sin((1 − m)θ)·q1 + sin(mθ)·q2]/sin θ, where q1, q2, qm represent the quaternions corresponding to I3×3, R2′ and Rm, θ is the angle between q1 and q2, and m is the weighting coefficient.
As depicted in Figure 3, the weighting coefficient m can be calculated by the method in AANAP [16]. In the overlapping area, if u corresponds to the sliding camera (Km, Rm, tm), then the relation between u and ui in Ii (i = 1, 2) can be expressed by the homographies H1m and H2m of Equations (10) and (11), which are also applicable to the non-overlapping area. Projecting I1 and I2 through H1m and H2m onto the mosaic, respectively, gives the warped images I1′ and I2′. Figure 4 shows the experimental result on the two school images used in [10]. Due to the parallax between I1 and I2, blending I1′ and I2′ directly will cause ghosting. Therefore, the next section will use an optical-flow-based blending method (AOF) to further align the images.
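The quaternion interpolation above can be sketched as follows; a minimal NumPy implementation of slerp between the identity rotation's quaternion and that of R2′ (the helper name `slerp` is illustrative):

```python
import numpy as np

def slerp(q1, q2, m):
    """Spherical linear interpolation between unit quaternions q1 and q2.

    m = 0 returns q1, m = 1 returns q2; intermediate values of m slide
    the virtual camera's rotation between the two viewpoints.
    """
    q1, q2 = np.asarray(q1, float), np.asarray(q2, float)
    dot = np.dot(q1, q2)
    if dot < 0:          # take the shorter arc on the quaternion sphere
        q2, dot = -q2, -dot
    dot = min(dot, 1.0)
    theta = np.arccos(dot)   # angle between q1 and q2
    if theta < 1e-8:         # nearly parallel: linear interpolation suffices
        q = (1 - m) * q1 + m * q2
    else:
        q = (np.sin((1 - m) * theta) * q1 + np.sin(m * theta) * q2) / np.sin(theta)
    return q / np.linalg.norm(q)
```

For example, interpolating halfway between the identity and a 90° rotation about Z yields the quaternion of a 45° rotation about Z.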

AOF: Image Alignment Based on Asymmetric Optical Flow
The mosaic generated by the SC method will inevitably have misalignment in most cases, so an optical-flow-based method is further employed to achieve more accurate alignment. This section first introduces the image alignment process based on asymmetric optical flow (AOF), and then details the calculation of the AOF.

Image Blending Process of AOF
I1 and I2 are projected onto the custom projection surface to obtain the warped images I1′ and I2′, which are then blended to generate the mosaic I. As the 3D points of the scene are not always on the projection plane, ghosting artifacts can be seen in the mosaic, as shown in Figure 4 in the previous section. Direct multi-band image blending will lead to artifacts and blurring. As shown in Figure 5, a point P is projected to two points p1 and p2 in the mosaic, resulting in duplicated content. To solve the ghosting problem in the mosaic, the optical-flow-based blending method in [22] is adopted.
Suppose F2→1(u2) represents the optical flow value of u2 in I2′ and F1→2(u1) represents the optical flow value of u1 in I1′. Let the blending weight of pixel p̃ in the overlapping area be λ (moving from the non-overlapping area of I1′ to the non-overlapping area of I2′, λ gradually transitions from 0 to 1), as shown in Figure 5. Then, after blending, the pixel value of the mosaic I at p̃ is I(p̃) = (1 − λ)·I1′(p̃1) + λ·I2′(p̃2),

where p̃1 = p̃ + F2→1(p̃) is the corresponding location of p̃ in I1′ and p̃2 = p̃ + F1→2(p̃) is the corresponding location in I2′. That is, for any pixel p̃ in the overlapping area of the mosaic, its final pixel value can be obtained by a weighted combination of its corresponding values in the two warped images using the optical flow.
To achieve a better blending effect, following the method presented by Meng and Liu [23], a softmax function is used to make the mosaic transition quickly from I1′ to I2′, narrowing the blending area. Furthermore, if the optical flow value of a warped image is larger, the salience is higher, and the blending weight of that warped image should be increased accordingly. Therefore, the following blending weight β can be employed: where d1 = ||F2→1(p̃)|| and d2 = ||F1→2(p̃)|| represent the optical flow magnitudes, αs is the shape coefficient of the softmax function, and αm denotes the enhancement coefficient of the optical flow. The larger αs and αm are, the closer β is to 0 or 1, and the smaller the image transition area becomes. Also, similar to multi-band blending, a wider blending area is used in smooth and color-consistent areas, and a narrower blending area is used in color-inconsistent areas; the pixel consistency is measured using Dc. The final blending parameter α is obtained: β corresponds to a fast transition from I1′ to I2′, while λ corresponds to a linear transition. When the colors differ slightly, the transition from I1′ to I2′ is linear; when the color difference is large, a fast transition is preferred. Then the pixel value of the mosaic is computed with α. The left panel of Figure 6 shows the curve of β with respect to λ under different optical flow intensities. β can be used to achieve a quick transition of the mosaic from I1′ to I2′, narrowing the transition area. In the case of a large optical flow, the blending weight of the corresponding image can be increased to reduce the transition area. The right panel of Figure 6 shows the influence of λd on the curve of α as a function of λ. When λd is small, a wider fusion area tends to be used; otherwise, a narrower fusion area is used, which is similar to blending different frequency bands in a multi-band blending method.
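One way such a weight can behave is sketched below; this is an illustrative reconstruction under assumptions (the exact forms of β and α are not recoverable from the text, so the logistic sharpening and the λd mixing used here are this sketch's own choices, not the paper's formulas):

```python
import numpy as np

def blend_weight(lam, d1, d2, alpha_s=10.0, alpha_m=0.5, lam_d=0.5):
    """Illustrative softmax-style blending weight (assumed form).

    lam    : linear transition weight in [0, 1] across the overlap.
    d1, d2 : optical-flow magnitudes ||F_2->1|| and ||F_1->2||.
    alpha_s: shape coefficient -- larger values sharpen the transition.
    alpha_m: flow enhancement coefficient -- biases the weight toward the
             image with the larger flow magnitude (higher salience).
    lam_d  : color-consistency factor in [0, 1]; small values keep the
             transition close to linear (wide blend), large values favor
             the sharpened weight (narrow blend).
    """
    # Logistic (two-way softmax) sharpening of the linear weight,
    # shifted by the flow-magnitude difference.
    s = alpha_s * (lam + alpha_m * (d2 - d1) - 0.5)
    beta = 1.0 / (1.0 + np.exp(-s))            # fast transition
    return (1.0 - lam_d) * lam + lam_d * beta  # mix with the linear weight
```

With lam_d = 0 the transition is purely linear; with lam_d = 1 and a large alpha_s the weight snaps from 0 to 1 near the middle of the overlap, mirroring the narrow-vs-wide blending behavior described above.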


Calculation of Asymmetric Optical Flow
The general pipeline of optical flow calculation is to construct an image pyramid, calculate the optical flow of each layer from coarse to fine, and use the estimated current-layer optical flow divided by the scaling factor as the initial optical flow of the finer layer, until the optical flow of the finest layer is obtained [23][24][25][26]. Different methods have been proposed to achieve better solutions that satisfy brightness constancy assumptions, handle large displacements and appearance variation [27,28], and address edge blur and improve temporal consistency [29][30][31]. Recently, deep learning methods have also been proposed. For example, RAFT (recurrent all-pairs field transforms for optical flow) [32] extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit. FlowFormer (optical flow Transformer) [33] is based on a transformer neural network architecture, with a novel encoder which effectively aggregates the cost information of the correlation volume into compact latent cost tokens, and a recurrent cost decoder which recurrently decodes the cost features to iteratively refine the estimated optical flows.
In order to improve the optical flow calculation speed, we use the method based on optical flow propagation and gradient descent adopted in Facebook Surround360 [34]. When calculating the optical flow of each layer, the optical flow of each pixel is first computed from top to bottom and from left to right. Among the optical flow values of the left and top pixels in the current layer and the same-position pixel in the upper layer, the value with the minimum error given by Equation (19) is selected as the initial value of the current pixel. A gradient descent step then updates the optical flow value of the current pixel, which is in turn propagated to the right and bottom pixels as a candidate for their initial optical flow. After completing this forward propagation from top to bottom and from left to right, a reverse propagation and gradient descent pass from bottom to top and from right to left yields the final optical flow.
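The propagation sweeps described above can be sketched in a few lines of Python. This is a simplified illustration only: the per-pixel error function `err` stands in for Equation (19), the gradient-descent refinement is omitted, and the function name is ours, not Surround360's.

```python
import numpy as np

def propagate(flow, err, reverse=False):
    """One propagation sweep over an (H, W, 2) flow field.

    Forward sweep (reverse=False): scan top-to-bottom, left-to-right, and let
    each pixel inherit the candidate flow (its own, its left neighbor's, or
    its top neighbor's) with the lowest error err(y, x, flow_vec).
    Backward sweep (reverse=True): same idea from bottom-right to top-left.
    """
    h, w = flow.shape[:2]
    ys = range(h - 1, -1, -1) if reverse else range(h)
    xs = range(w - 1, -1, -1) if reverse else range(w)
    step = -1 if reverse else 1
    for y in ys:
        for x in xs:
            cands = [flow[y, x]]
            if 0 <= x - step < w:
                cands.append(flow[y, x - step])  # left (or right) neighbor
            if 0 <= y - step < h:
                cands.append(flow[y - step, x])  # top (or bottom) neighbor
            flow[y, x] = min(cands, key=lambda f: err(y, x, f))
    return flow
```

In the real pipeline a gradient-descent update of `flow[y, x]` would follow the candidate selection at each pixel, and the upper pyramid level would contribute a third candidate.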
When calculating the optical flow value F(u) of pixel u, the error function E(F(u)) combines the following terms: E_I denotes the optical flow alignment error of the edge image (which is Gaussian filtered to improve robustness); E_S denotes the consistency (smoothness) error of the optical flow, where G(u; σ) * F(u) denotes the Gaussian-filtered optical flow at pixel u; and E_T denotes the magnitude error of the normalized optical flow, which penalizes excessively large optical flow. W and H are the width and height of the current-layer image, respectively, and D(1/W, 1/H) denotes the diagonal matrix with diagonal elements 1/W and 1/H.

Estimation of Image Intrinsic and Extrinsic Parameters
The SC-AOF method requires the camera parameters of images I 1 and I 2 to be known. When only the intrinsic parameters K 1 and K 2 are known, the essential matrix [t] × R between the two images can be obtained from matched feature points, and the rotation matrix R and translation vector t between the images can be obtained by decomposing the essential matrix. When both intrinsic and extrinsic parameters are unknown, the intrinsic parameters can first be estimated by calibration [35,36], and the extrinsic parameters can then be estimated accordingly. In both cases, the intrinsic and extrinsic parameters of images I 1 and I 2 can be estimated robustly.
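The decomposition of the essential matrix mentioned above follows the standard SVD factorization (in the style of Hartley and Zisserman). A minimal NumPy sketch, assuming an exact, noise-free E; the function name is illustrative, not from the paper:

```python
import numpy as np

def decompose_essential(E):
    """Factor an essential matrix E ~ [t]x R into the two candidate
    rotations and the translation direction (known only up to sign/scale)."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations: negating U or Vt only changes E by a sign.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1 = U @ W @ Vt      # first candidate rotation
    R2 = U @ W.T @ Vt    # second ("twisted") candidate rotation
    t = U[:, 2]          # third left singular vector
    return R1, R2, t
```

In practice, the correct (R, t) pair among the four combinations (R1 or R2, ±t) is selected by the cheirality check: triangulated points must lie in front of both cameras.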
When none of the above methods is feasible, it is necessary to calculate the fundamental matrix from the matched feature points and restore the camera internal and external parameters.
When the camera has zero skew and the principal point and aspect ratio are known, each intrinsic parameter matrix has only one degree of freedom (the focal length). The total number of degrees of freedom of the camera parameters is 7 (t has 2 degrees of freedom because scale cannot be recovered, R has 3 degrees of freedom, and each camera contributes 1), which equals the number of degrees of freedom of the fundamental matrix F. The intrinsic and extrinsic parameters of the images can then be recovered using a self-calibration method [37]. However, even when these constraints are met, the camera parameters solved by [37] suffer from large errors when the scene is approximately planar or the matching error is large. Therefore, we use the method of optimizing the objective function in [6] to solve for the intrinsic and extrinsic parameters of the camera.
To obtain an accurate fundamental matrix, firstly, the feature points need to be distributed evenly in the image. As shown in Figure 7, a uniform and sparse distribution of feature points both reduces the computation time and yields more robust intrinsic and extrinsic camera parameters and global projection planes, which leads to improved stitching results. Secondly, it is necessary to filter the matched feature points to exclude the influence of outliers. A similarity transformation is used to normalize the matched feature points; after normalization, the mean of the feature points is 0 and their average distance to the origin is √2.
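This normalization step is the standard Hartley point normalization; a sketch (the function name is illustrative):

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: translate points to zero mean and scale them so
    the average distance to the origin is sqrt(2). Returns the normalized
    points and the 3x3 similarity transform T acting on homogeneous points."""
    pts = np.asarray(pts, dtype=float)
    c = pts.mean(axis=0)                              # centroid
    d = np.linalg.norm(pts - c, axis=1).mean()        # mean distance to centroid
    s = np.sqrt(2.0) / d                              # isotropic scale factor
    T = np.array([[s, 0, -s * c[0]],
                  [0, s, -s * c[1]],
                  [0, 0, 1]])
    return (pts - c) * s, T
```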
Thirdly, multiple homographies are estimated to exclude outlier points. Let F cond and n denote the set of all matched feature points and the total number of matched feature points, respectively. In F cond , the RANSAC method with threshold η = 0.01 is applied to compute a homography H 1 and its inlier set F 1 inlier , and the matches of isolated feature points that have no neighboring points within a 50-pixel distance are removed from F 1 inlier . A new candidate set is generated by removing F 1 inlier from F cond . The above steps are repeated to calculate m homography matrices H i and the corresponding inlier sets F i inlier until |F m inlier | < 20 or |F m inlier | < 0.05n. The final inlier set is F inlier = ∪ m i=1 F i inlier . If m = 1, there is only one valid plane; in this case, the RANSAC method with threshold η = 0.1 is applied to recalculate the homography H 1 and the corresponding inlier set F 1 inlier . After excluding the outliers, for any matched points {x 1 , x 2 } in the inlier set F inlier , a cost function combining the epipolar constraint and the infinite homography constraint is minimized, where λ balances the two constraints (generally λ = 0.01), and h is a robust kernel function that mitigates the effect of mismatched points on the optimization of the camera intrinsic and extrinsic parameters. r e and r p denote the projection errors of the epipolar constraint and of the infinite homography constraint, respectively, where ρ denotes the length of the vector composed of the first two components of Fx 1 .

Experiment
To verify the effectiveness of the SC-AOF method, the mosaics generated by our method and the existing APAP [4], AANAP [16], SPHP [14], TFT [7], REW [10] and SPW [18] methods are compared on typical datasets used in prior work, to verify the feasibility and advantages of the SC-AOF method in reducing deformation and improving alignment. Next, the SC-AOF method is combined with other methods to demonstrate its compatibility. The image pairs used in the comparison experiments are shown in Figure 8.

Effectiveness Analysis of SC-AOF Method
In this section, various image stitching methods are compared and analyzed based on three indicators: perspective deformation, local alignment and running speed. The experimental setup is as follows.
• The first two experiments compare typical methods for solving perspective deformation and local alignment, respectively, and all the methods in the first two experiments are included in the third experiment to show the superiority of the SC-AOF method in all aspects.
• Since the averaging methods generally underperform compared to linear blending, all compared methods adopt linear blending to achieve their best performance.
• All methods other than ours use the parameters recommended by their proposers. Our SC-AOF method uses the following parameter settings in optical-flow-based image blending: α s = 10, α m = 100, and c d = 10.

Perspective Deformation Reduction
Figure 9 shows the results of the SC-AOF method versus the SPHP, APAP, AANAP and SPW methods for perspective deformation reduction in image stitching. The school, building and park square datasets were used in this experiment. We can see from Figure 9 that, compared with the other methods, our SC-AOF method changes the viewpoint of the stitched image in a more natural manner and effectively eliminates perspective deformation. As explained below, all other methods underperform compared to our SC-AOF method.
The image stitched using the APAP method has its edges stretched to a large extent, because APAP does not process perspective deformation. This method only serves as a reference to verify the effectiveness of perspective-deformation-reducing algorithms.
The AANAP algorithm can achieve a smooth transition between the two viewpoints, but results in severely "curved edges", and the edge stretching on the park square dataset is even more severe than that of the APAP method. This is because, when the AANAP method extrapolates from homographies, it linearizes the homography in addition to the similarity transformation, causing affine deformation in the final transformation.
Compared with the APAP method, the SPW method makes no significant improvement in perspective deformation, except for the image in the first row. SPW preserves perspective consistency, which suggests that a multiple-viewpoint method excels at solving perspective deformation compared with a single-viewpoint method.
Sensors 2024, 24, x FOR PEER REVIEW 15 of 37
The SPHP algorithm performs well overall. However, it causes severe distortions in some areas (red circles in Figure 8c) due to the rapid change of viewpoints. This is because the SPHP method estimates the similarity transformation and interpolated homographies from the global homography. As a result, the similarity transformation cannot reflect the real scene information, and the interpolated homographies may deviate from a reasonable image projection.

Local Alignment
Qualitatively, the local alignment of the compared methods can be analyzed as follows.
• The APAP method performs fairly well on most images, though with some alignment errors. This is because the moving DLT method smooths the mosaics to some extent.
• The TFT-generated stitched image is of excellent quality in planar areas, but serious distortions appear when there is a sudden depth change in the scene. This is because large errors arise when calculating planes from the three vertices of a triangle in areas with sudden depth changes.
• The REW method has large alignment errors in planar areas but aligns the images better than the APAP and TFT methods in all other scenes. This is because the few feature points in a planar area might be filtered out as mismatched points by the REW method.
The SSIM (structural similarity) [38] is employed to objectively describe the alignment accuracy of different methods. SSIM measures the similarity between the two images J 1 and J 2 to be blended in the overlapping area. For our two-step alignment method, J 1 (u) = J ′ 1 (u + λF 2→1 (u)) and J 2 (u) = J ′ 2 (u + (1 − λ)F 1→2 (u)). The structural similarity is defined as SSIM(J 1 , J 2 ) = (2µ 1 µ 2 + C 1 )(2σ 12 + C 2 ) / ((µ 1 ² + µ 2 ² + C 1 )(σ 1 ² + σ 2 ² + C 2 )), where µ 1 and σ 1 represent the mean and standard deviation of pixel values within the overlapping area O of J 1 , respectively; µ 2 and σ 2 are the corresponding mean and standard deviation of J 2 ; and σ 12 is the covariance of pixel values in the overlapping area of J 1 and J 2 . C 1 = (k 1 L) 2 and C 2 = (k 2 L) 2 are constants used to maintain stability, where k 1 = 0.01, k 2 = 0.03, and L is the dynamic range of pixel values (for 8-bit grayscale images, L = 255).
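Computed over the whole overlapping area, this single-window SSIM is straightforward to evaluate (a sketch following the definitions above; note that common SSIM implementations instead average over local windows):

```python
import numpy as np

def global_ssim(J1, J2, k1=0.01, k2=0.03, L=255.0):
    """Single-window SSIM over the full overlapping area, with the means,
    standard deviations, and covariance taken over all overlap pixels."""
    mu1, mu2 = J1.mean(), J2.mean()
    s1, s2 = J1.std(), J2.std()
    s12 = ((J1 - mu1) * (J2 - mu2)).mean()    # covariance over the overlap
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2     # stabilizing constants
    return ((2 * mu1 * mu2 + C1) * (2 * s12 + C2)) / \
           ((mu1 ** 2 + mu2 ** 2 + C1) * (s1 ** 2 + s2 ** 2 + C2))
```

A perfectly aligned pair (identical overlaps) scores 1; misalignment lowers the covariance term and therefore the score.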
The scores of all methods on the datasets building1, building2, garden, building, school, park-square, wall, cabinet, campus-square and racetracks are listed in Table 1. The best SSIM value is highlighted in bold.
• APAP and AANAP have high scores on all image pairs, but the scores are lower than those of our method and REW, indicating that APAP and AANAP blur mosaics to some extent.
• When SPHP is not combined with APAP, only the global homography is used to align the images, resulting in lower scores compared to the other methods.
• TFT has higher scores on all datasets except the building dataset; TFT can improve alignment accuracy but also introduces instability.
• SPW combines quasi-homography and content-preserving warping to align images; the additional constraints reduce the accuracy of alignment, resulting in lower scores compared to REW and our method.
• Both REW and our method use a global homography matrix to coarsely align the images, after which REW applies a deformation field and our method applies optical flow to further align the images. Therefore, both methods have higher scores and robustness than the other methods.

Stitching Speed Comparison
The running speed is a direct reflection of the efficiency of each stitching method. Figure 12 shows the speed of the SC-AOF method versus the APAP, AANAP, SPHP, TFT, REW and SPW methods. The same image pairs as in the SSIM comparison are used in this experiment. It can be seen that the REW algorithm has the fastest stitching speed, because it only needs to calculate TPS parameters based on feature point matching and can then compute the transformations of grid points quickly. Our SC-AOF method ranks second in terms of stitching speed, and the AANAP algorithm requires the longest running time. Both the APAP and AANAP methods calculate the local homographies based on moving DLT, and the AANAP method also needs to calculate the Taylor expansion of anchor points.

Overall Scoring for All the Methods
In order to comprehensively and quantitatively evaluate our method and the other methods in improving local alignment and reducing perspective deformation, we define a scoring method that assigns an integer score from 0 to 10 to estimate the effectiveness and efficiency of stitching each image pair with each method. The total score is obtained by adding up the scores from four aspects:
1. The subjective scoring of perspective deformation reduction. The scores 0 to 2, respectively, indicate severe deformation, slight relief of deformation, and little deformation.
2. The subjective scoring of local alignment. The score ranges from 0 to 2, where 0 indicates obvious ghosting in many regions, 1 indicates few or mild mismatches, and 2 indicates no apparent alignment errors.
3. The objective scoring of local alignment. The score ranges from 0 to 3. Let µ and σ denote the mean and standard deviation of the SSIM values of the different methods on the same image pair, and let x be the SSIM of the current method; the score is 0, 1, 2 and 3, respectively, when x − µ < −σ, x − µ ∈ (−σ, 0), x − µ ∈ (0, σ) and x − µ > σ.
4. The scoring of running time. Like the objective scoring of local alignment, the score is 0 when the running time of the method is greater than the mean plus the standard deviation, 1 when the time is less than the mean plus the standard deviation but greater than the mean, 2 when the time is less than the mean but greater than the mean minus the standard deviation, and 3 otherwise.
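The two objective scoring rules (items 3 and 4) can be transcribed compactly as follows; the function names are ours, and the treatment of exact boundary ties is an assumption since the rules above define open intervals:

```python
def objective_score(x, mu, sigma):
    """Map a method's SSIM x to a 0-3 score, relative to the mean mu and
    standard deviation sigma of all methods' SSIM on the same image pair."""
    d = x - mu
    if d < -sigma:
        return 0
    if d < 0:
        return 1
    if d <= sigma:
        return 2
    return 3

def runtime_score(t, mu, sigma):
    """Runtime scoring mirrors the SSIM scoring with the inequalities
    reversed, since a shorter running time is better."""
    d = t - mu
    if d > sigma:
        return 0
    if d > 0:
        return 1
    if d >= -sigma:
        return 2
    return 3
```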
The scoring results of these methods on the image pairs are shown in Table 2. The image pairs in Table 2 include those used in the SSIM and runtime comparisons, as well as the test image pairs in Appendix B (specific comparisons of the mosaics generated by the different methods are shown in Appendix B). Each score is displayed in the format "score of perspective deformation reduction + subjective score of local alignment + objective score of local alignment + score of running time = overall score". The highest score is bolded and highlighted. Our SC-AOF method has the highest score on all image pairs except the worktable image pair. Given that our method and REW both scored highly and have the same scores on some image pairs, in order to show that our method is indeed ahead of REW rather than benefiting from statistical bias, we performed a Wilcoxon test in MATLAB 2018b on all scores of our method and REW. The resulting p-value of 0.0106 with h = 1 shows that the scores of REW and our method come from different distributions and that our method has the better overall performance. Our method can maintain a desirable operating efficiency while guaranteeing the final image quality, and therefore has broad application potential.

Compatibility of SC-AOF Method
The SC-AOF method can not only be used independently to generate stitched images with reduced perspective deformation and low alignment error, but can also be decomposed (into the SC method and the image blending method) and combined with other methods to improve the quality of the mosaic.

SC Module Compatibility Analysis
The sliding camera (SC) module in the SC-AOF method can not only be used in the global alignment model, but can also be combined with other local alignment models (e.g., APAP and TFT) to solve perspective deformation while maintaining alignment accuracy. The implementation steps are as follows.

1. Use the global similarity transformation to project I 2 onto the I 1 coordinate system to calculate the size and mesh vertices of the mosaic;
2. Use Equations (6)-(9) to calculate the weights of the mesh vertices and the projection matrix, replace the homography H in Equation (2) with the homography matrix of the local alignment model, and substitute them into Equation (12) to compute the warped images and blend them.
Figure 13 presents the stitched images when using the TFT algorithm alone vs. using the TFT algorithm combined with the SC method. The combined method is more effective in mitigating edge stretching and generates more natural images. This shows that the SC method can effectively solve the perspective deformation suffered by the local alignment method.

Blending Module Compatibility Analysis
The asymmetric optical-flow-based blending in the SC-AOF method can also be used in other methods to enhance the final stitching effect. The implementation steps are as follows.

1. Generate two projected images using one of the other algorithms and calculate the blending parameters based on the overlapping areas;
2. Set the optical flow value to 0 and replace the linear blending parameter λ with α in Equation (17) to blend the warped images, preserving the blending band width in low-frequency areas and narrowing it in high-frequency areas to obtain a better image stitching effect.
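Equation (17) itself is not reproduced here. As a loose, hypothetical illustration of how a softmax-style reshaping of the linear weight λ narrows the blending band (the paper additionally modulates the width by the local registration error), consider:

```python
import numpy as np

def blend_weight(lam, c=10.0):
    """Hypothetical sharpened blending weight: a logistic reshaping of the
    linear weight lam in [0, 1]. A larger c narrows the transition band
    around lam = 0.5, approximating a hard seam in high-frequency regions,
    while a small c keeps a wide, gentle ramp for low-frequency regions."""
    return 1.0 / (1.0 + np.exp(-c * (lam - 0.5)))

def blend(I1, I2, lam, c=10.0):
    """Blend two warped images with the sharpened weight."""
    a = blend_weight(lam, c)
    return (1 - a) * I1 + a * I2
```

This is only a sketch of the design idea (width control via a sigmoidal reweighting of λ), not the paper's exact α.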
Figure 14 shows the image stitching effect of the APAP algorithm when using linear blending vs. when using our blending method. It can be seen that the blurring and ghosting in the stitched image are effectively mitigated with our blending method. This shows that our blending algorithm can better blend the aligned images.

Conclusions
In this paper, to solve the perspective deformation and misalignment in homography-based image stitching, the SC-AOF method is proposed. In image warping, a new virtual camera and projection matrix are generated as the observation perspective in the overlapping area by interpolating between the two projection matrices. The viewpoint of the overlapping area transitions gradually from one camera to the other, which preserves the viewpoints, smooths the transition in the stitched image, and thus solves the perspective deformation problem. In image blending, an optical-flow-based blending algorithm is proposed to further improve alignment accuracy, with the width of the blending area automatically adjusted according to the softmax function and the alignment accuracy. Finally, extensive comparison experiments demonstrate the effectiveness of our algorithm in reducing perspective deformation and improving alignment accuracy. In addition, our algorithm has broad applicability, as its component modules can be combined with other algorithms to mitigate edge stretching and improve alignment accuracy.
However, the proposed local alignment method may fail if the input images contain large parallax, which causes severe occlusion and prevents us from obtaining the correct optical flow. The problem of local alignment failure caused by large parallax also exists in other local alignment methods. Exploring more robust optical flow calculation and occlusion processing methods to reduce misalignment in large-parallax scenes is an interesting direction for future work.

The scores of all methods on the image pairs roundabout, fence, railtracks, temple, corner, shelf, standing-he, foundation, guardbar, office, plantain, building4, potberry, lawn and worktable are listed in Table A3. The best SSIM value is highlighted in bold.

Figure 2 .
Figure 2. Image stitching based on sliding cameras. n is the projection surface, which is fitted by scene points p 1 , p 2 , . . ., p 6 . The stitched image I can be generated by projection of the sampling points S 1 , S 2 , . . ., S 8 . The points S 1 , S 2 , S 3 in the area Ω 1 are generated by back-projection of pixels in I 1 . Similarly, the points S 6 , S 7 , S 8 in the area Ω 2 are generated by back-projection of pixels in I 2 . The points S 4 , S 5 in the area Ω o are generated by back-projection of pixels in virtual cameras. The pixel values of S 4 , S 5 correspond to the fused pixel values of their projections in I 1 and I 2 . P 1 and P 2 are the camera projection matrices of images I 1 and I 2 . To unify the pixel coordinates of I 1 and I 2 , P 2 is adjusted to P ′ 2 using the method in Section 3.1.3.


Figure 3 .
Figure 3. The diagram of gradient weight. The quadrilateral is the boundary of the overlapping area of I 1 and the mapped image of I 2 using H −1 , where O is the center of I 1 and O ′ is the warped point of the center point of I 2 using H −1 . k m and k M are the projection points closest to O and O ′ on the line OO ′ of the quadrilateral vertices, respectively. P * indicates the pixel coordinates within the overlapping area for which the weighted parameter m needs to be calculated.


Figure 4 .
Figure 4. Image stitching based on sliding cameras and the global projection plane. (a,b) show the warped images I ′ 1 and I ′ 2 of the input images of a school; (c) shows the average blending image of I ′ 1 and I ′ 2 .


Figure 5 .
Figure 5. Image blending based on optical flow. B 1 E 2 is the projection surface of the mosaic. In the overlapping area (denoted by B 2 E 1 ) of I 1 and I 2 , we need to blend I ′ 1 and I ′ 2 . The 3D point P is outside the projection surface. When P is projected onto the projection surface, ghosting points p 1 and p 2 appear. Through the weighted blending of asymmetric optical flow, p 1 and p 2 are merged into point p̃, which solves the ghosting problem of stitching. Suppose F 2→1 (p 2 ) represents the optical flow value of p 2 in I ′ 2 and F 1→2 (p 1 ) represents the optical flow value of p 1 in I ′ 1 . If the blending weight of pixel p̃ in the overlapping area is λ (λ gradually transitions from 0 to 1 from the non-overlapping area of I ′ 1 to the non-overlapping area of I ′ 2 , as shown in Figure 5), then after blending, the pixel value of image I at p̃ is:
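The asymmetric blending described in the caption above can be sketched for a single image row (a simplified 1-D illustration; the function name, array layout, and use of linear interpolation are our assumptions):

```python
import numpy as np

def blend_asymmetric(I1w, I2w, F21, F12, lam):
    """Asymmetric optical-flow blending for one 1-D row of pixels.

    I1w, I2w : the two coarsely aligned warped images (1-D arrays)
    F21      : optical flow from I2w toward I1w, sampled per pixel
    F12      : optical flow from I1w toward I2w, sampled per pixel
    lam      : per-pixel blending weight in [0, 1]

    Each output pixel reads I1w displaced by lam*F21 and I2w displaced by
    (1-lam)*F12, so the two ghosting points slide toward a common location
    before the weighted average is taken.
    """
    n = len(I1w)
    x = np.arange(n, dtype=float)
    # Sample each warped image at its flow-shifted position.
    J1 = np.interp(x + lam * F21, x, I1w)
    J2 = np.interp(x + (1 - lam) * F12, x, I2w)
    return (1 - lam) * J1 + lam * J2
```

At λ = 0 the result is pure I ′ 1 and at λ = 1 pure I ′ 2 , so the blend stays consistent with the non-overlapping regions on both sides.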


Figure 6 .
Figure 6. Blending parameter curves. The left figure shows the β curves at different optical flow intensities. The right figure shows the α curve at different λ d values.

Figure 7 .
Figure 7. The impact of feature point distribution on stitching results. The feature points are marked by small colored circles, and the blue boxes indicate the regions where the enlarged images are located in the mosaics. The feature points in (a) are concentrated in the grandstand; the corresponding mosaic (c) is misaligned in the playground area. The feature points in (b) are evenly distributed within a 2 × 2 grid; although the total number of feature points is smaller, the mosaic (d) has better quality. (e,f) show details of the mosaics.


Figure 8. The image dataset for the comparative experiments. The image pairs were originally used by stitching methods such as APAP, AANAP, and REW.

Figure 9. Comparison of perspective deformation processing. From the first row to the last, the mosaics generated by our method, AANAP, SPHP, SPW, and APAP on the datasets are presented, respectively. The red elliptical boxes indicate unnatural transitions in the mosaics.

Figures 10 and 11 show the results of the SC-AOF method versus the APAP, TFT, and REW methods for local alignment in image stitching. SC-AOF performs well in all scenes, demonstrating the effectiveness of our method in local alignment.

Figure 10. Qualitative comparison on the garden image pairs. From the first row to the last, the mosaics and detail views generated by our method, APAP, TFT, and REW are presented, respectively. The red boxes indicate the regions where the enlarged images are located in the mosaics. The red circles highlight errors and distortions.

Figure 11. Comparison of image alignment on the wall and cabinet image pairs. From the first row to the last, the mosaics and detail views generated by our method, APAP, TFT, and REW are presented, respectively. The blue boxes indicate the regions where the enlarged images are located. The red circles highlight errors and distortions.

Figure 12. Comparison of elapsed time. Our method is second only to REW in speed and is superior to the other methods.

4.1.4. Overall Scoring for All the Methods

Figure 13. The combination of TFT and the moving camera method. (a) The mosaics created using TFT. (b) The mosaics obtained by adding the moving camera method to TFT.

Figure 14. The combination of APAP and our blending method. (a) The mosaic and detail view generated by APAP using linear blending. (b) The results of APAP combined with our blending method. The red elliptical boxes indicate the regions where the enlarged images are located.
In this section, some supplementary experiments on perspective deformation reduction and local alignment are presented. The image pairs used in these experiments are shown in Figure A1.

Figures A2-A4 show the comparisons of perspective deformation reduction among our method, AANAP, SPHP, SPW, and APAP. The comparisons of local alignment among our method, APAP, TFT, and REW are shown in Figures A5-A12. The detail images inside the red rectangles are the image regions with misalignment and are shown directly to the right of the mosaics.

Figure A2. Qualitative comparisons of perspective deformation reduction on the building1, fence, and building4 image pairs.

Figure A3. Qualitative comparisons of perspective deformation reduction on the foundation and office image pairs.

Figure A4. Qualitative comparisons of perspective deformation reduction on the standing-he and lawn image pairs.

Figure A5. Qualitative comparisons of local alignment on the railtracks image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A6. Qualitative comparisons of local alignment on the worktable image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A7. Qualitative comparisons of local alignment on the temple image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A8. Qualitative comparisons of local alignment on the guardbar image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A9. Qualitative comparisons of local alignment on the roundabout image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A10. Qualitative comparisons of local alignment on the potberry image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A11. Qualitative comparisons of local alignment on the plantain image pair. TFT failed to stitch this image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A12. Qualitative comparisons of local alignment on the shelf and corner image pairs. TFT failed to stitch the corner image pair. The red circles highlight errors and distortions.

Table 2. The scoring results on the image pairs.

Table A1. Cont.

…: used for solving the optimal optical flow
E_I, E_S, E_T: the optical flow's alignment error, consistency error, and penalty for large values
F^i_inlier, H_i: the i-th homography transforming I_1 to I_2 and the corresponding inlier set
F_cond, F_inlier: the initial set and the final inlier set of matched feature points
r_e, r_p: the projection errors of the epipolar constraint and of the infinite homography constraint
h(r, σ): the robust kernel function to reduce the impact of false matches on the optimization
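As an illustration of a robust kernel h(r, σ) of the kind listed above, a minimal sketch using the Huber kernel (the paper's actual kernel choice is not specified in this excerpt; Huber is only an example):

```python
import numpy as np

def huber_kernel(r, sigma):
    """Example robust kernel h(r, sigma): quadratic for small residuals,
    linear for large ones, so false matches (large r) contribute less
    to the optimization than a plain least-squares cost would allow."""
    r = np.abs(np.asarray(r, dtype=float))
    quad = 0.5 * r**2                    # inlier regime: |r| <= sigma
    lin = sigma * (r - 0.5 * sigma)      # outlier regime: grows linearly
    return np.where(r <= sigma, quad, lin)
```

Because the cost grows only linearly beyond σ, a single grossly mismatched feature cannot dominate the objective the way it would under a squared-error cost.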

Table A2. The abbreviations and their explanations.

APAP: as-projective-as-possible, used to solve local alignment by location-dependent homography warping
DLT: direct linear transform, used for estimating the parameters of the homography
REW: robust elastic warping, used to improve local alignment using deformation fields
TPS: thin-plate spline, used to compute deformation fields corresponding to matched feature points
TFT: triangular facet approximation, using scene triangular facet estimation to improve local alignment
NIS: natural image stitching, a local alignment method using the depth map
SPHP: shape-preserving half-projective, solving perspective deformation by gradually changing the resultant warp from projective to similarity
AANAP: adaptive as-natural-as-possible, a method to solve perspective deformation
GSP: global similarity prior, used to align images and reduce deformation
SPW: single-perspective warp, which adopts the quasi-homography warp to mitigate projective distortion and preserve a single perspective
SPSO: structure preservation and seam optimization, a method that can obtain precise alignment while preserving local and global image structures
GES-GSP: geometric structure preserving-global similarity prior, based on GSP to further protect the large-scale geometric structure from distortion
SIFT: scale-invariant feature transform, a feature detection and description method
SURF: speeded-up robust features, a feature detection and description method, faster than SIFT
KNN: k-nearest neighbor, a feature matching method
RAFT: recurrent all-pairs field transforms, estimating optical flow based on deep learning
RANSAC: random sample consensus, used to filter outliers and estimate model parameters
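To make the DLT and RANSAC entries concrete, a minimal sketch of homography estimation via the direct linear transform inside a RANSAC loop (pure NumPy; the function names and parameters are illustrative, not the paper's implementation):

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transform: solve for H (up to scale) from >= 4
    point correspondences by stacking two linear equations per pair
    and taking the null vector of the resulting system via SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=200, thresh=3.0, seed=0):
    """RANSAC: repeatedly fit H to 4 random correspondences and keep
    the H with the most inliers (reprojection error below `thresh`)."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    src_h = np.hstack([src, np.ones((n, 1))])   # homogeneous coordinates
    best_H, best_inl = None, np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, 4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        proj = src_h @ H.T
        err = np.linalg.norm(proj[:, :2] / proj[:, 2:3] - dst, axis=1)
        inl = err < thresh
        if inl.sum() > best_inl.sum():
            best_H, best_inl = H, inl
    return best_H, best_inl
```

This is the baseline global-homography pipeline that the mesh-based and flow-based refinements discussed in the paper build upon: RANSAC supplies the inlier set, and the final warp is then adjusted locally to handle parallax.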