Article

Rectification for Stitched Images with Deformable Meshes and Residual Networks

Institute of Remote Sensing and Geographic Information Systems, Peking University, No. 5 Summer Palace Road, Beijing 100871, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(7), 2821; https://doi.org/10.3390/app14072821
Submission received: 26 February 2024 / Revised: 24 March 2024 / Accepted: 26 March 2024 / Published: 27 March 2024
(This article belongs to the Special Issue AI-Based Image Processing: 2nd Edition)

Abstract

Image stitching is an important technique in digital image processing, but the stitched result often has an irregular boundary, and traditional cropping or image-completion remedies typically discard a large amount of information. This paper therefore proposes an image rectification method based on deformable meshes and residual networks, which aims to minimize information loss both at the edges of the stitched image and within it. Specifically, the method selects the mesh shape best suited to each image for residual-network regression. Its loss function combines a local term and a global term, minimizing the loss of image information both within each mesh cell and across the global target. The proposed method not only greatly reduces the information loss caused by the irregular shape of stitched images, but also adapts to images with diverse rigid structures. Validation on the DIR-D dataset shows that it outperforms state-of-the-art image rectification methods.

1. Introduction

With the rapid development of image stitching and image fusion technologies, methods for obtaining multi-view or even panoramic perspectives from multiple single viewpoints have been widely applied in production and daily life [1,2,3,4,5]. For instance, technologies such as panoramic imaging, autonomous driving, and virtual reality (VR) enable the precise remote observation of scenes [6,7,8]. However, when stitching multiple single-view images, the overlapping regions of different images must be aligned by adjusting their positions, angles, and local distortions [9]. This often produces irregular boundaries in the non-overlapping regions, which makes panoramic images harder to interpret and leaves observers prone to misjudgment [10].
Some studies deal with irregular boundaries by simply cropping the image to a smaller rectangle [11,12]. However, such methods may discard a large amount of information, which contradicts the original purpose of image stitching: expanding the field of view [13,14]. Alternatively, image completion can be employed to predict the missing portions of an image and restore its integrity to some extent [15,16]. Nevertheless, its limitations are evident: when the missing portions contain complex structures or highly specific information, image completion struggles to predict them accurately [17,18,19]. This makes it unsuitable for fields with high safety requirements, such as autonomous driving and industrial production monitoring [20].
To address these challenges, this study proposes a deep learning-based image rectification algorithm named RIS-DMRN (Rectification for Image Stitching with Deformable Mesh and Residual Network). The algorithm defines a deformable target mesh for irregularly stitched images, which can be predicted during model training. The deformable mesh shape is selected by a convolutional neural network that judges the rigid structure of the current image, with three options: triangle, rectangle, and regular hexagon. Once the mesh shape is determined, the prediction network generates an initial predicted mesh from the input irregular image and its mask matrix. The training process employs a wide residual network to predict the initial mesh through content-aware processing of the irregularly stitched image. Subsequently, the input irregular image, the predicted initial mesh, and the predefined target mesh are jointly fed into the wide residual network for rectification regression. The loss function of the wide residual network comprises local and global parts: the local loss controls the deformation of targets within each mesh cell, while the global loss helps avoid the loss of global information during deformation. Finally, rectangularization of the image is achieved through iterative regression with the residual network (Figure 1).
To address the substantial loss of edge information and the severe deformation of internal details in current image rectangling methods, this paper proposes an approach based on deformable meshes and residual neural networks. The highlights of this study include the following:
  • This approach utilizes a deformable mesh as the initial mesh, allowing for more versatile directional movement during mesh restoration and achieving better correction effects for targets within irregular images.
  • The introduction of local and global-related loss functions significantly mitigates the drawbacks of traditional methods that focus only on partial regions, enhancing the overall coherence during the deformation recovery process and preserving content more effectively after image rectification.
  • In experiments on 300 randomly selected irregular images from the public DIR-D dataset [21], SSIM and PSNR reached 0.7234 and 22.65, respectively, a comparatively accurate result.

2. Related Work

Researchers have addressed the image rectangling problem with several families of methods, including feature matching-based methods, optimization-based methods, and deep learning methods. Each has its own limitations. Feature matching-based methods are easily affected by mismatched and unstable feature points, resulting in discontinuous or irregularly shaped edges in the stitched image. Optimization-based methods adjust the image by defining a global or local loss function over the stitched image and minimizing it to obtain smoother, more regular edges. However, since these methods require complex optimization over the image and may be trapped in local optima, their effectiveness can be limited when dealing with large-scale images or complex scenes.
For example, in terms of feature matching, Zhu et al. [22] proposed adjusting the stitched image by computing a perspective transformation matrix to bring it closer to a rectangular shape. However, this method often relies on estimating geometric structures in specific regions of the image, such as lines or corners. Some approaches transform local quadrilateral mesh regions of the stitched image to make the overall image more rectangular [23,24,25]. Building on this research, He et al. [26] proposed optimizing the preservation of line meshes while deforming the rigid structures within the mesh, and Li et al. [27] extended the preservation term from line meshes to geodesic lines. However, the applicability of the latter is restricted by the common occurrence of curved ground lines in panoramic images. Other researchers adopted seam carving, an algorithmic approach that resizes an image by carving out or inserting pixels in different parts of the image, thereby transforming irregular images into rectangular ones [28,29,30,31,32]. Meanwhile, Nie et al. [21] proposed DRIS (Deep Rectangling for Image Stitching), employing a residual progressive regression strategy in which a fully convolutional network predicts mesh deformations and irregular images are then corrected based on the predicted mesh. This method partially addresses flexible structural distortion and computational cost in image rectification. However, DRIS still faces certain challenges. Its loss function focuses solely on the contents of the initial mesh, without considering global information for further adjustment, which leads to deformation errors in panoramic information. Additionally, the method concentrates on horizontal and vertical targets within the mesh, making it prone to deformation errors when correcting targets in other directions [33,34]. Overall, research on image rectification remains relatively limited, and achieving rectification while ensuring minimal information loss is still a challenging task [35,36,37,38].

3. Materials and Methods

This paper proposes a deformable mesh structure for the initial prediction of irregular images, enhancing adaptability across various spatial scene structures. Building on this mesh structure, the paper establishes two application modes. One predicts the rigid structure of the input image with a simple convolutional neural network and, based on the prediction, selects the mesh shape best suited to the image (Figure 2a). The input image and the chosen mesh shape are then fed into a wide residual network for initial mesh prediction. Finally, the predicted initial mesh and the input irregular image are jointly used for rectification regression, producing the output image.
The other mode feeds the input image together with the predefined target meshes of all shapes into the wide residual network to generate initial mesh predictions (Figure 2b), after which rectification regression is performed with the input image. The result with the minimum regression loss of the rectified image is then selected as the final output.
The image rectification algorithm based on deformable meshes and residual networks is outlined in Algorithm 1 and offers two strategies to choose from: image-based and loss-based (a minimal code sketch of both follows the algorithm). The image-based strategy selects the mesh shape best suited to the current image according to the number and distribution of rigid structures within it. The selected mesh shape and the input image are then fed into the residual neural network to obtain the initialized prediction mesh, which undergoes rectangling regression together with the input image until the accuracy requirement is met; otherwise, the network continues to be trained. The loss-based strategy instead inputs the image directly into the residual network to obtain initialized prediction meshes for all three shapes. After rectangling regression of the three initialized prediction meshes with the input image, the rectangular image with the lowest loss is selected as the final output.
Algorithm 1 RIS-DMRN algorithm
Input: 
Irregular images: I; deformable target mesh: triangle (T), rectangle (R), and regular hexagon (H); mesh selection strategy: image-based, loss-based; neural network training parameters (such as learning rate, batch size, etc.)
Output: 
Rectangle images
 1:  for image-based do:
 2:    Obtain predicted rigid structure after I input CNN
 3:    Select the optimal mesh (such as H) based on the rigid structure
 4:    Input H and I into RNN to obtain the predicted initialization mesh (PIM)
 5:    Rectangle regression based on PIM and I
 6:    if accuracy requirement not met do:
 7:      Retrain the network
 8:    end if
 9:  end for
 10:   for loss-based do:
 11:   Input I into RNN to obtain the PIMs of the three meshes
 12:   Rectangle regression based on PIM and I
 13:   Select the best rectangle image based on three losses
 14:   end for
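To make the two strategies concrete, the following is a minimal Python sketch of Algorithm 1. The helper names (predict_rigid_structure, select_mesh_shape, predict_initial_mesh, rectangle_regression) are hypothetical stand-ins for the CNN detector, the wide residual network, and the regression step described above, not the authors' actual implementation.

```python
# Minimal sketch of the two RIS-DMRN mesh-selection strategies (Algorithm 1).
# All helper functions are hypothetical stand-ins for the networks in the text.

MESH_SHAPES = ("triangle", "rectangle", "hexagon")

def rectify_image_based(image, mask):
    """Image-based strategy: pick one mesh shape from the rigid structure."""
    structure = predict_rigid_structure(image)            # simple CNN detector
    shape = select_mesh_shape(structure)                  # e.g. "hexagon"
    init_mesh = predict_initial_mesh(image, mask, shape)  # wide residual net
    rectified, _loss = rectangle_regression(image, init_mesh)
    return rectified

def rectify_loss_based(image, mask):
    """Loss-based strategy: try all three shapes, keep the lowest-loss result."""
    candidates = []
    for shape in MESH_SHAPES:
        init_mesh = predict_initial_mesh(image, mask, shape)
        rectified, loss = rectangle_regression(image, init_mesh)
        candidates.append((loss, rectified))
    return min(candidates, key=lambda c: c[0])[1]
```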

3.1. Deformable Mesh

This paper introduces deformable meshes to meet the demands of different application scenarios. Traditional rectangular meshes exhibit weaker generalization when dealing with complex scenes, so two additional mesh models are introduced: hexagonal and triangular meshes. Hexagonal meshes have more uniform relationships between adjacent pixels, with each hexagon having six neighbors of equal adjacency, which provides better spatial consistency for image rectification; during image interpolation, for example, hexagonal meshes can offer smoother and more natural transitions. Moreover, hexagons closely resemble the shapes of many objects and structures found in the natural world, such as beehives and crystal structures [39], and may therefore represent images of natural landscapes more faithfully. Triangular meshes, on the other hand, excel at realistically reconstructing the shapes in images, especially when the images contain curves and surfaces [40]. Triangular meshes adapt better to irregular image regions, enabling more flexible shape approximation and, consequently, a more accurate capture of image details.
Simultaneously, this paper proposes two operating modes for the deformable meshes: speed-oriented and quality-oriented. In the speed-oriented mode, a simple recognition network detects the rigid structures within the irregular input image, and the mesh shape that best fits the detected structure count and orientation thresholds is selected directly; available mesh shapes include triangles, rectangles, and hexagons. In the quality-oriented mode, each mesh shape undergoes residual regression prediction on the input image to generate a rectified image, and the output with the lowest rectification loss is ultimately selected.
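As an illustration of the mesh geometry, the sketch below generates uniform rectangular and hexagon-style vertex lattices over an image. The offset-row representation of the hexagonal mesh is our assumption, since the paper does not specify how the cells are parameterized.

```python
import numpy as np

def uniform_rect_mesh(width, height, nx, ny):
    """Regular rectangular mesh: an (ny+1, nx+1, 2) array of (x, y) vertices."""
    xs = np.linspace(0.0, width, nx + 1)
    ys = np.linspace(0.0, height, ny + 1)
    return np.stack(np.meshgrid(xs, ys), axis=-1)

def uniform_hex_mesh(width, height, nx, ny):
    """Hexagon-style mesh: a rectangular lattice whose odd rows are shifted
    by half a cell, a common indexing for hexagonal cells (assumption)."""
    mesh = uniform_rect_mesh(width, height, nx, ny)
    mesh[1::2, :, 0] += 0.5 * width / nx        # offset odd vertex rows
    mesh[..., 0] = np.clip(mesh[..., 0], 0.0, width)
    return mesh

# Example: the 16 x 12 mesh used on 512 x 384 DIR-D images (Section 4.3).
hex_mesh = uniform_hex_mesh(512, 384, nx=16, ny=12)
```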

3.2. Network Architecture

The network architecture proposed in RIS-DMRN consists of two components (Figure 3): a rigid target recognition network and a wide residual regression network [41]. The input comprises the irregularly stitched image and its stitching mask matrix. The input image is first processed by a simple recognition convolutional neural network to detect the quantity and orientation of rigid structures. Based on the detection results, the most suitable mesh shape is chosen for rectification. For instance, if horizontal or vertical rigid structures predominate in the image, a rectangular mesh is preferred; if curved surfaces or curved structures prevail, a triangular mesh is chosen; and if the quantities of vertical rigid structures and curved structures are comparable, a hexagonal mesh is selected.
The rigid structure detector is a convolutional neural network (Figure 4) whose input images are resized to a uniform 448 × 448. The CNN consists of six convolutional blocks, each composed of various combinations of 3 × 3 convolutional kernels, 1 × 1 convolutional kernels, and 2 × 2 max-pooling layers with a stride of 2. After the image features are extracted into a 7 × 7 feature map with 1000 channels, average pooling produces a 1000-dimensional vector, which is fed into a Softmax layer for rigid structure detection. The network also incorporates normalization and dropout operations, although these are not shown in the diagram.
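A hedged tf.keras sketch of this detector follows. The per-block channel widths and the three-way softmax head are assumptions; the text fixes only the input size, the six-block layout, and the 7 × 7 × 1000 feature map.

```python
import tensorflow as tf
from tensorflow.keras import layers

def rigid_structure_cnn(num_classes=3):
    """Sketch of the rigid-structure detector: six conv blocks of 3x3 and 1x1
    convolutions with 2x2/stride-2 max pooling (448 -> 7 after six halvings),
    average pooling over the 7x7x1000 feature map, then a softmax head.
    Channel widths are assumptions; only the final 1000 is given in the text."""
    inputs = tf.keras.Input(shape=(448, 448, 3))
    x = inputs
    for filters in (32, 64, 128, 256, 512, 1000):   # assumed widths
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 1, activation="relu")(x)
        x = layers.MaxPool2D(pool_size=2, strides=2)(x)
    x = layers.GlobalAveragePooling2D()(x)           # 7x7x1000 -> 1000-d vector
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```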
Once the mesh shape is determined, the irregularly stitched image and its mask matrix are fed into the Wide Residual Neural Network (Wide ResNet) for the rectification process. The choice of Wide ResNet as the recovery prediction network for image rectification is motivated by its ability to enhance feature dimensions in each residual block through increased channel numbers. This augmentation enables the network to capture richer feature representations, playing a crucial role in the recovery of content items after mesh transformation and minimizing the loss in rectified content. Moreover, due to significant internal variations within the meshes during the rectification process, some meshes experience a gradual decrease in gradients during the backward propagation of model training, leading to convergence challenges. The introduction of wider residual blocks in the Wide ResNet facilitates easier gradient flow, mitigating the issue of gradient vanishing during training [42,43].
The architecture of the Wide ResNet employed in RIS-DMRN consists of four residual convolutional blocks followed by an average pooling layer (Table 1). In the network, “k” represents the multiplier for the convolutional kernels in the original module, “N” indicates the number of residual modules in that layer, and “B (3,3)” signifies each residual module consisting of two 3 × 3 convolutional layers. After feature extraction through the residual network, a simple fully convolutional structure is utilized as a mesh motion regressor to predict the horizontal and vertical movements of each vertex based on the regular mesh, facilitating the output of the rectified image.
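For concreteness, a sketch of one B(3,3) wide residual block is given below. The pre-activation ordering and the 1 × 1 projection shortcut are standard Wide ResNet choices assumed here, not details given in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def wide_basic_block(x, base_filters, k, stride=1):
    """One B(3,3) block: two 3x3 convolutions whose channel count is widened
    by the factor k, with an identity or 1x1-projected shortcut. The
    pre-activation (BN -> ReLU -> conv) ordering is an assumption."""
    width = base_filters * k
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(width, 3, strides=stride, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(width, 3, padding="same")(y)
    if stride != 1 or x.shape[-1] != width:          # match shapes if needed
        shortcut = layers.Conv2D(width, 1, strides=stride)(x)
    return layers.Add()([y, shortcut])
```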

3.3. Loss Function

The loss function of the proposed RIS-DMRN consists of two components: the local loss function and the global loss function. The calculation is formulated as follows in Equation (1):
$l_{total} = \omega_{local} \, l_{local} + \omega_{global} \, l_{global}$
where $\omega_{local}$ and $\omega_{global}$ represent the weights assigned to the local loss and global loss, respectively. The local loss $l_{local}$ and global loss $l_{global}$ contribute to the overall loss, and the weights control the balance between preserving local details and maintaining global context during the rectification process.

3.3.1. Local Loss

The local loss term in RIS-DMRN consists of two components: a content loss and a mesh loss. The content loss term, given in Equation (2), compares the ground truth mesh label $T$ with the warped version of the input irregular image, $D(I_p, m)$, obtained by applying the predicted mesh $m$ to $I_p$ through the bending operation. Additionally, the content loss incorporates the same comparison in feature space, where the function $C$ denotes the “conv4” convolutional layer of the wide residual network. This formulation aims to ensure that the rectified image aligns closely with both the original irregular content and the ground truth mesh structure.
$l_{content} = \lVert T - D(I_p, m) \rVert^2 + \lVert C(T) - C(D(I_p, m)) \rVert^2$
For the mesh loss term in RIS-DMRN, the formula can be expressed as Equation (3):
$l_{mesh} = \sum_{i,j} \lVert W(I_p, m_{i,j}) - W(I_p, T_{i,j}) \rVert$
where $l_{mesh}$ represents the mesh loss term, $i$ and $j$ are indices within the mesh, $m_{i,j}$ is the mesh predicted by the model, $T_{i,j}$ is the ground truth label mesh, and $W$ is the mesh generation function. This loss term encourages the model to learn and preserve the mesh structure of the image by comparing the differences between the predicted mesh and the true label mesh.
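A minimal TensorFlow sketch of this local loss follows. Here `warp_fn` and `feat_fn` are hypothetical stand-ins for the warp operations $D$/$W$ and the conv4 feature extractor $C$, and the reductions (mean squared error, mean absolute difference) are assumptions about the norms in Equations (2) and (3).

```python
import tensorflow as tf

def local_loss(T, I_p, pred_mesh, gt_mesh, warp_fn, feat_fn):
    """Sketch of Eqs. (2)-(3). `warp_fn(img, mesh)` plays the role of D and W;
    `feat_fn` is the conv4 feature extractor C (both are assumptions)."""
    warped = warp_fn(I_p, pred_mesh)                 # D(I_p, m)
    l_content = (tf.reduce_mean(tf.square(T - warped)) +
                 tf.reduce_mean(tf.square(feat_fn(T) - feat_fn(warped))))
    l_mesh = tf.reduce_mean(tf.abs(warp_fn(I_p, pred_mesh) -
                                   warp_fn(I_p, gt_mesh)))
    return l_content + l_mesh
```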

3.3.2. Global Loss

The global loss term in RIS-DMRN consists of two components: a global structural loss term and a boundary loss term, as expressed in Equation (4):
$l_{global} = l_{ms} + l_{border}$
The global structural loss term is computed as in Equation (5), where $I_p^c$ denotes the irregular image cropped according to the mask matrix, $\mu_{I_p^c}$ and $\mu_T$ are the means of $I_p^c$ and $T$, $\sigma_{I_p^c}^2$ and $\sigma_T^2$ are their variances, $\sigma_{I_p^c T}$ is the covariance between $I_p^c$ and $T$, and $c_1$ and $c_2$ are constants used to stabilize the formula.
$l_{ms}(I_p^c, T) = \dfrac{(2 \mu_{I_p^c} \mu_T + c_1)(2 \sigma_{I_p^c T} + c_2)}{(\mu_{I_p^c}^2 + \mu_T^2 + c_1)(\sigma_{I_p^c}^2 + \sigma_T^2 + c_2)}$
The boundary loss term is given by Equation (6), where $I_m$ denotes the mask matrix of the original irregular image and $E$ is the target template, an all-ones matrix. The boundary loss drives the warped 0/1 mask of the irregularly stitched image toward the all-ones target, gradually approaching a rectangular shape.
$l_{border} = \lVert E - D(I_m, m) \rVert^2$
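The following TensorFlow sketch mirrors Equations (4)–(6). Since Equation (5) as written is an SSIM-style similarity (higher is better), the code minimizes $1 - l_{ms}$, which is our assumption about how the term enters the optimization; the stabilizers $c_1$ and $c_2$ take the usual SSIM defaults.

```python
import tensorflow as tf

def global_loss(I_pc, T, mask, pred_mesh, warp_fn, c1=1e-4, c2=9e-4):
    """Sketch of Eqs. (4)-(6). The structural term follows the SSIM form of
    Eq. (5); minimizing (1 - similarity) is an assumption. `warp_fn` is the
    bending operation D, applied here to the 0/1 mask."""
    mu_p, mu_t = tf.reduce_mean(I_pc), tf.reduce_mean(T)
    var_p = tf.reduce_mean(tf.square(I_pc - mu_p))
    var_t = tf.reduce_mean(tf.square(T - mu_t))
    cov = tf.reduce_mean((I_pc - mu_p) * (T - mu_t))
    similarity = ((2.0 * mu_p * mu_t + c1) * (2.0 * cov + c2) /
                  ((mu_p**2 + mu_t**2 + c1) * (var_p + var_t + c2)))
    warped_mask = warp_fn(mask, pred_mesh)            # D(I_m, m)
    l_border = tf.reduce_mean(tf.square(1.0 - warped_mask))  # all-ones target E
    return (1.0 - similarity) + l_border
```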

4. Results

The RIS-DMRN algorithm was implemented and evaluated on the following workstation configuration: an Intel Core i9-13900HX processor (2.2 GHz, 6 cores, 12 threads), 16 GB of DDR4 2400 MHz memory, and an NVIDIA GeForce RTX 4070 Ti graphics card (8 GB GDDR5X). The algorithm was implemented in Python 3.6 with TensorFlow 1.13.1. Owing to the limited availability of publicly accessible datasets for image rectification research, validation was conducted on the DIR-D dataset [21]. Following the setup in [21], RIS-DMRN used a batch size of 8 during training, initialized the learning rate to 1 × 10−3, and applied exponential decay to the learning rate every 50 epochs. The weights $\omega_{local}$ and $\omega_{global}$ were set to 0.7 and 0.3, respectively, to preserve detailed content while also attending to global shape changes; experimentation showed this combination to be the better choice.
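A brief TF 1.x-style sketch of this schedule follows; the decay rate, the optimizer choice (Adam), and the steps-per-epoch value are assumptions, since the paper specifies only the batch size, the initial rate, and the 50-epoch decay interval.

```python
import tensorflow as tf

# Hedged sketch of the training schedule (TensorFlow 1.13-style API).
batch_size = 8
steps_per_epoch = 5839 // batch_size           # DIR-D training samples
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    learning_rate=1e-3,                        # initial rate from the paper
    global_step=global_step,
    decay_steps=50 * steps_per_epoch,          # decay once every 50 epochs
    decay_rate=0.96,                           # assumed decay factor
    staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)   # optimizer is assumed

w_local, w_global = 0.7, 0.3                   # loss weights from the paper
```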

4.1. Quantitative Comparison of Image Rectification

The proposed algorithm was primarily tested on 300 samples selected from the DIR-D dataset, which consists of 5839 training samples and 519 test samples. Each image has a resolution of 512 × 384, and each sample is a triplet consisting of a stitched image, a mask, and a rectangular label. The image content covers most scenes of human daily life, giving the dataset good generality and practical value, and it therefore serves as the experimental dataset for this paper. A quantitative comparison was performed against mainstream rectification methods, and the results are presented in Table 2. The term “Initialization” denotes the initial state of the freshly stitched image without rectification processing. The quantification metrics are the average SSIM, PSNR, and FID values over the samples [44,45,46].
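As an illustration, the average SSIM and PSNR over the test samples can be computed as below (FID additionally requires an Inception feature extractor and is omitted); this is a generic evaluation sketch, not the authors' script.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(rectified_images, labels):
    """Average SSIM and PSNR over paired (rectified, label) uint8 images.
    channel_axis handles color images (requires scikit-image >= 0.19)."""
    ssim_vals, psnr_vals = [], []
    for pred, gt in zip(rectified_images, labels):
        ssim_vals.append(structural_similarity(gt, pred, channel_axis=-1))
        psnr_vals.append(peak_signal_noise_ratio(gt, pred))
    return float(np.mean(ssim_vals)), float(np.mean(psnr_vals))
```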
It is evident that the proposed RIS-DMRN outperforms RPIW [26] and DRIS [21] in all metrics (Table 2). Additionally, it surpasses traditional seam carving and image completion. This superiority is attributed to the configurations in the loss functions of the deep learning rectification algorithm, which includes the design of content loss and mesh loss. These designs minimize the deformation of target content within the image during rectification, resulting in a more effective image rectification. Compared with the three loss terms of boundary, content, and grid adopted by DRIS, the global structure loss term and boundary loss term in RIS-DMRN better preserve global information. Preserving global information is crucial for the algorithm to better understand the contextual relationships of objects in the image. This understanding is vital for interpreting the relative positions, sizes, and interrelationships of objects. Additionally, global information contributes to maintaining consistency between different regions of the image, ensuring that the algorithm produces coherent output throughout the entire image, especially in tasks like image rectification [47,48,49].
Simultaneously, by performing comparisons with the method based on the image’s minimum energy line and rectangular mesh division adopted by RPIW, it can be observed that using a hexagon as the initial mesh shape yields slightly better results than traditional rectangular meshes (Table 2). This improvement is attributed to hexagons having more rigid directional choices and better shape adjacency relationships. While hexagons may increase the computational time to some extent compared to rectangles, they often produce superior results.
Consistent with the settings in [21], in this study, we also acknowledge that there may be differences in quantitative measurements when objects undergo slight positional variations in the generated rectangular results. Although the visual perception may still appear very natural in such cases, it could weaken the persuasiveness of quantitative experiments. Therefore, in this study, we also incorporate BIQUE [50] and NIQE [51] as “no-reference” evaluation metrics (Table 3). These two evaluation methods are no-reference image quality assessment metrics dedicated to quantifying the quality of images without the need for any additional reference data [52,53]. It is noteworthy that RIS-DMRN produces higher quality results under these blind image quality evaluation metrics. This indicates that the proposed method not only excels in preserving global information but also achieves significant improvements in overall image quality.

4.2. Qualitative Comparison of Image Rectification

To visually demonstrate the effectiveness of RIS-DMRN in image rectification, we divided the test set into two parts—one with more global contextual information and the other with more local detailed information. The algorithm was tested on both sets, and the results were compared qualitatively (Figure 5). Specifically, the study showcases the effects of different input irregular images, image completion results, RIS-DMRN processed images, and ground truth label images in scenes where global correlations are more prominent, such as natural landscapes.
It is evident that image completion can fully rectify the image into a rectangle (Figure 5), but it relies heavily on pixel-level adjustments based on context, making it overly dependent on surrounding information. This dependency may lead to the inaccurate filling of missing parts, resulting in generated images that appear unrealistic or unnatural, and may even cause a certain degree of decrease in image clarity [36,54]. In contrast, the rectangular images generated by RIS-DMRN closely resemble the real label images. RIS-DMRN performs well in maintaining the global rigidity or curvature of target objects in the image, attempting to preserve the original appearance without introducing local barrel or pincushion distortions.
In scenarios with dense local information, we compare the results of image completion and RIS-DMRN processing when dealing with irregular images (Figure 6). It is observed that image completion methods often lead to deformations in local rigid structures during the rectification process. Moreover, when addressing the boundaries of missing regions, noticeable boundary effects are common, as image completion methods need to ensure smooth transitions between the filled area and the surrounding region, leading to prominent boundary artifacts [55,56,57]. In addition, this article also presents other qualitative comparison results, as shown in Appendix A Figure A1 and Figure A2.
RIS-DMRN, with its finer mesh design, can control the loss and deformation of local information within a certain range. This ensures that local shape changes do not excessively impact the deformation of adjacent meshes, guaranteeing the preservation of local information during the rectification process. Additionally, constraints in different mesh directions in RIS-DMRN allow it to adapt to various deformations of rigid and curved structures, enabling a better fit to the original shape of the image during the deformation process.

4.3. Impact of Deformable Mesh and Loss Functions on Image Rectification

In accordance with practical application requirements, we designed three types of deformable meshes—triangle, rectangle, and regular hexagon—for predicting the rectification of irregular images. For the input size of the dataset at 512 × 384, a uniform mesh resolution of 16 × 12 was employed for rectification prediction. In terms of loss functions, both local and global loss terms were designed for regression prediction. Taking a random selection of 300 images from the test set of the DIR-D dataset as an example, we tested the quantitative metrics for image rectification under different method combinations (Table 4). In the table, “𝒲” indicates the inclusion of the current mesh shape or loss function in the combination.
When the rectangular mesh is combined with only one of the two loss terms, the overall rectification regression performance is poorer, and the absence of the local loss term has the more significant impact. This is because the global loss term introduces substantial stretching and bending during regression, while the local loss term predicts the regression of rigid or curved structures based on the actual content within each mesh cell. If only the global loss term is used, local structures are not adequately repaired and may be severely deformed after rectification.
The figures provide a more intuitive sense of the image loss resulting from different combinations, offering a direct visual representation of the changes in evaluation metrics under various combinations (Figure 7). In terms of the overall rectification performance with deformable meshes, it can be observed that the hexagon performs the best, followed by the rectangular mesh, while the triangular mesh exhibits the poorest rectification effect. This is because the hexagonal mesh has better adaptability compared to other meshes; it can effectively cover and adapt to various irregular contours, thereby enhancing the performance of the regression model [58,59].
Simultaneously, the hexagonal mesh can more compactly cover the image area, reducing redundancy. It can decrease edge effects when handling image boundaries, reducing the likelihood of extensive deformation at the image edges and thereby improving representation efficiency [60]. In contrast, rectangular meshes may require more mesh points to represent the same image area, resulting in larger input dimensions [61]. Although triangular meshes, compared to rectangular meshes, can better adapt to irregular shapes and exhibit greater flexibility in handling complex image structures [62,63], the special interpolation between triangular meshes and the relationships between neighboring triangles may result in the need for more mesh points to represent the same image area, thus increasing redundancy [64]. Additionally, when dealing with image boundaries, triangular meshes may encounter issues of discontinuity or lack of smoothness at the borders.

5. Conclusions and Future Work

This paper proposes a method for rectifying irregularly stitched images using deformable meshes and residual networks. The approach predicts initial mesh models for irregular images using three shapes: triangles, rectangles, and regular hexagons. The mesh can be selected dynamically based on the predicted rigid structures or the actual image content. The predicted initial mesh, the predefined mesh model, and the irregular input image are jointly fed into a wide residual network for rectification regression. The loss function comprises local and global terms, ensuring that the loss of image information within the mesh and of global contextual information is minimized. The final output rectifies the irregularly stitched image into a rectangular image. The generated rectangular image not only reduces information loss from image deformation but also adapts to different input image structures, further improving the rectangularization effect. Image rectangularization offers significant practical advantages for image stitching, including a simpler processing flow, greater accuracy and stability, and more reliable stitching results, enabling wide application in fields such as geographic information systems and medical image processing.
There is still room to improve and optimize the algorithms in this paper. When constructing non-rectangular grids such as triangles or hexagons, the more complex calculations require further optimization; for example, computing the relative positions or distances between neighboring cells may involve more complex geometric operations. Compared with rectangular grids, triangular and hexagonal grids have more complex vertex coordinates and connectivity, and they typically require more storage to represent the same regions. In addition, applying deep neural network structures to non-rectangular meshes may require more parameters and more complex processing layers, which can slow training and inference. In future work, we will continue to investigate and improve the efficiency of these aspects.

Author Contributions

Conceptualization, Y.F. and S.M.; Data curation, Y.F.; Formal analysis, Z.W. and B.L.; Funding acquisition, S.M.; Methodology, Y.F.; Project administration, S.M.; Resources, Y.F.; Software, Y.F.; Supervision, S.M.; Validation, Y.F., M.L. and J.K.; Visualization, S.M.; Writing—original draft, Y.F.; Writing—review and editing, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program for the 14th Five-Year Plan (Prevention and Control of Major Natural Disasters and Public Security, Shanjun Mao et al.), grant number SQ2022YFC3000083.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to express our gratitude to many colleagues at Beijing LongRuan Technology Co., Ltd., for their extensive assistance in providing data and hardware support for this experiment.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Randomly selected stitched images from the DIR-D test set for testing the RIS-DMRN.
Figure A2. Randomly selected stitched images from the DIR-D training set for testing the RIS-DMRN.

References

  1. Bai, Z.W.; Li, Y.; Chen, X.H.; Yi, T.T.; Wei, W.; Wozniak, M.; Damasevicius, R. Real-Time Video Stitching for Mine Surveillance Using a Hybrid Image Registration Method. Electronics 2020, 9, 1336. [Google Scholar] [CrossRef]
  2. Lo, I.C.; Shih, K.T.; Chen, H.H. Efficient and Accurate Stitching for 360° Dual-Fisheye Images and Videos. IEEE Trans. Image Process. 2022, 31, 251–262. [Google Scholar] [CrossRef]
  3. Yi, S.; Jiang, G.; Liu, X.; Li, J.; Chen, L. TCPMFNet: An infrared and visible image fusion network with composite auto encoder and transformer-convolutional parallel mixed fusion strategy. Infrared Phys. Technol. 2022, 127, 104405. [Google Scholar] [CrossRef]
  4. Gao, H.; Huang, Z.Q.; Yang, H.P.; Zhang, X.B.; Cen, C. Research on Improved Multi-Channel Image Stitching Technology Based on Fast Algorithms. Electronics 2023, 12, 1700. [Google Scholar] [CrossRef]
  5. Li, Y.; Liu, G.; Bavirisetti, D.P.; Gu, X.; Zhou, X. Infrared-visible image fusion method based on sparse and prior joint saliency detection and LatLRR-FPDE. Digit. Signal Process. 2023, 134, 103910. [Google Scholar] [CrossRef]
  6. Madhusudana, P.C.; Soundararajan, R. Subjective and Objective Quality Assessment of Stitched Images for Virtual Reality. IEEE Trans. Image Process. 2019, 28, 5620–5635. [Google Scholar] [CrossRef] [PubMed]
  7. Wang, L.; Yu, W.; Li, B. Multi-scenes Image Stitching Based on Autonomous Driving. In Proceedings of the 4th IEEE Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; pp. 694–698. [Google Scholar]
  8. Kinzig, C.; Cortés, I.; Fernández, C.; Lauer, M. Real-time Seamless Image Stitching in Autonomous Driving. In Proceedings of the 25th International Conference of Information Fusion (FUSION), Linkoping, Sweden, 4–7 July 2022. [Google Scholar]
  9. Liao, T.L.; Li, N. Single-Perspective Warps in Natural Image Stitching. IEEE Trans. Image Process. 2020, 29, 724–735. [Google Scholar] [CrossRef] [PubMed]
  10. Fu, Y.; Aldrich, C. Flotation froth image recognition with convolutional neural networks. Miner. Eng. 2019, 132, 183–190. [Google Scholar] [CrossRef]
  11. Shi, J.Y.; Dang, J.; Zuo, R.Z. Bridge damage cropping-and-stitching segmentation using fully convolutional network based on images from UAVs. In Proceedings of the 10th International Conference on Bridge Maintenance, Safety and Management (IABMAS), Sapporo, Japan, 11–18 April 2021; pp. 264–270. [Google Scholar]
  12. Zhang, Y.; Li, X.; Li, X. Reinforcement learning cropping method based on comprehensive feature and aesthetics assessment. IET Image Process. 2022, 16, 1415–1423. [Google Scholar] [CrossRef]
  13. Cui, J.G.; Liu, M.; Zhang, Z.T.; Yang, S.Q.; Ning, J.F. Robust UAV Thermal Infrared Remote Sensing Images Stitching via Overlap-Prior-Based Global Similarity Prior Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 270–282. [Google Scholar] [CrossRef]
  14. Li, J.; Cheng, J.; Zhang, Y.; Jia, B.; Zou, H.; Zhang, Z.; Xu, J. Underwater laser positioning of targets outside the field of view based on a binocular vision. Appl. Opt. 2023, 62, 7354–7361. [Google Scholar]
  15. Li, Z.; Xu, Z.Q.J.; Luo, T.; Wang, H. A regularised deep matrix factorised model of matrix completion for image restoration. IET Image Process. 2022, 16, 3212–3224. [Google Scholar] [CrossRef]
  16. Sari, I.N.; Du, W.W. Structure-Texture Consistent Painting Completion for Artworks. IEEE Access 2023, 11, 27369–27381. [Google Scholar] [CrossRef]
  17. Xu, S.Z.; Zhu, Q.; Wang, J. Generative image completion with image-to-image translation. Neural Comput. Appl. 2020, 32, 7333–7345. [Google Scholar] [CrossRef]
  18. Kapoor, N.; Lynch, E.A.; Lacson, R.; Flash, M.J.E.; Guenette, J.P.; Desai, S.P.; Eappen, S.; Khorasani, R. Predictors of Completion of Clinically Necessary Radiologist Recommended Follow-Up Imaging: Assessment Using an Automated Closed-Loop Communication and Tracking Tool. Am. J. Roentgenol. 2023, 220, 429–440. [Google Scholar] [CrossRef]
  19. Xu, J.W.; Li, F.; Shao, C.C.; Li, X.L. Face Completion Based on Symmetry Awareness with Conditional GAN. Symmetry 2023, 15, 663. [Google Scholar] [CrossRef]
  20. Wang, H.T.; Guo, E.T.; Chen, F.; Chen, P.P. Depth Completion in Autonomous Driving: Adaptive Spatial Feature Fusion and Semi-Quantitative Visualization. Appl. Sci. 2023, 13, 9804. [Google Scholar] [CrossRef]
  21. Nie, L.; Lin, C.Y.; Liao, K.; Liu, S.C.; Zhao, Y. Deep Rectangling for Image Stitching: A Learning Baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5730–5738. [Google Scholar]
  22. Zhu, S.D.; Zhang, Y.Z.; Zhang, J.; Hu, H.; Zhang, Y.Z. ISGTA: An effective approach for multi-image stitching based on gradual transformation matrix. Signal Image Video Process. 2023, 17, 3811–3820. [Google Scholar] [CrossRef]
  23. Liu, S.G.; Chai, Q.P. Shape-Optimizing and Illumination-Smoothing Image Stitching. IEEE Trans. Multimed. 2019, 21, 690–703. [Google Scholar] [CrossRef]
  24. Dou, Z.; Ma, X.; Xie, X.; Liu, H.; Guo, C. A hybrid method of detecting flame from video stream. IET Image Process. 2022, 16, 2937–2946. [Google Scholar] [CrossRef]
  25. Wang, J.L.; Ma, M.X.; Yan, S.; Zhang, J. Image stitching using double features-based global similarity constraint and improved seam-cutting. Multimed. Tools Appl. 2023, 83, 7363–7378. [Google Scholar] [CrossRef]
  26. He, K.M.; Chang, H.W.; Sun, J. Rectangling Panoramic Images via Warping. ACM Trans. Graph. 2013, 32, 1–10. [Google Scholar] [CrossRef]
  27. Li, D.P.; He, K.M.; Sun, J.; Zhou, K. Geodesic-Preserving Method for Image Warping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 213–221. [Google Scholar]
  28. Arai, K. Modified Seam Carving by Changing Resizing Depending on the Object Size in Time and Space Domains. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 143–150. [Google Scholar] [CrossRef]
  29. Patel, D.; Shanmuganathan, S.; Raman, S. Adaptive Multiple-pixel Wide Seam Carving. In Proceedings of the 25th National Conference on Communications (NCC), Bangalore, India, 20–23 February 2019. [Google Scholar]
  30. Wei, J.-D.; Cheng, H.-J.; Chang, C.-W. Hopfield network-based approach to detect seam-carved images and identify tampered regions. Neural Comput. Appl. 2019, 31, 6479–6492. [Google Scholar] [CrossRef]
  31. Nam, S.-H.; Ahn, W.; Yu, I.-J.; Kwon, M.-J.; Son, M.; Lee, H.-K. Deep Convolutional Neural Network for Identifying Seam-Carving Forgery. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3308–3326. [Google Scholar] [CrossRef]
  32. Xia, M.; Chen, J.; Yang, G.; Wang, S. Robust detection of seam carving with low ratio via pixel adjacency subtraction and CNN-based transfer learning. J. Inf. Secur. Appl. 2023, 75, 103522. [Google Scholar] [CrossRef]
  33. Huang, Z.; Lin, J.; Yang, H.; Wang, H.; Bai, T.; Liu, Q.; Pang, Y. An Algorithm Based on Text Position Correction and Encoder-Decoder Network for Text Recognition in the Scene Image of Visual Sensors. Sensors 2020, 20, 2942. [Google Scholar] [CrossRef] [PubMed]
  34. Zheng, Y.; Li, S. Two-stage Parallax Correction and Multi-stage Cross-view Fusion Network Based Stereo Image Super-Resolution. In Proceedings of the IEEE International Conference on Visual Communications and Image Processing (VCIP)-Visual Communications in the Era of AI and Limited Resources, Munich, Germany, 5–8 December 2021. [Google Scholar]
  35. Lee, Y.; Lee, J.; Ahn, H.; Jeon, M. SNIDER: Single Noisy Image Denoising and Rectification for Improving License Plate Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1017–1026. [Google Scholar]
  36. Talat, R.; Muzammal, M.; Shan, R. A decentralised approach to scene completion using distributed feature hashgram. Multimed. Tools Appl. 2020, 79, 9799–9817. [Google Scholar] [CrossRef]
  37. Kim, B.; Lee, D.; Min, K.; Chong, J.; Joe, I. Global Convolutional Neural Networks with Self-Attention for Fisheye Image Rectification. IEEE Access 2022, 10, 129580–129587. [Google Scholar] [CrossRef]
  38. Wang, Z.H.; Tang, Z.J.; Huang, J.K.; Li, J.D. A real-time correction and stitching algorithm for underwater fisheye images. Signal Image Video Process. 2022, 16, 1783–1791. [Google Scholar] [CrossRef]
  39. Yao, X.C.; Yu, G.J.; Li, G.Q.; Yan, S.; Zhao, L.; Zhu, D.H. HexTile: A Hexagonal DGGS-Based Map Tile Algorithm for Visualizing Big Remote Sensing Data in Spark. ISPRS Int. J. Geo-Inf. 2023, 12, 89. [Google Scholar] [CrossRef]
  40. Zeng, W.B.; Deng, Q.Y.; Zhao, X.Y.; Li, D.H.; Min, X.R. A method for stitching remote sensing images with Delaunay triangle feature constraints. Geocarto Int. 2023, 38, 2285356. [Google Scholar] [CrossRef]
  41. Li, Y.Q.; Luo, T.; Yip, N.K. Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH). Csiam Trans. Appl. Math. 2022, 3, 692–760. [Google Scholar] [CrossRef]
  42. Shi, J.; Li, Z.; Ying, S.H.; Wang, C.F.; Liu, Q.P.; Zhang, Q.; Yan, P.K. MR Image Super-Resolution via Wide Residual Networks with Fixed Skip Connection. IEEE J. Biomed. Health Inform. 2019, 23, 1129–1140. [Google Scholar] [CrossRef] [PubMed]
  43. Chen, Z.Y.; Wang, Y.H.; Wu, J.; Deng, C.; Jiang, W.X. Wide Residual Relation Network-Based Intelligent Fault Diagnosis of Rotating Machines with Small Samples. Sensors 2022, 22, 4161. [Google Scholar] [CrossRef] [PubMed]
  44. Erfurt, J.; Helmrich, C.R.; Bosse, S.; Schwarz, H.; Marpe, D.; Wiegand, T. A study of the perceptually weighted peak signal-to-noise ratio (WPSNR) for image compression. In Proceedings of the 26th IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2339–2343. [Google Scholar]
  45. Wang, Z.B.; Yang, Z.K. Seam elimination based on Curvelet for image stitching. Soft Comput. 2019, 23, 5065–5080. [Google Scholar] [CrossRef]
  46. Setiadi, D.R.I.M. PSNR vs. SSIM: Imperceptibility quality assessment for image steganography. Multimed. Tools Appl. 2021, 80, 8423–8444. [Google Scholar] [CrossRef]
  47. Li, T.; Huang, J.; Zhang, Y.; Liu, S. A Geometric Distortion Rectification Method for Wide Field of View Camera. In Proceedings of the International Conference on Electronical, Mechanical and Materials Engineering (ICE2ME), Wuhan, China, 20–21 January 2019; pp. 27–30. [Google Scholar]
  48. Liao, Q.; Lin, Q.; Jin, L.; Luo, C.; Zhang, J.; Peng, D.; Wang, T. A Multi-level Progressive Rectification Mechanism for Irregular Scene Text Recognition. In Proceedings of the 16th IAPR International Conference on Document Analysis and Recognition (ICDAR), Lausanne, Switzerland, 5–10 September 2021; pp. 140–155. [Google Scholar]
  49. Zhang, Y.; Wang, T.; Zheng, T.J.; Zhang, Y.S.; Li, L.; Yu, Y.; Li, L. On-Orbit Geometric Calibration and Performance Validation of the GaoFen-14 Stereo Mapping Satellite. Remote Sens. 2023, 15, 4256. [Google Scholar] [CrossRef]
  50. Venkatanath, N.; Praneeth, D.; Bh, M.C.; Channappayya, S.S.; Medasani, S.S. Blind image quality evaluation using perception based features. In Proceedings of the 21st National Conference on Communications (NCC), Bombay, India, 27 February–1 March 2015. [Google Scholar]
  51. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “Completely Blind” Image Quality Analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
  52. Yelmanov, S.; Romanyshyn, Y. Quantifying the contrast of objects in a complex image. In Proceedings of the IEEE 40th International Conference on Electronics and Nanotechnology (ELNANO), Kyiv, Ukraine, 22–24 April 2020; pp. 541–546. [Google Scholar]
  53. Wu, J.; Wang, F.-X.; Chen, W.; Yin, Z.-Q.; Wang, S.; Wang, Z.-G.; Lan, S.-H.; Han, Z.-F. General temporal ghost imaging model with detection resolution and noise. Appl. Opt. 2023, 62, 1175–1182. [Google Scholar] [CrossRef]
  54. Ma, X.; Li, Y.; Yao, T. Image Completion Based on Edge Prediction and Improved Generator. Teh. Vjesn.-Tech. Gaz. 2021, 28, 1590–1596. [Google Scholar] [CrossRef]
  55. Zhao, P.; Hu, Q.; Tang, Z.; Ai, M. A Smooth Transition Algorithm for Adjacent Panoramic Viewpoints Using Matched Delaunay Triangular Patches. ISPRS Int. J. Geo-Inf. 2020, 9, 596. [Google Scholar] [CrossRef]
  56. Zhao, D.; Li, J.; Tan, F.; Zeng, C.; Ji, Y. Remote Sensing Image Mosaic Based on Distribution Measure and Saliency Information. Laser Optoelectron. Prog. 2022, 59, 0410007. [Google Scholar] [CrossRef]
  57. Wang, Z.; Shen, H.; Huang, K. Taming Vector-Wise Quantization for Wide-Range Image Blending with Smooth Transition. In Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation-New Methods and Practice (McGE), Ottawa, ON, Canada, 29 October 2023; pp. 67–74. [Google Scholar]
  58. Stulpinas, R.; Morkunas, M.; Rasmusson, A.; Drachneris, J.; Augulis, R.; Gulla, A.; Strupas, K.; Laurinavicius, A. Improving HCC Prognostic Models after Liver Resection by AI-Extracted Tissue Fiber Framework Analytics. Cancers 2024, 16, 106. [Google Scholar] [CrossRef]
  59. Tang, C.; Wu, D.T.; Quek, S.S. Isotropic discretization methods of Laplacian and generalized divergence operators in phase field models. Comput. Mater. Sci. 2024, 233, 112688. [Google Scholar] [CrossRef]
  60. Cedillo-Servin, G.; Dahri, O.; Meneses, J.; van Duijn, J.; Moon, H.; Sage, F.; Silva, J.; Pereira, A.; Magalhaes, F.D.; Malda, J.; et al. 3D Printed Magneto-Active Microfiber Scaffolds for Remote Stimulation and Guided Organization of 3D In Vitro Skeletal Muscle Models. Small 2023, 20, e2307178. [Google Scholar] [CrossRef] [PubMed]
  61. Herzberg, W.; Hauptmann, A.; Hamilton, S.J. Domain independent post-processing with graph U-nets: Applications to electrical impedance tomographic imaging. Physiol. Meas. 2023, 44, 125008. [Google Scholar] [CrossRef] [PubMed]
  62. N’Guyen, F.; Kanit, T.; Maisonneuve, F.; Imad, A. Efficient boundary surface reconstruction from multi-label volumetric data with mathematical morphology. Comput. Graph. 2023, 117, 192–208. [Google Scholar] [CrossRef]
  63. Levy, T.; May, G. Conservative solution transfer between anisotropic meshes for time-accurate hybridized discontinuous Galerkin methods. Int. J. Numer. Methods Fluids 2024, 1–20. [Google Scholar] [CrossRef]
  64. Lemeunier, C.; Denis, F.; Lavoue, G.; Dupont, F. SpecTrHuMS: Spectral transformer for human mesh sequence learning. Comput. Graph. 2023, 115, 191–203. [Google Scholar] [CrossRef]
Figure 1. Comparison of rectification methods for irregular image stitching.
Figure 2. Image rectification with different mesh shape selection modes. (a) Selecting a mesh shape based on the image. Predicting the most suitable mesh shape for an input image through a convolutional neural network. Subsequently, the predicted initial mesh based on this shape and the predefined mesh are jointly input into a residual network for image rectification. (b) Selecting all mesh shapes for loss comparisons. Applying all mesh shapes to the input image and jointly inputting them into a residual network for rectification. The rectified image with the lowest loss is selected as the final output.
Figure 3. Schematic structure of the wide residual neural network.
Figure 4. Neural networks for target detection in rigid structures.
Figure 5. Rectification effect of different irregular images under images with more global correlation information. “Reference” in the figure represents the experimental result of image completion.
Figure 6. Rectification effect of different irregular images under images with more local structural information.
Figure 7. Trends of evaluation indexes under different combinations of loss functions and mesh shapes. The chart displays from left to right, with the vertical axis indicating SSIM, PSNR, and FID values, respectively, while the horizontal axis represents different combinations of mesh shapes and loss functions. Higher SSIM and PSNR values indicate better image rectification effects, and a lower FID value suggests superior image rectification performance.
Table 1. Schematic structure of the wide residual neural network. In the table, “k” represents the multiplier for the convolutional kernels in the original module, “N” indicates the number of residual modules in that layer, and “B (3,3)” signifies that each residual module consists of two 3 × 3 convolutional layers. The image is processed through a mean pooling layer, resulting in the final output.
Type    | Block Type = B(3,3)                | Output
conv1   | [3 × 3, 16]                        | 32 × 32
conv2   | [3 × 3, 16 × k; 3 × 3, 16 × k] × N | 32 × 32
conv3   | [3 × 3, 16 × k; 3 × 3, 16 × k] × N | 16 × 16
conv4   | [3 × 3, 16 × k; 3 × 3, 16 × k] × N | 8 × 8
avgpool | [8 × 8]                            | 1 × 1
Table 2. Quantitative comparison of image rectification on DIR-D. The Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), Mean-Square Error (MSE), and Fréchet Inception Distance (FID) are employed to assess image quality from different perspectives. SSIM measures the structural similarity between two images, considering brightness, contrast, and structure; its values range from −1 to 1, with 1 indicating identical images. PSNR compares original and processed images by measuring signal-to-noise strength, and higher PSNR values in decibels (dB) indicate better image quality. MSE evaluates the similarity between images by calculating pixel-wise differences, with lower values indicating more similar images. FID primarily assesses the dissimilarity between generated and real images in terms of distribution; lower FID values indicate greater similarity in latent space. An upward arrow in the table indicates that a larger value means higher image quality, and vice versa.
Method     | SSIM ↑ | PSNR ↑ | MSE ↓   | FID ↓
Reference  | 0.3354 | 11.42  | 3180.97 | 43.57
RPIW [26]  | 0.3805 | 15.03  | 2893.10 | 37.51
DRIS [21]  | 0.7173 | 21.57  | 1796.83 | 21.26
Ours       | 0.7234 | 22.65  | 1512.74 | 20.05
Table 3. Quantitative comparison of non-referenced assessment indicators. Blind Image Quality Evaluator (BIQUE) and Natural Image Quality Evaluator (NIQE) are “no-reference” metrics designed to assess image quality without an original reference image. BIQUE estimates image quality by considering local contrast, structural information, and global color and brightness variations. NIQE focuses on natural images, assessing quality through the analysis of statistical features such as gradients, luminance, and color distribution.
Method     | BIQUE ↓ | NIQE ↓
RPIW [26]  | 14.045  | 16.927
DRIS [21]  | 13.796  | 16.421
Label      | 11.017  | 14.763
Ours       | 13.562  | 16.027
Table 4. Influence of different loss functions and mesh shapes on image rectification. SSIM, PSNR, and FID are utilized as components in various combinations. The symbol “𝒲” denotes the inclusion of the current grid shape or loss function in the combination. The “Model” column represents different combinations, where L stands for “Localized loss”, G for “Global loss”, T for “Triangle”, R for “Rectangle”, and H for “Hexagon”, corresponding to the combination methods illustrated in Figure 7. An upward arrow in the table indicates that a larger value means higher image quality, and vice versa.
Localized Loss | Global Loss | Triangle | Rectangle | Hexagon | Model     | SSIM ↑ | PSNR ↑ | FID ↓
               | 𝒲           |          | 𝒲         |         | R + G     | 0.4753 | 15.12  | 74.68
𝒲              |             |          | 𝒲         |         | R + L     | 0.6169 | 18.96  | 24.70
𝒲              | 𝒲           | 𝒲        |           |         | T + G + L | 0.7071 | 20.46  | 22.02
𝒲              | 𝒲           |          | 𝒲         |         | R + G + L | 0.7126 | 21.05  | 21.74
𝒲              | 𝒲           |          |           | 𝒲       | H + G + L | 0.7203 | 21.97  | 20.68
