Real-Time Panoramic Surveillance Video Stitching Method for Complex Industrial Environments
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Overall System Workflow
- For the first frame, image registration and the search for the optimal seam line are performed, and the registration parameters and optimal seam line are saved as a template for use in subsequent frame processing.
- For non-first frames, the previously saved registration parameters and optimal seam line template are directly applied to stitch the images, resulting in registered images.
- If a moving object is detected crossing the seam line, the seam line is recalculated and updated to avoid parallax artifacts in the final panoramic image.
- If no moving objects are detected, the images are directly processed using weighted average fusion to eliminate discontinuities and artifacts in overlapping regions.
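
The workflow above can be sketched as a small control-flow skeleton. This is only an illustration under stated assumptions: the five callables (`estimate`, `warp`, `find_seam`, `crosses_seam`, `blend_on_seam`) are hypothetical stand-ins for the registration network, the warping step, the seam search, the motion detector, and a seam-based blend, and the linear horizontal ramp in `weighted_average_fusion` is one common form of weighted averaging rather than the exact weights used in the paper. Blending along the updated seam when motion is detected is likewise an assumption.

```python
import numpy as np


def weighted_average_fusion(left, right):
    """Blend two registered, equally sized overlap crops with a linear ramp."""
    left = np.asarray(left, dtype=np.float32)
    right = np.asarray(right, dtype=np.float32)
    width = left.shape[1]
    # Weight of the left image falls linearly from 1 to 0 across the overlap.
    alpha = np.linspace(1.0, 0.0, width, dtype=np.float32).reshape(1, width)
    if left.ndim == 3:
        alpha = alpha[..., None]  # broadcast over colour channels
    return alpha * left + (1.0 - alpha) * right


class TemplateStitcher:
    """Control flow only; the five callables are hypothetical stand-ins."""

    def __init__(self, estimate, warp, find_seam, crosses_seam, blend_on_seam):
        self.estimate = estimate            # (a, b) -> registration parameters
        self.warp = warp                    # (a, b, params) -> (reg_a, reg_b)
        self.find_seam = find_seam          # (reg_a, reg_b) -> seam
        self.crosses_seam = crosses_seam    # (reg_a, reg_b, seam) -> bool
        self.blend_on_seam = blend_on_seam  # (reg_a, reg_b, seam) -> image
        self.params = None                  # cached registration template
        self.seam = None                    # cached seam template

    def process(self, frame_a, frame_b):
        if self.params is None:
            # First frame: estimate the registration once and cache it.
            self.params = self.estimate(frame_a, frame_b)
        # Every frame: reuse the cached parameters to register the pair.
        reg_a, reg_b = self.warp(frame_a, frame_b, self.params)
        if self.seam is None:
            # First frame: search the seam once and cache it.
            self.seam = self.find_seam(reg_a, reg_b)
        if self.crosses_seam(reg_a, reg_b, self.seam):
            # A moving object crosses the seam: recompute and update it.
            self.seam = self.find_seam(reg_a, reg_b)
            return self.blend_on_seam(reg_a, reg_b, self.seam)
        # No motion on the seam: plain weighted average fusion.
        return weighted_average_fusion(reg_a, reg_b)
```

The point of the design is that the expensive steps (registration and seam search) run once and are repeated only when the motion check fires, which is what keeps the per-frame cost low.
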
3.2. Attention-Enhanced Unsupervised Homography Estimation for Image Registration
3.2.1. Network Architecture for Unsupervised Homography Estimation
- The two input images are first resized to a common fixed resolution to standardize the spatial resolution for subsequent processing.
- Both images are then fed into a Siamese (shared-weight) feature extractor, which produces a two-level feature pyramid consisting of 1/8-resolution and 1/16-resolution feature maps.
- At the coarse scale, the 1/16-resolution reference and target feature maps are passed to the global correlation layer, which constructs a dense correlation volume by computing cross-correlations between local feature blocks. This correlation volume is flattened and fed to the regression network, which predicts eight corner displacements (the x and y offsets of the four image corners); the TensorDLT layer converts them into the initial homography matrix.
- The Transformer-based warping module uses the initial homography to warp the 1/8-resolution target feature map, producing a warped target feature map.
- At the fine scale, the 1/8-resolution reference feature map and the warped target feature map are re-fed into the global correlation layer, and the same correlation–regression–TensorDLT pipeline is applied again to obtain the refined homography matrix.
- Finally, the Transformer-based warping module applies the refined homography to the original reference and target images, yielding the final geometrically registered image pair.
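
The step that turns eight corner displacements into a homography can be illustrated with a direct linear transform (DLT) solve. The sketch below is a plain NumPy analogue, not the differentiable TensorDLT layer itself; the function names and the corner ordering (top-left, top-right, bottom-left, bottom-right) are assumptions made for illustration.

```python
import numpy as np


def image_corners(width, height):
    """Four image corners as (x, y): top-left, top-right, bottom-left, bottom-right."""
    return np.array(
        [[0, 0], [width - 1, 0], [0, height - 1], [width - 1, height - 1]],
        dtype=np.float64,
    )


def homography_from_corner_offsets(width, height, offsets):
    """Solve the 8x8 DLT system mapping each corner to corner + offset."""
    src = image_corners(width, height)
    dst = src + np.asarray(offsets, dtype=np.float64).reshape(4, 2)
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), and similarly for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(h, 1.0).reshape(3, 3)  # fix the last entry to 1


def apply_homography(H, points):
    """Apply a 3x3 homography to an (N, 2) array of (x, y) points."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With all offsets equal to zero the solve returns the identity, and a constant offset applied to all four corners yields a pure translation, which is a quick sanity check for any implementation of this step.
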
3.2.2. Siamese-Based Feature Extraction Layer
3.2.3. Loss Function
- (1) Similarity Loss. The similarity loss is designed to minimize the photometric difference between the aligned images in the overlapping region. Given the reference and target images and the warping operation defined by the estimated deformation field, let M denote a binary mask with the same spatial size as the images, where M(p) = 1 if pixel p lies in the valid overlapping region and M(p) = 0 otherwise. Instead of using a complex cross-correlation measure, we adopt a standard element-wise distance, restricted by M to the overlapping region, to enforce pixel-wise consistency.
- (2) Smoothness Loss. To prevent structural distortion and mesh folding in non-overlapping regions, we introduce a geometric smoothness loss on the estimated deformation field. Following the grid-based regularization strategy commonly used in recent registration networks [14], this term penalizes abrupt changes in mesh edge directions and spacing so that neighbouring grids deform coherently without fold-overs. We explicitly formulate it as the sum of an inter-grid term and an intra-grid term.
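
The masked similarity loss described above can be illustrated with a few lines of NumPy. This is a sketch under assumptions: the type of element-wise distance is not recoverable from the text, so the exponent `p` defaults to 1 (absolute difference), and normalising by the number of valid mask entries is a choice made here for illustration. The function name and arguments are hypothetical.

```python
import numpy as np


def masked_similarity_loss(reference, warped_target, mask, p=1):
    """Mean element-wise |difference|**p over pixels where mask == 1."""
    ref = np.asarray(reference, dtype=np.float64)
    tgt = np.asarray(warped_target, dtype=np.float64)
    m = np.asarray(mask, dtype=np.float64)
    if ref.ndim == 3 and m.ndim == 2:
        m = m[..., None]  # share one mask across colour channels
    # Pixels outside the overlap are zeroed out by the mask.
    diff = (np.abs(ref - tgt) ** p) * m
    # Assumption: normalise by the count of valid entries (at least 1).
    n_valid = np.broadcast_to(m, ref.shape).sum()
    return float(diff.sum() / max(n_valid, 1.0))
```

Masking matters because the warped target contains empty regions outside the overlap; without the mask, those regions would dominate the photometric error.
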
3.3. Motion-Aware Graph Cut Fusion for Optimal Seam Selection
3.3.1. Principle of Graph Cut–Based Seam Search
3.3.2. Energy Function Formulation
- smoothness term
- gradient term
- data-domain constraint
- motion-region constraint
- (1) Data-domain constraint
- (2) Smoothness term
- The smoothness term guides the seam through visually homogeneous regions by penalizing label discontinuities between neighbouring pixels. The penalty is based on a color-difference measure between the two input images at each pixel location, computed from the intensity functions of the first and second images. Larger appearance differences at a pair of neighbouring pixels therefore incur higher penalties when the two pixels receive different labels, which encourages the seam to pass through visually smooth regions.
- (3) Gradient term
- The gradient term is intended to distinguish foreground objects from the background according to the geometric structure of the scene. By incorporating gradient information between pixels, it makes the seam search more sensitive to edges and reduces the likelihood that the optimal seam passes through salient objects (e.g., buildings), preferring background regions instead. The term is defined through an edge-weight map that is computed from the image gradients using a sigmoid function. For a pixel p, the intensity function used is that of the first or the second image, according to the label assigned to p. The horizontal and vertical gradient energies are obtained by convolving this intensity function with the Sobel operators in the horizontal and vertical directions, respectively.
- (4) Motion-region constraint
- The motion-region constraint separates moving foreground objects from the static background so that the optimal seam prefers to traverse background regions. It is defined through an indicator function of the moving-object region detected in the overlapping area: any labeling whose seam passes through this region incurs an infinite energy and is therefore infeasible. Incorporating the constraint into the energy formulation is equivalent to restricting the optimization domain from the overlapping area to the overlapping area with the moving-object region removed, as reflected in Equation (7).
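
The effect of the color-difference, gradient, and motion-region terms can be illustrated on a per-pixel cost map. The sketch below makes several loud assumptions: it works on grayscale arrays, combines the terms as a simple weighted sum (`grad_weight` is a hypothetical parameter; the sigmoid edge weighting is omitted), uses the standard 3 × 3 Sobel kernels, and, to stay short, finds a vertical seam with dynamic programming instead of the graph cut used by the method. Only the infinite cost on the motion region mirrors the constraint exactly as described.

```python
import numpy as np

# Standard 3x3 Sobel kernels (horizontal and vertical derivatives).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels.

    Correlation and convolution differ only in sign for Sobel kernels,
    which does not matter once the absolute value is taken.
    """
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def seam_cost_map(img1, img2, motion_mask, grad_weight=1.0):
    """Per-pixel seam cost: color difference + gradient energy, inf on motion."""
    i1 = np.asarray(img1, dtype=np.float64)
    i2 = np.asarray(img2, dtype=np.float64)
    color_diff = np.abs(i1 - i2)
    grad = np.zeros_like(i1)
    for img in (i1, i2):
        grad += np.abs(filter3x3(img, SOBEL_X)) + np.abs(filter3x3(img, SOBEL_Y))
    cost = color_diff + grad_weight * grad
    # Motion-region constraint: seams through moving objects are infeasible.
    cost[np.asarray(motion_mask, dtype=bool)] = np.inf
    return cost


def vertical_seam(cost):
    """Minimum-cost top-to-bottom seam via dynamic programming (stand-in for graph cut)."""
    h, w = cost.shape
    acc = cost.copy()
    for y in range(1, h):
        prev = acc[y - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        acc[y] += np.minimum(prev, np.minimum(left, right))
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(acc[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(acc[y, lo:hi]))
    return seam
```

Because masked pixels carry infinite cost, any seam with finite total cost necessarily avoids the moving-object region, which is the same feasibility argument the constraint makes for the graph cut labeling.
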
4. Experimental Results and Analysis
4.1. Image Registration Experiments and Analysis
4.1.1. Experimental Setup and Training Procedure
4.1.2. Registration Performance Evaluation
- 0–30%: Low overlap rate, typically corresponding to large-parallax scenes that are most challenging for registration.
- 30–60%: Medium overlap rate representing standard registration scenarios.
- 60–100%: High overlap rate where images share most of the content.
- (1) Root Mean Square Error (RMSE)
- The average RMSE was reduced by approximately 77.06% compared to the SIFT + RANSAC method.
- The average RMSE was reduced by approximately 7.1% compared to the UDIS algorithm.
- The average RMSE was reduced by approximately 2.19% compared to the NIS algorithm, although it remains approximately 6.5% higher than that of the UDIS++ algorithm.
- (2) Peak Signal-to-Noise Ratio (PSNR)
- The average PSNR increased by 4.367 dB compared to the SIFT + RANSAC method.
- The average PSNR increased by 1.951 dB compared to the UDIS algorithm.
- The average PSNR increased by 0.793 dB compared to the NIS algorithm, although it remains 1.12 dB below that of the UDIS++ algorithm.
- (3) Structural Similarity Index Measure (SSIM)
- The average SSIM increased by 0.1687 compared to the SIFT + RANSAC method.
- The average SSIM increased by 0.0742 compared to the UDIS algorithm.
- The average SSIM increased by 0.0143 compared to the NIS algorithm, although it remains 0.0297 below that of the UDIS++ algorithm.
- (4) Runtime and Overall Conclusion
4.2. Image Fusion Evaluation
4.2.1. Fusion Performance in Dynamic Scenes
4.2.2. Real-World Industrial Video Stitching
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yoon, J.; Lee, D. Real-time video stitching using camera path estimation and homography refinement. Symmetry 2017, 10, 4.
2. He, B.; Yu, S. Parallax-robust surveillance video stitching. Sensors 2015, 16, 7.
3. Huangfu, Y.; Chen, H.; Huang, Z.; Li, W.; Shi, J.; Yang, L. Research on a Panoramic Image Stitching Method for Images of Corn Ears, Based on Video Streaming. Agronomy 2024, 14, 2884.
4. Chen, Q.; Chen, J.; Sun, K.; Huang, M.; Chen, G.; Liu, H. A Parallel-Optimized Visualization Method for Large-Scale Multiple Video-Augmented Geographic Scenes on Cesium. ISPRS Int. J. Geo-Inf. 2024, 13, 463.
5. Jia, Y.; Wang, R.; Jiang, X. A real-time image stitching and fusion algorithm circuit design based on FPGA. Electronics 2024, 13, 271.
6. Wu, H.; Bao, C.; Hao, Q.; Cao, J.; Zhang, L. Improved Unsupervised Stitching Algorithm for Multiple Environments SuperUDIS. Sensors 2024, 24, 5352.
7. Trigka, M.; Dritsas, E. A comprehensive survey of deep learning approaches in image processing. Sensors 2025, 25, 531.
8. Wang, Y.; Yu, M.; Jiang, G.; Pan, Z.; Lin, J. Image registration algorithm based on convolutional neural network and local homography transformation. Appl. Sci. 2020, 10, 732.
9. Du, X.; Zheng, L.; Zhu, J.; Cen, H.; He, Y. Evaluation of Mosaic Image Quality and Analysis of Influencing Factors Based on UAVs. Drones 2024, 8, 143.
10. Wang, C.; Wang, H.; Han, Q.; Zhang, Z.; Kong, D.; Zou, X. Strawberry detection and ripeness classification using yolov8+ model and image processing method. Agriculture 2024, 14, 751.
11. Sibilano, E.; Delprete, C.; Marvulli, P.M.; Brunetti, A.; Marino, F.; Lucarelli, G.; Battaglia, M.; Bevilacqua, V. Deep Learning Strategies for Semantic Segmentation in Robot-Assisted Radical Prostatectomy. Appl. Sci. 2025, 15, 10665.
12. Zhang, J.; Wang, C.; Liu, S.; Jia, L.; Ye, N.; Wang, J.; Zhou, J.; Sun, J. Content-aware unsupervised deep homography estimation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 653–669.
13. Song, D.Y.; Um, G.M.; Lee, H.K.; Cho, D. End-to-end image stitching network via multi-homography estimation. IEEE Signal Process. Lett. 2021, 28, 763–767.
14. Nie, L.; Lin, C.; Liao, K.; Liu, S.; Zhao, Y. Unsupervised deep image stitching: Reconstructing stitched features to images. IEEE Trans. Image Process. 2021, 30, 6184–6197.
15. Zhou, S.; Wang, L.; Chen, Z.; Zheng, H.; Lin, Z.; He, L. An improved YOLOv9s algorithm for underwater object detection. J. Mar. Sci. Eng. 2025, 13, 230.
16. Ni, J.; Li, Y.; Ke, C.; Zhang, Z.; Cao, W.; Yang, S.X. A fast unsupervised image stitching model based on homography estimation. IEEE Sens. J. 2024, 24, 29452–29467.
17. Zhang, T.; Yin, Q.; Li, S.; Guo, T.; Fan, Z. An Optimized Genetic Algorithm-Based Wavelet Image Fusion Technique for PCB Detection. Appl. Sci. 2025, 15, 3217.
18. Chen, J.; Luo, Y.; Wang, J.; Tang, H.; Tang, Y.; Li, J. Elimination of irregular boundaries and seams for UAV image stitching with a diffusion model. Remote Sens. 2024, 16, 1483.
19. Zhang, J.; Gao, Y.; Xu, Y.; Huang, Y.; Yu, Y.; Shu, X. A simple yet effective image stitching with computational suture zone. Vis. Comput. 2023, 39, 4915–4928.
20. Wang, Z.; Zhang, H.W.; Dai, Y.Q.; Cui, K.; Wang, H.; Chee, P.W.; Wang, R.F. Resource-Efficient Cotton Network: A Lightweight Deep Learning Framework for Cotton Disease and Pest Classification. Plants 2025, 14, 2082.
21. Caparas, A.; Fajardo, A.; Medina, D. Feature-based automatic image stitching using SIFT, KNN and RANSAC. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 96–101.
22. Nie, L.; Lin, C.; Liao, K.; Liu, S.; Zhao, Y. Parallax-tolerant unsupervised deep image stitching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 7399–7408.
23. Kim, M.; Lee, J.; Lee, B.; Im, S.; Jin, K.H. Implicit neural image stitching with enhanced and blended feature reconstruction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 4087–4096.
24. Chang, J.; Li, Q.; Liang, Y.; Zhou, L. SC-AOF: A sliding camera and asymmetric optical-flow-based blending method for image stitching. Sensors 2024, 24, 4035.
25. Nghonda Tchinda, E.; Panoff, M.K.; Tchuinkou Kwadjo, D.; Bobda, C. Semi-supervised image stitching from unstructured camera arrays. Sensors 2023, 23, 9481.
26. Liu, Q.; Huo, J.; Tang, X.; Xue, M. Image stitching method for CMOS grayscale cameras in industrial applications. Opt. Laser Technol. 2025, 181, 111874.
27. Huang, X.; Tang, R.; Zhou, Y.; Yin, H.; Yan, C. DSP-based parallel optimization for real-time video stitching. J. Real-Time Image Process. 2023, 20, 28.
28. Du, C.; Yuan, J.; Dong, J.; Li, L.; Chen, M.; Li, T. GPU based parallel optimization for real time panoramic video stitching. Pattern Recognit. Lett. 2020, 133, 62–69.
29. Hu, K.C.; Lin, F.Y.; Chien, C.C.; Tsai, T.S.; Hsia, C.H.; Chiang, J.S. Panoramic image stitching system for automotive applications. In Proceedings of the 2014 IEEE International Conference on Consumer Electronics-Taiwan, Taipei, Taiwan, 26–28 May 2014; IEEE: New York, NY, USA, 2014; pp. 203–204.
30. Zhu, A.; Zhang, L.; Chen, J.; Zhou, Y. Pedestrian-aware panoramic video stitching based on a structured camera array. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2021, 17, 1–24.
31. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
32. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13713–13722.
33. Nie, L.; Lin, C.; Liao, K.; Liu, M.; Zhao, Y. A view-free image stitching network based on global homography. J. Vis. Commun. Image Represent. 2020, 73, 102950.
34. Chen, K.; Wang, M. Image stitching algorithm research based on OpenCV. In Proceedings of the 33rd Chinese Control Conference, Nanjing, China, 28–30 July 2014; IEEE: New York, NY, USA, 2014; pp. 7292–7297.
35. Shamir, A.; Avidan, S. Seam carving for content-aware image resizing. In Proceedings of SIGGRAPH, San Diego, CA, USA, 5–9 August 2007; pp. 341–350.
36. Kwatra, V.; Schödl, A.; Essa, I.; Turk, G.; Bobick, A. Graphcut textures: Image and video synthesis using graph cuts. ACM Trans. Graph. (TOG) 2003, 22, 277–286.

| Method | RMSE (0–30%) | RMSE (30–60%) | RMSE (60–100%) | RMSE (Average) |
|---|---|---|---|---|
| | 15.13 | 18.32 | 21.53 | 18.647 |
| SIFT + RANSAC | 0.76 | 1.36 | 19.83 | 8.568 |
| UDIS | 1.28 | 1.57 | 3.15 | 2.115 |
| UDIS++ | 1.09 | 1.38 | 2.76 | 1.845 |
| NIS | 1.20 | 1.47 | 3.02 | 2.009 |
| Proposed Method | 1.14 | 1.45 | 2.97 | 1.965 |

| Method | PSNR/dB (0–30%) | PSNR/dB (30–60%) | PSNR/dB (60–100%) | PSNR/dB (Average) |
|---|---|---|---|---|
| | 15.62 | 12.53 | 10.45 | 12.625 |
| SIFT + RANSAC | 24.85 | 21.96 | 17.32 | 20.971 |
| UDIS | 26.91 | 23.78 | 20.45 | 23.387 |
| UDIS++ | 31.16 | 26.86 | 22.63 | 26.458 |
| NIS | 28.33 | 25.06 | 21.32 | 24.545 |
| Proposed Method | 30.02 | 25.73 | 21.57 | 25.338 |

| Method | SSIM (0–30%) | SSIM (30–60%) | SSIM (60–100%) | SSIM (Average) |
|---|---|---|---|---|
| | 0.3736 | 0.1586 | 0.0658 | 0.1860 |
| SIFT + RANSAC | 0.8143 | 0.7261 | 0.5145 | 0.6679 |
| UDIS | 0.8945 | 0.7992 | 0.6357 | 0.7624 |
| UDIS++ | 0.9573 | 0.8915 | 0.7791 | 0.8663 |
| NIS | 0.9037 | 0.8638 | 0.7302 | 0.8223 |
| Proposed Method | 0.9312 | 0.8728 | 0.7385 | 0.8366 |
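
The pairwise comparisons quoted in the registration evaluation can be reproduced directly from the Average columns of the three metric tables. The short script below is only a convenience check; the dictionary layout and the function name `compare` are illustrative choices, and the numbers are copied from the tables as printed.

```python
# Average-column values copied from the RMSE, PSNR, and SSIM tables.
AVERAGES = {
    "SIFT + RANSAC": {"rmse": 8.568, "psnr": 20.971, "ssim": 0.6679},
    "UDIS": {"rmse": 2.115, "psnr": 23.387, "ssim": 0.7624},
    "UDIS++": {"rmse": 1.845, "psnr": 26.458, "ssim": 0.8663},
    "NIS": {"rmse": 2.009, "psnr": 24.545, "ssim": 0.8223},
}
PROPOSED = {"rmse": 1.965, "psnr": 25.338, "ssim": 0.8366}


def compare(baseline):
    """Differences of the proposed method relative to one baseline."""
    ref = AVERAGES[baseline]
    return {
        # Positive means the proposed method has the lower (better) RMSE.
        "rmse_reduction_pct": 100.0 * (ref["rmse"] - PROPOSED["rmse"]) / ref["rmse"],
        # Positive means the proposed method has the higher (better) score.
        "psnr_gain_db": PROPOSED["psnr"] - ref["psnr"],
        "ssim_gain": PROPOSED["ssim"] - ref["ssim"],
    }


for name in AVERAGES:
    result = compare(name)
    print(
        f"{name:14s} RMSE {result['rmse_reduction_pct']:+7.2f}%  "
        f"PSNR {result['psnr_gain_db']:+6.3f} dB  "
        f"SSIM {result['ssim_gain']:+7.4f}"
    )
```

Negative values in the UDIS++ row indicate the metrics on which UDIS++ remains ahead of the proposed method.
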

| Method | Runtime (ms) |
|---|---|
| UDIS | 122 |
| UDIS++ | 86 |
| NIS | 75 |
| Proposed Method | 53 |

| Method | Search (ms) | Update (ms) | Fusion (ms) | Avg. FPS |
|---|---|---|---|---|
| Weighted Averaging | – | – | 35 | 28 |
| Dynamic Programming | 253 | 18 | 24 | 23 |
| Traditional Graph Cut | 107 | 17 | 24 | 25 |
| Proposed Method | 332 | 17 | 24 | 23 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhu, J.; Guo, J.; Ding, K.; Wang, G.; Zhou, Y.; Li, W. Real-Time Panoramic Surveillance Video Stitching Method for Complex Industrial Environments. Sensors 2026, 26, 186. https://doi.org/10.3390/s26010186

