Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking
Abstract
1. Introduction
- We propose a cross-modal image registration method that corrects pixel-coordinate misalignment between visible-light and infrared images.
- We propose a deep learning-based method that predicts transformation parameters from multi-modal, multi-scale features.
- We introduce a contrastive learning-based cross-modal similarity measurement model for semi-supervised image alignment.
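To make the idea of a contrastive cross-modal similarity measure more concrete, the following is a minimal NumPy sketch of an InfoNCE-style loss over paired visible-light and infrared embeddings. The choice of InfoNCE, the function name `info_nce_loss`, the `temperature` value, and the embedding shapes are all assumptions made for illustration; the outline does not specify the loss used in the paper.

```python
import numpy as np

def info_nce_loss(visible_emb, infrared_emb, temperature=0.07):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    Assumption: row i of visible_emb and row i of infrared_emb describe the
    same scene (a positive pair); every other row in the batch is a negative.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    v = visible_emb / np.linalg.norm(visible_emb, axis=1, keepdims=True)
    r = infrared_emb / np.linalg.norm(infrared_emb, axis=1, keepdims=True)
    logits = v @ r.T / temperature                        # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

Under these assumptions, well-aligned pairs yield a loss near zero, while mismatched pairs yield a larger loss, which is the property a similarity-based cost function needs.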
2. Related Work
2.1. Area-Based Image Registration
2.2. Feature Extraction-Based Image Registration
2.3. Deep Learning-Based Image Registration
3. Methodology
- The feature extraction module (FEM) uses a VGG convolutional neural network to encode feature pyramids for the reference (visible-light) image and the moving (infrared) image.
- The registration parameter prediction module (RPPM) predicts a homography matrix for each raster in each feature pyramid layer and outputs a group of homography matrices for each image.
- The image transformation module (ITM) applies the perspective transformation defined by the homography matrix group to the moving image.
- The similarity measurement module (SMM) computes a similarity score between the two images, which serves as the cost function for training the image registration model.
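The transformation step described above can be sketched as a warp that uses one homography per raster (grid cell). This is an illustrative NumPy sketch, not the authors' implementation: the function name `warp_with_raster_homographies`, the `(rows, cols, 3, 3)` layout of the homography group, the inverse-mapping convention, and nearest-neighbour sampling are all assumptions.

```python
import numpy as np

def warp_with_raster_homographies(moving, homographies):
    """Warp a 2-D image using one 3x3 homography per raster (grid cell).

    Assumptions: homographies has shape (rows, cols, 3, 3); each matrix maps
    output pixel coordinates (x, y, 1) inside its cell to source coordinates
    in `moving`, and the third homogeneous coordinate is never zero.
    Sampling is nearest-neighbour; pixels mapped outside the image become 0.
    """
    h, w = moving.shape
    rows, cols = homographies.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Index of the raster that each output pixel belongs to.
    cell_r = np.minimum(ys * rows // h, rows - 1)
    cell_c = np.minimum(xs * cols // w, cols - 1)
    per_pixel_h = homographies[cell_r, cell_c]            # (h, w, 3, 3)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    mapped = np.einsum("hwij,hwj->hwi", per_pixel_h, pts)
    src_x = np.rint(mapped[..., 0] / mapped[..., 2]).astype(int)
    src_y = np.rint(mapped[..., 1] / mapped[..., 2]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(moving)
    out[valid] = moving[src_y[valid], src_x[valid]]
    return out
```

With identity matrices in every cell the image is returned unchanged; giving different cells different matrices lets each region of the moving image be aligned locally, which is the motivation for predicting parameters per raster rather than one global homography.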
3.1. Feature Extraction
3.2. Registration Parameter Prediction
Algorithm 1. The registration parameter prediction algorithm.
3.3. Image Transformation
3.4. Similarity Measurement
3.5. Image Registration
4. Experiment Settings
4.1. Datasets and Parameter Settings
4.2. Competitors
- MI (2008) [31], which uses mutual information for image similarity measurement.
- UDHN+CSM (2009) [56], which combines UDHN with a real-time correlative scan matching algorithm.
- SIFT+RANSAC (2015) [20], which combines SIFT feature extraction with the RANSAC outlier removal algorithm.
- UDHN (2018) [26], which uses a convolutional neural network to predict feature point pairs and uses the mean absolute error as its cost function.
- VFIS (2020) [27], which applies a cost volume for multimodal image feature fusion.
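Several of the competitors above ultimately turn matched point pairs into a homography. As a hedged illustration of that step, the sketch below estimates a homography from point correspondences with the direct linear transform (DLT). It is a generic textbook procedure, not code from any of the listed methods; the function name `homography_from_points` is hypothetical, and the sketch omits coordinate normalisation and outlier rejection such as RANSAC.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src using the DLT.

    Assumptions: src and dst are (N, 2) arrays of matching points, N >= 4,
    no three points collinear, and the bottom-right entry of H is non-zero
    so that H can be scaled to H[2, 2] == 1.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]
```

A robust pipeline would wrap this estimator in an outlier-rejection loop, since a single wrong correspondence can distort the result considerably.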
4.3. Evaluation Metric
- Correlation Coefficient (CC): describes the linear correlation between images; a larger correlation coefficient between the target and reference images indicates higher similarity between the two, i.e., a better registration result.
- Structural Similarity Index Measure (SSIM): models image quality loss and distortion, including correlation loss, grayscale distortion, and contrast distortion; a larger SSIM indicates that the quality of the target image is closer to that of the reference image.
- Peak Signal-to-Noise Ratio (PSNR): computes the ratio of peak energy to noise energy to measure the distortion introduced by the registration process; the larger the PSNR value, the closer the target image is to the reference image.
- Mutual Information (MI): measures the amount of information shared between two images. The greater the similarity or overlap between two images, the stronger their statistical dependence and the smaller their joint entropy, and hence the larger their mutual information.
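The four metrics above can be sketched with their commonly used textbook definitions. The code below is an assumption-laden illustration rather than the paper's exact formulation: the function names, the 8-bit peak value of 255, the SSIM constants, the use of a single global SSIM window instead of averaged local windows, and the 32-bin histogram for MI are all choices made here for brevity.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Pearson correlation between two images of the same shape."""
    a = a.astype(float).ravel() - a.mean()
    b = b.astype(float).ravel() - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def psnr(reference, target, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((reference.astype(float) - target.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))

def global_ssim(a, b, peak=255.0):
    """SSIM computed over one global window (a simplification)."""
    a, b = a.astype(float), b.astype(float)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(numerator / denominator)

def mutual_information(a, b, bins=32):
    """Mutual information in bits, estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p_ab > 0                           # skip empty bins to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))
```

For all four functions a larger value indicates a closer match between the registered image and the reference, consistent with the descriptions in the list above.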
5. Results and Analysis
5.1. Overall Results
5.2. Visualization
5.3. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178.
- Zhang, X.; Ye, P.; Leung, H.; Gong, K.; Xiao, G. Object fusion tracking based on visible and infrared images: A comprehensive review. Inf. Fusion 2020, 63, 166–187.
- Zhu, Y.; Lu, W.; Zhang, R.; Wang, R.; Robbins, D. Dual-channel cascade pose estimation network trained on infrared thermal image and groundtruth annotation for real-time gait measurement. Med. Image Anal. 2022, 79, 102435.
- Hazra, S.; Roy, P.; Nandy, A.; Scherer, R. A Pilot Study for Investigating Gait Signatures in Multi-Scenario Applications. In Proceedings of the 2020 International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–10.
- Du, J.; Li, W.; Xiao, B.; Nawaz, Q. Union Laplacian pyramid with multiple features for medical image fusion. Neurocomputing 2016, 194, 326–339.
- Li, X.; He, Y.S.; Zhan, X.; Liu, F.Y. A rapid fusion Algorithm of infrared and the visible images based on Directionlet transform. Appl. Mech. Mater. 2010, 20, 45–51.
- Deng, M.; Kang, J.; Li, Y. The Fusion Algorithm of Infrared and Visible Images Based on Computer Vision. Adv. Mater. Res. 2014, 945, 1851–1855.
- Kudinov, I.; Nikiforov, M.; Kholopov, I. Camera and auxiliary sensor calibration for a multispectral panoramic vision system with a distributed aperture. J. Phys. Conf. Ser. 2019, 1368, 032009.
- Rhee, J.H.; Seo, J. Low-Cost Curb Detection and Localization System Using Multiple Ultrasonic Sensors. Sensors 2019, 19, 1389.
- Valkov, V.; Kuzin, A.; Kazantsev, A. Calibration of digital non-metric cameras for measuring works. J. Phys. Conf. Ser. 2018, 1118, 012044.
- Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.F.R.; Berriel, R.F.; Paixão, T.M.; Mutz, F.W.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2021, 165, 113816.
- Drew, S.; Andersen, H.; Du, X.; Shen, X.; Meghjani, M.; Eng, Y.H.; Rus, D.; Ang, M.H. Perception, Planning, Control, and Coordination for Autonomous Vehicles. Machines 2017, 5, 6.
- Campbell, M.; Egerstedt, M.; How, J.P.; Murray, R.M. Autonomous driving in urban environments: Approaches, lessons and challenges. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2010, 368, 4649–4672.
- Susilo, J.; Febriani, A.; Rahmalisa, U.; Irawan, Y. Car parking distance controller using ultrasonic sensors based on arduino uno. J. Robot. Control (JRC) 2021, 2, 353–356.
- Takumi, K.; Watanabe, K.; Ha, Q.; Tejero-De-Pablos, A.; Ushiku, Y.; Harada, T. Multispectral object detection for autonomous vehicles. In Proceedings of the on Thematic Workshops of ACM Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 35–43.
- Li, H.; Wu, X.J.; Kittler, J. MDLatLRR: A novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 2020, 29, 4733–4746.
- Bavirisetti, D.P.; Dhuli, R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Phys. Technol. 2016, 76, 52–64.
- Gao, J.; Kim, S.J.; Brown, M.S. Constructing image panoramas using dual-homography warping. In Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 49–56.
- Zaragoza, J.; Chin, T.; Brown, M.S.; Suter, D. As-Projective-As-Possible Image Stitching with Moving DLT. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2339–2346.
- Lin, C.; Pankanti, S.; Ramamurthy, K.N.; Aravkin, A.Y. Adaptive as-natural-as-possible image stitching. In Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1155–1163.
- Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623.
- Ma, J.; Liang, P.; Yu, W.; Chen, C.; Guo, X.; Wu, J.; Jiang, J. Infrared and visible image fusion via detail preserving adversarial learning. Inf. Fusion 2020, 54, 85–98.
- Zhang, H.; Ma, J. SDNet: A versatile squeeze-and-decomposition network for real-time image fusion. Int. J. Comput. Vis. 2021, 129, 2761–2785.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Jiang, H.; Tian, Y. Fuzzy image fusion based on modified Self-Generating Neural Network. Expert Syst. Appl. 2011, 38, 8515–8523.
- Nguyen, T.; Chen, S.W.; Shivakumar, S.S.; Taylor, C.J.; Kumar, V. Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model. IEEE Robot. Autom. Lett. 2018, 3, 2346–2353.
- Nie, L.; Lin, C.; Liao, K.; Liu, M.; Zhao, Y. A view-free image stitching network based on global homography. J. Vis. Commun. Image Represent. 2020, 73, 102950.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
- Zitová, B.; Flusser, J. Image registration methods: A survey. Image Vis. Comput. 2003, 21, 977–1000.
- Chen, H.; Varshney, P.K. Mutual information-based CT-MR brain image registration using generalized partial volume joint histogram estimation. IEEE Trans. Med. Imaging 2003, 22, 1111–1119.
- Lu, X.; Zhang, S.; Su, H.; Chen, Y. Mutual information-based multimodal image registration using a novel joint histogram estimation. Comput. Med. Imaging Graph. 2008, 32, 202–209.
- Gao, Z.; Gu, B.; Lin, J. Monomodal image registration using mutual information based methods. Image Vis. Comput. 2008, 26, 164–173.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Tuytelaars, T.; Gool, L.V. SURF: Speeded Up Robust Features. In Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
- Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
- Torr, P.H.S.; Zisserman, A. MLESAC: A New Robust Estimator with Application to Estimating Image Geometry. Comput. Vis. Image Underst. 2000, 78, 138–156.
- Krig, S. Interest Point Detector and Feature Descriptor Survey. In Computer Vision Metrics: Textbook Edition; Springer International Publishing: Cham, Switzerland, 2016; pp. 187–246.
- Zhang, G.; He, Y.; Chen, W.; Jia, J.; Bao, H. Multi-viewpoint panorama construction with wide-baseline images. IEEE Trans. Image Process. 2016, 25, 3099–3111.
- Tang, C.; Tian, G.Y.; Chen, X.; Wu, J.; Li, K.; Meng, H. Infrared and visible images registration with adaptable local-global feature integration for rail inspection. Infrared Phys. Technol. 2017, 87, 31–39.
- Jiang, Q.; Liu, Y.; Yan, Y.; Deng, J.; Fang, J.; Li, Z.; Jiang, X. A Contour Angle Orientation for Power Equipment Infrared and Visible Image Registration. IEEE Trans. Power Deliv. 2021, 36, 2559–2569.
- Min, C.; Gu, Y.; Li, Y.; Yang, F. Non-rigid infrared and visible image registration by enhanced affine transformation. Pattern Recognit. 2020, 106, 107377.
- Liu, X.; Ai, Y.; Tian, B.; Cao, D. Robust and Fast Registration of Infrared and Visible Images for Electro-Optical Pod. IEEE Trans. Ind. Electron. 2019, 66, 1335–1344.
- Yang, Z.; Dan, T.; Yang, Y. Multi-temporal remote sensing image registration using deep convolutional features. IEEE Access 2018, 6, 38544–38555.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Deep Image Homography Estimation. arXiv 2016, arXiv:1606.03798.
- Yang, X.; Kwitt, R.; Styner, M.; Niethammer, M. Quicksilver: Fast predictive image registration – A deep learning approach. NeuroImage 2017, 158, 378–396.
- Yi, K.M.; Trulls, E.; Ono, Y.; Lepetit, V.; Salzmann, M.; Fua, P. Learning to Find Good Correspondences. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2666–2674.
- Toldo, X.; Maracani, A.; Michieli, U.; Zanuttigh, P. Unsupervised Domain Adaptation in Semantic Segmentation: A Review. Technologies 2020, 8, 35.
- Le, H.; Liu, F.; Zhang, S.; Agarwala, A. Deep homography estimation for dynamic scenes. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7652–7661.
- Zhang, J.; Wang, C.; Liu, S.; Jia, L.; Ye, N.; Wang, J.; Zhou, J.; Sun, J. Content-aware unsupervised deep homography estimation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 653–669.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Zaragoza, J.; Chin, T.; Tran, Q.; Brown, M.S.; Suter, D. As-Projective-As-Possible Image Stitching with Moving DLT. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1285–1298.
- Kalluri, K.; Varma, G.; Chandraker, M.; Jawahar, C.V. Universal Semi-Supervised Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5258–5269.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R.B. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735.
- Lin, T.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Fang, Q.; Han, D.; Wang, Z. Cross-Modality Fusion Transformer for Multispectral Object Detection. arXiv 2021, arXiv:2111.00273.
- Olson, E.B. Real-time correlative scan matching. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 4387–4393.
Dataset | Model | RMSE | CC | SSIM | MI | PSNR |
---|---|---|---|---|---|---|
MS-COCO | SIFT+RANSAC | 17.8542 | 0.9716 | 0.8787 | 2.2536 | 29.6583 |
MS-COCO | UDHN | 8.4506 | 0.9126 | 0.8662 | 2.1918 | 21.9387 |
MS-COCO | VFIS | 8.2986 | 0.9493 | 0.8761 | 2.3268 | 22.2943 |
MS-COCO | Ours | 7.4214 | 0.9708 | 0.8915 | 2.4001 | 30.0036 |
FLIR (labelled) | SIFT+RANSAC | 859.4847 | \ | \ | \ | \ |
FLIR (labelled) | UDHN | 24.7125 | 0.0394 | 0.4060 | 0.6678 | 10.7463 |
FLIR (labelled) | VFIS | 19.9754 | 0.1885 | 0.4262 | 0.8160 | 10.9331 |
FLIR (labelled) | Ours | 15.9170 | 0.3505 | 0.4425 | 0.9274 | 11.1414 |
FLIR (unlabelled) | UDHN+CSM | \ | 0.0224 | 0.3997 | 0.6572 | 9.8765 |
FLIR (unlabelled) | VFIS | \ | 0.1849 | 0.4215 | 0.8169 | 10.9210 |
FLIR (unlabelled) | Ours | \ | 0.3623 | 0.4437 | 0.9042 | 10.9934 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Q.; Xiang, W. Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking. Appl. Sci. 2023, 13, 5359. https://doi.org/10.3390/app13095359