Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking
Abstract
1. Introduction
 We propose a cross-modal image registration method that solves the problem of pixel coordinate misalignment between visible-light and infrared images.
 We propose a deep learning-based method that predicts transformation parameters from multi-modality, multi-scale features.
 We introduce a contrastive learning-based cross-modal similarity measurement model for semi-supervised image alignment.
2. Related Work
2.1. Area-Based Image Registration
2.2. Feature Extraction-Based Image Registration
2.3. Deep Learning-Based Image Registration
3. Methodology
 The feature extraction module (FEM) uses a VGG convolutional neural network to encode the reference image (visible light) and the moving image (infrared) into feature pyramids.
 The registration parameter prediction module (RPPM) predicts a homography matrix for each raster of the feature pyramid layers and outputs a group of homography matrices for each image.
 The image transformation module (ITM) applies perspective transformations to the moving image using its homography matrix group.
 The similarity measurement module (SMM) computes the similarity score of the two images, which serves as the cost function for training the image registration model.
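To make the idea of a homography matrix group concrete, the following is a minimal sketch of how a grid of per-raster homographies could be applied to pixel coordinates of the moving image. It assumes that each raster's matrix acts only on the pixels inside that raster cell and that the rasters form a regular N x N grid; the function name `warp_points_rasterized` and its argument layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def warp_points_rasterized(points, homographies, image_size):
    """Warp (x, y) points with the homography of the raster cell they fall in.

    points:       (K, 2) array of pixel coordinates (x, y).
    homographies: (N, N, 3, 3) array, one 3x3 matrix per raster (row, col).
    image_size:   (width, height) of the moving image.
    Assumption: a regular N x N raster grid covering the whole image.
    """
    points = np.asarray(points, dtype=np.float64)
    n = homographies.shape[0]
    width, height = image_size

    # Raster index of every point, clipped so border pixels stay in range.
    cols = np.clip((points[:, 0] * n / width).astype(int), 0, n - 1)
    rows = np.clip((points[:, 1] * n / height).astype(int), 0, n - 1)

    # Select one homography per point: shape (K, 3, 3).
    h = homographies[rows, cols]

    # Homogeneous coordinates (x, y, 1), then the perspective transformation.
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    mapped = np.einsum("kij,kj->ki", h, homog)

    # Divide by the third component to return to pixel coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


# Usage: a 2 x 2 grid of identity matrices leaves every point unchanged.
identity_grid = np.tile(np.eye(3), (2, 2, 1, 1))
sample_points = np.array([[10.0, 10.0], [150.0, 20.0]])
warped = warp_points_rasterized(sample_points, identity_grid, (200, 100))
```

With a single global homography (N = 1) this reduces to an ordinary perspective transformation; larger grids allow different image regions to be aligned with different matrices.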
3.1. Feature Extraction
3.2. Registration Parameter Prediction
Algorithm 1 The registration parameter prediction algorithm 
Input: $\left[F_{r}^{l}\right], \left[F_{m}^{l}\right],\ l \in [0, L-1]$
Output: $\left[H_{i}\right],\ i \in [1, N_{L-1}^{2}]$
Hyperparameter: $L, \left[N_{l}\right],\ l \in [1, L-1]$

3.3. Image Transformation
3.4. Similarity Measurement
3.5. Image Registration
4. Experiment Settings
4.1. Datasets and Parameter Settings
4.2. Competitors
 MI (2008) [31], which uses mutual information for image similarity measurement.
 UDHN+CSM (2009) [56], which combines UDHN with a real-time correlative scan matching algorithm.
 SIFT+RANSAC (2015) [20], which combines SIFT feature extraction with the RANSAC outlier removal algorithm.
 UDHN (2018) [26], which uses a convolutional neural network to predict feature point pairs and applies mean absolute error metrics for cost function computation.
 VFIS (2020) [27], which applies a cost volume to fuse multimodal image features.
4.3. Evaluation Metric
 Correlation Coefficient (CC): describes the linear correlation between images. A larger correlation coefficient between the target and reference images indicates higher similarity between the two, i.e., a better registration result. With $Cov$ denoting covariance and $D$ denoting variance, CC is calculated as follows:$$CC_{XY} = \frac{Cov(X,Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$$
 Structural Similarity Index Measure (SSIM): models the quality loss and distortion of images, including the correlation loss, grayscale distortion, and contrast distortion; a larger SSIM indicates that the quality of the target image is closer to that of the reference image. SSIM is calculated as follows:$$SSIM(x,y) = \left[luminance(x,y)\right]^{\alpha} \cdot \left[contrast(x,y)\right]^{\beta} \cdot \left[structure(x,y)\right]^{\gamma}$$
 Peak Signal-to-Noise Ratio (PSNR): computes the ratio of peak energy to noise energy for measurement of distortion in the image registration process; the larger the PSNR value, the closer the target image is to the reference image. PSNR is calculated as follows:$$PSNR = 10 \times \log_{10}\left(\frac{MaxValue^{2}}{MSE}\right)$$
 Mutual Information (MI): measures the information shared between the two images. The higher the similarity or overlap between two images, the higher their correlation and the smaller their joint entropy, which means that the mutual information is greater. MI is calculated as follows:$$MI(X,Y) = H(X) + H(Y) - H(X,Y)$$$$H(X) = -\sum_{j=1}^{N} p(a_{j}) \log p(a_{j})$$$$H(X,Y) = -\sum_{x,y} P_{XY}(x,y) \log P_{XY}(x,y)$$
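As an illustration of the formulas above, the following NumPy sketch computes CC, PSNR, and MI for two grayscale images of equal size. It is not the authors' evaluation code. The histogram bin count (`bins=32`), the base-2 logarithm in the entropy, and the default peak value of 255 are assumptions made for this example. SSIM is omitted because its windowed luminance, contrast, and structure terms need additional parameters that the text does not specify.

```python
import numpy as np


def correlation_coefficient(x, y):
    """CC = Cov(X, Y) / (sqrt(D(X)) * sqrt(D(Y))) over all pixels."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        return float("nan")  # undefined when either image is constant
    return float((xc * yc).sum() / denom)


def psnr(x, y, max_value=255.0):
    """PSNR = 10 * log10(MaxValue^2 / MSE); max_value=255 assumes 8-bit images."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images have no noise energy
    return float(10.0 * np.log10(max_value ** 2 / mse))


def _entropy(p):
    """Shannon entropy in bits; zero-probability bins contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y, bins=32):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), estimated from a joint histogram."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1)  # marginal distribution of X
    p_y = p_xy.sum(axis=0)  # marginal distribution of Y
    return _entropy(p_x) + _entropy(p_y) - _entropy(p_xy.ravel())


# Usage on two synthetic images (random data, for illustration only).
rng = np.random.default_rng(0)
reference = rng.random((64, 64)) * 255.0
target = reference + rng.normal(0.0, 5.0, size=reference.shape)
scores = {
    "CC": correlation_coefficient(reference, target),
    "PSNR": psnr(reference, target),
    "MI": mutual_information(reference, target),
}
```

The histogram-based MI estimate depends on the bin count, so values computed this way are only comparable when the same binning is used for every method.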
5. Results and Analysis
5.1. Overall Results
5.2. Visualization
5.3. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
 Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178.
 Zhang, X.; Ye, P.; Leung, H.; Gong, K.; Xiao, G. Object fusion tracking based on visible and infrared images: A comprehensive review. Inf. Fusion 2020, 63, 166–187.
 Zhu, Y.; Lu, W.; Zhang, R.; Wang, R.; Robbins, D. Dual-channel cascade pose estimation network trained on infrared thermal image and ground-truth annotation for real-time gait measurement. Med. Image Anal. 2022, 79, 102435.
 Hazra, S.; Roy, P.; Nandy, A.; Scherer, R. A Pilot Study for Investigating Gait Signatures in Multi-Scenario Applications. In Proceedings of the 2020 International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–10.
 Du, J.; Li, W.; Xiao, B.; Nawaz, Q. Union Laplacian pyramid with multiple features for medical image fusion. Neurocomputing 2016, 194, 326–339.
 Li, X.; He, Y.S.; Zhan, X.; Liu, F.Y. A rapid fusion Algorithm of infrared and the visible images based on Directionlet transform. Appl. Mech. Mater. 2010, 20, 45–51.
 Deng, M.; Kang, J.; Li, Y. The Fusion Algorithm of Infrared and Visible Images Based on Computer Vision. Adv. Mater. Res. 2014, 945, 1851–1855.
 Kudinov, I.; Nikiforov, M.; Kholopov, I. Camera and auxiliary sensor calibration for a multispectral panoramic vision system with a distributed aperture. J. Phys. Conf. Ser. 2019, 1368, 032009.
 Rhee, J.H.; Seo, J. Low-Cost Curb Detection and Localization System Using Multiple Ultrasonic Sensors. Sensors 2019, 19, 1389.
 Valkov, V.; Kuzin, A.; Kazantsev, A. Calibration of digital non-metric cameras for measuring works. J. Phys. Conf. Ser. 2018, 1118, 012044.
 Badue, C.; Guidolini, R.; Carneiro, R.V.; Azevedo, P.; Cardoso, V.B.; Forechi, A.; Jesus, L.F.R.; Berriel, R.F.; Paixão, T.M.; Mutz, F.W.; et al. Self-driving cars: A survey. Expert Syst. Appl. 2021, 165, 113816.
 Drew, S.; Andersen, H.; Du, X.; Shen, X.; Meghjani, M.; Eng, Y.H.; Rus, D.; Ang, M.H. Perception, Planning, Control, and Coordination for Autonomous Vehicles. Machines 2017, 5, 6.
 Campbell, M.; Egerstedt, M.; How, J.P.; Murray, R.M. Autonomous driving in urban environments: Approaches, lessons and challenges. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2010, 368, 4649–4672.
 Susilo, J.; Febriani, A.; Rahmalisa, U.; Irawan, Y. Car parking distance controller using ultrasonic sensors based on arduino uno. J. Robot. Control (JRC) 2021, 2, 353–356.
 Takumi, K.; Watanabe, K.; Ha, Q.; Tejero-De-Pablos, A.; Ushiku, Y.; Harada, T. Multispectral object detection for autonomous vehicles. In Proceedings of the on Thematic Workshops of ACM Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 35–43.
 Li, H.; Wu, X.J.; Kittler, J. MDLatLRR: A novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 2020, 29, 4733–4746.
 Bavirisetti, D.P.; Dhuli, R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Phys. Technol. 2016, 76, 52–64.
 Gao, J.; Kim, S.J.; Brown, M.S. Constructing image panoramas using dual-homography warping. In Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 49–56.
 Zaragoza, J.; Chin, T.; Brown, M.S.; Suter, D. As-Projective-As-Possible Image Stitching with Moving DLT. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2339–2346.
 Lin, C.; Pankanti, S.; Ramamurthy, K.N.; Aravkin, A.Y. Adaptive as-natural-as-possible image stitching. In Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1155–1163.
 Li, H.; Wu, X.J. DenseFuse: A fusion approach to infrared and visible images. IEEE Trans. Image Process. 2018, 28, 2614–2623.
 Ma, J.; Liang, P.; Yu, W.; Chen, C.; Guo, X.; Wu, J.; Jiang, J. Infrared and visible image fusion via detail preserving adversarial learning. Inf. Fusion 2020, 54, 85–98.
 Zhang, H.; Ma, J. SDNet: A versatile squeeze-and-decomposition network for real-time image fusion. Int. J. Comput. Vis. 2021, 129, 2761–2785.
 Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
 Jiang, H.; Tian, Y. Fuzzy image fusion based on modified Self-Generating Neural Network. Expert Syst. Appl. 2011, 38, 8515–8523.
 Nguyen, T.; Chen, S.W.; Shivakumar, S.S.; Taylor, C.J.; Kumar, V. Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model. IEEE Robot. Autom. Lett. 2018, 3, 2346–2353.
 Nie, L.; Lin, C.; Liao, K.; Liu, M.; Zhao, Y. A view-free image stitching network based on global homography. J. Vis. Commun. Image Represent. 2020, 73, 102950.
 Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
 Zitová, B.; Flusser, J. Image registration methods: A survey. Image Vis. Comput. 2003, 21, 977–1000.
 Chen, H.; Varshney, P.K. Mutual information-based CT-MR brain image registration using generalized partial volume joint histogram estimation. IEEE Trans. Med. Imaging 2003, 22, 1111–1119.
 Lu, X.; Zhang, S.; Su, H.; Chen, Y. Mutual information-based multimodal image registration using a novel joint histogram estimation. Comput. Med. Imaging Graph. 2008, 32, 202–209.
 Gao, Z.; Gu, B.; Lin, J. Monomodal image registration using mutual information based methods. Image Vis. Comput. 2008, 26, 164–173.
 Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
 Bay, H.; Tuytelaars, T.; Gool, L.V. SURF: Speeded Up Robust Features. In Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
 Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
 Torr, P.H.S.; Zisserman, A. MLESAC: A New Robust Estimator with Application to Estimating Image Geometry. Comput. Vis. Image Underst. 2000, 78, 138–156.
 Krig, S. Interest Point Detector and Feature Descriptor Survey. In Computer Vision Metrics: Textbook Edition; Springer International Publishing: Cham, Switzerland, 2016; pp. 187–246.
 Zhang, G.; He, Y.; Chen, W.; Jia, J.; Bao, H. Multi-viewpoint panorama construction with wide-baseline images. IEEE Trans. Image Process. 2016, 25, 3099–3111.
 Tang, C.; Tian, G.Y.; Chen, X.; Wu, J.; Li, K.; Meng, H. Infrared and visible images registration with adaptable local-global feature integration for rail inspection. Infrared Phys. Technol. 2017, 87, 31–39.
 Jiang, Q.; Liu, Y.; Yan, Y.; Deng, J.; Fang, J.; Li, Z.; Jiang, X. A Contour Angle Orientation for Power Equipment Infrared and Visible Image Registration. IEEE Trans. Power Deliv. 2021, 36, 2559–2569.
 Min, C.; Gu, Y.; Li, Y.; Yang, F. Non-rigid infrared and visible image registration by enhanced affine transformation. Pattern Recognit. 2020, 106, 107377.
 Liu, X.; Ai, Y.; Tian, B.; Cao, D. Robust and Fast Registration of Infrared and Visible Images for Electro-Optical Pod. IEEE Trans. Ind. Electron. 2019, 66, 1335–1344.
 Yang, Z.; Dan, T.; Yang, Y. Multi-temporal remote sensing image registration using deep convolutional features. IEEE Access 2018, 6, 38544–38555.
 DeTone, D.; Malisiewicz, T.; Rabinovich, A. Deep Image Homography Estimation. arXiv 2016, arXiv:1606.03798.
 Yang, X.; Kwitt, R.; Styner, M.; Niethammer, M. Quicksilver: Fast predictive image registration – A deep learning approach. NeuroImage 2017, 158, 378–396.
 Yi, K.M.; Trulls, E.; Ono, Y.; Lepetit, V.; Salzmann, M.; Fua, P. Learning to Find Good Correspondences. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2666–2674.
 Toldo, X.; Maracani, A.; Michieli, U.; Zanuttigh, P. Unsupervised Domain Adaptation in Semantic Segmentation: A Review. Technologies 2020, 8, 35.
 Le, H.; Liu, F.; Zhang, S.; Agarwala, A. Deep homography estimation for dynamic scenes. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7652–7661.
 Zhang, J.; Wang, C.; Liu, S.; Jia, L.; Ye, N.; Wang, J.; Zhou, J.; Sun, J. Content-aware unsupervised deep homography estimation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 653–669.
 Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
 Zaragoza, J.; Chin, T.; Tran, Q.; Brown, M.S.; Suter, D. As-Projective-As-Possible Image Stitching with Moving DLT. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 1285–1298.
 Kalluri, K.; Varma, G.; Chandraker, M.; Jawahar, C.V. Universal Semi-Supervised Semantic Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5258–5269.
 He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R.B. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735.
 Lin, T.; Maire, M.; Belongie, S.J.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
 Fang, Q.; Han, D.; Wang, Z. Cross-Modality Fusion Transformer for Multispectral Object Detection. arXiv 2021, arXiv:2111.00273.
 Olson, E.B. Real-time correlative scan matching. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 4387–4393.
Dataset            Model        RMSE      CC      SSIM    MI      PSNR
MSCOCO             SIFT+RANSAC  17.8542   0.9716  0.8787  2.2536  29.6583
MSCOCO             UDHN         8.4506    0.9126  0.8662  2.1918  21.9387
MSCOCO             VFIS         8.2986    0.9493  0.8761  2.3268  22.2943
MSCOCO             Ours         7.4214    0.9708  0.8915  2.4001  30.0036
FLIR (labelled)    SIFT+RANSAC  859.4847  n/a     n/a     n/a     n/a
FLIR (labelled)    UDHN         24.7125   0.0394  0.4060  0.6678  10.7463
FLIR (labelled)    VFIS         19.9754   0.1885  0.4262  0.8160  10.9331
FLIR (labelled)    Ours         15.9170   0.3505  0.4425  0.9274  11.1414
FLIR (unlabelled)  UDHN+CSM     n/a       0.0224  0.3997  0.6572  9.8765
FLIR (unlabelled)  VFIS         n/a       0.1849  0.4215  0.8169  10.9210
FLIR (unlabelled)  Ours         n/a       0.3623  0.4437  0.9042  10.9934
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Q.; Xiang, W. Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking. Appl. Sci. 2023, 13, 5359. https://doi.org/10.3390/app13095359