RGB-to-Infrared Translation Using Ensemble Learning Applied to Driving Scenarios
Abstract
1. Introduction
- (I) We demonstrate the effectiveness of ensemble learning for RGB-to-NIR image translation using a limited amount of training data. This is achieved by carefully selecting input features beyond the RGB channels, such as horizontal/vertical gradients and segmentation masks. The main objective is good visual quality and performance comparable to translations produced by more complex methods reported in the literature.
- (II) We extend the method to RGB-to-thermal translation, tackling the one-to-many mapping (a single RGB image can correspond to several thermal images, depending on temperature conditions) by introducing a recursive training method that generates multiple checkpoints. Additionally, we develop a regularized loss function for gradient boosting that improves the regression by incorporating physical constraints on the scene elements carrying the most thermal information, namely pedestrians and cars.
- (III) We apply both RGB-to-NIR and RGB-to-thermal translation to images generated with the CARLA simulator for autonomous driving, providing an essential tool for evaluating new modalities and testing fusion pipelines. Finally, we apply the regularized translation model to the CARLA-generated images.
2. Background
3. Method
3.1. Data Sources and Feature Extraction
3.2. Modeling and Training Procedure
Algorithm 1: Feature Matrix Construction and Training
- $\sigma_m$ is the standard deviation of the true labels for mask $m$ (with a small constant $\epsilon$ added for numerical stability).
- $\beta$ is an additional hyperparameter.
- $\lambda$ is the regularization coefficient.
- $\ell_m$ and $u_m$ are the lower and upper bounds for mask $m$ for each class (e.g., person or vehicle), measured on the training set.
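To make Algorithm 1 concrete, the sketch below combines per-pixel feature construction (RGB channels, horizontal/vertical gradients, segmentation mask) with XGBoost training under a custom objective implementing the bound penalty defined above. It is a minimal illustration under stated assumptions, not the authors' exact implementation: the helper names, the hinge form of the penalty, and the placeholder values for $\lambda$ and $\epsilon$ are ours.

```python
import numpy as np
import xgboost as xgb

def build_feature_matrix(rgb, seg_mask):
    """Stack per-pixel features: R, G, B, horizontal/vertical
    gradients of the luminance, and the segmentation label."""
    gray = rgb.mean(axis=2)
    gy, gx = np.gradient(gray)  # vertical, horizontal gradients
    feats = [rgb[..., 0], rgb[..., 1], rgb[..., 2], gx, gy, seg_mask]
    return np.stack([f.ravel() for f in feats], axis=1)

def make_regularized_obj(lower, upper, sigma, in_mask, lam=0.5, eps=1e-6):
    """Squared error plus a hinge penalty pushing predictions for
    masked (person/vehicle) pixels back inside [lower, upper],
    scaled by lam and normalized by the mask's label std-dev."""
    scale = lam / (sigma + eps)
    def obj(preds, dtrain):
        y = dtrain.get_label()
        grad = preds - y                  # gradient of 0.5 * (p - y)^2
        hess = np.ones_like(preds)
        below = in_mask & (preds < lower)
        above = in_mask & (preds > upper)
        grad[below] += scale * (preds[below] - lower)
        grad[above] += scale * (preds[above] - upper)
        hess[below | above] += scale
        return grad, hess
    return obj

# Usage sketch (hypothetical variable names):
# X = build_feature_matrix(rgb, seg); y = thermal.ravel()
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"max_depth": 6, "eta": 0.1}, dtrain,
#                     num_boost_round=200,
#                     obj=make_regularized_obj(l_m, u_m, sigma_m, mask))
```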
4. Results and Discussion
4.1. Infrared Translated Images and Performance Comparisons
| Metric | XGBoost (EPFL) | XGBoost (MS2) | XGBoost (Freiburg) | GAN sRGB-TIR (Freiburg) | Ref. Range |
|---|---|---|---|---|---|
| R² (↑) | 0.94 (0.03) | 0.98 (0.003) | 0.86 (0.15) | 0.85 (0.20) | >0.85 [38] |
| PSNR [dB] (↑) | 18.7 (2.57) | 17.1 (1.84) | 21.3 (5.08) | 23.55 (3.14) | >20 [39] |
| SSIM (↑) | 0.54 (0.08) | 0.62 (0.05) | 0.75 (0.06) | 0.81 (0.05) | >0.70 [40] |
| LPIPS (↓) | 0.094 (0.04) | 0.15 (0.08) | 0.038 (0.01) | 0.25 (0.03) | <0.1 [39] |
| Metric | Pedestrian | Car |
|---|---|---|
| R² | +0.10 | +0.06 |
| PSNR | +0.70 dB | +0.284 dB |
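The figures in the tables above can be reproduced with off-the-shelf metric implementations. The snippet below is a sketch using scikit-image for PSNR/SSIM and the `lpips` package for the perceptual distance; `evaluate_pair` and the [0, 1] normalization are our assumptions about the evaluation protocol, not the paper's published code.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_net = lpips.LPIPS(net='alex')  # AlexNet-based perceptual metric

def evaluate_pair(pred, target):
    """pred/target: HxW float arrays in [0, 1] (translated vs. real IR)."""
    r2 = 1.0 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)
    psnr = peak_signal_noise_ratio(target, pred, data_range=1.0)
    ssim = structural_similarity(target, pred, data_range=1.0)
    # LPIPS expects 3-channel tensors in [-1, 1]; replicate the IR channel.
    to_t = lambda a: (torch.from_numpy(a).float().mul(2).sub(1)
                      [None, None].repeat(1, 3, 1, 1))
    dist = lpips_net(to_t(pred), to_t(target)).item()
    return r2, psnr, ssim, dist
```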
4.2. Infrared Translation Using Synthetic RGB Images Generated in CARLA Simulator
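Paired RGB and semantic-segmentation frames for this experiment can be captured through CARLA's Python API. Below is a minimal capture sketch under assumed defaults (server on localhost:2000, an arbitrary vehicle blueprint, hypothetical output paths), not the exact setup used in the paper; the translator is then run offline on the saved pairs.

```python
import carla

client = carla.Client('localhost', 2000)
world = client.get_world()
bp_lib = world.get_blueprint_library()

# Spawn a vehicle with co-located RGB and segmentation cameras.
vehicle = world.spawn_actor(bp_lib.filter('vehicle.*')[0],
                            world.get_map().get_spawn_points()[0])
cam_tf = carla.Transform(carla.Location(x=1.5, z=2.0))
rgb = world.spawn_actor(bp_lib.find('sensor.camera.rgb'),
                        cam_tf, attach_to=vehicle)
seg = world.spawn_actor(bp_lib.find('sensor.camera.semantic_segmentation'),
                        cam_tf, attach_to=vehicle)

# Save synchronized frames; the mask feeds the feature matrix of Section 3.
rgb.listen(lambda img: img.save_to_disk('out/rgb/%06d.png' % img.frame))
seg.listen(lambda img: img.save_to_disk('out/seg/%06d.png' % img.frame))
vehicle.set_autopilot(True)
```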
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Computational Complexity and Runtime
| Resolution | CPU (i7-1185G7) [fps] | GPU (T4) [fps] |
|---|---|---|
|  | 4.3 | 4.3 |
|  | 0.5 | 0.5 |
- (a) Asymptotic cost: A forward pass of the convolutional baseline consists of convolutions, for which the arithmetic grows linearly with the number of pixels: $\text{FLOPs} \approx \sum_l k_l^2\, C_{l-1} C_l\, H_l W_l$, where $k_l$ is the kernel size and $C_l$, $H_l \times W_l$ are the channel count and output resolution of layer $l$. The GBDT instead performs only $O(T \cdot d)$ integer comparisons per pixel, for $T$ trees of maximum depth $d$. A rough version of this comparison is sketched below.
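A back-of-the-envelope form of this count, with hypothetical helper names rather than anything from the paper:

```python
def conv_flops(layers):
    """Multiply-accumulates for a stack of conv layers; the total grows
    linearly with the output pixel count h * w of each layer.
    layers: iterable of (k, c_in, c_out, h_out, w_out) tuples."""
    return sum(k * k * c_in * c_out * h * w
               for k, c_in, c_out, h, w in layers)

def gbdt_comparisons(n_pixels, n_trees, depth):
    """Upper bound on integer comparisons for GBDT inference:
    each pixel traverses every tree from root to leaf."""
    return n_pixels * n_trees * depth
```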
- (b) Memory footprint: The FP32 weights occupy 40 MB, the peak activation tensor for a single input frame is 13 MB, and the cuDNN scratch space requires 250 MB. The entire translation branch therefore fits comfortably within 303 MB of GPU RAM, compared with 5.6 MB for the serialized XGBoost ensemble and 5.1 MB for one feature buffer.
References
- Pendleton, S.; Andersen, H.; Du, X.; Shen, X.; Meghjani, M.; Eng, Y.; Rus, D.; Ang, M. Perception, planning, control, and coordination for autonomous vehicles. Machines 2017, 5, 6. [Google Scholar] [CrossRef]
- Adnan, N. Exploring the future: A meta-analysis of autonomous vehicle adoption and its impact on urban life and the healthcare sector. Transp. Res. Interdiscip. Perspect. 2024, 26, 101110. [Google Scholar] [CrossRef]
- Sana, F.; Azad, N.L.; Raahemifar, K. Autonomous vehicle decision-making and control in complex and unconventional scenarios—A review. Machines 2023, 11, 676. [Google Scholar] [CrossRef]
- Huang, K.; Shi, B.; Li, X.; Li, X.; Huang, S.; Li, Y. Multi-modal sensor fusion for auto driving perception: A survey. arXiv 2022, arXiv:2202.02703. [Google Scholar]
- Marnissi, M.A.; Fradi, H.; Sahbani, A.; Amara, N.E.B. Thermal image enhancement using generative adversarial network for pedestrian detection. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6509–6516. [Google Scholar]
- Geronimo, D.; Lopez, A.M.; Sappa, A.D.; Graf, T. Survey of pedestrian detection for advanced driver assistance systems. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 1239–1258. [Google Scholar] [CrossRef]
- Aloufi, N.; Alnori, A.; Basuhail, A. Enhancing Autonomous Vehicle Perception in Adverse Weather: A Multi Objectives Model for Integrated Weather Classification and Object Detection. Electronics 2024, 13, 3063. [Google Scholar] [CrossRef]
- Fadadu, S.; Pandey, S.; Hegde, D.; Shi, Y.; Chou, F.C.; Djuric, N.; Vallespi-Gonzalez, C. Multi-view fusion of sensor data for improved perception and prediction in autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 2349–2357. [Google Scholar]
- Liu, Z.; Tang, H.; Amini, A.; Yang, X.; Mao, H.; Rus, D.L.; Han, S. Bevfusion: Multi-task multi-sensor fusion with unified bird’s-eye view representation. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2774–2781. [Google Scholar]
- Isola, P.; Zhu, J.; Zhou, T.; Efros, A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Zhu, J.; Park, T.; Isola, P.; Efros, A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Yi, Z.; Zhang, H.; Tan, P.; Gong, M. DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2849–2857. [Google Scholar]
- Kim, T.; Cha, M.; Kim, H.; Lee, J.; Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1857–1865. [Google Scholar]
- Friedjungová, M.; Vašata, D.; Chobola, T.; Jiřina, M. Unsupervised Latent Space Translation Network. In Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium, 5–7 October 2022. [Google Scholar]
- Wang, T.; Liu, M.; Zhu, J.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8798–8807. [Google Scholar]
- Liu, M.; Breuel, T.; Kautz, J. Unsupervised image-to-image translation networks. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 172–189. [Google Scholar]
- Jin, Y.; Park, I.; Song, H.; Ju, H.; Nalcakan, Y.; Kim, S. Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation. arXiv 2024, arXiv:2409.16706. [Google Scholar] [CrossRef]
- Yang, S.; Sun, M.; Lou, X.; Yang, H.; Liu, D. Nighttime Thermal Infrared Image Translation Integrating Visible Images. Remote Sens. 2024, 16, 666. [Google Scholar] [CrossRef]
- Jeon, H.; Seo, J.; Kim, T.; Son, S.; Lee, J.; Choi, G.; Lim, Y. RainSD: Rain Style Diversification Module for Image Synthesis Enhancement using Feature-Level Style Distribution. arXiv 2023, arXiv:2401.00460. [Google Scholar] [CrossRef]
- Zhai, H.; Jin, G.; Yang, X.; Kang, G. ColorMamba: Towards High-quality NIR-to-RGB Spectral Translation with Mamba. arXiv 2024, arXiv:2408.08087. [Google Scholar]
- Wang, Z.; Colonnier, F.; Zheng, J.; Acharya, J.; Jiang, W.; Huang, K. Tirdet: Mono-modality thermal infrared object detection based on prior thermal-to-visible translation. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 2663–2672. [Google Scholar]
- Liu, S.; Gao, M.; John, V.; Liu, Z.; Blasch, E. Deep learning thermal image translation for night vision perception. ACM Trans. Intell. Syst. Technol. (TIST) 2020, 12, 9. [Google Scholar] [CrossRef]
- Pizzati, F.; Charette, R.d.; Zaccaria, M.; Cerri, P. Domain bridge for unpaired image-to-image translation and unsupervised domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2990–2998. [Google Scholar]
- Richardson, E.; Weiss, Y. The surprising effectiveness of linear unsupervised image-to-image translation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Virtual Event, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 7855–7861. [Google Scholar]
- Lu, K.; Yang, D. Image processing and image mining using decision trees. J. Inf. Sci. Eng. 2009, 25, 989–1003. [Google Scholar]
- Brown, M.; Süsstrunk, S. Multispectral SIFT for Scene Category Recognition. In Proceedings of the Computer Vision and Pattern Recognition (CVPR11), Colorado Springs, CO, USA, 20–25 June 2011; pp. 177–184. [Google Scholar]
- Shin, U.; Park, J.; Kweon, I.S. Deep Depth Estimation From Thermal Image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1043–1053. [Google Scholar]
- Vertens, J.; Zürn, J.; Burgard, W. HeatNet: Bridging the Day-Night Domain Gap in Semantic Segmentation with Thermal Images. arXiv 2020, arXiv:2003.04645. [Google Scholar]
- Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 1 June 2024).
- Chen, T.; Guestrin, C. XGBoost Documentation. Available online: https://xgboost.readthedocs.io (accessed on 24 April 2024).
- Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Yue, G.; Zhang, L.; Zhang, J.; Xu, Z.; Wang, S.; Zhou, T.; Gong, Y.; Zhou, W. Subjective quality assessment of thermal infrared images. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, 27–30 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1212–1217. [Google Scholar]
- Zelmati, O.; Bondžulić, B.; Pavlović, B.; Tot, I.; Merrouche, S. Study of subjective and objective quality assessment of infrared compressed images. J. Electr. Eng. 2022, 73, 73–87. [Google Scholar] [CrossRef]
- Lee, D.-G.; Jeon, M.-H.; Cho, Y.; Kim, A. Edge-guided multi-domain RGB-to-TIR image translation for training vision tasks with challenging labels. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 8291–8298. [Google Scholar]
- Panigrahi, B.; Kathala, K.C.R.; Sujatha, M. A machine learning-based comparative approach to predict the crop yield using supervised learning with regression models. Procedia Comput. Sci. 2023, 218, 2684–2693. [Google Scholar] [CrossRef]
- Cai, W.; Wei, Z. PiiGAN: Generative adversarial networks for pluralistic image inpainting. IEEE Access 2020, 8, 48451–48463. [Google Scholar] [CrossRef]
- Bakurov, I.; Buzzelli, M.; Schettini, R.; Castelli, M.; Vanneschi, L. Structural similarity index (SSIM) revisited: A data-driven approach. Expert Syst. Appl. 2022, 189, 116087. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the Conference on Robot Learning, PMLR, Mountain View, CA, USA, 13–15 November 2017; pp. 1–16. [Google Scholar]
- Shaikh, Z.A.; Van Hamme, D.; Veelaert, P.; Philips, W. Probabilistic fusion for pedestrian detection from thermal and colour images. Sensors 2022, 22, 8637. [Google Scholar] [CrossRef] [PubMed]
- Dimitrievski, M.; Van Hamme, D.; Veelaert, P.; Philips, W. Cooperative multi-sensor tracking of vulnerable road users in the presence of missing detections. Sensors 2020, 20, 4817. [Google Scholar] [CrossRef] [PubMed]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).