Enhancing Perception Through Context-Adaptive Visible and SWIR Image Fusion in Harsh Environments
Abstract
1. Introduction
1.1. Related Work
1.1.1. VIS–LWIR Fusion
- Multi-scale fusion methods [13] (e.g., wavelet transform-based fusion [17]) are frequently used to combine fine visible textures with the thermal contrast provided by LWIR. They capture detail efficiently and preserve the overall structure while highlighting thermal information. However, conventional multi-scale schemes based on fixed transforms can be less effective in complex environments dominated by non-linear structures, where their ability to adapt to local variations is limited.
- Neural networks are particularly popular in surveillance systems, where high quality is crucial (e.g., modern night-vision systems). However, they require large datasets to guarantee high-quality training. GAN-based approaches such as TarDAL [18] have shown improved detail preservation and robustness in VIS–LWIR fusion for adverse conditions, but their design is tailored to LWIR characteristics and would require adaptation for SWIR imagery. More recently, deep learning-based LWIR fusion methods have been proposed for enhanced perception in complex outdoor scenes [19], further illustrating the maturity of VIS–LWIR fusion compared to VIS–SWIR.
- Saliency-based methods [20] highlight visually prominent regions to create saliency maps, preserving the integrity of salient objects and enhancing the visual quality. These methods, which often combine saliency analysis and filtering, effectively retain important information but can be complex to implement and require optimisation for different scenarios.
1.1.2. VIS–NIR Fusion
- Complementary information-based fusion [22,23] analyses the differences between the visible and NIR spectra at the physical signal level and designs a fusion model that exploits the characteristics unique to each spectrum. The aim is a more comprehensive representation that improves robustness and detail retention in the fused image. Morphological approaches, such as Top-Hat [23], enhance local contrast and fine structures while preserving colours, with the structuring element size automatically guided by granulometric analysis.
- Pyramid transform [24] is a specific multi-scale approach designed to provide smooth transitions between spectra. It offers effective preservation of spectral detail and produces natural-looking images suitable for visualisation. Compared to traditional multi-scale methods, it can better handle gradual variations, but still requires careful parameter tuning to avoid inconsistent fusion and the appearance of artefacts.
- Neural networks [25] are particularly effective for complex environments. However, as with VIS–LWIR fusion, they require a large volume of training data, which is not always available.
1.1.3. VIS–SWIR Fusion
1.2. Aim and Contributions
- A physical adaptation for VIS–SWIR fusion: We adapt a weight-map-guided multi-scale pyramid architecture to handle the highly decorrelated nature of SWIR data. While building upon foundational concepts originally designed for the NIR band [24], the essential difference lies in our specific physical adaptations for the SWIR spectrum (e.g., tailored pre-processing, weather-driven weighting, and specific tone mapping). These adaptations let the framework process SWIR signals effectively, revealing structural details that are hidden from RGB sensors in degraded conditions.
- Weather-driven parameter scheduling: We address the limitations of fixed-parameter fusion by formulating the hyperparameter selection as a multi-objective optimisation problem. Using an automated offline strategy guided by no-reference image quality assessment (NR-IQA) metrics, we derive context-specific parameter sets. This allows the fusion framework to schedule its parameters based on the meteorological context, rather than relying on empirically chosen static weights.
- Empirical validation as a lightweight baseline: We demonstrate the effectiveness and algorithmic efficiency of the proposed method through evaluations in controlled weather facilities (PAVIN). Furthermore, preliminary assessments on a dynamic, real-world driving dataset (RASMD) indicate that VISWIR can consistently recover local contrast and suppress sensor noise while maintaining natural colour fidelity, establishing it as a robust algorithmic baseline for embedded perception.
1.3. Organisation of the Paper
2. Methodology
2.1. Proposed VISWIR Framework
- Image pre-processing.
- Adaptive weight maps calculation and normalisation.
- Multi-scale pyramid fusion.
- Weather-aware post-processing.
2.1.1. Image Pre-Processing
- SWIR pre-processing. Unlike standard VIS–NIR fusion, where raw contrast is often sufficient, SWIR imagery in adverse weather suffers from severe histogram compression and sensor-specific noise. Data preparation is therefore a necessary step. Initial processing is carried out directly via the camera parameters by selecting an adapted NUC (Non-Uniformity Correction). We then explicitly enhance the image contrast and dynamic range by applying a clipping and histogram stretching procedure. Specifically, to eliminate extreme outliers (e.g., sensor hot pixels) without compromising the core signal, we compute a dynamic clipping threshold $T$ from $\mu$ and $\sigma$, where $\mu$ and $\sigma$ are the global mean and standard deviation of the raw SWIR image. This threshold statistically bounds 99% of the normal intensity distribution. The raw image $I$ is first clipped such that $I_{clip} = \min(I, T)$. Following this, a standard Min–Max normalisation redistributes the clipped intensities across the full 8-bit dynamic scale:
  $$I_{norm} = 255 \cdot \frac{I_{clip} - \min(I_{clip})}{\max(I_{clip}) - \min(I_{clip})}$$
  This operation removes extreme pixel values and redistributes the remaining intensity range over the full dynamic scale, improving local contrast while limiting the influence of outliers. In our experiments, this specific pre-processing proved essential to enable effective fusion under foggy or rainy conditions, where raw SWIR data would otherwise appear flat.
- HSV conversion. The visible image is converted from the RGB to the HSV colour space to facilitate subsequent processing and yield better fusion results. This conversion isolates the hue (H), saturation (S), and value (V) components. The value component (V), which acts as the perceived visible luminance, is extracted and used as the visible input for the core fusion calculations. According to [33], colour spaces such as HSV or YCbCr give better visual results than RGB. Furthermore, HSV separates luminance from the chromatic channels, allowing colour saturation to be better preserved during the fusion process.
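The two pre-processing steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the multiplier `k` in the threshold `mean + k * std` is a placeholder, since the exact threshold formula is not given here. The value channel uses the standard HSV definition V = max(R, G, B).

```python
import numpy as np

def preprocess_swir(raw, k=3.0):
    """Clip values above mean + k*std, then Min-Max stretch to [0, 255].

    `k` is a hypothetical multiplier chosen for illustration only.
    """
    img = raw.astype(np.float64)
    threshold = img.mean() + k * img.std()
    clipped = np.minimum(img, threshold)      # bound hot pixels from above
    lo, hi = clipped.min(), clipped.max()
    if hi == lo:                              # flat image: avoid 0/0
        return np.zeros(clipped.shape, dtype=np.uint8)
    stretched = 255.0 * (clipped - lo) / (hi - lo)
    return np.round(stretched).astype(np.uint8)

def visible_value_channel(rgb):
    """HSV value channel of an H x W x 3 RGB image: per-pixel channel maximum."""
    return rgb.max(axis=-1)

# Synthetic example: a low-contrast frame with one simulated hot pixel.
rng = np.random.default_rng(0)
frame = rng.normal(1000.0, 50.0, size=(64, 64))
frame[0, 0] = 4095.0
stretched = preprocess_swir(frame)
print(stretched.min(), stretched.max())
```

Because the hot pixel is clipped before stretching, it no longer sets the scale of the output on its own, and the remaining intensities occupy a much larger share of the 8-bit range.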
2.1.2. Adaptive Weight Maps Calculation and Normalisation
- Local Contrast (Standard Deviation) C: Assesses the high-frequency variability of the intensities of an input image $I$ within a local neighbourhood $\Omega$. To prevent numerical instabilities (e.g., negative variances due to floating-point precision limits), we apply a strict zero bound prior to the square root:
  $$C = \sqrt{\max\left(0,\ \mu_\Omega(I^2) - \mu_\Omega(I)^2\right)}$$
  where $\mu_\Omega(\cdot)$ denotes the local mean over $\Omega$.
- Local Entropy J: Measures the statistical complexity and information content. For an 8-bit image domain, the maximum possible entropy is 8 bits. We explicitly normalise the standard Shannon entropy over a neighbourhood $\Omega$ to bound the metric within $[0, 1]$:
  $$J = -\frac{1}{8} \sum_{i=0}^{255} p_i \log_2 p_i$$
  where $p_i$ is the probability of occurrence of intensity $i$ within $\Omega$.
- Local Visibility V: Estimates image clarity and local noise level by extracting the high-frequency residual of an input image $I$ from a first Gaussian blur ($G_{\sigma_1}$), and averaging its energy using a second Gaussian filter ($G_{\sigma_2}$):
  $$V = G_{\sigma_2} * \left(I - G_{\sigma_1} * I\right)^2$$
  where ∗ denotes the convolution operator.
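The three cues above can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the window size, the two Gaussian widths, and all function names are hypothetical, and the visibility map is taken as the blurred squared residual, which is one reading of "averaging its energy". The entropy loop is written for clarity, not speed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, size):
    """All size x size neighbourhoods of `img` (odd size, reflective padding)."""
    padded = np.pad(img, size // 2, mode="reflect")
    return sliding_window_view(padded, (size, size))

def local_contrast(img, size=7):
    """Local standard deviation with the variance clamped at zero."""
    win = _windows(img.astype(np.float64), size)
    mean = win.mean(axis=(-2, -1))
    mean_sq = (win ** 2).mean(axis=(-2, -1))
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

def local_entropy(img_u8, size=7):
    """Shannon entropy of each neighbourhood divided by 8 bits, so in [0, 1]."""
    win = _windows(img_u8, size)
    entropy = np.zeros(win.shape[:2])
    for level in np.unique(img_u8):           # only grey levels that occur
        p = (win == level).mean(axis=(-2, -1))
        nonzero = p > 0
        entropy[nonzero] -= p[nonzero] * np.log2(p[nonzero])
    return entropy / 8.0

def gaussian_blur(img, sigma):
    """Separable Gaussian filter built from a truncated 1-D kernel."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img.astype(np.float64), radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def local_visibility(img, sigma1=1.0, sigma2=2.0):
    """Energy of the high-frequency residual, averaged by a second blur."""
    img = img.astype(np.float64)
    residual = img - gaussian_blur(img, sigma1)
    return gaussian_blur(residual ** 2, sigma2)
```

On a constant image all three maps are zero, which is a quick sanity check that none of the cues responds to flat regions.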
2.1.3. Multi-Scale Pyramid Fusion
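A generic weight-map-guided Laplacian pyramid fusion, in the spirit of the pyramid approach of [24], can be sketched as follows. This is a simplified illustration and not the VISWIR implementation: 2x2 block averaging and pixel replication stand in for the usual Gaussian filtering, all function names are hypothetical, and the image sides are assumed to be divisible by 2^(levels - 1).

```python
import numpy as np

def downsample(img):
    """Halve the resolution by 2x2 block averaging (simple low-pass + decimate)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by pixel replication."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def gaussian_pyramid(img, levels):
    """Successively coarser copies of `img`; level 0 is full resolution."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def laplacian_pyramid(img, levels):
    """Band-pass detail per level, with the coarsest low-pass image last."""
    gauss = gaussian_pyramid(img, levels)
    details = [fine - upsample(coarse) for fine, coarse in zip(gauss, gauss[1:])]
    return details + [gauss[-1]]

def fuse(vis, swir, w_vis, w_swir, levels=3, eps=1e-12):
    """Blend two single-channel images level by level using smoothed weights."""
    total = np.maximum(w_vis + w_swir, eps)
    norm_vis = w_vis / total              # per-pixel weights that sum to one
    norm_swir = 1.0 - norm_vis            # falls back to SWIR if both weights are 0
    blended = [
        wv * lv + ws * ls
        for wv, ws, lv, ls in zip(
            gaussian_pyramid(norm_vis, levels),
            gaussian_pyramid(norm_swir, levels),
            laplacian_pyramid(vis, levels),
            laplacian_pyramid(swir, levels),
        )
    ]
    fused = blended[-1]                   # start from the coarsest level
    for detail in reversed(blended[:-1]):
        fused = upsample(fused) + detail  # add finer detail back in
    return fused
```

Blending per level, with the weights themselves smoothed by a Gaussian pyramid, avoids the visible seams that a direct per-pixel weighted average tends to produce where the weight maps change abruptly. With `levels=1` the scheme reduces to that direct weighted average.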
2.1.4. Weather-Aware Post-Processing
2.2. Weather-Driven Parameter Optimisation
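Treating hyperparameter selection as a multi-objective problem means that candidate parameter sets are compared by dominance, not by a single score. The sketch below shows only that comparison step, assuming the four no-reference metrics used in the evaluation; the `Candidate` type and its field names are hypothetical, the scores in the usage lines are made up for illustration, and the search itself (performed with Optuna [34] in the paper) is not reproduced.

```python
from typing import List, NamedTuple, Tuple

class Candidate(NamedTuple):
    name: str
    brisque: float   # lower is better
    niqe: float      # lower is better
    piqe: float      # lower is better
    ne: float        # higher is better

def costs(c: Candidate) -> Tuple[float, float, float, float]:
    """Express every objective as a quantity to minimise."""
    return (c.brisque, c.niqe, c.piqe, -c.ne)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    ca, cb = costs(a), costs(b)
    return all(x <= y for x, y in zip(ca, cb)) and any(x < y for x, y in zip(ca, cb))

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Made-up scores, for illustration only (not results from the paper).
trials = [
    Candidate("trial_a", 10.0, 9.0, 30.0, 0.60),
    Candidate("trial_b", 12.0, 9.5, 35.0, 0.55),   # worse than trial_a everywhere
    Candidate("trial_c", 20.0, 8.0, 40.0, 0.58),   # best NIQE, so it survives
]
print([c.name for c in pareto_front(trials)])
```

A final parameter set per weather class then has to be picked from the non-dominated candidates, which is a separate design decision.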
3. Experimental Setup and Evaluation
3.1. Dataset and Experimental Setup
- Sensors. Two cameras were used: a visible camera (Teledyne Dalsa, Waterloo, ON, Canada; Genie Nano C1630; 1632 × 1248 resolution) and a SWIR camera [10] (SWIR Vision Systems, Morrisville, NC, USA; Acuros® CQD® 1280; 1280 × 1024 resolution). Notably, this study leverages Colloidal Quantum Dot (CQD) technology for the SWIR sensor, offering a high-resolution alternative to the traditional InGaAs sensors predominantly used in the existing literature. While this fine spatial resolution is essential for effective deep multi-scale decomposition, an in-depth radiometric comparison between CQD and InGaAs remains beyond the algorithmic scope of this work. A comprehensive hardware evaluation detailing the signal-to-noise ratio (SNR) and dynamic range of these technologies can be found in the manufacturer’s comparative analysis [41]. Furthermore, the concrete benefits of this specific CQD sensor for downstream perception tasks (e.g., object detection and segmentation) under adverse weather conditions have been previously validated in recent experimental studies [4].
- Meteorological Conditions. Our dataset presented in [4] is composed of a diverse set of images acquired in clear weather (15 images at 1 frame per second), foggy conditions (86 images at 1 frame every 5 s), and rainy scenarios (125 images at 1 frame every 5 s). Weather conditions ranged from clear visibility to dense fog (visibility from 10 m to 400 m) and rain with intensities from 20 to 170 mm/h. A distinctive feature of this dataset is the inclusion of precise meteorological metrics (e.g., exact visibility range and rainfall rate) for every individual frame. This provides a synchronised environmental ground truth that is rarely available in standard public datasets.
3.2. Baselines and Implementation Details
3.3. Evaluation Metrics
- Normalised Entropy (NE) (↑) [35]: This metric estimates the total information content and signal complexity within an image. In the context of multi-spectral fusion, a higher NE score indicates that a greater amount of structural detail and texture from both the visible and SWIR sources has been successfully preserved in the final combined image.
- BRISQUE (↓) [38]: The Blind/Referenceless Image Spatial Quality Evaluator assesses the overall naturalness of an image. It relies on a Support Vector Regression (SVR) model trained on Natural Scene Statistics (NSS) to quantify losses in naturalness. A lower score signifies that the fused image exhibits fewer unnatural artefacts and more closely resembles a clear, undistorted scene.
- NIQE (↓) [36]: The Natural Image Quality Evaluator also leverages NSS but, unlike BRISQUE, operates completely blindly without requiring prior training on distorted images or human subjective scores. It measures the statistical distance between the evaluated image and a corpus of high-fidelity natural images. A lower NIQE value indicates higher perceptual quality and fewer spatial distortions.
- PIQE (↓) [37]: The Perception-based Image Quality Evaluator is an unsupervised metric that evaluates locally perceptible distortions by analysing local block variances. It is primarily designed to penalise blur and blockiness (e.g., compression artefacts). A lower PIQE score reflects an image with high apparent sharpness and distinct edges, though it does not explicitly differentiate between true structural texture and high-frequency sensor noise.
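Of the four metrics above, Normalised Entropy is simple enough to sketch directly. The version below assumes NE is the Shannon entropy [35] of the global grey-level histogram divided by the 8-bit maximum, which is consistent with the local entropy normalisation in Section 2.1.2 but is an assumption about the exact definition; the function name is hypothetical.

```python
import math
from collections import Counter

def normalised_entropy(pixels, bits=8):
    """Shannon entropy of the grey-level histogram divided by `bits`.

    `pixels` is a flat iterable of integer grey levels.
    """
    counts = Counter(pixels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("empty image")
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return entropy / bits

# A constant image carries no information; a uniform histogram is maximal.
print(normalised_entropy([128] * 100))        # 0.0
print(normalised_entropy(list(range(256))))   # 1.0
```

BRISQUE, NIQE and PIQE rely on trained or reference statistical models and are best taken from an existing implementation.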
4. Results
4.1. Optimised Weather-Driven Parameters
4.2. Qualitative Evaluations
4.3. Quantitative Evaluations
- Clear Weather. Our method achieves the best scores for both NE↑ (0.638) and NIQE↓ (8.74) and ranks second for BRISQUE↓ (8.23) and PIQE↓ (36.75). This is a critical result demonstrating that VISWIR does not introduce artificial degradation when the input images are already of high quality. It acts as a non-destructive enhancement layer, unlike certain dehazing methods that inject false contrast or saturation artefacts in the absence of atmospheric scattering.
- Rain Scenarios. VISWIR proves to be the most consistent method across varying rain intensities. It improves BRISQUE↓ over the visible spectrum alone in both light and average rain (e.g., from 28.60 to 16.39 in average rain) and achieves the best NIQE↓ scores overall. Comparatively, Top-Hat often yields lower PIQE↓ scores but fails to significantly improve BRISQUE↓, while TarDAL exhibits unstable behaviour across different rain severities.
- Light Fog (102 m visibility). VISWIR remains highly competitive, ranking second in NIQE↓ (8.75) and achieving a BRISQUE↓ score (8.29) comparable to the best-performing methods. Notably, even under these mild conditions, Pyramid Transform begins to show limitations in naturalness (BRISQUE 13.68), suggesting that its static fusion logic struggles to balance spectral contributions as effectively as our weather-optimised weighting.
- Average Fog (28 m visibility). As conditions deteriorate and visibility drops, VISWIR achieves a balanced performance, securing the best NIQE↓ (9.04) and competitive NE↑ results. Although Pyramid Transform shows strong entropy scores here, its significantly higher BRISQUE↓ (38.90 vs. 12.20 for VISWIR) suggests that the measured “information” contains substantial perceptual distortions, which are heavily penalised by natural scene statistics metrics.
- Heavy Fog (15 m visibility). The advantage of the proposed approach is most evident in this extreme scenario. Pyramid Transform suffers a severe degradation with a BRISQUE↓ score of 72.23 (worse than the visible image at 47.17), indicating a failure to handle low-contrast SWIR data. In contrast, VISWIR maintains a score of 17.02, closely tracking the best score obtained by the SWIR sensor alone (10.91). This confirms that without specific pre-processing and adaptive weighting, standard pyramid fusion cannot cope with extreme signal attenuation. While Pyramid Transform achieves a slightly higher NE↑, this metric does not discriminate between true structural details and the high-frequency sensor noise that the baseline preserves.
- Local Distortion (PIQE) Interpretation. As observed in Table 5, Pyramid Transform frequently achieves the lowest PIQE scores. However, as previously illustrated in the qualitative analysis (Figure 6), this baseline produces a significant high-frequency grain. Because PIQE is a no-reference metric primarily designed to penalise blur and blockiness, it often misinterprets this pervasive sensor noise as “texture” or “sharpness”, resulting in an optimistically low distortion score. This apparent sharpness comes at the cost of overall perceptual naturalness (reflected by its poor BRISQUE performance). Conversely, VISWIR applies adaptive contrast clipping to suppress this noise, resulting in a smoother, cleaner image. While this reduction in high-frequency grain leads to a comparatively higher PIQE score, it represents a truthful restoration of the scene’s structural content without artificial noise injection.
- Summary. Across the full range of evaluated weather scenarios, VISWIR delivers stable, competitive performance rather than the best score on every metric in every scenario. As the PIQE and BRISQUE scores show, certain baselines occasionally peak by favouring high-frequency noise or specific contrast features, whereas VISWIR prioritises a balanced trade-off. Unlike baseline methods that excel in one specific metric or weather condition but significantly underperform in another, VISWIR avoids extreme variations. This stability supports the effectiveness of the weather-aware parameter scheduling, which gives autonomous systems a reliable and perceptually coherent image stream across the evaluated weather severities.
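The stability argument can be checked against the per-condition BRISQUE scores reported in the tables. The sketch below recomputes the "All weather" summary for two methods; that the reported spread is the sample standard deviation (n − 1 denominator) over the seven conditions is inferred from the numbers and is not stated in the text.

```python
from statistics import mean, stdev

# BRISQUE per condition: clear, light/average/heavy rain, light/average/heavy fog.
brisque = {
    "Pyramid Transform": [8.50, 9.18, 34.37, 21.27, 13.68, 38.90, 72.23],
    "VISWIR (Ours)": [8.23, 9.93, 16.39, 10.31, 8.29, 12.20, 17.02],
}

def summarise(scores):
    """Mean and sample standard deviation, rounded as in the tables."""
    return round(mean(scores), 2), round(stdev(scores), 2)

for method, scores in brisque.items():
    avg, spread = summarise(scores)
    print(f"{method}: {avg:.2f} ± {spread:.2f}")
```

This reproduces 28.30 ± 22.73 for Pyramid Transform and 11.77 ± 3.63 for VISWIR, so the spread across conditions is roughly six times smaller for VISWIR on this metric.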
5. Discussions
5.1. Performance Analysis and Methodological Limitations
- Offline Optimisation and Extreme Degradation. While our current “weather-driven parameter scheduling” significantly enhances robustness compared to fixed baselines, the parameter optimisation process (via Optuna) is executed entirely offline to generate discrete sets of hyperparameters. Consequently, during inference, the system selects ready-made parameters from a categorical look-up table rather than computing them in real time, so it is not “dynamically adaptive” in the strict sense. Furthermore, limitations are still observed under extreme degradation (e.g., dense fog with visibility under 15 m), where the signal itself is severely attenuated. Future work could investigate continuous, real-time parameter optimisation to handle these edge cases more effectively than the current discrete approach.
- Metrics. It should also be noted that the metrics used in this study are designed for, and in some cases trained on, visible images and are not specific to SWIR. This can inherently favour visible images in the evaluation process and, in some cases, lead to counter-intuitive results. For example, in the case of PIQE in average fog, the smoothing effect of fog can reduce local variance and yield lower (better) scores for visible images, even when SWIR reveals more scene details to a human observer. Similarly, BRISQUE relies on Natural Scene Statistics (NSS) that multispectral fusion images (VIS–IR or VIS–SWIR) may deviate from, especially when high-frequency noise is preserved, leading to an overestimation of perceived degradation. To reduce metric sensitivity to high-frequency noise, light denoising was applied to the luminance channel of TarDAL and V-SWIR-IF outputs before evaluation, without affecting colour information. Our method and Top-Hat did not require this adjustment. Therefore, future research should develop customised metrics that better reflect the characteristics of SWIR imagery and provide a more balanced assessment of fusion quality across spectral domains.
- Computational Efficiency. Execution times were measured on a standard CPU architecture (Intel i7) using images. The proposed CPU implementation averages 0.70 s per pair, slightly faster than the Pyramid Transform method (∼0.77 s) and significantly faster than Top-Hat and V-SWIR-IF (which both require ∼2.5 s). Notably, despite running entirely on a CPU, it remains competitive with the GPU-accelerated TarDAL (<0.5 s). This confirms the algorithmic efficiency of VISWIR on standard hardware, establishing it as a strong, lightweight algorithmic baseline for future embedded perception systems rather than a fully deployed real-time solution.
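The categorical look-up described under Offline Optimisation above amounts to a dictionary access at inference time. The sketch below uses the values from the optimised-parameter table; `levels` is assumed to correspond to the integer column L, and the field names `p1`, `p3` and `p4` are hypothetical placeholders for the remaining columns. How the weather class is obtained at run time is outside the scope of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FusionParams:
    p1: float      # placeholder name; searched in [0.0, 1.0]
    levels: int    # assumed to be L, searched in {1, ..., 6}
    p3: float      # placeholder name; searched in [1.0, 5.0]
    p4: float      # placeholder name; searched in [0.01, 4.0]

# One pre-computed parameter set per weather class (offline optimisation).
PARAMS_BY_WEATHER = {
    "clear": FusionParams(0.89, 5, 1.07, 2.82),
    "rain": FusionParams(0.85, 3, 1.59, 1.77),
    "fog": FusionParams(0.90, 1, 2.31, 2.13),
}

def select_params(weather: str) -> FusionParams:
    """Return the stored parameter set for a weather class label."""
    key = weather.strip().lower()
    if key not in PARAMS_BY_WEATHER:
        raise ValueError(f"no parameter set for weather class {weather!r}")
    return PARAMS_BY_WEATHER[key]

print(select_params("Fog"))
```

The selection cost is negligible next to the fusion itself, which is why the scheduling adds no measurable overhead, and also why it cannot react to conditions that fall between the three stored classes.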
5.2. Dynamic Scenario Assessment
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Mohammed, A.S.; Amamou, A.; Ayevide, F.K.; Kelouwani, S.; Agbossou, K.; Zioui, N. The perception system of intelligent ground vehicles in all weather conditions: A systematic literature review. Sensors 2020, 20, 6532.
2. Bijelic, M.; Gruber, T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11682–11692.
3. Bertozzi, M.; Fedriga, R.I.; Miron, A.; Reverchon, J.L. Pedestrian detection in poor visibility conditions: Would SWIR help? In Proceedings of the International Conference on Image Analysis and Processing; Springer: Berlin/Heidelberg, Germany, 2013; pp. 229–238.
4. Mehra, R.; Riffard, A.; Labussière, M.; Duthon, P.; Aufrère, R. Would SWIR Modality Help for Detection and Segmentation in Harsh Weather Conditions? An Experimental Study. In Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Honolulu, HI, USA, 19–20 October 2025; pp. 2232–2240.
5. Wolff, L.B.; Socolinsky, D.A.; Eveland, C.K.; Yalcin, J.I.; Holloway, J.H., Jr. Image fusion of shortwave infrared (SWIR) and visible for detection of mines, obstacles, and camouflage. In Proceedings of the Detection and Remediation Technologies for Mines and Minelike Targets VIII; SPIE: Bellingham, WA, USA, 2003; Volume 5089, pp. 1298–1306.
6. Pavlović, M.S.; Milanović, P.D.; Stanković, M.S.; Perić, D.B.; Popadić, I.V.; Perić, M.V. Deep learning based SWIR object detection in long-range surveillance systems: An automated cross-spectral approach. Sensors 2022, 22, 2562.
7. Park, J.; Hong, J.; Shim, W.; Jung, D.J. Multi-object tracking on swir images for city surveillance in an edge-computing environment. Sensors 2023, 23, 6373.
8. Liandrat, S.; Duthon, P.; Bernardin, F.; Daoued, A.B.; Bicard, J.L. A review of Cerema PAVIN fog & rain platform: From past and back to the future. In Proceedings of the ITS World Congress, Los Angeles, CA, USA, 18–22 September 2022.
9. Riffard, A.; Labussière, M.; Duthon, P.; Aufrère, R. Exploitation d’un capteur proche infrarouge (SWIR) pour la perception des robots mobiles en conditions météorologiques difficiles. In Proceedings of the Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP’24), Lille, France, 1–3 July 2024.
10. Gregory, C.; Hilton, A.; Violette, K.; Klem, E.J. 66-3: Invited paper: Colloidal quantum dot photodetectors for large format NIR, SWIR, and eSWIR imaging arrays. In Proceedings of the SID Symposium Digest of Technical Papers; Wiley Online Library: Hoboken, NJ, USA, 2021; Volume 52, pp. 982–986.
11. Pinchon, N.; Cassignol, O.; Nicolas, A.; Bernardin, F.; Leduc, P.; Tarel, J.P.; Brémond, R.; Bercier, E.; Brunet, J. All-weather vision for automotive safety: Which spectral band? In Proceedings of the International Forum on Advanced Microsystems for Automotive Applications; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–15.
12. Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178.
13. Xiao, G.; Bavirisetti, D.P.; Liu, G.; Zhang, X. Image Fusion; Springer: Berlin/Heidelberg, Germany, 2020.
14. Jin, Y.; Kovac, M.; Nalcakan, Y.; Park, I.; Yeo, S.; Ju, H.; Kim, S. RASMD: RGB and SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions. Inf. Fusion 2025, 128, 103872.
15. Li, Y.; Moreau, J.; Ibanez-Guzman, J. Emergent visual sensors for autonomous vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 4716–4737.
16. Luo, Y.; Luo, Z. Infrared and visible image fusion: Methods, datasets, applications, and prospects. Appl. Sci. 2023, 13, 10891.
17. Xu, L.; Du, J.; Zhang, Z. Infrared-visible video fusion based on motion-compensated wavelet transforms. IET Image Process. 2015, 9, 318–328.
18. Liu, J.; Fan, X.; Huang, Z.; Wu, G.; Liu, R.; Zhong, W.; Luo, Z. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the IEEE/CVF Conference On Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5802–5811.
19. Zhuang, C.; Kuang, H.; Wang, H.; Wen, C.; Liu, C.; Yuan, G. PHFuse: Unsupervised color visible and infrared image fusion with preserved hue. Sci. Rep. 2025, 15, 31458.
20. Ma, J.; Zhou, Z.; Wang, B.; Zong, H. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Phys. Technol. 2017, 82, 8–17.
21. Dümbgen, F.; El Helou, M.; Gucevska, N.; Süsstrunk, S. Near-infrared fusion for photorealistic image dehazing. In IS&T EI Proceedings; Ingenta: Burlingame, CA, USA, 2018.
22. Awad, M.; Elliethy, A.; Aly, H.A. Adaptive near-infrared and visible fusion for fast image enhancement. IEEE Trans. Comput. Imaging 2019, 6, 408–418.
23. Herrera-Arellano, M.; Peregrina-Barreto, H.; Terol-Villalobos, I. Visible-NIR image fusion based on top-hat transform. IEEE Trans. Image Process. 2021, 30, 4962–4972.
24. Vanmali, A.V.; Gadre, V.M. Visible and NIR image fusion using weight-map-guided Laplacian–Gaussian pyramid for improving scene visibility. Sādhanā 2017, 42, 1063–1082.
25. Jung, C.; Han, Q.; Zhou, K.; Xu, Y. Multispectral fusion of rgb and nir images using weighted least squares and convolution neural networks. IEEE Open J. Signal Process. 2021, 2, 559–570.
26. Fang, H.; Su, G.; Xu, G.; Cheng, C. V-SWIR-IF: Visible and Short-Wave Infrared Image Fusion. In Proceedings of the 2023 4th International Symposium on Computer Engineering and Intelligent Communications (ISCEIC); IEEE: New York, NY, USA, 2023; pp. 275–280.
27. Huang, W.; Zhang, W.; Tu, Z.; Qin, Y.; Bi, H. Short-wave infrared and visible image fusion based on a dual-band polarization imaging sensor. In Proceedings of the 2024 International Conference on Optoelectronic Information and Optical Engineering (OIOE 2024); SPIE: Bellingham, WA, USA, 2024; Volume 13182, pp. 35–41.
28. Huang, P.; Liu, X.; Zhao, S.; Ma, R.; Dong, H.; Wang, C.; Cao, H.; Shen, C. Shortwave infrared and visible light image fusion method based on dual discriminator GAN. Phys. Scr. 2024, 99, 036005.
29. Li, H.; Zhang, L.; Shen, H.; Li, P. A variational gradient-based fusion method for visible and SWIR imagery. Photogramm. Eng. Remote. Sens. 2012, 78, 947–958.
30. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
31. Grigorescu, S.; Trasnea, B.; Cocias, T.; Macesanu, G. A survey of deep learning techniques for autonomous driving. J. Field Robot. 2020, 37, 362–386.
32. Michaelis, C.; Mitzkus, B.; Geirhos, R.; Rusak, E.; Bringmann, O.; Ecker, A.S.; Bethge, M.; Brendel, W. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv 2019, arXiv:1907.07484.
33. Fredembach, C.; Süsstrunk, S. Colouring the near-infrared. In Proceedings of the Color and Imaging Conference; Society of Imaging Science and Technology: Springfield, VA, USA, 2008; Volume 16, pp. 176–182.
34. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
35. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
36. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212.
37. Venkatanath, N.; Praneeth, D.; Bhatlapenumarti, M.C.; Sumohana, S.C.; Swarup, S.M. Blind image quality evaluation using perception based features. In Proceedings of the 2015 Twenty First National Conference On Communications (NCC); IEEE: New York, NY, USA, 2015; pp. 1–6.
38. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708.
39. Lai, W.S.; Huang, J.B.; Wang, O.; Shechtman, E.; Yumer, E.; Yang, M.H. Learning blind video temporal consistency. In Proceedings of the European Conference On Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 170–185.
40. Lindenberger, P.; Sarlin, P.E.; Pollefeys, M. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 17627–17638.
41. SWIR Vision Systems. SWIR Vision Systems Acuros® vs Sony® IMX990: A Closer Look at Key Metrics and Performance Whitepaper; SWIR Vision Systems: Morrisville, NC, USA, 2022.
42. Ouattara, H.; Duthon, P.; Salmane, P.H.; Bernardin, F.; Aider, O.A. Heuristic Style Transfer for Real-Time, Efficient Weather Attribute Detection. arXiv 2026, arXiv:2604.13947.
43. Loumouamou, Y.; Riffard, A.; Mehra, R.; Labussière, M. Étude expérimentale du fine-tuning de YOLOv8 pour la détection routière multimodale RGB-SWIR. In Proceedings of the Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP’26), Montpellier, France, 6–8 July 2026.

| Weather Condition | | L | | |
|---|---|---|---|---|
| Search Space | [0.0, 1.0] | {1, …, 6} | [1.0, 5.0] | [0.01, 4.0] |
| Clear | 0.89 | 5 | 1.07 | 2.82 |
| Rain | 0.85 | 3 | 1.59 | 1.77 |
| Fog | 0.90 | 1 | 2.31 | 2.13 |

NE ↑ by weather condition and method:

| Weather Conditions (Visibility (m) or Rainfall Rate (mm/h)) | VIS | SWIR | TarDAL | Top-Hat | Pyramid Transform | V-SWIR-IF | VISWIR (Ours) |
|---|---|---|---|---|---|---|---|
| Clear weather (30 km; 0 mm/h) | 0.610 | 0.602 | 0.606 | 0.607 | 0.611 | 0.619 | 0.638 |
| Light rain (51 mm/h) | 0.657 | 0.632 | 0.648 | 0.652 | 0.663 | 0.664 | 0.650 |
| Average rain (87 mm/h) | 0.629 | 0.640 | 0.647 | 0.628 | 0.646 | 0.639 | 0.639 |
| Heavy rain (174 mm/h) | 0.633 | 0.616 | 0.636 | 0.624 | 0.644 | 0.637 | 0.640 |
| Light fog (102 m) | 0.646 | 0.616 | 0.657 | 0.646 | 0.652 | 0.650 | 0.637 |
| Average fog (28 m) | 0.613 | 0.612 | 0.605 | 0.599 | 0.631 | 0.620 | 0.620 |
| Heavy fog (15 m) | 0.515 | 0.603 | 0.597 | 0.491 | 0.589 | 0.506 | 0.588 |
| All weather (mean ± std) | 0.615 ± 0.05 | 0.617 ± 0.01 | 0.628 ± 0.02 | 0.607 ± 0.05 | 0.634 ± 0.03 | 0.619 ± 0.05 | 0.630 ± 0.02 |

BRISQUE ↓ by weather condition and method:

| Weather Conditions (Visibility (m) or Rainfall Rate (mm/h)) | VIS | SWIR | TarDAL | Top-Hat | Pyramid Transform | V-SWIR-IF | VISWIR (Ours) |
|---|---|---|---|---|---|---|---|
| Clear weather (30 km; 0 mm/h) | 10.30 | 9.77 | 15.64 | 6.75 | 8.50 | 12.13 | 8.23 |
| Light rain (51 mm/h) | 12.55 | 8.27 | 9.04 | 10.24 | 9.18 | 8.34 | 9.93 |
| Average rain (87 mm/h) | 28.60 | 11.78 | 33.09 | 20.36 | 34.37 | 44.31 | 16.39 |
| Heavy rain (174 mm/h) | 9.66 | 9.12 | 15.03 | 10.82 | 21.27 | 34.93 | 10.31 |
| Light fog (102 m) | 12.53 | 10.44 | 8.14 | 8.16 | 13.68 | 23.20 | 8.29 |
| Average fog (28 m) | 23.06 | 9.84 | 38.44 | 19.39 | 38.90 | 55.98 | 12.20 |
| Heavy fog (15 m) | 47.17 | 10.91 | 51.35 | 43.27 | 72.23 | 91.92 | 17.02 |
| All weather (mean ± std) | 20.55 ± 13.73 | 10.02 ± 1.16 | 24.39 ± 16.65 | 17.00 ± 12.74 | 28.30 ± 22.73 | 38.69 ± 29.00 | 11.77 ± 3.63 |

NIQE ↓ by weather condition and method:

| Weather Conditions (Visibility (m) or Rainfall Rate (mm/h)) | VIS | SWIR | TarDAL | Top-Hat | Pyramid Transform | V-SWIR-IF | VISWIR (Ours) |
|---|---|---|---|---|---|---|---|
| Clear weather (30 km; 0 mm/h) | 10.32 | 9.82 | 10.20 | 10.24 | 10.19 | 10.15 | 8.74 |
| Light rain (51 mm/h) | 9.76 | 9.72 | 9.92 | 9.82 | 10.04 | 10.53 | 9.53 |
| Average rain (87 mm/h) | 11.30 | 11.87 | 12.03 | 12.27 | 11.52 | 12.37 | 11.10 |
| Heavy rain (174 mm/h) | 9.78 | 9.86 | 11.50 | 11.68 | 10.75 | 11.53 | 9.29 |
| Light fog (102 m) | 9.40 | 8.60 | 10.65 | 9.48 | 9.33 | 9.40 | 8.75 |
| Average fog (28 m) | 10.94 | 9.28 | 11.81 | 11.54 | 9.93 | 10.94 | 9.04 |
| Heavy fog (15 m) | 12.56 | 10.58 | 12.41 | 12.76 | 11.07 | 12.79 | 11.88 |
| All weather (mean ± std) | 10.58 ± 1.11 | 9.96 ± 1.04 | 11.22 ± 0.96 | 11.11 ± 1.27 | 10.40 ± 0.75 | 11.10 ± 1.21 | 9.76 ± 1.23 |

PIQE ↓ by weather condition and method:

| Weather Conditions (Visibility (m) or Rainfall Rate (mm/h)) | VIS | SWIR | TarDAL | Top-Hat | Pyramid Transform | V-SWIR-IF | VISWIR (Ours) |
|---|---|---|---|---|---|---|---|
| Clear weather (30 km; 0 mm/h) | 41.41 | 43.02 | 60.29 | 50.51 | 32.37 | 58.50 | 36.75 |
| Light rain (51 mm/h) | 48.44 | 39.55 | 56.05 | 46.26 | 36.00 | 61.85 | 42.16 |
| Average rain (87 mm/h) | 41.77 | 36.00 | 65.73 | 49.02 | 27.90 | 70.22 | 37.20 |
| Heavy rain (174 mm/h) | 49.46 | 36.02 | 56.87 | 43.11 | 28.98 | 68.14 | 39.21 |
| Light fog (102 m) | 45.07 | 46.22 | 57.16 | 41.85 | 32.83 | 60.46 | 41.96 |
| Average fog (28 m) | 32.38 | 36.81 | 57.10 | 27.28 | 24.56 | 51.76 | 36.27 |
| Heavy fog (15 m) | 36.25 | 19.00 | 65.13 | 44.28 | 17.23 | 61.83 | 18.76 |
| All weather (mean ± std) | 42.11 ± 6.22 | 36.66 ± 8.69 | 59.76 ± 4.10 | 43.19 ± 7.67 | 28.55 ± 6.23 | 61.82 ± 6.12 | 36.04 ± 7.99 |

| Modality | Entropy (↑) | NIQE (↓) | PIQE (↓) | BRISQUE (↓) |
|---|---|---|---|---|
| Visible Image | 0.61 | 8.98 | 38.59 | 45.21 |
| SWIR Image | 0.61 | 8.68 | 51.61 | 48.35 |
| VISWIR (Ours) | 0.64 | 8.93 | 35.59 | 38.65 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Riffard, A.; Labussière, M.; Duthon, P.; Aufrère, R. Enhancing Perception Through Context-Adaptive Visible and SWIR Image Fusion in Harsh Environments. Sensors 2026, 26, 4035. https://doi.org/10.3390/s26134035

