Beyond RGB: Early Stage Fusion of Thermal and Visual Modalities for Robust Maritime Perception
Abstract
1. Introduction
- Comprehensive Benchmarking: We present a systematic evaluation of multimodal (RGBT) early fusion architectures for maritime semantic segmentation and object detection. The benchmark spans transformer-based, attention-based, and lightweight CNN models, all retrained on a rigorously aligned RGBT maritime dataset.
- Modular Fusion Architecture: We introduce the WNet architecture, a flexible fusion framework that enables interchangeable encoders, fusion modules, and decoders. This modular design supports systematic exploration of architecture trade-offs and achieves a superior performance-to-cost ratio, making it suitable for edge deployment.
- Performance Efficiency Analysis: We provide practical insights into fusion depth, model complexity, and real-time deployability through a detailed analysis of the inference speed and parameter count. This helps identify architectures that are both accurate and resource-efficient.
2. Materials and Methods
2.1. Dataset
2.2. Architectures and Fusion Strategies
- Single-modality baselines: RGB-only and thermal-only U-Net models are used to quantify the contribution of each modality individually. Additionally, U-Net variants with depthwise separable convolutions, as proposed in MobileNet [31], were benchmarked, since the ultimate goal is deployment on edge devices.
- Early fusion baseline: A four-channel U-Net (RGB + thermal concatenated at the input) served as the simplest early fusion strategy.
- Established RGBT double-encoder fusion networks: (1) RTFNet [14], one of the first architectures to introduce a multilevel fusion strategy for RGB and thermal images, performs simple addition-based one-way feature propagation between modalities. (2) GMNet [16] integrates both shallow and deep feature fusion, depending on the encoder depth. It also employs multiple training losses, including boundary loss, binary mask loss, and semantic mask loss.
- Attention-based fusion: SA-Gate [19], which has shown promising results in the RGB-D domain, was also benchmarked here. It employs bidirectional feature propagation with attention gates to selectively combine features from both modalities.
- Proposed modular network: WNet corresponds to the general schematic shown in Figure 2. It was implemented as a double-encoder U-Net baseline but designed modularly, allowing the encoder (e.g., SegFormer or ResNet), fusion strategy, decoder structure, and depth to be configured. This modular design enables systematic experiments across numerous architecture variants, balancing feature interaction, depth, and computational cost; a minimal sketch of this double-encoder fusion pattern is given after this list.
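The following is a minimal PyTorch sketch, not the published WNet implementation, of the double-encoder early fusion pattern described above: one encoder branch per modality, an interchangeable per-level fusion operator, and a shared U-Net-style decoder. The class name, channel widths, and the addition/maximum fusion options are illustrative assumptions.

```python
# Minimal double-encoder fusion sketch (illustrative, not the authors' code).
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class DualEncoderFusionNet(nn.Module):
    """Two-branch encoder with per-level fusion and a U-Net-style decoder."""
    def __init__(self, widths=(16, 32, 64), fusion: str = "add", n_classes: int = 1):
        super().__init__()
        self.fusion = fusion
        rgb_chans = [3] + list(widths)       # RGB branch channel progression
        thr_chans = [1] + list(widths)       # thermal branch channel progression
        self.rgb_enc = nn.ModuleList(conv_block(rgb_chans[i], rgb_chans[i + 1]) for i in range(len(widths)))
        self.thr_enc = nn.ModuleList(conv_block(thr_chans[i], thr_chans[i + 1]) for i in range(len(widths)))
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(widths[i + 1], widths[i], 2, stride=2) for i in range(len(widths) - 1))
        self.dec = nn.ModuleList(conv_block(2 * widths[i], widths[i]) for i in range(len(widths) - 1))
        self.head = nn.Conv2d(widths[0], n_classes, 1)

    def fuse(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Interchangeable fusion operator: simple addition or element-wise maximum.
        return a + b if self.fusion == "add" else torch.maximum(a, b)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        skips, x_r, x_t = [], rgb, thermal
        for i, (enc_r, enc_t) in enumerate(zip(self.rgb_enc, self.thr_enc)):
            x_r, x_t = enc_r(x_r), enc_t(x_t)
            skips.append(self.fuse(x_r, x_t))           # fused features feed the decoder skips
            if i < len(self.rgb_enc) - 1:
                x_r, x_t = self.pool(x_r), self.pool(x_t)
        x = skips[-1]
        for i in reversed(range(len(self.dec))):
            x = self.up[i](x)
            x = self.dec[i](torch.cat([x, skips[i]], dim=1))
        return self.head(x)

# Example: pixel-aligned 640 x 512 RGB and thermal frames.
logits = DualEncoderFusionNet()(torch.randn(1, 3, 512, 640), torch.randn(1, 1, 512, 640))
```

Replacing `fuse` with an attention gate, or swapping the convolutional encoders for SegFormer blocks, would correspond to the other fusion configurations benchmarked in this study.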
2.3. Training Set-Up
2.4. Evaluation Metrics
- Segmentation metrics: (1) The intersection over union (IoU) [35] measures the overlap between the predicted and ground truth masks. (2) The Dice similarity coefficient (equivalent to the F1 score) [36] emphasizes the balance between precision and recall. (3) The Matthews correlation coefficient [37] provides a balanced evaluation even under class imbalance. The binarization threshold was selected on the validation set to maximize the mean IoU.
- Detection: Bounding boxes were derived from segmentation masks and evaluated using precision, recall, and the F1 score [38]. True positives, false positives, and false negatives were determined using an IoU threshold of 0.5 between predicted and ground truth boxes [35]. This threshold aligns with widely adopted standards (e.g., PASCAL VOC and COCO) and offers a practical balance between localization precision and recall; higher thresholds would disproportionately penalize small-object detections, which are critical in maritime scenarios. To further analyze robustness, the results were stratified by object size into four categories: small, medium, large, and valid. The valid category included all objects larger than 1.75 × 1.75 px, sharing its lower bound with the small category, while the thresholds for the medium and large categories were set at 8 × 8 px and 20 × 20 px, respectively. This stratification highlights performance differences across scales, which is particularly relevant for the small and distant objects common in maritime environments (a minimal sketch of this evaluation procedure is given after this list).
- Efficiency: The inference time (milliseconds per frame) and model parameter count were measured to quantify the trade-off between accuracy and practical deployability on resource-constrained edge devices. These metrics provide insight into whether an architecture can support real-time operation onboard vessels without exceeding hardware limitations.
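As a concrete reference, the sketch below shows how the segmentation metrics, mask-derived bounding boxes, and size buckets described above could be computed with NumPy and scikit-image; it is not the authors' evaluation code. The 0.5 box-IoU threshold and the 1.75 px, 8 px, and 20 px boundaries follow the text, while the helper names and the use of the shorter box side for bucketing are assumptions.

```python
# Illustrative metric helpers (assumed implementation, not the paper's code).
import numpy as np
from skimage import measure  # connected components for box extraction

def mask_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """IoU, Dice, and Matthews correlation for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = float(np.logical_and(pred, gt).sum())
    fp = float(np.logical_and(pred, ~gt).sum())
    fn = float(np.logical_and(~pred, gt).sum())
    tn = float(np.logical_and(~pred, ~gt).sum())
    iou = tp / max(tp + fp + fn, 1.0)
    dice = 2 * tp / max(2 * tp + fp + fn, 1.0)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"iou": iou, "dice": dice, "mcc": mcc}

def boxes_from_mask(mask: np.ndarray) -> list:
    """Derive (x0, y0, x1, y1) boxes from the connected components of a binary mask."""
    return [(r.bbox[1], r.bbox[0], r.bbox[3], r.bbox[2])
            for r in measure.regionprops(measure.label(mask))]

def box_iou(a, b) -> float:
    """IoU between two boxes; a match at >= 0.5 counts as a true positive."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def size_bucket(box) -> str:
    """Size category per the 1.75/8/20 px thresholds (assuming the shorter side decides);
    'valid' spans every bucket from 'small' upward."""
    side = min(box[2] - box[0], box[3] - box[1])
    if side < 1.75:
        return "ignored"
    if side < 8:
        return "small"
    if side < 20:
        return "medium"
    return "large"
```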
3. Results
3.1. Semantic Segmentation
3.2. Object Detection
3.3. Efficiency Analysis
- (a) Inference time vs. F1 detection score: This plot illustrates the balance between computational speed and detection accuracy. The transformer-based CMX-b2 achieved the highest F1 score but at the cost of significantly longer inference times, making it less suitable for real-time edge deployment. In contrast, lightweight architectures such as WNet-S and SA-Gate delivered competitive F1 scores while maintaining inference times below 3 ms per frame, highlighting their suitability for embedded systems.
- (b) Inference time vs. false positives: This subplot shows how efficiency correlates with detection reliability. The models with longer inference times (e.g., CMX-b2) generally exhibited fewer false positives, while faster models like WNet-S maintained a reasonable balance, suggesting that efficiency-focused designs do not necessarily compromise robustness.
- (c) Inference time vs. Matthews correlation coefficient: This metric provides a holistic view of binary classification quality. CMX-b2 again led in overall performance, but WNet-S and SA-Gate achieved strong Matthews coefficients with minimal computational overhead, reinforcing their practicality for real-time maritime applications (a minimal measurement sketch is given after this list).
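For context, below is a minimal sketch of how per-frame latency and parameter count (the quantities compared in this section) can be measured for a PyTorch model on a CUDA device; the warm-up and repeat counts are illustrative choices, not the benchmark settings used in the paper, and the two-input `model(rgb, thermal)` signature is assumed.

```python
# Illustrative latency and model-size measurement (assumed, not the paper's benchmark code).
import torch

def count_parameters(model: torch.nn.Module) -> float:
    """Trainable parameters in millions (cf. the 'Size' column of the result tables)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

@torch.no_grad()
def time_per_frame_ms(model, rgb, thermal, warmup: int = 10, repeats: int = 100) -> float:
    """Average forward-pass latency in milliseconds; model and inputs must be on CUDA."""
    model.eval()
    for _ in range(warmup):                      # warm-up to stabilize clocks and caches
        model(rgb, thermal)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(repeats):
        model(rgb, thermal)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / repeats     # elapsed_time() returns milliseconds

# e.g. size_m = count_parameters(model); ms = time_per_frame_ms(model.cuda(), rgb.cuda(), thermal.cuda())
```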
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- International Maritime Organization. Guidelines for the Onboard Operational Use of Shipborne Automatic Identification Systems (AIS), 2nd ed.; Number A.1106(29) in IMO Resolutions; IMO: London, UK, 2014. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III 18; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef]
- Brenner, M.; Reyes, N.H.; Susnjak, T.; Barczak, A.L. RGB-D and thermal sensor fusion: A systematic literature review. IEEE Access 2023, 11, 82410–82442. [Google Scholar] [CrossRef]
- El Ahmar, W.; Massoud, Y.; Kolhatkar, D.; AlGhamdi, H.; Alja’Afreh, M.; Laganiere, R.; Hammoud, R. Enhanced Thermal-RGB Fusion for Robust Object Detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 365–374. [Google Scholar] [CrossRef]
- Ha, Q.; Watanabe, K.; Karasawa, T.; Ushiku, Y.; Harada, T. MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: New York, NY, USA, 2017; pp. 5108–5115. [Google Scholar]
- Yang, Z.; Li, Y.; Tang, X.; Xie, M. MGFusion: A multimodal large language model-guided information perception for infrared and visible image fusion. Front. Neurorobotics 2024, 18, 1521603. [Google Scholar] [CrossRef] [PubMed]
- Ying, X.; Xiao, C.; An, W.; Li, R.; He, X.; Li, B.; Cao, X.; Li, Z.; Wang, Y.; Hu, M.; et al. Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 6088–6096. [Google Scholar] [CrossRef]
- Zhou, J.; Liu, Y.; Peng, B.; Liu, L.; Li, X. MaDiNet: Mamba Diffusion Network for SAR Target Detection. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 10787–10800. [Google Scholar] [CrossRef]
- Guo, Y.; Liu, R.W.; Qu, J.; Lu, Y.; Zhu, F.; Lv, Y. Asynchronous trajectory matching-based multimodal maritime data fusion for vessel traffic surveillance in inland waterways. IEEE Trans. Intell. Transp. Syst. 2023, 24, 12779–12792. [Google Scholar] [CrossRef]
- Yao, S.; Guan, R.; Wu, Z.; Ni, Y.; Huang, Z.; Wen Liu, R.; Yue, Y.; Ding, W.; Gee Lim, E.; Seo, H.; et al. WaterScenes: A Multi-Task 4D Radar-Camera Fusion Dataset and Benchmarks for Autonomous Driving on Water Surfaces. IEEE Trans. Intell. Transp. Syst. 2024, 25, 16584–16598. [Google Scholar] [CrossRef]
- Sun, Y.; Zuo, W.; Liu, M. RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes. IEEE Robot. Autom. Lett. 2019, 4, 2576–2583. [Google Scholar] [CrossRef]
- Sun, Y.; Zuo, W.; Yun, P.; Wang, H.; Liu, M. FuseSeg: Semantic segmentation of urban scenes based on RGB and thermal data fusion. IEEE Trans. Autom. Sci. Eng. 2020, 18, 1000–1011. [Google Scholar] [CrossRef]
- Zhou, W.; Liu, J.; Lei, J.; Yu, L.; Hwang, J.N. GMNet: Graded-feature multilabel-learning network for RGB-thermal urban scene semantic segmentation. IEEE Trans. Image Process. 2021, 30, 7790–7802. [Google Scholar] [CrossRef]
- Hazirbas, C.; Ma, L.; Domokos, C.; Cremers, D. FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture. In Proceedings of the Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, 20–24 November 2016; pp. 213–228. [Google Scholar]
- Hu, X.; Yang, K.; Fei, L.; Wang, K. ACNet: Attention based network to exploit complementary features for RGB-D semantic segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: New York, NY, USA, 2019; pp. 1440–1444. [Google Scholar]
- Chen, X.; Lin, K.Y.; Wang, J.; Wu, W.; Qian, C.; Li, H.; Zeng, G. Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. In Proceedings of the European Conference on Computer Vision, Virtual, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 561–577. [Google Scholar]
- Zhang, J.; Liu, H.; Yang, K.; Hu, X.; Liu, R.; Stiefelhagen, R. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14679–14694. [Google Scholar] [CrossRef]
- Zhou, W.; Zhang, H.; Yan, W.; Lin, W. MMSMCNet: Modal memory sharing and morphological complementary networks for RGB-T urban scene semantic segmentation. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7096–7108. [Google Scholar] [CrossRef]
- Shivakumar, S.S.; Rodrigues, N.; Zhou, A.; Miller, I.D.; Kumar, V.; Taylor, C.J. PST900: RGB-thermal calibration, dataset and segmentation network. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Virtual, 31 May–31 August 2020; IEEE: New York, NY, USA, 2020; pp. 9441–9447. [Google Scholar]
- Kafka, O.; Kaufmann, J.; Rankl, C. SEANet: RGB and Thermal Maritime Panoptic Dataset and Intermodal Alignment Procedure. In Proceedings of the 2025 IEEE 9th Forum on Research and Technologies for Society and Industry (RTSI), Tunis, Tunisia, 24–26 August 2025; pp. 202–207. [Google Scholar] [CrossRef]
- Santos, C.E.; Bhanu, B. DyFusion: Dynamic IR/RGB Fusion for Maritime Vessel Recognition. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 1328–1332. [Google Scholar] [CrossRef]
- Baltrušaitis, T.; Ahuja, C.; Morency, L.P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 423–443. [Google Scholar] [CrossRef] [PubMed]
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: New York, NY, USA, 2018; pp. 1–8. [Google Scholar] [CrossRef]
- Liang, M.; Yang, B.; Chen, Y.; Hu, R.; Urtasun, R. Multi-Task Multi-Sensor Fusion for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7345–7353. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27 June–1 July 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar] [CrossRef]
- Lv, Y.; Liu, Z.; Li, G. Context-Aware Interaction Network for RGB-T Semantic Segmentation. IEEE Trans. Multimed. 2024, 26, 6348–6360. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
- Berman, M.; Triki, A.R.; Blaschko, M.B. The lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4413–4421. [Google Scholar]
- Yu, J.; Blaschko, M. Learning submodular losses with the Lovász hinge. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 1623–1631. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
- Dice, L.R. Measures of the Amount of Ecologic Association Between Species. Ecology 1945, 26, 297–302. [Google Scholar] [CrossRef]
- Matthews, B.W. Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme. Biochim. Biophys. Acta (BBA) Protein Struct. 1975, 405, 442–451. [Google Scholar] [CrossRef]
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27 June–1 July 2016; pp. 779–788. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014, Proceedings, Part V 13; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]




| Property | Count or Range | Description |
|---|---|---|
| Total image pairs | ∼8000 | 6000 training, 2000 validation |
| Image resolution | 640 × 512 px | Pixel-aligned RGB and thermal pairs |
| Geographic regions | 20+ | Throughout the Atlantic Ocean |
| Illumination conditions | 2 | Daytime or nighttime (80:20 ratio) |
| Weather conditions | 4 | Clear, foggy, rainy, overcast |
| Object categories | 18 | Vessels, buoys, containers, wooden logs, etc. |
| Sensor platform | SEA.AI embedded system | RGB + LWIR cameras |
| Model | Time (ms) | Size (M Params) | Threshold | IoU (%) | Dice (%) | Matthews (%) |
|---|---|---|---|---|---|---|
| RTFNet-R18 | 4.13 | 31.00 | 0.28 | 31.74 | 39.76 | 41.18 |
| GMNet-R18 | 6.70 | 31.98 | 0.25 | 33.17 | 41.11 | 42.39 |
| GMNet-R18-1L * | 6.66 | 31.87 | 0.28 | 32.76 | 40.57 | 41.81 |
| GMNet-R34 | 8.14 | 52.20 | 0.22 | 32.04 | 39.88 | 41.16 |
| SAGate-R18 | 4.28 | 9.82 | 0.35 | 32.93 | 40.80 | 42.11 |
| SAGate-R18-1L | 4.38 | 9.67 | 0.30 | 32.99 | 40.95 | 42.33 |
| SAGate-R34 | 5.71 | 14.99 | 0.32 | 34.46 | 42.46 | 43.74 |
| WNet | 2.94 | 5.17 | 0.32 | 32.69 | 40.57 | 41.97 |
| WNet-S ** | 2.52 | 0.62 | 0.28 | 33.13 | 41.12 | 42.46 |
| WNet-S-Deeper | 3.67 | 2.25 | 0.28 | 32.82 | 40.89 | 42.29 |
| WNet-MLP | 2.24 | 0.88 | 0.32 | 32.96 | 41.07 | 42.41 |
| WNet-CMX | 5.06 | 4.95 | 0.25 | 32.41 | 40.51 | 41.97 |
| WNet-FFM | 5.18 | 4.51 | 0.25 | 32.67 | 40.46 | 41.76 |
| CMX-b0 | 6.98 | 12.11 | 0.30 | 33.99 | 41.84 | 43.05 |
| CMX-b2 | 11.68 | 66.56 | 0.38 | 36.67 | 44.61 | 45.76 |
| UNetS-S-4i | 1.99 | 0.35 | 0.28 | 31.99 | 39.89 | 41.29 |
| UNet-S-4i | 2.40 | 0.29 | 0.25 | 31.68 | 39.68 | 41.19 |
| UNetS-S-THR | 2.02 | 0.35 | 0.22 | 30.67 | 38.36 | 39.87 |
| UNet-S-THR | 2.31 | 0.29 | 0.22 | 30.40 | 38.20 | 39.58 |
| UNetS-S-RGB | 2.04 | 0.35 | 0.15 | 16.52 | 23.00 | 24.57 |
| UNet-S-RGB | 2.28 | 0.29 | 0.10 | 16.17 | 22.57 | 24.28 |
| Model | Valid Prec | Valid Rec | Valid F1 | Small Prec | Small Rec | Small F1 | Large Prec | Large Rec | Large F1 |
|---|---|---|---|---|---|---|---|---|---|
| RTFNet-R18 | 45.41 | 42.88 | 44.11 | 28.62 | 23.54 | 25.83 | 78.36 | 76.06 | 77.19 |
| GMNet-R18 | 46.47 | 44.77 | 45.60 | 28.77 | 24.56 | 26.50 | 76.07 | 75.57 | 75.82 |
| GMNet-R18-1L | 49.80 | 45.19 | 47.39 | 31.64 | 24.02 | 27.31 | 76.82 | 75.57 | 76.19 |
| GMNet-R34 | 46.87 | 43.63 | 45.19 | 28.86 | 23.18 | 25.71 | 78.31 | 74.10 | 76.15 |
| SAGate-R18 | 50.56 | 47.44 | 48.95 | 35.06 | 29.32 | 31.93 | 69.67 | 76.71 | 73.02 |
| SAGate-R18-1L | 49.18 | 47.70 | 48.43 | 34.51 | 30.52 | 32.40 | 74.38 | 78.01 | 76.15 |
| SAGate-R34 | 50.82 | 50.70 | 50.76 | 34.89 | 31.91 | 33.33 | 72.96 | 83.06 | 77.68 |
| WNet-S | 50.66 | 51.61 | 51.13 | 37.27 | 36.48 | 36.87 | 70.02 | 71.50 | 70.75 |
| WNet | 49.82 | 49.95 | 49.89 | 38.39 | 36.85 | 37.60 | 62.95 | 68.89 | 65.79 |
| WNet-S-Deeper | 48.28 | 51.58 | 49.87 | 35.32 | 38.11 | 36.66 | 69.42 | 72.48 | 70.92 |
| WNet-MLP | 44.39 | 50.64 | 47.31 | 32.96 | 37.09 | 34.90 | 61.24 | 72.31 | 66.32 |
| WNet-CMX | 48.86 | 50.11 | 49.48 | 35.98 | 36.30 | 36.14 | 67.37 | 73.29 | 70.20 |
| WNet-FFM | 51.65 | 51.12 | 51.38 | 37.63 | 35.82 | 36.71 | 69.27 | 73.78 | 71.45 |
| CMX-b0 | 53.47 | 50.67 | 52.03 | 37.70 | 32.39 | 34.84 | 76.12 | 80.46 | 78.23 |
| CMX-b2 | 56.06 | 54.22 | 55.13 | 40.01 | 35.34 | 37.53 | 81.48 | 82.41 | 81.94 |
| UNetS-S-4i | 48.38 | 50.64 | 49.48 | 35.11 | 36.54 | 35.81 | 68.95 | 70.52 | 69.73 |
| UNet-S-4i | 48.19 | 49.79 | 48.97 | 36.42 | 35.76 | 36.09 | 61.73 | 68.57 | 64.97 |
| UNetS-S-THR | 47.84 | 48.26 | 48.05 | 35.99 | 35.34 | 35.66 | 66.49 | 61.40 | 63.84 |
| UNet-S-THR | 45.91 | 48.06 | 46.96 | 34.35 | 36.54 | 35.41 | 61.97 | 62.38 | 62.18 |