WEMFusion: Wavelet-Driven Hybrid-Modality Enhancement and Discrepancy-Aware Mamba for Optical–SAR Image Fusion
Highlights
- A unified WEMFusion framework is proposed. It builds an interpretable band-wise prior from multi-scale wavelet transforms and introduces a hybrid-modality enhancement (HME) module. The high-frequency sub-module, HRB, injects trustworthy optical edges and directional textures, while the low-frequency sub-module, LF-CSCM, performs cross-modality structural alignment. This design mitigates, at the fusion source, the entanglement of genuine optical edges with SAR speckle and the erroneous injection that results from it.
- A discrepancy-aware deep interaction mechanism is developed. DAG-MF explicitly injects cross-modality discrepancy and complementarity into the two-dimensional directionally scanned Mamba interaction process, making long-range modeling more selective in conflicting regions and more stable in consistent regions. Together with adaptive weighted aggregation, it dynamically balances prior-enhanced representations and deep fused representations, further improving structural consistency and detail stability.
- The fused results better support downstream interpretation. By suppressing noise mis-diffusion at the band level and constraining structural drift and directional breaks during long-range propagation, the framework produces fusion results that are more reliable in structural integrity, boundary stability, and perceptual consistency, and therefore provides more stable input for downstream tasks such as object detection and semantic segmentation.
- The framework is generalizable and deployable. The paradigm of frequency-domain decoupled priors coupled with discrepancy-driven selective interaction is broadly applicable and can be transferred to other cross-modality remote-sensing fusion tasks. Moreover, leveraging efficient state-space modeling and a lightweight multi-scale design, it better meets practical deployment requirements while maintaining strong performance.
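To make the idea of a band-wise wavelet prior concrete, the following minimal sketch decomposes a small image with a recursive Haar transform and checks that the inverse transform restores it. It is illustrative only: the Haar basis, the function names (`haar_dwt2`, `haar_idwt2`, `decompose`, `reconstruct`), and the plain-Python list representation are assumptions made for this example, not the wavelet or implementation used in WEMFusion.

```python
def haar_dwt2(img):
    """One-level 2D orthonormal Haar DWT of a 2D list with even height and width.

    Returns (ll, lh, hl, hh): one low-frequency band and three detail bands.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2.0)  # scaled local average
            row_lh.append((a + b - c - d) / 2.0)  # top-minus-bottom difference
            row_hl.append((a - b + c - d) / 2.0)  # left-minus-right difference
            row_hh.append((a - b - c + d) / 2.0)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    out = []
    for r in range(len(ll)):
        top, bottom = [], []
        for k in range(len(ll[0])):
            s, v, u, x = ll[r][k], lh[r][k], hl[r][k], hh[r][k]
            top += [(s + v + u + x) / 2.0, (s + v - u - x) / 2.0]
            bottom += [(s - v + u - x) / 2.0, (s - v - u + x) / 2.0]
        out += [top, bottom]
    return out


def decompose(img, levels):
    """Recursively split the low-frequency band; returns (coarse_ll, details)."""
    details, ll = [], img
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))  # finest scale first
    return ll, details


def reconstruct(ll, details):
    """Rebuild the image from the coarsest band and the stored detail bands."""
    for lh, hl, hh in reversed(details):
        ll = haar_idwt2(ll, lh, hl, hh)
    return ll


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
coarse, details = decompose(image, levels=2)
restored = reconstruct(coarse, details)
```

Each level halves both dimensions of the low-frequency band, so an operation applied one scale coarser touches a quarter as many positions. In a fusion setting, the detail bands of the two modalities could be processed separately from the low-frequency bands before calling the inverse transform.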
Abstract
1. Introduction
- A wavelet-driven unified fusion framework: We employ a multi-scale discrete wavelet transform (DWT) with recursive decomposition to construct a frequency-domain prior, providing an interpretable input space for cross-modality alignment, complementary aggregation, and noise suppression. The recursive interactions are further propagated to lower-resolution scales, which substantially reduces computational and memory overhead.
- A hybrid-modality enhancement module (HME): To address the high-frequency discrepancy between genuine optical edges and SAR speckle noise, the high-frequency branch strengthens optical directional textures, while the low-frequency branch models structurally consistent alignment. Together with a lightweight Adaptive Attentive Aggregation for Fusion (A3F), HME enables robust complementary fusion.
- A discrepancy-aware gated Mamba fusion block (DAG-MF): We generate dynamic gates from modality differences and complementary responses to modulate the parameters of a two-dimensional directionally scanned state-space model, so that long-range modeling focuses on discrepant regions while preserving directional coherence.
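The gating idea in the last contribution can be illustrated with a toy one-dimensional example. The sketch below is not DAG-MF: it replaces the two-dimensional directionally scanned state-space model with a simple gated linear recurrence, and the gate formula, the `sharpness`, `threshold`, and `base_decay` parameters, and all function names are hypothetical choices made for illustration. It only shows how a gate derived from the modality difference could shorten the memory of a scan where the modalities conflict and let it propagate smoothly where they agree.

```python
import math


def discrepancy_gate(opt, sar, sharpness=4.0, threshold=0.5):
    """Per-position gate in (0, 1); larger where the two modalities disagree.

    The sigmoid form and both parameters are assumptions for this sketch.
    """
    return [1.0 / (1.0 + math.exp(-sharpness * (abs(o - s) - threshold)))
            for o, s in zip(opt, sar)]


def gated_scan(x, gate, base_decay=0.9):
    """Linear recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    a_t = base_decay * (1 - g_t): a high gate (strong discrepancy) shrinks the
    memory term, so the state follows the local input instead of its history.
    """
    h, out = 0.0, []
    for x_t, g_t in zip(x, gate):
        a_t = base_decay * (1.0 - g_t)
        h = a_t * h + (1.0 - a_t) * x_t
        out.append(h)
    return out


def bidirectional_gated_scan(x, gate, base_decay=0.9):
    """Average a forward and a backward scan (a 1D stand-in for directional scans)."""
    fwd = gated_scan(x, gate, base_decay)
    bwd = gated_scan(x[::-1], gate[::-1], base_decay)[::-1]
    return [(f + b) / 2.0 for f, b in zip(fwd, bwd)]


# Toy feature rows: the modalities agree at the ends and conflict at index 2.
optical = [0.2, 0.2, 0.9, 0.9, 0.2]
sar = [0.2, 0.3, 0.1, 0.8, 0.2]

gate = discrepancy_gate(optical, sar)
mixed = [(o + s) / 2.0 for o, s in zip(optical, sar)]
fused = bidirectional_gated_scan(mixed, gate)
```

With the gate fixed at 1 everywhere the scan returns its input unchanged, and with the gate at 0 it reduces to an ordinary exponential moving average with decay `base_decay`. These two limits make the effect of the gate easy to inspect.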
2. Related Work
2.1. Multimodal Image Fusion
2.2. Wavelet Transform and Multi-Scale Decomposition
2.3. State-Space Models and Mamba
3. Methods
3.1. Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT)
3.2. Overall Architecture
3.3. Feature Extraction
3.4. Feature Fusion
3.4.1. Hybrid-Modality Enhancement (HME)
3.4.2. Discrepancy-Aware Gated Mamba Fusion (DAG-MF)
3.4.3. Adaptive Attentional Feature Fusion (A3F)
3.5. Loss Function
4. Experiments
4.1. Experimental Setup
4.2. Comparison Experiments
4.3. Ablation Experiments
4.3.1. Ablation Settings
4.3.2. Overall Result Analysis
4.3.3. Contribution Analysis of the HME Branches
4.3.4. Verification of the Mamba Feature Extraction Strategy
4.3.5. Verification of the Fusion Interaction Mechanism
4.3.6. Role of the Adversarial Consistency Constraint
4.4. Comparison Experiments on Downstream Segmentation Tasks
4.5. Efficiency Comparison Experiments
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Method | EN | SD | VIFF | | AG | FSIM | PSNR |
|---|---|---|---|---|---|---|---|
| U2Fusion | 6.57 | 34.55 | 0.52 | 0.48 | 9.59 | 0.56 | 19.30 |
| VSFF | 6.88 | 34.17 | 0.28 | 0.52 | 10.38 | 0.48 | 16.32 |
| SwinFusion | 6.89 | 40.27 | 0.71 | 0.87 | 9.74 | 0.61 | 23.11 |
| MLFuse | 6.55 | 31.15 | 0.51 | 0.52 | 7.53 | 0.57 | 20.12 |
| PSFusion | 6.89 | 38.58 | 0.65 | 0.61 | 9.42 | 0.59 | 19.48 |
| SeAFusion | 6.86 | 39.50 | 0.66 | 0.70 | 9.95 | 0.58 | 20.44 |
| TITA | 6.81 | 38.54 | 0.69 | 0.81 | 9.42 | 0.60 | 21.24 |
| MambaDFuse | 6.88 | 40.30 | 0.70 | 0.86 | 9.74 | 0.61 | 23.14 |
| SFMFusion | 6.89 | 40.98 | 0.68 | 0.71 | 10.31 | 0.59 | 18.32 |
| Ours | 6.90 | 40.50 | 0.71 | 0.88 | 9.70 | 0.62 | 23.40 |

| Method | EN | SD | VIFF | | AG | FSIM | PSNR |
|---|---|---|---|---|---|---|---|
| U2Fusion | 6.81 | 38.99 | 0.59 | 0.44 | 15.01 | 0.54 | 18.96 |
| VSFF | 6.73 | 30.89 | 0.17 | 0.44 | 11.74 | 0.44 | 16.69 |
| SwinFusion | 6.90 | 39.94 | 0.71 | 0.59 | 15.96 | 0.57 | 18.12 |
| MLFuse | 6.47 | 31.19 | 0.51 | 0.42 | 11.87 | 0.53 | 18.10 |
| PSFusion | 6.95 | 38.46 | 0.64 | 0.43 | 13.81 | 0.55 | 17.76 |
| SeAFusion | 6.84 | 37.74 | 0.61 | 0.36 | 15.42 | 0.53 | 17.86 |
| TITA | 6.94 | 40.26 | 0.73 | 0.59 | 16.59 | 0.57 | 17.51 |
| MambaDFuse | 6.81 | 38.71 | 0.65 | 0.51 | 15.07 | 0.55 | 17.42 |
| SFMFusion | 6.84 | 39.78 | 0.69 | 0.49 | 16.45 | 0.56 | 14.97 |
| Ours | 6.95 | 40.18 | 0.73 | 0.55 | 16.03 | 0.57 | 17.84 |

| Setting | EN | SD | VIFF | | AG | FSIM | PSNR | SSIM |
|---|---|---|---|---|---|---|---|---|
| (1) w/o HME | 6.901 | 40.46 | 0.70 | 0.86 | 9.69 | 0.61 | 23.26 | 0.583 |
| (2) w/o HRB | 6.894 | 40.29 | 0.71 | 0.87 | 9.68 | 0.62 | 22.94 | 0.592 |
| (3) w/o LF-CSCM | 6.907 | 40.50 | 0.70 | 0.87 | 9.68 | 0.61 | 23.29 | 0.587 |
| (4) w/o Mamba Extractor | 6.893 | 40.24 | 0.69 | 0.82 | 9.67 | 0.61 | 22.70 | 0.582 |
| (5) Shared Extractor | 6.904 | 40.46 | 0.70 | 0.87 | 9.72 | 0.61 | 23.31 | 0.590 |
| (6) Mamba Fusion | 6.901 | 40.39 | 0.71 | 0.87 | 9.71 | 0.61 | 23.26 | 0.591 |
| (7) w/o DAG-MF | 6.903 | 40.45 | 0.70 | 0.87 | 9.68 | 0.61 | 23.42 | 0.577 |
| (8) w/o GAN | 6.893 | 40.18 | 0.70 | 0.84 | 9.73 | 0.61 | 22.71 | 0.594 |
| Full | 6.905 | 40.50 | 0.71 | 0.88 | 9.70 | 0.62 | 23.40 | 0.593 |

| Method | OA | AA | Kappa | Farmland | City | Water | Forest | Others |
|---|---|---|---|---|---|---|---|---|
| U2Fusion | 81.58 | 70.87 | 71.05 | 81.92 | 76.57 | 60.44 | 91.19 | 44.22 |
| VSFF | 82.73 | 74.33 | 72.96 | 80.90 | 81.65 | 62.11 | 92.22 | 54.75 |
| MLFuse | 82.12 | 71.74 | 71.77 | 81.98 | 77.70 | 61.70 | 91.67 | 45.67 |
| SeAFusion | 82.03 | 71.32 | 71.72 | 81.77 | 75.73 | 60.77 | 91.93 | 46.42 |
| PSFusion | 81.73 | 70.69 | 71.16 | 82.04 | 77.93 | 56.36 | 92.01 | 45.10 |
| TITA | 81.22 | 69.62 | 70.42 | 82.82 | 75.39 | 59.03 | 90.71 | 40.14 |
| MambaDFuse | 83.11 | 74.50 | 73.61 | 82.09 | 78.69 | 65.06 | 91.75 | 54.92 |
| SFMFusion | 83.47 | 74.61 | 74.14 | 83.52 | 78.79 | 64.79 | 91.59 | 54.35 |
| Ours | 83.40 | 74.51 | 74.02 | 82.05 | 80.21 | 66.93 | 92.56 | 50.78 |

| Method | Params (M) | Runtime (ms) | Peak Memory (MB) |
|---|---|---|---|
| MLFuse | 0.1118 | 4.941 | 179.59 |
| SeAFusion | 0.1669 | 1.187 | 237.78 |
| SwinFusion | 2.4154 | 431.204 | 824.42 |
| MambaDFuse | 1.3481 | 175.842 | 980.31 |
| TITA | 1.9235 | 446.536 | 753.15 |
| SFMFusion | 8.1375 | 1129.842 | 1360.82 |
| Ours | 1.5677 | 104.181 | 264.75 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, J.; Zhao, Y.; Ma, L.; Zhao, B.; Song, F.; Cai, Z. WEMFusion: Wavelet-Driven Hybrid-Modality Enhancement and Discrepancy-Aware Mamba for Optical–SAR Image Fusion. Remote Sens. 2026, 18, 612. https://doi.org/10.3390/rs18040612

