Modulated Diffusion with Spatial–Spectral Disentangled Guidance for Hyperspectral Image Super-Resolution
Highlights
- A Dynamic Modulated Residual Network (DMRN) with time-aware Feature-wise Linear Modulation (FiLM) provides adaptive spatial–spectral conditional guidance throughout the diffusion denoising process. Combined with the side-branch injection architecture, this design counteracts feature dilution and hallucination and yields a 0.65 dB PSNR improvement over static guidance even with only 5 sampling steps.
- A training-free Spatial–Spectral Disentangled Guidance (SSDG) strategy explicitly decouples spatial and spectral guidance during sampling, enabling flexible control over modality trade-offs. It achieves the best overall performance under blind Gaussian noise (σ = 0.01, approximately 23 dB SNR) across three hyperspectral benchmark datasets, demonstrating its generalization ability.
- The proposed framework demonstrates that dynamic, timestep-aware conditioning is essential for diffusion-based multi-modal fusion, offering a generalizable design principle applicable beyond hyperspectral imaging to other sensor fusion tasks requiring noise robustness.
- The SSDG mechanism provides a plug-and-play, retraining-free knob for downstream users to balance spatial fidelity against spectral consistency, making the method readily adaptable to diverse remote sensing applications such as land monitoring, ecological detection, and agricultural analysis.
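The second highlight pairs a noise standard deviation of 0.01 with an SNR of roughly 23 dB, and those two numbers together fix the mean signal power. A quick arithmetic check follows, assuming SNR is defined as 10·log10 of the mean squared signal over the noise variance for data scaled to [0, 1]; the exact SNR definition used in the experiments is an assumption here.

```python
import math


def snr_db(mean_signal_power: float, sigma: float) -> float:
    """SNR in dB for additive white Gaussian noise with standard deviation sigma."""
    return 10.0 * math.log10(mean_signal_power / sigma**2)


def implied_signal_power(snr: float, sigma: float) -> float:
    """Invert snr_db: mean squared signal needed to reach `snr` dB at noise level sigma."""
    return sigma**2 * 10.0 ** (snr / 10.0)


power = implied_signal_power(23.0, 0.01)
print(f"mean signal power ~ {power:.4f}, RMS ~ {math.sqrt(power):.3f}")
# prints: mean signal power ~ 0.0200, RMS ~ 0.141
```

In other words, σ = 0.01 corresponds to about 23 dB only when the signal RMS is around 0.14; brighter or darker scenes shift the effective SNR accordingly.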
Abstract
1. Introduction
- We propose a time-aware diffusion framework for HS–MS fusion, comprising a Dynamic Modulated Residual Network (DMRN) for adaptive conditional denoising and a Spatial–Spectral Disentangled Guidance (SSDG) mechanism for flexible sampling.
- DMRN introduces a time-aware side-injection mechanism with FiLM that dynamically adapts the conditional guidance to each denoising stage, and incorporates frequency-domain constraints aligned with the residual learning objective.
- SSDG extends Classifier-Free Guidance to multi-modal fusion, enabling training-free, explicit control over spatial and spectral guidance weights to mitigate modality conflicts and suit diverse downstream tasks.
- Extensive experiments on three public datasets (Chikusei, Houston, and KSC) demonstrate state-of-the-art performance and superior robustness to noise, while maintaining high computational efficiency.
2. Related Work
2.1. Hyperspectral Image Fusion
2.2. Diffusion Models for HSI-SR
2.3. Conditional Guidance and Disentanglement
3. Materials and Methods
3.1. Problem Formulation
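HS–MS fusion is commonly posed with an observation model in which the low-resolution HSI (LR-HSI) is a blurred, decimated version of the target HR-HSI and the high-resolution MSI (HR-MSI) is its projection through the sensor's spectral response function (SRF). A minimal NumPy sketch of that common model follows; the box-average blur, the scale factor of 4 and the random SRF are illustrative assumptions, not degradation settings taken from this paper.

```python
import numpy as np


def simulate_observations(x_hr: np.ndarray, srf: np.ndarray, ratio: int):
    """Simulate (LR-HSI, HR-MSI) from a reference HR-HSI.

    x_hr:  (H, W, C) high-resolution hyperspectral cube
    srf:   (c, C) spectral response matrix, one row per MS band (assumed)
    ratio: integer spatial downsampling factor (assumed)
    """
    h, w, bands = x_hr.shape
    if h % ratio or w % ratio:
        raise ValueError("H and W must be divisible by ratio")
    # Spatial degradation: average each ratio x ratio block
    # (a simple stand-in for Gaussian blur followed by decimation).
    y_lr = x_hr.reshape(h // ratio, ratio, w // ratio, ratio, bands).mean(axis=(1, 3))
    # Spectral degradation: each MS band is an SRF-weighted sum of HS bands.
    z_ms = x_hr @ srf.T
    return y_lr, z_ms


rng = np.random.default_rng(0)
x = rng.random((64, 64, 128))            # toy HR-HSI with 128 bands
srf = rng.random((4, 128))
srf /= srf.sum(axis=1, keepdims=True)    # each MS band's weights sum to 1
y, z = simulate_observations(x, srf, ratio=4)
print(y.shape, z.shape)  # (16, 16, 128) (64, 64, 4)
```

Fusion methods then estimate the HR-HSI from the pair (y, z); swapping the box filter for a Gaussian kernel changes only the spatial step.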
3.2. Diffusion Preliminaries
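For reference, the standard DDPM forward process [13] perturbs a clean sample x₀ as x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε, with ε ~ N(0, I) and ᾱ_t the cumulative product of (1 − β_t). The sketch below implements that closed-form sampling step with the linear β schedule from the DDPM paper; whether this work uses the same schedule, and whether x₀ is the HR-HSI itself or a residual with respect to an upsampled LR-HSI, are assumptions left open here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear schedule from DDPM (Ho et al., 2020)
alpha_bar = np.cumprod(1.0 - betas)     # cumulative product: alpha_bar[t]


def q_sample(x0: np.ndarray, t: int, noise: np.ndarray) -> np.ndarray:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


rng = np.random.default_rng(0)
x0 = rng.random((16, 16, 31)) - 0.5     # toy target; shape is illustrative
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t=500, noise=eps)
# A noise-prediction network would be trained to recover eps from (x_t, t, conditions).
```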
3.3. Dynamic Modulated Residual Network (DMRN)
3.3.1. Overall Architecture
3.3.2. Side-Injection Mechanism
3.3.3. Time-Aware Dynamic Modulation
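FiLM [18] conditions a feature map through a per-channel affine transform γ ⊙ h + β. In a time-aware variant, γ and β are predicted from the diffusion timestep, so the same condition features can be weighted differently at early (noisy) and late (nearly clean) denoising stages. The sketch below is a hypothetical NumPy illustration of that idea: the sinusoidal timestep embedding, the single linear projection, the `TimeFiLM` and `side_inject` names, and the additive injection into the main branch are all assumptions, not the DMRN layers themselves.

```python
import numpy as np


def timestep_embedding(t: int, dim: int) -> np.ndarray:
    """Sinusoidal timestep embedding of size dim (dim assumed even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


class TimeFiLM:
    """Hypothetical time-aware FiLM: timestep -> per-channel (gamma, beta)."""

    def __init__(self, emb_dim: int, channels: int, rng: np.random.Generator, init_scale: float = 0.1):
        self.emb_dim = emb_dim
        self.channels = channels
        # One linear layer producing 2 * channels outputs; init_scale = 0 makes the
        # module start as the identity (gamma = 1, beta = 0).
        self.w = init_scale * rng.standard_normal((emb_dim, 2 * channels))
        self.b = np.zeros(2 * channels)

    def __call__(self, cond_feat: np.ndarray, t: int) -> np.ndarray:
        """cond_feat: (H, W, C) condition features; returns gamma * cond_feat + beta."""
        params = timestep_embedding(t, self.emb_dim) @ self.w + self.b
        gamma = 1.0 + params[: self.channels]
        beta = params[self.channels :]
        return gamma * cond_feat + beta  # broadcasts over H and W


def side_inject(main_feat: np.ndarray, cond_feat: np.ndarray, film: TimeFiLM, t: int) -> np.ndarray:
    """Add time-modulated condition features to the denoiser's main-branch features."""
    return main_feat + film(cond_feat, t)


rng = np.random.default_rng(0)
film = TimeFiLM(emb_dim=32, channels=16, rng=rng)
cond = rng.standard_normal((8, 8, 16))
main = rng.standard_normal((8, 8, 16))
out_early = side_inject(main, cond, film, t=900)  # high-noise stage
out_late = side_inject(main, cond, film, t=10)    # low-noise stage
```

A static (time-independent) condition module corresponds to fixing γ and β for every t, which is the kind of baseline the "w/o Time FiLM" ablation row suggests.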
3.4. Spatial–Spectral Disentangled Guidance (SSDG)
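The highlights describe SSDG as a training-free extension of classifier-free guidance (CFG) [37] that assigns separate weights to spatial and spectral guidance. One way to realize such a rule, borrowed from compositional CFG, is to treat the HR-MSI Z as the spatial cue and the LR-HSI Y as the spectral cue and to add their guidance directions independently: ε̂ = ε(x_t, ∅, ∅) + w_spa [ε(x_t, ∅, Z) − ε(x_t, ∅, ∅)] + w_spe [ε(x_t, Y, ∅) − ε(x_t, ∅, ∅)]. The sketch below implements this assumed form; the exact SSDG formula, the use of condition dropout to obtain the null predictions, and the `ssdg_eps` and `toy_model` names are assumptions.

```python
from typing import Callable, Optional

import numpy as np

# eps_model(x_t, t, lr_hsi, hr_msi) -> predicted noise; None means "condition dropped".
EpsModel = Callable[[np.ndarray, int, Optional[np.ndarray], Optional[np.ndarray]], np.ndarray]


def ssdg_eps(model: EpsModel, x_t: np.ndarray, t: int, lr_hsi: np.ndarray, hr_msi: np.ndarray,
             w_spa: float, w_spe: float) -> np.ndarray:
    """Hypothetical disentangled guidance: separate weights for spatial and spectral cues."""
    eps_null = model(x_t, t, None, None)   # both conditions dropped
    eps_spa = model(x_t, t, None, hr_msi)  # spatial cue only (HR-MSI)
    eps_spe = model(x_t, t, lr_hsi, None)  # spectral cue only (LR-HSI)
    return eps_null + w_spa * (eps_spa - eps_null) + w_spe * (eps_spe - eps_null)


def toy_model(x_t, t, lr_hsi, hr_msi):
    """Linear stand-in for a trained denoiser, used only to exercise ssdg_eps."""
    out = 0.1 * x_t
    if lr_hsi is not None:
        out = out + lr_hsi
    if hr_msi is not None:
        out = out + 2.0 * hr_msi
    return out


rng = np.random.default_rng(0)
x_t, y, z = (rng.standard_normal((8, 8, 4)) for _ in range(3))  # illustrative shapes
guided = ssdg_eps(toy_model, x_t, 10, y, z, w_spa=1.5, w_spe=0.8)
```

Setting w_spa = w_spe = 1 recovers the fully conditioned prediction only when the model is additive in the two conditions, as the toy model is; a real network generally is not, so the weights act as a trade-off knob rather than an exact decomposition. This form also costs three network evaluations per sampling step.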
3.5. Training Objective and Efficient Sampling
Algorithm 1: Training Algorithm
Algorithm 2: Sampling Algorithm
3.5.1. Frequency-Aware Optimization Objective
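The ablation on optimization objectives compares a pure MSE loss, an LP constraint and a DWT constraint. A minimal sketch of a DWT-based term follows, assuming a one-level orthonormal Haar transform, an L1 penalty on the three detail sub-bands and a hypothetical weight `lam`; the wavelet, number of levels, sub-band weighting and the quantity the term is applied to (for example a predicted clean residual) are assumptions.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """One-level orthonormal 2D Haar DWT of an (H, W, C) array with even H and W.

    Returns the approximation sub-band and a tuple of the three detail sub-bands.
    """
    a = x[0::2, 0::2]
    b = x[0::2, 1::2]
    c = x[1::2, 0::2]
    d = x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    details = ((a - b + c - d) / 2.0,   # differences along the width axis
               (a + b - c - d) / 2.0,   # differences along the height axis
               (a - b - c + d) / 2.0)   # diagonal differences
    return approx, details


def frequency_aware_loss(pred: np.ndarray, target: np.ndarray, lam: float = 0.1) -> float:
    """MSE plus a hypothetical L1 penalty on Haar detail sub-bands (lam is assumed)."""
    mse = np.mean((pred - target) ** 2)
    _, det_p = haar_dwt2(pred)
    _, det_t = haar_dwt2(target)
    high_freq = sum(np.mean(np.abs(p - q)) for p, q in zip(det_p, det_t))
    return float(mse + lam * high_freq)


rng = np.random.default_rng(0)
target = rng.random((32, 32, 8))
pred_offset = target + 0.01                                            # smooth error
pred_noisy = target + 0.01 * rng.choice([-1.0, 1.0], size=target.shape)  # high-frequency error
loss_offset = frequency_aware_loss(pred_offset, target)
loss_noisy = frequency_aware_loss(pred_noisy, target)
```

Both perturbations have the same MSE, but only the random-sign one is penalized by the detail term, which is the kind of high-frequency error a wavelet constraint is meant to expose.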
3.5.2. Accelerated Inference via DDIM
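DDIM [41] accelerates inference by running a deterministic (η = 0) update on a short sub-sequence of the training timesteps: the noise estimate gives x̂₀ = (x_t − √(1 − ᾱ_t) · ε̂) / √ᾱ_t, which is then re-noised to the next, earlier timestep. The sketch below implements that update with an evenly spaced sub-sequence; the spacing rule, the linear β schedule and the `ddim_sample` and `oracle_eps` names are assumptions, and a conditional model (optionally wrapped with guidance) would be passed in as `eps_model`.

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed linear schedule


def ddim_sample(eps_model, shape, alpha_bar, num_steps, rng):
    """Deterministic DDIM sampling (eta = 0) over num_steps evenly spaced timesteps."""
    total = len(alpha_bar)
    timesteps = np.linspace(total - 1, 0, num_steps).round().astype(int)  # high -> low noise
    x = rng.standard_normal(shape)
    for i, t in enumerate(timesteps):
        eps = eps_model(x, t)
        # Predict the clean sample from the current noisy state.
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        if i + 1 < num_steps:
            t_prev = timesteps[i + 1]
            # Re-noise x0_hat to the next timestep, reusing the same noise direction.
            x = np.sqrt(alpha_bar[t_prev]) * x0_hat + np.sqrt(1.0 - alpha_bar[t_prev]) * eps
        else:
            x = x0_hat
    return x


# Sanity check with an oracle that knows the true clean sample.
rng = np.random.default_rng(0)
x0_true = rng.random((8, 8, 4))


def oracle_eps(x_t, t):
    return (x_t - np.sqrt(alpha_bar[t]) * x0_true) / np.sqrt(1.0 - alpha_bar[t])


sample_5 = ddim_sample(oracle_eps, x0_true.shape, alpha_bar, num_steps=5, rng=rng)
```

With an exact noise estimate the result does not depend on the number of steps; a learned model is only approximately exact, so in practice the step count trades accuracy against run time.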
3.6. Implementation Details
3.7. Datasets
3.8. Comparison Methods and Metrics
- CNN-based: LAGConv [22].
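The tables report PSNR, SSIM, SAM, ERGAS and RMSE. Common definitions of four of these are sketched below in NumPy (SSIM is omitted for brevity). Band-wise PSNR averaged over bands with a peak value of 1, SAM in degrees averaged over pixels, and ERGAS with `ratio` equal to the spatial scale factor are assumed conventions; the evaluation code behind the tables may differ in details such as data range or border cropping.

```python
import numpy as np


def rmse(ref: np.ndarray, est: np.ndarray) -> float:
    """Root-mean-square error over all pixels and bands."""
    return float(np.sqrt(np.mean((ref - est) ** 2)))


def psnr(ref: np.ndarray, est: np.ndarray, peak: float = 1.0) -> float:
    """Band-wise PSNR (dB) averaged over bands; inputs are (H, W, C) in [0, peak]."""
    mse_b = np.mean((ref - est) ** 2, axis=(0, 1))
    return float(np.mean(10.0 * np.log10(peak**2 / mse_b)))


def sam_degrees(ref: np.ndarray, est: np.ndarray, eps: float = 1e-12) -> float:
    """Spectral Angle Mapper: mean angle (degrees) between per-pixel spectra."""
    dot = np.sum(ref * est, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))


def ergas(ref: np.ndarray, est: np.ndarray, ratio: int) -> float:
    """ERGAS = (100 / ratio) * sqrt(mean over bands of (RMSE_b / mu_b)^2)."""
    rmse_b = np.sqrt(np.mean((ref - est) ** 2, axis=(0, 1)))
    mu_b = np.mean(ref, axis=(0, 1))
    return float(100.0 / ratio * np.sqrt(np.mean((rmse_b / mu_b) ** 2)))


rng = np.random.default_rng(0)
ref = rng.random((32, 32, 31))
est = ref + 0.01
print(round(psnr(ref, est), 2), round(rmse(ref, est), 4))  # 40.0 0.01
```

SAM ignores per-pixel scaling (a spectrum and twice that spectrum have zero angle), which is why it is reported alongside intensity-sensitive metrics such as RMSE and ERGAS.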
4. Results
4.1. Overall Generalization and Efficiency Comparison
4.2. Quantitative Comparison
4.3. Qualitative Comparison
5. Discussion
5.1. Ablation Studies
5.1.1. Dynamic Modulated Residual Network
5.1.2. Spatial–Spectral Disentangled Guidance
5.1.3. Impact of Optimization Strategy
5.2. Analysis of Hyperparameters
5.3. Analysis of Time-Aware Condition Module
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Goetz, A.F.H.; Vane, G.; Solomon, J.E.; Rock, B.N. Imaging Spectrometry for Earth Remote Sensing. Science 1985, 228, 1147–1153.
2. Asner, G. Hyperspectral Remote Sensing of Canopy Chemistry, Physiology, and Biodiversity in Tropical Rainforests. In Hyperspectral Remote Sensing of Tropical and Sub-Tropical Forests; CRC Press: Boca Raton, FL, USA, 2008; pp. 261–296.
3. Thenkabail, P.S.; Smith, R.B.; De Pauw, E. Hyperspectral Vegetation Indices and Their Relationships with Agricultural Crop Characteristics. Remote Sens. Environ. 2000, 71, 158–182.
4. Wei, Q.; Bioucas-Dias, J.; Dobigeon, N.; Tourneret, J.Y. Hyperspectral and Multispectral Image Fusion Based on a Sparse Representation. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3658–3668.
5. Akhtar, N.; Shafait, F.; Mian, A. Sparse Spatio-spectral Representation for Hyperspectral Image Super-resolution. In Proceedings of the Computer Vision—ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 63–78.
6. Yokoya, N.; Yairi, T.; Iwasaki, A. Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion. IEEE Trans. Geosci. Remote Sens. 2012, 50, 528–537.
7. Zhu, Z.; Hou, J.; Chen, J.; Zeng, H.; Zhou, J. Hyperspectral Image Super-Resolution via Deep Progressive Zero-Centric Residual Learning. IEEE Trans. Image Process. 2021, 30, 1423–1438.
8. Liu, X.; Liu, Q.; Wang, Y. Remote sensing image fusion based on two-stream fusion network. Inf. Fusion 2020, 55, 1–15.
9. Zhang, X.; Huang, W.; Wang, Q.; Li, X. SSR-NET: Spatial–Spectral Reconstruction Network for Hyperspectral and Multispectral Image Fusion. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5953–5965.
10. Gao, Y.; Zhang, M.; Wang, J.; Li, W. Cross-Scale Mixing Attention for Multisource Remote Sensing Data Fusion and Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15.
11. Hu, J.F.; Huang, T.Z.; Deng, L.J.; Dou, H.X.; Hong, D.; Vivone, G. Fusformer: A Transformer-Based Fusion Network for Hyperspectral Image Super-Resolution. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
12. Chen, L.; Vivone, G.; Qin, J.; Chanussot, J.; Yang, X. Spectral–Spatial Transformer for Hyperspectral Image Sharpening. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 16733–16747.
13. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6840–6851.
14. Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
15. Saharia, C.; Ho, J.; Chan, W.; Salimans, T.; Fleet, D.J.; Norouzi, M. Image Super-Resolution Via Iterative Refinement. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4713–4726.
16. Cao, Z.; Cao, S.; Wu, X.; Hou, J.; Ran, R.; Deng, L.J. DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion. arXiv 2023, arXiv:2304.04774.
17. Wu, C.; Wang, D.; Bai, Y.; Mao, H.; Li, Y.; Shen, Q. HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 7060–7070.
18. Perez, E.; Strub, F.; De Vries, H.; Dumoulin, V.; Courville, A. FiLM: Visual Reasoning with a General Conditioning Layer. Proc. AAAI Conf. Artif. Intell. 2018, 32, 3942–3951.
19. Zhang, L.; Rao, A.; Agrawala, M. Adding Conditional Control to Text-to-Image Diffusion Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 3813–3824.
20. Cao, M.; Bao, W.; Qu, K.; Zhang, X.; Ma, X. Nonlocal Low-Rank Regularization for Hyperspectral and High-Resolution Remote Sensing Image Fusion. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 1444–1447.
21. Xue, J.; Zhao, Y.Q.; Bu, Y.; Liao, W.; Chan, J.C.W.; Philips, W. Spatial-Spectral Structured Sparse Low-Rank Representation for Hyperspectral Image Super-Resolution. IEEE Trans. Image Process. 2021, 30, 3084–3097.
22. Jin, Z.R.; Zhang, T.J.; Jiang, T.X.; Vivone, G.; Deng, L.J. LAGConv: Local-Context Adaptive Convolution Kernels with Global Harmonic Bias for Pansharpening. Proc. AAAI Conf. Artif. Intell. 2022, 36, 1113–1121.
23. Hu, J.F.; Huang, T.Z.; Deng, L.J.; Jiang, T.X.; Vivone, G.; Chanussot, J. Hyperspectral Image Super-Resolution via Deep Spatiospectral Attention Convolutional Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 7251–7265.
24. Deng, S.Q.; Deng, L.J.; Wu, X.; Ran, R.; Hong, D.; Vivone, G. PSRT: Pyramid Shuffle-and-Reshuffle Transformer for Multispectral and Hyperspectral Image Fusion. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5503715.
25. Xiao, N.; Fu, X.; Ren, Q.; He, W.; Wei, S.; Jia, S. Region-Aware MoE Network for Hyperspectral and Multispectral Image Fusion. IEEE Trans. Geosci. Remote Sens. 2026, 64, 5510015.
26. Xiao, J.; Li, J.; Yuan, Q.; Jiang, M.; Zhang, L. Physics-Based GAN with Iterative Refinement Unit for Hyperspectral and Multispectral Image Fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6827–6841.
27. Zhu, C.; Deng, S.; Zhou, Y.; Deng, L.J.; Wu, Q. QIS-GAN: A Lightweight Adversarial Network with Quadtree Implicit Sampling for Multispectral and Hyperspectral Image Fusion. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5531115.
28. Liu, J.; Wu, Z.; Xiao, L. A Spectral Diffusion Prior for Unsupervised Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5528613.
29. Zhao, Z.; Bai, H.; Zhu, Y.; Zhang, J.; Xu, S.; Zhang, Y.; Zhang, K.; Meng, D.; Timofte, R.; Van Gool, L. DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 8048–8059.
30. Zhu, J.; Wang, H.; Xu, Y.; Wu, Z.; Wei, Z. Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 17862–17871.
31. Shi, Y.; Liu, Y.; Cheng, J.; Wang, Z.J.; Chen, X. VDMUFusion: A Versatile Diffusion Model-Based Unsupervised Framework for Image Fusion. IEEE Trans. Image Process. 2025, 34, 441–454.
32. Zeng, Y.; Huang, W.; Liu, M.; Zhang, H.; Zou, B. Fusion of satellite images in urban area: Assessing the quality of resulting images. In Proceedings of the 2010 18th International Conference on Geoinformatics, Beijing, China, 18–20 June 2010; pp. 1–4.
33. Qu, J.; He, J.; Dong, W.; Zhao, J. S2CycleDiff: Spatial-Spectral-Bilateral Cycle-Diffusion Framework for Hyperspectral Image Super-resolution. Proc. AAAI Conf. Artif. Intell. 2024, 38, 4623–4631.
34. Phung, H.; Dao, Q.; Tran, A. Wavelet Diffusion Models are fast and scalable Image Generators. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 10199–10208.
35. Cao, Z.; Cao, S.; Deng, L.J.; Wu, X.; Hou, J.; Vivone, G. Diffusion model with disentangled modulations for sharpening multispectral and hyperspectral images. Inf. Fusion 2024, 104, 102158.
36. Dong, W.; Liu, S.; Xiao, S.; Qu, J.; Li, Y. ISPDiff: Interpretable Scale-Propelled Diffusion Model for Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5519614.
37. Ho, J.; Salimans, T. Classifier-Free Diffusion Guidance. arXiv 2022, arXiv:2207.12598.
38. Zhou, W.; Wang, W.; Bao, J.; Chen, D.; Chen, D.; Yuan, L.; Li, H. Semantic Image Synthesis via Diffusion Models. arXiv 2022, arXiv:2207.00050.
39. Chen, B.; Liu, L.; Liu, C.; Zou, Z.; Shi, Z. Spectral-Cascaded Diffusion Model for Remote Sensing Image Spectral Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5528414.
40. Sturm, B.L. Stéphane Mallat: A Wavelet Tour of Signal Processing, 2nd Edition. Comput. Music J. 2007, 31, 83–85.
41. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
42. Yokoya, N.; Iwasaki, A. Airborne Hyperspectral Data over Chikusei; Technical Report SAL-2016-05-27; Space Application Laboratory, University of Tokyo: Tokyo, Japan, 2016.
43. Le Saux, B.; Yokoya, N.; Hansch, R.; Prasad, S. 2018 IEEE GRSS Data Fusion Contest: Multimodal Land Use Classification [Technical Committees]. IEEE Geosci. Remote Sens. Mag. 2018, 6, 52–54.
44. Ham, J.; Chen, Y.; Crawford, M.; Ghosh, J. Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 492–501.









| Method | Params (M) | FLOPs (G) | VRAM (GB) | Run time (s) | PSNR (dB) |
|---|---|---|---|---|---|
| LAGConv | 0.25 | 2.671 | 0.96 | 0.0138 | 45.77 |
| Fusformer | 0.35 | 10.02 | 28.32 | 0.306 | 45.02 |
| PSRT | 0.30 | 17.17 | 3.28 | 0.0195 | 42.56 |
| RAMoE | 29.69 | 1944 | 12.69 | 0.0696 | 46.16 |
| Dif-PAN | 14.09 | 3057 | 0.48 | 0.663 | 44.66 |
| S2CycleDiff | 65.94 | 307,800 | 0.90 | 29.9 | 42.77 |
| ISPDiff | 5.37 | 30,220 | 0.70 | 22.7 | 43.63 |
| Proposed | 19.07 | 29,540 | 0.57 | 0.349 | 46.41 |

| Metric | LAGConv | Fusformer | PSRT | RAMoE | Dif-PAN | S2CycleDiff | ISPDiff | Proposed |
|---|---|---|---|---|---|---|---|---|
| Noise-free Case | ||||||||
| PSNR ↑ | 46.4992±0.0931 | 46.4238±0.0872 | 42.7353±0.0878 | 47.7667±0.1512 | 45.2549±0.0823 | 45.1923±0.1143 | 46.3544±0.0948 | 47.7571±0.0873 |
| SSIM ↑ | 0.9932±0.0001 | 0.9929±0.0001 | 0.9910±0.0002 | 0.9943±0.0001 | 0.9916±0.0001 | 0.9916±0.0002 | 0.9928±0.0002 | 0.9935±0.0001 |
| SAM ↓ | 2.3819±0.2736 | 2.3901±0.2753 | 2.5689±0.2589 | 2.4268±0.3503 | 2.4111±0.2620 | 2.4301±0.2918 | 2.3721±0.2963 | 2.3030±0.2940 |
| ERGAS ↓ | 1.4937±0.0123 | 1.7143±0.0173 | 1.9250±0.0174 | 1.6702±0.0192 | 1.9000±0.0179 | 1.8931±0.0185 | 1.6293±0.0133 | 1.5568±0.0137 |
| RMSE ↓ | 0.0052±0.0001 | 0.0053±0.0001 | 0.0065±0.0001 | 0.0048±0.0001 | 0.0055±0.0001 | 0.0056±0.0001 | 0.0052±0.0001 | 0.0048±0.0000 |
| 1% Noisy Case | ||||||||
| PSNR ↑ | 37.5546±0.1314 | 37.8283±0.1146 | 36.9311±0.1126 | 37.5346±0.0922 | 38.5874±0.1154 | 40.6467±0.1115 | 39.5276±0.1018 | 39.0820±0.1181 |
| SSIM ↑ | 0.9814±0.0006 | 0.9827±0.0005 | 0.9835±0.0004 | 0.9808±0.0005 | 0.9850±0.0004 | 0.9888±0.0003 | 0.9856±0.0004 | 0.9857±0.0005 |
| SAM ↓ | 3.0820±0.2905 | 3.0194±0.2876 | 3.0625±0.2720 | 3.0987±0.2372 | 2.9006±0.3040 | 2.6426±0.2884 | 2.8233±0.2794 | 2.8453±0.2771 |
| ERGAS ↓ | 2.6029±0.0294 | 2.6470±0.0277 | 2.6514±0.0287 | 2.5191±0.0205 | 2.5727±0.0267 | 2.1504±0.0211 | 2.2957±0.0230 | 2.3583±0.0266 |
| RMSE ↓ | 0.0092±0.0001 | 0.0090±0.0001 | 0.0101±0.0001 | 0.0098±0.0000 | 0.0083±0.0001 | 0.0072±0.0001 | 0.0080±0.0001 | 0.0079±0.0001 |

| Metric | LAGConv | Fusformer | PSRT | RAMoE | Dif-PAN | S2CycleDiff | ISPDiff | Proposed |
|---|---|---|---|---|---|---|---|---|
| Noise-free Case | ||||||||
| PSNR ↑ | 48.1339±0.3342 | 47.2223±0.0722 | 44.3320±1.1600 | 49.4940±0.5549 | 48.0839±0.4030 | 43.8755±0.5578 | 48.6642±0.8049 | 50.1413±0.3632 |
| SSIM ↑ | 0.9971±0.0003 | 0.9964±0.0003 | 0.9951±0.0008 | 0.9969±0.0002 | 0.9974±0.0002 | 0.9931±0.0007 | 0.9952±0.0003 | 0.9964±0.0004 |
| SAM ↓ | 0.8398±0.0113 | 0.9191±0.0115 | 0.8536±0.0127 | 1.0532±0.0152 | 0.9080±0.0258 | 1.1498±0.0175 | 0.9129±0.0152 | 0.9293±0.0179 |
| ERGAS ↓ | 0.7257±0.0424 | 0.7874±0.0404 | 1.0220±0.1225 | 0.7293±0.0228 | 0.6893±0.0356 | 1.1026±0.0924 | 0.8380±0.0384 | 0.7479±0.0445 |
| RMSE ↓ | 0.0025±0.0001 | 0.0026±0.0001 | 0.0042±0.0005 | 0.0024±0.0000 | 0.0024±0.0001 | 0.0037±0.0003 | 0.0024±0.0001 | 0.0023±0.0000 |
| 1% Noisy Case | ||||||||
| PSNR ↑ | 38.7893±0.1153 | 39.4085±0.1227 | 37.7499±0.2045 | 40.5530±0.1002 | 39.5046±0.1700 | 40.1866±0.2786 | 41.6502±0.5375 | 42.7386±0.4648 |
| SSIM ↑ | 0.9868±0.0008 | 0.9878±0.0008 | 0.9853±0.0014 | 0.9916±0.0007 | 0.9883±0.0005 | 0.9903±0.0011 | 0.9919±0.0010 | 0.9936±0.0008 |
| SAM ↓ | 2.9955±0.1504 | 2.7075±0.1322 | 2.9880±0.1611 | 2.4665±0.1388 | 2.8116±0.1343 | 2.1043±0.1146 | 2.1176±0.2085 | 1.8671±0.1794 |
| ERGAS ↓ | 1.6415±0.0814 | 1.5533±0.0806 | 1.7995±0.1407 | 1.3274±0.0702 | 1.5252±0.0707 | 1.4009±0.1112 | 1.2477±0.0991 | 1.1002±0.0988 |
| RMSE ↓ | 0.0062±0.0002 | 0.0057±0.0002 | 0.0072±0.0005 | 0.0052±0.0002 | 0.0057±0.0002 | 0.0054±0.0003 | 0.0047±0.0004 | 0.0042±0.0003 |

| Metric | LAGConv | Fusformer | PSRT | RAMoE | Dif-PAN | S2CycleDiff | ISPDiff | Proposed |
|---|---|---|---|---|---|---|---|---|
| Noise-free Case | ||||||||
| PSNR ↑ | 42.6676±1.4041 | 41.4131±1.2097 | 40.6046±1.1391 | 41.2137±1.4014 | 40.6493±1.1596 | 39.2428±1.0877 | 35.8717±1.3498 | 41.3284±1.2553 |
| SSIM ↑ | 0.9882±0.0029 | 0.9874±0.0029 | 0.9867±0.0028 | 0.9874±0.0032 | 0.9880±0.0027 | 0.9834±0.0029 | 0.9723±0.0051 | 0.9874±0.0028 |
| SAM ↓ | 2.2087±0.3354 | 2.2772±0.3162 | 2.3404±0.3248 | 2.3593±0.3732 | 2.2340±0.3211 | 2.6673±0.3439 | 3.4728±0.4921 | 2.2983±0.3197 |
| ERGAS ↓ | 1.5219±0.1976 | 1.5963±0.2109 | 1.6720±0.1862 | 1.6139±0.2068 | 1.5926±0.1821 | 1.9161±0.1724 | 2.3258±0.2357 | 1.6351±0.2034 |
| RMSE ↓ | 0.0019±0.0002 | 0.0020±0.0002 | 0.0022±0.0003 | 0.0021±0.0002 | 0.0022±0.0003 | 0.0027±0.0004 | 0.0033±0.0003 | 0.0021±0.0003 |
| 1% Noisy Case | ||||||||
| PSNR ↑ | 36.6895±1.0841 | 36.9831±1.0223 | 36.1476±1.1148 | 37.4090±0.9888 | 36.3646±1.0019 | 37.1089±0.9399 | 33.3714±1.8274 | 36.6688±1.6512 |
| SSIM ↑ | 0.9781±0.0038 | 0.9794±0.0035 | 0.9770±0.0038 | 0.9809±0.0041 | 0.9788±0.0038 | 0.9802±0.0031 | 0.9488±0.0159 | 0.9757±0.0066 |
| SAM ↓ | 3.6181±0.5110 | 3.3878±0.4647 | 3.5412±0.4704 | 3.3208±0.4582 | 3.5331±0.4637 | 3.1641±0.3886 | 4.9393±0.9754 | 3.4321±0.5218 |
| ERGAS ↓ | 2.0571±0.2249 | 2.0187±0.2378 | 2.1515±0.2080 | 1.9839±0.2394 | 2.0724±0.2202 | 2.0708±0.1945 | 3.0253±0.4749 | 2.1087±0.2995 |
| RMSE ↓ | 0.0030±0.0004 | 0.0029±0.0003 | 0.0032±0.0004 | 0.0028±0.0004 | 0.0032±0.0004 | 0.0031±0.0005 | 0.0041±0.0002 | 0.0029±0.0002 |

| Variant | PSNR ↑ | SSIM ↑ | SAM ↓ | ERGAS ↓ | RMSE ↓ |
|---|---|---|---|---|---|
| w/o Residual | 41.4714 | 0.9875 | 2.7304 | 2.1009 | 0.0074 |
| w/o Side-Inject | 47.0979 | 0.9931 | 2.3294 | 1.6060 | 0.0050 |
| w/o Time FiLM | 47.1247 | 0.9932 | 2.3355 | 1.6050 | 0.0050 |
| DMRN (Full) | 47.7679 | 0.9935 | 2.3028 | 1.5569 | 0.0048 |

| Sampling Strategy | PSNR ↑ | SSIM ↑ | SAM ↓ | ERGAS ↓ | RMSE ↓ |
|---|---|---|---|---|---|
| Baseline (No CFG) | 50.1664 | 0.9964 | 0.9279 | 0.7485 | 0.0023 |
| Standard CFG | 50.1294 | 0.9964 | 0.9301 | 0.7478 | 0.0023 |
| SSDG (Best PSNR) | 50.1664 | 0.9964 | 0.9277 | 0.7486 | 0.0023 |
| SSDG (Best ERGAS) | 50.1306 | 0.9964 | 0.9296 | 0.7478 | 0.0023 |
| SSDG (Best SAM) | 50.0980 | 0.9964 | 0.9268 | 0.7506 | 0.0023 |

| Objective | PSNR ↑ | SSIM ↑ | SAM ↓ | ERGAS ↓ | RMSE ↓ |
|---|---|---|---|---|---|
| Pure MSE baseline | 46.2942 | 0.9925 | 2.3813 | 1.6154 | 0.0059 |
| LP constraint | 46.7582 | 0.9930 | 2.3395 | 1.5926 | 0.0051 |
| DWT constraint (Proposed) | 47.7679 | 0.9935 | 2.3028 | 1.5569 | 0.0048 |

| Method | Steps (T) | Time (s) | PSNR ↑ | SSIM ↑ | SAM ↓ | ERGAS ↓ | RMSE ↓ |
|---|---|---|---|---|---|---|---|
| Time-Aware Condition Module (Full) | 5 | 0.1952 | 47.7672 | 0.9935 | 2.3028 | 1.5568 | 0.0048 |
| | 25 | 0.8605 | 47.7679 | 0.9935 | 2.3028 | 1.5569 | 0.0048 |
| | 100 | 3.4225 | 47.7681 | 0.9935 | 2.3028 | 1.5569 | 0.0048 |
| | 500 | 16.9945 | 47.7681 | 0.9935 | 2.3028 | 1.5569 | 0.0048 |
| | 1000 | 33.8550 | 47.7681 | 0.9935 | 2.3028 | 1.5569 | 0.0048 |
| Simple Condition Module (w/o Time-Aware) | 5 | 0.1952 | 47.1182 | 0.9932 | 2.3356 | 1.6052 | 0.0050 |
| | 25 | 0.8605 | 47.1247 | 0.9932 | 2.3355 | 1.6050 | 0.0050 |
| | 100 | 3.4225 | 47.1259 | 0.9932 | 2.3355 | 1.6050 | 0.0050 |
| | 500 | 16.9945 | 47.1263 | 0.9932 | 2.3355 | 1.6050 | 0.0050 |
| | 1000 | 33.8550 | 47.1263 | 0.9932 | 2.3355 | 1.6050 | 0.0050 |
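The 0.65 dB improvement quoted in the highlights for 5 sampling steps can be recomputed from the step-count table above. The short script below copies the PSNR and timing columns verbatim and derives the PSNR gap between the two condition modules and the implied per-step latency; it performs no new measurement.

```python
steps = [5, 25, 100, 500, 1000]
time_s = [0.1952, 0.8605, 3.4225, 16.9945, 33.8550]
psnr_full = [47.7672, 47.7679, 47.7681, 47.7681, 47.7681]    # Time-Aware Condition Module
psnr_simple = [47.1182, 47.1247, 47.1259, 47.1263, 47.1263]  # Simple Condition Module

gaps = [full - simple for full, simple in zip(psnr_full, psnr_simple)]
per_step_ms = [1000.0 * seconds / n for n, seconds in zip(steps, time_s)]
for n, gap, ms in zip(steps, gaps, per_step_ms):
    print(f"T={n:4d}  PSNR gap={gap:.2f} dB  per-step time={ms:.1f} ms")
```

The gap stays between 0.64 and 0.65 dB at every step count, while the per-step cost is roughly 34 to 39 ms, so run time grows almost linearly with T and the fewest steps already capture nearly all of the quality.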
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Xu, X.; Qiao, J.; Zhou, J.; Yuan, K.; Feng, L. Modulated Diffusion with Spatial–Spectral Disentangled Guidance for Hyperspectral Image Super-Resolution. Remote Sens. 2026, 18, 1582. https://doi.org/10.3390/rs18101582
Chicago/Turabian Style: Xu, Xinlan, Jiaqing Qiao, Jialin Zhou, Kuo Yuan, and Lei Feng. 2026. "Modulated Diffusion with Spatial–Spectral Disentangled Guidance for Hyperspectral Image Super-Resolution" Remote Sensing 18, no. 10: 1582. https://doi.org/10.3390/rs18101582



