DSEPGAN: A Dual-Stream Enhanced Pyramid Based on Generative Adversarial Network for Spatiotemporal Image Fusion
Highlights
- A Dual-Stream Enhanced Pyramid based on GAN (DSEPGAN) is proposed for spatiotemporal fusion of remote sensing images.
- The model integrates reversible detail preservation and large-kernel feature reconstruction to enhance fine spatial details.
- DSEPGAN significantly improves detail and edge restoration in regions with pronounced phenological and land-cover changes, ensuring high-fidelity spatiotemporal reconstruction.
- The dual-stream reversible pyramid design provides a new framework for multi-modal image fusion and change analysis.
Abstract
1. Introduction
- A Dual-Stream Decoupling Framework: We introduce the Pyramid Time Change Extractor (PTCE) and the Pyramid Space Detail Extractor (PSDE), creating a specialized architecture that explicitly decouples the extraction of temporal dynamics from coarse imagery and high-frequency spatial structure from fine imagery. This asymmetric approach ensures dedicated modeling tailored to the heterogeneous nature of multi-sensor inputs.
- Structurally Lossless Spatial Detail Preservation: To fundamentally address the irreversible information bottleneck caused by downsampling, the PSDE is customized with a novel detail-preserving strategy that integrates Affine Coupling Layers and Patch Merging operations. This combination maximizes the fidelity of extracted features, which is crucial for reconstructing sharp texture and edge details in STF products.
- Hierarchical Long-Range Feature Aggregation (HLRFA): We design the HLRFA module, incorporating feature fusion techniques such as Large Kernel Fusion Blocks. This mechanism efficiently combines the segregated temporal and spatial features across multiple scales, substantially expanding the receptive field to capture long-range contextual information without the heavy computational burden of Transformer-based models.
- Superior Fusion Performance: Through comprehensive experiments on three benchmark datasets, DSEPGAN demonstrates significant improvements over existing state-of-the-art models, achieving superior performance in both spatial fidelity (sharpness and texture preservation) and spectral consistency (radiometric accuracy), validated by both objective metrics and visual quality assessment.
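The "structurally lossless" claim in the second contribution rests on the invertibility of affine coupling layers: because the coupling transform has a closed-form inverse, no information can be discarded in the forward pass. Below is a minimal NumPy sketch of one coupling step; the `scale`/`shift` functions are hypothetical stand-ins for the learned sub-networks inside the PSDE, not the paper's actual architecture.

```python
import numpy as np

def split(x):
    # Channel-wise split into two halves, as in standard coupling designs.
    c = x.shape[0] // 2
    return x[:c], x[c:]

def coupling_forward(x, scale_net, shift_net):
    """One affine coupling step: y = [x1, x2 * exp(s(x1)) + t(x1)]."""
    x1, x2 = split(x)
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return np.concatenate([x1, y2])

def coupling_inverse(y, scale_net, shift_net):
    """Closed-form inverse: recovers x exactly, whatever the sub-networks are."""
    y1, y2 = split(y)
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return np.concatenate([y1, x2])

# Hypothetical sub-networks standing in for the learned ones:
rng = np.random.default_rng(0)
W_s, W_t = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
scale = lambda h: np.tanh(W_s @ h)   # bounded scale for numerical stability
shift = lambda h: W_t @ h

x = rng.normal(size=8)
y = coupling_forward(x, scale, shift)
x_rec = coupling_inverse(y, scale, shift)
```

The key property is that invertibility holds for *any* choice of `scale_net`/`shift_net`, which is why a coupling-based extractor can preserve spatial detail regardless of what its sub-networks learn.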
2. Related Work
2.1. Multiscale Mechanisms in STF
2.2. Long-Range Dependency Modeling in STF
3. Proposed Methods
3.1. Pyramid Generator
3.1.1. Pyramid Time Change Extractor
3.1.2. Pyramid Space Detail Extractor
3.1.3. Hierarchical Long-Range Feature Aggregation
3.2. PatchGAN-Based Discriminator
3.3. Compound Loss Function
4. Experiments and Results
4.1. Study Areas and Datasets
4.2. Experiment Design and Evaluation
4.3. Experimental Results for CIA
4.4. Experimental Results for LGC
4.5. Experimental Results for Tianjin
5. Discussion
5.1. Model Efficiency
5.2. Ablation Study
5.3. Parameter Analysis
5.3.1. Kernel Size
5.3.2. Number of Invertible Basic Units
5.3.3. Number of Stages
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Model | Network Architecture | Multi-Scale Mechanism | Long-Range Modeling | Loss Functions |
|---|---|---|---|---|
| EDCSTFN | CNN | None | None | MSE loss, feature loss, and structure loss |
| GAN-STFM | CNN + GAN | None | None | GAN loss, feature loss, spectrum loss, and structure loss |
| MLFF-GAN | CNN + GAN with a pyramid encoder | Pyramid downsampling | None | GAN loss, L1 loss, spectrum loss, and structure loss |
| STF-Trans | CNN + Transformer | None | Transformer attention | L1 loss, high-frequency loss, and total variation loss |
| CTSTFM | CNN + Transformer | Multi-kernel CNN | Transformer attention | L1 loss |
| DSEPGAN | CNN + INN + GAN with a dual-stream pyramid encoder | Pyramid downsampling + detail-preserving mechanism | Large-kernel convolution | GAN loss, L1 loss, spectrum loss, and structure loss |
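The compound loss listed for DSEPGAN in the last row combines adversarial, L1, spectrum, and structure terms. A hedged NumPy sketch of such a compound objective follows; the weights, the least-squares adversarial form, and the global (unwindowed) SSIM term are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute radiometric error.
    return np.mean(np.abs(pred - target))

def spectral_loss(pred, target, eps=1e-8):
    # Mean spectral angle between per-pixel band vectors (radians);
    # arrays are (bands, H, W).
    dot = np.sum(pred * target, axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(target, axis=0) + eps
    return np.mean(np.arccos(np.clip(dot / norms, -1.0, 1.0)))

def structure_loss(pred, target, eps=1e-8):
    # 1 minus a global SSIM-style similarity (simplified: no local windowing).
    mp, mt = pred.mean(), target.mean()
    vp, vt = pred.var(), target.var()
    cov = np.mean((pred - mp) * (target - mt))
    ssim = ((2 * mp * mt + eps) * (2 * cov + eps)
            / ((mp**2 + mt**2 + eps) * (vp + vt + eps)))
    return 1.0 - ssim

def compound_loss(pred, target, d_fake, w=(1.0, 1.0, 1.0, 0.01)):
    # d_fake: discriminator scores on the generated image; the adversarial
    # term pushes them toward "real" (least-squares GAN form, an assumption).
    adv = np.mean((d_fake - 1.0) ** 2)
    return (w[0] * l1_loss(pred, target)
            + w[1] * spectral_loss(pred, target)
            + w[2] * structure_loss(pred, target)
            + w[3] * adv)
```

A perfect prediction with a fully fooled discriminator drives every term toward zero, while each term penalizes a distinct failure mode: radiometry (L1), spectral shape (angle), and texture/contrast (structure).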

| Metric | Band | STARFM | FSDAF | Fit-FC | EDCSTFN | GAN-STFM | MLFF-GAN | STF-Trans | CTSTFM | DSEPGAN |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 0.0166 | 0.0161 | 0.0157 | 0.0162 | 0.0126 | 0.0122 | 0.0117 | 0.0133 | 0.0115 |
| | 2 | 0.0250 | 0.0234 | 0.0235 | 0.0255 | 0.0180 | 0.0180 | 0.0172 | 0.0173 | 0.0176 |
| | 3 | 0.0399 | 0.0364 | 0.0376 | 0.0436 | 0.0293 | 0.0286 | 0.0275 | 0.0273 | 0.0291 |
| | 4 | 0.0496 | 0.0509 | 0.0472 | 0.0445 | 0.0401 | 0.0395 | 0.0374 | 0.0371 | 0.0368 |
| | 5 | 0.0460 | 0.0459 | 0.0468 | 0.0477 | 0.0396 | 0.0374 | 0.0378 | 0.0375 | 0.0353 |
| | 6 | 0.0379 | 0.0384 | 0.0385 | 0.0398 | 0.0344 | 0.0334 | 0.0326 | 0.0316 | 0.0311 |
| | Avg | 0.0358 | 0.0352 | 0.0349 | 0.0362 | 0.0290 | 0.0282 | 0.0274 | 0.0273 | 0.0269 |
| SSIM | 1 | 0.8928 | 0.9007 | 0.8939 | 0.8850 | 0.9250 | 0.9286 | 0.9299 | 0.9271 | 0.9343 |
| | 2 | 0.8464 | 0.8594 | 0.8536 | 0.8259 | 0.8966 | 0.8956 | 0.8988 | 0.9026 | 0.9067 |
| | 3 | 0.7739 | 0.7910 | 0.7873 | 0.7165 | 0.8413 | 0.8463 | 0.8462 | 0.8543 | 0.8559 |
| | 4 | 0.6811 | 0.6740 | 0.6785 | 0.7147 | 0.7608 | 0.7687 | 0.7674 | 0.7835 | 0.7901 |
| | 5 | 0.7429 | 0.7498 | 0.7462 | 0.7377 | 0.8006 | 0.8056 | 0.8040 | 0.8080 | 0.8175 |
| | 6 | 0.7748 | 0.7800 | 0.7723 | 0.7749 | 0.8178 | 0.8193 | 0.8231 | 0.8265 | 0.8300 |
| | Avg | 0.7853 | 0.7925 | 0.7886 | 0.7758 | 0.8404 | 0.8440 | 0.8449 | 0.8503 | 0.8557 |
| UIQI | 1 | 0.8140 | 0.8293 | 0.8190 | 0.8152 | 0.8979 | 0.9113 | 0.9131 | 0.9107 | 0.9175 |
| | 2 | 0.8152 | 0.8387 | 0.8275 | 0.8023 | 0.9107 | 0.9145 | 0.9209 | 0.9225 | 0.9200 |
| | 3 | 0.8165 | 0.8498 | 0.8400 | 0.7815 | 0.9132 | 0.9178 | 0.9254 | 0.9264 | 0.9177 |
| | 4 | 0.8275 | 0.8262 | 0.8370 | 0.8806 | 0.8976 | 0.9043 | 0.9124 | 0.9178 | 0.9173 |
| | 5 | 0.9222 | 0.9247 | 0.9224 | 0.9174 | 0.9444 | 0.9491 | 0.9534 | 0.9528 | 0.9544 |
| | 6 | 0.9206 | 0.9215 | 0.9193 | 0.9174 | 0.9368 | 0.9408 | 0.9465 | 0.9475 | 0.9468 |
| | Avg | 0.8527 | 0.8650 | 0.8609 | 0.8524 | 0.9168 | 0.9230 | 0.9286 | 0.9296 | 0.9290 |
| CC | 1 | 0.8320 | 0.8370 | 0.8401 | 0.8331 | 0.9022 | 0.9116 | 0.9167 | 0.9148 | 0.9195 |
| | 2 | 0.8369 | 0.8501 | 0.8473 | 0.8234 | 0.9153 | 0.9160 | 0.9232 | 0.9247 | 0.9278 |
| | 3 | 0.8454 | 0.8654 | 0.8551 | 0.8013 | 0.9165 | 0.9198 | 0.9266 | 0.9275 | 0.9284 |
| | 4 | 0.8344 | 0.8281 | 0.8459 | 0.8813 | 0.8994 | 0.9049 | 0.9137 | 0.9187 | 0.9185 |
| | 5 | 0.9222 | 0.9249 | 0.9238 | 0.9180 | 0.9451 | 0.9497 | 0.9540 | 0.9532 | 0.9553 |
| | 6 | 0.9210 | 0.9217 | 0.9198 | 0.9177 | 0.9376 | 0.9412 | 0.9472 | 0.9478 | 0.9477 |
| | Avg | 0.8653 | 0.8712 | 0.8720 | 0.8625 | 0.9194 | 0.9239 | 0.9302 | 0.9311 | 0.9328 |
| ERGAS | All | 1.3146 | 1.2666 | 1.2612 | 1.3488 | 1.0298 | 1.0047 | 0.9697 | 0.9931 | 0.9675 |
| SAM | All | 11.1256 | 10.9104 | 10.8326 | 11.4756 | 8.8898 | 8.6679 | 8.2846 | 8.2494 | 8.1507 |
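The two global indices in these tables, ERGAS and SAM, are standard: ERGAS aggregates band-wise relative RMSE scaled by the resolution ratio, and SAM averages the per-pixel spectral angle (reported here in degrees). A NumPy sketch follows; the default resolution ratio of 1/16 (e.g., 30 m Landsat against 480 m coarse imagery) is an assumption and is dataset-dependent.

```python
import numpy as np

def ergas(pred, target, ratio=1/16):
    """Relative dimensionless global error; arrays are (bands, H, W).
    `ratio` is the fine-to-coarse resolution ratio (assumption)."""
    rmse = np.sqrt(np.mean((pred - target) ** 2, axis=(1, 2)))  # per band
    mean = np.mean(target, axis=(1, 2))                         # per band
    return 100.0 * ratio * np.sqrt(np.mean((rmse / mean) ** 2))

def sam(pred, target, eps=1e-8):
    """Spectral angle mapper, averaged over pixels, in degrees."""
    dot = np.sum(pred * target, axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(target, axis=0) + eps
    ang = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    return np.degrees(ang).mean()
```

Lower is better for both: ERGAS is zero for a perfect radiometric match, and SAM is zero when every pixel's band vector points in the same direction as the reference.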

| Metric | Band | STARFM | FSDAF | Fit-FC | EDCSTFN | GAN-STFM | MLFF-GAN | STF-Trans | CTSTFM | DSEPGAN |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 0.0143 | 0.0149 | 0.0140 | 0.0151 | 0.0146 | 0.0161 | 0.0147 | 0.0167 | 0.0148 |
| | 2 | 0.0200 | 0.0207 | 0.0201 | 0.0200 | 0.0207 | 0.0223 | 0.0197 | 0.0234 | 0.0212 |
| | 3 | 0.0251 | 0.0258 | 0.0251 | 0.0257 | 0.0264 | 0.0269 | 0.0249 | 0.0320 | 0.0258 |
| | 4 | 0.0376 | 0.0397 | 0.0385 | 0.0394 | 0.0410 | 0.0400 | 0.0357 | 0.0532 | 0.0366 |
| | 5 | 0.0568 | 0.0621 | 0.0565 | 0.0590 | 0.0540 | 0.0533 | 0.0583 | 0.0660 | 0.0514 |
| | 6 | 0.0455 | 0.0515 | 0.0446 | 0.0407 | 0.0399 | 0.0404 | 0.0432 | 0.0476 | 0.0374 |
| | Avg | 0.0332 | 0.0358 | 0.0331 | 0.0333 | 0.0328 | 0.0332 | 0.0328 | 0.0398 | 0.0312 |
| SSIM | 1 | 0.9132 | 0.9125 | 0.9233 | 0.9228 | 0.9185 | 0.9059 | 0.9167 | 0.9166 | 0.9171 |
| | 2 | 0.8730 | 0.8709 | 0.8800 | 0.8897 | 0.8801 | 0.8709 | 0.8857 | 0.8843 | 0.8832 |
| | 3 | 0.8350 | 0.8331 | 0.8438 | 0.8455 | 0.8413 | 0.8356 | 0.8480 | 0.8391 | 0.8492 |
| | 4 | 0.7292 | 0.7294 | 0.7405 | 0.7083 | 0.7037 | 0.7239 | 0.7291 | 0.6910 | 0.7481 |
| | 5 | 0.5697 | 0.5220 | 0.5532 | 0.5513 | 0.5696 | 0.5948 | 0.5663 | 0.5766 | 0.6168 |
| | 6 | 0.6408 | 0.5754 | 0.6274 | 0.6267 | 0.6438 | 0.6533 | 0.6237 | 0.6473 | 0.6739 |
| | Avg | 0.7601 | 0.7405 | 0.7614 | 0.7574 | 0.7595 | 0.7641 | 0.7616 | 0.7591 | 0.7814 |
| UIQI | 1 | 0.7152 | 0.7062 | 0.7124 | 0.6132 | 0.6844 | 0.6839 | 0.7086 | 0.5813 | 0.7361 |
| | 2 | 0.7019 | 0.6890 | 0.6943 | 0.6560 | 0.6807 | 0.7125 | 0.7207 | 0.5637 | 0.7305 |
| | 3 | 0.7072 | 0.6965 | 0.7007 | 0.6573 | 0.6960 | 0.7179 | 0.7327 | 0.5366 | 0.7349 |
| | 4 | 0.7857 | 0.7794 | 0.7827 | 0.7431 | 0.7660 | 0.7826 | 0.8112 | 0.6991 | 0.8191 |
| | 5 | 0.7531 | 0.7198 | 0.7313 | 0.6645 | 0.7659 | 0.7940 | 0.7571 | 0.7436 | 0.8171 |
| | 6 | 0.7205 | 0.6531 | 0.7021 | 0.7316 | 0.7798 | 0.7918 | 0.7670 | 0.7713 | 0.8183 |
| | Avg | 0.7306 | 0.7073 | 0.7206 | 0.6776 | 0.7288 | 0.7471 | 0.7495 | 0.6493 | 0.7760 |
| CC | 1 | 0.7158 | 0.7076 | 0.7160 | 0.6582 | 0.6878 | 0.6870 | 0.7151 | 0.6073 | 0.7396 |
| | 2 | 0.7071 | 0.6916 | 0.7007 | 0.6958 | 0.6888 | 0.7169 | 0.7250 | 0.5902 | 0.7335 |
| | 3 | 0.7130 | 0.7004 | 0.7086 | 0.6834 | 0.6994 | 0.7192 | 0.7352 | 0.5469 | 0.7353 |
| | 4 | 0.8075 | 0.8011 | 0.8098 | 0.7832 | 0.7690 | 0.7871 | 0.8265 | 0.7243 | 0.8245 |
| | 5 | 0.7909 | 0.7666 | 0.7832 | 0.7695 | 0.7864 | 0.7993 | 0.8111 | 0.7750 | 0.8228 |
| | 6 | 0.7873 | 0.7473 | 0.7786 | 0.7871 | 0.7883 | 0.7977 | 0.8070 | 0.7949 | 0.8208 |
| | Avg | 0.7536 | 0.7358 | 0.7495 | 0.7295 | 0.7366 | 0.7512 | 0.7700 | 0.6731 | 0.7794 |
| ERGAS | All | 2.0655 | 2.2230 | 2.0322 | 2.0245 | 1.9837 | 2.0300 | 2.0232 | 2.3816 | 1.9096 |
| SAM | All | 16.2826 | 17.0293 | 16.2303 | 16.8170 | 16.7738 | 16.5577 | 15.9806 | 18.0406 | 15.7259 |

| Metric | Band | STARFM | FSDAF | Fit-FC | EDCSTFN | GAN-STFM | MLFF-GAN | STF-Trans | CTSTFM | DSEPGAN |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 0.0359 | 0.0071 | 0.0083 | 0.0119 | 0.0064 | 0.0060 | 0.0073 | 0.0080 | 0.0055 |
| | 2 | 0.0175 | 0.0139 | 0.0082 | 0.0161 | 0.0079 | 0.0076 | 0.0093 | 0.0090 | 0.0068 |
| | 3 | 0.0314 | 0.0074 | 0.0081 | 0.0148 | 0.0071 | 0.0080 | 0.0086 | 0.0093 | 0.0064 |
| | 4 | 0.0215 | 0.0145 | 0.0155 | 0.0163 | 0.0159 | 0.0160 | 0.0175 | 0.0203 | 0.0149 |
| | Avg | 0.0266 | 0.0107 | 0.0100 | 0.0148 | 0.0093 | 0.0094 | 0.0107 | 0.0117 | 0.0084 |
| SSIM | 1 | 0.7382 | 0.9422 | 0.9319 | 0.8439 | 0.9605 | 0.9592 | 0.9469 | 0.9294 | 0.9631 |
| | 2 | 0.8550 | 0.8850 | 0.9465 | 0.8424 | 0.9530 | 0.9487 | 0.9365 | 0.9381 | 0.9608 |
| | 3 | 0.7943 | 0.9485 | 0.9436 | 0.8435 | 0.9531 | 0.9454 | 0.9332 | 0.9198 | 0.9625 |
| | 4 | 0.8437 | 0.8927 | 0.8820 | 0.8708 | 0.8807 | 0.8853 | 0.8427 | 0.8188 | 0.8922 |
| | Avg | 0.8078 | 0.9171 | 0.9260 | 0.8502 | 0.9368 | 0.9347 | 0.9148 | 0.9015 | 0.9447 |
| UIQI | 1 | 0.0870 | 0.8394 | 0.6238 | 0.7275 | 0.7756 | 0.8418 | 0.6868 | 0.6393 | 0.8706 |
| | 2 | 0.5866 | 0.7412 | 0.7293 | 0.6288 | 0.6974 | 0.7838 | 0.6234 | 0.6035 | 0.8286 |
| | 3 | 0.1253 | 0.8300 | 0.7397 | 0.6529 | 0.7791 | 0.8037 | 0.6806 | 0.6136 | 0.8600 |
| | 4 | 0.6177 | 0.8038 | 0.7628 | 0.6778 | 0.7563 | 0.7673 | 0.6588 | 0.5704 | 0.7986 |
| | Avg | 0.3541 | 0.8036 | 0.7139 | 0.6718 | 0.7521 | 0.7992 | 0.6624 | 0.6067 | 0.8395 |
| CC | 1 | 0.1660 | 0.8581 | 0.7842 | 0.8467 | 0.8510 | 0.8466 | 0.7513 | 0.7245 | 0.8775 |
| | 2 | 0.6835 | 0.8131 | 0.7542 | 0.7782 | 0.8205 | 0.7893 | 0.7313 | 0.6960 | 0.8378 |
| | 3 | 0.1841 | 0.8331 | 0.7889 | 0.7786 | 0.8373 | 0.8114 | 0.7308 | 0.6747 | 0.8626 |
| | 4 | 0.6210 | 0.8054 | 0.7654 | 0.7129 | 0.7613 | 0.7681 | 0.6750 | 0.5975 | 0.7989 |
| | Avg | 0.4136 | 0.8274 | 0.7732 | 0.7791 | 0.8175 | 0.8038 | 0.7221 | 0.6732 | 0.8442 |
| ERGAS | All | 7.9286 | 2.4543 | 2.1782 | 3.5716 | 1.8916 | 1.8925 | 2.1988 | 2.3437 | 1.3594 |
| SAM | All | 34.8451 | 14.6957 | 16.5414 | 18.0913 | 15.0764 | 15.2694 | 17.8568 | 19.2186 | 13.8698 |

| Model | Component | Parameters (M) | FLOPs | Batch-Time (s) | Test-Time (s) |
|---|---|---|---|---|---|
| STARFM | \ | \ | \ | \ | 541.13 |
| FSDAF | \ | \ | \ | \ | 1262.19 |
| Fit-FC | \ | \ | \ | \ | 6544.50 |
| EDCSTFN | \ | 0.28 | 1.49 × 10¹¹ | 0.16 | 8.55 |
| GAN-STFM | Generator | 0.58 | 3.02 × 10¹¹ | 0.77 | 11.24 |
| | Discriminator | 3.67 | 8.24 × 10⁷ | | |
| MLFF-GAN | Generator | 5.93 | 1.09 × 10¹¹ | 0.35 | 17.30 |
| | Discriminator | 2.78 | 3.02 × 10¹⁰ | | |
| STF-Trans | \ | 6.01 | 4.41 × 10¹¹ | 0.28 | 7.06 |
| CTSTFM | \ | 6.30 | 3.06 × 10¹² | 1.36 | 15.74 |
| DSEPGAN | Generator | 5.40 | 2.33 × 10¹¹ | 0.68 | 15.10 |
| | Discriminator | 2.78 | 3.02 × 10¹⁰ | | |

| Model | CIA RMSE | CIA SSIM | CIA ERGAS | LGC RMSE | LGC SSIM | LGC ERGAS | Parameters | FLOPs (G) | Batch-Time (s) |
|---|---|---|---|---|---|---|---|---|---|
| DSEPGAN-Diff | 0.0302 | 0.8435 | 1.0669 | 0.0313 | 0.7775 | 1.9279 | 5,179,218 | 201.5 | 0.6403 |
| DSPGAN | 0.0275 | 0.8519 | 0.9730 | 0.0315 | 0.7754 | 1.9169 | 4,629,138 | 189.5 | 0.4657 |
| DSEPGAN-Conv | 0.0289 | 0.8519 | 1.0145 | 0.0328 | 0.7768 | 2.0348 | 4,536,786 | 189.7 | 0.4782 |
| DSEPGAN-Trans | 0.0278 | 0.8475 | 0.9992 | 0.0334 | 0.7722 | 2.0900 | 10,848,306 | 186.7 | 0.5779 |
| DSEPGAN w/o ConvFFN | 0.0281 | 0.8504 | 1.0010 | 0.0323 | 0.7806 | 1.9838 | 4,712,322 | 201.3 | 0.6146 |
| DSEPGAN | 0.0269 | 0.8557 | 0.9675 | 0.0312 | 0.7814 | 1.9096 | 5,396,946 | 232.8 | 0.6825 |
| Kernel Size | RMSE | SSIM | ERGAS | Parameters | FLOPs (G) | Batch-Time (s) |
|---|---|---|---|---|---|---|
| (3,3,3,3) | 0.0272 | 0.8528 | 0.9687 | 5,312,466 | 225.5 | 0.6306 |
| (7,7,7,7) | 0.0279 | 0.8542 | 0.9971 | 5,333,586 | 227.3 | 0.6463 |
| (9,9,9,9) | 0.0270 | 0.8546 | 0.9844 | 5,350,482 | 228.8 | 0.6528 |
| (13,13,13,13) | 0.0269 | 0.8557 | 0.9675 | 5,396,946 | 232.8 | 0.6762 |
| (3,7,9,13) | 0.0271 | 0.8525 | 0.9726 | 5,334,738 | 230.7 | 0.6663 |
| (3,7,15,31) | 0.0272 | 0.8543 | 0.9615 | 5,386,578 | 252.4 | 0.8043 |
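The kernel sizes compared above set the receptive field of the large-kernel fusion blocks. Using depthwise convolution keeps this cheap: for C channels, a depthwise 13 × 13 kernel costs C·169 weights, versus C²·169 for a dense 13 × 13 convolution. A naive NumPy sketch of the depthwise operation is shown below, purely for illustration; it is not the paper's implementation.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Naive depthwise convolution with 'same' zero padding.
    x: (C, H, W); kernels: (C, k, k) — one kernel per channel, so the
    parameter count is C*k*k rather than C*C*k*k for a dense conv."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for c in range(C):            # each channel sees only its own kernel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out
```

Growing `k` from 3 to 13 widens each output pixel's neighborhood from 3 × 3 to 13 × 13 while parameters grow only linearly in k², which matches the modest parameter/FLOPs increase across the rows of the table above.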
| N (Invertible Basic Units) | RMSE | SSIM | ERGAS | Parameters | FLOPs (G) | Batch-Time (s) |
|---|---|---|---|---|---|---|
| 1 | 0.0285 | 0.8491 | 1.0037 | 4,685,394 | 198.9 | 0.4361 |
| 2 | 0.0278 | 0.8508 | 0.9879 | 5,041,170 | 215.9 | 0.5220 |
| 3 | 0.0269 | 0.8557 | 0.9675 | 5,396,946 | 232.8 | 0.6762 |
| 4 | 0.0269 | 0.8557 | 0.9642 | 5,752,722 | 249.7 | 0.7773 |
| Stage | RMSE | SSIM | ERGAS | Parameters | FLOPs (G) | Batch-Time (s) |
|---|---|---|---|---|---|---|
| 3 | 0.0286 | 0.8475 | 1.0370 | 1,329,714 | 160.7 | 0.5989 |
| 4 | 0.0269 | 0.8557 | 0.9675 | 5,396,946 | 232.8 | 0.6762 |
| 5 | 0.0256 | 0.8632 | 0.9268 | 21,512,466 | 304.4 | 0.7321 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhou, D.; Xu, L.; Wu, K.; Liu, H.; Jiang, M. DSEPGAN: A Dual-Stream Enhanced Pyramid Based on Generative Adversarial Network for Spatiotemporal Image Fusion. Remote Sens. 2025, 17, 4050. https://doi.org/10.3390/rs17244050

