Deep Learning for Spatio-Temporal Fusion in Land Surface Temperature Estimation: A Comprehensive Survey, Experimental Analysis, and Future Trends
Highlights
- Reviews and analyzes deep learning-based spatio-temporal fusion methods for land surface temperature estimation, highlighting their strengths, limitations, and adaptation needs.
- Provides a comprehensive experimental evaluation on a newly released open-access dataset, revealing pronounced performance variability among state-of-the-art spatio-temporal fusion models and highlighting critical inconsistencies in existing fusion strategies when applied to thermal signals.
- Supports the development of spatio-temporal fusion methods that explicitly account for land surface temperature’s spatio-temporal characteristics and physical constraints.
- Offers a structured reference through taxonomy, benchmark dataset, and experimental analysis to guide future research and improve model generalizability for land surface temperature estimation.
Abstract
1. Introduction
- We provide a comprehensive overview of DL-based STF methods for LST, highlighting their architectures, objectives, and adaptations for LST’s spatio-temporal dynamics.
- We introduce an open-source MODIS-Landsat LST pair dataset (STF-LST), comprising 51 images spanning 2013–2024, which serves as the first benchmark in the field.
- We conduct experimental analysis of state-of-the-art DL methods by offering quantitative and qualitative insights into their performance, limitations, and practical applicability for LST estimation.
2. Satellite-Derived LST
2.1. LST Concept and Retrieval
- LST retrieval is underdetermined: N radiance measurements across N TIR channels must constrain N + 1 unknowns, namely N land surface emissivities (LSEs) and one LST.
- The radiances measured in different TIR channels are highly correlated, making the system of equations unstable and sensitive to small errors in the data.
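To make this ill-posedness explicit, the standard per-channel formulation of atmospherically corrected TIR radiance can be written as below. This is a textbook sketch, not an equation reproduced from the survey; the symbols are generic.

```latex
% Ground-leaving radiance in TIR channel i (after atmospheric correction):
% one equation per channel, but each channel adds an unknown emissivity.
L_i = \varepsilon_i \, B_i(T_s) + (1 - \varepsilon_i)\, L_i^{\downarrow},
\qquad i = 1, \dots, N
% B_i(T_s): Planck radiance of channel i at surface temperature T_s (the LST)
% \varepsilon_i: land surface emissivity in channel i
% L_i^{\downarrow}: downwelling atmospheric radiance
% => N equations but N + 1 unknowns (\varepsilon_1, ..., \varepsilon_N and T_s),
%    so additional constraints or assumptions are required for retrieval.
```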
2.2. Trade-Offs in Spatial and Temporal Resolution for LST Retrieval
- Spatial Resolution: defines the size of a pixel in the satellite image, which determines the smallest detectable feature. This is crucial for accurate LST retrieval, as fine-scale spatial resolution captures smaller, more localized temperature variations [25].
- Temporal Resolution: refers to the frequency at which a satellite revisits the same area. A higher temporal resolution is critical for monitoring dynamic temperature changes over time [26].
2.3. Differences Between SR and LST Dynamics
2.3.1. Spatial Variations
2.3.2. Temporal Variations
3. STF Problem Formulation for LST
3.1. Mathematical Definition
3.2. Loss Functions
3.2.1. Content Loss
3.2.2. Vision Loss
3.2.3. Feature Loss
3.2.4. Spectral Loss
3.2.5. Adversarial Loss
3.3. Evaluation Metrics
- Error Assessment Metrics: These metrics measure the numerical discrepancy between the fused LST and the high-resolution reference. Common examples include RMSE, MAE, relative MAE (rMAE), and the coefficient of determination (R²). ERGAS is also frequently used to assess global normalized error, with lower values indicating higher fidelity.
- Quality Assessment Metrics: Rather than measuring numerical differences, this category assesses perceptual or structural similarity. SSIM [125], Peak Signal-to-Noise Ratio (PSNR) [130], correlation coefficient (CC) [131], Spectral Angle Mapper (SAM) [132], Learned Perceptual Image Patch Similarity (LPIPS) [133], and Universal Image Quality Index (UIQI) [134] are commonly adopted to quantify texture preservation, sharpness, and spectral consistency in reconstructed LST fields.
- Efficiency Metrics: Beyond accuracy, computational efficiency is increasingly emphasized, especially for large-scale or near-real-time applications. Metrics include inference time (per fused scene), memory footprint, and scalability with spatial or temporal input size. For example, DL-based STF models are reported to achieve inference speeds orders of magnitude faster than classical algorithms such as ESTARFM [74].
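As a concrete illustration of the error and quality metrics above, the following Python sketch computes several of them with NumPy and scikit-image. The function name, the rMAE definition, and the choice of data range are assumptions for illustration, not part of any benchmark protocol described in this survey.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def lst_fusion_metrics(pred, ref):
    """Illustrative error/quality metrics between a fused LST field `pred`
    and a fine-resolution reference `ref` (both 2-D arrays, e.g., in Kelvin)."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    diff = pred - ref
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    mae = float(np.mean(np.abs(diff)))
    # One common rMAE variant: absolute error relative to the reference value.
    rmae = float(np.mean(np.abs(diff) / np.clip(np.abs(ref), 1e-6, None)))
    ss_res = float(np.sum(diff ** 2))
    ss_tot = float(np.sum((ref - ref.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    data_range = float(ref.max() - ref.min())  # dynamic range of the reference scene
    return {
        "RMSE": rmse,
        "MAE": mae,
        "rMAE": rmae,
        "R2": r2,
        "PSNR": peak_signal_noise_ratio(ref, pred, data_range=data_range),
        "SSIM": structural_similarity(ref, pred, data_range=data_range),
        "CC": float(np.corrcoef(pred.ravel(), ref.ravel())[0, 1]),
    }
```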
4. Taxonomy of DL-Based STF Methods
4.1. Architectures
4.1.1. Convolutional Neural Networks
- Convolutional Layers: Extract spatial features using local filters.
- Pooling Layers: Reduce spatial dimensions and improve feature robustness.
- Normalization Layers: Stabilize and accelerate training.
- Activation Layers: Introduce non-linearity.
- Fully Connected Layers: Integrate learned features for prediction.
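The layer types listed above compose as in the minimal PyTorch sketch below. This is a generic toy network for a single-band thermal patch, not one of the surveyed STF architectures; the class name and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyLSTCNN(nn.Module):
    """Toy CNN showing how convolution, normalization, activation, pooling,
    and fully connected layers are typically stacked (illustrative only)."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),  # convolutional layer
            nn.BatchNorm2d(32),                                    # normalization layer
            nn.ReLU(inplace=True),                                 # activation layer
            nn.MaxPool2d(2),                                       # pooling layer
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 1),                                      # fully connected layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Example: a batch with one single-band (thermal) 64x64 patch.
# y = TinyLSTCNN()(torch.randn(1, 1, 64, 64))
```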
4.1.2. Autoencoder
4.1.3. Generative Adversarial Networks
4.1.4. Vision Transformers
4.1.5. Recurrent Neural Networks
4.2. Learning Paradigm
1. Supervised learning. This paradigm relies on paired training samples, where fine-resolution observations are available as targets. Most existing STF models fall into this category. While supervised learning has proven effective, its dependence on cloud-free, fine-resolution LST data limits scalability in real LST applications.
2. Unsupervised learning. Here, the model is trained without fine-resolution labels, meaning the fine-resolution target is unavailable during training. Only one recent study has explored an unsupervised STF formulation [71]. This direction is especially promising for LST, where cloud-free fine-resolution acquisitions are scarce.
3. Self-supervised learning. Positioned between supervised and unsupervised paradigms, self-supervised learning creates proxy tasks or pseudo-labels directly from the data [180]. To date, only one STF method has adopted this strategy [69]. Extending self-supervised schemes to LST STF remains largely unexplored and could help reduce reliance on scarce fine-resolution LST images (a minimal proxy-task sketch follows this list).
4. Collaborative learning. This strategy treats STF as a cooperative process, where different learners interact to improve fusion quality [181]. Only one study has explicitly framed STF in this way [70]. Such paradigms could be beneficial for LST, as they may better exploit complementary cues between coarse and fine thermal observations.
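One common way to realize the proxy-task idea mentioned for self-supervised learning is to degrade the available imagery so that it provides its own pseudo coarse-fine pairs. The sketch below assumes a NumPy array holding a coarse LST scene; the function name and the block-averaging degradation model are illustrative assumptions, not a published scheme.

```python
import numpy as np

def make_proxy_pair(coarse_img: np.ndarray, scale: int = 4):
    """Hypothetical proxy-task construction for self-supervised STF training:
    the available coarse LST image acts as the pseudo 'fine' target, and an
    additionally degraded copy acts as its pseudo 'coarse' input. Block
    averaging stands in for the sensor point-spread function."""
    h, w = coarse_img.shape
    h, w = h - h % scale, w - w % scale            # crop to a multiple of the scale
    target = coarse_img[:h, :w]
    degraded = target.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    # Upsample back by pixel replication so input and target share one grid.
    pseudo_input = np.repeat(np.repeat(degraded, scale, axis=0), scale, axis=1)
    return pseudo_input, target
```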
4.3. Training Strategy
1. Residual learning. Residual learning [182] introduces skip connections that allow the network to learn a residual function instead of the full mapping, which stabilizes the optimization of DL architectures [183]. The residual formulation is defined in Equation (15), $\mathcal{F}(x) = \mathcal{H}(x) - x$, where $\mathcal{H}(x)$ is the desired underlying function and $\mathcal{F}(x)$ is the residual mapping. The output of the residual block becomes $y = \mathcal{F}(x) + x$. As shown in Table 9, residual learning is the most common strategy across STF methods due to its ability to preserve essential spatial and temporal structures in LST data (a minimal code sketch follows Table 9).
2. Attention mechanisms. Attention mechanisms enable a model to focus on the most informative components of the input. In STF, four forms are used: channel, spatial, temporal, and feature attention. Channel attention assigns importance to individual spectral bands, although its usefulness is limited for LST-focused STF because LST data generally contain only one thermal band. Spatial attention highlights salient spatial regions, helping the model detect areas with strong temperature variability or sharp thermal gradients. Temporal attention emphasizes key time steps, allowing the network to capture rapid LST fluctuations and short-term dynamics. Feature attention evaluates the relevance of entire feature maps. As summarized in Table 9, all ViT-based STF methods incorporate spatial attention, consistent with the fundamental role of attention in ViT architectures.
3. Normalization. Normalization refers to a set of transformations applied to stabilize and accelerate model training by enforcing desired statistical properties such as centering, scaling, or decorrelation [184]. In STF, five main normalization strategies are encountered. Batch Normalization (BN) [185] mitigates internal covariate shift by standardizing activations within each mini-batch: given an activation $a$, BN computes its normalized form as shown in Equation (16), $\hat{a} = (a - \mu_B)/\sqrt{\sigma_B^2 + \epsilon}$, where $\mu_B$ and $\sigma_B^2$ denote the mini-batch mean and variance, and $\epsilon$ ensures numerical stability. Group Normalization (GN) [186] standardizes activations within predefined groups. Instance Normalization (IN) [187] normalizes each sample independently and is used to reduce contrast-related variations. Spectral Normalization (SN) [188] is mainly applied in GAN-based STF methods; it stabilizes discriminator training by constraining the Lipschitz constant of the weight matrices. Finally, Switchable Normalization (SwN) [189] combines three types of statistics: channel-wise, layer-wise, and mini-batch-wise. As shown in Table 9, STF methods vary widely in their choice of normalization strategy, reflecting different architectural needs and constraints, especially when dealing with LST data.
4. Dropout. Dropout [190] randomly deactivates a fraction of neurons during training, which acts as a regularizer and mitigates overfitting [191]. As listed in Table 9, several STF methods adopt dropout to improve generalization.
| Training Strategy | List of Methods |
|---|---|
| Residual Learning | [37,61,64,65,66,67,69,72,74,75,76,78,79,80,81,83,86,87,90] |
| Attention Mechanism | Channel Attention: [68,75,87,89] Spatial Attention: [81,83,84,85,86,87,88,90] Temporal Attention: [77,78,79] Feature Attention: [78] |
| Normalization | Batch Normalization: [37,61,65,68,74,75,76,79,80,92] Group Normalization: [81,87] Instance Normalization: [78] Spectral Normalization: [75,76,79,82] Switchable Normalization: [75,76,84] |
| Dropout | [80,82,83,86,91,92] |
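The PyTorch sketch below illustrates two of the strategies above: residual learning with batch normalization (the ideas behind Equations (15) and (16)) and a lightweight spatial-attention gate. Both modules are illustrative compositions under assumed layer sizes, not implementations taken from any specific surveyed method.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy residual block: the body learns a residual F(x) and the block
    outputs y = F(x) + x, while BatchNorm standardizes activations with
    mini-batch statistics (then re-scales and shifts them)."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),   # (a - mu_B) / sqrt(sigma_B^2 + eps), learnable scale/shift
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + x)   # skip connection: y = F(x) + x


class SpatialAttention(nn.Module):
    """Simple spatial attention: each pixel is re-weighted by a sigmoid map
    derived from channel-wise average and max statistics."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn
```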
4.4. Incorporation of Pre-Trained Models
1. Feature Extraction: In this strategy, pre-trained models are used solely to extract informative features from the input data without further fine-tuning [193]. Within STF, feature extraction is often employed to compute spectral or perceptual losses (see Section 3.2.3 and Section 3.2.4); a minimal sketch of this usage follows this list.
2. Transfer Learning: Transfer learning aims to improve performance on a target task by adapting knowledge from a related source domain [194]. Formally, given a source dataset $\mathcal{D}_S$ with task $\mathcal{T}_S$ and a target dataset $\mathcal{D}_T$ with task $\mathcal{T}_T$, transfer learning seeks to enhance the target predictive function $f_T(\cdot)$ by utilizing knowledge from $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. In STF for LST, Chen et al. [61] pre-trained an autoencoder on simulated LST data generated by downscaling MODIS measurements to 4 km resolution via pixel aggregation, and transferred the learned parameters to initialize their fusion framework. Additionally, Huang et al. [81] proposed a fine-tuning strategy to adapt pre-trained models to new regions, but did not employ pre-training for initial model training; their method is therefore not included under this category.
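The feature-extraction strategy (item 1) is commonly realized by freezing a pre-trained backbone and comparing deep features of the fused and reference images. The PyTorch/torchvision sketch below is illustrative only: the choice of VGG16, the layer cut-off, the channel replication of the single thermal band, and the L1 comparison are assumptions, not the configuration of any surveyed method.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class VGGFeatureLoss(nn.Module):
    """Sketch of a perceptual/feature loss built on a frozen pre-trained
    backbone (cf. Sections 3.2.3 and 3.2.4); illustrative configuration."""

    def __init__(self, layer_index: int = 16):
        super().__init__()
        backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:layer_index]
        for p in backbone.parameters():
            p.requires_grad_(False)        # feature extraction only, no fine-tuning
        self.backbone = backbone.eval()
        self.criterion = nn.L1Loss()

    def forward(self, fused_lst: torch.Tensor, reference_lst: torch.Tensor) -> torch.Tensor:
        # Inputs assumed of shape (N, 1, H, W); the single thermal band is
        # repeated to 3 channels to match the RGB backbone.
        f = self.backbone(fused_lst.repeat(1, 3, 1, 1))
        r = self.backbone(reference_lst.repeat(1, 3, 1, 1))
        return self.criterion(f, r)
```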
5. Experiment Analysis and Results
5.1. Region of Interest
5.2. Satellite Data
5.3. Quantitative Comparison
5.4. Qualitative Comparison
6. Limitations and Future Trends
6.1. Inaccurate LST Estimations
6.2. Cloudy Conditions
6.3. Poor Generalizability
6.4. Leveraging Pretrained Models
6.5. Insufficient Spatial Resolution
6.6. Joint Spatio-Temporal Deep Learning Architectures
6.7. Integration of Large Language Models
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Population Reference Bureau. 2007 World Population Data Sheet; Population Reference Bureau: Washington, DC, USA, 2007. [Google Scholar]
- Vanos, J.; Cakmak, S.; Kalkstein, L.; Yagouti, A. Association of weather and air pollution interactions on daily mortality in 12 Canadian cities. Air Qual. Atmos. Health 2015, 8, 307–320. [Google Scholar] [CrossRef]
- Nuruzzaman, M. Urban heat island: Causes, effects and mitigation measures-a review. Int. J. Environ. Monit. Anal. 2015, 3, 67–73. [Google Scholar] [CrossRef]
- Masiol, M.; Agostinelli, C.; Formenton, G.; Tarabotti, E.; Pavoni, B. Thirteen years of air pollution hourly monitoring in a large city: Potential sources, trends, cycles and effects of car-free days. Sci. Total Environ. 2014, 494, 84–96. [Google Scholar] [CrossRef]
- Hulley, G.C.; Ghent, D.; Göttsche, F.M.; Guillevic, P.C.; Mildrexler, D.J.; Coll, C. 3-Land Surface Temperature. In Taking the Temperature of the Earth; Hulley, G.C., Ghent, D., Eds.; Elsevier: Amsterdam, The Netherlands, 2019; pp. 57–127. [Google Scholar] [CrossRef]
- Li, X.; Zhou, W.; Ouyang, Z. Relationship between land surface temperature and spatial pattern of greenspace: What are the effects of spatial resolution? Landsc. Urban Plan. 2013, 114, 1–8. [Google Scholar] [CrossRef]
- Kerr, Y.H.; Lagouarde, J.P.; Nerry, F.; Ottlé, C. Land surface temperature retrieval techniques and applications: Case of the AVHRR. In Thermal Remote Sensing in Land Surface Processing; CRC Press: Boca Raton, FL, USA, 2004; pp. 33–109. [Google Scholar]
- Schneider, P.; Hook, S.J. Space observations of inland water bodies show rapid surface warming since 1985. Geophys. Res. Lett. 2010, 37, L22405. [Google Scholar] [CrossRef]
- Hall, D.K.; Comiso, J.C.; DiGirolamo, N.E.; Shuman, C.A.; Key, J.R.; Koenig, L.S. A satellite-derived climate-quality data record of the clear-sky surface temperature of the Greenland ice sheet. J. Clim. 2012, 25, 4785–4798. [Google Scholar] [CrossRef]
- Ibrahim, I.; Samah, A.A.; Fauzi, R. Land surface temperature and biophysical factors in urban planning. In Proceedings of the International Conference on Ecosystem, Environment and Sustainable Development, Kuala Lumpur, Malaysia, 21–23 June 2012; Volume 68, pp. 1792–1797. [Google Scholar]
- Maimaitiyiming, M.; Ghulam, A.; Tiyip, T.; Pla, F.; Latorre-Carmona, P.; Halik, Ü.; Sawut, M.; Caetano, M. Effects of green space spatial pattern on land surface temperature: Implications for sustainable urban planning and climate change adaptation. ISPRS J. Photogramm. Remote Sens. 2014, 89, 59–66. [Google Scholar] [CrossRef]
- Luyssaert, S.; Jammet, M.; Stoy, P.C.; Estel, S.; Pongratz, J.; Ceschia, E.; Churkina, G.; Don, A.; Erb, K.; Ferlicoq, M.; et al. Land management and land-cover change have impacts of similar magnitude on surface temperature. Nat. Clim. Chang. 2014, 4, 389–393. [Google Scholar] [CrossRef]
- Kafy, A.A.; Rahman, M.S.; Faisal, A.A.; Hasan, M.M.; Islam, M. Modelling future land use land cover changes and their impacts on land surface temperatures in Rajshahi, Bangladesh. Remote Sens. Appl. Soc. Environ. 2020, 18, 100314. [Google Scholar] [CrossRef]
- Li, Z.L.; Tang, B.H.; Wu, H.; Ren, H.; Yan, G.; Wan, Z.; Trigo, I.F.; Sobrino, J.A. Satellite-derived land surface temperature: Current status and perspectives. Remote Sens. Environ. 2013, 131, 14–37. [Google Scholar] [CrossRef]
- Wan, Z.; Dozier, J. A generalized split-window algorithm for retrieving land-surface temperature from space. IEEE Trans. Geosci. Remote Sens. 1996, 34, 892–905. [Google Scholar]
- Gillespie, A.; Rokugawa, S.; Matsunaga, T.; Cothern, J.S.; Hook, S.; Kahle, A.B. A temperature and emissivity separation algorithm for Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) images. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1113–1126. [Google Scholar] [CrossRef]
- Sun, D.; Pinker, R.T. Estimation of land surface temperature from a Geostationary Operational Environmental Satellite (GOES-8). J. Geophys. Res. Atmos. 2003, 108, 4326. [Google Scholar] [CrossRef]
- Trigo, I.F.; Dacamara, C.C.; Viterbo, P.; Roujean, J.L.; Olesen, F.; Barroso, C.; Camacho-de Coca, F.; Carrer, D.; Freitas, S.C.; García-Haro, J.; et al. The satellite application facility for land surface analysis. Int. J. Remote Sens. 2011, 32, 2725–2744. [Google Scholar] [CrossRef]
- Malakar, N.K.; Hulley, G.C.; Hook, S.J.; Laraby, K.; Cook, M.; Schott, J.R. An operational land surface temperature product for Landsat thermal data: Methodology and validation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5717–5735. [Google Scholar] [CrossRef]
- Koetz, B.; Bastiaanssen, W.; Berger, M.; Defourney, P.; Del Bello, U.; Drusch, M.; Drinkwater, M.; Duca, R.; Fernandez, V.; Ghent, D.; et al. High spatio-temporal resolution land surface temperature mission-a copernicus candidate mission in support of agricultural monitoring. In Proceedings of the Igarss 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8160–8162. [Google Scholar]
- Shen, Y.; Shen, H.; Cheng, Q.; Zhang, L. Generating comparable and fine-scale time series of summer land surface temperature for thermal environment monitoring. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 2136–2147. [Google Scholar] [CrossRef]
- Chen, B.; Huang, B.; Xu, B. Comparison of spatiotemporal fusion models: A review. Remote Sens. 2015, 7, 1798–1835. [Google Scholar] [CrossRef]
- Shen, H.; Meng, X.; Zhang, L. An integrated framework for the spatio–temporal–spectral fusion of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7135–7148. [Google Scholar] [CrossRef]
- Zhang, J.; Li, J. Chapter 11—Spacecraft. In Spatial Cognitive Engine Technology; Zhang, J., Li, J., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 129–162. [Google Scholar] [CrossRef]
- Gibson, P. Chapter 1—A systematic view of remote sensing (Second Edition). In Advanced Remote Sensing; Liang, S., Wang, J., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 1–57. [Google Scholar] [CrossRef]
- Zhan, W.; Chen, Y.; Zhou, J.; Wang, J.; Liu, W.; Voogt, J.; Zhu, X.; Quan, J.; Li, J. Disaggregation of remotely sensed land surface temperature: Literature survey, taxonomy, issues, and caveats. Remote Sens. Environ. 2013, 131, 119–139. [Google Scholar] [CrossRef]
- Mao, Q.; Peng, J.; Wang, Y. Resolution enhancement of remotely sensed land surface temperature: Current status and perspectives. Remote Sens. 2021, 13, 1306. [Google Scholar] [CrossRef]
- Belgiu, M.; Stein, A. Spatiotemporal image fusion in remote sensing. Remote Sens. 2019, 11, 818. [Google Scholar] [CrossRef]
- Agam, N.; Kustas, W.P.; Anderson, M.C.; Li, F.; Neale, C.M. A vegetation index based technique for spatial sharpening of thermal imagery. Remote Sens. Environ. 2007, 107, 545–558. [Google Scholar] [CrossRef]
- Nichol, J. An emissivity modulation method for spatial enhancement of thermal satellite images in urban heat island analysis. Photogramm. Eng. Remote Sens. 2009, 75, 547–556. [Google Scholar] [CrossRef]
- Duan, S.B.; Li, Z.L. Spatial downscaling of MODIS land surface temperatures using geographically weighted regression: Case study in northern China. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6458–6469. [Google Scholar] [CrossRef]
- Zhu, X.; Song, X.; Leng, P.; Hu, R. Spatial downscaling of land surface temperature with the multi-scale geographically weighted regression. Natl. Remote Sens. Bull. 2021, 25, 1749–1766. [Google Scholar]
- Agathangelidis, I.; Cartalis, C. Improving the disaggregation of MODIS land surface temperatures in an urban environment: A statistical downscaling approach using high-resolution emissivity. Int. J. Remote Sens. 2019, 40, 5261–5286. [Google Scholar] [CrossRef]
- Dominguez, A.; Kleissl, J.; Luvall, J.C.; Rickman, D.L. High-resolution urban thermal sharpener (HUTS). Remote Sens. Environ. 2011, 115, 1772–1780. [Google Scholar] [CrossRef]
- Zhan, W.; Huang, F.; Quan, J.; Zhu, X.; Gao, L.; Zhou, J.; Ju, W. Disaggregation of remotely sensed land surface temperature: A new dynamic methodology. J. Geophys. Res. Atmos. 2016, 121, 10–538. [Google Scholar] [CrossRef]
- Zhu, X.; Cai, F.; Tian, J.; Williams, T.K.A. Spatiotemporal fusion of multisource remote sensing data: Literature survey, taxonomy, principles, applications, and future directions. Remote Sens. 2018, 10, 527. [Google Scholar] [CrossRef]
- Song, H.; Liu, Q.; Wang, G.; Hang, R.; Huang, B. Spatiotemporal satellite image fusion using deep convolutional neural networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 821–829. [Google Scholar] [CrossRef]
- Gao, F.; Masek, J.; Schwaller, M.; Hall, F. On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2207–2218. [Google Scholar]
- Zhu, X.; Chen, J.; Gao, F.; Chen, X.; Masek, J.G. An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions. Remote Sens. Environ. 2010, 114, 2610–2623. [Google Scholar] [CrossRef]
- Kim, J.; Hogue, T.S. Evaluation and sensitivity testing of a coupled Landsat-MODIS downscaling method for land surface temperature and vegetation indices in semi-arid regions. J. Appl. Remote Sens. 2012, 6, 063569. [Google Scholar] [CrossRef]
- Wu, P.; Shen, H.; Ai, T.; Liu, Y. Land-surface temperature retrieval at high spatial and temporal resolutions based on multi-sensor fusion. Int. J. Digit. Earth 2013, 6, 113–133. [Google Scholar] [CrossRef]
- Huang, B.; Wang, J.; Song, H.; Fu, D.; Wong, K. Generating high spatiotemporal resolution land surface temperature for urban heat island monitoring. IEEE Geosci. Remote Sens. Lett. 2013, 10, 1011–1015. [Google Scholar] [CrossRef]
- Wu, P.; Shen, H.; Zhang, L.; Göttsche, F.M. Integrated fusion of multi-scale polar-orbiting and geostationary satellite observations for the mapping of high spatial and temporal resolution land surface temperature. Remote Sens. Environ. 2015, 156, 169–181. [Google Scholar] [CrossRef]
- Wu, M.; Niu, Z.; Wang, C.; Wu, C.; Wang, L. Use of MODIS and Landsat time series data to generate high-resolution temporal synthetic Landsat data using a spatial and temporal reflectance fusion model. J. Appl. Remote Sens. 2012, 6, 063507. [Google Scholar]
- Zhang, W.; Li, A.; Jin, H.; Bian, J.; Zhang, Z.; Lei, G.; Qin, Z.; Huang, C. An enhanced spatial and temporal data fusion model for fusing Landsat and MODIS surface reflectance to generate high temporal Landsat-like data. Remote Sens. 2013, 5, 5346–5368. [Google Scholar] [CrossRef]
- Zhang, H.K.; Huang, B.; Zhang, M.; Cao, K.; Yu, L. A generalization of spatial and temporal fusion methods for remotely sensed surface parameters. Int. J. Remote Sens. 2015, 36, 4411–4445. [Google Scholar] [CrossRef]
- Wu, M.; Huang, W.; Niu, Z.; Wang, C. Generating daily synthetic Landsat imagery by combining Landsat and MODIS data. Sensors 2015, 15, 24002–24025. [Google Scholar] [CrossRef] [PubMed]
- Zhu, X.; Helmer, E.H.; Gao, F.; Liu, D.; Chen, J.; Lefsky, M.A. A flexible spatiotemporal method for fusing satellite images with different resolutions. Remote Sens. Environ. 2016, 172, 165–177. [Google Scholar] [CrossRef]
- Li, X.; Ling, F.; Foody, G.M.; Ge, Y.; Zhang, Y.; Du, Y. Generating a series of fine spatial and temporal resolution land cover maps by fusing coarse spatial resolution remotely sensed images and fine spatial resolution land cover maps. Remote Sens. Environ. 2017, 196, 293–311. [Google Scholar]
- Quan, J.; Zhan, W.; Ma, T.; Du, Y.; Guo, Z.; Qin, B. An integrated model for generating hourly Landsat-like land surface temperatures over heterogeneous landscapes. Remote Sens. Environ. 2018, 206, 403–423. [Google Scholar]
- Xia, H.; Chen, Y.; Li, Y.; Quan, J. Combining kernel-driven and fusion-based methods to generate daily high-spatial-resolution land surface temperatures. Remote Sens. Environ. 2019, 224, 259–274. [Google Scholar] [CrossRef]
- Huang, B.; Song, H. Spatiotemporal reflectance fusion via sparse representation. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3707–3716. [Google Scholar] [CrossRef]
- Wu, B.; Huang, B.; Zhang, L. An error-bound-regularized sparse coding for spatiotemporal reflectance fusion. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6791–6803. [Google Scholar] [CrossRef]
- Wei, J.; Wang, L.; Liu, P.; Song, W. Spatiotemporal fusion of remote sensing images with structural sparsity and semi-coupled dictionary learning. Remote Sens. 2016, 9, 21. [Google Scholar] [CrossRef]
- Peng, Y.; Li, W.; Luo, X.; Du, J.; Zhang, X.; Gan, Y.; Gao, X. Spatiotemporal reflectance fusion via tensor sparse representation. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5608318. [Google Scholar]
- Li, A.; Bo, Y.; Zhu, Y.; Guo, P.; Bi, J.; He, Y. Blending multi-resolution satellite sea surface temperature (SST) products using Bayesian maximum entropy method. Remote Sens. Environ. 2013, 135, 52–63. [Google Scholar] [CrossRef]
- Huang, B.; Zhang, H.; Song, H.; Wang, J.; Song, C. Unified fusion of remote-sensing imagery: Generating simultaneously high-resolution synthetic spatial–temporal–spectral earth observations. Remote Sens. Lett. 2013, 4, 561–569. [Google Scholar] [CrossRef]
- Liao, L.; Song, J.; Wang, J.; Xiao, Z.; Wang, J. Bayesian method for building frequent Landsat-like NDVI datasets by integrating MODIS and Landsat NDVI. Remote Sens. 2016, 8, 452. [Google Scholar] [CrossRef]
- Xue, J.; Leung, Y.; Fung, T. A Bayesian data fusion approach to spatio-temporal fusion of remotely sensed images. Remote Sens. 2017, 9, 1310. [Google Scholar] [CrossRef]
- Addesso, P.; Longo, M.; Restaino, R.; Vivone, G. Sequential Bayesian methods for resolution enhancement of TIR image sequences. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 8, 233–243. [Google Scholar] [CrossRef]
- Chen, Y.; Yang, Y.; Pan, X.; Meng, X.; Hu, J. Spatiotemporal fusion network for land surface temperature based on a conditional variational autoencoder. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5002813. [Google Scholar] [CrossRef]
- Tan, Z.; Yue, P.; Di, L.; Tang, J. Deriving high spatiotemporal remote sensing images using deep convolutional network. Remote Sens. 2018, 10, 1066. [Google Scholar] [CrossRef]
- Liu, X.; Deng, C.; Chanussot, J.; Hong, D.; Zhao, B. StfNet: A two-stream convolutional neural network for spatiotemporal image fusion. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6552–6564. [Google Scholar] [CrossRef]
- Zheng, Y.; Song, H.; Sun, L.; Wu, Z.; Jeon, B. Spatiotemporal fusion of satellite images via very deep convolutional networks. Remote Sens. 2019, 11, 2701. [Google Scholar] [CrossRef]
- Yin, Z.; Wu, P.; Foody, G.M.; Wu, Y.; Liu, Z.; Du, Y.; Ling, F. Spatiotemporal fusion of land surface temperature based on a convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2020, 59, 1808–1822. [Google Scholar] [CrossRef]
- Wang, X.; Wang, X. Spatiotemporal fusion of remote sensing image based on deep learning. J. Sens. 2020, 2020, 8873079. [Google Scholar] [CrossRef]
- Li, Y.; Li, J.; He, L.; Chen, J.; Plaza, A. A new sensor bias-driven spatio-temporal fusion model based on convolutional neural networks. Sci. China Inf. Sci. 2020, 63, 140302. [Google Scholar] [CrossRef]
- Qin, P.; Huang, H.; Tang, H.; Wang, J.; Liu, C. MUSTFN: A spatiotemporal fusion method for multi-scale and multi-sensor remote sensing images based on a convolutional neural network. Int. J. Appl. Earth Obs. Geoinf. 2022, 115, 103113. [Google Scholar] [CrossRef]
- Sun, W.; Li, J.; Jiang, M.; Yuan, Q. Supervised and self-supervised learning-based cascade spatiotemporal fusion framework and its application. ISPRS J. Photogramm. Remote Sens. 2023, 203, 19–36. [Google Scholar] [CrossRef]
- Meng, X.; Liu, Q.; Shao, F.; Li, S. Spatio–temporal–spectral collaborative learning for spatio–temporal fusion with land cover changes. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5704116. [Google Scholar] [CrossRef]
- Yu, S.; Deng, Y.; Li, Y.; Li, J.; Chen, J.; Zhang, S. An Unsupervised Model Based on Convolutional Neural Network for Fusing Landsat-8 and Sentinel-2 Data. In Proceedings of the IGARSS 2024-2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 9214–9217. [Google Scholar]
- Tan, Z.; Di, L.; Zhang, M.; Guo, L.; Gao, M. An enhanced deep convolutional model for spatiotemporal image fusion. Remote Sens. 2019, 11, 2898. [Google Scholar] [CrossRef]
- Chen, J.; Wang, L.; Feng, R.; Liu, P.; Han, W.; Chen, X. CycleGAN-STF: Spatiotemporal fusion via CycleGAN-based image generation. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5851–5865. [Google Scholar] [CrossRef]
- Zhang, H.; Song, Y.; Han, C.; Zhang, L. Remote sensing image spatiotemporal fusion using a generative adversarial network. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4273–4286. [Google Scholar] [CrossRef]
- Ma, Y.; Wei, J.; Tang, W.; Tang, R. Explicit and stepwise models for spatiotemporal fusion of remote sensing images with deep neural networks. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102611. [Google Scholar] [CrossRef]
- Tan, Z.; Gao, M.; Li, X.; Jiang, L. A flexible reference-insensitive spatiotemporal fusion model for remote sensing images using conditional generative adversarial network. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5601413. [Google Scholar] [CrossRef]
- Zhang, H.; Sun, Y.; Shi, W.; Guo, D.; Zheng, N. An object-based spatiotemporal fusion model for remote sensing images. Eur. J. Remote Sens. 2021, 54, 86–101. [Google Scholar] [CrossRef]
- Song, B.; Liu, P.; Li, J.; Wang, L.; Zhang, L.; He, G.; Chen, L.; Liu, J. MLFF-GAN: A multilevel feature fusion with GAN for spatiotemporal remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4410816. [Google Scholar] [CrossRef]
- Tan, Z.; Gao, M.; Yuan, J.; Jiang, L.; Duan, H. A robust model for MODIS and Landsat image fusion considering input noise. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5407217. [Google Scholar] [CrossRef]
- Pan, X.; Deng, M.; Ao, Z.; Xin, Q. An Adaptive Multiscale Generative Adversarial Network for the Spatiotemporal Fusion of Landsat and MODIS Data. Remote Sens. 2023, 15, 5128. [Google Scholar] [CrossRef]
- Huang, H.; He, W.; Zhang, H.; Xia, Y.; Zhang, L. STFDiff: Remote sensing image spatiotemporal fusion with diffusion models. Inf. Fusion 2024, 111, 102505. [Google Scholar] [CrossRef]
- Chen, Y.; Yang, Y.; Pan, X.; Hu, P. CGMFN: Conditional Generative Model Fusion Network for Land Surface Temperature Generation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5004813. [Google Scholar] [CrossRef]
- Li, W.; Cao, D.; Peng, Y.; Yang, C. MSNet: A multi-stream fusion network for remote sensing spatiotemporal fusion based on transformer and convolution. Remote Sens. 2021, 13, 3724. [Google Scholar] [CrossRef]
- Yang, G.; Qian, Y.; Liu, H.; Tang, B.; Qi, R.; Lu, Y.; Geng, J. MSFusion: Multistage for remote sensing image spatiotemporal fusion based on texture transformer and convolutional neural network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4653–4666. [Google Scholar] [CrossRef]
- Chen, G.; Jiao, P.; Hu, Q.; Xiao, L.; Ye, Z. SwinSTFM: Remote sensing spatiotemporal fusion using Swin transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5410618. [Google Scholar] [CrossRef]
- Li, W.; Cao, D.; Xiang, M. Enhanced multi-stream remote sensing spatiotemporal fusion network based on transformer and dilated convolution. Remote Sens. 2022, 14, 4544. [Google Scholar] [CrossRef]
- Jiang, M.; Shao, H. A CNN-Transformer combined Remote Sensing Imagery Spatiotemporal Fusion Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 13995–14009. [Google Scholar] [CrossRef]
- Benzenati, T.; Kallel, A.; Kessentini, Y. STF-Trans: A two-stream spatiotemporal fusion transformer for very high resolution satellites images. Neurocomputing 2024, 563, 126868. [Google Scholar] [CrossRef]
- Ma, Z.; Bao, W.; Feng, W.; Zhang, X.; Ma, X.; Qu, K. SFT-GAN: Sparse Fast Transformer Fusion Method Based on GAN for Remote Sensing Spatiotemporal Fusion. Remote Sens. 2025, 17, 2315. [Google Scholar]
- Hu, P.; Pan, X.; Yang, Y.; Dai, Y.; Chen, Y. A Two-Stage Hierarchical Spatiotemporal Fusion Network for Land Surface Temperature with Transformer. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5002320. [Google Scholar] [CrossRef]
- Yang, Z.; Diao, C.; Li, B. A robust hybrid deep learning model for spatiotemporal image fusion. Remote Sens. 2021, 13, 5005. [Google Scholar]
- Zhan, W.; Luo, F.; Luo, H.; Li, J.; Wu, Y.; Yin, Z.; Wu, Y.; Wu, P. Time-Series-Based Spatiotemporal Fusion Network for Improving Crop Type Mapping. Remote Sens. 2024, 16, 235. [Google Scholar] [CrossRef]
- Prata, A.; Caselles, V.; Coll, C.; Sobrino, J.; Ottle, C. Thermal remote sensing of land surface temperature from satellites: Current status and future prospects. Remote Sens. Rev. 1995, 12, 175–224. [Google Scholar] [CrossRef]
- Wu, P.; Yin, Z.; Zeng, C.; Duan, S.B.; Göttsche, F.M.; Ma, X.; Li, X.; Yang, H.; Shen, H. Spatially continuous and high-resolution land surface temperature product generation: A review of reconstruction and spatiotemporal fusion techniques. IEEE Geosci. Remote Sens. Mag. 2021, 9, 112–137. [Google Scholar] [CrossRef]
- Yoo, C.; Im, J.; Park, S.; Cho, D. Spatial downscaling of MODIS land surface temperature: Recent research trends, challenges, and future directions. Korean J. Remote Sens. 2020, 36, 609–626. [Google Scholar]
- Ran, L.; Mengmeng, W.; Zhengjia, Z.; Tian, H.; Xiuguo, L. A review of spatiotemporal fusion methods for remotely sensed land surface temperature. Natl. Remote Sens. Bull. 2024, 26, 2433–2450. [Google Scholar] [CrossRef]
- Li, J.; Li, Y.; He, L.; Chen, J.; Plaza, A. Spatio-temporal fusion for remote sensing data: An overview and new benchmark. Sci. China Inf. Sci. 2020, 63, 140301. [Google Scholar] [CrossRef]
- Ferchichi, A.; Abbes, A.B.; Barra, V.; Farah, I.R. Forecasting vegetation indices from spatio-temporal remotely sensed data using deep learning-based approaches: A systematic literature review. Ecol. Inform. 2022, 68, 101552. [Google Scholar]
- Wang, Z.; Ma, Y.; Zhang, Y. Review of pixel-level remote sensing image fusion based on deep learning. Inf. Fusion 2023, 90, 36–58. [Google Scholar] [CrossRef]
- Wang, Q.; Tang, Y.; Ge, Y.; Xie, H.; Tong, X.; Atkinson, P.M. A comprehensive review of spatial-temporal-spectral information reconstruction techniques. Sci. Remote Sens. 2023, 8, 100102. [Google Scholar] [CrossRef]
- Xiao, J.; Aggarwal, A.K.; Duc, N.H.; Arya, A.; Rage, U.K.; Avtar, R. A review of remote sensing image spatiotemporal fusion: Challenges, applications and recent trends. Remote Sens. Appl. Soc. Environ. 2023, 32, 101005. [Google Scholar] [CrossRef]
- Chen, G.; Lu, H.; Zou, W.; Li, L.; Emam, M.; Chen, X.; Jing, W.; Wang, J.; Li, C. Spatiotemporal fusion for spectral remote sensing: A statistical analysis and review. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 259–273. [Google Scholar] [CrossRef]
- Cui, J.; Li, J.; Gu, X.; Zhang, W.; Wang, D.; Sun, X.; Zhan, Y.; Yang, J.; Liu, Y.; Yang, X. Comprehensive Analysis of Temporal–Spatial Fusion from 1991 to 2023 Using Bibliometric Tools. Atmosphere 2024, 15, 598. [Google Scholar] [CrossRef]
- Anand, S.; Sharma, R. Pansharpening and spatiotemporal image fusion method for remote sensing. Eng. Res. Express 2024, 6, 022201. [Google Scholar] [CrossRef]
- Swain, R.; Paul, A.; Behera, M.D. Spatio-temporal fusion methods for spectral remote sensing: A comprehensive technical review and comparative analysis. Trop. Ecol. 2024, 65, 356–375. [Google Scholar] [CrossRef]
- Lian, Z.; Zhan, Y.; Zhang, W.; Wang, Z.; Liu, W.; Huang, X. Recent Advances in Deep Learning-Based Spatiotemporal Fusion Methods for Remote Sensing Images. Sensors 2025, 25, 1093. [Google Scholar] [CrossRef] [PubMed]
- Sun, E.; Cui, Y.; Liu, P.; Yan, J. A decade of deep learning for remote sensing spatiotemporal fusion: Advances, challenges, and opportunities. arXiv 2025, arXiv:2504.00901. [Google Scholar] [CrossRef]
- Norman, J.M.; Becker, F. Terminology in thermal infrared remote sensing of natural surfaces. Agric. For. Meteorol. 1995, 77, 153–166. [Google Scholar] [CrossRef]
- Dash, P.; Göttsche, F.M.; Olesen, F.S.; Fischer, H. Land surface temperature and emissivity estimation from passive sensor data: Theory and practice-current trends. Int. J. Remote Sens. 2002, 23, 2563–2594. [Google Scholar] [CrossRef]
- Li, Z.L.; Wu, H.; Duan, S.B.; Zhao, W.; Ren, H.; Liu, X.; Leng, P.; Tang, R.; Ye, X.; Zhu, J.; et al. Satellite remote sensing of global land surface temperature: Definition, methods, products, and applications. Rev. Geophys. 2023, 61, e2022RG000777. [Google Scholar] [CrossRef]
- Becker, F.; Li, Z.L. Surface temperature and emissivity at various scales: Definition, measurement and related problems. Remote Sens. Rev. 1995, 12, 225–253. [Google Scholar] [CrossRef]
- Duan, S.B.; Li, Z.L.; Cheng, J.; Leng, P. Cross-satellite comparison of operational land surface temperature products derived from MODIS and ASTER data over bare soil surfaces. ISPRS J. Photogramm. Remote Sens. 2017, 126, 1–10. [Google Scholar] [CrossRef]
- Jimenez-Munoz, J.C.; Sobrino, J.A.; Skoković, D.; Mattar, C.; Cristobal, J. Land surface temperature retrieval methods from Landsat-8 thermal infrared sensor data. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1840–1843. [Google Scholar] [CrossRef]
- Rozenstein, O.; Qin, Z.; Derimian, Y.; Karnieli, A. Derivation of land surface temperature for Landsat-8 TIRS using a split window algorithm. Sensors 2014, 14, 5768–5780. [Google Scholar] [CrossRef]
- Wang, F.; Qin, Z.; Song, C.; Tu, L.; Karnieli, A.; Zhao, S. An improved mono-window algorithm for land surface temperature retrieval from Landsat 8 thermal infrared sensor data. Remote Sens. 2015, 7, 4268–4289. [Google Scholar] [CrossRef]
- Gómez, C.; White, J.C.; Wulder, M.A. Optical remotely sensed time series data for land cover classification: A review. ISPRS J. Photogramm. Remote Sens. 2016, 116, 55–72. [Google Scholar] [CrossRef]
- Chraibi, E.; De Boissieu, F.; Barbier, N.; Luque, S.; Féret, J.B. Stability in time and consistency between atmospheric corrections: Assessing the reliability of Sentinel-2 products for biodiversity monitoring in tropical forests. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102884. [Google Scholar] [CrossRef]
- Tran, D.X.; Pla, F.; Latorre-Carmona, P.; Myint, S.W.; Caetano, M.; Kieu, H.V. Characterizing the relationship between land use land cover change and land surface temperature. ISPRS J. Photogramm. Remote Sens. 2017, 124, 119–132. [Google Scholar] [CrossRef]
- Wang, S.; Luo, Y.; Li, X.; Yang, K.; Liu, Q.; Luo, X.; Li, X. Downscaling land surface temperature based on non-linear geographically weighted regressive model over urban areas. Remote Sens. 2021, 13, 1580. [Google Scholar] [CrossRef]
- Kingma, D.P. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Tian, Y.; Su, D.; Lauria, S.; Liu, X. Recent advances on loss functions in deep learning for computer vision. Neurocomputing 2022, 497, 129–158. [Google Scholar] [CrossRef]
- Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics: Methodology and Distribution; Springer: Berlin/Heidelberg, Germany, 1992; pp. 492–518. [Google Scholar]
- Huang, S.; Tang, L.; Hupy, J.P.; Wang, Y.; Shao, G. A commentary review on the use of normalized difference vegetation index (NDVI) in the era of popular remote sensing. J. For. Res. 2021, 32, 1–6. [Google Scholar] [CrossRef]
- Zha, Y.; Gao, J.; Ni, S. Use of normalized difference built-up index in automatically mapping urban areas from TM imagery. Int. J. Remote Sens. 2003, 24, 583–594. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 2016, 3, 47–57. [Google Scholar] [CrossRef]
- Khare, N.; Thakur, P.S.; Khanna, P.; Ojha, A. Analysis of loss functions for image reconstruction using convolutional autoencoder. In Proceedings of the International Conference on Computer Vision and Image Processing, Ropar, India, 3–5 December 2021; Springer: Cham, Switzerland, 2021; pp. 338–349. [Google Scholar]
- Wu, B.; Duan, H.; Liu, Z.; Sun, G. SRPGAN: Perceptual generative adversarial network for single image super resolution. arXiv 2017, arXiv:1712.05927. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Korhonen, J.; You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Proceedings of the 2012 Fourth International Workshop on Quality of Multimedia Experience, Melbourne, VIC, Australia, 5–7 July 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 37–38. [Google Scholar]
- Kaneko, S.; Satoh, Y.; Igarashi, S. Using selective correlation coefficient for robust image registration. Pattern Recognit. 2003, 36, 1165–1173. [Google Scholar] [CrossRef]
- Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proceedings of the Summaries of the Third Annual JPL Airborne Geoscience Workshop, Pasadena, CA, USA, 1–5 June 1992; NASA-JPL: Pasadena, CA, USA, 1992; Volume 1. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
- Wang, Z.; Bovik, A.C. A universal image quality index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition; Rumelhart, D.E., McClelland, J.L., Eds.; MIT Press: Cambridge, MA, USA, 1986; Volume 1. [Google Scholar]
- Bank, D.; Koenigstein, N.; Giryes, R. Autoencoders. In Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook; Springer: Cham, Switzerland, 2023. [Google Scholar]
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
- Ng, A. Sparse autoencoder. CS294A Lect. Notes 2011, 72, 1–19. [Google Scholar]
- Rifai, S.; Vincent, P.; Muller, X.; Glorot, X.; Bengio, Y. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 833–840. [Google Scholar]
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408. [Google Scholar]
- An, J.; Cho, S. Variational autoencoder based anomaly detection using reconstruction probability. Spec. Lect. IE 2015, 2, 1–18. [Google Scholar]
- Semeniuta, S.; Severyn, A.; Barth, E. A hybrid convolutional variational autoencoder for text generation. arXiv 2017, arXiv:1702.02390. [Google Scholar] [CrossRef]
- Nguyen, H.D.; Tran, K.P.; Thomassey, S.; Hamad, M. Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management. Int. J. Inf. Manag. 2021, 57, 102282. [Google Scholar] [CrossRef]
- Sharma, P.; Kumar, M.; Sharma, H.K.; Biju, S.M. Generative adversarial networks (GANs): Introduction, Taxonomy, Variants, Limitations, and Applications. Multimed. Tools Appl. 2024, 83, 88811–88858. [Google Scholar] [CrossRef]
- Hong, Y.; Hwang, U.; Yoo, J.; Yoon, S. How generative adversarial networks and their variants work: An overview. ACM Comput. Surv. (CSUR) 2019, 52, 1–43. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Zhu, J.Y.; Krähenbühl, P.; Shechtman, E.; Efros, A.A. Generative visual manipulation on the natural image manifold. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part V 14; Springer: Cham, Switzerland, 2016; pp. 597–613. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Zhang, H.; Sindagi, V.; Patel, V.M. Image de-raining using a conditional generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 3943–3956. [Google Scholar] [CrossRef]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Radford, A. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- Denton, E.L.; Chintala, S.; Szlam, A.; Fergus, R. Deep generative image models using a laplacian pyramid of adversarial networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1486–1494. [Google Scholar]
- Mirza, M. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar] [CrossRef]
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Adv. Neural Inf. Process. Syst. 2016, 29, 2180–2188. [Google Scholar]
- Dumoulin, V.; Belghazi, I.; Poole, B.; Mastropietro, O.; Lamb, A.; Arjovsky, M.; Courville, A. Adversarially learned inference. arXiv 2016, arXiv:1606.00704. [Google Scholar]
- Donahue, J.; Krähenbühl, P.; Darrell, T. Adversarial feature learning. arXiv 2016, arXiv:1605.09782. [Google Scholar]
- Kingma, D.P. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Mescheder, L.; Nowozin, S.; Geiger, A. Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2391–2400. [Google Scholar]
- Emelyanova, I.V.; McVicar, T.R.; Van Niel, T.G.; Li, L.T.; Van Dijk, A.I. Assessing the accuracy of blending Landsat–MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection. Remote Sens. Environ. 2013, 133, 193–209. [Google Scholar] [CrossRef]
- Xia, Y.; He, W.; Huang, Q.; Chen, H.; Huang, H.; Zhang, H. SOSSF: Landsat-8 image synthesis on the blending of Sentinel-1 and MODIS data. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5401619. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45. [Google Scholar]
- Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
- Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919. [Google Scholar]
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 568–578. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Chu, X.; Tian, Z.; Wang, Y.; Zhang, B.; Ren, H.; Wei, X.; Xia, H.; Shen, C. Twins: Revisiting the design of spatial attention in vision transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 9355–9366. [Google Scholar]
- Zhang, P.; Dai, X.; Yang, J.; Xiao, B.; Yuan, L.; Zhang, L.; Gao, J. Multi-scale vision longformer: A new vision transformer for high-resolution image encoding. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2998–3008. [Google Scholar]
- Yuan, L.; Hou, Q.; Jiang, Z.; Feng, J.; Yan, S. Volo: Vision outlooker for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 6575–6586. [Google Scholar] [CrossRef]
- Medsker, L.R.; Jain, L.C. Recurrent Neural Networks: Design and Applications; CRC Press: Boca Raton, FL, USA, 2001. [Google Scholar]
- Das, S.; Tariq, A.; Santos, T.; Kantareddy, S.S.; Banerjee, I. Recurrent neural networks (RNNs): Architectures, training tricks, and introduction to influential research. In Machine Learning for Brain Disorders; Humana: New York, NY, USA, 2023; pp. 117–138. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef]
- Karevan, Z.; Suykens, J.A. Spatio-temporal stacked LSTM for temperature prediction in weather forecasting. arXiv 2018, arXiv:1811.06341. [Google Scholar] [CrossRef]
- Cho, K. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
- Rani, V.; Nabi, S.T.; Kumar, M.; Mittal, A.; Kumar, K. Self-supervised learning: A succinct review. Arch. Comput. Methods Eng. 2023, 30, 2761–2775. [Google Scholar] [CrossRef]
- Laal, M.; Ghodsi, S.M. Benefits of collaborative learning. Procedia-Soc. Behav. Sci. 2012, 31, 486–490. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Shafiq, M.; Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972. [Google Scholar] [CrossRef]
- Huang, L.; Qin, J.; Zhou, Y.; Zhu, F.; Liu, L.; Shao, L. Normalization techniques in training dnns: Methodology, analysis and application. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10173–10196. [Google Scholar] [CrossRef] [PubMed]
- Ioffe, S. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar] [CrossRef]
- Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Ulyanov, D. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar]
- Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv 2018, arXiv:1802.05957. [Google Scholar] [CrossRef]
- Luo, P.; Ren, J.; Peng, Z.; Zhang, R.; Li, J. Differentiable learning-to-normalize via switchable normalization. arXiv 2018, arXiv:1806.10779. [Google Scholar]
- Salehin, I.; Kang, D.K. A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics 2023, 12, 3106. [Google Scholar] [CrossRef]
- Ying, X. An overview of overfitting and its solutions. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2019; Volume 1168, p. 022022. [Google Scholar]
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12299–12310. [Google Scholar]
- Puls, E.d.S.; Todescato, M.V.; Carbonera, J.L. An evaluation of pre-trained models for feature extraction in image classification. arXiv 2023, arXiv:2310.02037. [Google Scholar] [CrossRef]
- Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9. [Google Scholar] [CrossRef]
- Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
- Duan, S.B.; Li, Z.L.; Li, H.; Göttsche, F.M.; Wu, H.; Zhao, W.; Leng, P.; Zhang, X.; Coll, C. Validation of Collection 6 MODIS land surface temperature product using in situ measurements. Remote Sens. Environ. 2019, 225, 16–29. [Google Scholar] [CrossRef]
- Wan, Z. MODIS land-surface temperature algorithm theoretical basis document (LST ATBD). Inst. Comput. Earth Syst. Sci. Santa Barbara 1999, 75, 18. [Google Scholar]
- Ermida, S.L.; Soares, P.; Mantas, V.; Göttsche, F.M.; Trigo, I.F. Google earth engine open-source code for land surface temperature estimation from the landsat series. Remote Sens. 2020, 12, 1471. [Google Scholar] [CrossRef]
- Krishnan, P.; Meyers, T.P.; Hook, S.J.; Heuer, M.; Senn, D.; Dumas, E.J. Intercomparison of in situ sensors for ground-based land surface temperature measurements. Sensors 2020, 20, 5268. [Google Scholar] [CrossRef]
- Shandas, V.; Makido, Y.; Upraity, A.N. Evaluating Differences between Ground-Based and Satellite-Derived Measurements of Urban Heat: The Role of Land Cover Classes in Portland, Oregon and Washington, DC. Land 2023, 12, 562. [Google Scholar] [CrossRef]
- Li, Z.; Shen, H.; Cheng, Q.; Liu, Y.; You, S.; He, Z. Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors. ISPRS J. Photogramm. Remote Sens. 2019, 150, 197–212. [Google Scholar] [CrossRef]
- Mo, Y.; Xu, Y.; Chen, H.; Zhu, S. A review of reconstructing remotely sensed land surface temperature under cloudy conditions. Remote Sens. 2021, 13, 2838. [Google Scholar] [CrossRef]
- Li, H.; Wu, X.j.; Durrani, T.S. Infrared and visible image fusion with ResNet and zero-phase component analysis. Infrared Phys. Technol. 2019, 102, 103039. [Google Scholar] [CrossRef]
- Zhang, D.; Ren, K.; Zhou, J.; Gu, G.; Chen, Q. An infrared and visible image fusion method based on deep learning. In Proceedings of the 4th Optics Young Scientist Summit (OYSS 2020), Ningbo, China, 4–7 December 2020; SPIE: Bellingham, WA, USA, 2021; Volume 11781, pp. 64–70. [Google Scholar]
- Li, H.; Wu, X.J.; Kittler, J. Infrared and visible image fusion using a deep learning framework. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2705–2710. [Google Scholar]
- Ren, X.; Meng, F.; Hu, T.; Liu, Z.; Wang, C. Infrared-visible image fusion based on convolutional neural networks (CNN). In Proceedings of the Intelligence Science and Big Data Engineering: 8th International Conference, IScIDE 2018, Lanzhou, China, 18–19 August 2018; Revised Selected Papers 8; Springer: Cham, Switzerland, 2018; pp. 301–307. [Google Scholar]
- Feng, Y.; Lu, H.; Bai, J.; Cao, L.; Yin, H. Fully convolutional network-based infrared and visible image fusion. Multimed. Tools Appl. 2020, 79, 15001–15014. [Google Scholar] [CrossRef]
- Li, Y.; Zhao, J.; Lv, Z.; Li, J. Medical image fusion method by deep learning. Int. J. Cogn. Comput. Eng. 2021, 2, 21–29. [Google Scholar] [CrossRef]
- Azam, M.A.; Khan, K.B.; Salahuddin, S.; Rehman, E.; Khan, S.A.; Khan, M.A.; Kadry, S.; Gandomi, A.H. A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics. Comput. Biol. Med. 2022, 144, 105253. [Google Scholar] [CrossRef] [PubMed]
- Sánchez, J.M.; Galve, J.M.; González-Piqueras, J.; López-Urrea, R.; Niclòs, R.; Calera, A. Monitoring 10-m LST from the Combination MODIS/Sentinel-2, validation in a high contrast semi-arid agroecosystem. Remote Sens. 2020, 12, 1453. [Google Scholar] [CrossRef]
- Abunnasr, Y.; Mhawej, M. Towards a combined Landsat-8 and Sentinel-2 for 10-m land surface temperature products: The Google Earth Engine monthly Ten-ST-GEE system. Environ. Model. Softw. 2022, 155, 105456. [Google Scholar] [CrossRef]
- Li, X.; Wen, C.; Hu, Y.; Yuan, Z.; Zhu, X.X. Vision-language models in remote sensing: Current progress and future trends. IEEE Geosci. Remote Sens. Mag. 2024, 12, 32–66. [Google Scholar] [CrossRef]
- Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940. [Google Scholar] [CrossRef] [PubMed]
- Xu, F.F.; Alon, U.; Neubig, G.; Hellendoorn, V.J. A systematic evaluation of large language models of code. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, San Diego, CA, USA, 13 June 2022; pp. 1–10. [Google Scholar]
- Zhao, Z.; Deng, L.; Bai, H.; Cui, Y.; Zhang, Z.; Zhang, Y.; Qin, H.; Chen, D.; Zhang, J.; Wang, P.; et al. Image Fusion via Vision-Language Model. arXiv 2024, arXiv:2402.02235. [Google Scholar] [CrossRef]

| Survey | Year | Application Scope | Adaptation to LST | Deep Learning | Open Challenges | Experimental Evaluation | New Dataset |
|---|---|---|---|---|---|---|---|
| [36] | 2018 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [28] | 2019 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [97] | 2020 | SR | ✗ | ✓ | ✗ | ✓ | ✓ |
| [95] | 2020 | LST | ✗ | ✗ | ✗ | ✗ | ✗ |
| [66] | 2020 | SR | ✗ | ✓ | ✓ | ✓ | ✗ |
| [94] | 2021 | LST | ✗ | ✗ | ✗ | ✗ | ✗ |
| [98] | 2022 | NDVI | ✗ | ✓ | ✓ | ✗ | ✗ |
| [99] | 2023 | SR | ✓ | ✗ | ✓ | ✓ | ✗ |
| [100] | 2023 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [101] | 2023 | SR | ✗ | ✓ | ✓ | ✗ | ✗ |
| [102] | 2023 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [103] | 2024 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [104] | 2024 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [96] | 2024 | LST | ✓ | ✗ | ✗ | ✗ | ✗ |
| [105] | 2024 | SR | ✗ | ✗ | ✗ | ✗ | ✗ |
| [106] | 2025 | SR | ✗ | ✓ | ✓ | ✓ | ✗ |
| [107] | 2025 | SR | ✗ | ✓ | ✓ | ✓ | ✗ |
| Ours | 2025 | LST | ✓ | ✓ | ✓ | ✓ | ✓ |
| Satellite | Thermal Sensor | Spatial Resolution | Temporal Resolution | Temporal Extent |
|---|---|---|---|---|
| GF-5 | VIMS | 40 m | 7 days | 9 May 2018–Present |
| Landsat 9 | TIRS-2 | 100 m, resampled to 30 m | 16 days | 27 September 2021–Present |
| Landsat 8 | TIRS | 100 m, resampled to 30 m | 16 days | 11 February 2013–Present |
| Landsat 7 | ETM+ | 60 m, resampled to 30 m | 16 days | 15 April 1999–Present (Partially) |
| Landsat 5 | TM | 120 m, resampled to 30 m | 16 days | 1 March 1984–5 June 2013 |
| Terra | ASTER | 90 m | 16 days | 18 December 1999–Present |
| Aqua | MODIS | 1 km | 1 day | 4 May 2002–Present |
| Terra | MODIS | 1 km | 1 day | 18 December 1999–Present |
| Sentinel-3A | SLSTR | 1 km | 1 day | 16 February 2016–Present |
| FY-3D | MERSI-2 | 375 m | 12 h | 15 November 2017–Present |
| SNPP | VIIRS | 375 m | 12 h | 28 October 2011–Present |
| GOES-8 | Imager | 4 km | 30 min | 16 September 1994–4 June 2009 |
| FY-2F | VISSR | 5 km | 1 h | 13 January 2012–Present |
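
The sensors above differ by one to two orders of magnitude in pixel size, so STF pipelines typically begin by bringing the coarse LST onto the fine grid. The snippet below is a minimal sketch of that resampling step, assuming the coarse array is already co-registered with the fine grid and covers the same footprint; in practice, reprojection with rasterio or GDAL handles differing projections and extents.

```python
import numpy as np
from scipy.ndimage import zoom

# Toy coarse LST field in Kelvin, standing in for a ~1 km MODIS-like grid.
coarse_lst = 290.0 + 5.0 * np.random.rand(32, 32)

# Ratio between a 1 km coarse pixel and a 30 m fine pixel (~33x).
scale = 1000 // 30

# Bicubic upsampling onto the fine grid; order=1 (bilinear) or order=0
# (nearest, which preserves coarse-pixel radiometry) are common alternatives.
fine_like = zoom(coarse_lst, zoom=scale, order=3)

print(coarse_lst.shape, "->", fine_like.shape)  # (32, 32) -> (1056, 1056)
```
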
| Notation | Significance |
|---|---|
|  | Satellite providing LST data with LSHT |
|  | Satellite providing LST data with HSLT |
|  | Time steps |
| s | Region of interest |
|  | LST data at a given time for the region s |
|  | Predicted HSHT LST data at a given time for the region s |
|  | A pair of LSHT and HSLT LST observations for a specific ROI s at a given time |
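
To make the notation concrete, the sketch below lays out the arrays involved in a single-pair STF setting. The symbol names (C, F, t_r, t_p) and the toy additive fusion rule are illustrative only and do not reproduce the formulation of Section 3.1; deep models replace the hand-written rule with a learned mapping.

```python
import numpy as np

H, W, r = 256, 256, 16            # fine grid size and spatial ratio (assumed)

F_tr = np.random.rand(H, W)            # fine (HSLT) LST at the reference time t_r
C_tr = np.random.rand(H // r, W // r)  # coarse (LSHT) LST at the same time t_r
C_tp = np.random.rand(H // r, W // r)  # coarse (LSHT) LST at the prediction time t_p

def fuse(c_tp, c_tr, f_tr, ratio=16):
    """Toy stand-in for an STF model: add the coarse temporal change to the
    fine reference image. DL-based methods learn this mapping instead."""
    delta = np.kron(c_tp - c_tr, np.ones((ratio, ratio)))  # spread coarse change to fine grid
    return f_tr + delta

F_tp_hat = fuse(C_tp, C_tr, F_tr)  # predicted HSHT LST for the ROI at t_p
print(F_tp_hat.shape)              # (256, 256)
```
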
| Method | Satellites | LST | Pairs | Loss Functions | Evaluation Metrics | Datasets | Code |
|---|---|---|---|---|---|---|---|
| [37] | MODIS, Landsat | ✗ | 2 | Content | RMSE, ERGAS, SAM, SSIM | CIA, LGC | ✗ |
| [62] | MODIS, Landsat | ✗ | 1 | Content | RMSE, R², SSIM | Hand-crafted | ✓ (https://github.com/theonegis/rs-data-fusion, accessed on 12 January 2026) |
| [63] | MODIS, Landsat | ✗ | 2 | Content | RMSE, CC, SSIM | Hand-crafted | ✗ |
| [64] | MODIS, Landsat | ✗ | 2 | Content | RMSE, SAM, SSIM, ERGAS | CIA, LGC | ✗ |
| [65] | MODIS, Landsat | ✓ | 2 | Content | RMSE, SSIM | Hand-crafted | ✗ |
| [66] | MODIS, Landsat | ✗ | 2 | Content | RMSE, CC, UIQI | Hand-crafted | ✗ |
| [67] | MODIS, Landsat | ✗ | 2 | Content | RMSE, CC, ERGAS, SSIM, SAM | CIA, LGC | ✗ |
| [68] | MODIS, Landsat | ✗ | 2 | Content, Vision | RMSE, R², MAE, rMAE, MAEC | Hand-crafted | ✓ (https://github.com/qpyeah/MUSTFN, accessed on 12 January 2026) |
| [70] | MODIS, Landsat | ✗ | 2 | Content, Vision | RMSE, SAM, SSIM | CIA, LGC | ✗ |
| [69] | MODIS, Landsat; Landsat, Sentinel-2 | ✓ | n | Content, Adversarial | RMSE, SSIM, ERGAS, PSNR, SAM | Hand-crafted | ✗ |
| [71] | Landsat, Sentinel-2 | ✗ | 1 | Content | RMSE, CC, SSIM | Hand-crafted | ✗ |
| Method | Satellites | LST | Pairs | Loss Functions | Evaluation Metrics | Datasets | Code |
|---|---|---|---|---|---|---|---|
| [72] | MODIS, Landsat | ✗ | 2 | Content, Feature, Vision | RMSE, SAM, ERGAS | Hand-crafted | ✓ (https://github.com/theonegis/edcstfn, accessed on 12 January 2026) |
| [61] | MODIS, FY-4A | ✓ | 2 | Content | RMSE, SSIM, LPIPS | Hand-crafted | ✗ |
| Method | Satellites | LST | Pairs | Loss Functions | Evaluation Metrics | Datasets | Code |
|---|---|---|---|---|---|---|---|
| [73] | MODIS, Landsat | ✗ | 2 | Adversarial | RMSE, CC, SSIM, SAM, ERGAS | Hand-crafted | ✗ |
| [74] | MODIS, Landsat | ✗ | 2 | Content, Adversarial | MAE, RMSE, SSIM, SAM, ERGAS, time | CIA, LGC, Shenzhen | ✗ |
| [75] | MODIS, Landsat | ✗ | 1 | Content, Vision, Spectral, Adversarial | RMSE, SSIM, SAM, R² | CIA, LGC | ✗ |
| [76] | MODIS, Landsat | ✗ | 1/2 | Content, Vision, Feature, Adversarial | MAE, RMSE, SAM, SSIM | CIA, LGC | ✓ (https://github.com/theonegis/ganstfm, accessed on 12 January 2026) |
| [77] | MODIS, Landsat | ✗ | 1 | Adversarial | MSE, SSIM, CC, UIQI, ERGAS, SAM | CIA, LGC | ✗ |
| [78] | MODIS, Landsat | ✗ | 1 | Content, Vision, Spectral, Adversarial | MAE, RMSE, SAM, SSIM | CIA, LGC | ✓ (https://github.com/songbingze/MLFF-GAN, accessed on 12 January 2026) |
| [79] | MODIS, Landsat | ✗ | 2 | Content, Vision, Spectral, Feature | RMSE, SAM, SSIM, ERGAS | CIA, LGC | ✓ (https://github.com/theonegis/rsfn, accessed on 12 January 2026) |
| [80] | MODIS, Landsat | ✗ | 2 | Content, Vision, Spectral, Adversarial | RMSE, SSIM, PSNR, CC | CIA, LGC, Tianjin | ✓ (https://github.com/xxsfish/AMS-STF.git, accessed on 12 January 2026) |
| [81] | MODIS, Landsat | ✗ | 2 | Content | RMSE, SSIM, CC, SAM, ERGAS | CIA, LGC, E-SMILE | ✓ (https://github.com/prowDIY/STF, accessed on 12 January 2026) |
| [82] | MODIS, FY-4A | ✓ | 1 | Content, Adversarial | RMSE, SSIM, LPIPS | Hand-crafted | ✗ |
| Method | Satellites | LST | Pairs | Loss Functions | Evaluation Metrics | Datasets | Code |
|---|---|---|---|---|---|---|---|
| [83] | MODIS, Landsat | ✗ | 2 | Content | RMSE, MSE, CC, SAM, SSIM, ERGAS, PSNR | CIA, LGC, AHB | ✗ |
| [84] | MODIS, Landsat | ✗ | 2 | Feature | RMSE, SSIM, ERGAS, SAM, CC | CIA, LGC, DX | ✗ |
| [85] | MODIS, Landsat | ✗ | 1 | Content, Vision | RMSE, CC, SAM, SSIM, UIQI | CIA, LGC | ✓ (https://github.com/LouisChen0104/swinstfm.git, accessed on 12 January 2026) |
| [86] | MODIS, Landsat | ✗ | 1 | Content | RMSE, MSE, CC, SAM, SSIM, ERGAS, PSNR | CIA, LGC, AHB | ✗ |
| [87] | MODIS, Landsat | ✗ | 2 | Content | MAE, SAM, SSIM, PSNR | CIA, DX | ✗ |
| [88] | PlanetScope, Pléiades | ✗ | 1/2 | Content, Vision | RMSE, CC, SAM, SSIM, UIQI | Hand-crafted | ✗ |
| [89] | MODIS, Landsat | ✗ | 1 | Content, Vision, Spectral, Adversarial | RMSE, PSNR, ERGAS, SAM, SSIM, UIQI | CIA, LGC, AHB, Tianjin | ✓ (https://github.com/MaZhaoX/SFT-GAN.git, accessed on 12 January 2026) |
| [90] | MODIS, Landsat | ✓ | 1 | Content, Vision | MAE, RMSE, PSNR, R² | Hand-crafted | ✓ (https://github.com/HuPengHua2021/THSTNet.git, accessed on 12 January 2026) |
| Method | Satellites | LST | Pairs | Loss Functions | Evaluation Metrics | Datasets | Code |
|---|---|---|---|---|---|---|---|
| [91] | MODIS, Landsat | ✗ | 2 | Content | RMSE, ERGAS, SAM | Hand-crafted | ✗ |
| [70] | MODIS, Landsat | ✗ | 2 | Content, Vision | RMSE, SAM, SSIM | CIA, LGC | ✗ |
| [92] | MODIS, Sentinel | ✗ | n | Content | RMSE, SSIM | Hand-crafted | ✗ |
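
The "Loss Functions" column of the tables above shows that most methods optimize a weighted combination of a content term, a vision term, and, for GAN-based models, an adversarial term. The sketch below illustrates one plausible way to assemble such a generator objective in PyTorch; the weights, the use of torchmetrics' SSIM for the vision term, and the tensor shapes are our assumptions rather than any specific paper's configuration.

```python
import torch
import torch.nn as nn
from torchmetrics.image import StructuralSimilarityIndexMeasure

l1 = nn.L1Loss()                                        # content term
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)  # vision term
bce = nn.BCEWithLogitsLoss()                             # adversarial term

def generator_loss(pred, target, disc_logits,
                   w_content=1.0, w_vision=0.1, w_adv=0.01):
    content = l1(pred, target)                     # pixel-wise fidelity
    vision = 1.0 - ssim(pred, target)              # structural similarity penalty
    adversarial = bce(disc_logits, torch.ones_like(disc_logits))  # fool the discriminator
    return w_content * content + w_vision * vision + w_adv * adversarial

# Toy usage with LST patches normalized to [0, 1].
pred = torch.rand(4, 1, 64, 64, requires_grad=True)
target = torch.rand(4, 1, 64, 64)
disc_logits = torch.randn(4, 1)
loss = generator_loss(pred, target, disc_logits)
loss.backward()
```
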
| Training Strategy | List of Methods |
|---|---|
| Feature Extraction | [72,75,76,78,79,80,84] |
| Transfer Learning | [61] |
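
The two strategies above differ mainly in which parameters are updated when moving from reflectance-trained models to LST: feature extraction keeps a pretrained backbone frozen, whereas transfer learning fine-tunes the remaining layers on LST pairs. The sketch below illustrates the distinction on a toy encoder-decoder; the architecture and the commented checkpoint path are hypothetical.

```python
import torch
import torch.nn as nn

class TinySTFNet(nn.Module):
    """Minimal encoder-decoder stand-in for a fusion backbone."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, x):          # x: stacked coarse/fine inputs (3 channels)
        return self.decoder(self.encoder(x))

model = TinySTFNet()
# model.load_state_dict(torch.load("pretrained_weights.pt"))  # hypothetical checkpoint

# Feature extraction: reuse the pretrained encoder as a fixed feature extractor.
for p in model.encoder.parameters():
    p.requires_grad = False

# Transfer learning: fine-tune only the still-trainable (decoder) parameters on LST pairs.
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)

x = torch.rand(2, 3, 64, 64)       # e.g., [coarse_tp, coarse_tr, fine_tr] channels, normalized
y = torch.rand(2, 1, 64, 64)       # reference fine LST at t_p, normalized
loss = nn.functional.l1_loss(model(x), y)
loss.backward()
optimizer.step()
```
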
| Sample No. | Date | MODIS/Terra | Landsat 8 | Sample No. | Date | MODIS/Terra | Landsat 8 |
|---|---|---|---|---|---|---|---|
| 1 | 14 April 2013 | 11:54 | 10:43 | 27 | 21 October 2018 | 11:54 | 10:41 |
| 2 | 01 June 2013 | 11:08 | 10:43 | 28 | 26 February 2019 | 11:54 | 10:40 |
| 3 | 04 August 2013 | 11:54 | 10:43 | 29 | 02 June 2019 | 11:54 | 10:40 |
| 4 | 20 August 2013 | 11:54 | 10:43 | 30 | 04 July 2019 | 11:54 | 10:41 |
| 5 | 05 September 2013 | 11:54 | 10:42 | 31 | 06 September 2019 | 11:54 | 10:41 |
| 6 | 10 December 2013 | 11:54 | 10:42 | 32 | 01 April 2020 | 11:54 | 10:40 |
| 7 | 16 March 2014 | 11:54 | 10:41 | 33 | 19 May 2020 | 11:54 | 10:40 |
| 8 | 17 April 2014 | 11:54 | 10:41 | 34 | 22 July 2020 | 11:54 | 10:41 |
| 9 | 19 May 2014 | 11:54 | 10:40 | 35 | 07 August 2020 | 11:54 | 10:41 |
| 10 | 08 September 2014 | 11:54 | 10:41 | 36 | 27 November 2020 | 11:54 | 10:41 |
| 11 | 24 September 2014 | 11:54 | 10:41 | 37 | 04 April 2021 | 11:54 | 10:40 |
| 12 | 20 April 2015 | 11:54 | 10:40 | 38 | 10 August 2021 | 11:03 | 10:41 |
| 13 | 10 August 2015 | 11:54 | 10:41 | 39 | 06 March 2022 | 11:48 | 10:41 |
| 14 | 11 September 2015 | 11:47 | 10:41 | 40 | 22 March 2022 | 11:48 | 10:41 |
| 15 | 09 June 2016 | 11:54 | 10:40 | 41 | 09 May 2022 | 11:42 | 10:41 |
| 16 | 12 August 2016 | 11:54 | 10:41 | 42 | 13 August 2022 | 11:42 | 10:41 |
| 17 | 15 October 2016 | 11:54 | 10:41 | 43 | 29 August 2022 | 11:42 | 10:41 |
| 18 | 31 October 2016 | 11:54 | 10:41 | 44 | 30 September 2022 | 11:26 | 10:41 |
| 19 | 19 January 2017 | 11:54 | 10:41 | 45 | 01 November 2022 | 11:12 | 10:41 |
| 20 | 09 April 2017 | 11:54 | 10:40 | 46 | 28 May 2023 | 11:10 | 10:40 |
| 21 | 12 June 2017 | 11:54 | 10:40 | 47 | 13 June 2023 | 10:36 | 10:40 |
| 22 | 23 February 2018 | 11:54 | 10:40 | 48 | 19 October 2023 | 10:55 | 10:41 |
| 23 | 11 March 2018 | 10:40 | 10:40 | 49 | 12 April 2024 | 10:34 | 10:40 |
| 24 | 02 August 2018 | 10:48 | 10:40 | 50 | 19 September 2024 | 10:00 | 10:41 |
| 25 | 18 August 2018 | 11:55 | 10:40 | 51 | 05 October 2024 | 10:48 | 10:41 |
| 26 | 05 October 2018 | 11:53 | 10:41 | | | | |
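
Each sample above pairs a MODIS/Terra overpass with a Landsat 8 acquisition taken roughly an hour earlier. Before such pairs can be fused, both products are usually rescaled from digital numbers to Kelvin. The sketch below shows this conversion with the standard scale factors for MOD11A1 LST_Day_1km and Landsat 8 Collection 2 Level-2 ST_B10; the synthetic arrays merely stand in for the real rasters, which would be read with a library such as rasterio.

```python
import numpy as np

def modis_lst_to_kelvin(dn):
    """MOD11A1 LST_Day_1km: DN * 0.02, with 0 used as fill."""
    lst = dn.astype(np.float32) * 0.02
    return np.where(dn == 0, np.nan, lst)

def landsat_st_to_kelvin(dn):
    """Landsat 8/9 Collection 2 Level-2 ST_B10: DN * 0.00341802 + 149.0."""
    lst = dn.astype(np.float32) * 0.00341802 + 149.0
    return np.where(dn == 0, np.nan, lst)

# Synthetic digital numbers standing in for one MODIS/Landsat pair.
modis_dn = np.random.randint(14000, 16500, size=(64, 64), dtype=np.uint16)
landsat_dn = np.random.randint(41000, 47000, size=(2048, 2048), dtype=np.uint16)

modis_k = modis_lst_to_kelvin(modis_dn)
landsat_k = landsat_st_to_kelvin(landsat_dn)
print(np.nanmean(modis_k), np.nanmean(landsat_k))   # both roughly 290-320 K
```
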
| Metrics | ESTARFM | STTFN | EDCSTFN | MLFF-GAN | ESTARFM | STTFN | EDCSTFN | MLFF-GAN |
|---|---|---|---|---|---|---|---|---|
|  | 29 August 2022 |  |  |  | 30 September 2022 |  |  |  |
| RMSE (↓) | 5.350 | 6.258 | 5.725 | 5.758 | 2.650 | 2.649 | 2.317 | 2.549 |
| SSIM (↑) | 0.940 | 0.833 | 0.918 | 0.872 | 0.870 | 0.780 | 0.862 | 0.829 |
| PSNR (↑) | 21.410 | 20.048 | 20.821 | 20.772 | 22.900 | 22.894 | 24.058 | 23.227 |
| SAM (↓) | 8.450 | 9.374 | 8.812 | 9.118 | 6.38 | 7.661 | 6.336 | 6.798 |
| CC (↑) | 0.640 | 0.537 | 0.600 | 0.537 | 0.650 | 0.536 | 0.669 | 0.595 |
| ERGAS (↓) | 5.000 | 5.850 | 5.352 | 5.382 | 4.730 | 4.728 | 4.135 | 4.550 |
|  | 01 November 2022 |  |  |  | 28 May 2023 |  |  |  |
| RMSE (↓) | 2.720 | 2.137 | 1.579 | 1.037 | 2.390 | 3.213 | 2.719 | 3.175 |
| SSIM (↑) | 0.870 | 0.685 | 0.848 | 0.776 | 0.880 | 0.768 | 0.844 | 0.730 |
| PSNR (↑) | 23.190 | 18.980 | 20.449 | 24.098 | 23.780 | 21.2 | 22.651 | 21.304 |
| SAM (↓) | 6.500 | 4.861 | 2.937 | 3.666 | 3.410 | 4.600 | 4.043 | 5.555 |
| CC (↑) | 0.660 | 0.644 | 0.653 | 0.536 | 0.900 | 0.808 | 0.849 | 0.719 |
| ERGAS (↓) | 5.65 | 4.445 | 3.283 | 2.157 | 2.520 | 3.396 | 2.874 | 3.356 |
|  | 13 June 2023 |  |  |  | 19 October 2023 |  |  |  |
| RMSE (↓) | 1.950 | 1.993 | 1.937 | 2.710 | 3.820 | 1.877 | 2.842 | 3.028 |
| SSIM (↑) | 0.900 | 0.817 | 0.892 | 0.843 | 0.820 | 0.845 | 0.858 | 0.834 |
| PSNR (↑) | 26.140 | 25.936 | 26.183 | 23.26 | 20.970 | 24.322 | 24.718 | 20.166 |
| SAM (↓) | 3.010 | 3.414 | 3.115 | 3.671 | 8.470 | 5.132 | 5.540 | 5.131 |
| CC (↑) | 0.910 | 0.867 | 0.890 | 0.846 | 0.320 | 0.452 | 0.436 | 0.467 |
| ERGAS (↓) | 2.000 | 2.051 | 1.994 | 2.789 | 6.190 | 3.039 | 4.602 | 4.904 |
|  | 12 April 2024 |  |  |  | 19 September 2024 |  |  |  |
| RMSE (↓) | 4.480 | 4.365 | 4.186 | 4.000 | 3.890 | 4.484 | 3.030 | 3.281 |
| SSIM (↑) | 0.800 | 0.789 | 0.827 | 0.806 | 0.800 | 0.759 | 0.833 | 0.718 |
| PSNR (↑) | 20.440 | 20.656 | 21.014 | 21.414 | 20.660 | 16.923 | 20.329 | 19.637 |
| SAM (↓) | 10.910 | 8.738 | 9.702 | 9.852 | 9.300 | 7.480 | 7.230 | 7.825 |
| CC (↑) | 0.430 | 0.428 | 0.449 | 0.410 | 0.430 | 0.535 | 0.595 | 0.449 |
| ERGAS (↓) | 6.480 | 6.313 | 6.058 | 5.785 | 5.460 | 6.289 | 4.249 | 4.601 |
| Metrics | ESTARFM | STTFN | EDCSTFN | MLFF-GAN |
|---|---|---|---|---|
| Average | ||||
| RMSE (↓) | 3.406 | 3.372 | 3.042 | 3.196 |
| SSIM (↑) | 0.860 | 0.7845 | 0.861 | 0.800 |
| PSNR (↑) | 22.436 | 21.371 | 22.279 | 21.736 |
| SAM (↓) | 7.054 | 6.408 | 5.96 | 6.452 |
| CC (↑) | 0.618 | 0.601 | 0.643 | 0.576 |
| ERGAS (↓) | 4.754 | 4.5139 | 4.068 | 4.191 |
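
For reproducibility, the sketch below shows how the error-assessment metrics reported above can be computed for a single fused LST band against its Landsat reference. SSIM and SAM are typically taken from image-processing libraries and are omitted here; the resolution ratio entering ERGAS varies across implementations and is an assumption of this example.

```python
import numpy as np

def rmse(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def psnr(pred, ref, data_range=None):
    data_range = data_range or float(ref.max() - ref.min())
    return float(20 * np.log10(data_range / rmse(pred, ref)))

def cc(pred, ref):
    return float(np.corrcoef(pred.ravel(), ref.ravel())[0, 1])

def ergas(pred, ref, ratio=30.0 / 1000.0):
    # Single-band ERGAS: 100 * (h/l) * RMSE / mean(reference); the ratio h/l is assumed.
    return float(100.0 * ratio * rmse(pred, ref) / np.mean(ref))

ref = 290.0 + 10.0 * np.random.rand(256, 256)        # reference fine LST (K)
pred = ref + np.random.normal(0.0, 2.0, ref.shape)   # fused LST with ~2 K error

print(f"RMSE={rmse(pred, ref):.3f}  PSNR={psnr(pred, ref):.2f}  "
      f"CC={cc(pred, ref):.3f}  ERGAS={ergas(pred, ref):.3f}")
```
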
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.