Information-Guided Diffusion Model for Downscaling Land Surface Temperature from SDGSAT-1 Remote Sensing Images

Wang, Jianxin; Fu, Zhitao; Tang, Bohui; Xu, Jianhui

doi:10.3390/rs17101669

Open Access Article


¹ The Faculty of Land Resources Engineering, Kunming University of Science and Technology, Kunming 650032, China

² The Key Laboratory of Guangdong for Utilization of Remote Sensing and Geographical Information System, Guangdong Open Laboratory of Geospatial Information Technology and Application, Guangzhou Institute of Geography, Guangdong Academy of Sciences, Guangzhou 510640, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(10), 1669; https://doi.org/10.3390/rs17101669

Submission received: 3 March 2025 / Revised: 24 April 2025 / Accepted: 29 April 2025 / Published: 9 May 2025

(This article belongs to the Special Issue Remote Sensing of Land Surface Temperature: Retrieval, Modeling, and Applications)


Abstract

Land Surface Temperature (LST) is a parameter retrieved from the thermal infrared bands of remote sensing satellites, and it is a crucial input to various climate and environmental models. Compared with other multispectral bands, thermal infrared bands have lower spatial resolution, which limits their practical applications. Taking the Heihe River Basin in China as a case study, this research focuses on LST data retrieved from SDGSAT-1 using the three-channel split-window algorithm. We propose a novel approach, the Information-Guided Diffusion Model (IGDM), and apply it to downscale SDGSAT-1 LST images. The results indicate that the downscaling accuracy of IGDM exceeds that of Linear, Enhanced Deep Super-Resolution Network (EDSR), Super-Resolution Convolutional Neural Network (SRCNN), Discrete Cosine Transform and Local Spatial Attention (DCTLSA), and Denoising Diffusion Probabilistic Models (DDPM); relative to these methods, the RMSE of IGDM is reduced by 55.16%, 51.29%, 48.39%, 52.88%, and 17.18%, respectively. Incorporating auxiliary information, particularly NDVI and NDWI, significantly improves the performance of IGDM. Compared with DDPM, the RMSE of IGDM decreased from 0.666 to 0.574, the MAE dropped from 0.517 to 0.376, and the PSNR increased from 38.55 to 40.27. Overall, the results highlight the effectiveness of the auxiliary information-guided diffusion model in generating high-resolution LST data from SDGSAT-1. The study also reveals the spatial effects of different auxiliary information on LST downscaling and how these effects vary across regions and temperature ranges.

Keywords:

land surface temperature; downscaling; diffusion model; deep learning

1. Introduction

Land Surface Temperature (LST) is a critical parameter for understanding heat flux exchange at the interface between the land surface and the atmosphere [1,2]. As a key variable, LST plays a significant role in a wide range of applications, including evapotranspiration estimation [3], urban heat island monitoring [4], forest fire detection [5], and biogeochemical process modeling [6]. High-spatial-resolution LST data are particularly valuable because they capture local temperature variations accurately [7], which are essential for understanding small-scale environmental changes [8]. With such data, researchers can analyze the spatiotemporal distribution of temperature across various landscapes with much greater precision, enabling a deeper understanding of local climate dynamics. This detailed information provides a scientific foundation for critical applications such as climate simulation [9], environmental assessments [10], and urban planning [11]. In addition, high-resolution LST data allow for more targeted decision-making in managing ecosystems [12], agriculture [13], and urban infrastructure [14]. As environmental challenges increase, the ability to monitor and address the impacts of temperature variations becomes ever more important [15]. Therefore, techniques that improve the spatial resolution of satellite LST data, such as LST downscaling methods, are essential for future research and practical applications [16]. These methods derive high-spatial-resolution LST from coarser remote sensing products, making more precise and localized analyses possible [17].

In recent years, numerous methods for downscaling LST, also referred to as LST sharpening, have been proposed to enhance the spatial resolution of temperature data [18,19,20]. These methods aim to convert low-resolution LST data into high-resolution data while preserving the thermal radiation characteristics of the surface. Their underlying premise is that the relationship between surface parameters (independent variables) and LST (dependent variable) is scale-invariant [21]. In essence, a mathematical model is first established at low resolution and then applied to high-resolution surface parameters to derive high-resolution LST. The surface parameters used for downscaling are often referred to as scale factors [22]. These scale factors should accurately represent the key characteristics of LST, because they guide the model toward realistic high-resolution outputs. Among the surface parameters used, vegetation cover strongly influences local temperature variations, making vegetation indices (VIs) among the most commonly employed scale factors for LST downscaling [23,24]. Since the early downscaling work of Kustas [25] (DisTrad method) and Agam [26] (TsHARP method), several vegetation indices, such as the Normalized Difference Vegetation Index (NDVI) [27], Soil Adjusted Vegetation Index (SAVI) [28], Enhanced Vegetation Index (EVI) [29], and Green-Red Vegetation Index (GRVI) [30], have been widely applied to LST downscaling across different land cover types. These indices are chosen because they reflect the influence of vegetation on the thermal behavior of the surface, which plays a crucial role in determining local temperature patterns [31]. However, relying exclusively on vegetation indices has limitations, because vegetation cover alone cannot fully capture the complexity of LST dynamics.
In response to this, researchers have expanded the use of scale factors to include spectral indices that account for other land surface characteristics. For instance, the Normalized Difference Water Index (NDWI) and the Normalized Difference Built-up Index (NDBI) have been introduced as additional scale factors to capture the effects of water bodies and urban areas on temperature variations [32,33]. These indices provide valuable information about the surface’s moisture content and built-up areas, which also significantly influence LST.
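The scale-invariance premise described above (fit a regression between LST and scale factors at coarse resolution, then apply it to fine-resolution scale factors) can be illustrated with a short sketch in the spirit of DisTrad/TsHARP. It is not the authors' code: it assumes a multiple linear regression on NDVI and NDWI, block-mean aggregation of the scale factors to the coarse LST grid, and nearest-neighbour redistribution of the coarse-scale residuals. The function names are hypothetical.

```python
import numpy as np


def block_mean(a, f):
    """Aggregate a fine grid to a coarse grid by averaging f x f blocks."""
    h, w = a.shape
    return a.reshape(h // f, f, w // f, f).mean(axis=(1, 3))


def design_matrix(factors):
    """Intercept column plus one column per scale factor."""
    return np.column_stack([np.ones(factors[0].size)] + [x.ravel() for x in factors])


def regression_downscale(lst_coarse, factors_fine, f):
    """Regression-based LST sharpening (DisTrad/TsHARP-style sketch)."""
    # 1. Aggregate the fine scale factors (e.g. NDVI, NDWI) to the LST grid.
    factors_coarse = [block_mean(x, f) for x in factors_fine]
    # 2. Fit LST = b0 + b1 * factor1 + ... at the coarse resolution.
    X_coarse = design_matrix(factors_coarse)
    coef, *_ = np.linalg.lstsq(X_coarse, lst_coarse.ravel(), rcond=None)
    # 3. Coarse-scale residuals hold what the regression cannot explain.
    residual = lst_coarse - (X_coarse @ coef).reshape(lst_coarse.shape)
    # 4. Apply the same coefficients to the fine scale factors (scale-invariance
    #    assumption) and add back the residuals, replicated over each f x f block.
    fine_shape = factors_fine[0].shape
    lst_fine = (design_matrix(factors_fine) @ coef).reshape(fine_shape)
    lst_fine += np.kron(residual, np.ones((f, f)))
    return lst_fine, coef


# Synthetic check: a truly linear LST-factor relationship is recovered exactly.
rng = np.random.default_rng(0)
ndvi = rng.uniform(0.0, 0.8, (8, 8))
ndwi = rng.uniform(-0.5, 0.3, (8, 8))
lst_true = 300.0 - 10.0 * ndvi + 5.0 * ndwi
lst_sharp, coef = regression_downscale(block_mean(lst_true, 2), [ndvi, ndwi], 2)
print(np.round(coef, 3))
```

Because the synthetic relationship is exactly linear, the sketch recovers the fine-resolution field; with real data the residual term carries whatever the scale factors cannot explain, which is precisely the limitation of linear, scale-invariant models discussed in the following paragraphs.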

Despite advances in incorporating multiple scale factors to improve downscaling accuracy, challenges remain. Models that rely on a linear relationship between low-resolution and high-resolution image pairs may not always be applicable, especially in areas where surface characteristics interact non-linearly [34]. Given the complexity and uncertainty of the physical mechanisms behind LST variations, numerous machine learning approaches have emerged to address the spatial heterogeneity of LST in downscaling [35]. Classical techniques, such as random forests and back-propagation (BP) neural networks, have made significant strides in predicting high-resolution LST images by leveraging the relationship between low-resolution and high-resolution LST images [36,37]. These methods learn the patterns and correlations present in the data to estimate finer-scale LST values [38].

Among these, Convolutional Neural Networks (CNNs) have shown considerable promise in improving both modeling efficiency and prediction accuracy [39]. CNNs are particularly effective at establishing complex, non-linear relationships between input and output images, allowing them to capture intricate spatial dependencies in the data. With the increasing computational power of modern hardware, CNNs can automatically extract relevant image features from large-scale datasets, which further enhances their predictive capabilities [40]. This ability to learn from large and diverse datasets enables CNNs to outperform traditional techniques in many scenarios, leading to more accurate LST predictions. Several CNN-based spatiotemporal fusion models, such as STFDCNN [41], StfNet [42], and DCSTFN [43], have been developed in recent years to enhance the resolution of LST images by optimizing the fusion of spatial and temporal information from various sources. However, most deep learning super-resolution approaches are designed primarily for optical images, and their application to remote sensing LST images remains relatively limited. Compared with surface reflectance, which typically exhibits stable spatial patterns and is less affected by external factors, LST downscaling presents additional challenges: more complex physical processes, higher sensitivity to atmospheric and topographic variations, stronger temporal dynamics, limited access to high-resolution LST data, and nonlinear dependencies on auxiliary variables such as NDVI and NDWI. Together, these factors make LST downscaling more demanding in terms of model design, data preparation, and accuracy control. This gap is significant, because LST images, which are crucial for climate and environmental studies, often have low spatial resolution and require effective downscaling methods to obtain high-resolution outputs.

To address these challenges, this study introduces an Information-Guided Diffusion Model (IGDM) for SDGSAT-1 LST, designed to improve the accuracy of converting low-spatial-resolution LST images into high-spatial-resolution LST images. IGDM uses auxiliary information to guide the model's predictions during downscaling. The goal is not only to obtain high-resolution LST but also to demonstrate the feasibility of applying IGDM for efficient and precise LST downscaling in practical scenarios. By leveraging auxiliary information, the model can quickly produce high-spatial-resolution data, providing valuable insights for remote sensing applications and climate monitoring. This approach holds promise for overcoming current limitations in remote sensing LST image downscaling and contributes to advancing environmental data analysis [44].

2. Data and Method

2.1. Study Area and LST Data

Study Area

The study area is the Heihe River Basin, located between 98°E and 101°30′E longitude and 38°N and 42°N latitude, as shown in Figure 1. The basin spans elevations from 865 m to 5542 m above sea level. Its topography varies markedly: the southern parts are high-altitude terrain, while the northern regions are relatively low-lying. Similarly, the eastern part of the basin lies at higher elevations than the western part, which is notably flatter. LST in this region is influenced by a complex interplay of factors, including diverse terrain, vegetation types, climate change, and human activities. As a result, LST in the Heihe River Basin shows considerable spatial and seasonal variation. Seasonal temperature fluctuations are particularly striking: summer temperatures typically range from 25 °C to 30 °C, while winter temperatures can drop below −20 °C. These drastic seasonal variations are a key feature of the region's climate. The upstream mountainous areas, with their higher altitudes and extensive forest cover, generally experience lower LSTs. These regions are less affected by climate change than the downstream areas because of their relatively stable and cooler environment. In contrast, the downstream plains are undergoing rapid urbanization and agricultural expansion, which have significantly altered LST patterns. Increasing human activity, including agriculture and urban development, has led to a marked increase in LST in these regions. Moreover, the day–night temperature contrast has become more pronounced, further increasing the variability of LST across the basin [45].

In summary, the Heihe River Basin presents a unique combination of natural and anthropogenic factors that together influence the LST, with distinct regional and seasonal differences. These temperature variations play a crucial role in understanding the dynamics of the basin’s climate, land use, and ecological systems.

2.2. SDGSAT-1 Land Surface Temperature Data

The Sustainable Development Science Satellite 1 (SDGSAT-1) is the world's first scientific satellite dedicated to the 2030 Agenda for Sustainable Development and the first geoscience satellite of the Chinese Academy of Sciences. This study primarily uses the thermal infrared imager on SDGSAT-1 (manufactured by the Chinese Academy of Sciences, Beijing, China) to detect the spatial distribution of LST. The payload has three spectral bands, 8–10.5 μm, 10.3–11.3 μm, and 11.5–12.5 μm, with a spatial resolution of 30 m. Using thermal infrared and multispectral images from SDGSAT-1, this study applied a three-channel split-window algorithm to retrieve the LST of the Heihe River Basin from December 2021 to December 2022, covering both daytime and nighttime images [49].

The tri-channel split-window (TCSW) algorithm was adopted in this study primarily because the thermal infrared sensor onboard the SDGSAT-1 satellite provides three specific bands with central wavelengths at 9.35 μm, 10.73 μm, and 11.72 μm, which offer a solid foundation for applying this method. Furthermore, the TCSW algorithm has been widely utilized in LST retrieval from other satellite sensors, making it a suitable candidate for comparison and evaluation in our study to identify algorithms best suited for SDGSAT-1 data. In terms of accuracy, the algorithm achieved an RMSE of 0.75 K in simulated data validation. Field validation at the HiWATER and SURFRAD sites yielded RMSE values of 2.57 K and 1.87 K, respectively, with both sites showing high R² values of 0.98. It is worth noting that the performance of TCSW may decrease under high atmospheric water vapor conditions, where its stability is somewhat lower than that of the W-TCSW algorithm. Nevertheless, when compared with existing LST products, the TCSW-retrieved LST maps exhibit similar spatial distribution patterns, although some differences are observed in fine-scale spatial details [49].

This study used the three-channel split-window algorithm to retrieve LST from SDGSAT-1 data as input for the subsequent downscaling models. Building on the traditional split-window method, the algorithm uses information from all three thermal infrared channels simultaneously, as shown in Equation (1):

\begin{matrix} T_{S} = B_{0} + B_{1} \times T_{1} + B_{2} \times T_{2} + B_{3} \times T_{3} \\ + B_{4} \times T_{1} \frac{1 - ε_{1}}{ε_{1}} + B_{5} \times T_{2} \frac{1 - ε_{2}}{ε_{2}} + B_{6} \times T_{3} \frac{1 - ε_{3}}{ε_{3}} \end{matrix}

(1)

where T_{1}, T_{2}, and T_{3} are the at-sensor brightness temperatures of the three SDGSAT-1 thermal infrared channels; ε_{1}, ε_{2}, and ε_{3} are the surface emissivities of the corresponding channels; and B_{0} to B_{6} are the coefficients of the three-channel split-window algorithm. The values of the seven coefficients B_{i} are listed in Table 1.
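Because Equation (1) is a per-pixel combination of the three brightness temperatures and their emissivity-weighted terms, it vectorizes directly over image arrays. The sketch below assumes brightness temperatures in kelvin and takes the coefficients B_{0} to B_{6} as an input sequence; the Table 1 values are not reproduced here, and the coefficients in the usage line are placeholders chosen only to make the arithmetic easy to check, not calibrated values.

```python
import numpy as np


def tcsw_lst(t1, t2, t3, e1, e2, e3, b):
    """Three-channel split-window LST following the form of Equation (1).

    t1, t2, t3: brightness temperatures of the three TIR channels (K).
    e1, e2, e3: surface emissivities of the same channels.
    b: sequence of the seven coefficients B0..B6 (e.g. taken from Table 1).
    """
    t1, t2, t3 = (np.asarray(x, dtype=float) for x in (t1, t2, t3))
    e1, e2, e3 = (np.asarray(x, dtype=float) for x in (e1, e2, e3))
    b0, b1, b2, b3, b4, b5, b6 = b
    # Linear brightness-temperature terms B0 + B1*T1 + B2*T2 + B3*T3.
    linear = b0 + b1 * t1 + b2 * t2 + b3 * t3
    # Emissivity correction terms Bk * Tk * (1 - ek) / ek.
    emissivity = (b4 * t1 * (1 - e1) / e1
                  + b5 * t2 * (1 - e2) / e2
                  + b6 * t3 * (1 - e3) / e3)
    return linear + emissivity


# Placeholder coefficients for illustration only (NOT the Table 1 values):
# with these, the result simply equals the second-channel brightness temperature.
b_demo = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(tcsw_lst(295.0, 300.0, 298.0, 0.97, 0.98, 0.98, b_demo))
```

The same call works unchanged on 2-D brightness-temperature and emissivity arrays, returning an LST image of the same shape.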

A total of 641 remote sensing LST images with a resolution of 30 m were obtained and resampled to a resolution of 120 m for model training and accuracy verification. Resampling used bicubic interpolation, which estimates the value of each new pixel as a weighted average of the 16 surrounding pixels (a 4 × 4 neighborhood), with weights given by a cubic function of the distance from the target pixel center. This approach helps maintain the spatial consistency of LST features. The dataset was split into training and testing sets at a ratio of 7:3: 448 scenes were used for training and validation, and the remaining 193 scenes were used for testing. The results presented in Table 2 are based on these 193 test scenes.
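The bicubic weighting described above can be made concrete with the widely used cubic convolution kernel of Keys. This is a sketch under stated assumptions: the kernel parameter a = −0.5 is a common default, and the text does not specify which bicubic variant or software was used for resampling. The helper names are hypothetical.

```python
import numpy as np


def keys_kernel(x, a=-0.5):
    """Cubic convolution kernel W(x); a = -0.5 is an assumed default."""
    x = np.abs(np.asarray(x, dtype=float))
    w = np.zeros_like(x)
    near = x <= 1
    far = (x > 1) & (x < 2)
    w[near] = (a + 2) * x[near] ** 3 - (a + 3) * x[near] ** 2 + 1
    w[far] = a * x[far] ** 3 - 5 * a * x[far] ** 2 + 8 * a * x[far] - 4 * a
    return w


def bicubic_weights(ty, tx, a=-0.5):
    """4 x 4 weights for a target at fractional offset (ty, tx) in [0, 1)
    from pixel 1 of a 4 x 4 patch; neighbours sit at offsets -1, 0, 1, 2."""
    offsets = np.array([-1.0, 0.0, 1.0, 2.0])
    wy = keys_kernel(offsets - ty, a)   # weights along rows (distance in y)
    wx = keys_kernel(offsets - tx, a)   # weights along columns (distance in x)
    return np.outer(wy, wx)             # 16 separable weights


def bicubic_sample(patch, ty, tx, a=-0.5):
    """Weighted average of the 16 surrounding pixel values."""
    return float(np.sum(bicubic_weights(ty, tx, a) * patch))


ramp = np.tile(np.arange(4.0), (4, 1))   # LST increasing by 1 K per column
print(bicubic_sample(ramp, 0.25, 0.5))   # halfway between columns 1 and 2
```

The 16 weights sum to one, so a uniform LST field is left unchanged, and a linear gradient is reproduced exactly; both properties are part of what keeps resampled LST features spatially consistent.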

2.3. Method

IGDM

IGDM consists of a forward (diffusion) process and a backward (denoising) process. In the forward process, the original data are diffused over t steps; each step adds Gaussian noise to the data from the previous step according to Equation (2):

\begin{matrix} x_{t} = \sqrt{α_{t}} x_{t - 1} + \sqrt{1 - α_{t}} ε \end{matrix}

(2)

where α_{t} is a constant determined by the time step t, and ε ~ N (0, I).

The above can be equivalently expressed as Equation (3):

\begin{matrix} q (x_{t}| x_{t - 1}) = N (x_{t}; \sqrt{1 - β_{t}} x_{t - 1}, β_{t} I) \end{matrix}

(3)

where β_{t} ∈ (0, 1) is the variance of the noise added at each step of the diffusion process, with β_{t} = 1 - α_{t}. The diffusion process is a Markov process.
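Because each step of Equation (2) only rescales the previous sample and adds independent Gaussian noise, t steps compose into a single Gaussian: x_{t} = \sqrt{\bar{α}_{t}} x_{0} + \sqrt{1 - \bar{α}_{t}} ε, with \bar{α}_{t} = \prod_{i = 1}^{t} α_{i}. The sketch below checks this numerically. It assumes a linear β schedule from 1e-4 to 0.02 over 1000 steps, the default of the original DDPM work; the schedule used by IGDM is not stated in this section.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = product of alpha_i, i <= t


def forward_step(x_prev, t, rng):
    """One step of Equation (2): x_t = sqrt(a_t) x_{t-1} + sqrt(1 - a_t) eps."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(alphas[t]) * x_prev + np.sqrt(1.0 - alphas[t]) * eps


def q_sample(x0, t, eps):
    """Closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


rng = np.random.default_rng(42)
x0 = np.full(100_000, 1.5)   # many copies of one normalised LST value
x = x0.copy()
t_end = 299                  # 0-based index, i.e. 300 forward steps
for t in range(t_end + 1):
    x = forward_step(x, t, rng)

# Empirical mean/variance of the iterated Markov chain vs. the closed form.
print(x.mean(), np.sqrt(alpha_bars[t_end]) * 1.5)
print(x.var(), 1.0 - alpha_bars[t_end])
```

The closed form is what makes training efficient: a noisy sample at any step can be drawn directly from x_{0} without simulating the whole chain.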

Figure 2 illustrates the forward and backward diffusion processes. The conditional probability distribution of the forward process is shown at the top, and that of the backward process at the bottom. The goal of this study is to train a network that approximates the conditional probability distribution of the backward process and to use it to recover images from noise (i.e., the backward process).

Figure 3 shows the framework of IGDM. IGDM first preprocesses the training and test datasets and then upsamples the resampled low-resolution LST images with bicubic interpolation to obtain rough high-resolution LST images. During training, noise corresponding to a given time step is added to the rough high-resolution LST data, and the result is input to the U-Net together with the corresponding auxiliary information. The U-Net works with the noise scheduler to predict the added noise. The loss is computed by comparing the predicted noise with the true noise, and the network is optimized on this loss to obtain the final trained model.
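The training loop described above can be sketched as one function that samples a time step, noises the target, builds the conditioned network input, and scores the noise prediction with a mean squared error. This is a hedged illustration rather than the authors' implementation: it assumes SR3-style conditioning, in which the noisy high-resolution LST x_{t} is concatenated channel-wise with the bicubic-upsampled LST and auxiliary layers such as NDVI and NDWI; it assumes a linear β schedule; and it replaces the U-Net with a hypothetical placeholder, `zero_predictor`, so that the code runs without a deep learning framework.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # assumed noise schedule
alpha_bars = np.cumprod(1.0 - betas)


def build_input(x_t, lst_up, aux_layers):
    """Stack x_t, the upsampled LST and auxiliary layers as channels (C, H, W)."""
    return np.stack([x_t, lst_up, *aux_layers], axis=0)


def training_loss(noise_predictor, x0, lst_up, aux_layers, rng):
    """One Monte Carlo estimate of the noise-prediction MSE loss."""
    t = rng.integers(0, T)                             # uniformly sampled step
    eps = rng.standard_normal(x0.shape)                # true noise
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    net_in = build_input(x_t, lst_up, aux_layers)
    eps_hat = noise_predictor(net_in, t)               # stands in for the U-Net
    return np.mean((eps - eps_hat) ** 2)


def zero_predictor(net_in, t):
    """Hypothetical placeholder for the U-Net: always predicts zero noise."""
    return np.zeros(net_in.shape[1:])


rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))                     # normalised HR LST patch
lst_up = x0 + 0.1 * rng.standard_normal((64, 64))      # stand-in for bicubic input
ndvi, ndwi = rng.uniform(-1.0, 1.0, (2, 64, 64))       # stand-in auxiliary layers
loss = training_loss(zero_predictor, x0, lst_up, [ndvi, ndwi], rng)
print(loss)
```

Because the placeholder always predicts zero, the loss is simply the mean of ε² and sits near 1; a trained U-Net drives it toward 0 by using the noisy image together with the guiding channels.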

IGDM can generate relatively reasonable global semantics and reconstruct the overall LST features from randomly generated noise. However, because the denoising process is stochastic, the generated high-resolution LST image may not match the actual retrieved LST image. To address this, auxiliary information is integrated into the noise prediction process of the diffusion model. Guided by the time step t, the U-Net takes the noisy image x_{t} and the auxiliary information as input and predicts the noise to be removed to obtain x_{t - 1}. The predicted distribution p_{θ} (x_{t - 1}| x_{t}, c_{t f}) of x_{t - 1} can be described as a Gaussian distribution, as indicated in Equation (4).

\begin{matrix} p_{θ} (x_{t - 1}| x_{t}, c_{t f}) = N (x_{t - 1}; \frac{1}{\sqrt{α_{t}}} (x_{t} - \frac{1 - α_{t}}{\sqrt{1 - \bar{α}_{t}}} ϵ_{θ} (x_{t}, c_{t f})), \Sigma_{θ} (x_{t}, c_{t f}) I) \end{matrix}

(4)

where \bar{α}_{t} := \prod_{i = 1}^{t} α_{i}, c_{t f} is the auxiliary information, and I is the identity matrix. The neural network predicts the noise ϵ_{θ} (x_{t}, c_{t f}) and the variance \Sigma_{θ} (x_{t}, c_{t f}). Constrained by the auxiliary information, the network's estimate of x_{t - 1} does not deviate significantly from its true value. To train the U-Net, a sample x_{t} is generated by adding Gaussian noise to x_{0}. The U-Net then predicts the added noise and is optimized with the standard mean squared error (MSE) loss, as indicated in Equation (5).

\begin{matrix} L_{simple} := \mathbb{E}_{t, x_{0}, ϵ} [ {||ϵ - ϵ_{θ} (x_{t}, c_{t f})||}^{2} ] \end{matrix}

(5)

where \mathbb{E}_{t, x_{0}, ϵ} denotes the expectation of the noise prediction error over the data x_{0}, the time step t, and the noise realization ϵ. Additionally, to avoid unreliable outputs caused by the stochastic noise, IGDM employs multiple independently executed diffusion models and averages their outputs, yielding a more stable and reliable final prediction.
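Equation (4) and the ensemble averaging described above translate into a short sampling loop. The sketch assumes a linear β schedule (1e-4 to 0.02 over 1000 steps), uses the fixed variance Σ = β_{t} (one common DDPM choice) instead of a learned Σ_{θ}, and substitutes a hypothetical "oracle" predictor for the trained U-Net so that the result can be checked; `n_runs` is an illustrative parameter, since the number of independent runs is not given here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def reverse_step(x_t, t, eps_hat, rng):
    """Draw x_{t-1} from the Gaussian of Equation (4), with fixed variance beta_t."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean                  # final step: return the mean, no added noise
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def sample(noise_predictor, cond, shape, rng):
    """Backward process: start from pure noise and denoise for T steps."""
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        x = reverse_step(x, t, noise_predictor(x, cond, t), rng)
    return x


def ensemble_sample(noise_predictor, cond, shape, n_runs, seed=0):
    """Average several independent backward runs into one prediction."""
    runs = [sample(noise_predictor, cond, shape, np.random.default_rng(seed + k))
            for k in range(n_runs)]
    return np.mean(runs, axis=0)


def oracle_predictor(x_t, cond, t):
    """Hypothetical stand-in for the U-Net that knows the target (cond = x0)."""
    return (x_t - np.sqrt(alpha_bars[t]) * cond) / np.sqrt(1.0 - alpha_bars[t])


x0 = np.linspace(-1.0, 1.0, 64).reshape(8, 8)   # normalised LST patch
hr = ensemble_sample(oracle_predictor, x0, x0.shape, n_runs=3)
print(np.abs(hr - x0).max())
```

With the oracle the last step returns x0 up to floating-point error, which confirms that the mean in Equation (4) is wired correctly. In practice the oracle would be replaced by the trained U-Net conditioned on the upsampled LST and auxiliary layers, and averaging several runs damps the run-to-run randomness that each individual sample carries.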

2.4. Methods of Comparison

In this study, several common image super-resolution methods are used as baselines for LST downscaling: LINEAR, EDSR [50], SRCNN [51], and DCTLSA [52]. The LINEAR method in image super-resolution typically enhances resolution through a linear combination of pixel values; it constructs a mapping by analyzing features in low-resolution images and leveraging known high-resolution samples. EDSR is a deep learning super-resolution model based on CNNs. It improves the network's expressive power by removing batch normalization layers and uses residual learning to capture details and features more effectively. SRCNN directly maps low-resolution images to high-resolution images with a simple three-layer convolutional network: it first extracts features from the input image and then generates a high-resolution output through feature reconstruction. DCTLSA combines the Discrete Cosine Transform (DCT) with a local structure attention mechanism. It first analyzes low-resolution images in the frequency domain using the DCT to extract important frequency information and then applies the local structure attention mechanism to enhance the features of key image regions.
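SRCNN's three stages (patch extraction, non-linear mapping, reconstruction) can be sketched as three convolutions applied to a bicubic-upsampled input. The NumPy forward pass below assumes the 9-1-5 kernel sizes and 64/32 filter counts of the original SRCNN design, zero "same" padding so that the output keeps the input size, and random untrained weights; it only illustrates the architecture and is not the configuration used in this study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, w, b):
    """2D convolution (cross-correlation) with zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))                 # zero padding
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", windows, w) + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def init_srcnn(rng, n1=64, n2=32, f1=9, f2=1, f3=5):
    """Random (untrained) SRCNN weights for a single-channel LST input."""
    shapes = [(n1, 1, f1, f1), (n2, n1, f2, f2), (1, n2, f3, f3)]
    return [(0.01 * rng.standard_normal(s), np.zeros(s[0])) for s in shapes]


def srcnn_forward(params, lst_up):
    """lst_up: bicubic-upsampled LST of shape (H, W); returns (H, W)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h = relu(conv2d_same(lst_up[None], w1, b1))   # patch extraction
    h = relu(conv2d_same(h, w2, b2))             # non-linear mapping
    return conv2d_same(h, w3, b3)[0]             # linear reconstruction


rng = np.random.default_rng(0)
params = init_srcnn(rng)
out = srcnn_forward(params, rng.standard_normal((32, 32)))
print(out.shape)  # (32, 32)
```

In a real baseline the weights would be learned by minimizing the MSE between `srcnn_forward` outputs and the 30 m reference LST; EDSR and DCTLSA follow the same input/output contract with deeper residual or attention-based bodies.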

For EDSR, SRCNN, and DCTLSA, the available dataset was split into two subsets: a training set containing 70% of the data and a testing set containing the remaining 30%. This ratio was chosen to ensure that the models were trained on a sufficiently large dataset while still allowing reliable performance evaluation. The inputs to the respective models were Low-Resolution (LR) LST images, which were deliberately down-sampled to simulate poor-quality or incomplete data, a common situation in real-world applications where high-resolution images are not always available. The LR images were then passed through each super-resolution model.

2.5. Metrics

In this study, the predicted high-spatial-resolution LST images are compared with the corresponding real LST images to evaluate the spatial consistency and accuracy of the model predictions. Histograms and scatter density plots are generated to visualize the relationship between predicted and actual LST. These plots show intuitively how well the spatial distributions of the two datasets align, allowing a clear assessment of how well the model replicates the real LST. In addition, the coefficient of determination (R²) is calculated to quantify the agreement between the predicted and real LST values, with a higher R² indicating better predictive performance, as indicated in Equation (10).

Furthermore, four statistical indicators are used to assess the accuracy of the proposed model: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), and the coefficient of determination (R²), as defined in Equations (6)–(10); Equation (7) gives the Mean Squared Error (MSE) on which RMSE and PSNR are based. Each metric captures a different aspect of model performance: MAE measures the average magnitude of the prediction errors; RMSE gives more weight to large errors by squaring the differences; PSNR evaluates the quality of the predicted images relative to the real ones; and R² reflects how well the predictions agree with the observed data. Together, these indicators provide a comprehensive evaluation of the model's accuracy and predictive capability.

\begin{matrix} M A E = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - ŷ_{i}| \end{matrix}

(6)

\begin{matrix} M S E = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - ŷ_{i})}^{2} \end{matrix}

(7)

\begin{matrix} R M S E = \sqrt{M S E} \end{matrix}

(8)

\begin{matrix} P S N R = 10 \cdot \log_{10} (\frac{{M A X}_{I}^{2}}{M S E}) \end{matrix}

(9)

\begin{matrix} R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(ŷ_{i} - y_{i})}^{2}}{\sum_{i = 1}^{n} {(\bar{y} - y_{i})}^{2}} \end{matrix}

(10)

where y_{i} is the i-th pixel value of the true high-resolution LST image, ŷ_{i} is the corresponding pixel value of the predicted high-resolution LST image, n is the number of pixels, \bar{y} is the mean of the true high-resolution LST values, and {M A X}_{I} is the maximum value in the LST image.
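Equations (6)–(10) map directly onto array operations. The helper below is a sketch that assumes the predicted and true LST images are co-registered arrays of the same shape with no missing pixels, and that MAX_I is taken from the true image (an assumption, since the text says only "the LST image").

```python
import numpy as np


def lst_metrics(y_true, y_pred):
    """MAE, RMSE, PSNR and R^2 following Equations (6)-(10)."""
    y = np.asarray(y_true, dtype=float).ravel()
    y_hat = np.asarray(y_pred, dtype=float).ravel()
    err = y - y_hat
    mae = np.mean(np.abs(err))                       # Eq. (6)
    mse = np.mean(err ** 2)                          # Eq. (7)
    rmse = np.sqrt(mse)                              # Eq. (8)
    max_i = y.max()                                  # assumed: max of true image
    psnr = 10.0 * np.log10(max_i ** 2 / mse)         # Eq. (9), in dB
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)   # Eq. (10)
    return {"MAE": mae, "RMSE": rmse, "PSNR": psnr, "R2": r2}


m = lst_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(m)  # MAE 0.25, RMSE 0.5, R2 0.8, PSNR about 18.06 dB
```

PSNR depends on the units and offset of the LST values (kelvin versus degrees Celsius) through MAX_I, so PSNR values are only comparable within one convention; a perfect prediction (MSE = 0) makes PSNR infinite.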

3. Results

Table 2 demonstrates the clear advantages of IGDM in downscaling SDGSAT-1 LST over both traditional methods (such as LINEAR) and other mainstream deep learning approaches (EDSR, SRCNN, and DCTLSA). The MAE and RMSE values show that IGDM achieves the best performance in temperature prediction, highlighting its effectiveness in estimating high-resolution LST from lower-resolution inputs.

The MAE of IGDM is 0.376, notably lower than that of all other methods; the second-best method, DDPM, has an MAE of 0.517. This reduction in MAE indicates that IGDM delivers higher prediction accuracy, with considerably smaller errors in the estimated temperatures. IGDM also achieves the lowest RMSE, 0.547, a clear improvement over competing methods such as DCTLSA (1.220) and SRCNN (1.161), which indicates that it also suppresses large errors in its temperature predictions. These results show that IGDM not only improves prediction accuracy but also captures the finer details and spatial distribution patterns of temperature variations. This is visible in Figure 4g, where the temperature image predicted by IGDM shows a more precise and granular representation of temperature changes than the outputs of the other methods. Incorporating the Normalized Difference Vegetation Index (NDVI) as auxiliary information further improved performance, refining the downscaling process and increasing the accuracy and reliability of the high-resolution LST estimates. This demonstrates the importance of auxiliary data in enriching model predictions. The visualized results reinforce these findings, with clear distinctions between the outputs of IGDM and the other methods, and underscore the potential of IGDM for practical applications in high-precision temperature monitoring and remote sensing studies.

Figure 4 illustrates the performance of the various downscaling models, including the deep learning methods, DDPM, and IGDM, on LST downscaling. In the test area, the downscaled results of all methods show similar spatial textures and consistent general temperature patterns. However, the Linear method shows a strong smoothing effect, which distorts finer details and causes the downscaled temperatures to deviate substantially from the true high-resolution LST.

On the other hand, deep learning methods show a marked improvement in retaining the intricate details and textures of the LST, providing a more accurate representation of the spatial distribution of temperature across the area. These methods are better able to capture subtle variations in temperature, ensuring that the overall structure of the LST field is preserved.

Although DDPM generates LST estimates that are fairly close to the true values, it still tends to overestimate temperatures. This overestimation is attributed mainly to noise interference in the model output: without auxiliary information, the diffusion model consistently produces higher temperatures than the actual high-resolution data, biasing the predictions. In contrast, diffusion models that incorporate auxiliary information, such as NDVI or other relevant environmental factors, effectively mitigate this bias. These guided models produce temperature predictions that align better with the true high-resolution LST and exhibit improved detail and numerical precision, because the auxiliary information helps the model capture the underlying temperature dynamics more precisely.

Given these findings, the IGDM demonstrates superior performance in the SDGSAT-1 remote sensing LST downscaling task. The incorporation of auxiliary information enhances the model’s ability to capture fine-scale temperature variations and produces more reliable, accurate downscaled temperature images. Therefore, IGDM stands out as the most effective method for this particular task, offering substantial improvements in both the spatial detail and numerical accuracy of the LST predictions.

To assess the accuracy and consistency of the LST generated by the various downscaling methods relative to the actual high-resolution values, Figure 5 compares the numerical distributions of LST. The LST produced by IGDM aligns closely with the true values, exhibiting a high level of consistency. This close fit underscores the positive impact of integrating auxiliary information into the downscaling process.

Figure 6 complements this analysis with histograms of the errors between the LST generated by each downscaling method and the true values. For IGDM, most errors are concentrated near zero, indicating highly accurate surface temperature predictions with small errors distributed around zero. These results highlight the significant role of auxiliary information in improving LST prediction, as it helps the model capture the true temperature distribution more precisely.

In summary, the performance of IGDM in the SDGSAT-1 LST downscaling demonstrates the critical value of using auxiliary information to reduce prediction errors. The high degree of accuracy reflected in the error distribution suggests that IGDM can reliably replicate the LST characteristics, making it an effective tool for temperature downscaling applications in remote sensing.

Figure 7 presents correlation scatter plots of the LST generated by the various deep-learning downscaling methods. The LST predicted by EDSR, SRCNN, and DCTLSA deviates noticeably from the true values, which we attribute to the instability of these models. Their predictions not only depart substantially from the measurements but are also widely scattered and highly variable. As a result, they are less consistent and less accurate, particularly in capturing fine-scale temperature variations.
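The metrics used to read Figure 7 and Tables 2 and 3 can be sketched as follows. This is a hypothetical helper, not the paper's implementation. R² is taken here as the squared Pearson correlation, as typically reported with a scatter-plot fit. The PSNR peak defaults to the dynamic range of the reference image, because the section does not state which peak value was used. The two toy predictions show how the metrics can disagree. A constant offset leaves R² at 1 while RMSE equals the offset, whereas a prediction that pulls extremes toward the mean lowers RMSE but also lowers R².

```python
import numpy as np

def lst_metrics(pred, truth, peak=None):
    """MAE, RMSE, R^2 and PSNR between predicted and reference LST (K)."""
    pred = np.asarray(pred, dtype=float).ravel()
    truth = np.asarray(truth, dtype=float).ravel()
    err = pred - truth
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    # Assumption: R^2 as the squared Pearson correlation, as in a scatter-plot fit
    r2 = float(np.corrcoef(pred, truth)[0, 1] ** 2)
    if peak is None:
        # Assumption: peak signal = dynamic range of the reference image
        peak = float(truth.max() - truth.min())
    psnr = float(20.0 * np.log10(peak / rmse)) if rmse > 0 else float("inf")
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "PSNR": psnr}

truth = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]    # constant +1 K offset
smoothed = [2.0, 2.0, 3.0, 3.0]  # extremes pulled toward the mean
print(lst_metrics(biased, truth))    # R2 is 1 (up to rounding) while RMSE is 1 K
print(lst_metrics(smoothed, truth))  # RMSE is about 0.71 K while R2 is 0.8
```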

In contrast, the Linear method is more stable but highly sensitive to surrounding temperature values. It cannot respond quickly to abrupt temperature changes, which produces a pronounced smoothing effect: rapid temperature fluctuations are damped and the finer details of the LST distribution are lost. In the scene compared in Figure 7, the Linear method achieves lower MAE and RMSE than DDPM without auxiliary information, but a notably lower R² (over all 193 scenes in Table 2, by contrast, DDPM has lower MAE and RMSE than Linear). Although the Linear method avoids extreme errors, it therefore captures the overall relationship between predicted and true LST less effectively. DDPM, although also affected by noise, performs better overall than the Linear method. Discrepancies in LST remain, especially in highly variable areas, but the diffusion model achieves a relatively high R², reflecting a stronger ability to capture the overall trend and structure of the surface temperature distribution. Despite this higher R², its MAE and RMSE in this scene are larger, indicating errors that limit its accuracy, especially for finer temperature details.
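The smoothing behaviour of the Linear method can be seen on a one-dimensional transect. The sketch assumes that the Linear baseline is plain linear interpolation of the coarse LST, which this section does not spell out. It also assumes that a coarse pixel is the block mean of four fine pixels; the 290 K and 310 K values are synthetic. A 20 K step is spread over four fine pixels, with errors of up to 7.5 K next to the boundary.

```python
import numpy as np

# Synthetic 1-D transect across a sharp thermal boundary (values in K, illustrative only)
fine_x = np.arange(16)
truth = np.where(fine_x < 8, 290.0, 310.0)

# Simulate a 4x coarser sensor by block averaging (assumed aggregation model)
factor = 4
coarse = truth.reshape(-1, factor).mean(axis=1)     # 290, 290, 310, 310
coarse_x = fine_x.reshape(-1, factor).mean(axis=1)  # coarse pixel centres: 1.5, 5.5, 9.5, 13.5

# Linear interpolation back to the fine grid (np.interp clamps beyond the end points)
linear = np.interp(fine_x, coarse_x, coarse)

err = linear - truth
print("interpolated values across the edge:", linear[5:11])
print("max |error| (K):", np.abs(err).max())
```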

IGDM is the most effective of the methods compared. By incorporating auxiliary information on land-surface conditions, such as vegetation cover (NDVI) and surface water (NDWI), IGDM better captures the complex relationships between surface temperature and these influencing factors. It therefore produces more accurate and detailed downscaled LST that reflects the spatial variations in LST more faithfully than the other downscaling methods. Overall, IGDM provides more reliable and precise temperature predictions, with smaller errors, better consistency, and a stronger correlation with the true values.

4. Discussion

4.1. The Impact of Different Auxiliary Information on Downscaling Results

This section examines how auxiliary information influences the downscaled surface temperature, focusing on NDVI and NDWI as auxiliary factors. Their contributions are evaluated through experiments conducted in the same test area. Table 3 compares the LST prediction accuracy of IGDM when combined with different types of auxiliary information.
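For reference, both auxiliary factors are normalized differences of surface reflectance. The sketch below assumes the standard NDVI = (NIR − Red)/(NIR + Red). For NDWI it assumes the green/NIR form (Green − NIR)/(Green + NIR) attributed to McFeeters. This section does not state which NDWI variant or which sensor's bands were used, so both choices are assumptions.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b) element-wise, returning NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)

def ndwi(green, nir):
    """Assumed McFeeters-style NDWI (green vs NIR); the paper's variant may differ."""
    return normalized_difference(green, nir)

# Illustrative reflectances: a vegetated pixel and an open-water pixel
print(ndvi(np.array([0.45]), np.array([0.05])))  # high for dense vegetation
print(ndwi(np.array([0.08]), np.array([0.02])))  # positive over open water
```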

IGDM performs best on the two key indicators, MAE and RMSE, when integrated with auxiliary information. IGDM_NDVI achieves the highest prediction accuracy, with an MAE of 0.376 K and an RMSE of 0.547 K, the lowest errors of all configurations in Table 3, together with the highest PSNR (40.27 dB). Compared with DDPM (MAE 0.517 K, RMSE 0.666 K), IGDM_NDVI reduces MAE by about 27% and RMSE by about 18%, underscoring its effectiveness in temperature downscaling.

IGDM_NDWI (MAE 0.435 K, RMSE 0.613 K) also improves on DDPM, which uses no auxiliary information, but by a smaller margin than IGDM_NDVI. NDWI therefore contributes positively, but its effect on prediction accuracy is less pronounced than that of NDVI.

Using NDVI and NDWI together in IGDM increases MAE and RMSE to 0.590 K and 0.796 K, respectively. These errors are higher not only than those of either single-index configuration but also than those of DDPM (0.517 K and 0.666 K), and the PSNR drops to 37.01 dB. This rise in error highlights the potential challenges of integrating multiple auxiliary factors. Using NDVI and NDWI simultaneously may introduce redundant or conflicting information, especially in areas with complex or overlapping land cover types. Furthermore, differences in acquisition time between these indices and the LST data, increased model complexity, and noise introduced during training could bias the predictions. The high correlation between NDVI and NDWI may also cause overfitting, as the model may rely too heavily on redundant features and generalize less well to new test data. Under such conditions, a simpler configuration with a single auxiliary factor such as NDVI may be more effective, because it avoids these issues and improves the accuracy of the downscaled LST.
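The comparison with DDPM can be checked directly from Table 3. The snippet below uses plain Python; the values are copied from Table 3, and the helper name `change_vs_baseline` is hypothetical. It computes the percentage change in MAE and RMSE of each IGDM configuration relative to DDPM, where a negative value means lower error.

```python
# MAE (K), RMSE (K) and PSNR (dB) copied from Table 3
TABLE3 = {
    "DDPM": (0.517, 0.666, 38.55),
    "IGDM_NDVI": (0.376, 0.547, 40.27),
    "IGDM_NDWI": (0.435, 0.613, 38.81),
    "IGDM_NDVI_NDWI": (0.590, 0.796, 37.01),
}

def change_vs_baseline(table, baseline="DDPM"):
    """Percentage change in MAE and RMSE relative to the baseline (negative = lower error)."""
    base_mae, base_rmse, _ = table[baseline]
    return {
        name: (100.0 * (mae - base_mae) / base_mae, 100.0 * (rmse - base_rmse) / base_rmse)
        for name, (mae, rmse, _) in table.items()
        if name != baseline
    }

for name, (d_mae, d_rmse) in change_vs_baseline(TABLE3).items():
    print(f"{name:15s} MAE {d_mae:+6.1f}%  RMSE {d_rmse:+6.1f}%")
```

With these values, IGDM_NDVI changes MAE by about −27% and RMSE by about −18%. IGDM_NDWI changes them by about −16% and −8%. IGDM_NDVI_NDWI changes them by about +14% and +20%, so the combined configuration is worse than DDPM on both metrics.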

We further examined how different guidance information affects the surface temperature downscaled by the diffusion model, focusing on NDVI and NDWI as guiding factors. As illustrated in Figure 8, this experiment concentrated on the central part of the study region, which contains both high- and low-temperature zones. Compared with the results obtained without guidance, the LST downscaled with NDVI guidance shows more intricate texture details, reflecting more accurately how vegetation cover influences surface temperature variations.

In contrast, owing to the negative correlation between NDVI and NDWI, using NDWI as the guidance factor lowered LST in the low-temperature areas and raised it in the high-temperature areas, although the overall temperature trend remained largely consistent with the unguided results. When NDVI and NDWI were used together, the downscaled LST resembled the NDVI-guided result but with enhanced detail, while also showing the influence of NDWI, particularly in the high-temperature region in the lower right corner of Figure 8. Visually, the combined guidance therefore gives a more detailed representation of surface temperature variations across the area, although Table 3 shows that it yields the highest MAE and RMSE of all configurations.
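How strongly two guidance layers overlap can be checked with a simple correlation. The sketch below uses synthetic reflectances for five cover types rather than data from this study. It also assumes the standard NDVI and a green/NIR (McFeeters-style) NDWI. A Pearson coefficient close to −1 indicates that the second layer carries little information not already present in the first. This is one way the redundancy discussed above could arise.

```python
import numpy as np

# Illustrative surface reflectances (green, red, NIR) for a few cover types; values are synthetic
pixels = np.array([
    [0.08, 0.05, 0.45],  # dense vegetation
    [0.10, 0.08, 0.30],  # cropland
    [0.15, 0.20, 0.25],  # bare soil
    [0.12, 0.14, 0.16],  # built-up
    [0.08, 0.05, 0.02],  # open water
])
green, red, nir = pixels.T

ndvi = (nir - red) / (nir + red)
ndwi = (green - nir) / (green + nir)  # McFeeters-style NDWI (assumed variant)

r = np.corrcoef(ndvi, ndwi)[0, 1]
print("NDVI:", np.round(ndvi, 2))
print("NDWI:", np.round(ndwi, 2))
print("Pearson r(NDVI, NDWI):", round(float(r), 2))
```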

To investigate the specific impact of different guiding information on the downscaling results, Figure 9 shows the error maps obtained with different auxiliary information. Figure 10 presents a correlation analysis and quantitative evaluation comparing the downscaled LST, with and without guiding information, against the actual LST. Although the overall correlation with the true values differs little between the guided and unguided results, the guided downscaling yields noticeably lower MAE and RMSE. This indicates that guiding information substantially improves the accuracy of LST downscaling.

4.2. Analysis of Remote Sensing Surface Temperature Downscaling Results in Different Regions

This study selected typical agricultural, urban, vegetation, water-body, and village areas for analysis, and IGDM performs well across all of them. In agricultural areas, the regular layout of farmland produces a uniform, orderly LST distribution with relatively stable temperature variations, and land use patterns give the LST distinct spatial stratification. In urban areas, where low-density residential zones predominate, LST is relatively low, and the method also reproduces these lower-temperature regions well. In vegetation areas, even where some locations experience extremely high or low temperatures, the model avoids the excessive smoothing that can affect deep-learning models. In the water-body area, mountainous terrain in the medium-to-high temperature range makes up a large share of the scene. The model still generates clear textures there and accurately represents the different surface cover types. In village areas, where villages, medium-temperature farmland, and small low-temperature ponds coexist, the model also handles LST across multiple feature types well. The results are shown in Figure 11.

Figure 12 shows how incorporating different auxiliary information affects LST downscaling across the regions, and in particular how NDVI and NDWI affect accuracy for water bodies. The most notable result is the substantial improvement in downscaling accuracy for water areas when auxiliary information is used. Without this guidance, water regions generally showed higher temperatures, which led to poorer downscaling performance. A likely reason is that water surfaces have distinct spectral properties that are not always well captured by conventional remote sensing methods. Introducing NDVI and NDWI, two indices that capture vegetation- and moisture-related characteristics, allowed water bodies to be represented more accurately. This suggests the value of auxiliary information that is directly related to LST, such as vegetation indices, for refining downscaling over water bodies [35].
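A per-region error curve such as Figure 12 amounts to computing RMSE inside each region. The sketch assumes that the regions are given as an integer label map with hypothetical codes 1 to 5. If each region is instead a separate test patch, the same RMSE is simply computed per patch.

```python
import numpy as np

# Hypothetical integer codes for the five region types analysed in Figures 11 and 12
REGIONS = {1: "farmland", 2: "urban", 3: "vegetation", 4: "water", 5: "village"}

def rmse_by_region(pred, truth, labels, regions=REGIONS):
    """RMSE (K) of the downscaled LST inside each labelled region; absent regions are skipped."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for code, name in regions.items():
        mask = labels == code
        if mask.any():
            err = pred[mask] - truth[mask]
            out[name] = float(np.sqrt(np.mean(err ** 2)))
    return out

# Tiny synthetic example: a 2 x 3 scene with three region types present
truth = np.zeros((2, 3))
pred = np.array([[1.0, 1.0, 0.0],
                 [0.0, 3.0, 3.0]])
labels = np.array([[4, 4, 1],
                   [1, 3, 3]])
print(rmse_by_region(pred, truth, labels))
```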

On the other hand, vegetation areas presented a more complex challenge. Despite the inclusion of auxiliary information, the downscaling results were less accurate due to the inherent characteristics of these regions, including the rich texture information and large variations in temperature [53]. Vegetation areas often exhibit heterogeneous thermal behavior due to factors like plant type, canopy cover, and soil moisture. These variations make it difficult to achieve precise LST downscaling, regardless of the inclusion of additional information. This highlights an important limitation: while auxiliary data can improve downscaling accuracy in certain areas, their effectiveness may be reduced in regions with high spatial variability, such as those with significant vegetation cover [54].

In contrast, farmland, urban, and village areas showed relatively stable results across the different configurations. These areas likely have more uniform temperature patterns and less complex textures, which makes them less sensitive to auxiliary data, so LST can be downscaled successfully in such environments even without additional guidance. This suggests that the influence of auxiliary data is concentrated in particular surface types, such as water bodies, rather than being uniform across regions [55].

The selection of appropriate auxiliary information therefore plays a pivotal role in successful LST downscaling, especially with diffusion models. Its effectiveness depends on region-specific characteristics such as temperature variation and texture complexity. Auxiliary data are most likely to improve LST downscaling in areas with smaller temperature differences and weaker texture features. In areas dominated by high temperature variability or complex texture, such as vegetation zones, additional processing or more specialized auxiliary data may be needed to improve accuracy.
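The recommendation above depends on how temperature difference and texture are quantified, which this section does not specify. As one hypothetical way to characterize a candidate area, the sketch below reports two values for an LST patch. The first is its temperature range. The second is the mean local standard deviation in 3 × 3 windows, used as a crude texture measure. No threshold for "small" or "weak" is implied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def heterogeneity(lst, window=3):
    """Simple descriptors of thermal heterogeneity for an LST patch (K).

    Returns the patch temperature range and the mean local standard deviation
    over window x window neighbourhoods (a crude texture measure; assumption).
    """
    lst = np.asarray(lst, dtype=float)
    windows = sliding_window_view(lst, (window, window))
    local_std = windows.std(axis=(-2, -1))
    return {
        "range_K": float(lst.max() - lst.min()),
        "mean_local_std_K": float(local_std.mean()),
    }

flat = np.full((8, 8), 300.0)                                 # uniform patch
checker = 300.0 + 4.0 * (np.indices((8, 8)).sum(axis=0) % 2)  # alternating 300/304 K texture
print(heterogeneity(flat))
print(heterogeneity(checker))
```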

In conclusion, this study underscores the importance of understanding regional characteristics and their relationship to LST when choosing auxiliary information. The results highlight that successful downscaling not only requires the incorporation of suitable auxiliary data but also an awareness of the underlying thermal and textural properties of the region in question.

5. Conclusions

In this study, we propose the Information-Guided Diffusion Model (IGDM) for downscaling LST from the SDGSAT-1 satellite in the Heihe River Basin. The key findings are:

(1): Effectiveness of IGDM: The IGDM demonstrates significant advantages in LST downscaling tasks, outperforming other deep learning models in terms of accuracy and spatial detail. The incorporation of NDVI as auxiliary information further improves the model’s performance, enhancing accuracy and reliability. The results show that the diffusion model-generated LST is closer to actual values, with improved numerical accuracy and detail.
(2): Role of NDVI and NDWI: Used individually, both NDVI and NDWI serve as valuable auxiliary information that improves the accuracy of LST predictions. NDVI primarily captures the influence of vegetation, while NDWI helps correct temperature deviations. Used together, they produce visually more detailed downscaling results but higher MAE and RMSE than either index alone or DDPM, so combining the two indices does not improve prediction precision in this study.
(3): Region-Specific Performance: In areas with distinct features, such as water bodies, the introduction of auxiliary information significantly improves downscaling accuracy. However, in regions with complex textures and large temperature variations, the downscaling performance remains suboptimal, regardless of the auxiliary information used. This highlights the importance of selecting the right auxiliary data, especially in areas with less temperature variation and weaker texture features.

In conclusion, IGDM outperforms the other downscaling methods compared and, when guided by NDVI or NDWI, offers significant improvements in accuracy, particularly in areas where temperature is clearly related to land cover and vegetation. The model shows great potential for practical applications, especially in regions with distinct features such as water bodies, and highlights the importance of selecting appropriate auxiliary information for LST downscaling.

Supplementary Materials

The following supporting information can be downloaded at: https://data.tpdc.ac.cn/en/data/6bbf9a3f-e7d8-4255-9ecb-131e1543316d. Figure 1: Study area; (b) shows the Heihe River region.

Author Contributions

Conceptualization, Z.F. and J.X.; methodology, J.W., Z.F. and J.X.; software, J.W., Z.F. and J.X.; validation, J.W., Z.F. and J.X.; formal analysis, J.W., Z.F. and J.X.; investigation, J.W., B.T. and J.X.; resources, Z.F. and J.X.; data curation, J.W., Z.F. and J.X.; writing—original draft preparation, J.W.; writing—review and editing, J.W. and B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant [41961053]; Yunnan Fundamental Research Projects under Grant [202301AT070463]; Major scientific and technological projects of Yunnan Province [202202AD080010].

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the International Research Center of Big Data for Sustainable Development Goals (CBAS) for kindly providing the SDGSAT-1 data. The datasets were provided by the National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Fick, S.E.; Hijmans, R.J. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 2017, 37, 4302–4315.
2. Li, Z.L.; Tang, B.H.; Wu, H.; Ren, H.Z.; Yan, G.J.; Wan, Z.M.; Trigo, I.F.; Sobrino, J.A. Satellite-derived land surface temperature: Current status and perspectives. Remote Sens. Environ. 2013, 131, 14–37.
3. Weng, Q.H.; Lu, D.S.; Schubring, J. Estimation of land surface temperature-vegetation abundance relationship for urban heat island studies. Remote Sens. Environ. 2004, 89, 467–483.
4. Shen, Y.; Shen, H.F.; Cheng, Q.; Zhang, L.P. Generating Comparable and Fine-Scale Time Series of Summer Land Surface Temperature for Thermal Environment Monitoring. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 2136–2147.
5. Sun, L.; Chen, Z.X.; Gao, F.; Anderson, M.; Song, L.S.; Wang, L.M.; Hu, B.; Yang, Y. Reconstructing daily clear-sky land surface temperature for cloudy regions from MODIS data. Comput. Geosci. 2017, 105, 10–20.
6. Peng, Z.X.; Zhou, J.; Liu, S.M.; Li, M.S.; Zhu, L.Q. Influences of ground structure on remotely sensed land surface temperature. In Proceedings of the 36th IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1368–1371.
7. Liu, H.M.; He, B.J.; Gao, S.H.; Zhan, Q.M.; Yang, C. Influence of non-urban reference delineation on trend estimate of surface urban heat island intensity: A comparison of seven methods. Remote Sens. Environ. 2023, 296, 113735.
8. Li, R.B.; Li, H.; Bian, Z.J.; Cao, B.A.; Du, Y.M.; Liu, Q.H. A Temperature-Based Validation Method for Medium and High Spatial Resolution LST Products. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Pasadena, CA, USA, 16–21 July 2023; pp. 6306–6309.
9. Zhu, X.L.; Duan, S.B.; Li, Z.L.; Wu, P.H.; Wu, H.; Zhao, W.; Qian, Y.G. Reconstruction of land surface temperature under cloudy conditions from Landsat 8 data using annual temperature cycle model. Remote Sens. Environ. 2022, 281, 113261.
10. Saraskanroud, S.A.; Ouri, B.F.; Zeinali, B.; Mostafazadeh, R. Estimation of land surface temperature (LST) using single-channel and multi-band methods in Sablan mountainous region. Adv. Space Res. 2024, 74, 2915–2929.
11. Zhang, Z.C.; Luan, W.X.; Yang, J.; Guo, A.D.; Su, M.; Tian, C. The influences of 2D/3D urban morphology on land surface temperature at the block scale in Chinese megacities. Urban Clim. 2023, 49, 101553.
12. Saleh, S.K.; Sanaei, A.; Amoushahi, S.; Ranjbar, S. Effect of landscape pattern changes and environmental indices on land surface temperature in a fragile ecosystem in southeastern Iran. Environ. Sci. Pollut. Res. 2023, 30, 34037–34053.
13. Zhou, D.C.; Li, D.; Sun, G.; Zhang, L.X.; Liu, Y.Q.; Hao, L. Contrasting effects of urbanization and agriculture on surface temperature in eastern China. J. Geophys. Res.-Atmos. 2016, 121, 9597–9606.
14. Sun, M.; Qiu, J.; Wang, N.; Ye, J.H.; Li, M.S. Predicting Land Surface Temperature and Land Cover Changes Based on Multisource Remote Sensing Spatio-Temporal Fusion in Hefei, Eastern China. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2023, 16, 8764–8781.
15. Du, J.; Song, K.S.; Yan, B.H. Impact of the Zhalong Wetland on Neighboring Land Surface Temperature Based on Remote Sensing and GIS. Chin. Geogr. Sci. 2019, 29, 798–808.
16. Li, Y.; Hou, J.; Huang, C. Spatial Downscaling Land Surface Temperature based on Copula. Remote Sens. Technol. Appl. 2017, 32, 818–824.
17. Hu, Y.X.; Tang, R.L.; Jiang, X.G.; Li, Z.L.; Jiang, Y.Z.; Liu, M. Spatial downscaling of land surface temperature based on surface energy balance. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Waikoloa, HI, USA, 26 September–2 October 2020; pp. 4926–4929.
18. Crosson, W.L.; Al-Hamdan, M.Z.; Hemmings, S.N.J.; Wade, G.M. A daily merged MODIS Aqua-Terra land surface temperature data set for the conterminous United States. Remote Sens. Environ. 2012, 119, 315–324.
19. Coops, N.C.; Duro, D.C.; Wulder, M.A.; Han, T. Estimating afternoon MODIS land surface temperatures (LST) based on morning MODIS overpass, location and elevation information. Int. J. Remote Sens. 2007, 28, 2391–2396.
20. Zhang, Z.M.; He, G.J.; Wang, M.M.; Long, T.F.; Wang, G.Z.; Zhang, X.M.; Jiao, W.L. Towards an operational method for land surface temperature retrieval from Landsat 8 data. Remote Sens. Lett. 2016, 7, 279–288.
21. Njuki, S.M.; Mannaerts, C.M.; Su, Z.B. An Improved Approach for Downscaling Coarse-Resolution Thermal Data by Minimizing the Spatial Averaging Biases in Random Forest. Remote Sens. 2020, 12, 3507.
22. Yang, Y.B.; Li, X.L.; Pan, X.; Zhang, Y.; Cao, C. Downscaling Land Surface Temperature in Complex Regions by Using Multiple Scale Factors with Adaptive Thresholds. Sensors 2017, 17, 744.
23. Quan, J.; Zhan, W.; Chen, Y.; Liu, W. Downscaling remotely sensed land surface temperatures: A comparison of typical methods. J. Remote Sens. 2013, 17, 361–387.
24. Liu, Y.; Zhu, R.; Qian, J.; Dang, C.; Yue, H. Land Surface Temperature Downscaling Based on Multiple Factors. Remote Sens. Inf. 2020, 35, 6–18.
25. Kustas, W.P.; Norman, J.M.; Anderson, M.C.; French, A.N. Estimating subpixel surface temperatures and energy fluxes from the vegetation index-radiometric temperature relationship. Remote Sens. Environ. 2003, 85, 429–440.
26. Agam, N.; Kustas, W.P.; Anderson, M.C.; Li, F.Q.; Neale, C.M.U. A vegetation index based technique for spatial sharpening of thermal imagery. Remote Sens. Environ. 2007, 107, 545–558.
27. Mukherjee, S.; Joshi, P.K.; Garg, R.D. Evaluation of LST downscaling algorithms on seasonal thermal data in humid subtropical regions of India. Int. J. Remote Sens. 2015, 36, 2503–2523.
28. Ma, J.; Yang, X.J.; Zhou, X.B.; Zhou, J. A practical method for downscaling land surface temperature with temporal and spatial information: A case study in a desert oasis. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3656–3659.
29. Jiang, H.T.; Shen, H.F. SMOS soil moisture downscaling based on back propagation neural network with MODIS LST and EVI. In Proceedings of the 36th IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1659–1662.
30. Wu, J.H.; Zhong, B.; Tian, S.F.; Yang, A.X.; Wu, J.J. Downscaling of Urban Land Surface Temperature Based on Multi-Factor Geographically Weighted Regression. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2019, 12, 2897–2911.
31. Liu, H.M.; Li, M.; Zhan, Q.M.; Ma, Z.Y.; He, B.J. Homogeneity and heterogeneity of diurnal and nocturnal hotspots and the implications for synergetic mitigation in heat-resilient urban planning. Comput. Environ. Urban Syst. 2025, 117, 102241.
32. Yang, G.J.; Pu, R.L.; Zhao, C.J.; Huang, W.J.; Wang, J.H. Estimation of subpixel land surface temperature using an endmember index based technique: A case examination on ASTER and MODIS temperature products over a heterogeneous area. Remote Sens. Environ. 2011, 115, 1202–1219.
33. Firozjaei, M.K.; Mijani, N.; Kiavarz, M.; Duan, S.B.; Atkinson, P.M.; Alavipanah, S.K. A novel surface energy balance-based approach to land surface temperature downscaling. Remote Sens. Environ. 2024, 305, 114087.
34. Yang, C.; Zhan, Q.M.; Lv, Y.Z.; Liu, H.M. Downscaling Land Surface Temperature Using Multiscale Geographically Weighted Regression Over Heterogeneous Landscapes in Wuhan, China. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2019, 12, 5213–5222.
35. Uhrin, A.; Onacillová, K. Spatiotemporal analysis of land surface temperature and land cover changes in Prešov city using downscaling approach and machine learning algorithms. Environ. Monit. Assess. 2025, 197, 126.
36. Gao, J.H.; Sun, H.; Xu, Z.H.; Zhang, T.; Xu, H.Y.; Wu, D.; Zhao, X. CPMF: An Integrated Technology for Generating 30-m, All-Weather Land Surface Temperature by Coupling Physical Model, Machine Learning, and Spatiotemporal Fusion Model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5008216.
37. Li, W.; Ni, L.; Li, Z.L.; Duan, S.B.; Wu, H. Evaluation of Machine Learning Algorithms in Spatial Downscaling of MODIS Land Surface Temperature. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2019, 12, 2299–2307.
38. Li, S.Y.; Wan, H.; Yu, Q.; Wang, X.Y. Downscaling of ERA5 reanalysis land surface temperature based on attention mechanism and Google Earth Engine. Sci. Rep. 2025, 15, 675.
39. Li, Y.K.; He, Q.; Liu, Y.Q.; Yan, Y.; Zhang, H.L.; Tan, J. A Physically Constrained Downscaling Framework for Hourly, All-Sky Land Surface Temperature in Mountainous Regions. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2025, 18, 8151–8174.
40. Hoang, N.D.; Pham, P.A.H.; Huynh, T.C.; Cao, M.T.; Bui, D.T. Geospatial urban heat mapping with interpretable machine learning and deep learning: A case study in Hue City, Vietnam. Earth Sci. Inform. 2025, 18, 64.
41. Li, W.S.; Yang, C.; Peng, Y.D.; Zhang, X.Y. A Multi-Cooperative Deep Convolutional Neural Network for Spatiotemporal Satellite Image Fusion. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 10174–10188.
42. Liu, X.; Deng, C.W.; Chanussot, J.; Hong, D.F.; Zhao, B.J. StfNet: A Two-Stream Convolutional Neural Network for Spatiotemporal Image Fusion. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6552–6564.
43. Tan, Z.Y.; Yue, P.; Di, L.P.; Tang, J.M. Deriving High Spatiotemporal Remote Sensing Images Using Deep Convolutional Network. Remote Sens. 2018, 10, 1066.
44. Adeyeri, O.E.; Folorunsho, A.H.; Ayegbusi, K.I.; Bobde, V.; Adeliyi, T.E.; Ndehedehe, C.E.; Akinsanola, A.A. Land surface dynamics and meteorological forcings modulate land surface temperature characteristics. Sustain. Cities Soc. 2024, 101, 105072.
45. Li, C.R.; Chen, F.; Wang, N.; Yu, B.; Wang, L. SDGSAT-1 nighttime light data improve village-scale built-up delineation. Remote Sens. Environ. 2023, 297, 113764.
46. Zhong, B.; Ma, P.; Nie, A.H.; Yang, A.X.; Yao, Y.J.; Lü, W.B.; Zhang, H.; Liu, Q.H. Land cover mapping using time series HJ-1/CCD data. Sci. China-Earth Sci. 2014, 57, 1790–1799.
47. Zhong, B.; Yang, A.X.; Nie, A.H.; Yao, Y.J.; Zhang, H.; Wu, S.L.; Liu, Q.H. Finer Resolution Land-Cover Mapping Using Multiple Classifiers and Multisource Remotely Sensed Data in the Heihe River Basin. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2015, 8, 4973–4992.
48. Li, X.; Liu, S.M.; Xiao, Q.; Ma, M.G.; Jin, R.; Che, T.; Wang, W.Z.; Hu, X.L.; Xu, Z.W.; Wen, J.G.; et al. A multiscale dataset for understanding complex eco-hydrological processes in a heterogeneous oasis system. Sci. Data 2017, 4, 170083.
49. Li, N.; Xu, J.H.; Li, X.; Qin, B.X.; Wang, Y.P.; Fu, D.J.; Zhong, K.W.; Qin, Z.H. A Novel Land Surface Temperature Retrieval Algorithm for SDGSAT-1 Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5000218.
50. Carlos, H.; Moctezuma, D.; Nava, J. Super-Resolution of LiDAR Data Using EDSR-CBAM Neural Networks. In Proceedings of the 3rd International Conference on Geospatial Information Sciences (iGISc), Mexico, Mexico, 14–17 November 2023; pp. 135–146.
51. Ward, C.M.; Harguess, J.; Crabb, B.; Parameswaran, S. Image quality assessment for determining efficacy and limitations of Super-Resolution Convolutional Neural Network (SRCNN). arXiv 2019, arXiv:1905.05373.
52. Zeng, K.; Lin, H.J.; Yan, Z.Q.; Fang, J.S. Densely Connected Transformer With Linear Self-Attention for Lightweight Image Super-Resolution. IEEE Trans. Instrum. Meas. 2023, 72, 5023112.
53. Chen, L.W.; Hu, B.T.; Sun, J.X.; Xu, Y.J.; Zhang, G.X.; Ma, H.B.; Ren, J.Q. Using remote sensing and machine learning to generate 100-cm soil moisture at 30-m resolution for the black soil region of China: Implication for agricultural water management. Agric. Water Manag. 2025, 309, 109353.
54. Li, M.Q.; Wang, P.X.; Tansey, K.; Sun, Y.F.; Guo, F.W.; Zhou, J. Improved field-scale drought monitoring using MODIS and Sentinel-2 data for vegetation temperature condition index generation through a fusion framework. Comput. Electron. Agric. 2025, 234, 110256.
55. Zhang, T.T.; Zhang, H.Y.; Wang, Y.Q.; Xiong, T.; Wang, M.Y.; Zhang, Z.X.; Guo, X.Y.; Zhao, J.J. A Global Extended MODIS-Compatible NDVI Dataset. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2025, 18, 8390–8398.

Figure 1. Study area. (a) is the map of the China region. (b) is the Heihe River region [46,47,48] (For details, please refer to the “Supplementary Materials”).

Figure 2. Forward processes and backward processes of IGDM.

Figure 3. Framework of IGDM.

Figure 4. Comparison of Results of Different Deep Learning LST Downscaling Models in Visual Char.

Figure 5. Comparison of numerical distribution results of different LST downscaling models. The experimental results are derived from Figure 4.

Figure 6. Histogram of numerical error distribution in different deep learning LST downscaling models. The experimental results are derived from Figure 4.

Figure 7. Scatter plot of correlation between results of different deep learning LST downscaling models. The experimental results are derived from Figure 4.

Figure 8. Local magnification of LST downscaling results with and without guidance from different auxiliary information.

Figure 9. Error maps of LST downscaling results without guidance information and with different guidance information. The experimental results are derived from Figure 8.

Figure 10. Scatter plot of correlation between LST downscaling results without guidance information and those with different guidance information. The experimental results are derived from Figure 8.

Figure 11. Visual representation of the downscaling results of LST in five different regions with and without auxiliary information. (a) DDPM. (b) IGDM_NDVI. (c) IGDM_NDWI. (d) IGDM_NDVI_NDWI.

Figure 12. RMSE curve of LST downscaling results in five different regions.

Table 1. Values of the seven coefficients Bᵢ in Equation (1).

w (g·cm⁻²)	B₀	B₁	B₂	B₃	B₄	B₅	B₆
0–1.5	−8.85	0.03	1.85	−0.78	0.00	0.31	−0.14
1–2.5	17.33	−0.02	2.33	−1.25	0.00	0.34	−0.16
2–3.5	−17.10	0.22	2.76	−1.91	0.02	0.30	−0.17
3–4.5	−17.05	0.35	3.17	−2.46	0.02	0.24	−0.13
4–5.5	−25.05	0.45	3.42	−2.79	0.03	0.19	−0.09
5–6.5	−38.74	0.31	4.25	−3.43	0.01	0.16	−0.06

The coefficients of the three retrieval algorithms were regressed with the least-squares method from the simulation databases in six AWVC subranges: 0–1.5, 1–2.5, 2–3.5, 3–4.5, 4–5.5, and >5 g/cm².
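The least-squares regression of the coefficients can be sketched as follows. The sketch assumes that Equation (1), which is not reproduced in this section, is linear in the Bᵢ, with an intercept B₀ and six predictor terms. The design-matrix columns are therefore hypothetical placeholders for those terms. The synthetic check generates data from the first row of Table 1 and recovers it. A separate fit is made for each (possibly overlapping) water-vapour subrange.

```python
import numpy as np

def fit_coefficients(X, y):
    """Ordinary least-squares estimate of B in y ≈ X @ B."""
    B, *_ = np.linalg.lstsq(X, y, rcond=None)
    return B

def fit_per_subrange(X, y, w, subranges):
    """Fit one coefficient vector per water-vapour subrange [lo, hi]; subranges may overlap."""
    fits = {}
    for lo, hi in subranges:
        m = (w >= lo) & (w <= hi)
        fits[(lo, hi)] = fit_coefficients(X[m], y[m])
    return fits

# Synthetic check: data generated from the first row of Table 1 are recovered.
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 6))])  # column 0 = intercept; columns 1-6 hypothetical terms
B_true = np.array([-8.85, 0.03, 1.85, -0.78, 0.00, 0.31, -0.14])
w = rng.uniform(0.0, 1.5, size=n)                            # all samples fall in the 0-1.5 g/cm^2 subrange
y = X @ B_true
fits = fit_per_subrange(X, y, w, [(0.0, 1.5)])
print(np.round(fits[(0.0, 1.5)], 2))
```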

Table 2. Comparison of land surface temperature downscaling results from different super-resolution models. The statistics are based on 193 satellite scenes.

Method	MAE (K)	RMSE (K)	PSNR (dB)
LINEAR	0.640	1.123	34.01
EDSR	0.665	1.060	34.52
SRCNN	0.656	1.161	33.73
DCTLSA	0.850	1.220	33.29
DDPM	0.517	0.666	38.55
IGDM	0.376	0.547	40.27

EDSR: Enhanced Deep Super-Resolution Network; SRCNN: Super-Resolution Convolutional Neural Network; DCTLSA: Discrete Cosine Transform and Local Spatial Attention; DDPM: Denoising Diffusion Probabilistic Models; IGDM: Information-Guided Diffusion Model.
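
The MAE, RMSE, and PSNR columns can be computed from a downscaled LST image and its reference roughly as sketched below in Python, operating on flattened pixel lists. MAE and RMSE are in kelvin; PSNR is in decibels and depends on a peak value. The peak value used for these tables is not given alongside them, so `data_range` (for example, the dynamic range of the reference scene) is an assumption here, as are the function names.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], ref: Sequence[float]) -> None:
    # Both inputs must be non-empty flattened images of the same size.
    if len(pred) == 0 or len(pred) != len(ref):
        raise ValueError("pred and ref must be non-empty and of equal length")


def mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error, in the units of the inputs (K for LST)."""
    _check(pred, ref)
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (K for LST)."""
    _check(pred, ref)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def psnr(pred: Sequence[float], ref: Sequence[float], data_range: float) -> float:
    """PSNR in dB as 20*log10(data_range / RMSE); data_range is an assumed peak value."""
    err = rmse(pred, ref)
    return math.inf if err == 0.0 else 20.0 * math.log10(data_range / err)
```

For instance, errors of ±0.5 K on every pixel give MAE = RMSE = 0.5 K, and with `data_range = 50` K the PSNR is 20·log10(100) = 40 dB.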

Table 3. Comparison of results guided by different auxiliary information in IGDM.

Method	MAE (K)	RMSE (K)	PSNR (dB)
DDPM	0.517	0.666	38.55
IGDM_NDVI	0.376	0.547	40.27
IGDM_NDWI	0.435	0.613	38.81
IGDM_NDVI_NDWI	0.590	0.796	37.01
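
The relative changes implied by Tables 2 and 3 follow from (baseline - value) / baseline × 100. The short Python sketch below applies that formula to the transcribed RMSE values; the helper name `relative_reduction` and the dictionary names are hypothetical.

```python
def relative_reduction(baseline: float, value: float) -> float:
    """Percentage reduction of an error metric relative to a baseline.

    Positive means `value` is lower (better) than `baseline`; negative means higher.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


# RMSE (K) transcribed from Table 2.
TABLE2_RMSE = {"LINEAR": 1.123, "EDSR": 1.060, "SRCNN": 1.161, "DCTLSA": 1.220, "DDPM": 0.666}
IGDM_RMSE = 0.547

# RMSE (K) transcribed from Table 3, compared against DDPM.
TABLE3_RMSE = {"IGDM_NDVI": 0.547, "IGDM_NDWI": 0.613, "IGDM_NDVI_NDWI": 0.796}

if __name__ == "__main__":
    for name, base in TABLE2_RMSE.items():
        print(f"IGDM vs {name}: {relative_reduction(base, IGDM_RMSE):.2f}% lower RMSE")
    for name, val in TABLE3_RMSE.items():
        change = relative_reduction(TABLE2_RMSE["DDPM"], val)
        print(f"{name} vs DDPM: {change:+.2f}% (positive = lower RMSE)")
```

Run as a script, it reports RMSE reductions for IGDM of about 51.29% (LINEAR), 48.40% (EDSR), 52.89% (SRCNN), 55.16% (DCTLSA), and 17.87% (DDPM). Against DDPM, IGDM_NDVI gives +17.87%, IGDM_NDWI +7.96%, and IGDM_NDVI_NDWI -19.52%, i.e., a higher RMSE than DDPM.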

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Wang, J.; Fu, Z.; Tang, B.; Xu, J. Information-Guided Diffusion Model for Downscaling Land Surface Temperature from SDGSAT-1 Remote Sensing Images. Remote Sens. 2025, 17, 1669. https://doi.org/10.3390/rs17101669
