By combining the advantages of several spatiotemporal data fusion models, the Flexible Spatiotemporal Data Fusion Method (FSDAF) [49] keeps the input data requirement to a minimum while capturing both gradual and abrupt changes. The MODIS LST data (1000 m resolution) were resampled to 30 m resolution with the nearest-neighbor algorithm. In addition, all input images were cropped to the same area, with the same number of rows and columns, before the spatiotemporal fusion was performed. The input data comprised two parts: (1) a pair of Landsat LST and MODIS LST images at $T_{1}$, which were used to estimate the spatial differences between fine-resolution and coarse-resolution pixels; and (2) one MODIS LST image at $T_{2}$, which served as the spatial feature reference and was used to calculate the temporal differences between $T_{1}$ and $T_{2}$ for predicting the fine-resolution LST at $T_{2}$. The implementation was divided into six steps: (1) classify the Landsat LST at $T_{1}$; (2) estimate the temporal change of each class of the coarse-resolution MODIS LST from $T_{1}$ to $T_{2}$; (3) predict the fine-resolution LST at $T_{2}$ from the estimated temporal changes and calculate the pixel residuals of the MODIS LST; (4) use the thin plate spline (TPS) interpolation function to predict the high-spatial-resolution LST from the MODIS LST at $T_{2}$; (5) allocate the residuals to the predicted high-spatial-resolution LST with the TPS interpolation function; and (6) generate the final fine-resolution “Landsat-like” LST at $T_{2}$ using the weights of pixels in the moving window, which are assigned from neighborhood information. The calculation process is shown in Equations (6)–(12):

where ${\tilde{R}}_{high2}\left({x}_{ij},{y}_{ij},b\right)$ represents the DN values of the fine-spatial-resolution pixels; ${R}_{high1}\left({x}_{ij},{y}_{ij},b\right)$ is the Landsat pixel value; $\Delta R\left({x}_{k},{y}_{k},b\right)$ is the change in the fine-resolution pixel value from $T_{1}$ to $T_{2}$; ${\omega}_{k}$ is the weight; $\Delta {R}_{high}\left(a,b\right)$ represents the change of class $a$ of the high-spatial-resolution data on band $b$ (here, the LST, NDVI, or NDBI data) from $T_{1}$ to $T_{2}$; and ${\partial}_{high}\left({x}_{ij},{y}_{ij},b\right)$ is the residual allocated to high-spatial-resolution pixel $j$ within MODIS LST pixel $i$.
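The role of the weights ${\omega}_{k}$ in the final prediction can be illustrated with a minimal sketch for a single target pixel. It assumes the prediction is the Landsat value at $T_{1}$ plus a weighted sum of the changes of neighboring pixels, and it uses normalized inverse-distance weights purely as a stand-in; the function name `predict_pixel` and the sample numbers are hypothetical, and the weighting actually used in this study follows the cited reference.

```python
import numpy as np

def predict_pixel(r_high1, delta_r, distances):
    """Predict one fine-resolution value at T2 (illustrative only).

    r_high1   : Landsat value of the target pixel at T1
    delta_r   : T1-to-T2 changes of the k neighboring pixels
    distances : spatial distances of those pixels to the target (all > 0)
    """
    delta_r = np.asarray(delta_r, dtype=float)
    inverse = 1.0 / np.asarray(distances, dtype=float)
    weights = inverse / inverse.sum()  # assumed inverse-distance weights, sum to 1
    return r_high1 + float(np.sum(weights * delta_r))

# Hypothetical values: two equally distant neighbors with changes of 2 K and 4 K
value = predict_pixel(300.0, [2.0, 4.0], [1.0, 1.0])
```

Because the weights are normalized, the predicted change always lies between the smallest and largest neighboring change.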

The weights were calculated with reference to previous studies [41], and the TPS function that guides the residual distribution was applied as shown in Equations (8)–(12):

where $m$ is the number of fine-resolution sub-pixels within one MODIS LST pixel; $\partial \left({x}_{ij},{y}_{ij},b\right)$ represents the residual between the Landsat LST pixels and the fine-resolution LST pixels predicted from the temporal changes; $\Delta {R}_{low}\left({x}_{ij},{y}_{ij},b\right)$ represents the change in the pixel value of band $b$ in the MODIS LST from $T_{1}$ to $T_{2}$; ${R}_{high2}^{TP}\left({x}_{ij},{y}_{ij},b\right)$ and ${R}_{high2}^{SP}\left({x}_{ij},{y}_{ij},b\right)$ are the high-spatial-resolution LST pixel values at $T_{2}$ predicted from the temporal changes and from the optimized TPS interpolation function parameters, respectively; ${E}_{h0}\left({x}_{ij},{y}_{ij},b\right)$ is the difference between these two predictions; $CW\left({x}_{ij},{y}_{ij},b\right)$ represents the weight that guides the residual allocation; $W\left({x}_{ij},{y}_{ij},b\right)$ is the normalized form of $CW\left({x}_{ij},{y}_{ij},b\right)$; $HI$ is the homogeneity coefficient; and ${f}_{TPS-b}\left({x}_{ij},{y}_{ij}\right)$ is the TPS function corresponding to band $b$.
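The residual allocation described by these symbols can be sketched for one MODIS pixel as follows. The sketch follows the FSDAF formulation as it is commonly described, in which the guiding weight blends the TPS-minus-temporal difference with the coarse-pixel residual according to $HI$; this blend, the function name `distribute_residual`, and the sample values are assumptions, and the exact form used in Equations (8)–(12) may differ.

```python
import numpy as np

def distribute_residual(coarse_residual, tp, sp, hi):
    """Allocate one coarse-pixel residual to its m sub-pixels (illustrative).

    coarse_residual : residual of the MODIS pixel (scalar)
    tp, sp          : temporal and TPS predictions of the m sub-pixels at T2
    hi              : homogeneity coefficient of each sub-pixel, in [0, 1]
    """
    tp, sp, hi = (np.asarray(a, dtype=float) for a in (tp, sp, hi))
    m = tp.size
    e_ho = sp - tp                                     # difference between the two predictions
    cw = e_ho * hi + coarse_residual * (1.0 - hi)      # assumed guiding weight CW
    w = cw / cw.sum()                                  # normalized weight W (sum assumed non-zero)
    return m * coarse_residual * w                     # residual given to each sub-pixel

# Hypothetical 4-sub-pixel example with a coarse residual of 2 K
r = distribute_residual(2.0,
                        tp=[300.0, 300.0, 300.0, 300.0],
                        sp=[301.0, 301.0, 303.0, 303.0],
                        hi=[1.0, 1.0, 0.5, 0.5])
```

Since the weights sum to one, the mean of the allocated residuals equals the coarse-pixel residual, so the allocation redistributes the error without changing its total.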

The FSDAF model was run in the ENVI IDL 8.5 software package with the following parameter settings: a moving window of 30 × 30 pixels, four LST classes, and a similar-pixel search threshold of 30.
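Step (2) of the procedure, the estimation of the temporal change of each class, can be sketched as a least-squares problem. The sketch assumes that the change observed in each coarse MODIS pixel is the fraction-weighted sum of the class changes, which is how FSDAF is usually described; the function name `class_temporal_change` and the six-pixel, four-class example are hypothetical, and the solver used in this study is not stated.

```python
import numpy as np

def class_temporal_change(fractions, delta_coarse):
    """Least-squares estimate of the per-class change (illustrative).

    fractions    : (n_coarse, n_classes) class fractions inside each coarse pixel
    delta_coarse : (n_coarse,) observed MODIS change from T1 to T2
    """
    fractions = np.asarray(fractions, dtype=float)
    delta_coarse = np.asarray(delta_coarse, dtype=float)
    # Solve fractions @ delta_class ~= delta_coarse in the least-squares sense
    delta_class, *_ = np.linalg.lstsq(fractions, delta_coarse, rcond=None)
    return delta_class

# Hypothetical example: six coarse pixels, four classes
fractions = np.array([
    [1.00, 0.00, 0.00, 0.00],
    [0.00, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.25, 0.25, 0.25],
])
true_change = np.array([1.0, 2.0, 3.0, 4.0])
delta_coarse = fractions @ true_change      # noise-free synthetic observations
estimated = class_temporal_change(fractions, delta_coarse)
```

With noise-free synthetic data the class changes are recovered exactly; with real imagery the system is overdetermined and the estimate is a best fit.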

Table 2 shows the inputs and the output of the FSDAF method for predicting fine-resolution LST data. A pair of inputs (one MODIS LST and one Landsat LST) at $T_{1}$ and a single input at $T_{2}$ (MODIS LST data, used as the spatial feature reference) were used in the FSDAF method to generate the output “Landsat-like” LST data at $T_{2}$. To ensure a good fusion result, the input pair should be cloud-free and acquired as close as possible to the acquisition date of the single input.
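Before these inputs can be fused, the 1000 m MODIS LST has to be placed on the 30 m grid by nearest-neighbor resampling. A minimal sketch of that step is given below; it assumes both grids share the same upper-left corner, and the function name `nearest_neighbor_resample` is hypothetical. In practice this is typically done with a GIS or remote sensing package rather than by hand.

```python
import numpy as np

def nearest_neighbor_resample(coarse, coarse_res=1000.0, fine_res=30.0):
    """Resample a coarse grid to a finer one by nearest neighbor (illustrative).

    Assumes both grids share the same upper-left corner.
    """
    coarse = np.asarray(coarse)
    rows, cols = coarse.shape
    n_rows = int(rows * coarse_res // fine_res)   # whole fine pixels that fit
    n_cols = int(cols * coarse_res // fine_res)
    # Center of each fine pixel (in metres) mapped to the coarse pixel containing it
    r_idx = ((np.arange(n_rows) + 0.5) * fine_res // coarse_res).astype(int)
    c_idx = ((np.arange(n_cols) + 0.5) * fine_res // coarse_res).astype(int)
    r_idx = np.minimum(r_idx, rows - 1)
    c_idx = np.minimum(c_idx, cols - 1)
    return coarse[np.ix_(r_idx, c_idx)]

# Hypothetical 2 x 2 MODIS tile (values in kelvin)
modis = np.array([[300.0, 301.0],
                  [302.0, 303.0]])
fine = nearest_neighbor_resample(modis)
```

Because 1000 m is not an integer multiple of 30 m, each coarse pixel maps to either 33 or 34 fine pixels per axis in general; the index mapping above handles this without interpolation, so no new LST values are created.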

The monthly series of “Landsat-like” LST data was generated in two steps: (1) a pair of daily LST images (Landsat LST and MODIS LST, MOD11A1) and the 8-day MODIS LST (MOD11A2) were used as inputs to the FSDAF model to predict the 8-day “Landsat-like” LST; and (2) the four predicted 8-day “Landsat-like” LST images in a month were combined with the maximum-value composite method to obtain the monthly “Landsat-like” LST. If no Landsat image was available in a given month, the Landsat image of the previous or the next month was used, together with the MOD11A1 image of the same date. If two Landsat images were available in a month, the four predicted 8-day “Landsat-like” LST images were calculated from the image of the first and the second half of the month, respectively. To further assess the value of the FSDAF model for time-series analysis when data are missing or cloud-contaminated, the integrated monthly LST dynamics over a whole year were analyzed.
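The monthly compositing in step (2) can be sketched as a per-pixel maximum over the four 8-day predictions. The sketch assumes missing or cloud-contaminated pixels are stored as NaN and ignored; the function name `monthly_max_composite` and the sample values are hypothetical.

```python
import numpy as np

def monthly_max_composite(eight_day_stack):
    """Per-pixel maximum over a stack of 8-day LST images (illustrative).

    eight_day_stack : array-like of shape (n_images, rows, cols); NaN marks
                      missing pixels, which are ignored (an assumption here).
    """
    stack = np.asarray(eight_day_stack, dtype=float)
    return np.nanmax(stack, axis=0)

# Hypothetical 2 x 2 example with four 8-day predictions (kelvin)
nan = np.nan
stack = [
    [[300.0, nan],   [290.0, 295.0]],
    [[302.0, 301.0], [289.0, 296.0]],
    [[299.0, 300.0], [nan,   294.0]],
    [[301.0, 303.0], [291.0, 293.0]],
]
monthly = monthly_max_composite(stack)
```

A pixel that is missing in all four images stays NaN in the composite, so gaps remain visible rather than being filled silently.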