Reconstruction of MODIS Land Surface Temperature Products Based on Multi-Temporal Information

Land surface temperature (LST) products derived from the moderate resolution imaging spectroradiometer (MODIS) sensor are one of the most important data sources used to research land surface energy and water balance at regional and global scales. However, MODIS data are severely contaminated by cloud cover, which limits the applications of LST products. In this paper, based on the spatio-temporal autocorrelation of land surface variables, a reconstruction algorithm depending on the correlations between spatial pixels in multiple time phases from available MODIS LST data is developed to reconstruct clear-sky LST values for missing pixels. Considering the impacts of correlation and bias between predictors and reconstructed data on the modeling error, the known data in the reconstructed time phase are combined with the data temporally nearest to them as predictor variables to establish their temporal relationships with the reconstructed data. The reconstructed results are validated by a series of evaluation indices. The average correlation coefficient between the reconstructed results and ground-based observations is 0.87, showing high temporal change accuracy. The difference in Moran’s I, representing spatial structure characteristics between the known and reconstructed data, is 0.03 on average, indicating a slight loss of spatial accuracy. The average reconstruction rate is approximately 87.0%. The modeling error, as part of the reconstruction error, is only 1.40 K on average and accounts for 5.0% of the total error. If the product and modeling errors are removed, the residual error represents approximately 3.5 K and 5.6 K of the annual mean difference between the cloudy and cloudless LST at night and during the day, respectively. In addition, different reconstruction cases are demonstrated using various predictor data, including many combinations of multi-temporal MODIS LST data, the microwave brightness temperature, and the combination of the normalized difference vegetation index and terrain data. Comparisons among cases show that the known MODIS LST data are more reliable as predictor variables and that the data combination advocated in this paper is optimal.


Introduction
Land surface temperature (LST) plays an important role in land-atmosphere interactions.Therefore, LST data are widely required for studies on climate change, the water cycle, and energy budgets [1][2][3].With the development of remote sensing technology, the LST data obtained by transforming satellite-based thermal infrared observations are more valuable [4] and can better meet regional and global research requirements than ground-based observations.Due to high spatio-temporal resolution, the moderate resolution imaging spectroradiometer (MODIS) sensor has become one of the most widely used sources of satellite-derived LST data [5].However, daily MODIS LST products are strongly influenced by clouds, and only clear-sky pixel values are available, which leads to limited research applications.To overcome this restriction, several MODIS LST reconstruction algorithms have been developed.
To date, most of studies on the reconstruction of MODIS LST products have been focused on recovering clear-sky LST values for missing pixels.There are three categories of statistical methods using different information, including spatial, temporal, and spatio-temporal information.Spatial methods are the most common, wherein the relationship in space between the auxiliary data with spatial continuity and the available LST data is adopted, and the missing LST data are filled in by sharing the relationship model assuming that the known and unknown LST data have the same statistical relationship with the auxiliary data [6].Neteler [7] used the digital elevation model (DEM) as a covariate to generate reconstructed LST maps through linear regression when the LST map consisted of over 10% valid pixels; otherwise, 16-day MODIS LST data were used as the target variable.Fan, et al. [8] introduced land cover, normalized difference vegetation index, and MODIS band 7 as auxiliary data to fill the missing pixels using three methods: linear regression, regression tree analysis, and artificial neural networks.Ke, et al. [9] adopted latitude, longitude, DEM, and normalized difference vegetation index (NDVI) as predictor variables to establish a regression model used to capture large-scale spatial variability; they then applied the Kriging method to generate small-scale spatial information to supplement the regression prediction [10].For this category of methods, modeling accuracies are determined by the correlation between LST and its covariates.However, the LST variable is synthetically affected by many environmental factors, and limited kinds of predictor data cannot accurately represent the spatial heterogeneity of LST.The second category of methods is based on temporal information.Xu and Shen [11] used the harmonic analysis of time series (HANTS) algorithm [12] to fit available time series of LST data at each pixel and then recovered missing pixels using fitted values of the temporal model.Na, et al. [13] used another temporal method, called Savitzky-Golay filter [14], to reconstruct MODIS LST data under clear-sky conditions.However, the temporal filter method smooths LST diurnal variation and can represent only seasonal LST changes.Thus, a temporal method such as HANTS or Savitzky-Golay is not generally suitable to reconstruct daily MODIS LST products.The third category consists of spatio-temporal methods performed by establishing a regression relationship between similar pixels and missing pixels.Yu, et al. [15] selected pixels whose environmental factor vector (composed of DEM, slope, aspect, NDVI, and solar radiation) had the nearest Euclidean distance to the reconstructed pixel as the most similar pixel.However, the consistency in direction between environmental factor vectors of available and reconstructed pixels was ignored.It is possible that both vectors have almost the same magnitude but opposite directions, indicating a low degree of similarity between two pixels.Zeng, et al. [16] classified multi-temporal LST data and then considered several pixels in the class which the reconstructed pixels belong to as similar pixels.The more clearly land surface types are distinguished, the more accurate the regression relationship; however, the use of too many classes can lead to a lack of available similar pixels.Sun, et al. [17] defined similar pixels according to spatial distance.Pixels closer to the reconstructed pixel made a greater contribution to the reconstruction process.
The above methods can reconstruct cloud-free LST values but not cloudy ones; they are nonetheless valuable for long-term trend analysis and further estimation of real LST values under clouds.Few studies on recovering pixels under cloudy conditions have been performed based on model-based and statistical methods.Due to the difficultly of acquiring additional information, e.g., radiation and flux data at large spatial scales, model-based reconstruction methods have been given little attention.Jin [18] proposed a neighboring pixel method based on the surface energy balance theory assuming that the difference in LST values between a cloudy pixel and its neighboring clear-sky pixel is caused by differences in energy fluxes, e.g., net radiation and sensible and latent heat.Lu, et al. [19] improved the neighboring pixel method by introducing satellite observations.Zhang, et al. [20] used a one-dimensional heat transfer model to reconstruct cloudy pixels.In statistical methods, the solution for recovering cloudy MODIS LST pixels is to introduce microwave temperature brightness (TB) data due to penetrating clouds.Based on the strong correlation between infrared LST and TB data, TB data, particularly in the high-frequency range, have been widely used to estimate land surface emissivity [21,22] and LST [23,24] by employing infrared LST as auxiliary data.However, due to the difference in spatial scale, the 25-40 km microwave pixels cannot be directly filled into gaps of MODIS LST data.Kou, et al. [25] reconstructed missing MODIS LST pixels under cloudy conditions by merging microwave data using a Bayesian Maximum Entropy method.However, a complex land surface environment, e.g., with variations in terrain and vegetation, may reduce the correlation between LST and TB and thereby increase the reconstruction error.
In this paper, a reconstruction algorithm based on spatio-temporal information is used to recover daily MODIS LST data.The available LST data at different spatial pixels and times are used as predictor variables and are optimally selected by analyzing the correlation and bias between multi-temporal MODIS LST data.In addition, reconstructed results employing different kinds of predictor data, e.g., NDVI, DEM, and TB, are compared with each other.

Study Area
This work was performed in the Babao River Basin, which is situated in the eastern branch of the upper reach of the Heihe River on the northeastern margin of Tibetan Plateau in Northwestern China (Figure 1).The Babao River Basin is a cloudy area, and the cloudy time generally exceeds half of a year due to a combination of westerlies, East Asia monsoon, and Tibetan Plateau monsoon.The stronger the atmospheric circulation becomes, the more there is cloud cover.In addition, the characteristics of the study area increase the probability of clouds.The elevation of the Babao River Basin is 3604 m on average and ranges from 2640 m to 5000 m; the annual rainfall is approximately 400 mm, which drives the formation of orographic clouds by forcing humid air to rise.The high vegetation cover and soil water content in the study area increase the surface evapotranspiration under the strong solar radiation, especially in summer afternoons, and the enhanced convective motion of the air promotes the formation of convective clouds, causing the cloud cover to be higher during the day than the night.The Babao River Basin is an ideal study area to research the reconstruction of MODIS LST products.temperature brightness (TB) data due to penetrating clouds.Based on the strong correlation between infrared LST and TB data, TB data, particularly in the high-frequency range, have been widely used to estimate land surface emissivity [21,22] and LST [23,24] by employing infrared LST as auxiliary data.However, due to the difference in spatial scale, the 25-40 km microwave pixels cannot be directly filled into gaps of MODIS LST data.Kou, et al. [25] reconstructed missing MODIS LST pixels under cloudy conditions by merging microwave data using a Bayesian Maximum Entropy method.However, a complex land surface environment, e.g., with variations in terrain and vegetation, may reduce the correlation between LST and TB and thereby increase the reconstruction error.
In this paper, a reconstruction algorithm based on spatio-temporal information is used to recover daily MODIS LST data.The available LST data at different spatial pixels and times are used as predictor variables and are optimally selected by analyzing the correlation and bias between multi-temporal MODIS LST data.In addition, reconstructed results employing different kinds of predictor data, e.g., NDVI, DEM, and TB, are compared with each other.

Study Area
This work was performed in the Babao River Basin, which is situated in the eastern branch of the upper reach of the Heihe River on the northeastern margin of Tibetan Plateau in Northwestern China (Figure 1).The Babao River Basin is a cloudy area, and the cloudy time generally exceeds half of a year due to a combination of westerlies, East Asia monsoon, and Tibetan Plateau monsoon.The stronger the atmospheric circulation becomes, the more there is cloud cover.In addition, the characteristics of the study area increase the probability of clouds.The elevation of the Babao River Basin is 3604 m on average and ranges from 2640 m to 5000 m; the annual rainfall is approximately 400 mm, which drives the formation of orographic clouds by forcing humid air to rise.The high vegetation cover and soil water content in the study area increase the surface evapotranspiration under the strong solar radiation, especially in summer afternoons, and the enhanced convective motion of the air promotes the formation of convective clouds, causing the cloud cover to be higher during the day than the night.The Babao River Basin is an ideal study area to research the reconstruction of MODIS LST products.

Data
MODIS Aqua LST products are taken as the experimental data to be reconstructed.Figure 2 shows the spatial and temporal distribution of pixels requiring reconstruction.The missing Aqua LST data at night are substantially fewer and more homogeneous in space than those during the day; the highest intensity appears from April to October.The Aqua LST data are severely affected by clouds during the day, showing a high missing data rate throughout the year.There is a significant correlation between the number of absent data and elevation, indicating that a higher elevation leads to a larger number of missing data.

Data
MODIS Aqua LST products are taken as the experimental data to be reconstructed.Figure 2 shows the spatial and temporal distribution of pixels requiring reconstruction.The missing Aqua LST data at night are substantially fewer and more homogeneous in space than those during the day; the highest intensity appears from April to October.The Aqua LST data are severely affected by clouds during the day, showing a high missing data rate throughout the year.There is a significant correlation between the number of absent data and elevation, indicating that a higher elevation leads to a larger number of missing data.A large amount of remote sensing data is employed to perform the reconstruction algorithm (see Table 1).All available pixel values derived from MODIS Aqua and Terra LST products (10.5067/MODIS/MOD11A1.006 and 10.5067/MODIS/MYD11A1.006) are potential predictor data for filling missing pixels.To compare reconstruction results based on different data sources, two groups of predictor variables are used to reconstruct MODIS Aqua LST data.One group is multi-frequency and multi-polarization TB data derived from the Advanced Microwave Scanning Radiometer 2 (AMSR2) sensor (http://gcom-w1.jaxa.jp),and the other group is NDVI (10.5067/MODIS/MYD13A2.006) and terrain data, e.g., slope and aspect calculated using DEM data derived from shuttle radar topography mission (SRTM, https://lta.cr.usgs.gov/SRTM).As shown in  A large amount of remote sensing data is employed to perform the reconstruction algorithm (see Table 1).All available pixel values derived from MODIS Aqua and Terra LST products (10.5067/MODIS/MOD11A1.006 and 10.5067/MODIS/MYD11A1.006) are potential predictor data for filling missing pixels.To compare reconstruction results based on different data sources, two groups of predictor variables are used to reconstruct MODIS Aqua LST data.One group is multi-frequency and multi-polarization TB data derived from the Advanced Microwave Scanning Radiometer 2 (AMSR2) sensor (http://gcom-w1.jaxa.jp),and the other group is NDVI (10.5067/MODIS/MYD13A2.006) and terrain data, e.g., slope and aspect calculated using DEM data derived from shuttle radar topography mission (SRTM, https://lta.cr.usgs.gov/SRTM).
As shown in The spatial reconstruction method employing terrain data as predictor data requires a consistent spatial scale between the predictor and target variables.First, the slope and aspect values are calculated using an algorithm incorporating the DEM values of the calculated cell's eight neighbors [26], and terrain data, including DEM, slope, and aspect, are then resampled at the same spatial scale as the MODIS LST data by averaging all 90 m pixels in the 1 km grids.

Processing Outliers
Outlier data may increase prediction uncertainties.Generally, it is recommended to leave the data in the 95% confidence interval, which cannot directly be applied to temporal data due to the existence of trends.It is feasible to find time series outlier values by identifying the unreasonable bias between the data and their trend.The criterion to select available data that are not affected by clouds is defined as follows: where ε bias is calculated by removing the temporal trend from data and µ and σ are the mean and standard deviation of ε bias , respectively.The data with eligible ε bias are retained.

LST Retrieval from Ground-Based Measurements
Ground-based radiation observations cannot be used to evaluate reconstructed results and need to be converted into LST values.Based on thermal radiative transfer theory, the ground-based LST T in-situ can be calculated using in-situ longwave radiation observations [27]: where L ↑ in-situ and L ↓ in-situ indicate ground-based upwelling and downwelling longwave radiation, respectively.δ is the Stefan-Boltzmann constant (5.67 ) and ε b is the broadband emissivity, which can be estimated by MODIS narrowband emissivity products [28,29]: where ε 29 , ε 31 , and ε 32 are the narrow emissivities of MODIS bands 29, 31, and 32, respectively.

MODIS LST Reconstruction Algorithm
In consideration of the spatio-temporal autocorrelation characteristics of land surface variables, the MODIS LST data that are not affected by clouds as the predictor variables are more reliable than other data sources, e.g., TB and NDVI, for predicting missing MODIS LST pixels.MODIS LST data are observed at 01:30, 10:30, 13:30, and 22:30 local time.The missing MODIS LST pixel values can be estimated based on the spatio-temporal correlation among LST values at different spatial pixels and times.Due to the strong temporal variability, suppose that the temporal correlation range is less than one day.Then, the MODIS LST pixels without values at the jth pixel at local time t on the dth day are filled using the equation as follows: where ω t k n(d,t k ) is the weighting coefficient vector and each element is matched with w t k i in Equation (4).T m,t j is a temporal vector, including the known LST data at the jth pixel at time t on the mth and is written as follows: Remote Sens. 2018, 10, 1112 However, if the correlation among column vectors in the observation matrix is high, especially when both sets of sequence data are spatially close, invalid weighting coefficients that lead to over-fitting will be estimated.This problem can be addressed by performing ridge regression [30,31]: where ωt k n(d,t k ) is the estimator of the weighting coefficient vector.λ is a regularization parameter.Herein, we set λ to 0.1.

Evaluation Index
The performance of the reconstruction algorithm is considered from four aspects: (1) the level of uncertainty in the reconstruction process; (2) whether the reconstructed data is consistent with the temporal changes with the known data; (3) whether the reconstructed data maintains similar spatial structure characteristics to the known data; and (4) the value of the reconstructed rate.
(1) Error Analysis: When the ground-based LST data are used as the ground truth to evaluate reconstructed data, the sources of reconstruction error can be decomposed as follows: where the reconstruction error ε re can be estimated by calculating the differences between the reconstructed data and the ground-based LST data: where E(•) represents the mean and T re and T in-situ are the reconstructed value and ground-based validation data calculated by Equation (2), respectively.
ε product indicates the accuracy of the MODIS LST product, which can be obtained by evaluating the known LST data: where T known is the LST data that are not affected by clouds.
ε modeling , the modeling error representing the modeling reliability, can be obtained from the residual errors of the model described by Equation ( 5): The missing LST data are reconstructed in clear sky days, but the ground-based LST values under clouds are employed as validation data.Thus, the difference ε sky cloudy between the reconstructed data T re Remote Sens. 2018, 10, 1112 8 of 18 and true LST data under the clouds is also an important component of ε re .Suppose that all errors ε are uncorrelated and have a zero mean and that ε sky cloudy can be written as where D(•) represents variance.
(2) Temporal Consistency: The stochastic characteristics of reconstruction errors can decrease the temporal consistency between the reconstructed data and the ground-based LST data.The correlation coefficient is typically used to evaluate the prediction accuracy of temporal changes: where Cov(•) and D(•) indicate covariance and variance, respectively.
(3) Spatial Structure: In addition to temporal variability, spatial heterogeneity is a significant inherent characteristic of land surface variables.Thus, it is necessary to evaluate the spatial structure of reconstructed LST data.Moran's I is generally used to quantifiably evaluate the spatial autocorrelation of the variable: where T i is the attribution value for feature i. w ij represents the spatial weight between features i and j and is estimated using the inverse distance between T i and T j .n is equal to the total number of features.
(4) Reconstruction Rate The reconstruction rate is an index to evaluate algorithm performance.The reconstruction rate is defined as follows: where N missing is the number of the missing pixels and N re is the number of reconstructed pixels.

Selection of Predictor Data
If the correlation coefficient between both temporal variables is equal to 1, when using one as a predictor variable to estimate the other, the prediction error will be 0.However, a complete correlation is almost impossible, and then temporal bias between the predictor and target variables affects prediction accuracy.Due to the use of multi-temporal LST data as predictor data, bias between both sets of temporal data observed at different local times is obvious.Figure 3 shows the influence of the combination of correlation coefficient and absolute bias on the prediction accuracy, which is recorded using a temporal MODIS LST observation at one pixel to estimate those at other pixels via Equation (5).With the decrease of the correlation coefficient and the growth of bias between LST data, the prediction errors gradually increase.Thus, it is necessary to optimally select multi-temporal LST data as predictor variables to decrease prediction uncertainty.
Remote Sens. 2017, 10, x FOR PEER REVIEW 9 of 18 between the Aqua LST data and themselves, showing an average correlation coefficient of 0.98 at night and 0.86 during the day and an average bias of 2.2 K at night and 6.0 K during the day.Thus, the Aqua LST data that are not affected by clouds are the best predictor data.According to Figure 3, if the available Aqua data are used to predict the missing pixels, the average prediction errors are 1.6 K at night and 3.9 K during the day.However, the reconstruction rate is low if only using the available Aqua LST data at the reconstructed time.Without substantially increasing the prediction errors, the multi-temporal LST data can also be taken as predictor data.For the nighttime Aqua data, the correlation and difference with nighttime Terra data show the following average correlation coefficient and bias, representing 0.96 and 2.7 K, respectively, which leads to an average prediction error of 2.3 K.For daytime Aqua data, the average correlation coefficients with MODIS LST data at three other times are approximately 0.82, but the daytime Terra data have the smaller bias of 6.7 K, and the average prediction error is approximately 4.3 K.
Through above analysis, in addition to the Aqua data that are not contaminated by clouds, the nighttime Terra data one day ahead may be introduced as predictor data to reconstruct nighttime Aqua data, and daytime Terra data may be used to reconstruct daytime Aqua data.In MODIS LST data reconstruction, Aqua LST products are taken as the experiment data to perform the reconstruction algorithm.Based on the above analysis, the suitable predictor data need to be selected for the reconstruction of Aqua LST products.Figures 4 and 5 show analyses of correlation and bias, respectively, between the LST data at the reconstructed time and themselves and the LST data at other times.The highest average correlation and smallest average bias occur between the Aqua LST data and themselves, showing an average correlation coefficient of 0.98 at night and 0.86 during the day and an average bias of 2.2 K at night and 6.0 K during the day.Thus, the Aqua LST data that are not affected by clouds are the best predictor data.According to Figure 3, if the available Aqua data are used to predict the missing pixels, the average prediction errors are 1.6 K at night and 3.9 K during the day.between the Aqua LST data and themselves, showing an average correlation coefficient of 0.98 at night and 0.86 during the day and an average bias of 2.2 K at night and 6.0 K during the day.Thus, the Aqua LST data that are not affected by clouds are the best predictor data.According to Figure 3, if the available Aqua data are used to predict the missing pixels, the average prediction errors are 1.6 K at night and 3.9 K during the day.However, the reconstruction rate is low if only using the available Aqua LST data at the reconstructed time.Without substantially increasing the prediction errors, the multi-temporal LST data can also be taken as predictor data.For the nighttime Aqua data, the correlation and difference with nighttime Terra data show the following average correlation coefficient and bias, representing 0.96 and 2.7 K, respectively, which leads to an average prediction error of 2.3 K.For daytime Aqua

Reconstruction of MODIS Aqua LST Data
Figure 6 shows the monthly reconstruction rates of MODIS Aqua LST data.The average percentage of reconstructed pixels is 92.8% at night and 81.7% during the day.The average reconstruction rates in spring and winter are higher than those in summer and autumn.Depending on the reconstruction algorithm, the missing MODIS LST data can be completely filled as long as the predictor data exist.Thus, a high missing rate does not imply a low reconstruction rate (e.g., missing pixels accounting for more than 90% of daytime Aqua LST data in December are recovered to a degree of 96.7%).Even if MODIS LST data are completely affected by clouds, the absence of pixel values is still estimated by introducing other available temporal LST data.MODIS Aqua LST products have a number of data with 100% loss, specifically, 67 at night and 121 during the day, and 42 nighttime and 66 daytime empty data are reconstructed, indicating that multi-temporal data can be used to increase the reconstruction rate.

Spatial Characteristics of Reconstructed Data
The examples of the reconstructed Aqua LST data with different missing rates in space are shown in Figure 7. Strong spatial continuity generally exists between the known and reconstructed However, the reconstruction rate is low if only using the available Aqua LST data at the reconstructed time.Without substantially increasing the prediction errors, the multi-temporal LST data can also be taken as predictor data.For the nighttime Aqua data, the correlation and difference with nighttime Terra data show the following average correlation coefficient and bias, representing 0.96 and 2.7 K, respectively, which leads to an average prediction error of 2.3 K.For daytime Aqua data, the average correlation coefficients with MODIS LST data at three other times are approximately 0.82, but the daytime Terra data have the smaller bias of 6.7 K, and the average prediction error is approximately 4.3 K.
Through above analysis, in addition to the Aqua data that are not contaminated by clouds, the nighttime Terra data one day ahead may be introduced as predictor data to reconstruct nighttime Aqua data, and daytime Terra data may be used to reconstruct daytime Aqua data.

Reconstruction of MODIS Aqua LST Data
Figure 6 shows the monthly reconstruction rates of MODIS Aqua LST data.The average percentage of reconstructed pixels is 92.8% at night and 81.7% during the day.The average reconstruction rates in spring and winter are higher than those in summer and autumn.Depending on the reconstruction algorithm, the missing MODIS LST data can be completely filled as long as the predictor data exist.Thus, a high missing rate does not imply a low reconstruction rate (e.g., missing pixels accounting for more than 90% of daytime Aqua LST data in December are recovered to a degree of 96.7%).Even if MODIS LST data are completely affected by clouds, the absence of pixel values is still estimated by introducing other available temporal LST data.MODIS Aqua LST products have a number of data with 100% loss, specifically, 67 at night and 121 during the day, and 42 nighttime and 66 daytime empty data are reconstructed, indicating that multi-temporal data can be used to increase the reconstruction rate.
missing pixels accounting for more than 90% of daytime Aqua LST data in December are recovered to a degree of 96.7%).Even if MODIS LST data are completely affected by clouds, the absence of pixel values is still estimated by introducing other available temporal LST data.MODIS Aqua LST products have a number of data with 100% loss, specifically, 67 at night and 121 during the day, and 42 nighttime and 66 daytime empty data are reconstructed, indicating that multi-temporal data can be used to increase the reconstruction rate.

Spatial Characteristics of Reconstructed Data
The examples of the reconstructed Aqua LST data with different missing rates in space are shown in Figure 7. Strong spatial continuity generally exists between the known and reconstructed

Spatial Characteristics of Reconstructed Data
The examples of the reconstructed Aqua LST data with different missing rates in space are shown in Figure 7. Strong spatial continuity generally exists between the known and reconstructed data, and the renewed nighttime data show better continuity.Near the northeast boundary at coordinate (100.4 • , 38.2 • ) and around coordinate (100.2 • , 38.1 • ), the reconstructed daytime data exhibit spatial randomness caused by the stronger temporal heterogeneity of the daytime data (see Figure 5) and severe data loss (see Figure 2) representing a lack of prior knowledge of the target variable T m,t j in Equation ( 5) at those two locations.), the reconstructed daytime data exhibit spatial randomness caused by the stronger temporal heterogeneity of the daytime data (see Figure 5) and severe data loss (see Figure 2) representing a lack of prior knowledge of the target variable  , mt j in Equation ( 5) at those two locations.The LST variable exhibits spatial autocorrelation [32], which can be quantified by Moran's I index.The closer the value of Moran's I is to 1, the stronger the spatial autocorrelation.To detect the ability of the reconstruction algorithm to recover spatial structure, the differences in the Moran's I index of monthly data between the reconstructed and known data were calculated, as shown in The LST variable exhibits spatial autocorrelation [32], which can be quantified by Moran's I index.The closer the value of Moran's I is to 1, the stronger the spatial autocorrelation.To detect the ability of the reconstruction algorithm to recover spatial structure, the differences in the Moran's I index of monthly data between the reconstructed and known data were calculated, as shown in Figure 8.The average absolute differences are 0.02 at night and 0.04 during the day, meaning that the spatial characteristics of reconstructed nighttime data are closer to the known data.The average spatial autocorrelations of the known nighttime and daytime data are separately quantified as 0.93 and 0.86, respectively, using Moran's I index, indicating that the LST variable has stronger randomness during the day than at night (especially the daytime data in June, July, and August, with an average Moran's I value of 0.77).Spatial randomness increases the difficulty of recovering spatial characteristics.In addition to the weak spatial autocorrelation, severe data loss is a significant factor resulting in missing data that cannot be accurately reconstructed in terms of spatial structure because it is challenging to capture spatial randomness when employing few data.
Figure 8.The average absolute differences are 0.02 at night and 0.04 during the day, meaning that the spatial characteristics of reconstructed nighttime data are closer to the known data.The average spatial autocorrelations of the known nighttime and daytime data are separately quantified as 0.93 and 0.86, respectively, using Moran's I index, indicating that the LST variable has stronger randomness during the day than at night (especially the daytime data in June, July, and August, with an average Moran's I value of 0.77).Spatial randomness increases the difficulty of recovering spatial characteristics.In addition to the weak spatial autocorrelation, severe data loss is a significant factor resulting in missing data that cannot be accurately reconstructed in terms of spatial structure because it is challenging to capture spatial randomness when employing few data.

Validation Using Ground-Based Observations
The upwelling and downwelling longwave radiation observations at six ground-based stations (see Table 2) are transformed into LST values using Equation ( 2) to validate the reconstructed results.The comparisons between ground-based observations and reconstructed Aqua LST data are shown in Figures 9 and 10, for the nighttime and daytime, respectively.The missing data affected

Validation Using Ground-Based Observations
The upwelling and downwelling longwave radiation observations at six ground-based stations (see Table 2) are transformed into LST values using Equation (2) to validate the reconstructed results.The comparisons between ground-based observations and reconstructed Aqua LST data are shown in Figures 9 and 10, for the nighttime and daytime, respectively.The missing data affected by clouds are estimated under clear sky conditions, but the ground-based observations used to validate the reconstructed data represent values under clouds.Thus, the fitted line of the reconstructed nighttime data (red dotted line) is located below the fitted line of known data (blue dotted line) due to the higher LST under clouds than under clear sky caused by the downwelling longwave radiation of clouds at night.In contrast, the reconstructed daytime data values are higher than the ground-based LST data because the solar radiation reaching the ground is reduced by clouds, as indicated by the fitted line of the reconstructed data lying above the fitted line of the known data.
The detailed accuracy information of reconstructed data is shown in Table 3.The goal of the algorithm is to reconstruct cloud-free LST data; thus, the reconstruction error ε re calculated using cloudy ground-based LST data cannot exactly represent the achievement of the goal.As shown in Equation ( 8), ε re comprises three parts: ε product , ε sky cloudy , and ε modeling .The unchangeable product error ε product makes a significant contribution to ε re , and the average percentage of ε product to ε re is 54.3% at night and 44.2% during the day.The modeling error ε modeling can be taken as an indirect measure of the accuracy of reconstructed LST data under clear sky.The smaller ε modeling is, the better ε sky cloudy expresses the differences in the LST values between cloudy and cloud-free conditions.In this case, the average ε modeling is approximately 1.4 K and accounts for only 5.0% and 4.0% of ε re at night and during the day, respectively.Thus, ε sky cloudy can indicate the average temporal differences between cloudy and cloud-free LST values, which is reliable information to further recover LST values under clouds.However, it is more challenging to recover daytime cloudy LST values due to the stronger spatial heterogeneity of ε sky cloudy during daytime, with considerable differences existing among the six stations.In addition, the correlation coefficient r re calculated using Equation ( 13) is adapted to measure the temporal consistency between ground-based observations and reconstructed data.r re is close to the reference value r known representing the correlation coefficient between ground-based observations and the known Aqua LST data.The average difference between r known and r re is 0.05 at night and 0.03 during the day, indicating that the reconstructed data can accurately maintain temporal consistency with the ground validation data.In addition to the reconstruction accuracy, the reconstruction rate is an important index of the performance of the reconstruction algorithm.The average missing percentages of time series at the pixels where the six stations are located are 47.7% at night and 76.4% during the day, which corresponds to Figure 2. The average reconstruction rates of MODIS Aqua at these six pixels exceed 73.0%, and the average reconstruction rate of nighttime data exceeds 90%.
Remote Sens. 2017, 10, x FOR PEER REVIEW 13 of 18 re  at night and during the day, respectively.Thus,  sky cloudy can indicate the average temporal differences between cloudy and cloud-free LST values, which is reliable information to further recover LST values under clouds.However, it is more challenging to recover daytime cloudy LST values due to the stronger spatial heterogeneity of  sky cloudy during daytime, with considerable differences existing among the six stations.

Comparisons among Reconstruction Results Based on Different Predictor Data
To demonstrate the influence of different predictor data on reconstruction accuracy, five cases are designed using different data sources as predictor variables: the available MODIS LST data at the reconstructed time (case A), the optimal combination of multi-temporal MODIS LST data analyzed in Section 4.1 (case B), the combination of all temporal MODIS LST data (case C), TB data (case D), and the combination of NDVI and Terrain data (case E).Each case is performed by replacing temporal data vectors in matrix T m,t k n(d,t k ) of Equation ( 6) using its own predictor data except in the NDVI-terrain-based case, which is implemented by establishing the matrix composed of spatial data vectors at the reconstructed time.The evaluation indices of the five cases are shown in Figure 11.For case D, adopting TB data, only the accuracies of the temporal changes and spatial structure of the reconstructed daytime data are acceptable.Other indices are not ideal because the high correlation proven by previous research between LST and TB is not shown in this study area, which may be caused by the complex terrain.For case E, the correlation coefficient is high depending on the temporal consistency between NDVI and LST, but the combination of NDVI and terrain data cannot accurately represent the spatial heterogeneity of LST leading to a large modeling error and Moran's I values far from the reference values.In addition, the reconstruction rates in case E are also unsatisfactory.For cases employing MODIS LST data as predictor variables, the differences in the correlation coefficient and spatial structure among reconstruction results are slight.Case A, adopting the known MODIS LST data at the reconstructed time as the predictor data, has the lowest modeling errors due to this case having the highest temporal correlation and lowest temporal bias between the predictor and reconstructed data (see Figures 4 and 5), but it has low reconstruction rates.To improve the reconstruction rate, cases B and C introduce multi-temporal LST data.For case B, in addition to the known data at the reconstructed time, the predictor data at the nearest time to reconstructed data are adopted without seriously increasing the modeling error, and the accuracies on the correlation coefficient and spatial structure are consistent with those of case A. Case C employs all temporal LST data, increasing the reconstruction rate.However, relative to cases A and B, the reconstruction accuracies decrease, especially for the reconstructed daytime data.Overall, case B advocated in this paper provides a favorable balance between reconstruction accuracy and reconstruction rate.
Remote Sens. 2017, 10, x FOR PEER REVIEW 15 of 18 reconstruction rates.To improve the reconstruction rate, cases B and C introduce multi-temporal LST data.For case B, in addition to the known data at the reconstructed time, the predictor data at the nearest time to reconstructed data are adopted without seriously increasing the modeling error, and the accuracies on the correlation coefficient and spatial structure are consistent with those of case A. Case C employs all temporal LST data, increasing the reconstruction rate.However, relative to cases A and B, the reconstruction accuracies decrease, especially for the reconstructed daytime data.Overall, case B advocated in this paper provides a favorable balance between reconstruction accuracy and reconstruction rate.

Conclusions
MODIS LST products are one of the most important remote sensing LST observations.However, due to clouds, remotely sensed LST data contain a large number of missing values.It is very important to reconstruct MODIS LST data to perform large-scale and long-term LST application studies.
In this paper, the reconstruction goal is to obtain cloud-free LST values, and the recovery accuracy is decided by the correlation between predictor and reconstructed data.Relative to other predictor data, e.g., BT, NDVI and terrain data, the available MODIS LST data that are not affected by clouds have a higher correlation with the reconstructed data due to the autocorrelation characteristics of the land surface variable.Thus, using the available MODIS LST data as predictor data is the most reliable approach.

Conclusions
MODIS LST products are one of the most important remote sensing LST observations.However, due to clouds, remotely sensed LST data contain a large number of missing values.It is very important to reconstruct MODIS LST data to perform large-scale and long-term LST application studies.
In this paper, the reconstruction goal is to obtain cloud-free LST values, and the recovery accuracy is decided by the correlation between predictor and reconstructed data.Relative to other predictor data, e.g., BT, NDVI and terrain data, the available MODIS LST data that are not affected by clouds have a higher correlation with the reconstructed data due to the autocorrelation characteristics of the land surface variable.Thus, using the available MODIS LST data as predictor data is the most reliable approach.
Considering the high correlation between time series MODIS LST data at different spatial pixels and times, a spatio-temporal algorithm employing multi-temporal MODIS LST data is adopted to reconstruct MODIS LST data.When the available LST data at the reconstructed time are taken as the predictor data, the algorithm shows satisfactory performance except for the reconstruction rate.When the extra LST data that are nearest in time to the reconstructed data are introduced, the reconstruction rate increases without a severe decrease in reconstruction accuracy; this approach is thus advocated as the optimal case.Although the reconstruction rate can be further improved by adopting all temporal LST data, the overall reconstruction accuracies significantly decrease due to the low correlation and high bias of the time series between nighttime and daytime data.
Although the real LST under clouds cannot be recovered based on the reconstruction algorithm, the differences between the cloudy and cloud-free LST can be estimated by the error decomposition as long as the modeling accuracy is high enough, which can provide important evidence to further estimate ground truth values.However, the differences between the LST under clouds and clear sky show stronger spatial heterogeneity during the day than at night, which increases the challenge of estimating real LST data under clouds.
BT data have the potential to obtain cloudy LST values due to microwaves penetrating through cloud cover and the strong correlation between LST and BT.In this paper, however, the high correlation does not appear, which may be caused by the complex terrain in the study area.In future work, we plan to use the reconstruction algorithm to obtain cloudy LST data by merging multiple sources of remotely sensed microwave data in flat areas.

Figure 1 .
Figure 1.Study area and locations of ground-based stations.

Figure 1 .
Figure 1.Study area and locations of ground-based stations.

Figure 2 .
Figure 2. Spatial and temporal distribution of missing moderate resolution imaging spectroradiometer (MODIS) Aqua land surface temperature (LST) data.

Figure 2 .
Figure 2. Spatial and temporal distribution of missing moderate resolution imaging spectroradiometer (MODIS) Aqua land surface temperature (LST) data.

Figure 3 .
Figure 3. Influence of bias and correlation between the temporal predictor and target LST data on the prediction accuracy.

Figure 4 .
Figure 4. Correlation at different directions and distances between MODIS Aqua LST data at the reconstructed time and themselves and MODIS LST data at other times.

Figure 3 .
Figure 3. Influence of bias and correlation between the temporal predictor and target LST data on the prediction accuracy.

Figure 3 .
Figure 3. Influence of bias and correlation between the temporal predictor and target LST data on the prediction accuracy.

Figure 4 .
Figure 4. Correlation at different directions and distances between MODIS Aqua LST data at the reconstructed time and themselves and MODIS LST data at other times.

Figure 4 .
Figure 4. Correlation at different directions and distances between MODIS Aqua LST data at the reconstructed time and themselves and MODIS LST data at other times.

Figure 5 .
Figure 5. Average temporal absolute bias between one pixel from the MODIS Aqua LST data at the reconstructed time and all pixels from themselves and MODIS LST data at other times.

Figure 6 .
Figure 6.Monthly reconstruction rates of MODIS Aqua LST data.

Figure 5 .
Figure 5. Average temporal absolute bias between one pixel from the MODIS Aqua LST data at the reconstructed time and all pixels from themselves and MODIS LST data at other times.

Figure 6 .
Figure 6.Monthly reconstruction rates of MODIS Aqua LST data.

Figure 6 .
Figure 6.Monthly reconstruction rates of MODIS Aqua LST data.

Figure 7 .
Figure 7. Examples of reconstructed MODIS Aqua LST data with different missing rates in space.The reconstructed data are within the red polygon.

Figure 7 .
Figure 7. Examples of reconstructed MODIS Aqua LST data with different missing rates in space.The reconstructed data are within the red polygon.

Figure 8 .
Figure 8.The differences in monthly Moran's I index between the known and reconstructed data.

Figure 8 .
Figure 8.The differences in monthly Moran's I index between the known and reconstructed data.

Figure 9 .
Figure 9. Scatter diagrams between the reconstructed data and ground-based observations at night.

Figure 10 .
Figure 10.Scatter diagrams between the reconstructed data and ground-based observations during the day.

Figure 9 .
Figure 9. Scatter diagrams between the reconstructed data and ground-based observations at night.

Figure 9 .
Figure 9. Scatter diagrams between the reconstructed data and ground-based observations at night.

Figure 10 .
Figure 10.Scatter diagrams between the reconstructed data and ground-based observations during the day.

Figure 10 .
Figure 10.Scatter diagrams between the reconstructed data and ground-based observations during the day.

Figure 11 .
Figure 11.Comparisons among five reconstruction cases employing different predictor data.

Table 1 .
Remote sensing data information.

Table 2 .
Ground-based station information.
t 4 ) i is the available MODIS LST value situated at the ith pixel and observed at local time t k on the dth day.In eight directions with equal intervals from 0 • to 360 • , the nearest T

Table 3 .
Evaluations of the reconstructed data.