An Effective Method for Generating Spatiotemporally Continuous 30 m Vegetation Products

: Leaf area index (LAI) and normalized difference vegetation index (NDVI) are key parameters for various applications. However, due to sensor tradeoff and cloud contaminations, these data are often temporally intermittent and spatially discontinuous. To address the discontinuities, this study proposed a method based on spectral matching of 30 m discontinuous values from Landsat data and 500 m temporally continuous values from Moderate-resolution Imaging Spectroradiometer (MODIS) data. Experiments have proven that the proposed method can effectively yield spatiotemporally continuous vegetation products at 30 m spatial resolution. The results for three different study areas with NDVI and LAI showed that the method performs well in restoring the time series, ﬁlls in the missing data, and reasonably predicts the images. Remarkably, the proposed method could address the issue when no cloud-free data pairs are available close to the prediction date, because of the temporal information “borrowed” from coarser resolution data. Hence, the proposed method can make better use of partially obscured images. The reconstructed spatiotemporally continuous data have great potential for monitoring vegetation, agriculture, and environmental


Introduction
Landsat satellites are one of the most popular remote sensing (RS) data sources used for studying the characteristics of the Earth's surface [1][2][3], such as land cover monitoring [4][5][6], land surface temperature estimation [7,8], urban heat island studies [9,10], agriculture drought [11], wetlands [12], and glacier mapping [13]. However, the tradeoff between spatial and temporal resolution results in images deficiency at most places each year. Moreover, the effects of clouds, cloud shadows, and other "noises" make the problem worse [14]. The spatial and temporal discontinuity of satellite data has constrained many RS applications, such as crop mapping [15][16][17] and forest monitoring [18][19][20]. Although an ever-increasing number of satellites have been launched in recent decades [21] and an unprecedented wealth of remote sensing data has become accessible [22], meeting the commands of local or regional studies with single satellite sensors is still challenging, for which may need more details.
The time series of leaf area index (LAI) and normalized difference vegetation index (NDVI) (the two parameters are referred to as "VI" in this article) are important indicators of vegetation properties and ecosystem conditions, and provide indispensable tools for monitoring vegetation dynamics [23] in climate and ecosystem research [24]. At a global scale, frequent observations from coarse-to moderate-resolution satellite sensors make it possible to monitor land surface and vegetation changes in terrestrial ecosystems [25]. However, the coarse spatial resolution is not suitable for detailed studies. Currently available products for VI suffer from the same problem. For example, current LAI products such as Moderate-resolution Imaging Spectroradiometer (MODIS) [26], Global Land Surface Satellite (GLASS) [27,28], and Carbon Cycle and Change in Land Observational Products from an Ensemble of Satellites (CYCLOPES) [29] are produced at coarse spatial resolution, thus the pixels are usually mixed and not useful for local or regional studies [30]. In addition, cloud effects and bad weather conditions often caused the incompleteness of VI products at either scale. Therefore, efforts to develop spatially and temporally continuous and high-resolution products are urgently needed.
Fortunately, the advent and development of spatiotemporal fusion (STF) methods [22,[31][32][33] have efficiently combined fine and coarse spatial resolution data, and synthesized the advantages of each. Blending Landsat imagery with MODIS imagery is a common approach of STF [34,35], which can derive images with 30 m spatial resolution and daily (even sub-day, depending on the temporal frequency of the MODIS data) temporal resolution. However, owing to the limitations of data availability, STF methods still face challenges. Most importantly, the requirement of STF methods is collecting at least one pair of fine and coarse clear images at adjacent days as input data [34]. For fine-resolution images such as Landsat, no clear images are typically available during cloudy seasons in some locations. In addition, obtaining clear images at two dates for MODIS data also are also not guaranteed [4].
A good approach to the aforementioned problem is to simplify it. RS scenes are abstractions of real features on the ground and are composed of discrete objects that are arranged in a mosaic on continuous background [36]. From this perspective, the reflectance received by the sensor is the most important parameter, which is the product of the spatial structure of the scene. In other words, the relationship between the size of the objects and the spatial resolution is fundamental, which consists of the spatial pattern of the observed view. Therefore, in our study, we attempt to take a time series from a single fine-resolution pixel as an objective and find a counterpart in coarse-resolution images, expecting that the "found" pixel will have a similar spatial structure in the coarser view. The similarity was assessed by time series similarity measurement methods [37], which are usually carried out between a target and reference series.
In this study, the overriding objective is to obtain high spatiotemporal resolution VI time series using a spectral similarity matching (SIM) approach. Aimed at finding a similar time series, a main scheme using correlation measure and a backup scheme using shape measure were applied. Notably, the method is especially effective when there are no neighboring fine-resolution images as data input, which is a challenge for the usual STF methods. The SIM method was tested and evaluated by experiments in three study areas with different biome classes. In the following text, we introduce the methods in Section 2, describe data and study areas in Section 3, show the experiment results in Section 4, discuss the advantages and problems of SIM in Section 5, and conclude at the last section.

SIM Method
The proposed SIM method estimates the VI images using a search-and-match process. Intrinsically, the basis of SIM is to find a similar counterpart time series at coarse resolution, despite the geographic location. Unlike STF methods, the SIM requires inputs of the VI time series from different scales (excluding the fine image at the prediction date). Figure 1 shows a flowchart of the proposed SIM method. The first step was to determine if the VI time series from a fine scale (TS1) is representative of the study period. At least five points are important to composite the vegetation growth curve during one growing season, including the start and end of the foliage season, two inflection points, and the peak value [15]. Therefore, the threshold M of effective value numbers was set to six. If the available pixel values were less than M, the method sought the surrogate time series from neighboring pixels within the same class. Classification map was used in this case, which was classified using the clear fine-resolution images within each study period. Then, the closest neighbor from the same class was chosen and a linear relationship was built up between this neighbor and the current time series. The incomplete time series TS1 was filled up using the neighboring information.
The second step of the SIM is the preselection of MODIS pixels in order to speed up the searching process. A dynamic time warping (DTW)-based k-means clustering method [38,39] was applied to MODIS time series. Compared with the ordinary k-means algorithm, DTW-based k-means measures the similarity between time series more flexibly [39]. After clustering, MODIS time series were classified to multiple classes, each of which was represented by a centroid. The derived centroids then were directly compared with TS1 and determined which classes TS1 belongs to. The process of preselection accelerated the whole searching process significantly. Subsequently, the highlight and third step of the SIM was measuring similarities of time series from fine and coarse resolution and determining if the "found" time series is satisfactory. In theory, the search-and-match process will succeed if the search range is sufficiently large. However, unlimited extension of the search window is not feasible. In this study, a workable solution was to search the candidate coarse pixels around the neighboring area. The main scheme was to calculate the correlation coefficient r between time series as the criterion. To ensure efficiency and accuracy, an additional calculation of the Spearman rank correlation (SP) was evaluated. Two empirical thresholds, TH1 = 0.9 and TH2 = 0.8, were set to statistically illustrate that the measured time series were highly related (r > TH1), and their variations were mostly similar (SP > TH2). If these requirements were not met, a backup scheme based on Fourier component similarity measure (FCSM) was performed. The most proximate time series (with the smallest distance (D)) from coarse resolution according to FCSM was chosen. Then, a relationship was established between the found and target time series. The predictions can be derived with the help of established relationships and frequent coarse resolution data. Finally, as long as the coarse-resolution data were continuous and of high quality, a continuous VI time series at fine resolution was yielded by all the steps mentioned above. The second step of the SIM is the preselection of MODIS pixels in order to speed up the searching process. A dynamic time warping (DTW)-based k-means clustering method [38,39] was applied to MODIS time series. Compared with the ordinary k-means algorithm, DTW-based k-means measures the similarity between time series more flexibly [39]. After clustering, MODIS time series were classified to multiple classes, each of which was represented by a centroid. The derived centroids then were directly compared with TS1 and determined which classes TS1 belongs to. The process of preselection accelerated the whole searching process significantly.
Subsequently, the highlight and third step of the SIM was measuring similarities of time series from fine and coarse resolution and determining if the "found" time series is satisfactory. In theory, the search-and-match process will succeed if the search range is sufficiently large. However, unlimited extension of the search window is not feasible. In this study, a workable solution was to search the candidate coarse pixels around the neighboring area. The main scheme was to calculate the correlation coefficient r between time series as the criterion. To ensure efficiency and accuracy, an additional calculation of the Spearman rank correlation (SP) was evaluated. Two empirical thresholds, TH1 = 0.9 and TH2 = 0.8, were set to statistically illustrate that the measured time series were highly related (r > TH1), and their variations were mostly similar (SP > TH2). If these requirements were not met, a backup scheme based on Fourier component similarity measure (FCSM) was performed. The most proximate time series (with the smallest distance (D)) from coarse resolution according to FCSM was chosen. Then, a relationship was established between the found and target time series. The predictions can be derived with the help of established relationships and frequent coarse resolution data. Finally, as long as the coarse-resolution data were continuous and of high quality, a continuous VI time series at fine resolution was yielded by all the steps mentioned above.
Landsat VI images with a 16-day temporal frequency are usually used for long-term vegetation analysis [16]. However, long revisit time and frequent contamination, resulting in insufficient image numbers, are the main challenges for short-term vegetation studies. The temporal trajectories are dominated by covered biome types. For example, crops and forests usually have higher vegetation indices than grass in early spring, the crops will have lower VI values after harvest, and similar variations are found in some regions between Remote Sens. 2021, 13, 719 4 of 20 crops and grasslands. Trajectory features are identified as characteristics for different class types, and are thus often applied to classifications [37,40]. Similar methods have been used for spectral curve matching by calculating the linear correlation coefficient between a test spectrum and a reference spectrum.
Two schemes are employed as the main and backup, respectively, for time series similarity measure. The main scheme is to calculate the correlation coefficient between two time series across scales in order to determine whether a linear relationship exists. Then, if the main algorithm fails, a back-up algorithm is triggered using FCSM, which is performed at frequency domain and avoids the effects of noises. The details of the two schemes are described in Sections 2.1.2 and 2.1.3, respectively.

Selection of Candidate MODIS Pixels
The theoretical basis of the method is pixel-by-pixel searching, but it is time-consuming to search a temporally similar time series from MODIS data for each fine-scale pixel. Hence, it is helpful to filter and select candidate MODIS pixels in advance.
In this study, a DTW-based k-means clustering method was applied to MODIS time series. The number of clustering classes depends on the available amount of coarseresolution time series and the computation efficiency of the computer. The clustering process is as follows:

2.
For each time series, calculate the DTW distance and cluster to the closest centroid.

3.
Average all the time series for each clustering class and generate new k centroids based on the averages.

4.
Repeat steps 2 and 3 until steady centroids are derived.

Correlation Measurement
Although the Landsat time series are usually incomplete, if the scarce observations of Landsat and the complete MODIS time series are parallel on most of the observed days, it is very likely that the two time series from different scales will have similar changes. For one thing, the search in the study is within a neighboring range, meaning that the composition of land cover types (e.g., the species of trees and crops) as well as the influence of climates are usually very similar. Moreover, the variations of multiple vegetation types are different during the growing season, which is the criterion for feature classification [37].
A variety of approaches for time series measurement have been studied [37], and the correlation coefficient used as the similarity measure for time series is common in the remote sensing field, for example, the similarity measurement between time series as the criterion for crop type classification [41] or land cover type changes [42]. The most straightforward approach is the correlation measure method, among which the Pearson's cross-correlation coefficient (also referred to as Pearson's r) [43] is the most commonly used. It measures the linear relationship between the time series following Formula (1): where TS X and TS Y represent two time series, and µ T* and σ T* are the mean and standard variation of the time series TS * , respectively.

Fourier Component-Based Shape Similarity Measure
For a specified Landsat pixel, a temporally similar MODIS time series can be difficult to find because the search area is limited and data may be influenced by noise. Hence, a broader definition of similarity is necessary, and a more flexible measurement of the similarity between time series is applied following FSCM [44]. Fourier transformation is one of the approaches [37] used to assess the similarity of time series. It is superior for periodic signals, which is beneficial to vegetation dynamics. FSCM is a Fourier-based approach that focuses on the shape similarity of a time series. A distance measure D is calculated using the first three harmonics in the frequency domain after the Fourier transformation of the time series. D is expected to be close to zero if two time series have similar shapes, and increases with the differences between the time series. Detailed formulas of FSCM can be found at [37,44].

STF Methods
Past decades have witnessed the rapid development of STF methods. Numerous methods have been developed, with applications across interdisciplinary fields [22,32,33]. Five categories of STF methods have been described in previous literatures: weight functionbased, unmixing-based, learning-based, Bayesian-based, and hybrid methods [22,45].
Among the various STF methods, three were chosen in this study to compare the fusion results with the proposed approach. The first STF method is the spatial and temporal adaptive reflectance fusion model (STARFM)-the original and most widely used STF method [34,46,47]. It is based on the principle that changes in a fine-resolution pixel can be derived from a corresponding pure coarse-resolution pixel because the observations taken at the same time should be the same (ignoring the system errors), even at different scales. Neighboring pixels are combined by assigning weights, which are determined by spectral, spatial, and temporal differences.
Although STARFM can accurately predict pure and unmixed pixels, it is less advantageous when applied to heterogeneous areas. An enhanced STARFM (ESTARFM) has been developed [35] to address this insufficiency by incorporating unmixing theory and changing rate. ESTARFM assumes that the land cover conditions are constant over a short time period, and a conversion coefficient is used to relate coarse-and fine-resolution images. This method requires at least two pairs of images as base dates, whereas STARFM used only a single image pair at one base date.
The third method is the flexible spatiotemporal data fusion (FSDAF)-a hybrid STF method characterized as effective for predicting abrupt changes, and compensating for deficiencies of current STF methods [45]. FSDAF is a hybrid method of combining weightfunction and unmixing theory, in which temporal change is calculated according to spectral linearly unmixing theory and spatial prediction is made by thin plate spline interpolation. Neighboring pixels are involved, similar to STARFM, but the weights are calculated from a homogeneity index of each coarse pixel.

Quantitative Evaluation
In each study area, all available Landsat images are used as input to obtain the best prediction results. However, in terms of quantitative evaluation, fine-resolution data at the prediction day should not be involved. Therefore, "artificial" missing and predicting experiments were performed as cross-validation, dropping one day's fine-resolution data and estimating it with the remaining data. For example, in the central South Dakota, United States, (CSD) area, the prediction at date of year (DOY) 191 used 14 Landsat images from DOY 143 to DOY 271 and excludes only DOY 191. After prediction, statistical indices were evaluated. The predictions were compared with the actual observed NDVI or derived LAI images and results derived from popular STF methods (i.e., STARFM, ESTARFM, and FSDAF).
The first index is the mean absolute error (MAE), which is the mean of the difference between the predicted and true values. The second index is the root-mean-square error (RMSE), defined as Formula (2), which measures the average squared difference between the estimated and the actual values. where L andL denote the observed and predicted values. R represents the pixel rows, and C the columns. A better prediction result will produce smaller MAE and RMSE indices. Pearson's correlation coefficient (r) is the third evaluation index (Section 2.1.2), used for accessing the correlation between real and predicted images, where a larger r means a more similar prediction result. In addition, a structural similarity (SSIM) measure [48] is applied between the input and output image pairs. This index measures the similarity of spatial patterns, with a larger SSIM indicating the patterns are more likely to show resemblance.

Study Areas
To illustrate the general usefulness of SIM for various vegetation types and conditions, experiments were conducted over three study areas containing different classes of biome. A subset of each study area consisting of 500 × 500 Landsat pixels was used for direct comparison. All the experiments were conducted over a period of one growing season to explore the efficiency of similarities other than cyclic seasonality characteristics in the time series.
The first study area (44 • 32 N, 100 • 17 W) is located in central South Dakota, United States (CSD), and is covered by various-shaped patches of croplands ( Figure 2a). According to CropScape (https://nassgeodata.gmu.edu/CropScape/), the main types of crops here include corn, wheat, and sunflowers. The second study area (39 • 29.4 N, 120 • 49.8 W) is located at the northeast of Rocky Mountains (hereafter referred to as "NRM"). The area is covered mainly by evergreen needleleaf forests and intersected with shrub lands (Figure 2b). The third area (32 • 15 N, 110 • 27 E) selected is located in central China (hereafter referred to as "CCN"). This location is heterogeneous, covered by mixed forests, including evergreen, deciduous broadleaf, coniferous forests, and farmlands. The area has frequent cloud cover due to subtropical monsoon climate and high elevation.
Remote Sens. 2021, 13, x FOR PEER REVIEW 6 of 21 The first index is the mean absolute error (MAE), which is the mean of the difference between the predicted and true values. The second index is the root-mean-square error (RMSE), defined as Formula (2), which measures the average squared difference between the estimated and the actual values.
, (2) where L and L̂ denote the observed and predicted values. R represents the pixel rows, and C the columns. A better prediction result will produce smaller MAE and RMSE indices. Pearson's correlation coefficient (r) is the third evaluation index (Section 2.1.2), used for accessing the correlation between real and predicted images, where a larger r means a more similar prediction result. In addition, a structural similarity (SSIM) measure [48] is applied between the input and output image pairs. This index measures the similarity of spatial patterns, with a larger SSIM indicating the patterns are more likely to show resemblance.

Study Areas
To illustrate the general usefulness of SIM for various vegetation types and conditions, experiments were conducted over three study areas containing different classes of biome. A subset of each study area consisting of 500 × 500 Landsat pixels was used for direct comparison. All the experiments were conducted over a period of one growing season to explore the efficiency of similarities other than cyclic seasonality characteristics in the time series.
The first study area (44°32′N, 100°17′W) is located in central South Dakota, United States (CSD), and is covered by various-shaped patches of croplands ( Figure 2a). According to CropScape (https://nassgeodata.gmu.edu/CropScape/), the main types of crops here include corn, wheat, and sunflowers. The second study area (39°29.4′N, 120°49.8′W) is located at the northeast of Rocky Mountains (hereafter referred to as "NRM"). The area is covered mainly by evergreen needleleaf forests and intersected with shrub lands ( Figure  2b). The third area (32°15′ N, 110°27′E) selected is located in central China (hereafter referred to as "CCN"). This location is heterogeneous, covered by mixed forests, including evergreen, deciduous broadleaf, coniferous forests, and farmlands. The area has frequent cloud cover due to subtropical monsoon climate and high elevation.

Data and Preprocessing
In this study, all Landsat-7 Enhanced Thematic Mapper (ETM+) and Landsat-8 Operational Land Imager (OLI) NDVI images were acquired through the U.S. Geological Survey (USGS) (https://espa.cr.usgs.gov/ordering/new/) website. For the CSD area, a total of 15 Landsat NDVI images were collected during the growing season in 2013; for the NRM area, a total of 27 NDVI images from 2016 were collected, but only a third of them are complete and clear.
Frequent observations are crucial for a continuous VI time series, especially when the vegetation conditions change rapidly during the growing season. Therefore, instead of using existing NDVI products (with 16-day temporal resolution), the MODIS nadir Bidirectional Reflectance Distribution Function (BRDF)-adjusted reflectance (NBAR) product (MCD43A4) allowed NDVI to be calculated through the corresponding red Remote Sens. 2021, 13, 719 7 of 20 (0.620-0.670 µm) and near infrared (0.841-0.876 µm) bands with daily temporal resolution. MODIS data were download from the USGS land processes distributed active archive center (https://lpdaac.usgs.gov (accessed on 20 January 2021)). Continuous NDVI trajectories from MODIS during the growing season were collected at the CSD and NRM areas.
Landsat LAI data used in this study were produced by Wuhan University and followed studies by Jin et al. [49,50]. They used a machine learning method to integrate multisource data from fields, simulation data, and Landsat observations to derive 30 m resolution LAI dynamics. Due to the limitations of the production of Landsat LAI data (only Landsat-8 data were produced), only eight LAI images in 2013 were collected at the CSD area; and a total of 30 LAI images were collected at the CCN area in 2015, and only a third of them are complete and clear.
MODIS and GLASS LAI products were used to provide LAI temporal dynamics for this study. The MODIS/Terra+Aqua Leaf Area Index product (MOD15A3H, collection 6) is composited with 4-day temporal resolution and 500 m spatial resolution imagery. LAI retrieval includes a combination of main and backup algorithms [26]. The main algorithm relies on the inversion of a 3D radiative model, and if it fails, the backup algorithm estimates LAI through the empirical LAI-NDVI relationship. Because of the cloud effects in the CCN area, the MODIS LAI products are temporally intermittent and spatially discontinuous, thus are not reliable to use as a data source. Therefore, GLASS LAI products, one of the GLASS products suite [27,28,[51][52][53], were used instead in the CCN area. This LAI dataset also has a 500 m spatial resolution but an 8-day temporal resolution.
For this study, only the best-quality values were kept, while fill value and others were labeled as "no data". MODIS LAI pixels with the best quality assurance rating and retrieved by the main algorithm were used because the main algorithm usually obtains higher data quality than the backup [54]. Before the experiment, all coarse-resolution images were first reprojected into the same Universal Transverse Mercator (UTM) projection with Landsat data. Although validations of products were performed [55], the data quality of MODIS products varies when influenced by clouds or other conditions. Missing values and blanks are inevitable after the removal of poor-quality pixels, according to the quality flags provided by the products. Therefore, in this study, the MODIS time series which has more than 80% of the values valid was further processed with a Savitzky-Golay filter [56] to ensure the continuity and the basic trend of the time series.

NDVI and LAI Results in the CSD Area
The images in Figure 3 show the original NDVI images (the upper image of each pair) and the SIM results (the lower image of each pair) in the CSD area. Although a total of 15 Landsat NDVI images were collected during the growing season (date of year (DOY) 119 to DOY 271), most of the images suffer from the blanks and gaps after the cloud removal. The proposed SIM method facilitated the filling of the Landsat data, and demonstrated the reasonable evolvement of NDVI dynamics over time. The spatial patterns remained nearly the same before (upper image) and after (lower image) the SIM process. The images were restored even with only a few pixels available at some days (e.g., DOY 247). From the calculated SSIM between the input and output image pairs ( Table 1), most of the data pairs have extremely high structural similarity, which proves that the consistency remains after SIM.
the reasonable evolvement of NDVI dynamics over time. The spatial patterns remained nearly the same before (upper image) and after (lower image) the SIM process. The images were restored even with only a few pixels available at some days (e.g., DOY 247). From the calculated SSIM between the input and output image pairs ( Table 1), most of the data pairs have extremely high structural similarity, which proves that the consistency remains after SIM.   Figure 3 used all available Landsat data as input to obtain the best prediction results. In addition, quantitative experiments were implemented in this area to evaluate the strength of prediction. The metrics of the prediction results from the validation process are displayed in Table 2 and Figure 4. From the figure, MAEs are lower than 0.08, RMSEs are lower than 0.1, and the correlation coefficient r almost reaches 1.0, excluding DOY 143. The possible reasons for the low accuracy of the first day may include (1) only a small proportion of pixel values are usable for comparison; (2) a very low value range was present at DOY 143, thus, large differences with the other images existed; and (3) these compromise the regression between scales, which results in a better degree of fit further along in the process. In this case, if denser images had been available around the start day, the prediction of that day would have been better. The statistics also illustrate the fact that SIM is capable of predicting missing values from the coarse-resolution time series.  The results shown in Figure 3 used all available Landsat data as input to obtain the best prediction results. In addition, quantitative experiments were implemented in this area to evaluate the strength of prediction. The metrics of the prediction results from the validation process are displayed in Table 2 and Figure 4. From the figure, MAEs are lower than 0.08, RMSEs are lower than 0.1, and the correlation coefficient r almost reaches 1.0, excluding DOY 143. The possible reasons for the low accuracy of the first day may include (1) only a small proportion of pixel values are usable for comparison; (2) a very low value range was present at DOY 143, thus, large differences with the other images existed; and (3) these compromise the regression between scales, which results in a better degree of fit further along in the process. In this case, if denser images had been available around the start day, the prediction of that day would have been better. The statistics also illustrate the fact that SIM is capable of predicting missing values from the coarse-resolution time series.    Similarly, images for input and output LAI data at the CSD area are shown in Figure  5. Although sparser temporal images (only eight Landsat LAI images) were used, the proposed SIM method still worked effectively, filling in the missing data, and showed a reasonable evolvement of LAI dynamics over time ( Figure 5). The missing data were reconstructed after SIM and maintained high comparability with the original images and showed no significant changes. The SSIM between the image pairs was calculated (Table  3) as well, and the agreement between the input and output images was good, except the start day. Possible reasons are discussed in previous NDVI experiments.

• LAI results
Similarly, images for input and output LAI data at the CSD area are shown in Figure 5. Although sparser temporal images (only eight Landsat LAI images) were used, the proposed SIM method still worked effectively, filling in the missing data, and showed a reasonable evolvement of LAI dynamics over time ( Figure 5). The missing data were reconstructed after SIM and maintained high comparability with the original images and showed no significant changes. The SSIM between the image pairs was calculated (Table 3) as well, and the agreement between the input and output images was good, except the start day. Possible reasons are discussed in previous NDVI experiments.    Quantitative evaluations were also implemented on LAI data, similar to the NDVI experiments. Eight missing-and-estimating experiments were performed in total. The quantitative evaluation statistics are provided in Figure 6 and Table 4. The average MAE was approximately 0.7, the average RMSE was lower than 1.0, and correlation coefficient r was about 0.8 for most days, excluding DOY 143. From Figure 5, the vegetation status at DOY 191 was in transition, which was a significant turning point that may explain the lower accuracy of DOY 191.  Quantitative evaluations were also implemented on LAI data, similar to the NDVI experiments. Eight missing-and-estimating experiments were performed in total. The quantitative evaluation statistics are provided in Figure 6 and Table 4. The average MAE was approximately 0.7, the average RMSE was lower than 1.0, and correlation coefficient r was about 0.8 for most days, excluding DOY 143. From Figure 5, the vegetation status at DOY 191 was in transition, which was a significant turning point that may explain the lower accuracy of DOY 191.    Figure 7) showed the strength of the SIM method. From Figure 7, the tones of the predicted images by four methods are similar, but the details differ. Notably, the SIM result has the closest resemblance to the actual NDVI image (images with "actual" title), and STARFM and FSDAF have some blurs in details. These two STF methods rely on only one day data (clear image from DOY 175 was taken as the base date), hence, the shape change of the croplands is not correctly detected. However, the SIM can produce exact and clear boundaries and the correct pattern. The ESTARFM method also used image from DOY 207 as the second base date, which is the reason for the shadows of the ESTARFM result. An example is shown in the zoomed-in images (Figure 7, Panel B), which better illustrate the phenomenon. In addition, the ESTARFM result is influenced by clouds that came from the second base date because no clear images were available after DOY 191 until DOY 239. The accuracies of the results are shown in Table 5, which reveals that the SIM gets a higher accuracy than other STF methods with lower MAE, MSE, and RMSE, and higher value of r. For STF methods, the closest clear image from DOY 175 was taken as the base date. reason for the shadows of the ESTARFM result. An example is shown in the zoomed-in images (Figure 7, Panel B), which better illustrate the phenomenon. In addition, the ES-TARFM result is influenced by clouds that came from the second base date because no clear images were available after DOY 191 until DOY 239. The accuracies of the results are shown in Table 5, which reveals that the SIM gets a higher accuracy than other STF methods with lower MAE, MSE, and RMSE, and higher value of r. For STF methods, the closest clear image from DOY 175 was taken as the base date.  Figure 8 shows the LAI prediction results tested using the same methods as NDVI on the same day. It is difficult to implement the SIM for LAI here because only four complete images remain if the data of prediction day is artificially dropped. The lack of data causes an inaccurate composition of the time series at fine resolution, which would very likely mislead the search process at coarser resolution. Nevertheless, comparison experiments were conducted to DOY 191 with the remaining days as input (DOY 175, 239, 255, 271). From Figure 5, DOY 191 was in transitional period, which further complicates the issue. However, even though SIM was at a disadvantage, the derived result (Figure 8) was comparable to STF methods and visually similar to actual LAI images. Specifically, the shapes of the croplands in the SIM result were much more intact and clearer than the STF methods ( Figure 8, Panel B). In contrast, STF methods use images from DOY 175 (and DOY 239 for ESTARFM) as the data source, thus the transition status, which was hardly   Figure 8 shows the LAI prediction results tested using the same methods as NDVI on the same day. It is difficult to implement the SIM for LAI here because only four complete images remain if the data of prediction day is artificially dropped. The lack of data causes an inaccurate composition of the time series at fine resolution, which would very likely mislead the search process at coarser resolution. Nevertheless, comparison experiments were conducted to DOY 191 with the remaining days as input (DOY 175, 239, 255, 271). From Figure 5, DOY 191 was in transitional period, which further complicates the issue. However, even though SIM was at a disadvantage, the derived result ( Figure 8) was comparable to STF methods and visually similar to actual LAI images. Specifically, the shapes of the croplands in the SIM result were much more intact and clearer than the STF methods ( Figure 8, Panel B). In contrast, STF methods use images from DOY 175 (and DOY 239 for ESTARFM) as the data source, thus the transition status, which was hardly found in the corresponding MODIS data, confused the results to some extent. The accuracies of the LAI results are shown in Table 5, which proves the strength of SIM with lower MAE and RMSE, and higher value of r.

NDVI Results in the NRM Area
• NDVI Results Figure 9 shows the Landsat NDVI images collected (upper image of each pair) and the SIM results (lower image of each pair) at this area. The blanks and gaps in the upper images are also the results after the removal of clouds, cloud shadows, and filling values. However, the reconstruction of the SIM method is impressive, especially at the beginning of the time period during which nearly all valuable information is missing. The NDVI images predicted by SIM filled the missing values and restored spatial details. It is crucial for local or regional vegetation monitoring that the NDVI images have abundant details and clear patterns. The SSIM between the input and output image pairs was calculated and is shown in Table 6. Compared with the CSD area, the statistics in this area are not good enough; however, considering the proportion of available and complete images, the results are acceptable.
Remote Sens. 2021, 13, x FOR PEER REVIEW 12 of 21 found in the corresponding MODIS data, confused the results to some extent. The accuracies of the LAI results are shown in Table 5, which proves the strength of SIM with lower MAE and RMSE, and higher value of r.

NDVI Results in the NRM Area
• NDVI Results Figure 9 shows the Landsat NDVI images collected (upper image of each pair) and the SIM results (lower image of each pair) at this area. The blanks and gaps in the upper images are also the results after the removal of clouds, cloud shadows, and filling values. However, the reconstruction of the SIM method is impressive, especially at the beginning of the time period during which nearly all valuable information is missing. The NDVI images predicted by SIM filled the missing values and restored spatial details. It is crucial for local or regional vegetation monitoring that the NDVI images have abundant details and clear patterns. The SSIM between the input and output image pairs was calculated and is shown in Table 6. Compared with the CSD area, the statistics in this area are not good enough; however, considering the proportion of available and complete images, the results are acceptable.    A total of 27 quantitative experiments were conducted at this area. The evaluation results shown in Figure 10 and listed in Table 7 illustrate the importance of data frequency. The statistics at the earlier stage are worse than later, when complete and clear images are more frequent. For most days, the MAEs and RMSEs are less than or equal to 0.1, which is reasonable.  • Comparison with STF methods Due to data deficiency at the earlier stage of the study period, STF methods could not be implemented. SIM can improve the prediction outside the limits of STF when longterm data are missing (if the interval is longer than the requirement for the STF method). Instead, SIM relies on information from the coarse-resolution data that has frequent observations. This will be helpful for cloudy areas.
Overall comparison experiments were conducted at DOY 171 and 307 as examples to illustrate cases with and without adjacent, fine-resolution images, and results are showed in Figure 11. For DOY 171, the next visiting day DOY 179 was clear, providing the best  Due to data deficiency at the earlier stage of the study period, STF methods could not be implemented. SIM can improve the prediction outside the limits of STF when long-term data are missing (if the interval is longer than the requirement for the STF method). Instead, SIM relies on information from the coarse-resolution data that has frequent observations. This will be helpful for cloudy areas.
Overall comparison experiments were conducted at DOY 171 and 307 as examples to illustrate cases with and without adjacent, fine-resolution images, and results are showed in Figure 11. For DOY 171, the next visiting day DOY 179 was clear, providing the best condition for STF methods. Prior to DOY 171, no fine images are of value, thus only STARFM and FSDAF were tested (which required only one neighboring fine image). Table 8 shows the accuracies of comparable results from the three methods, indicating the effectiveness of SIM under this circumstance. In addition, SIM has slight improvements in predicting low values, considering scattered points are observed at STF results ( Figure 11).
Remote Sens. 2021, 13, x FOR PEER REVIEW 15 of 21 315 and 323) after that. Besides, the cloud and cloud effects are not fully removed at the "actual" NDVI image, which also contributes to the inaccuracy of the day. Figure 11. Actual and predicted NDVI images using different methods at the NRM area from DOY 171 and DOY 307.

LAI Results in the CCN Area
•

LAI Results
The input and output LAI images before and after SIM are shown in Figure 12. SIM results (the lower image of each pair) illustrate that temporal information "borrowed" from GLASS LAI time series helped to restore complete LAI dynamics. The variation of LAI after SIM follows the seasonal changes, slowly increasing at the onset of green up (i.e., the start of the season), reaching a peak during summer and gradually declining thereafter. Only slight clues could be derived if the original data were used, but the restored LAI data were capable of depicting the entire evolving process.
The SSIM between the input and output image pairs for the SIM are listed in Table 9 (only the days with more available pixels were kept). Most of the SSIM values were larger than 90%, indicating good agreement after SIM. During the time of foliage growth and development, the agreement was still high, but an exception occurred at DOY 239. Analysis on that day revealed the remaining cloud and cloud shadows were the main reason for disagreement. Figure 11. Actual and predicted NDVI images using different methods at the NRM area from DOY 171 and DOY 307. The results of DOY 307, by comparison, are less satisfactory. The closest and clear neighboring image is from DOY 275, which was used as the base data input (for ESTARFM, data from DOY 323 were used as the second base data). The right part of Figure 11 shows the predicted NDVI results of DOY 307. Although SIM appropriately avoids the clouds (shown as the dark blue area in Figure 11), the result is not sufficient because the prediction day is at the end of the study period and no reliable data are available (DOY 315 and 323) after that. Besides, the cloud and cloud effects are not fully removed at the "actual" NDVI image, which also contributes to the inaccuracy of the day.

• LAI Results
The input and output LAI images before and after SIM are shown in Figure 12. SIM results (the lower image of each pair) illustrate that temporal information "borrowed" from GLASS LAI time series helped to restore complete LAI dynamics. The variation of LAI after SIM follows the seasonal changes, slowly increasing at the onset of green up (i.e., the start of the season), reaching a peak during summer and gradually declining thereafter. Only slight clues could be derived if the original data were used, but the restored LAI data were capable of depicting the entire evolving process.  The quantitative evaluation results in this area are shown in Figure 13, and the statistics are provided in Table 10. Images at all input days were dropped and predicted in turn. The correlation coefficient r in this experiment was from about 0.6 to 0.9, which is lower than the results in the CSD and NRM areas. However, considering the heterogeneity and frequent cloud effects in this area, it can be concluded that the accuracy decreased reasonably. In contrast with previous research in this region [41], the accuracy is acceptable. The SSIM between the input and output image pairs for the SIM are listed in Table 9 (only the days with more available pixels were kept). Most of the SSIM values were larger than 90%, indicating good agreement after SIM. During the time of foliage growth and development, the agreement was still high, but an exception occurred at DOY 239. Analysis on that day revealed the remaining cloud and cloud shadows were the main reason for disagreement. The quantitative evaluation results in this area are shown in Figure 13, and the statistics are provided in Table 10. Images at all input days were dropped and predicted in turn. The correlation coefficient r in this experiment was from about 0.6 to 0.9, which is lower than the results in the CSD and NRM areas. However, considering the heterogeneity and frequent cloud effects in this area, it can be concluded that the accuracy decreased reasonably. In contrast with previous research in this region [41], the accuracy is acceptable.
The quantitative evaluation results in this area are shown in Figure 13, and the statistics are provided in Table 10. Images at all input days were dropped and predicted in turn. The correlation coefficient r in this experiment was from about 0.6 to 0.9, which is lower than the results in the CSD and NRM areas. However, considering the heterogeneity and frequent cloud effects in this area, it can be concluded that the accuracy decreased reasonably. In contrast with previous research in this region [41], the accuracy is acceptable.   Figure 14 shows the results obtained from the two days.   Figure 14 shows the results obtained from the two days.
For DOY 104, DOY 120 input was used as the base date (and also DOY 40 for ES-TARFM). Although there is one clear image on base date (DOY 120), the STF results for DOY 104 are not satisfactory. Only SIM and ESTARFM simulated the data with the correct tone, among which SIM has better accuracy (statistics shown in Table 11). In contrast, STARFM heavily blurred details, and although FSDAF result has better correlation with actual LAI image (r is largest among four methods), the MAE and RMSE are much larger than the SIM result. FSDAF depended more on the input fine image, resulting in overestimation of LAI.
The results from DOY 200 are a typical example of cloud contamination, and the base dates were DOY 120 and DOY 287 (for ESTARFM). For this day and all its neighboring days, the retrieved LAI images are partly cloudy. The predicted LAI values are inevitably affected by clouds, displaying as disordered shadows. One procedure is to remove the clouds first and fill the blanks by interpolation methods. Then, the repaired images can be used as a clear data source for the STF methods. However, this would result in a double accumulation of errors. Instead, the SIM addressed the problem by avoiding cloudy pixels when extracting time series, and using only continuous coarse-resolution data. Thus, the result for SIM at DOY 200 is the clearest and most similar to the actual LAI image.
In addition, Figure 14 displays MODIS LAI data on the two days. Apparently, MODIS data are underestimated due to the effects of clouds and also lack details and variations. The proposed SIM method innovatively breaks through these limits by seeking useful information from the coarse-resolution data.   For DOY 104, DOY 120 input was used as the base date (and also DOY 40 for ES-TARFM). Although there is one clear image on base date (DOY 120), the STF results for DOY 104 are not satisfactory. Only SIM and ESTARFM simulated the data with the correct tone, among which SIM has better accuracy (statistics shown in Table 11). In contrast, STARFM heavily blurred details, and although FSDAF result has better correlation with actual LAI image (r is largest among four methods), the MAE and RMSE are much larger than the SIM result. FSDAF depended more on the input fine image, resulting in overestimation of LAI. The results from DOY 200 are a typical example of cloud contamination, and the base dates were DOY 120 and DOY 287 (for ESTARFM). For this day and all its neighboring days, the retrieved LAI images are partly cloudy. The predicted LAI values are inevitably affected by clouds, displaying as disordered shadows. One procedure is to remove the clouds first and fill the blanks by interpolation methods. Then, the repaired images can be used as a clear data source for the STF methods. However, this would result in a double accumulation of errors. Instead, the SIM addressed the problem by avoiding cloudy pixels when extracting time series, and using only continuous coarse-resolution data. Thus, the result for SIM at DOY 200 is the clearest and most similar to the actual LAI image.
In addition, Figure 14 displays MODIS LAI data on the two days. Apparently, MODIS data are underestimated due to the effects of clouds and also lack details and variations. The proposed SIM method innovatively breaks through these limits by seeking useful information from the coarse-resolution data.

Temporal Profile of VI
The temporal profiles of NDVI/LAI were extracted from a representative Landsat pixel and the geographical corresponding MODIS pixel in each study area. In Figure 15, MODIS temporal profiles are shown in the upper row and actual Landsat data and the SIM results are shown in the lower row. Although the MODIS pixel is geographically close to the Landsat pixel, the temporal profile is totally different. Hence, the temporal information provided by corresponding MODIS pixels has little value for reconstructing a fine-resolution profile. Furthermore, the original Landsat profiles are scarce and discontinuous, thus it is difficult to restore the profiles. However, by "borrowing" temporal information from spectral similar profiles of MODIS, the SIM method achieved the goal, filling the missing values and following the original trend of Landsat data.

Temporal Profile of VI
The temporal profiles of NDVI/LAI were extracted from a representative Landsat pixel and the geographical corresponding MODIS pixel in each study area. In Figure 15, MODIS temporal profiles are shown in the upper row and actual Landsat data and the SIM results are shown in the lower row. Although the MODIS pixel is geographically close to the Landsat pixel, the temporal profile is totally different. Hence, the temporal information provided by corresponding MODIS pixels has little value for reconstructing a fineresolution profile. Furthermore, the original Landsat profiles are scarce and discontinuous, thus it is difficult to restore the profiles. However, by "borrowing" temporal information from spectral similar profiles of MODIS, the SIM method achieved the goal, filling the missing values and following the original trend of Landsat data. Figure 15. Temporal evolution of NDVI and LAI in one Moderate-resolution Imaging Spectroradiometer (MODIS)/Landsat pixel at three study areas.

Accuracy of Coarse-resolution Data
The performance of the SIM (as well as the STF methods) is strongly affected by the quality of the coarse-resolution data. Although MODIS data have been accurate at some locations, data reliability is still limited due to cloud contamination and other noises, especially for tropical or subtropical areas. Hence, the Savitzky-Golay filter was used in this

Accuracy of Coarse-Resolution Data
The performance of the SIM (as well as the STF methods) is strongly affected by the quality of the coarse-resolution data. Although MODIS data have been accurate at some locations, data reliability is still limited due to cloud contamination and other noises, especially for tropical or subtropical areas. Hence, the Savitzky-Golay filter was used in this study to obtain a continuous and smoothed VI time series. Therefore, GLASS LAI data was employed in the CCN area, owing to the advantages of spatial and temporal continuity, which could provide more reliable temporal information than MODIS.

Potentials and Limitations of SIM
The SIM is a special prediction method that utilizes information at two scales as well as spatial information from neighboring pixels. It breaks the barrier of geographical location and exploits the similarity relationship between time series, in contrast with STF methods, where one common feature involves utilizing information across scales. However, SIM is not limited to pixel-to-pixel correspondence while using similar time series across scales; and, different from single curve fitting methods, SIM followed the temporal changes borrowed from coarser images, which are the actual VI responses to environments from a larger view.
An obvious drawback of SIM is that it is easy to mistake abrupt changes as noises when measuring the similarity of the time series. Abrupt changes are most likely taken as outliers and are discarded when comparing similarities between time series. Similar things happen in the regression component, ensuring that most of the values linearly related may easily overlook lower values, which usually occur at the beginning and end of the study period.

Conclusions
Vegetation products (such as NDVI and LAI) with high spatial and temporal resolution are fundamentally important for characterizing vegetation changes and dynamics. However, the trade-off between spatial and temporal resolutions and frequent cloud cover often cause data discontinuity. An innovative approach based on spectral matching was proposed in this study to yield spatiotemporally continuous vegetation indices at 30 m resolution. Experiments at three different study areas with NDVI and LAI showed the proposed method performs well in restoring the VI time series, reconstructing the missing data, and predicting the images at 30 m resolution. In particular, it could address the lack of available clear images for STF due to temporal information "borrowed" from coarser data. The advantage of the proposed approach is the true vegetation changes and the true response of foliage to the external circumstances at local areas, other than speculating from sparsely-distributed, fine-resolution images. In addition, the proposed method could make better use of partly cloudy images at fine resolution, while these "imperfect" data are easily neglected with other methods. In addition, the full utilization of the cloudy data subsequently contributes to the reconstruction of vegetation time series.