A Framework of Spatio-Temporal Fusion Algorithm Selection for Landsat NDVI Time Series Construction

: Spatio-temporal fusion algorithms dramatically enhance the application of the Landsat time series. However, each spatio-temporal fusion algorithm has its pros and cons of heterogeneous land cover performance, the minimal number of input image pairs, and its e ﬃ ciency. This study aimed to answer: (1) how to determine the adaptability of the spatio-temporal fusion algorithm for predicting images in prediction date and (2) whether the Landsat normalized di ﬀ erence vegetation index (NDVI) time series would beneﬁt from the interpolation with images fused from multiple spatio-temporal fusion algorithms. Thus, we supposed a linear relationship existed between the fusion accuracy and spatial and temporal variance. Taking the Spatial and Temporal Adaptive Reﬂectance Fusion Model (STARFM) and the Enhanced STARFM (ESTARFM) as basic algorithms, a framework was designed to screen a spatio-temporal fusion algorithm for the Landsat NDVI time series construction. The screening rule was designed by ﬁtting the linear relationship between the spatial and temporal variance and fusion algorithm accuracy, and then the ﬁtted relationship was combined with the graded accuracy selecting rule (R 2 ) to select the fusion algorithm. The results indicated that the constructed Landsat NDVI time series by this paper proposed framework exhibited the highest overall accuracy (88.18%), and lowest omission (1.82%) and commission errors (10.00%) in land cover change detection compared with the moderate resolution imaging spectroradiometer (MODIS) NDVI time series and the NDVI time series constructed by a single STARFM or ESTARFM. Phenological stability analysis demonstrated that the Landsat NDVI time series established by multiple spatio-temporal algorithms could e ﬀ ectively avoid phenological ﬂuctuations in the time series constructed by a single fusion algorithm. We believe that this framework can help improve the quality of the Landsat NDVI time series and fulﬁll the gap between near real-time environmental monitoring mandates and data-scarcity reality.


Introduction
Landsat images recording the Earth's surface status since 1972 are irreplaceable in terrestrial ecosystem dynamics monitoring and biosphere processes modeling [1,2]. The Landsat normalized difference vegetation index (NDVI) acts on behalf of the land surface's greenness. Due to its high spatial resolution (30 m long-term studies. As a result, the Landsat NDVI time series plays an indispensable role in classification accuracy improvement, phenological measurement, and long-term land cover change monitoring [3][4][5][6]. However, limited by the 16-day revisit time, frequent cloud contamination, and 22% data loss of the Enhanced Thematic Mapper Plus (ETM+) sensor since 2003, there is a dearth of dual high resolution (spatial and temporal resolution) Landsat NDVI time series [7][8][9][10]. These missing observation data have caused the Landsat time series to fail to catch land cover change events or to extract important phenological nodes [5,11]. Thus, missing image reconstruction is expected to solve the mentioned problems [12][13][14]. Spatio-temporal essentially focuses on downscaling spatially coarse images to spatially fine images. It aims to produce high temporal and high spatial resolution images for any missing data by taking full advantage of the complementary information of images with a frequently observed but coarse spatial resolution (referring to as coarse images) and the images with an infrequently observed but fine spatial resolution (referring to as fine images). Unlike interpretation methods that depend on the input number of homologous and available Landsat reference images, spatio-temporal fusion is flexible in input reference images and even performs well when just using one neighboring observed image [15,16]. Furthermore, the introduction of temporal information from coarse images further enhances the dynamic depiction of land cover. Currently, more than 50 spatio-temporal fusion algorithms have been proposed. Technically, these algorithms were generally categorized into five groups: the weight function-based, unmixing-based, learning-based, Bayesian-based, and hybrid fusion methods. The weight function-based methods, based on the weight function to integrate the spectral, temporal, and spatial contribution of reference images, are the most broadly used methods [8,[17][18][19]. The unmixing-based methods unmix the coarse image into a fine-scale image based on linear spectral mixing theory [12,20,21]. Learning-based methods have been recently developed, but are promising, which predict fine images by the relationship between the coarse images and fine images learned by machine learning or deep learning technology [22][23][24]. The Bayesian-based methods use Bayes theory to estimate the fine image [25]. The hybrid fusion methods integrate two or more theories of the above-mentioned methods to realize image fusion [13].
Although spatio-temporal fusion methods have been booming, methods for directly producing Landsat NDVI time series based on all available Landsat images and coarse image time series are still in shortage [13,16,26]. Presently, most of the spatio-temporal fusion-based Landsat NDVI time series for land surface monitoring majorly utilize the strategy of filling the missing data by a single image fusion method [11,27]. However, different fusion algorithms have different pros and cons in image fusion. For example, the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) is feasible and efficient for image fusion as it only needs one pair of coarse and fine reference images. But it has limited performance in heterogonous land cover area [17]. While the Enhanced STARFM (ESTARFM) performs better in heterogonous land cover areas, it needs at least two pairs of coarse and fine reference images and is time-consuming [8]. Thus, the following decisions should be clear when fusing a single missing image for constructing the Landsat NDVI time series: (1) which fusion algorithm should be chosen for blending the missing image; and (2) whether the performance of the Landsat NDVI time series would be improved by employing fusion images generated from multiple fusion algorithms.
Therefore, the spatio-temporal variance, which quantitated the spatial and temporal change between the dates of a reference image and missing image, was introduced to support the spatio-temporal fusion algorithm selection for Landsat NDVI time series construction. We supposed a linear relationship existed between the fusion accuracy and spatial and temporal variance. Then, a framework was proposed for Landsat NDVI time series construction with different fusion algorithms by taking the STARFM and the ESTARFM as base algorithms due to their wide application, their performance in heterogeneous image fusion, and the minimum required number of input image pairs. Subsequently, the effectiveness of the screen rule in the proposed framework was verified. Moreover, the performance of the multi-algorithm-based Landsat NDVI time series in land cover change monitoring and phenological stability was demonstrated by comparing to the single-algorithm-based Landsat NDVI time series. The remainder of this study was organized as follows: Section 2 described the theoretical basis and the workflow of Landsat NDVI time series construction; Section 3 presented the experimentation and effectiveness analysis; Section 4 discussed the advantages, uncertainties, and perspectives; and Section 5 drew conclusions.

Methodology
The spatio-temporal fusion algorithm aims to generate high spatial and temporal resolution images, which takes the fine images and coarse images in the base date and coarse images in the prediction date as inputs to predict fine images in the prediction date ( Figure 1) [8,9,28]. However, each fusion algorithm has its pros and cons under different fusion situations. Hence, it is reasonable to assume that the performance of various spatio-temporal fusion algorithms would be different under different spatial and temporal conditions. Thus, the spatial and temporal variance was introduced to quantify the spatial and temporal difference between coarse images in the base and prediction date. The screening rule was constructed by first fitting the linear relationship between the spatial and temporal variance and fusion algorithm accuracy, and then the fitted relationship was combined with the graded accuracy selecting rule to select the fusion algorithm. Thus, the screened algorithm for a specific prediction date was complemented to predict the missing Landsat image. Then, those predicted Landsat images were utilized to calculate the NDVI for constructing the Landsat NDVI time series. Furthermore, the screening rule's performance was verified by the effectiveness of algorithm selection. The multiple-algorithm-based Landsat NDVI time series was assessed by the effectiveness of land cover change detection, and phenological extraction. It is worth noting that the STARFM and ESTARFM were selected as the basic algorithm to test the proposed framework due to their complementary characteristics of performance in homogenous or heterogeneous land cover, one pair or two pairs reference image, and efficiency. STARFM can yield satisfactory fusion results in the stage of non-linear land cover change as well as in homogenous regions [17], which assumed that the residual errors between the coarse image and fine image were immutable and caused by land cover and land use changes, the wavelength characteristics between two sensors, and the illumination conditions. Thus, the  STARFM can yield satisfactory fusion results in the stage of non-linear land cover change as well as in homogenous regions [17], which assumed that the residual errors between the coarse image and fine image were immutable and caused by land cover and land use changes, the wavelength characteristics between two sensors, and the illumination conditions. Thus, the difference between the fine image and coarse image in the base and prediction date should be equivalent. Given that neighboring similar pixels would have similar reflectance, the STARFM was modified by combining the weight function, where the weight function is integrated spatial distance, spectral difference, and temporal difference [17]. The implementation of the STARFM can be summarized as first preparing one pair of fine and coarse images in the base date and one coarse image in the prediction date, then finding the similar neighbor pixels within a moving window, and finally predicting the fine image in the prediction date. STARFM can be defined as follows: where x w/2 , y w/2 is the center location of the moving window, where the moving window (w) was set as a 31-pixel width. More parameter settings can refer to [12,17]. L and M are the reflectance of the fine image and coarse image, respectively. n is the number of reference images, hence k is from 1 to n. t p and t k is the date of the fusion and reference images, respectively. W ijk refers to the weight coefficient determined by the spatial distance, spectral similarity, and time difference, and is as follows: where S, T, and D indicate the spectral similarity, time difference, and spatial distance between the fine and coarse images.

The Enhanced STARFM (ESTARFM)
Unlike STARFM, ESTARFM is based on at least two pairs of coarse and fine reference images, which assume a linear land cover change between these two reference dates [8]. Thus, the linear conversion coefficient (V) between reference images should be calculated to represent the spectral change slope [8]. Like the STARFM, the weight function is utilized to integrate the spectral difference, time difference, and spatial distance of each neighboring similar pixel. Although ESTARFM has been enhanced, its fusion accuracy is affected by reference image quality and quantity, and its linear hypothesis [29,30]. ESTARFM can be defined as follows: where L and M are the reflectance of fine image and coarse image; t is the time; t p and t k are the dates of the fusion and reference images, respectively. The N is the number of similar pixels in the moving windows, where the size of the moving window was set as 31-pixel width in this paper. More parameter settings refer to [8]. W is the weight coefficient.

Spatial and Temporal Variance
The spatial and temporal variance partitions the variance into space and time [31,32]. These two variances represent the spatial and temporal differences of land cover caused by phenological changes or human activities. Mean spatial variance emphasizes the degree of surface spatial heterogeneity, while mean temporal variance reflects the temporal dynamics of objects. Therefore, the spatial ISPRS Int. J. Geo-Inf. 2020, 9, 665 5 of 21 and temporal variance provides a new way of qualifying the difference in fusion algorithm images. The partition of the time-first and space-first formulation can respectively be defined as follows: where δ g 2 is the spatial and temporal variance across space and time, while m and n represent m scenes in time and n pixels in space, respectively (m = 2 for STARFM and m = 3 for ESTARFM). δ 2 s and δ 2 t is the spatial variance and temporal variance, respectively. δ t 2 (µ) and δ s 2 (µ) represent the temporal variance of spatial means and spatial variance of time means, respectively.
Here, we assumed that the fusion accuracy was linear change with the temporal and spatial variance. Thus, a linear relationship between fusion accuracy and temporal and spatial variances was established. Therefore, a quantitative comparison of different spatiotemporal fusion algorithms with the distinct date and reference images could be carried out. To simplify the model and make the screen rule robust, we used a graded accuracy instead of the direct comparison of the accuracy value of spatio-temporal fusion algorithm for algorithm selection. Moreover, the coefficient of determination (R 2 ) ranging from 0 to 1 was used as the evaluation index of fusion accuracy. Thus, fusion accuracy was divided into different grades including high accuracy (1 ≥ R 2 > 0.8), general high accuracy (0.8 ≥ R 2 > 0.7), middle accuracy (0.7 ≥ R 2 > 0.6), and low accuracy (0.6 ≥ R 2 ). During the screening process, the fusion algorithm with a higher fusion accuracy grade was selected as the suitable image fusion algorithm for the prediction date image. This rule even worked under some unexpected situations such as the prediction R 2 exceeded one. Moreover, if STARFM and ESTARFM have the same fusion accuracy grade, both algorithms are suitable for image prediction. The screening rule can be defined as follows: where R 2 is the coefficient of determination, and a, b, and c are the regression coefficients. δ 2 s and δ 2 t are the mean spatial variance and mean temporal variance, while subscript i is the i-grade of fusion accuracy, and subscripts lower and upper represent the lower and upper limitations of R 2 , respectively.

Performance Assessment
The performance of the framework was assessed from three aspects including the effectiveness of the screening rule, the effectiveness of Landsat NDVI time series in land cover change detection, and the effectiveness of phenological extraction. First, the R 2 values between the NDVI from the observation image and fusion image from STARFM or ESTARFM were respectively calculated to assess the screening rule's effectiveness. The higher the R 2 graded level was, the less the prediction uncertainty of the spatio-temporal fusion algorithm would be. Second, the Breaks for Additive Seasonal and Trend (BFAST) monitor, which can be conducted with the R package (https://CRAN.R-project.org/package=bfast) was used to detect the land cover change, which includes seasonal change, gradual change, and abrupt change [33]. Technically, abrupt change, which generally shows the conversion of land cover type within the short term, was employed in this paper [34]. The default parameter setting for the BFAST monitor can refer to [33,35]. As a typical classification accuracy index, the overall accuracy, omission error, and commission error of change detection were referred to as the accuracy indicators. In this paper, overall accuracy was the proportion of correctly prediction samples to total samples in which the total samples include land cover change samples and unchanged samples. The error of omission referred to the ratio of samples omitted from the land cover changed samples to total samples. Commission error was defined as the ratio of incorrectly unchanged samples to total samples. Third, the mean absolute difference (MAD), referred to as the evaluation index, was used to represent the effectiveness of phenological node extraction, where the phenological nodes included the start of the growing season (SOS) and the end of the growing season (EOS), which was extracted by the dynamic threshold method [36]. In the dynamic threshold method, a single growing season cycle was commonly suggested in China's semi-arid region. The SOS and EOS are technically defined as the date when the NDVI increase and decrease to the seasonal amplitude of the left and right edge in the Savitzky-Golay (S-G) filtered NDVI annual curve [37]. The seasonal amplitude was determined by the relative threshold of 0.5, as suggested by previous studies [36,38]. The BFAST monitor can simply be described as first fitting the historical stability period NDVI time series by the season-trend model, and then detecting the land cover change based on the moving sums (MOSUMs) of the residuals in the motoring period, where the season-trend model can be described as follows: where y t is the prediction value of NDVI in time t; and α 1 and α 2 represent the intercept and slope. γ j and δ j represent the amplitude and phases for the jth season cycle. ε t is the error term at time t. If the MOSUMs significantly deviate from zero, the land cover conversion took place. The MOSUMs can be described as follows: where h is the bandwidth of the MOSUMs. More details on the parameters of the BFAST monitor can be found in [33,39].

Experimentation and Effectiveness Analysis
The framework was implemented in the Shendong coal mining area ( Figure 2). To construct the dual high-resolution Landsat NDVI time series for land cover monitoring, the NDVI extracted from all available Landsat images ( Figure 3) was treated by maximum-value composites with a 16-day interval. Dates, without a synthetic image or synthetic images contaminated area (cloud, cloud shadow, and snow) higher than 50%, were screened as the prediction dates. These fine images in prediction dates were blended by the spatio-temporal fusion algorithm selected based on the screening rule. The screening rule in the framework was first determined by the linear fitted relationship between the spatial and temporal variance and spatio-temporal fusion algorithm accuracy (Table 2), and then the fitted relationship was combined with the graded accuracy selecting rule (R2) to select the fusion algorithm. Specifically, the fitting samples employed the spatio-temporal fusion results in 2005 ( Figure 4). A 16-day interval Landsat NDVI time series was constructed by interpreting the fusion results instead of images in the prediction dates ( Figure 5). After fusing the images in the prediction dates, the performance of the framework was conducted ( mining pit, waste occupancy area, buildings, roads, water, forest, deserted forest land, mixed forests, shrubs, grassland and deserted grassland, desert, and farmland.

Datasets and Preprocessing
A total of 450 level-1 terrain-corrected Landsat images with cloud coverage less than 70% (from 2000 to 2016; Figure 3) including Thematic Mapper (TM), ETM+ and Operational Land Imager (OLI) products (WRS-2 path/row: 127/33) were download from the U.S. Geological Survey (USGS) of the Earth Resources Observation and Science (EROS) Center. The Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) [45] and Landsat 8 Surface Reflectance (L8SR) [46] were used to generated surface reflectance products of TM, ETM+, and OLI. The cloud, cloud shadow, and snow were masked from Landsat images using the Fmask algorithm [47]. Due to narrower spectral bands and improved radiometric resolution and signal-to-noise ratio, the surface reflectance product of OLI was significantly improved when compared with TM/ETM+ [48]. Therefore, taking ETM+ as the reference, relative normalization [49] was employed to reduce the surface reflectance inconsistencies between ETM+ and OLI. Furthermore, Landsat L1T level images reached a sub-pixel level, thus, Landsat did not need further geometric correction [50]. The MODIS MCD43A4 Version 5 Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) dataset in tile h26/v05 was selected as the inputted coarse image for the spatio-temporal fusion algorithm [51] (https://ladsweb.modaps.eosdis.nasa.gov/). The MCD43A4 datasets were daily composite products according to a 16-day window, where each composite value is on behalf of the best BRDF possible [52]. Thus, the MODIS MCD43A4 datasets are broadly used as the coarse images for spatio-temporal fusion [44,52]. Moreover, due to its moderate spatial resolution (250 m) and high temporal resolution as well as its proven good performance in phenology, the terra moderate resolution imaging spectroradiometer (MODIS) vegetation indices (MOD13Q1) NDVI time series has been taken as a reference for phenology monitoring in many previous studies [53,54]. Thus, the phenological parameter extracted from the MOD13Q1 NDVI time series from 2000 to 2016 was employed as a base reference of the Landsat NDVI time series, where the phenological parameter was extracted based on the dynamic threshold method. To quantitate the relationship of the Earth's surface spatial and temporal changes with the fusion accuracy of the spatiotemporal fusion algorithm, the fusion results in 2005-with different time intervals by adjusting the reference images in the algorithm-were employed, and the historical fine images are represented in Figure 4. Furthermore, to compare the accuracy of near real-time change detection of land cover based on the NDVI time series constructed by different fusion strategies, five high-resolution images were selected as the reference for cross-validation samples ranging from 2012 to 2016 in the growing season (Table 1). All high-resolution images were processed with orthorectification, layer stacking, relative registration, fusion, and clipping.       Thus, the S-G filter was employed to reduce the noise caused by residual contamination area (cloud, cloud shadow, and snow) in raw Landsat [55], which was implemented by the TIMESAT with a 5-pixel-width filter window (http://web.nateko.lu.se/TIMESAT/timesat.asp).

Land Cover Change Detection and Phenological Stability Analysis
The effectiveness of a method is often related to its application purpose. Thus, further steps for near real-time monitoring were carried out by comparing the target Landsat NDVI time series (TTS) generated from the framework that this paper proposed with the MODIS NDVI time series (MTS) and NDVI time series using single algorithm fusion (STS: the STRAFM based Landsat NDVI time series, and ETS: the ESTRAFM based Landsat NDVI time series). BFAST Monitor [33] was selected as the change detection algorithm, and 110 change samples ranging from 2012 to 2016 were selected as validation samples by visual interpretation (using high-resolution images; Table 1). The overall accuracy, omission error, and commission error were calculated as the accuracy indicators. Figure 6 shows the accuracy results of different NDVI time series. The results indicated that TTS obtained the highest overall accuracy (88.18%), but also the lowest omission error (1.82%) and commission error (10.00%), while in turn, MTS achieved the lowest overall accuracy (10.00%) but the highest omission (19.09%) and commission error (70.91%). When compared with STS and ETS, respectively, the overall accuracy of TTS increased by 21.82% and 17.27%, the omission error decreased by 1.82% and 4.54%, and the commission error decreased by 20% and 12.73%. The accuracy demonstrated that the construction of the Landsat NDVI time series using multiple fusion algorithms with the framework could efficiently improve the application accuracy. On the other hand, to test the ability of TTS in phenological change, a phenological stability analysis was implemented using MAD as the evaluation index, and the phenological parameters of SOS and EOS were extracted by the dynamic threshold method [56]. Under the hypothesis that phenological features exhibited long-term stability where the year 2000 was referred to as the reference year, a lower MAD indicated a more stable phenology. The phenological stability results (Figure 8b) demonstrated that the TTS's SOS stability and EOS stability were lower than the STS, but higher than ETS. STS possessed the highest stability in SOS and EOS with MAD, respectively equal to six days and four days, while ETS obtained the lowest stability with MAD, respectively equal to seven days and nine days. These results indicated

Discussion
In this section, the advantages and adaptability of the framework proposed in this paper are discussed. Although this framework worked well in the construction of Landsat NDVI time series, there are still many uncertainties and challenges from the datasets and screening rules. Thus, we discuss the framework in detail, so that we can be more familiar with it and its potentials and shortages.

Advantages and Adaptability
As one of the most effective approaches to predict the missing high-resolution images for time

Experimental Area
The experimental area was located on the border of Shanxi Province and Inner Mongolia [40]. It is situated in a typical ecologically fragile area with a high intensity of human disturbance due to exploitation of the Shendong coal mining area (the largest field and a typical underground mining area in China). It is the transition zone between the Loess Plateau and the Mu Us Sandy Land [41]. This area has been gaining a lot of attention from ecologists, geographers, and surveyors in terms of analyzing ecosystem degradation [42], land cover change [34], and terrain deformation [43]. Moreover, compared with other ecosystems, the land cover in the coal mining area contains frequently abrupt and gradual change, and has an obvious season change, even in a small region [34,42]. Land cover change detection has always been the spatio-temporal fusion algorithm that designers care about [24,44]. Thus, this research selected a 689 × 590-pixel (30×30 m 2 ) area in the Shendong mining area as the experimental site (Figure 2), which contains multiple mining fields (including Majiata, Bulianta, Shangwan, Daliuta, Huojitu, and Zhugaita) and part reserved land. The experimental site range was from 39 • 13 57"N to 39 • 21 32"N, and 110 • 12 33"E to 110 • 22 52"E. This area has a plateau continental climate with arid, half-desert land. The perennial mean temperature is about 8.6 • C, the average yearly precipitation is about 380 mm, and the yearly evaporation ability is about 2113 mm. Compared with humid regions, vegetation cover in the study region is sparse and with an obvious single growing season. As a result of high-intensity mining and human activities, the landscape fragmentation of the land cover is high, and the majority of the land cover includes a mining pit, waste occupancy area, buildings, roads, water, forest, deserted forest land, mixed forests, shrubs, grassland and deserted grassland, desert, and farmland.

Datasets and Preprocessing
A total of 450 level-1 terrain-corrected Landsat images with cloud coverage less than 70% (from 2000 to 2016; Figure 3) including Thematic Mapper (TM), ETM+ and Operational Land Imager (OLI) products (WRS-2 path/row: 127/33) were download from the U.S. Geological Survey (USGS) of the Earth Resources Observation and Science (EROS) Center. The Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) [45] and Landsat 8 Surface Reflectance (L8SR) [46] were used to generated surface reflectance products of TM, ETM+, and OLI. The cloud, cloud shadow, and snow were masked from Landsat images using the Fmask algorithm [47]. Due to narrower spectral bands and improved radiometric resolution and signal-to-noise ratio, the surface reflectance product of OLI was significantly improved when compared with TM/ETM+ [48]. Therefore, taking ETM+ as the reference, relative normalization [49] was employed to reduce the surface reflectance inconsistencies between ETM+ and OLI. Furthermore, Landsat L1T level images reached a sub-pixel level, thus, Landsat did not need further geometric correction [50]. The MODIS MCD43A4 Version 5 Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) dataset in tile h26/v05 was selected as the inputted coarse image for the spatio-temporal fusion algorithm [51] (https://ladsweb.modaps.eosdis.nasa.gov/). The MCD43A4 datasets were daily composite products according to a 16-day window, where each composite value is on behalf of the best BRDF possible [52]. Thus, the MODIS MCD43A4 datasets are broadly used as the coarse images for spatio-temporal fusion [44,52]. Moreover, due to its moderate spatial resolution (250 m) and high temporal resolution as well as its proven good performance in phenology, the terra moderate resolution imaging spectroradiometer (MODIS) vegetation indices (MOD13Q1) NDVI time series has been taken as a reference for phenology monitoring in many previous studies [53,54]. Thus, the phenological parameter extracted from the MOD13Q1 NDVI time series from 2000 to 2016 was employed as a base reference of the Landsat NDVI time series, where the phenological parameter was extracted based on the dynamic threshold method. To quantitate the relationship of the Earth's surface spatial and temporal changes with the fusion accuracy of the spatiotemporal fusion algorithm, the fusion results in 2005-with different time intervals by adjusting the reference images in the algorithm-were employed, and the historical fine images are represented in Figure 4. Furthermore, to compare the accuracy of near real-time change detection of land cover based on the NDVI time series constructed by different fusion strategies, five high-resolution images were selected as the reference for cross-validation samples ranging from 2012 to 2016 in the growing season (Table 1). All high-resolution images were processed with orthorectification, layer stacking, relative registration, fusion, and clipping.

Construction of Landsat Normalized Difference Vegetation Index (NDVI) Time Series
The MODIS prediction date and MODIS base date in 2005, as input images of the spatial and temporal variance, were used to express the image change with spatial and temporal variance. Additional experiments were carried out to increase the fitting samples by expanding the time interval between the fusion and reference images. Additionally, the samples were divided into growth and non-growing seasons due to the significant difference in fusion accuracy ( Table 2). The fitting results indicated that a linear relationship existed among the fusion algorithm accuracy and spatiotemporal variance. It is worth noting that to improve the robustness of high accuracy, over-fitting accuracy prediction in the discrimination function was allowed in the present study. If two suitable algorithms exist, STARFM will be selected due to its advantage, whereas if no suitable algorithm exists, the image will not be predicted. No suitable algorithm situations appeared when the predicted R 2 was lower than 0.6. As a result, 126 scenes in the study area were fused, among which ESTARFM fused 29 scenes, STARFM fused 91 scenes, and six scenes were not fused ( Figure 5). Moreover, we only predicted the images in the dates when lacking a synthetic image or synthetic images with a contamination area (cloud, cloud shadow, and snow) that was higher than 50%. As a result, the residual contamination area still existed in the interpreted Landsat NDVI time series. Thus, the S-G filter was employed to reduce the noise caused by residual contamination area (cloud, cloud shadow, and snow) in raw Landsat [55], which was implemented by the TIMESAT with a 5-pixel-width filter window (http://web.nateko.lu.se/TIMESAT/timesat.asp).

Effectiveness Analysis
In this section, the proposed framework's performance in algorithm selection and the effectiveness of the Landsat NDVI time series to detect land cover change and phenological nodes were verified.

Effectiveness of Algorithm Screening
To test the effectiveness of the screening rule, a total of 14 Landsat scenes from 2008 were employed as validation data ( Table 3). The spatial and temporal variances between the reference data for each scene were calculated, and then the screening scheme was applied to select a fusion algorithm. To select different regression functions for fusion images in different phenological phases, the average start and end of the growing-seasons extracted from MODIS13Q1 were used to distinguish the growing season from the non-growing season. The average SOS was the 154 day of the year, while the average EOS was the 278 day of the year. Table 3 shows that, except for two prediction images (DOY 075 and 107), the screening rule could effectively determine a suitable fusion algorithm for the special fusion image according to the R 2 accuracy grade calculated from the observational Landsat NDVI and predicted Landsat NDVI. The reason for choosing an unsuitable algorithm for image prediction in DOY 075 and 107 might have been caused by the potential cover of snow or ice in the reference images. These potential snow and ice cover areas caused a low variance in either the spatial or temporal dimension. However, it is worth noting that the scope of the screening rule was to select a higher R 2 grade algorithm from the basic algorithms, so its ability to select a higher value of R 2 under the same grade was poor. From the visualizing perspective ( Figure 6), the difference between NDVIs of the prediction images from STARFM and ESTARFM was calculated by using the NDVI of STARFM blended images minus the NDVI of ESTRAFM blended images. The results indicated that the difference between the STARFM and ESTARFM was higher in the growing season than in the non-growing season. For example, the AAD in DOY 179, 187, 235, 259, and 275 were greater than 0.03, while the AAD in DOYs in the non-growing season was less than 0.03. Moreover, in the growing season, the NDVI difference showed that the NDVIs calculated from the STARFM predicted images was higher than the NDVIs calculated from the ESTARFM predicted images. A comparison of the enlarged local region of the NDVI extracted from both STARFM and ESTARFM predicted images in the growing season were conducted (Figure 7). The result indicated that the ESTARFM not only caught the spatial details (Figure 7 region 1, region 2), but predicted well in the gradual change (Figure 7 region 4, region 5). However, STARFM defeated the ESTARFM in land cover conversion caused by an abrupt change of land cover (Figure 7, region 3). It further verified the effectiveness of the screening rule by selected ESTARFM for predicting images in the growing season, if we assume that the land cover gradually changed in 2008.

Land Cover Change Detection and Phenological Stability Analysis
The effectiveness of a method is often related to its application purpose. Thus, further steps for near real-time monitoring were carried out by comparing the target Landsat NDVI time series (TTS) generated from the framework that this paper proposed with the MODIS NDVI time series (MTS) and NDVI time series using single algorithm fusion (STS: the STRAFM based Landsat NDVI time series, and ETS: the ESTRAFM based Landsat NDVI time series). BFAST Monitor [33] was selected as the change detection algorithm, and 110 change samples ranging from 2012 to 2016 were selected as validation samples by visual interpretation (using high-resolution images; Table 1). The overall accuracy, omission error, and commission error were calculated as the accuracy indicators. Figure 6 shows the accuracy results of different NDVI time series. The results indicated that TTS obtained the highest overall accuracy (88.18%), but also the lowest omission error (1.82%) and commission error (10.00%), while in turn, MTS achieved the lowest overall accuracy (10.00%) but the highest omission (19.09%) and commission error (70.91%). When compared with STS and ETS, respectively, the overall accuracy of TTS increased by 21.82% and 17.27%, the omission error decreased by 1.82% and 4.54%, and the commission error decreased by 20% and 12.73%. The accuracy demonstrated that the construction of the Landsat NDVI time series using multiple fusion algorithms with the framework could efficiently improve the application accuracy. On the other hand, to test the ability of TTS in phenological change, a phenological stability analysis was implemented using MAD as the evaluation index, and the phenological parameters of SOS and EOS were extracted by the dynamic threshold method [56]. Under the hypothesis that phenological features exhibited long-term stability where the year 2000 was referred to as the reference year, a lower MAD indicated a more stable phenology. The phenological stability results (Figure 8b) demonstrated that the TTS's SOS stability and EOS stability were lower than the STS, but higher than ETS. STS possessed the highest stability in SOS and EOS with MAD, respectively equal to six days and four days, while ETS obtained the lowest stability with MAD, respectively equal to seven days and nine days. These results indicated that combining multiple fusion algorithms to construct a time series tends to gain more comprehensive results and effectively avoid phenological fluctuations caused by using a single fusion algorithm.

Discussion
In this section, the advantages and adaptability of the framework proposed in this paper are discussed. Although this framework worked well in the construction of Landsat NDVI time series, there are still many uncertainties and challenges from the datasets and screening rules. Thus, we discuss the framework in detail, so that we can be more familiar with it and its potentials and shortages.

Advantages and Adaptability
As one of the most effective approaches to predict the missing high-resolution images for time series construction, spatio-temporal fusion algorithms have been extensively developed for different purposes [30,57]. However, a balance between the required minimal number of input image pairs, performance of heterogeneous land cover, and calculation complexity is always needed when using a spatio-temporal fusion algorithm to generate a Landsat NDVI time series [44,58]. Emelyanova et al. (2013) proposed that different algorithms had their advantages and disadvantages by comparing the accuracies of the advanced blending algorithms (STARFM and ESTARFM) and the simple benchmarking algorithms (including linear interpolation model (LIM) and global empirical image fusion model (GEIFM)) [50] by taking the spatial and temporal variance as assessment metrics. Thus, this paper explored the possibility of employing multiple spatio-temporal fusion algorithms to construct a dual high-resolution Landsat NDVI time series. Hence, the prediction images would be predicted by an adaptable algorithm. It is worth noting that the basic spatio-temporal fusion algorithms in the framework should be complementary. For example, STARFM and ESTARFM were selected in this paper as basic algorithms because the STARFM was efficient and performed well in a homogeneous land cover area with one reference image pair, and the ESTARFM was relatively time-consuming, needed at least two coarse and fine reference image pairs, but performed well in a heterogeneous land cover area [7,24]. The performance of this framework would be worse if the selected base algorithm not only had a good performance in a heterogeneous area and different kind of land cover change area, but also operated efficiently and just needed one pair of reference images, which include algorithms such as the Spatial and Temporal Reflectance Unmixing Model (STRUFM) in [59], the improved STRUFM (ISTRUFM) in [60], and the Flexible Spatiotemporal DAta Fusion (FSDAF) method in [12]. To the best of the authors' knowledge, no previous studies have used multiple fusion algorithms to construct a long-term time series, and most of the research work to date has only employed a single algorithm [7,61]. Möller et al. (2017) generated a 30-m and 16-day interval time series utilizing STARFM to monitor soil erosion [4]. Moreover, other STARFM-generated time series were described in the study by Tian et al. (2013), which analyzed the land cover trends in the Loess Plateau [5] as well as in the studies by Schmidt et al. (2015) and Bhandari et al. (2012), which respectively detected the forest disturbance and recovery in Queensland, Australia [6,53]. From the application perspective, the constructed Landsat NDVI time series based on the framework proposed in this paper could be broadly applied for land cover change monitoring, phenological extraction, and land use and land cover classification. The near real-time monitoring method or its improved version is perfect for catching the land surface change including abrupt change, gradual change, and seasonal change, if the land cover change detection is the only application scope [33,35,39]. Therefore, a screening framework for multiple fusion algorithm selection can enrich the theory for Landsat NDVI time series construction, and result in the inclusion of more details in the Landsat NDVI time series compared to that in a time series constructed based on a single algorithm. On the other hand, temporal and spatial changes among images were emphasized in the proposed screening rule using the spatial and temporal variance, which provides an opportunity to quantitatively express the correlation between fusion performance and the Earth's surface change.

Uncertainty and Challenges
Although the framework performed well in the algorithm selected and the constructed Landsat NDVI time series was proven to be valid in land cover detection and phenological extraction, there are still many uncertainties that might affect the results. First, this paper fused the reflectance of images first, and then calculated the NDVI. This process has been reported as inferior to the direct fusion NDVI process. As a result, extra errors would be introduced in the Landsat NDVI time series [62,63]. Additionally, this paper only blended these Landsat images with the fine pixel covered area less than 50%. Those images with contaminated areas by cloud, shallow, ice, and observed gaps were interpreted by the S-G filter. The interpretation process might bring extra errors into the Landsat NDVI time series construction [64]. These interpolated areas showed an obvious difference from the true value (areas in the red rectangle of Figure 9). Moreover, the screened algorithm was confirmed to meet the R 2 graded accuracy, but it was unsatisfactory for selecting the spatio-temporal fusion algorithm with the highest accuracy value (Table 4). Furthermore, as this paper simply supposed a linear relationship of the fusion graded accuracy and spatial and temporal variance, the overfitting or non-linear relationship between the fusion accuracy and the spatial and temporal variance might also introduce errors in algorithm selection [15,39]. Ideally, it was expected to improve the stability of the screening rule when employing the comprehensive accuracy indexes or other accuracy assessment indexes [14,62]. Other accuracy assessment indexes include the average absolute difference (AAD), the root mean square error (RMSE), the structural similarity (SSIM) [65], and the universal image quality index (UIQI) [66].  Furthermore, even if the construction of Landsat NDVI time series using the framework has been proven to be effective for land cover change detection, some uncertainties should be considered, and further studies should be conducted. Additionally, more samples from different years need to be generated. This study considered the fusion result of the year with the most cloudless images (2005) as the fitting samples ( Figure 10), which may introduce errors in the quantitative model. Therefore, further analysis of the fitting model using all fusion images of all available and cloud-free images as samples needs to be carried out. Additionally, due to its computing efficiency [8,67], the STARFM was artificially more frequently selected than ESTARFM in the Landsat NDVI time series when the same R 2 grade of STARFM and ESTARFM appeared in the algorithm screening process. Furthermore, the fusion images around the phenological nodes (SOS and EOS) should receive more attention. The comparison of fusion algorithms demonstrated that the effectiveness of each fusion algorithm has significant differences between the growth and non-growing seasons [50]. Thus, the fitting function for each fusion algorithm was divided into the growing season and the non-growing season. As a result, the determination of the growth stage should be performed before image fusion, and the lack of phenology reference for incomplete years increases the uncertainty in the fusion algorithm selection. Furthermore, the framework proposed in this paper was applied in a step by step time series construction using multiple fusion algorithms, while continuous automated screening mechanisms and time series construction need to be implemented in further research. Finally, compared to all pixels, using a sliding window to quantify  Furthermore, even if the construction of Landsat NDVI time series using the framework has been proven to be effective for land cover change detection, some uncertainties should be considered, and further studies should be conducted. Additionally, more samples from different years need to be generated. This study considered the fusion result of the year with the most cloudless images (2005) as the fitting samples (Figure 10), which may introduce errors in the quantitative model. Therefore, further analysis of the fitting model using all fusion images of all available and cloud-free images as samples needs to be carried out. Additionally, due to its computing efficiency [8,67], the STARFM was artificially more frequently selected than ESTARFM in the Landsat NDVI time series when the same R 2 grade of STARFM and ESTARFM appeared in the algorithm screening process. Furthermore, the fusion images around the phenological nodes (SOS and EOS) should receive more attention.
The comparison of fusion algorithms demonstrated that the effectiveness of each fusion algorithm has significant differences between the growth and non-growing seasons [50]. Thus, the fitting function for each fusion algorithm was divided into the growing season and the non-growing season. As a result, the determination of the growth stage should be performed before image fusion, and the lack of phenology reference for incomplete years increases the uncertainty in the fusion algorithm selection. Furthermore, the framework proposed in this paper was applied in a step by step time series construction using multiple fusion algorithms, while continuous automated screening mechanisms and time series construction need to be implemented in further research. Finally, compared to all pixels, using a sliding window to quantify spatiotemporal differences among reference images may obtain a local optimization fusion result since STARFM and ESTARFM have their respective advantages in different land types. Last but not least, the framework is only a kind of proof-of-concept. These algorithms, which not only consider the spatial and temporal difference, but integrate the multiple processes of Landsat NDVI time series construction, would be desirable [13,16,26]. spatiotemporal differences among reference images may obtain a local optimization fusion result since STARFM and ESTARFM have their respective advantages in different land types. Last but not least, the framework is only a kind of proof-of-concept. These algorithms, which not only consider the spatial and temporal difference, but integrate the multiple processes of Landsat NDVI time series construction, would be desirable [13,16,26].

Perspectives
The framework proposed in this paper can be regarded as a proof-of-concept, and its implementation was proven to be successful. The greatest prospect of this framework was to construct long-term and updated timely Landsat time series for international institutions or programs such as for UNITAR's Operational Satellite Applications Program (UNOSAT) in disaster rapid mapping, the United Nations Collaborative Program on Reducing Emissions from Deforestation and Forest Degradation (UN-REDD) in deforestation detection, and the United Nations Environment Program (UNEP) in environment hotspots, to increase the ability of near real-time environmental protection and monitoring. Furthermore, this framework is expected to be used in a cloud-based platform such as Google Earth Engine (GEE), which not only contains petabyte-scale Landsat archives, but can also provide multiple MODIS products [68]. Using the Internet-accessible application programming interface (API) of GEE, this framework might realize full automation as preprocessing of dual high-resolution Landsat time series construction. Indeed, users could directly utilize Landsat time series, fused images, and algorithm screening carried out in a cloud computing platform and would only need to determine the required data attributes (such as position, spatial resolution, time resolution, feature index, etc.) for near real-time monitoring.

Conclusions
Each model has its advantages and disadvantages. Making full use of complementary information among different models is an effective way to improve application accuracy. This study assumed that selecting an appropriate algorithm from multiple complementary spatio-temporal fusion algorithms according to the spatial and temporal variance for each prediction image in a time series construction would improve the Landsat NDVI time series performance. In addition, we supposed that a linear relationship existed between the fusion accuracy and spatial and temporal variance. Thus, a framework for Landsat NDVI time series construction with multiple

Perspectives
The framework proposed in this paper can be regarded as a proof-of-concept, and its implementation was proven to be successful. The greatest prospect of this framework was to construct long-term and updated timely Landsat time series for international institutions or programs such as for UNITAR's Operational Satellite Applications Program (UNOSAT) in disaster rapid mapping, the United Nations Collaborative Program on Reducing Emissions from Deforestation and Forest Degradation (UN-REDD) in deforestation detection, and the United Nations Environment Program (UNEP) in environment hotspots, to increase the ability of near real-time environmental protection and monitoring. Furthermore, this framework is expected to be used in a cloud-based platform such as Google Earth Engine (GEE), which not only contains petabyte-scale Landsat archives, but can also provide multiple MODIS products [68]. Using the Internet-accessible application programming interface (API) of GEE, this framework might realize full automation as preprocessing of dual high-resolution Landsat time series construction. Indeed, users could directly utilize Landsat time series, fused images, and algorithm screening carried out in a cloud computing platform and would only need to determine the required data attributes (such as position, spatial resolution, time resolution, feature index, etc.) for near real-time monitoring.

Conclusions
Each model has its advantages and disadvantages. Making full use of complementary information among different models is an effective way to improve application accuracy. This study assumed that selecting an appropriate algorithm from multiple complementary spatio-temporal fusion algorithms according to the spatial and temporal variance for each prediction image in a time series construction would improve the Landsat NDVI time series performance. In addition, we supposed that a linear relationship existed between the fusion accuracy and spatial and temporal variance. Thus, a framework for Landsat NDVI time series construction with multiple spatiotemporal fusion algorithms was established. This framework was applied in a typical fragile and high heterogeneity ecosystem to construct a Landsat NDVI time series with a 30 m spatial resolution with a 16-day interval. The application of near real-time monitoring proved the effectiveness of this framework. This paper mainly contains two contributions. First, a fusion algorithm screening framework for constructing a Landsat time series was proposed, resulting in more detailed information in the time series than that included when using a single algorithm. Second, the Earth's surface change was divided into space and time according to recorded images, and the algorithm suitability was quantitatively expressed, which can make the screening scheme more adaptable and robust.  . Moreover, we would like to thank the anonymous reviewers for providing us with useful comments and suggestions, which helped us to improve the quality of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.