Blending Landsat and MODIS Data to Generate Multispectral Indices: A Comparison of “Index-then-Blend” and “Blend-then-Index” Approaches

The objective of this paper was to evaluate the accuracy of two advanced blending algorithms, Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) and Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM) to downscale Moderate Resolution Imaging Spectroradiometer (MODIS) indices to the spatial resolution of Landsat. We tested two approaches: (i) “Index-then-Blend” (IB); and (ii) “Blend-then-Index” (BI) when simulating nine indices, which are widely used for vegetation studies, environmental moisture assessment and standing water identification. Landsat-like indices, generated using both IB and BI, were simulated on 45 dates in total from three sites. The outputs were then compared with indices calculated from observed Landsat data and pixel-to-pixel accuracy of each simulation was assessed by calculating the: (i) bias; (ii) R; and (iii) Root Mean Square Deviation (RMSD). The IB approach produced higher accuracies than the BI approach for both blending algorithms for OPEN ACCESS Remote Sens. 2014, 6 9214 all nine indices at all three sites. We also found that the relative performance of the STARFM and ESTARFM algorithms depended on the spatial and temporal variances of the Landsat-MODIS input indices. Our study suggests that the IB approach should be implemented for blending of environmental indices, as it was: (i) less computationally expensive due to blending single indices rather than multiple bands; (ii) more accurate due to less error propagation; and (iii) less sensitive to the choice of algorithm.


Introduction
Trade-offs between acquisition frequency and spatial resolutions of satellite image data are inherent in all single-sensor satellites [1]. In the last decade, several advanced blending algorithms have been developed to combine data observed from multiple sensors with various spatial resolutions and temporal densities (e.g., Landsat, Medium Resolution Imaging Spectrometer (MERIS), Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Very High Resolution Radiometer (AVHRR)). Gao et al. [2] developed the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) algorithm to blend surface reflectance from two sensors to simulate more frequent higher spatial resolution surface reflectance output (e.g., Landsat-like imagery at the frequency of MODIS acquisition). Zhu et al. [3] enhanced the STARFM algorithm (denoted ESTARFM) to improve the model spatial variability of heterogeneous study sites. Both algorithms are widely used by the remote sensing community ( [1] their Table 3).
The objective of much blending research is to simulate reflectance data from which multispectral indices can be calculated, such as vegetation and water indices at a high spatial resolution and temporal density [4][5][6][7][8]. Vegetation indices are widely used to effectively characterize particular biophysical or biochemical properties and processes for vegetated surfaces [9]. The Normalized Difference Vegetation Index (NDVI; [10]) and the Enhanced Vegetation Index (EVI; [9]), are the most common satellite-derived indices used by the remote sensing community for monitoring vegetation at regional to global scale for numerous applications [11][12][13][14]. The Simple Ratio (SR) is best used for estimating Leaf Area Index (LAI; [15]). There are also several environmental moisture indices that are commonly used, including: (i) Global Vegetation Moisture Index (GVMI; [16]); and (ii) Depth of 1650 nm relative to a reference continuum line determined at 835 nm and 2208 nm (D 1650 ; [17]). Water indices, such as the Normalized Difference Water Index (NDWI) are also widely used to delineate open water features and enhance their presence in satellite images. The NDWI [18] has been used by researchers in its original and modified forms (Table 1). An example includes the modified version of NDWI (MNDWI of [19]), which uses the Shortwave-Infrared (SWIR1) band (i.e., Landsat TM band 5) in place of the Near-Infrared (NIR) band (i.e., Landsat TM band 4). The selected nine indices are the most widely used subset of remotely sensed indices from the: (i) vegetation domain; (ii) environmental moisture domain; and (iii) water domain. Table 1. List of nine indices and their equations used here. NIR, SWIR1 and SWIR2 are abbreviations for Near-Infrared, Shortwave-Infrared1 and Shortwave-Infrared2 bands, respectively. Except for Enhanced Vegetation Index (EVI), Simple Ratio (SR) and Depth of 1650 nm (D 1650 ), all indices use a normalized difference formulation generically given as (band x − band y)/(band x + band y), where x and y represent bands. Many recent studies using STARFM first blend reflectance from Landsat and MODIS data and then use these outputs to calculate vegetation or water indices (e.g., [4,[6][7][8]20,21]). Herein, we refer to this process as Blend-then-Index (BI), with the alternative approach being Index-then-Blend (IB). For the IB approach, the indices were calculated first and these indices were input into the blending algorithms to simulate indices at the date of simulation. For the BI approach, reflectance bands were input into the blending algorithms to simulate reflectance, which was used as input to calculate multispectral indices. In the IB approach we assume that a linear mixture model is applicable to indices (i.e., the mixed index for each MODIS pixel is the sum of the index weighted by the class area proportions), as it is for the reflectance bands. According to Kerdiles and Grondona [22] this assumption introduces very small errors to statistics when using indices directly into a linear mixture model (i.e., IB) instead of using individual band reflectance data in the model (i.e., BI). In the only previous study to compare IB with BI, Tian et al. [23] evaluated the accuracy of STARFM for simulating a time series of 12 NDVI images over a single study site. Our paper extends that study by: (i) using both the STARFM and ESTARFM algorithms; (ii) using three sites with contrasting spatial and temporal dynamics; (iii) calculating nine commonly used indices in vegetation, environmental moisture and standing water applications; and (iv) partitioning the spatial and temporal variances to explain the results. The aims of our paper are to: (i) comprehensively examine if one of the approaches (i.e., IB versus BI) consistently outperforms the other for a range of vegetation, environmental moisture and standing water indices; (ii) explore whether spatial and temporal variances are related to the blending accuracy of indices by the two algorithms (i.e., STARFM versus ESTARFM); and (iii) isolate the impact that the approach or algorithm has on blending accuracy (i.e., approach versus algorithm). These three aims provide the structure of our paper and are used as subheadings in the Methods, Results and Discussion sections.

Study Site and Data Sets
Three study sites with different relative spatial and temporal variances and different land cover patch sizes were selected in this study ( Figure 1); they are introduced in turn. The Thomson River floodplain study site (Thomson herein) is an extensive anabranching river system located in central Queensland, Australia (143.20°E, 24.50°S, see Figure 1). The Thomson study site covers 3850 km 2 (55 km E-W × 70 km N-S) within a Landsat-5 TM scene (path 96, row 77). The Thomson site is located in the Lake Eyre Basin in a region called the "Channel Country", which is characterized by extensive floodplains and a complex anabranching river system with ephemeral flows following precipitation [24,25]. It has a low topographic gradient and dynamic land cover in watercourses and floodplains [25,26]. Its Köppen-Geiger climate is in the arid (B), steppe (S) and hot (h) zone, with a mean annual temperature greater than 18 °C [27]. The land use in the Lake Eyre Basin is dominated by grazing. Mitchell Grass plains, sand dunes, spinifex grasslands, gibber deserts, stony plains and acacia woodlands are landscapes of the Cooper Creek catchment [28]. The flooding is the only water source for flora and fauna of the area, and there is a distinct greening up following the passage of floodwaters.
The Coleambally Irrigation Area study site (Coleambally from herein) is a rice based irrigation system located in southern New South Wales (NSW, Australia; 34.0034°E, 145.0675°S). Standing water associated with flood irrigation of summer rice fields is present in October and November [17,29,30]. Summer crop development (i.e., rice, soybeans, corn and sorghum-the last three crops being furrow irrigated) occurs from December to April, with many crops harvested by May. The surrounding dryland agricultural areas mainly have a winter growing season (cereals and pasture), and several small residual woodland patches in the northern part of the images are fairly constant throughout the time series.
The Lower Gwydir Catchment study site (Gwydir from herein) is located in northern NSW (149.2815°E, 29.0855°S). The temporal extent of data over the Gwydir was greater than one year, and included a winter and a summer crop-growing season. The Gwydir, which covers the typical dual growing season crop phenology and surrounding dryland agricultural area, experienced a large flood in mid-December 2004. This flooding, and subsequent inundation, occupied a large spatial extent of the Gwydir imagery and was temporally very dynamic ( Figure 1). The Gwydir site is spatially more homogenous than the Coleambally and Thomson sites, because of the larger agricultural fields, coupled with the large (and quick) flooding event. The flooding/inundation at Gwydir was a significant test of the blending algorithms in conditions with extremely high spatio-temporal variability.
For Thomson, 20 pairs of cloud-free Landsat-MODIS (L-M) images were used from April 2008-October 2011 (Table 2). This period was characterized by an intense La Niña, and major flooding occurred over much of eastern and central Australia in 2009 and 2010, during which large areas were covered with standing water. The most likely time to observe standing water at this site is during the wet season, from November-April. Between 2008 and 2011 most of the Landsat and/or MODIS images were cloudy during January and February. After selecting all cloud-free inundated images during the wet season, the other images were selected to be as close as possible to when the inundated images were acquired ( Table 2). The Coleambally site images were Landsat 7. The Thomson and Gwydir sites images were Landsat 5 data and were corrected for Bidirectional Reflectance Distribution Function (BRDF) effects. The Gwydir images were corrected using the Li et al. [31] BRDF algorithm. The Thomson images were corrected to at-surface reflectance using the Flood et al. [32] BRDF algorithm (also taking into account atmospheric conditions, topography, sensor location and sun elevation) using the parameterized bi-directional reflectance model for eastern Australia [32]. All MODIS data were BRDF corrected Terra MODIS Collection 5, daily reflectance (MOD09GA) images with 500 m pixels for all bands [33]. MODIS data, which were originally processed by the Land Processes Distributed Active Archive Center (LPDAAC) at the U.   For detailed information about the other two study sites, Coleambally and Gwydir, see [1]. Briefly, 17 L-M image pairs acquired over eight months for Coleambally and 14 L-M image pairs acquired over 12 months at Gwydir were used. Partitioning the variance into its spatial and temporal components [1] showed that Coleambally reflectance data had higher spatial variance (than temporal variance) and more accurate results were obtained with ESTARFM due to its design. In contrast, at Gwydir temporal variance dominated spatial variance and due to algorithmic assumptions STARFM worked best. Finally, Coleambally has a smaller effective patch size than Gwydir [1]. The three study sites have different relative spatial and temporal variances and different patch sizes, governing the area-to-perimeter ratio within the different resolution imagery used in the blending algorithms. These three sites are purposefully selected to form a continuum between solely man-made standing water and entirely natural standing water: (i) at Coleambally all standing water is man-made (due to irrigation); (ii) at Gwydir both irrigated fields and standing water associated with flooding are present, and (iii) at Thomson standing water is only associated with flooding. The dynamics and area-to-perimeter ratios of standing water (and associated responses) varied across the three sites, and these three sites are therefore a robust selection from which to evaluate performance within and between both blending algorithms across the IB and BI approaches.

Data Pre-Processing
A Landsat-like image was generated on a given simulation date by using a total of five input images, being two L-M pairs (one before and one after the simulation date) and the MODIS image on the simulation date as input to either STARFM or ESTARFM [1]. The observed Landsat image on the date of simulation was preserved for validation and was not used as input to the blending algorithms. Herein, the date of simulation is denoted t 2 , the first L-M pair date will be referred to as t 1 Table 2. At Thomson, a total of 18 Landsat-like images were simulated in this manner using the nearest temporal neighboring L-M pairs to the central dates listed in Table 2 as image #'s 2-19, herein referred to as "3-sequential date images". Observed MODIS images were resampled to Landsat resolution using the nearest neighbor approach and the five Landsat or MODIS images involved in any given blending operation were then co-registered based on a correlation test [1]. The co-registration results of Terra MODIS images in this study confirm the along-track and along-scan band-to-band co-registration error in Terra MODIS bands [34]. The optimal spatial offset to maximize the correlation between corresponding bands of L-M images was calculated by using the IDL (Exelis Visual Information Solutions, Boulder, Colorado) code developed by NASA [35] and applied to MODIS images.
We used the six reflective Landsat bands and the corresponding MODIS bands for the blending algorithms (Table 3). All nine indices (i.e., NDVI, EVI, SR, GVMI, D 1650 , NDWI 24 , NDWI 25 , NDWI 27 and NDWI 45 , see Table 1) were calculated for all Landsat and MODIS images at each site. All simulated indices (from both approaches) were compared with the observed Landsat indices at date t 2 . Then all nine indices were calculated and compared with indices calculated from the observed Landsat images at date t 2 .

Blending Algorithms
STARFM and ESTARFM assume that images from different sensors are acquired under similar land surface conditions and that surface reflectance is comparable after pre-processing [3]. The STARFM algorithm can use either one pair or two pairs of L-M images (i.e., dates t 1 and/or t 3 ) and one low resolution image (date t 2 ) to simulate a high resolution image at date t 2 , while the ESTARFM algorithm uses two pairs of L-M images (i.e., dates t 1 and t 3 ) and one low resolution image (date t 2 ) to simulate date t 2 . For STARFM we use the two L-M image pair option to have consistent input with ESTARFM.
The algorithms identify spatial changes of reflectance from the high spatial resolution images by finding spectrally similar neighbor pixels and temporal changes from the low-resolution images to simulate the high spatial resolution and high temporal density images at selected dates. A moving search window (w) is used to select similar neighboring pixels, and heterogeneity of the landscape is considered by the number of land cover classes in each pixel of the low resolution image [3]. The algorithms use weight factors for each spectrally similar pixel to blend temporal and spatial information. Proximal to the central pixel and spectrally similar fine resolution pixels have higher weights [3]. To make the result comparable with former studies, here the size of w was 50 by 50 Landsat resolution pixels and the assumed number of spectrally-different classes was four. STARFM is able to model non-linear changes between two Landsat images and would be expected to model temporal variability better than ESTARFM. In contrast, ESTARFM has been designed to work better in more spatially heterogeneous areas [1,2].

IB versus BI
The accuracy of the STARFM and ESTARFM algorithms for simulating all nine Landsat-like indices was assessed by comparing the bias (calculated as observed minus simulated), correlation coefficient of determination (R 2 ) and Root Mean Square Deviation (RMSD) between the simulated and observed Landsat indices, using both the IB and BI approaches. Additionally, the results from the IB and BI approaches for both blending algorithms were examined by comparing temporally mean bias, R 2 and RMSD across the entire blended dataset for each site (18 dates for Thomson, 15 dates for Coleambally and 12 dates for Gwydir; there are two less instances than the number of images available at each site due to using the blending algorithms with L-M pairs before and after each simulation date).
The paired t-test was used to assess if the difference of the mean error between the IB and BI approaches was statistically significant. Mean error between IB and BI for each "3-sequential date image" was paired. The assessment was performed for both the STARFM and ESTARFM algorithms, for each of the three sites, and for each of the three above-mentioned error statistics at the 90% (i.e., p < 0.1) and 95% (i.e., p < 0.05) confidence levels. For example using STARFM at Thomson, the mean bias of each of the 18 simulated images generated using IB were paired to the corresponding 18 mean biases generated using BI to test whether the biases between IB and BI were statistically significant. This example was extended to all combinations of error statistics, sites and algorithms.

STARFM versus ESTARFM
Quantification of spatial and temporal variances of image and index time series, given the strengths and weaknesses of each algorithm, is an important step toward selecting blending algorithms [1]. Here we used the same method ( [37], their Equation 10) to partition the grand (or spatio-temporal) variance into the spatial and temporal variance components and assessed the suitability of STARFM and ESTARFM. Following [1], we calculated spatial and temporal variances of each possible combination of 3-sequential dates of the high and low resolution images. The temporal to spatial variances ratio (T/S), as an indicator of algorithm selection, was also calculated for each 3-sequential dates of L-M bands and indices. This was performed for all six reflective bands and nine indices of L-M images at Thomson. Since the temporal and spatial variances were already reported for Coleambally and Gwydir for the bands [1], here we only report the indices' variances for these two sites. The paired t-test was used to assess if the difference of the mean error between the STARFM and ESTARFM algorithms was statistically significant, using the general technique as previously explained.

Approach versus Algorithm
To compare and quantify the impact of the IB versus BI approaches on the accuracy of STARFM versus ESTARFM, R 2 and RMSD statistics were calculated for four parameters (i.e., (STARFM-ESTARFM) IB , (STARFM-ESTARFM) BI , (IB-BI) STARFM and (IB-BI) ESTARFM ). To quantify STARFM versus ESTARFM for the IB approach, differences between STARFM and ESTARFM statistics; (STARFM-ESTARFM) IB ; were calculated and averaged for all 405 simulations (nine indices by 45 dates-the total from the three sites). For the BI approach, a similar parameter; (STARFM-ESTARFM) BI ; was also calculated. To quantify IB versus BI, averaged difference R 2 and RMSD statistics were calculated across all 405 simulations (as above) using the STARFM algorithm; (IB-BI) STARFM ; and ESTARFM algorithm; (IB-BI) ESTARFM .

IB versus BI
The statistics (Table 4) showed that for all nine indices examined, the IB approach outperformed the BI approach at all three sites. The paired t-test analysis showed that the means of the three abovementioned error statistics produced for the three sites of "3-sequential date images" for the two approaches were statistically different at the 95% confidence interval in 65% of the STARFM and 53% of the ESTARFM cases ( Table 5). The higher accuracy of the IB approach is most likely explained by error propagation, as the IB approach only incurs one instance of blending so there is only one process where blending-induced error can be introduced. In contrast, the BI approach incurs multiple blending instances and therefore multiple instances of error that subsequently propagates to the resultant indices. Moreover, for those indices having a normalized difference formulation (i.e., NDVI, GVMI, NDWI 24 , NDWI 25 , NDWI 27 and NDWI 45 , see Table 1) their algebra reduces error.
At the Thomson site, comparing the R 2 of the blended indices with those calculated from observed Landsat data revealed that the IB approach resulted in higher accuracy than the BI approach in 89% of 162 (9 indices by 18 dates) STARFM simulations and 75% of ESTARFM simulations ( Figure 2). The three averaged error statistics for the 18 STARFM and ESTARFM indices were statistically different at the 90% confidence level (bias; 11% of the cases, R 2 ; 61% of the cases and RMSD; 61% of the cases, Table 5). The sign of the average bias produced by STARFM overestimated (negatively-biased) 32% of 162 simulations and ESTARFM overestimated 51% of all simulations by the IB approach.
Comparing the nine indices showed that site-averaged simulated NDVI and EVI produced lowest bias in all four options compared with other indices due to use of red and infrared bands. ESTARFM results overestimated NDVI, EVI and GVMI and underestimated SR, D 1650 , NDWI 24 , NDWI 25, NDWI 27 and NDWI 45 (Table 4). Statistics derived using the BI approach have higher spatial variances during the wet season (December-March) of each year; especially during the major flood event on date 2 January 2011 (date 992). As was found for the IB approach, most of the 162 BI simulations were underestimated by STARFM (62%) and ESTARFM (53%). The NDVI and EVI values were overestimated by STARFM for the BI approach, while the NDVI, EVI and GVMI indices were overestimated by ESTARFM at Thomson (Table 4).
At Coleambally, from all 135 (nine indices by 15 dates) simulations, 90% produced higher accuracy by the IB approach when using STARFM and 98% when using ESTARFM (Figure 2). Using the paired t-test the IB and BI approaches were statistically different when comparing means of statistics (bias; for 56% of the cases, R 2 ; for 83% of the cases and RMSD for 78% of the cases) at the 90% confidence level (Table 5). When using IB the site-averaged bias in ESTARFM was lower compared with STARFM as shown for BI approach (Figure 2). The STARFM algorithm overestimated NDWI 24 and underestimated all other indices when using IB, while NDVI and EVI indices were overestimated by both STARFM and ESTARFM algorithms approach and other seven indices were underestimated by using BI. From all 135 simulations, 27% and 50% are overestimated when using STARFM and ESTARFM, respectively, using the IB approach. By using BI, STARFM overestimated 36% and ESTARFM overestimated 48% of the simulations in Coleambally.
At Gwydir, the IB approach outperformed the BI by producing higher R 2 in 90% of all 108 (nine indices by 12 dates) simulations when using STARFM, and 100% of all simulations when using ESTARFM (Figure 2). The IB and BI approaches were statistically different at the 90% confidence level when comparing means of statistics (bias; 18%, R 2 ; 89%, RMSD; 78%; see Table 5). STARFM underestimated 71% of 108 simulations and ESTARFM underestimated 57% of all 108 simulations when using the IB approach. When using the BI approach 41% were overestimated by STARFM and 40% were overestimated by ESTARFM. Site-averaged NDVI was overestimated and the other eight indices were underestimated by using either STARFM or ESTARFM when using IB and BI approach (Table 4). STARFM produced higher mean bias compared with ESTARFM for all nine indices when using IB and BI approach at Gwydir.
The IB approach also produced a lower RMSD (higher accuracy) at all three sites for both STARFM (Thomson; 89%, Coleambally; 81% and Gwydir; 82%) and ESTARFM (Thomson; 70%, Coleambally; 94% and Gwydir; 98%). The mean bias, R 2 and RMSD statistics for each site are presented in Figure 2. As shown in Figure 2a,c, Thomson had lower mean bias and RMSD statistics for all nine indices compared with Coleambally and Gwydir, because of its lower spatio-temporal variances (Section 4.2). The results for each index at each site presented in Figure 2 and Table 4 are the site-averaged statistics from all 3-sequential date simulations. The performance of STARFM and ESTARFM in each individual simulation is likely to be different from these averaged results. Table 4. Mean bias, R 2 and root mean square deviation (RMSD) statistics between the observed and simulated index values from Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) and Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM) for the both the Index-then-Blend (IB) and Blend-then-Index (BI) approaches for Thomson, Coleambally and Gwydir, which are abbreviated as T, C and G respectively in the column headings. Bias and RMSD are presented in index units.  Table 5. t-test results to assess approaches and algorithms are statistically different. Probability of <0.05 is shown in the italic and bold font, and probability <0.1 are provided as italic, >0.1 is normal font. Last two rows show number of cases with <0.05 and <0.1 probabilities and their percentage (count/18 × 100) in brackets. Under the 'Approach' headings, testing differences due to IB or BI, statistics for the first grouping of the nine indices use STARFM and the second grouping of nine use ESTARFM. Under the 'Algorithm' headings, testing differences due to blending algorithms, statistics for the first grouping use the IB approach and the second grouping use the BI approach. Thomson, Coleambally and Gwydir are abbreviated as T, C and G, respectively.  (Table 4) showed that both STARFM and ESTARFM algorithms simulated indices with higher bias and RMSD and lower R 2 for dates with higher spatial and temporal variances (Section 4.2). For example, high temporal and spatial variances at date 992 for Thomson, due to inundation of the river channel, and at date 192, most likely due to changes in soil surface moisture and seasonal vegetation changes, resulted in lower accuracies when simulating all nine indices.

Indices
As an example, Figure 3 compares bias and R 2 statistics for NDWI 24 of Gwydir simulated by the IB and BI approaches by STARFM and ESTARFM at the flood date on 12 December 2004. As shown in Figure 3, IB outperformed BI when using either STARFM or ESTARFM. On this date STARFM produced higher accuracy (higher R 2 and lower RMSD and bias) compared with ESTARFM due to higher T/S variances ratio by a flood event, which is in agreement with the algorithm selection criteria proposed by [1]. Higher biases were shown in highly variable inundated areas of Gwydir by both STARFM and ESTARFM (Figure 3a,b,c,d).

STARFM versus ESTARFM
Performance of STARFM and ESTARFM algorithms in simulating IB and BI indices was related to the T/S variances ratio. STARFM showed higher accuracy in simulating all nine indices at dates with higher T/S ratio of Thomson, Coleambally and Gwydir. In contrast, ESTARFM produced better simulations at dates with lower T/S ratio (Table 4). Comparing the statistics also showed that all nine indices produced different T/S ratio by using similar inputs, therefore, even on a certain date, there is no single optimum algorithm (STARFM or ESTARFM) for all nine indices. For example, on date 880 at Thomson, the T/S ratio of NDVI and EVI were higher than the other indices (Figure 4), which resulted in higher R 2 and lower RMSD in simulating these indices by STARFM. Alternately, the other indices, which had lower T/S ratios, produced higher accuracies by ESTARFM. Higher temporal variance resulted when any 3-sequential date image set contained highly dynamic land-cover change (e.g., associated with flood events). For example the Thomson flood event (date 992) resulted in higher T/S variances ratio at dates 880, 992 and 1040 (Figure 4). The site-averaged spatio-temporal variance results were smaller for Thomson (0.015) than Coleambally (0.784) and Gwydir (1.610). The highly variable (river channel and floodplain) portion of Thomson imagery is relatively small compared to the surrounding low variance portion of that imagery (Figure 1), whereas in both Coleambally and Gwydir the highly variable portions of the imagery were relatively larger, being relatively largest for Gwydir (Figure 1).
At Thomson, ESTARFM produced slightly higher accuracies than those yielded from STARFM for both IB and BI approaches. Using the paired t-test the ESTARFM and STARFM approaches were statistically different when comparing means of statistics (bias; for 72% of the cases, R 2 ; for 44% of (e) (f) the cases and RMSD for 28% of the cases) at the 90% confidence level (Table 5). Similar to findings for Coleambally and Gwydir reported by [1], at Thomson the T/S variances ratio of Bands 7 and 5 were the dominant variances in both Landsat and MODIS resolutions followed by Band 4, Band 3, Band 2 and Band 1 (Figure 4a,b). This is due to selecting hydrologically active sites (in all cases). For all six bands through the entire time series results showed that spatial variance was greater than temporal variances (T/S < 1, Figure 4a,b), which means the area is more variable in space than in time.
Temporal variances of all six Landsat bands were lower than corresponding MODIS bands. In contrast, spatial variances of Landsat bands were higher than spatial variances of the corresponding MODIS bands; most likely because of the lower spatial resolution. Landsat bands showed higher spatio-temporal variances compared with MODIS bands. Comparison of the spatial and temporal variances of indices through the dataset showed that the magnitude of spatial and temporal variances depends on the magnitude of the indices (Figure 4). For example, SR had highest averaged spatio-temporal variance followed by NDWI 27 , D 1650 , GVMI, NDWI 25 , NDWI 45 , NDVI, NDWI 24 and EVI (Figure 4c,d). Distinct changes in the spatial and temporal variances occurred during the flood (date 992) and at the point of transition between the dry and wet seasons (dates 1088 and 1120) due to increased precipitation and water flow in the multi-channel river system and consequent vegetation growth. Normalized indices calculated from bands with diverse spectral regions showed higher spatial and temporal variances compared with other indices calculated from bands with similar spectral regions. For example, NDWI 27 and NDWI 25 water indices, which use SWIR1 and NIR bands, had higher spatio-temporal variances when compared with other water indices, i.e., NDWI 24 and NDWI 45 (Figure 4). EVI showed the highest T/S ratio followed by SR, NDVI, NDWI 24 , NDWI 45 , GVMI, D 1650 , NDWI 25 and NDWI 27 (Figure 4c,d). Vegetation indices showed a higher T/S ratio due to higher changes in greenness of the study site after precipitation and corresponding rapid vegetation growth. Comparing Landsat and MODIS variances revealed that all nine Landsat indices had higher temporal, spatial and spatio-temporal variances than the MODIS indices in Thomson. At Coleambally, with moderate spatio-temporal changes and lower T/S ratio for all dates ( Figure 5), ESTARFM produced better results than STARFM by using both IB and BI approaches. IB-ESTARFM produced the most accurate results (0.82 < R 2 < 0.95) followed by BI-ESTARFM (0.80 < R 2 < 0.94), IB-STARFM (0.78 < R 2 < 0.93) and BI-STARFM (0.76 < R 2 < 0.93). The three averaged error statistics for the 18 STARFM and ESTARFM indices were statistically different at the 90% confidence level (bias; 94% of the cases, R 2 ; 100% of the cases and RMSD; 100% of the cases, Table 5). All nine indices showed lower temporal variances compared with spatial variances in both Landsat and MODIS resolutions ( Figure 5). Date 97 was a transition date at Coleambally: for all dates before date 97, higher temporal and lower spatial variances of NDWI 24, NDVI and EVI resulted in a higher T/S ratio compared with the other indices. In contrast for dates after 97, these indices showed lower T/S ratios ( Figure 5). Rice was the dominant crop at Coleambally. Dates 0 through 97 were when rice fields were flooded, rice was planted and the crop grew to full canopy closure. During this time of active plant growth, the T/S ratio was higher for the three indices that make use of the visible and NIR bands when compared to the indices that make use of the longer wave bands. SR had the highest averaged spatio-temporal variance followed by D 1650 , NDWI 45 , GVMI, NDWI 27, NDVI, NDWI 25 , EVI and NDWI 24 ( Figure 5). SR showed the highest T/S ratio followed by EVI, NDWI 24, NDVI, NDWI 45 , NDWI 27 , D 1650 , NDWI 25 and GVMI ( Figure 5). Comparing Landsat and MODIS variances revealed that, all nine Landsat indices had temporal, spatial and spatio-temporal variances that were higher than the MODIS indices in Coleambally.   At the highly temporally dynamic Gwydir with higher T/S ratio at few selected dates, averaged statistics of all simulations showed that ESTARFM produced better results than STARFM by using the IB approach (Figure 2). By using the BI approach, STARFM produced better results than ESTARFM (Figure 2h,i). The STARFM and ESTARFM approaches were statistically different at the 90% confidence level when comparing means of statistics (bias; 17%, R 2 ; 28%, RMSD; 33%; see Table 5).

b) MODIS indices T/S ratio
Gwydir had higher temporal variances overall than the other two sites while non-normalized SR and normalized NDWI 25 and NDWI 27 produced higher temporal variances than the other six indices. At dates 208, 240 and 256, temporal variance was higher than spatial variance resulting in higher T/S ratios at these dates compared with other dates (Figure 6). The T/S ratios of dates 128, 288, 304, 320 and 336 were lower than the other dates due to lower temporal variances and higher spatial variances ( Figure 6). SR had the highest averaged spatio-temporal variance followed by D 1650 . SR and D 1650 are not normalized difference indices (Table 1), so they do not benefit from error reduction due to their formulation like the normalized difference indices do (as discussed in Section 4.1). SR produced the least accurate results as it is unbounded, so when the dominator → 0 then SR →∞, hence possibly producing large relative errors. Furthermore, D 1650 uses three bands (NIR, SWIR1 and SWIR2), while other moisture indices (i.e., GVMI and NDWI 45 ) only use two of these three bands. This increases the likelihood of higher error propagation in D 1650 . For these reasons, the R 2 statistic for both SR and D 1650 improved most when using IB compared to BI at Gwydir where the variance was highest due to extreme flood-related moisture dynamics. This suggests that using the IB approach might be even more important when the index is not a normalized difference index. The EVI is also not a normalized difference index using three bands (like D 1650 ) but two of these are visible bands which have much lower variances than the NIR and SWIR bands [1], so the EVI produces lower errors.
SR showed the highest T/S ratio followed by EVI, D 1650 , NDWI 45, NDVI, GVMI, NDWI 24 , NDWI 27 , and NDWI 25 ( Figure 6). Comparing Landsat and MODIS variances revealed that all nine Landsat indices had higher temporal, spatial and spatio-temporal variances than the MODIS indices due to lower spatial resolution of MODIS compared with Landsat.

b) MODIS indices T/S ratio
All presented results in Figure 2 and Table 4 are site-averaged and do not show results of individual simulations. As an example here in Figure 7, we presented individual statistics for all 12 simulations of NDWI 25 of Gwydir by using STARFM and ESTARFM and two IB and BI approaches. As shown on Figure 7, on dates with a higher T/S ratio (i.e., date 256), STARFM outperformed ESTARFM by producing higher R 2 and lower RMSD and bias compared with other dates.

Approach versus Algorithm
Comparing the R 2 and RMSD of four parameters, that is (STARFM-ESTARFM) IB , (STARFM-ESTARFM) BI , (IB-BI) STARFM and (IB-BI) ESTARFM , revealed that the differences between statistics of the IB and BI approaches in both algorithms were greater than STARFM and ESTARFM statistics by using both approaches. This means improvement in accuracy of simulations by selecting the right approach (IB) is more important and produces higher simulation accuracies than selecting the right algorithm. It was shown that using the IB approach improved the average R 2 by 0.4% when using STARFM and 3.7% when using ESTARFM. ESTARFM improved the average R 2 by 4.6% when using the IB approach and 1.2% when using the BI approach. Using the IB approach lowered the RMSD by 12.8% when using STARFM and by 16% when using ESTARFM compared to using the BI approach. STARFM versus ESTARFM lowered the RMSD by 6.1% using the IB approach and by 3.95% for the BI approach. This reduction in RMSD also emphasizes that the selection of the right approach (IB) is more important than choosing either the STARFM or ESTARFM algorithms. According to the t-test, the difference between the means of error statistic values (average of three sites) were also more significant when comparing the IB and BI approaches (bias; 28%, R 2 ; 78% and RMSD; 72%) than comparing STARFM and ESTARFM algorithms (bias; 61%, R 2 ; 57% and RMSD; 54%).

IB versus BI
This study found that the IB approach consistently outperformed the BI approach for all indices at all three study sites. The IB approach was less computationally expensive than the BI approach due to blending single indices rather than blending multiple bands. For example, the computational time of EVI using the IB approach is one-third the time required when using the BI approach due to blending three single bands (Blue, Red and NIR) rather than blending the single EVI when using IB.
Brown et al. [38] showed that the long-term NVDI time series derived from multiple sensors, are comparable because of the similarity between them, (i.e., Landsat-and MODIS-derived NDVI are comparable with ±1 standard error, and R 2 = 0.7). Tian et al. [23] compared the IB and BI approaches in simulating an NDVI time series by only using STARFM and found that the IB approach consistently generated better results (0.70 < R 2 < 0.76) than the BI approach (0.56 < R 2 < 0.70) in their study area. In this study we demonstrated how blending algorithms can be used for simulating nine multispectral indices directly at higher spatial resolution and temporal frequency. This paper introduced new insights into the downscaling approaches by blending indices directly from surface reflectance data. This approach (IB) demonstrated two major advantages over the three sites studied: (i) higher accuracy; and (ii) less computational time.

STARFM versus ESTARFM
This study confirms that performance of STARFM and ESTARFM in simulating Landsat-like indices from Landsat and MODIS indices depends on temporal and spatial variances of the input L-M indices into the algorithm; which is in agreement with [1,3]. Emelyanova et al. [1] proposed a conceptual model (Figure 9g of their paper) for advanced algorithm selection by using spatio-temporal variances of the study site. In this study we used a similar method to assess the blending algorithms performance by calculating the T/S variance ratio of input indices (3-sequential dates of MODIS with two Landsat indices). Results of this study confirmed their advanced algorithm selection conceptual model, which suggested using STARFM in sites with higher temporal and lower spatial (higher T/S ratio) variance and using ESTARFM for sites with lower T/S ratio (Figure 2c,d, and Figures 3,4a,b in each). Our study also found that the T/S variances ratio of each index was not similar to the T/S values of individual reflectance bands, which were used to calculate that index. This meant that the performance of STARFM and ESTARFM in IB and BI approaches was different. For example, for NDVI calculated with the IB approach, the performance depended on the T/S ratio of NDVI, whereas for the BI approach, where we blended individual reflectance data, performance depended on the T/S ratio of the individual reflectance Bands 3 and 4. At all three sites, temporal, spatial and spatio-temporal variances of MODIS were lower than Landsat variances due to the lower spatial resolution of MODIS (i.e., each 500 m MODIS pixel covers approximately 277 Landsat 30 m pixels). Rapid changes in land surface conditions (i.e., flood events) resulted in higher changes in temporal variances compared with spatial variance and produced higher T/S ratios.
When comparing STARFM and ESTARFM and using three study sites with different spatial and temporal variances, we showed that no blending algorithm was optimal in all conditions. Emelyanova et al. [1] showed that the performance of these algorithms depended on the spatial and temporal characteristics of the study site. Their analysis was performed using reflectance data, and here we show that the framework for algorithm selection can be extended to indices. We extended their framework by using the T/S ratio of 3-sequential indices, as opposed to using reflectance data over the entire dataset as the algorithm selection criteria. As ESTARFM was developed to blend Landsat and MODIS data in spatially complex heterogeneous regions [3], it outperformed STARFM when the 3-sequential date spatial variance dominated 3-sequential date temporal variance (i.e., a low T/S ratio). In contrast, STARFM outperformed ESTARFM, when the 3-sequential date temporal variance dominated the 3-sequential date spatial variance (i.e., a high T/S ratio).

Approach versus Algorithm
In this study, it was shown that the choice of the IB or BI approaches had a greater impact on the accuracy of simulations compared with choice of algorithm (STARFM or ESTARFM). It means choice of approach is more important than choice of algorithm in blending L-M indices. Comparing STARFM and ESTARFM in both IB and BI approaches showed that improvement in the accuracy of simulations by ESTARFM was higher than accuracy of simulations by STARFM.

Conclusion
The six main conclusions of this research were: (i) the IB approach consistently outperformed the BI approach for all nine indices at all three study sites; (ii) the choice of approach (IB versus BI) had a larger impact on accuracy of blending indices than did the choice of algorithm (STARFM versus ESTARFM); (iii) the IB approach was less sensitive than the BI approach to choice of algorithm; (iv) STARFM was less sensitive to the choice of approach (IB versus BI) than was ESTARFM (which does not mean that STARFM was always the most accurate); (v) using the IB approach was even more important for non-normalized difference indices because they did not benefit from the inherent cancelling of blending-induced errors in their algebraic implementation; and (vi) we confirmed previous findings [1] that STARFM had higher accuracy than ESTARFM when temporal variance was higher than spatial variance (T/S > 1) and ESTARFM had higher accuracy than STARFM when spatial variance was higher than temporal variance (T/S < 1).
To simulate Landsat-like indices from Landsat-MODIS images, the Index-then-Blend approach (IB) consistently produced better results in our study than when blending individual image bands, then calculating indices: the blend-then-index (BI) approach. We conclude the reason for this is that the IB approach only incurs one instance of blending and therefore only one instance of error due to blending, whereas the BI approach incurs multiple blending instances and therefore multiple instances of error. While [1] showed that algorithm selection between STARFM and ESTARFM was important to achieve a more accurately blended output of reflectance bands, we showed here that for blending indices, the choice of approach (IB versus BI) was more important than blending algorithm selection (STARFM versus ESTARFM). Our results have direct impact on operational considerations when blending Landsat and MODIS data for the purposes of generating multispectral indices for vegetation, environmental moisture and/or water applications. For this purpose, our study suggests that the IB approach should be implemented as it is: (i) less computationally expensive due to blending single indices rather than multiple bands; (ii) more accurate due to less error propagation; and (iii) less sensitive to choice of blending algorithm.