Application of A Simple Landsat-MODIS Fusion Model to Estimate Evapotranspiration over A Heterogeneous Sparse Vegetation Region

: A simple Landsat-MODIS (Moderate Resolution Imaging Spectroradiometer) fusion model was used to generate 30-m resolution evapotranspiration (ET) maps for the 2010 growing season over a heterogeneous sparse vegetation, agricultural region using the METRIC (mapping evapotranspiration with internalized calibration) algorithm. The fusion model performance was evaluated, and experiments were undertaken to investigate the frequency for updating Landsat-MODIS data into the fusion model during the growing season, to maintain model accuracy and reduce computation. Initial evaluation of the fusion model resulted in high bias stemming from the landscape heterogeneity and small landholdings. To reduce the bias, the fusion model was modiﬁed to be applicable pixel-wise (i.e., implementing speciﬁc pixels for generating outputs), and an NDVI-based (Normalized Difference Vegetation Index) coefﬁcient was added to capture crop phenology. A good agreement that resulted from the comparison of the fused and non-fused maps with root mean square error (RMSE) of 0.15 mm day − 1 with coefﬁcient of determination (R 2) of 0.83 indicated successful implementation of the modiﬁcations. Additionally, the fusion model performance was evaluated against in-situ observation at the pixel level as well as the watershed level to estimate seasonal ET for the growing season. The default METRIC model (Landsat only) yielded relative error (RE) of 31% and RMSE of 2.44 mm day − 1 , while using the modiﬁed fusion model improved the accuracy resulting in RE of 3.5% with RMSE of 0.37 mm day − 1 . Considering different data frequency update, the optimal fusion experiment (RMSE of 0.61 mm day − 1 , and RE of 6.5%) required the consideration of the crop phenology and weekly updates in the early growing stage and harvest time, and bi-weekly for the rest of the season. The resulting fusion model for ET output is planned to be a part of ET mapping and irrigation scheduling systems.


Introduction
The motivation of this study stems from the requirement of high frequency and high-spatial resolution evapotranspiration (ET) monitoring in agricultural areas of semi-arid regions. Water scarcity in these regions poses challenges for maintaining environmental stability [1] as well as geopolitical security. Southern Iran is indicative of a growing example of regions where excessive pumping groundwater occurs, and which faces water scarcity. The combination of small agricultural land

The METRIC Algorithm
The METRIC model is based on the surface energy balance equation [5]: where LE is the latent energy consumed by ET (W m -2 ); Rn is the net radiation (W m -2 ); G is the soil heat flux (W m -2 ), and H is the sensible heat flux (W m -2 ). The model computes G as a ratio G/Rn using an empirical equation [5] as follows: G R n =(T s -273. 15)×(0.0038+0.0074α)×(1-NDVI 4 ) where Ts is surface temperature (K); α is surface albedo and NDVI is the Normalized Difference Vegetation Index. Sensible heat flux is a function of air density (Pair, kg m -3 ), the specific heat constant of air at constant pressure (Cp = 1004 J kg -1 K -1 ), the aerodynamic resistance (rah, s m -1 ), and the temperature difference (dT, K), and computed as: The model computes LE based on other components (i.e., H, Rn, and G), and the instantaneous ET (ETinst) is then determined as:

The METRIC Algorithm
The METRIC model is based on the surface energy balance equation [5]: where LE is the latent energy consumed by ET (W m −2 ); R n is the net radiation (W m −2 ); G is the soil heat flux (W m −2 ), and H is the sensible heat flux (W m −2 ). The model computes G as a ratio G/R n using an empirical equation [5] as follows: where T s is surface temperature (K); α is surface albedo and NDVI is the Normalized Difference Vegetation Index. Sensible heat flux is a function of air density (P air , kg m −3 ), the specific heat constant of air at constant pressure (C p = 1004 J kg −1 K −1 ), the aerodynamic resistance (r ah , s m −1 ), and the temperature difference (dT, K), and computed as: The model computes LE based on other components (i.e., H, R n , and G), and the instantaneous ET (ET inst ) is then determined as: The instantaneous ET fraction (ET rF ) is then estimated as ET inst divided by the reference ET (ET ref ) at the image acquisition time.
The ET rF is typically considered to be constant throughout the day [5]. Combining daily reference ET (calculated from weather data) with ET rF (calculated from METRIC) results in actual daily evapotranspiration for the days after the Landsat overpass day until the next available image [5].

The MODIS-Landsat Fusion Model
We derived ET maps considering a growing season using the MODIS-Landsat ET fusion scheme [43]. The scheme applies linear regression to integrate Landsat-derived ET fraction using the METRIC model, and the VTCI (Vegetation Temperature Condition Index) derived from the MODIS imagery. The VTCI is derived from the plot of surface temperature and vegetation index, and the trapezoidal shaped distribution (Figure 2), and can be used to describe the water stress condition of a pixel. The VTCI is calculated using the following equations: LST max = a 1 + b 1 × NDVI LST min = a 2 + b 2 × NDVI (8) In the above, LST is land surface temperature corresponding to an NDVI value, and the LST max and LST min are maximum and minimum land surface temperatures, respectively, from the available satellite maps over the study domain. The coefficients (a 1 , b 1 , a 2 , and b 2 ) are estimated from linear regression utilizing maximum and minimum values of NDVI and the associated LST values within the image domain.
The instantaneous ET fraction (ETrF) is then estimated as ETinst divided by the reference ET (ETref) at the image acquisition time.
The ETrF is typically considered to be constant throughout the day [5]. Combining daily reference ET (calculated from weather data) with ETrF (calculated from METRIC) results in actual daily evapotranspiration for the days after the Landsat overpass day until the next available image [5].

The MODIS-Landsat Fusion Model
We derived ET maps considering a growing season using the MODIS-Landsat ET fusion scheme [43]. The scheme applies linear regression to integrate Landsat-derived ET fraction using the METRIC model, and the VTCI (Vegetation Temperature Condition Index) derived from the MODIS imagery. The VTCI is derived from the plot of surface temperature and vegetation index, and the trapezoidal shaped distribution (Figure 2), and can be used to describe the water stress condition of a pixel. The VTCI is calculated using the following equations:

VTCI=
LST max -LST LST max -LST min (6) LST max =a 1 +b 1 ×NDVI (7) LST min =a 2 +b 2 ×NDVI (8) In the above, LST is land surface temperature corresponding to an NDVI value, and the LSTmax and LSTmin are maximum and minimum land surface temperatures, respectively, from the available satellite maps over the study domain. The coefficients (a1, b1, a2, and b2) are estimated from linear regression utilizing maximum and minimum values of NDVI and the associated LST values within the image domain. The maps of VTCI corresponding to two days 'a' and 'b' were generated based on Equation 6. from the available MODIS imagery. Linear regression was performed between VTCIa and VTCIb for a pixel 'i' and a specific land cover as: (VTCI) b.i = e×(VTCI) a.i +f (9) Assuming that ET values are available at day of year 'a' (DOYa), the model estimated ETrF for DOYb is developed assuming a linear relationship between ETrF and the VTCI as: The maps of VTCI corresponding to two days 'a' and 'b' were generated based on Equation (6). from the available MODIS imagery. Linear regression was performed between VTCI a and VTCI b for a pixel 'i' and a specific land cover as: Assuming that ET values are available at day of year 'a' (DOY a ), the model estimated ET rF for DOY b is developed assuming a linear relationship between ET rF and the VTCI as: where, (ET rF ) b,i is the ET fraction values at 30 m resolution for a pixel 'i' at DOY b using METRIC and Landsat imagery; 'e' and 'f' are coefficients from the linear regression of MODIS-based VTCI a and VTCI b , respectively. For an available Landsat scene corresponding to DOY a , a high-resolution ET rF map is generated using the METRIC model. This map is then used in a linear regression equation to generate a high-resolution ET rF map corresponding to DOY b using the coefficients derived from the regression of MODIS VTCIs for DOY a and DOY b . This is the standard approach for using the fusion model and was applied over the study region. The model algorithm, as well as the modifications (described in the next section), are summarized in Figure 3. (ET rF ) b.i =e×(ET rF ) a.i +f (10) where, (ETrF)b,i is the ET fraction values at 30 m resolution for a pixel 'i' at DOYb using METRIC and Landsat imagery; 'e' and 'f' are coefficients from the linear regression of MODIS-based VTCIa and VTCIb, respectively. For an available Landsat scene corresponding to DOYa, a high-resolution ETrF map is generated using the METRIC model. This map is then used in a linear regression equation to generate a highresolution ETrF map corresponding to DOYb using the coefficients derived from the regression of MODIS VTCIs for DOYa and DOYb. This is the standard approach for using the fusion model and was applied over the study region. The model algorithm, as well as the modifications (described in the next section), are summarized in Figures 3.

Modifications to the Fusion Model
The fusion model has been successfully applied in prior published studies [e.g., 15,35,43]. For example, the fusion model ET was compared with eddy covariance measurements at a field site in Florida and RMSE (Root Mean Square Error) of 0.44 mm day-1, R 2 of 0.80, and 2% relative error for the cumulative growing season were reported [43]. Similar good performance was reported using MODIS-Landsat fusion model over corn, and soybean field sites in Iowa, with an average error of 0.58 mm day −1 [35]. While, for a vineyard, the error has been about 1 mm day -1 [15]. Indeed, this good performance reported in the literature was the motivation for pursuing the fusion modeling assessment for the study region.
However, initial evaluation of the model when applied over the study region resulted in significant bias. Reviewing the model spatial plots, it was apparent that the bias was likely a result of the landscape heterogeneity. As the spatial resolutions of the satellite sensors varied from 30-meter in Landsat to 1-kilometer in MODIS, and the fusion model needed input parameters from both resolutions, adequately dealing with the spatial scale issue was a challenge. The issue would be trivial over a homogeneous surface, but for a heterogeneous landscape, the difference in the satellite data products and measurements (e.g., NDVI data) can yield different values over the same surface with different resolution. Examples of the NDVI maps developed using Landsat and MODIS data over the study domain are presented in Figure 4.

Modifications to the Fusion Model
The fusion model has been successfully applied in prior published studies (e.g., [15,35,43]). For example, the fusion model ET was compared with eddy covariance measurements at a field site in Florida and RMSE (Root Mean Square Error) of 0.44 mm day-1, R 2 of 0.80, and 2% relative error for the cumulative growing season were reported [43]. Similar good performance was reported using MODIS-Landsat fusion model over corn, and soybean field sites in Iowa, with an average error of 0.58 mm day −1 [35]. While, for a vineyard, the error has been about 1 mm day −1 [15]. Indeed, this good performance reported in the literature was the motivation for pursuing the fusion modeling assessment for the study region.
However, initial evaluation of the model when applied over the study region resulted in significant bias. Reviewing the model spatial plots, it was apparent that the bias was likely a result of the landscape heterogeneity. As the spatial resolutions of the satellite sensors varied from 30-meter in Landsat to 1-kilometer in MODIS, and the fusion model needed input parameters from both resolutions, adequately dealing with the spatial scale issue was a challenge. The issue would be trivial over a homogeneous surface, but for a heterogeneous landscape, the difference in the satellite data products and measurements (e.g., NDVI data) can yield different values over the same surface with different resolution. Examples of the NDVI maps developed using Landsat and MODIS data over the study domain are presented in Figure 4.  The study region featured a heterogeneous landscape with small landholdings, and the subpixel variability considering MODIS and Landsat data introduced high uncertainties in calculations. Three different spatial resolutions were used in the fusion model's inputs: MODIS LST at 1-km, MODIS NDVI at 250-m, and Landsat data at 30-m resolution. Incidentally, the different grid spacing (resolution) between MODIS NDVI and LST was another source of bias in calculating MODIS VTCIs in Equations (6)- (8). A green pixel with high NDVI value and neighboring fallow land pixels with low NDVI can be combined in one corresponding LST pixel representing the temperature of a mixed landscape. However, this can result in a low correlation between MODIS NDVI and LST used to derive the coefficients in Equations (6)- (8).
To reduce the bias stemming from different resolutions under a heterogeneous landscape, we applied a selective filter approach. First, the MODIS data were resampled to 30-m resolution using the nearest neighbor method. The filter scans a MODIS-Landsat pair for a specific DOY and compares the NDVI values from Landsat scene with the neighboring pixels from the 30-m resampled MODIS image. It only selects the areas and pixels that have the NDVI values as obtained from MODIS and Landsat within a pre-specified variation. The variation is a user-specified input and was taken as 20% based on average value during trial and error procedure, and evaluating the values presented in the literature as a difference threshold between MODIS and Landsat vegetation indices. A similar value was reported by Sesnie et al. [46] in the Sonoran Desert whose vegetation cover and its sparsity resembled some parts of our study area. Tong and He [47] reported an R 2 range from 0.33 to 0.38 between MODIS and Landsat-5 NDVI in grassland at a national park in Canada. Ke et al. [48] compared Landsat 8 with multiple satellite sensors and reported that NDVI values have a mean bias error of between 5% and 10% for four study sites in China and Korea that covered forest, crop, urban, and water body cover. Busetto et al. [49] compared Landsat-5 and MODIS NDVI at an agricultural landscape of Northern Italy and found the RMSE between the two ranged from 0.065 to 0.094 (6.2% to 21.4% of RE) for different dates. That is, we considered 20% to be the implicit bias when comparing the Landsat and MODIS fields further justifying the pre-specified variation threshold. The filter scans spatio-temporally through the growing season and applies the similarity requirement for all available MODIS-Landsat pairs to capture the effect of changes in crop phenology. These sub-areas and pixels were then used as the input to the fusion model to derive the required coefficients. A The study region featured a heterogeneous landscape with small landholdings, and the sub-pixel variability considering MODIS and Landsat data introduced high uncertainties in calculations. Three different spatial resolutions were used in the fusion model's inputs: MODIS LST at 1-km, MODIS NDVI at 250-m, and Landsat data at 30-m resolution. Incidentally, the different grid spacing (resolution) between MODIS NDVI and LST was another source of bias in calculating MODIS VTCIs in Equations (6)- (8). A green pixel with high NDVI value and neighboring fallow land pixels with low NDVI can be combined in one corresponding LST pixel representing the temperature of a mixed landscape. However, this can result in a low correlation between MODIS NDVI and LST used to derive the coefficients in Equations (6)- (8).
To reduce the bias stemming from different resolutions under a heterogeneous landscape, we applied a selective filter approach. First, the MODIS data were resampled to 30-m resolution using the nearest neighbor method. The filter scans a MODIS-Landsat pair for a specific DOY and compares the NDVI values from Landsat scene with the neighboring pixels from the 30-m resampled MODIS image. It only selects the areas and pixels that have the NDVI values as obtained from MODIS and Landsat within a pre-specified variation. The variation is a user-specified input and was taken as 20% based on average value during trial and error procedure, and evaluating the values presented in the literature as a difference threshold between MODIS and Landsat vegetation indices. A similar value was reported by Sesnie et al. [46] in the Sonoran Desert whose vegetation cover and its sparsity resembled some parts of our study area. Tong and He [47] reported an R 2 range from 0.33 to 0.38 between MODIS and Landsat-5 NDVI in grassland at a national park in Canada. Ke et al. [48] compared Landsat 8 with multiple satellite sensors and reported that NDVI values have a mean bias error of between 5% and 10% for four study sites in China and Korea that covered forest, crop, urban, and water body cover. Busetto et al. [49] compared Landsat-5 and MODIS NDVI at an agricultural landscape of Northern Italy and found the RMSE between the two ranged from 0.065 to 0.094 (6.2% to 21.4% of RE) for different dates. That is, we considered 20% to be the implicit bias when comparing the Landsat and MODIS fields further justifying the pre-specified variation threshold. The filter scans spatio-temporally through the growing season and applies the similarity requirement for all available MODIS-Landsat pairs to capture the effect of changes in crop phenology. These sub-areas and pixels were then used as the input to the fusion model to derive the required coefficients. A schematic of this process is shown in Figure 5. Due to the finer NDVI resolution as compared to the LST in MODIS data, a relatively smaller bias is expected for the former over the heterogeneous landscapes. Recognizing this advantage, and considering the heterogeneity due to small landholdings in the study region, an NDVI-based coefficient was added to the MODIS VTCI calculations. As a result, Equation (6) was updated by considering the ratio of average NDVI of DOYb to average NDVI of DOYa for the calculation of VTCIb as: The modified fusion model then was applied over the study domain. Experiments were conducted by considering different strategies in updating MODIS-Landsat input pairs. These aspects are described in the following section.

Frequency Update Experiments
As a default approach, we used four Landsat imagery coupled with bi-weekly (16-day) MODIS data updates for generating the 30 m ET maps with the fusion model. At the base level, the fusion model requires a pair of MODIS images (that can be selected across any time window during the growing season) and one Landsat images. The pair of MODIS images is used to derive coefficients in Equations (6)- (9), and the coefficients were used with Equation (10). This, paired with the ETrF resulted from Landsat and METRIC model, was used to generate a fusion ET map at 30 m pixel size for the days in between the availability of MODIS pair. Accordingly, experiments were undertaken to select different dates to update MODIS LST, NDVI data from which the coefficients were calculated to be used with the available Landsat imagery. For example, one experiment could consider DOYa as the beginning of the growing season and DOYb at the harvest time to generate the coefficients and use it with the ETrF map at the DOYa to generate the fusion ET for the whole season between DOYa and DOYb. Alternatively, one can consider more frequent updates of MODIS and Landsat data for the fusion model. A total of 26 set of experiments were undertaken for updating satellite data for the fusion model. These included updates with Landsat only data (non-fusion approaches) or using one Due to the finer NDVI resolution as compared to the LST in MODIS data, a relatively smaller bias is expected for the former over the heterogeneous landscapes. Recognizing this advantage, and considering the heterogeneity due to small landholdings in the study region, an NDVI-based coefficient was added to the MODIS VTCI calculations. As a result, Equation (6) was updated by considering the ratio of average NDVI of DOY b to average NDVI of DOY a for the calculation of VTCI b as: The modified fusion model then was applied over the study domain. Experiments were conducted by considering different strategies in updating MODIS-Landsat input pairs. These aspects are described in the following section.

Frequency Update Experiments
As a default approach, we used four Landsat imagery coupled with bi-weekly (16-day) MODIS data updates for generating the 30 m ET maps with the fusion model. At the base level, the fusion model requires a pair of MODIS images (that can be selected across any time window during the growing season) and one Landsat images. The pair of MODIS images is used to derive coefficients in Equations (6)- (9), and the coefficients were used with Equation (10). This, paired with the ET rF resulted from Landsat and METRIC model, was used to generate a fusion ET map at 30 m pixel size for the days in between the availability of MODIS pair. Accordingly, experiments were undertaken to select different dates to update MODIS LST, NDVI data from which the coefficients were calculated to be used with the available Landsat imagery. For example, one experiment could consider DOY a as the beginning of the growing season and DOY b at the harvest time to generate the coefficients and use it with the ET rF map at the DOY a to generate the fusion ET for the whole season between DOY a and DOY b . Alternatively, one can consider more frequent updates of MODIS and Landsat data for the fusion model. A total of 26 set of experiments were undertaken for updating satellite data for the fusion model. These included updates with Landsat only data (non-fusion approaches) or using one to four Landsat images and considering MODIS updates for every 8-, 16-, 24-, 32-, 40-, and 48-day interval. We also considered one hybrid frequency update by using weekly (8-day) updates at an early stage and harvest time and bi-weekly (16-day) during maturity. These experiments aimed to identify a practical pairing that yielded high accuracy and relatively low data processing effort. therefore, the 30 m thermal band was used in this study. A subset of each image was extracted to focus on the study site for the analysis (Figure 1). The digital elevation model (DEM) was a required input to the METRIC model and was processed from ASTER (Advanced Spaceborne Thermal Emission and Reflection radiometer) data. The fusion model also required LST and NDVI as input, and were obtained from MODIS. The MODIS level 3 LST data at 1 km resolution (MOD11A1) and NDVI with 250 m resolution (MOD09GA) were retrieved from publicly available sources (https://modis.gsfc.nasa.gov). In addition, daily weather data were available from meteorological stations at the Gareh Bygone and the Fasa weather observatories (http://farsmet.ir/Amars.aspx). Pan evaporation data from the weather station were used with pan coefficient (0.75 for the station) to generate reference ET for the study region, which was consistent with the ET ref resulting from ASCE Penman-Monteith standardized method [50] as reported by Jamshidi et al. [10] for the study region.
The in-situ ET measurements are reported by Pakparvar [51] for the 2009 and 2010 growing seasons. Availability of these measurements was one of the main reasons we chose the 2010 growing season as our test period in this observational data void region. The experiments were conducted over several farms to measure the water use by the cultivated crops (corn) throughout the growing seasons. For this measurement, the water balance approach was adopted by direct measurements of the amount of irrigation, precipitation, soil water content, and drainage water. Pakparvar et al. [52] used the measurements to calculate an average volume of water loss through evapotranspiration per hectare of agricultural fields. Knowing the total acreage of cultivated farmlands in the study domain, the cumulative seasonal crop water use was computed. We used the data provided by Pakparvar et al. [52] for comparing the result at the regional scale (i.e., cumulative water use) and for calculating the crop coefficients during the growing stage. The coefficients were compared against the available measurement-based coefficients for similar climatic region of the province [52][53][54][55] as well as against the remotely sensed results [10]. These coefficients were then used with the reference ET values to generate daily time series of actual ET for the growing season. A summary of the vegetation characteristics and the measurements is presented in Table 1.

Model Evaluation
To evaluate the fusion model, two approaches were adopted. First, the ET estimates were evaluated at the pixel level for the study area. Applying the fusion model resulted in the generation of 30 × 30 m 2 ET maps which were then integrated to be used with in-situ ET measurements. The ET value from each pixel corresponding to agricultural land cover was extracted from the fusion maps. These values were averaged spatially by using arithmetic mean of all the extracted values to generate an average value for actual ET corresponding to each specific day during the growing season. These values were used to generate a time series of daily actual ET for the growing season. To compare and visualize the resulting fusion ET time series, these values were plotted against ET times series that were generated based on the in-situ measurements in the study area [51,52] described in the previous section. Additionally, the resulting ET maps from the fusion model were compared with those resulting from using only the METRIC model with no integration of MODIS products. To compare daily ET values resulting from the two maps (i.e., the fusion model and the non-fusion maps), two categories including cultivated areas (agricultural farmlands) and sparse vegetation (shrubland and pasture) were defined. The pixel values corresponding to each category were extracted, averaged, and compared.
Second, building on the availability of in-situ measurements including crop water use at the study area for the growing season [51,52], we evaluated the ET estimates integrated across the watershed in the study area. For this, the ET values corresponding to agricultural farmlands from the resulting fusion maps were extracted and summed over the entire season. The cumulative value was then converted to the volume of crop water use by multiplying it by the acreage of cultivated area. The calculated value was then compared against the cumulative crop water used from the in-situ measurements.
To quantify the performance of the fusion model in terms of ET estimates, we computed the coefficient of determination (R 2 ). Additionally, the error components including RMSE (root mean square error), and RE (relative error) were also used to compare the results from the model.
where, O i and P i represent the observed and estimated ET of the i th pixel, respectively, and O i and P i represent average values of the corresponding variable, n is the number of data samples considered.

Initial Results with the Original and Modified Models
Initial work with the fusion model highlighted an issue in estimating ET values over the study domain with heterogeneous landscapes. Figures 6a and 7a show a sample correlation for LST-NDVI and the resulting VTCI using the default approach (i.e., before any modifications). Using such a framework in regions with mixed agriculture and small farmlands (<500 m 2 , and 150 m 2 on avergae), it was obvious that the heterogeneity is an important factor introducing bias in MODIS pixels over the GBP landscapes. As a result, LST-NDVI trapezoidal (or triangular) distribution did not form, and a low correlation was found between LST and NDVI to derive the coefficients for use in Equations (7) and (8).
Using the selective filter (Figure 6b, Figure 7b), reduced the bias caused by heterogeneous landscapes and yielded a better accuracy that made the fusion model applicable to such regions. Adding the NDVI coefficient (Equation (11)) increased the weight of the NDVI in calculating the VTCI, and further helped achieve improvements in the results (Figure 7c). Remote Sens. 2019, 11, x FOR PEER REVIEW 11 of 19   Table 2 shows the improvement in R 2 that was acquired for VTCI after using the selective filter during the growing season. Adding this simple step helped decreased the error and overcame a potential limitation in applying the algorithm. The improvements ( Table 2) are more notable for the early and during harvest period. That likely occurred due to the maximum mid-season greenness which reduced heterogeneity and caused a more regionally homogeneous landscape.

Pixel Level Evaluation Results
After applying the modifications, the daily ET maps were first generated based on the METRIC model with no integration of MODIS products, and then by using the fusion model with more frequent MODIS and Landsat data updates. As the default, we generated the ET maps by using four Landsat imagery and 8-day MODIS updates. The METRIC-estimated ET maps using only Landsat   Table 2 shows the improvement in R 2 that was acquired for VTCI after using the selective filter during the growing season. Adding this simple step helped decreased the error and overcame a potential limitation in applying the algorithm. The improvements ( Table 2) are more notable for the early and during harvest period. That likely occurred due to the maximum mid-season greenness which reduced heterogeneity and caused a more regionally homogeneous landscape.

Pixel Level Evaluation Results
After applying the modifications, the daily ET maps were first generated based on the METRIC model with no integration of MODIS products, and then by using the fusion model with more frequent MODIS and Landsat data updates. As the default, we generated the ET maps by using four Landsat imagery and 8-day MODIS updates. The METRIC-estimated ET maps using only Landsat  Table 2 shows the improvement in R 2 that was acquired for VTCI after using the selective filter during the growing season. Adding this simple step helped decreased the error and overcame a potential limitation in applying the algorithm. The improvements ( Table 2) are more notable for the early and during harvest period. That likely occurred due to the maximum mid-season greenness which reduced heterogeneity and caused a more regionally homogeneous landscape.

Pixel Level Evaluation Results
After applying the modifications, the daily ET maps were first generated based on the METRIC model with no integration of MODIS products, and then by using the fusion model with more frequent MODIS and Landsat data updates. As the default, we generated the ET maps by using four Landsat imagery and 8-day MODIS updates. The METRIC-estimated ET maps using only Landsat data at DOYs 189, 285, 301, and 325 were compared to its corresponding values from the fusion model-estimated ET. These comparisons are shown in Figure 8. Considering cultivated areas, results showed that the METRIC model had closer estimates to the observations compared to the results from the modified fusion model. We cannot directly comment on the resulted values in sparse vegetation (as our observations were carried out from the cultivated areas), but it was observed that the fusion model underestimated the values, and the underestimation was more significant in sparse vegetation areas (Table 3) Comparing the computed ET time series at the pixel level (i.e., averaged values over all the cultivated farmland during the growing season) resulted in a significant difference when the fusion model is used (Figure 9). Analysis of the METRIC model when it was used standalone for generating ET time series against in-situ measurements yielded a poor result with RMSE ranging from 2.88 to 3.44 mm day -1 , when using available Landsat imagery. Using four Landsat images was inadequate to capture the phenology change due to crop growth that occurred between the gap dates of the available imagery. The inadequacy of Landsat data alone as an input to the METRIC model was specifically observed between DOY 205 and 280 when the planted crop (mostly corn) shifted from early tilling stages to leaf and stem development stage that contributed to the underestimation of vegetative indices (i.e., leaf area index and NDVI) in the model calculations. Adding MODIS data with 8-day updates helped to capture the phenology changes that occurred during the dates with missing Landsat data. Using the fusion model decreased the RMSE to 0.37 mm day -1 when the default approach (i.e., four Landsat images with 8-day updates of MODIS data) was used. However, using such a scheme would not be optimal for the end users in the study region because of the computational requirements, and also because the plant phenology change might not be significant over 8-day interval. Therefore, we tested different update frequency, and the results from which are presented in the following section.  Considering the domain, the results showed good agreement between the two sets of the resulting products, both visually and the statistical sense (RMSE of 0.15 mm day −1 and average R 2 of 0.83). The high-resulting correlation indicated that the modifications were successfully applied to the fusion scheme and eliminated/reduced the potential limitation in applying the scheme over the study region.
Comparing the computed ET time series at the pixel level (i.e., averaged values over all the cultivated farmland during the growing season) resulted in a significant difference when the fusion model is used (Figure 9). Analysis of the METRIC model when it was used standalone for generating ET time series against in-situ measurements yielded a poor result with RMSE ranging from 2.88 to 3.44 mm day −1 , when using available Landsat imagery. Using four Landsat images was inadequate to capture the phenology change due to crop growth that occurred between the gap dates of the available imagery. The inadequacy of Landsat data alone as an input to the METRIC model was specifically observed between DOY 205 and 280 when the planted crop (mostly corn) shifted from early tilling stages to leaf and stem development stage that contributed to the underestimation of vegetative indices (i.e., leaf area index and NDVI) in the model calculations. Adding MODIS data with 8-day updates helped to capture the phenology changes that occurred during the dates with missing Landsat data. Using the fusion model decreased the RMSE to 0.37 mm day −1 when the default approach (i.e., four Landsat images with 8-day updates of MODIS data) was used. However, using such a scheme would not be optimal for the end users in the study region because of the computational requirements, and also because the plant phenology change might not be significant over 8-day interval. Therefore, we tested different update frequency, and the results from which are presented in the following section. Comparing the computed ET time series at the pixel level (i.e., averaged values over all the cultivated farmland during the growing season) resulted in a significant difference when the fusion model is used (Figure 9). Analysis of the METRIC model when it was used standalone for generating ET time series against in-situ measurements yielded a poor result with RMSE ranging from 2.88 to 3.44 mm day -1 , when using available Landsat imagery. Using four Landsat images was inadequate to capture the phenology change due to crop growth that occurred between the gap dates of the available imagery. The inadequacy of Landsat data alone as an input to the METRIC model was specifically observed between DOY 205 and 280 when the planted crop (mostly corn) shifted from early tilling stages to leaf and stem development stage that contributed to the underestimation of vegetative indices (i.e., leaf area index and NDVI) in the model calculations. Adding MODIS data with 8-day updates helped to capture the phenology changes that occurred during the dates with missing Landsat data. Using the fusion model decreased the RMSE to 0.37 mm day -1 when the default approach (i.e., four Landsat images with 8-day updates of MODIS data) was used. However, using such a scheme would not be optimal for the end users in the study region because of the computational requirements, and also because the plant phenology change might not be significant over 8-day interval. Therefore, we tested different update frequency, and the results from which are presented in the following section.

Effects of Data Update Frequency on the Fusion Model
The resulting fusion based ET time series with different data frequency update are compared in Figure 10 for ET time series at the pixel level, and in Table 4 for estimating the seasonal crop water use in the study watershed. Statistical analysis highlighted that lower update frequencies yielded poor accuracy with relative error values as high as 53% when using only one Landsat image and updating the MODIS data for every . Higher frequency updates yielded better

Effects of Data Update Frequency on the Fusion Model
The resulting fusion based ET time series with different data frequency update are compared in Figure 10 for ET time series at the pixel level, and in Table 4 for estimating the seasonal crop water use in the study watershed. Statistical analysis highlighted that lower update frequencies yielded poor accuracy with relative error values as high as 53% when using only one Landsat image and updating the MODIS data for every 48-day (Figure 10a). Higher frequency updates yielded better accuracy; however, no significant improvement (based on ANOVA at 95% confidence interval) was seen between the hybrid, the 8-day, and 16-day frequency updates with the implementation of three to four Landsat data (Figure 10r-t,x,y). This explicitly highlighted the consideration of crop condition and phenology in the calculations. Using the fusion model with three Landsat images and 16-day frequency updates of MODIS data helped reduce the bias, optimize the processing time, and maintained accuracy. Additionally, the hybrid experiment yielded accurate results as the 8-day experiment with R 2 of 0.84, while using fewer computations. In the early growth stage, when crop growth is rapid and the crop coefficient changes significantly, updates can be done on a weekly basis. During maturity, the crop condition does not change appreciably, and updates every two to three weeks (depending on the crop type) could be sufficient to maintain the model accuracy. In addition, consideration should be given to the harvest time when a change in land cover is expected. accuracy; however, no significant improvement (based on ANOVA at 95% confidence interval) was seen between the hybrid, the 8-day, and 16-day frequency updates with the implementation of three to four Landsat data (Figures 10r, 10s, 10t, 10x, 10y). This explicitly highlighted the consideration of crop condition and phenology in the calculations. Using the fusion model with three Landsat images and 16-day frequency updates of MODIS data helped reduce the bias, optimize the processing time, and maintained accuracy. Additionally, the hybrid experiment yielded accurate results as the 8-day experiment with R 2 of 0.84, while using fewer computations. In the early growth stage, when crop growth is rapid and the crop coefficient changes significantly, updates can be done on a weekly basis. During maturity, the crop condition does not change appreciably, and updates every two to three weeks (depending on the crop type) could be sufficient to maintain the model accuracy. In addition, consideration should be given to the harvest time when a change in land cover is expected. Figure 10. Time series of ET estimates using different data frequency updates. The letters (a-y) associated with each subfigure denote the experiments referenced in Table 4.  Table 4.

Comparison at the Watershed Level
The estimated cumulative crop water use from the fusion ET values for the season led to a similar conclusion as the previous section. The results are presented in Table 4. It should be noted that in the table, each experiment is denoted with a letter (The first letter in each cell), the second number in each cell denotes the cumulative seasonal water use estimated correspond to the experiment, the third and the forth numbers are relative error (%) and the RMSE value (mm/day) and correspond to each experiment. For the growing season, the total volume of net applied water was measured as 5.44 Mm 3 which accounts for the ET a [52]. Using a non-fusion approach with three Landsat images (experiment 'o') predicted the seasonal crop water use as 2.83 Mm 3 . The use of the fusion scheme reduced the relative error (RE) from 48% in experiment 'o' to 3.5% in the high-frequency fusion approach (experiment 'z'). The fusion approach with 8-day updates provided estimated crop water use as 5.25 Mm 3 , and yielded the highest accuracy with 3.5% of RE. According to the ANOVA, the resulting accuracy was not significantly changed at 95% confidence interval when the 16-day and hybrid experiments were used, while lower frequency updates required even lesser computation, they yielded less accurate estimates (RE ranged from 22% to 53%). Thus, considering the accuracy and reduced computational burden, from the user perspective a 16-day or hybrid approach is recommended.  1 Each experiment is denoted with a letter (The first letter in each cell). The second number in each cell is the cumulative seasonal water use estimated corresponding to the experiment (Mm 3 ). The third and the forth numbers are the relative error (%) and the RMSE value (mm/day) corresponding to each experiment. ** = significant at P ≤ 0.05, * = significant at P ≤ 0.01, ns = not significant.

Conclusions
The potential of using existing MODIS-Landsat fusion algorithms for generating high spatio-temporal ET data in sparse, highly heterogeneous vegetation regions, motivated this study. The default fusion algorithm had a relatively high bias and did not account for different special resolutions of inputs because of heterogeneous landscapes caused by small land withholdings and farmlands of the study area. The issue of small landholdings and heterogeneity is more generic especially in arid and semi-arid farming regions. To reduce the bias, simple modifications were done to apply the fusion model by adding a selective pixel-wise filter and NDVI-based coefficient. The modified approach successfully reduced the bias over the heterogonous landscape and was used to assess weekly ET over the study domain (GBP region of southern Iran). The fusion model did not require complex, time-consuming processing steps, and yielded good results. Thus, the modification provided an adaptable framework to generate ET maps for local-scale applications. Another advantage of the fusion model was that it did not require processing MODIS data in the METRIC model, which can be challenging especially over heterogeneous landscapes for locating the "pure" end member pixels.
The question of frequency of imagery update and corresponding computation time was considered for ET estimations. Different experiments using available data over the growing season were analyzed to assess the impact on balancing accuracy and the practicality of the computational efforts. It was concluded that adding more information beyond bi-weekly updates of MODIS data made the algorithm more complex and did not result in significantly higher accuracy. The knowledge of plant phenology (crop growing cycle) helped the selection of data update frequency. The optimal fusion approach used weekly updates during initial crop growth stage and bi-weekly updates for the rest of the season. This led to ET estimates which agreed reasonably well with available observation (RE=4.2% and RMSE of 0.41 mm day −1 ). Results highlight the potential and value of satellite fusion models in generating high-resolution ET estimates in data-sparse regions for water and agriculture management. However, attention should be paid to the geographical characteristic, vegetation covers and land cover compositions of the study area, and as a result; additional pre-processing steps may be required before applying the fusion scheme. This simple methodology should be tested with additional case studies over different regions to further demonstrate its utility.