Regional Actual Evapotranspiration Estimation with Land and Meteorological Variables Derived from Multi-Source Satellite Data

: Evapotranspiration (ET) is one of the components in the water cycle and the surface energy balance systems. It is fundamental information for agriculture, water resource management, and climate change research. This study presents a scheme for regional actual evapotranspiration estimation using multi-source satellite data to compute key land and meteorological variables characterizing land surface, soil, vegetation, and the atmospheric boundary layer. The algorithms are validated using ground observations from the Heihe River Basin of northwest China. Monthly data estimates at a resolution of 1 km from the proposed algorithms compared well with ground observation data, with a root mean square error (RMSE) of 0.80 mm and a mean relative error (MRE) of − 7.11%. The overall deviation between the average yearly ET derived from the proposed algorithms and ground-based water balance measurements was 9.44% for a small watershed and 1% for the entire basin. This study demonstrates that both accuracy and spatial depiction of actual evapotranspiration estimation can be signiﬁcantly improved by using multi-source satellite data to measure the required land surface and meteorological variables. This reduces dependence on spatial interpolation of ground-derived meteorological variables which can be problematic, especially in data-sparse regions, and allows the production of region-wide ET datasets. estimated daily evapotranspiration. The Nash–Sutcli ﬀ e e ﬃ ciency coe ﬃ cient ( NSE ) values are also high (above 0.7). The mean absolute errors ( MAE ) vary from 0.229 to 0.582 mm / day and the root mean square errors ( RMSE ) from 0.278 to 0.768 mm / day. These small error values between observed and estimated daily evapotranspiration indicate that the proposed algorithms can accurately estimate daily evapotranspiration.


Introduction
Actual evapotranspiration (ET) is a major component of the water cycle and plays a key role in surface energy balance by linking the hydrological cycle, carbon budget, and energy transfer [1][2][3][4]. ET is the transfer of water from land surfaces to the atmosphere primarily through turbulence which depends on the aerodynamic characteristics of the surface as determined by the land surface cover type [5], as well as climatic conditions. Consequently, a number of remote sensing-based models for regional ET estimation incorporate both land [6][7][8][9] and meteorological variables [10][11][12][13]. Validation of existing ET datasets estimated from those models suggests varying performances over different landcovers, climate, and elevation zones [14][15][16][17], with the errors in large regional ET estimates often caused by land surface heterogeneity derived from terrain roughness, landcover, vegetation community, soil context, etc. [18].
Land variables such as landcover, soil moisture content, albedo, vegetation indices, and land surface temperature (LST) are usually derived from single sensor or satellite data sources such as Landsat Thematic Mapper (TM) [6][7][8] or the Moderate Resolution Imaging Spectroradiometer geo-stationary meteorological satellite [69][70][71], as well as atmospheric column total precipitable water and atmospheric profile products from polar satellites [22,72,73]. Long-sequence atmospheric profile retrieval products (such as MYD07 and AIRS) provide new solutions for obtaining atmospheric boundary layer parameters without reliance on ground measurements [37,74]. Use of these data should better represent the spatial characteristics of both the land and meteorological variables controlling surface energy balance and atmospheric turbulence, and thus better quantify fluxes between soil/vegetation and the atmospheric boundary layer.
This paper addresses the above-mentioned challenges and limitations by presenting algorithms to generate continuous ET estimates using multi-source satellite data for key land and meteorological variables. Section 2 describes the study area, the data used, and its processing. Section 3 explains the proposed algorithms. Results and validation are discussed in Section 4 using data for the Heihe River Basin in northern China where a water crisis has emerged over the past two decades. Discussion of the data analysis is provided in Section 5 and conclusions in Section 6.

Study Area
The Heihe River (from 97 • 24 to 102 • 10 E and 37 • 41 to 42 • 42 N) of northwest China forms a closed basin originating in the Qilian mountains and flowing through the Zhangye oasis in its middle reaches, the Gobi Desert, the Ejina oasis downstream, and finally terminating in Juyanhai Lake ( Figure 1). The basin is divided into upstream, middle, and downstream reaches by the Yingluo and Zhengyi gorges. The maximum altitude is approximately 5000 m in the upper reaches where the main types of landcover are alpine meadow and forest, the annual precipitation exceeds 350 mm, and the minimum air temperature approaches −40 • C in winter. The elevation ranges from 2000 to 3000 m in the middle reach where the main types of landcover are the Gobi Desert and oases and the annual precipitation is 100 to 250 mm. It also forms the center of the Hexi corridor of Gansu Province. The elevation is about 1000 m in the downstream area with mostly desert landcover with some oases irrigated from groundwater. The maximum annual precipitation does not exceed 50 mm, and air temperatures reach highs of 40 • C in summer. Over the past few decades, many scientific experiments and projects have been carried out focusing on ecological and hydrological aspects of the Heihe River Basin, including water conservation in irrigation, water rights allocation and other integrated water resources management issues [75][76][77][78]. Therefore, abundant ground-based measurement datasets are available including hydrological, meteorological, water and heat flux, and various eco-environmental surface parameters.

Field Observation Data
Ground-based measurements from six stations were collected from 2008 to 2014 by the WATER [77] and HiWATER [78] projects in the Heihe River Basin. Eddy covariance (EC) towers and automatic weather stations were standard equipment at each station. All of the data is freely available from the National Earth System Science Data Platform of the National Natural Science Foundation of China, including turbulent fluxes, air temperature, air humidity, air pressure, wind speed, net radiation, four radiation components, soil temperature profiles, soil moisture profiles, and soil heat flux plates. The six observation stations are described in Table 1 and detailed information for each station can be found in [77,78].
EdiRe software was used to process the raw EC data, including spike detection, lag correction of H 2 O/CO 2 relative to the vertical wind component, sonic virtual temperature correction, coordinating rotation using the planar fit method, correction for density fluctuation (WPL-correction), and frequency response correction [79][80][81]. Details of the data processing steps for the raw EC data and the automatic weather stations can be found in [79][80][81][82].

Field Observation Data
Ground-based measurements from six stations were collected from 2008 to 2014 by the WATER [77] and HiWATER [78] projects in the Heihe River Basin. Eddy covariance (EC) towers and automatic weather stations were standard equipment at each station. All of the data is freely available from the National Earth System Science Data Platform of the National Natural Science Foundation of China, including turbulent fluxes, air temperature, air humidity, air pressure, wind speed, net radiation, four radiation components, soil temperature profiles, soil moisture profiles, and soil heat flux plates. The six observation stations are described in Table 1 and detailed information for each station can be found in [77,78].
EdiRe software was used to process the raw EC data, including spike detection, lag correction of H2O/CO2 relative to the vertical wind component, sonic virtual temperature correction, coordinating rotation using the planar fit method, correction for density fluctuation (WPL-correction), and frequency response correction [79][80][81]. Details of the data processing steps for the raw EC data and the automatic weather stations can be found in [79][80][81][82].

Remote Sensing Data
The daily MODIS 1B clear-sky data, MYD07 and AIRS, were downloaded from the NASA Goddard Space Flight Center (GSFC) Distributed Active Archive Center (GDAAC), from 2000 to 2014 [4]. The built-in ground control point (GCP) calibration coefficients were used for geometric

Remote Sensing Data
The daily MODIS 1B clear-sky data, MYD07 and AIRS, were downloaded from the NASA Goddard Space Flight Center (GSFC) Distributed Active Archive Center (GDAAC), from 2000 to 2014 [4]. The built-in ground control point (GCP) calibration coefficients were used for geometric correction, and the reflectance or radiance calculated. Atmospheric correction was carried out based on 6S atmospheric radiative transfer models [29,57]. The threshold values method with multiple properties was then used to detected cloud pixels [83]. Daily water-vapor profile data with different layers can be extracted directly from MYD07 and AIRS products. A gap-filling method was used to calculate missing pixel values from the atmospheric profiles data [74].
The daily AMSR-E brightness temperature data at 25 km resolution, from 2000 to 2014, coupled with surface temperature and NDVI data at 1 km resolution from MODIS [4,[83][84][85], were used to estimate daily soil moisture contents.
Fengyun geostationary meteorological satellite (FY-2C/D/E) hourly cloud products, from 2005 to 2014, at 5 km resolution were downloaded from the National Satellite Meteorological Center (NSMC), Fengyun Satellite Data Center.

Meteorological and Radiation Data
Daily data for five national radiation stations (Gangcha, Minqin, Jiuquan, Dunhuang, and Ejin Banner) and twenty-one meteorological stations were collected from the China National Meteorological Remote Sens. 2020, 12, 332

of 23
Bureau from 2000 to 2014 [4]. Radiation data included sunshine hours, solar radiation, net radiation, and reflected radiation. Meteorological data included maximum air temperature, minimum air temperature, average air pressure, average air humidity, average wind speed, and sunshine hours. Two different spatial interpolation methods were used to interpolate these point data to raster data at 1 km resolution for correspondence with the satellite data. Inverse distance squared was used for air temperatures and air pressure based on digital elevation model (DEM) data, while thin plate splines were used for air humidity, wind speed, and sunshine hours [4,86].
Raster rainfall data from 2000 to 2014 were also collected from the China National Meteorological Bureau. This daily precipitation analysis at a resolution of 0.25 • x 0.25 • was produced by the National Meteorological Information Center (NMIC) based on daily precipitation observation from over 2419 gauges across China [87][88][89]. Error analysis showed that approximately 91% of the absolute error was less than 1 mm/d [87].

Algorithms
In order to maximize information obtained from the multi-source satellite data, an integrated set of algorithms was used characterize the land surface, soil, vegetation and atmospheric boundary layer, to quantify fluxes between soil/vegetation and the boundary layer, and to determine the individual components of the surface energy balance as well as some key land and meteorological variables. Aerodynamic roughness length and surface soil heat flux are land variables, vapor pressure deficit (VPD) and the height of the atmospheric boundary layer are meteorological variables, and daily net radiation combines land and meteorological variables. Net radiation, surface-soil heat flux, and sensible heat flux are individual components of the surface energy balance [4,6,9,10,20,39]. Algorithms for the individual components of the surface energy balance and key land and meteorological variables are integrated into the surface energy balance as: λLE is computed as the residue of the energy balance. ET can be converted from λLE based on the temperature dependent latent heat of vaporization and the density of water. Please refer to Appendix A for variable definition for all formulas in this paper.

Daily Net Radiation (R n,daily )
Daily net radiation was estimated from sunshine duration derived from the FY-2C/D/E cloud products from 2005 to 2014 using monthly Ångström coefficients (a s and b s ) calibrated from ground-based meteorological and radiation measurements from 2000 to 2014 derived through spatial interpolation [56][57][58].

Surface Soil Heat Flux (G 0 )
The instantaneous surface soil heat flux is estimated using the following equation, with the shortwave infrared band used to better reflect the influence of soil texture [29].
The daily soil heat flux is obtained from the daily integral of the instantaneous surface soil heat flux, with diurnal variation of the surface temperature simulated with land surface temperatures from multiple temporal satellite passages. A half-order derivative solution of the heat flow equation is then used to estimate the diurnal variation of surface soil heat flux, with the general value of β set at 0.5 [29,30].
Four factors are currently considered for the aerodynamic roughness length z 0m . The calibration of the wind gradient reflects the contribution of various land surface components.
The roughness component of the vegetation condition z v 0m was computed as: The factors b 1 , b 2 , and b 3 are empirically determined coefficients with values of 0.001, 0.5, and 2.5 and NDVI max is the maximum NDVI during the growing season for each year.
The roughness of the terrain is calculated as: The slope was calculated based on the 90 m digital elevation model data from Shuttle Radar Topography Mission (SRTM) aggregated to 1 km spatial resolution. Factors a and b are 4 and 15, respectively, determined empirically.
The aerodynamic roughness component deriving from non-vegetated roughness features such as cities and micro-terrain was incorporated to reflect the influence of the roughness length on momentum movement: where σ 0 represents the radar backscattering coefficient (dB) data acquired from Sentinel-1 data aggregated to 1 km spatial resolution using the nearest neighbor method. The bidirectional reflectance distribution function (BRDF) is linked to changes in the vegetation canopy structure over time [67] and is a linear fitting of the Ross-thick and LiSparseR kernels [68]. The geometric roughness caused by the change of the vegetation canopy structure was incorporated in the aerodynamic roughness length calculation by: where c 1 and c 2 are empirically determined coefficients which equal 10 and 0 here; k 0 is the weighting factor of the isotropic scattering component and can be calculated as the bidirectional reflectance with both the zenith view and solar zenith angles set to 0; and k 2 is the weighting factor from the Li-sparse-reciprocal geometric scattering kernel, obtained based on surface scattering and the shadow casting theory [68].
The variable F 3 in Equation (7) is the environmental constraint factor function using vapor pressure deficit (VPD). The VPD is estimated based on the atmospheric profile product data, NDVI and surface temperature [37]. The detailed equations are: where k 12 is a constant determined by statistical analysis, and k 10 , k 11 are regression coefficients. The variable F 4 in Equation (7) is the atmospheric boundary layer constraint factor function. A sharp decrease in the vertical profile of the water vapor mixing ratio based on atmospheric profiles and the minimum vertical gradient of the water vapor mixing ratio is used to estimate the boundary layer mixing height and retrieving temperature, humidity, and pressure at the height [74].

Land Surface Conditions
The following algorithms were adopted to address specific surface conditions based on recommendations from the World Meteorological Organization (WMO) and other sources: • for the sublimation of snow and ice estimation [91] ET ice = (0.18 • for water surface evaporation [92] ET water • for bare land [49], which was identified using an empirically determined NDVI threshold Remote Sens. 2020, 12, 332 8 of 23

Daily Surface Resistance (r s,daily )
The Penman-Monteith model is commonly used as the gap-filling algorithm for inversing surface resistances and the ET calculation [4,10,93]. Environmental and biophysical parameters have proven adequate for the calculation of daily ET in this algorithm, including minimum air temperature, m(T min ), vapor pressure deficit, m(VPD), smoothed leaf area index, (LAI), soil moisture content, and net radiation, [9,10,59,[94][95][96][97]. Of these, net radiation, soil moisture, and wind speed are more relevant with higher R 2 values [59]. The daily surface resistance (r s,daily ) is extended from neighboring clear days r s : SM daily can be obtained from AMSR-E data.

Results and Validation
The proposed algorithms were integrated into the ETWatch [4], and then daily evapotranspiration dataset from 2000 to 2014 at 1 km resolution across the Heihe River Basin was produced ( Figure 2 and Table 2).

Spatial and Temporal Variations of ET Across the Heihe River Basin
Heihe River Basin is a typical arid inland river basin with an annual precipitation of 101 to 166 mm, averaging 133.9 mm, over the period 2000 to 2014. The average annual ET is 132.9 mm/year over the same period, showing a 99% closure in the overall balance of precipitation and consumption in this closed basin. Annually, this balance varies from −20% to +29%, with a surplus of water in the wet years (2007 and 2010), and insufficient in dry years (2000 and 2004).
The annual ET for irrigated land is more than 422 mm/year, with 93.1 mm/year for the desert. The annual ET is approximately 300 to 600 mm/year over the Ejin Oasis area. The maximum ET is observed at East Juyanhai Lake with an annual ET of more than 1000 mm/year. There is also an inter-annual pattern variability for annual ET (Figure 2). The East Juyanhai Lake is visible every year, but the West Juyanhai Lake was only visible in 2003 and 2004, and again in 2008 and 2010, when rainfall was higher than in other years. Rainfall was highest in 2010 (Table 2), and therefore the West Juyanhai Lake was larger than in other years. There was almost no water available in downstream portions of the river in 2001, since rainfall was only 101.7 mm which was the lowest of the 15 years.
The spatial ET pattern reflects both vegetation cover and climate. In the eastern part of the upstream area, the annual ET at 1 km resolution ranges from 150 to 600 mm/year; the lowest ET is located at the western part of the upstream area which has arid conditions at less than 100 mm/year ( Figure 3). The areas in the eastern region with higher annual ET values have more vegetation cover and more humid conditions. Those central and eastern parts of the upstream area which have lower values of ET are mainly covered with seasonal snow, whereas the western part of the upper reaches (the black areas) are mainly covered with glaciers and permanent snow. The seasonal snow cover completely melts in summer and the ground becomes bare soil with a small ET, and therefore annual sublimation from glaciers and permanent snow in the western part of the upstream area is much higher than that of the seasonal snow in the eastern region. The ET values with large spatial variation found in the western part of the upper reaches are partly due to heterogeneity of the surface landcover. Table 3 shows the validation results for daily evapotranspiration over the 2008 to 2014 period estimated by the proposed algorithms at 1 km resolution and the ground measurements at six EC stations, each with different land surface types, including alpine meadow in the upstream areas; irrigated oases land, and Gobi and desert in the middle reaches; and sparse vegetation (tamarix) in the downstream areas (see Table 1). The correlation coefficients (R) between observed and estimated daily evapotranspiration are 0.84 or greater, indicating a high correlation between observed and estimated daily evapotranspiration. The Nash-Sutcliffe efficiency coefficient (NSE) values are also high (above 0.7). The mean absolute errors (MAE) vary from 0.229 to 0.582 mm/day and the root mean square errors (RMSE) from 0.278 to 0.768 mm/day. These small error values between observed and estimated daily evapotranspiration indicate that the proposed algorithms can accurately estimate daily evapotranspiration.

Validation with In Situ Ground Measurements
The spatial ET pattern reflects both vegetation cover and climate. In the eastern part of the upstream area, the annual ET at 1 km resolution ranges from 150 to 600 mm/year; the lowest ET is located at the western part of the upstream area which has arid conditions at less than 100 mm/year ( Figure 3). The areas in the eastern region with higher annual ET values have more vegetation cover and more humid conditions. Those central and eastern parts of the upstream area which have lower values of ET are mainly covered with seasonal snow, whereas the western part of the upper reaches (the black areas) are mainly covered with glaciers and permanent snow. The seasonal snow cover completely melts in summer and the ground becomes bare soil with a small ET, and therefore annual sublimation from glaciers and permanent snow in the western part of the upstream area is much higher than that of the seasonal snow in the eastern region. The ET values with large spatial variation found in the western part of the upper reaches are partly due to heterogeneity of the surface landcover.  Table 3 shows the validation results for daily evapotranspiration over the 2008 to 2014 period estimated by the proposed algorithms at 1 km resolution and the ground measurements at six EC stations, each with different land surface types, including alpine meadow in the upstream areas; irrigated oases land, and Gobi and desert in the middle reaches; and sparse vegetation (tamarix) in the downstream areas (see Table 1). The correlation coefficients (R) between observed and estimated daily evapotranspiration are 0.84 or greater, indicating a high correlation between observed and estimated daily evapotranspiration. The Nash-Sutcliffe efficiency coefficient (NSE) values are also high (above 0.7). The mean absolute errors (MAE) vary from 0.229 to 0.582 mm/day and the root mean square errors (RMSE) from 0.278 to 0.768 mm/day. These small error values  The Yingke site in the Zhangye oasis in the middle reaches of Heihe River Basin was selected to validate the daily ET estimates. Located in an irrigated area, six irrigation events occurred during the experiment. Figure 4 (left) and Figure 5 show scatter plots of estimated and measured daily ET in 2008. The coefficient of determination (R 2 ) between measured and estimated ET is 0.86, the RMSE is 0.69 mm, and the MAE is 0.49. Figure 5 shows that evapotranspiration decreased gradually before irrigation events as the soil dried up and, then, increased following irrigation, confirming that the proposed algorithms can capture the impact of irrigation on daily evapotranspiration ranging from less than 1 mm/day to as high as >5 mm/day.

Validation with In Situ Ground Measurements
Validation results for monthly ET against EC observations at the six EC sites are presented in Figure 6 and Table 4. The estimated monthly ET is slightly lower than the observed EC at four sites, but more different at the two desert sites, i.e., considerably lower at Huazhaizi and higher at Shenshawo. For Arou, in the highest reach of the basin, the coefficient of determination (R 2 ) is 0.96, the RMSE is 1.09 mm, the mean relative error (MRE) is −8.85%, and the mean absolute percentage error (MAPE) is 13%, indicating a strong relationship between estimated and measured monthly ET. For the Daman and Yingke sites in the midstream oasis area of the Heihe River Basin, and Sidaoqiao in the downstream oasis area, the coefficients of determination between measured and estimated monthly results are 0.95, 0.99, and 0.81, respectively. Desert is the major landcover type in the Shenshawo and Huazhaizi stations and the coefficients of determination are 0.85 and 0.78, respectively. Overall, the coefficients of determination are all relatively high suggesting that the proposed algorithms are able to accurately estimate evapotranspiration. Although there is some variation in this predictive ability, it does not appear to be consistently related to landcover type suggesting that the algorithms themselves are able to adequately account for landcover and related differences. The Yingke site in the Zhangye oasis in the middle reaches of Heihe River Basin was selected to validate the daily ET estimates. Located in an irrigated area, six irrigation events occurred during the experiment. Figure 4 (left) and Figure 5 show scatter plots of estimated and measured daily ET in 2008. The coefficient of determination (R 2 ) between measured and estimated ET is 0.86, the RMSE is 0.69 mm, and the MAE is 0.49. Figure 5 shows that evapotranspiration decreased gradually before irrigation events as the soil dried up and, then, increased following irrigation, confirming that the proposed algorithms can capture the impact of irrigation on daily evapotranspiration ranging from less than 1 mm/day to as high as >5 mm/day.  Validation results for monthly ET against EC observations at the six EC sites are presented in Figure 6 and Table 4. The estimated monthly ET is slightly lower than the observed EC at four sites, but more different at the two desert sites, i.e., considerably lower at Huazhaizi and higher at Shenshawo. For Arou, in the highest reach of the basin, the coefficient of determination (R 2 ) is 0.96, the RMSE is 1.09 mm, the mean relative error (MRE) is -8.85%, and the mean absolute percentage error (MAPE) is 13%, indicating a strong relationship between estimated and measured monthly ET. For the Daman and Yingke sites in the midstream oasis area of the Heihe River Basin, and Sidaoqiao in the downstream oasis area, the coefficients of determination between measured and estimated monthly results are 0.95, 0.99, and 0.81, respectively. Desert is the major landcover type in the Shenshawo and Huazhaizi stations and the coefficients of determination are 0.85 and 0.78, respectively. Overall, the coefficients of determination are all relatively high suggesting that the proposed algorithms are able to accurately estimate evapotranspiration. Although there is some variation in this predictive ability, it does not appear to be consistently related to landcover type suggesting that the algorithms themselves are able to adequately account for landcover and related differences.  Figure 7 compares the overall monthly ET estimates from the proposed algorithms and the observed values for all six stations combined, which have different climates, soil environments, and rainfall conditions. Again, the predictive ability is good between the proposed algorithms and the EC measurements. The root mean square error is 0.80 mm/month, the average relative error is −7.11%, the mean absolute percentage error is 16.41%, and the coefficient of determination (R 2 ) is 0.9425.   Figure 7 compares the overall monthly ET estimates from the proposed algorithms and the observed values for all six stations combined, which have different climates, soil environments, and rainfall conditions. Again, the predictive ability is good between the proposed algorithms and the EC measurements. The root mean square error is 0.80 mm/month, the average relative error is -7.11%, the mean absolute percentage error is 16.41%, and the coefficient of determination (R 2 ) is 0.9425.

Comparison with Water Balance Estimates
The Heihe River Basin is a closed river basin without outflows, which implies mean annual evapotranspiration can be assumed to equal mean annual precipitation. As noted previously (Section 4.1), there is only a 1% difference between average annual evapotranspiration and rainfall, indicating there is good balance over the 15-year study period and implying there is about 1.0 mm of rainfall available per year on average to recharge groundwater ( Table 2). The maximum surplus was 29% in 2007 when rainfall was relatively high (163.7 mm), and the greatest deficit was 20% in 2007 when rainfall was only 106.3 mm.
Palugou, a small closed watershed located in the upstream portion of the Heihe River Basin, was used to further investigate water balance. If there is no change in catchment water storage, the difference between rainfall and runoff can be taken as the actual evapotranspiration over the entire catchment. Therefore, water balance data between 2012 and 2013 were obtained for the Palugou watershed ( Table 5). The overall deviation between estimated ET and the water balance estimates of    Figure 7 compares the overall monthly ET estimates from the proposed algorithms and the observed values for all six stations combined, which have different climates, soil environments, and rainfall conditions. Again, the predictive ability is good between the proposed algorithms and the EC measurements. The root mean square error is 0.80 mm/month, the average relative error is -7.11%, the mean absolute percentage error is 16.41%, and the coefficient of determination (R 2 ) is 0.9425.

Comparison with Water Balance Estimates
The Heihe River Basin is a closed river basin without outflows, which implies mean annual evapotranspiration can be assumed to equal mean annual precipitation. As noted previously (Section 4.1), there is only a 1% difference between average annual evapotranspiration and rainfall, indicating there is good balance over the 15-year study period and implying there is about 1.0 mm of rainfall available per year on average to recharge groundwater ( Table 2). The maximum surplus was 29% in 2007 when rainfall was relatively high (163.7 mm), and the greatest deficit was 20% in 2007 when rainfall was only 106.3 mm.
Palugou, a small closed watershed located in the upstream portion of the Heihe River Basin, was used to further investigate water balance. If there is no change in catchment water storage, the difference between rainfall and runoff can be taken as the actual evapotranspiration over the entire catchment. Therefore, water balance data between 2012 and 2013 were obtained for the Palugou watershed ( Table 5). The overall deviation between estimated ET and the water balance estimates of

Comparison with Water Balance Estimates
The Heihe River Basin is a closed river basin without outflows, which implies mean annual evapotranspiration can be assumed to equal mean annual precipitation. As noted previously (Section 4.1), there is only a 1% difference between average annual evapotranspiration and rainfall, indicating there is good balance over the 15-year study period and implying there is about 1.0 mm of rainfall available per year on average to recharge groundwater ( Table 2). The maximum surplus was 29% in 2007 when rainfall was relatively high (163.7 mm), and the greatest deficit was 20% in 2007 when rainfall was only 106.3 mm.
Palugou, a small closed watershed located in the upstream portion of the Heihe River Basin, was used to further investigate water balance. If there is no change in catchment water storage, the difference between rainfall and runoff can be taken as the actual evapotranspiration over the entire catchment. Therefore, water balance data between 2012 and 2013 were obtained for the Palugou watershed ( Table 5). The overall deviation between estimated ET and the water balance estimates of ET from 2012 to 2013 was 9.44%, which is reasonably small, indicating that the proposed algorithms produced an accurate estimate of catchment-scale evapotranspiration.

Discussion
This paper presents a set of algorithms for estimating regional actual evapotranspiration using key land and meteorological variables derived from multiple satellite data sources. These data characterize the land surface, soil, vegetation and atmospheric boundary layer, and quantify fluxes between soil/vegetation and the boundary layer. This algorithm scheme takes advantage of multi-sourced satellite datasets to obtain more accurate information for key land and meteorological variables, particularly those that best represent on a region-wide basis the land surface and climate characteristics that are most relevant for evapotranspiration.

Validation of ET Estimates
This algorithm scheme was tested in a typical inland river basin, the Heihe River Basin of northwest China, where a high density of flux towers was available to provide validation data. The flux stations used in this study represent different climatic and landcover conditions, and good agreement was obtained between modeled and measured evapotranspiration. The results ( Figure 6) suggest that, collectively, the algorithms accurately represent the surface energy balance components and atmospheric characteristics that are directly relevant to evapotranspiration. Good agreement between 1 km resolution ET estimates and the flux data was obtained, with RMSE and MRE values of 0.80 mm and −7.11%, respectively.
Previous studies have validated the individual components of energy balance, as well as some key land and meteorological variables, using independent ground observations. The validation of net radiation had a R 2 of 0.85 and the averaged Nash-Sutcliffe equation was 0.84 in research by Wu et al. [57]. The R 2 of surface soil heat flux was 0.96 or better and the index of agreement was more than 0.98 under different underlying surface types in research by Zhu et al. [29]. Sensible heat flux, validated using data from a large aperture scintillometer (LAS), had RMSE and MAE values less than 12.08 w/m 2 and 9.90 w/m 2 , respectively, again under different underlying surface types, in research by Zhuang et al. [90].
For this algorithm scheme, only a few parameters need to be recalibrated if new data are used or different land surface types or climate systems are involved. The net radiation algorithm needs to be recalibrated when using cloud products from other geostationary satellites such as GOES or MSG for the western hemisphere. The boundary layer and VPD also need to be recalibrated for study areas with different climatic systems. This is also the case for soil heat flux if soil textures differ.
The estimated ET slightly underestimated measured ET (Figures 4, 5 and 7). Some of this can result from the differing formats of the estimated and observed ET values. Estimated ET is a pixel value averaged for a 1 km x 1 km area [4] which comprises a mix of many different landcovers such as bare soil, vegetation, water, roads, and buildings. Observed EC comprises a point value at a meteorological station which likely has a range of only ten to a few hundred meters. This mismatch in scales between the remote sensing pixel and flux station measurements is especially problematic for areas with heterogeneous land use [4,57,58].
The algorithms scheme was also validated in several other arid and semi-arid regions in China using the parameters derived from the Heihe River Basin. An ET dataset for the Ebinur Lake basin, derived from the proposed algorithms scheme, was used to assess ecosystem restoration under various scenarios [98]. Monthly ET, validated from July 2014 to August 2015 with the EC data, had an adjusted R 2 of 0.95 and a RMSE of 3.59 mm/month, indicating a strong correlation between the measured and estimated monthly ET. An ET dataset for the Turpan Depression, also derived from the proposed algorithms scheme, was used to assess a water-saving project [99]. Validation based on EC flux station observation data from 2012 to 2016 had an R 2 value of 0.85 and an RMSE of 20.8 mm/month. An ET dataset from the North China Plain was used to crosscheck the estimated ET against a triangle model [100]. It showed that the triangle ET and ET estimated from the proposed algorithms can provide accurate daily and monthly estimates of ET at a pixel by pixel scale. The same ET dataset was also used to validate simulated evapotranspiration from a coupled surface water-groundwater model using the MIKE SHE code for the alluvial North China Plain [101]. An R 2 value of 0.93 for the entire plain again demonstrated good performance from the proposed algorithms.
Further validation is provided by comparing the ET estimates derived here with the MOD16 and SSEBop satellite-derived ET products. Figure 8 relates overall ET from the algorithm scheme proposed in this paper, the MOD16 ET product [10,11], and the SSEBop ET product [102,103], all at 1 km resolution, to the measured ET at five stations variously located in the upstream, middle, and downstream reaches of the Heihe River Basin. The observations are monthly from 2008 to 2014. The proposed algorithms have the smallest RMSE and MRE values and the highest coefficient of determination as compared with those for the MOD16 and SSEBop data. derived from the proposed algorithms scheme, was used to assess ecosystem restoration under various scenarios [98]. Monthly ET, validated from July 2014 to August 2015 with the EC data, had an adjusted R 2 of 0.95 and a RMSE of 3.59 mm/month, indicating a strong correlation between the measured and estimated monthly ET. An ET dataset for the Turpan Depression, also derived from the proposed algorithms scheme, was used to assess a water-saving project [99]. Validation based on EC flux station observation data from 2012 to 2016 had an R 2 value of 0.85 and an RMSE of 20.8 mm/month. An ET dataset from the North China Plain was used to crosscheck the estimated ET against a triangle model [100]. It showed that the triangle ET and ET estimated from the proposed algorithms can provide accurate daily and monthly estimates of ET at a pixel by pixel scale. The same ET dataset was also used to validate simulated evapotranspiration from a coupled surface water-groundwater model using the MIKE SHE code for the alluvial North China Plain [101]. An R 2 value of 0.93 for the entire plain again demonstrated good performance from the proposed algorithms. Further validation is provided by comparing the ET estimates derived here with the MOD16 and SSEBop satellite-derived ET products. Figure 8 relates overall ET from the algorithm scheme proposed in this paper, the MOD16 ET product [10,11], and the SSEBop ET product [102,103], all at 1 km resolution, to the measured ET at five stations variously located in the upstream, middle, and downstream reaches of the Heihe River Basin. The observations are monthly from 2008 to 2014. The proposed algorithms have the smallest RMSE and MRE values and the highest coefficient of determination as compared with those for the MOD16 and SSEBop data.

Water Balance and Related Evaluations
The performance of the algorithm scheme at a regional scale can also be assessed by a comparison with mean annual evapotranspiration from the water balance. Since the water balance does not depend upon the energy balance and is independent of remotely sensed ET, this comparison is more persuasive on a regional scale. The results in Table 5 show a deviation of 9.44% between the water balance estimate and the prediction from the algorithm scheme. This is encouraging but the data covers a relatively short time span. Additional time is required to verify further the remotely sensed ET data.
Previously published research also helps to verify the algorithm scheme proposed here. Gao and Yang [104] and Qin and Yang [105] developed a geomorphology-based ecohydrological model (GBEHM) to simulate the hydrological processes associated with vegetation dynamics in the upper Heihe River Basin. They showed that annual average ET between the model simulation results and ET from the proposed algorithms are very similar with 311 mm for the model simulation and 307 mm for the proposed algorithms. Tian and Zheng [106] proposed an integrated surface water-groundwater model (GSFLOW) for the middle and lower reaches of the Heihe River Basin. Compared to the monthly basin-wide ET estimated by the GSFLOW model, the ET estimated from the algorithms proposed here matched fairly well, However the GSFLOW model provided higher ET estimates for farmlands than the proposed algorithm scheme.

Water Balance and Related Evaluations
The performance of the algorithm scheme at a regional scale can also be assessed by a comparison with mean annual evapotranspiration from the water balance. Since the water balance does not depend upon the energy balance and is independent of remotely sensed ET, this comparison is more persuasive on a regional scale. The results in Table 5 show a deviation of 9.44% between the water balance estimate and the prediction from the algorithm scheme. This is encouraging but the data covers a relatively short time span. Additional time is required to verify further the remotely sensed ET data.
Previously published research also helps to verify the algorithm scheme proposed here. Gao and Yang [104] and Qin and Yang [105] developed a geomorphology-based ecohydrological model (GBEHM) to simulate the hydrological processes associated with vegetation dynamics in the upper Heihe River Basin. They showed that annual average ET between the model simulation results and ET from the proposed algorithms are very similar with 311 mm for the model simulation and 307 mm for the proposed algorithms. Tian and Zheng [106] proposed an integrated surface water-groundwater model (GSFLOW) for the middle and lower reaches of the Heihe River Basin. Compared to the monthly basin-wide ET estimated by the GSFLOW model, the ET estimated from the algorithms proposed here matched fairly well, However the GSFLOW model provided higher ET estimates for farmlands than the proposed algorithm scheme.
The 1% water balance deviation reported in Table 2 for the Heihe River Basin overall from 2000 to 2014 is encouraging in that there appears to be no significant ground water overdraft. The most important water issue in the basin is the unevenly temporal and spatial water distribution and consumption between basin reaches [107][108][109]. The upper reach is the runoff producing area, the middle and lower reaches are the major water consumption areas. The middle reaches of the basin have experienced an increase in population and a large expansion of irrigated agriculture leading to a higher consumption of water than allowed by the water division agreement between the middle and lower reaches [108]. This has resulted in water availability problems in the lower reaches of the basin [81], and less inflow into its terminal lake. In fact, in some dry years Lake Juyanhai has disappeared (see Figure 2) [107][108][109].

Use of Multi-Source Satellite Data in the Algorithm Scheme
Multi-source satellite data are used here to estimate the key land and meteorological variables and energy fluxes. This reduces the dependence on spatial interpolation of meteorological variables that can be problematic, especially in data-sparse regions. As a consequence, evapotranspiration estimation accuracy can be greatly improved. Figure 4 indicates that the daily ET at Yinke is changed from overestimated to slightly underestimated by using satellite rather than interpolated ground data, and the coefficient of determination is increased by nearly eight percentage points. Similarly, in Figure 5 estimated daily ET derived solely from satellite data is closer to observed ET (measured by EC) than using atmospheric variables derived from interpolation of meteorological measurements between 1 January 2018 and 31 December 2018.
Although each satellite-derived variable was separately incorporated when developing the algorithm scheme, no systematic inclusion and exclusion process was undertaken to test the independent contribution of each to improve estimation accuracy. However, there are good reasons to believe that several variables are particularly helpful. For example, to estimate sunshine radiation for daily net radiation, the hourly cloud product from a low-resolution geostationary satellite was used to capture cloud changes over time, which more appropriately reflects the spatiotemporal change of sunshine duration. This method better depicts the spatial and temporal distribution of surface shortwave radiation, as well as the net radiation [57,58].
Atmospheric boundary variables (such as height, temperature, humidity, and pressure at boundary layer height) and near surface VPD are also important and sensitive variables in ET calculation. Up to a certain point, the higher the boundary layer, the larger the difference between surface temperature and boundary layer temperature, resulting in a smaller ET due to the inverse relationship between the ET and this difference. Therefore, an accurate estimation of atmospheric boundary layer variables is important to improve ET estimation performance. However, most existing models assume that the height corresponding to 850 hPa of atmospheric pressure is the height of the atmospheric boundary layer [4]. This value is usually obtained from weather balloon data from sounding stations. Generally, there is a limited number of sounding stations in a region and, consequently, it is hard to reflect the spatial heterogeneity of the atmospheric boundary layer. Furthermore, the weather balloons only observe the boundary layer at 24:00 and 12:00. A sine function should be used to adjust the air temperature in the boundary layer from 12:00 to 13:30 (the MODIS/Aqua overpass time). However, the variation of atmospheric layer temperature does not completely follow a sine function, and therefore this conversion method can generate additional errors for the atmospheric boundary layer variables. Our algorithm scheme uses daily MODIS moisture profile products (MOD07 and AIRS) which have more than a hundred layers of atmospheric temperature and water-vapor profile data. These are used to directly obtain the spatial distribution of the atmospheric boundary layer variables [74], which also solves the problem of time inconsistency. This provides a solution for regional atmospheric boundary layer variable estimation without reliance on ground measurements.
Aerodynamic roughness length is another variable that has a strong impact on the estimation of ET. Aerodynamic roughness length is strongly influenced by the geometric characteristics of the roughness elements (buildings, roads, vegetation canopy, hills, etc.) of urban and rural surfaces. This can lead to large uncertainties in ET estimates. Currently, there are major limitations to a purely empirical aerodynamic roughness length calculation based on either landcover data or just roughness height. For vegetated areas, the height of vegetation and differences in vegetation structure are not well reflected by NDVI. The BRDF data from the MODIS satellite provides a potentially superior solution by making full use of optical, infrared, topographic, and micro-topographic data for calculation of the aerodynamic roughness length.
In addition, to better reflect the influence of soil texture on surface soil heat flux variation, shortwave infrared reflectance data were obtained for the more than 200 days each year with an average cloud coverage less than 30% and were used to estimate the surface soil heat flux.
NDVI data were used to estimate aerodynamic roughness length, vapor pressure deficit, and daily soil moisture content. Since NDVI is sensitive to soil background reflectance at low fractional cover and saturates at moderate to high fractional cover, fractional vegetation cover can provide substantially more information for surface parameters and ET estimation [110]. VPD is an important parameter of sensible and latent heat estimation at the soil-vegetation-atmosphere interface. However, recent research shows that climatic water deficit (CWD) better reflects vegetation water stress and produces good results for predicting plant stress and mortality, especially in arid and semi-arid regions [111]. Both the fractional vegetation cover and CWD should be explored to further develop the ET algorithms in order to reduce the influence of parameter uncertainty on ET estimation.

Conclusions
In this study, we present a scheme for regional evapotranspiration estimation using land and meteorological variables primarily derived from multi-source satellite datasets, and validate these estimates using field measurements. The multi-source satellite data used in the proposed algorithm scheme better depict regional spatial patterns and temporal characteristics than more traditional ground-based observations since they provide more accurate estimates of the surface parameters which are relevant to evapotranspiration processes.
The proposed algorithms, integrated into the ETWatch, accurately estimate daily and monthly evapotranspiration from areas with different climatic and surface characteristics which is encouraging because these areas represent alpine meadow, agriculture, and Gobi Desert with very different evapotranspiration characteristics. The methodology was validated with data collected from 2008 to 2014 across a typical inland river basin, the Heihe River Basin in the northwest of China. The estimated ET from 2008 to 2014 in the Heihe River Basin at a spatial resolution of 1 km were evaluated against different eddy covariance (EC) flux observations at the local scale and compared with the annual precipitation at the regional scale. The overall deviations for the monthly ET data as compared with ground observation data were small with RMSE and MRE of 0.80 mm and −7.11%, respectively.
It can be concluded that the proposed algorithms are reliable for estimating regional evapotranspiration under complex surface conditions. Indeed, the evapotranspiration datasets developed in this study have already demonstrated their utility. They have been used by more than 50 research groups to carry out water consumption structure analysis and irrigation management in oases in the middle reaches of the Heihe River Basin, as well as water resources estimation and evaluation, eco-hydrological process research, and studies of sustainable water resource utilization. Future studies should focus on data covering more climatic systems and land surface conditions, as well as the comparison of remote sensing estimates of regional evapotranspiration or global evapotranspiration with simulation results from distributed hydrological models.