A Comparison of SSEBop-Model-Based Evapotranspiration with Eight Evapotranspiration Products in the Yellow River Basin, China

Accurate evapotranspiration (ET) estimation is important in understanding the hydrological cycle and improving water resource management. The operational simplified surface energy balance (SSEBop) model can be set up quickly for the routine monitoring of ET. Several studies have suggested that the SSEBop model, which can simulate ET, has performed inconsistently across the United States. There are few detailed studies on the evaluation of ET simulated by SSEBop in other regions. To explore the potential and application scope of the SSEBop model, more evaluation of the ET simulated by SSEBop is clearly needed. We calculated the SSEBop-model-based ET (ETSSEBopYRB) with land surface temperature product of MOD11A2 and climate variables as inputs for the Yellow River Basin (YRB), China. We also compared the ETSSEBopYRB with eight coarse resolution ET products, including China ETMTE, produced using the upscaling energy flux method; China ETCR, which is generated using the non-linear complementary relationship model; three global products based on the Penman–Monteith logic (ETPMLv2, ETMODIS, and ETBESS), two global ET products based on the surface energy balance (ETSEBS, ETSSEBopGlo), and integrated ET products based on the Bayesian model averaging method (ETGLASS), using the annual ET data derived from the water balance method (WB-ET) for fourteen catchments. We found that ETSSEBopYRB and the other eight ET products were able to explain 23 to 52% of the variability in the water balance ET for fourteen small catchments in the YRB. ETSSEBopYRB had a better agreement with WB-ET than ETSEBS, ETMODIS, ETCR, and ETGLASS, with lower RMSE (88.3 mm yr−1 vs. 121.7 mm yr−1), higher R2 (0.49 vs. 0.43), and lower absolute RPE (−3.3% vs. –19.9%) values for the years 2003–2015. We also found that the uncertainties of the spatial patterns of the average annual ET values and the ET trends were still large for different ET products. Third, we found that the free global ET product derived from the SSEBop model (ETSSEBopGlo) highly underestimated the annual total ET trend for the YRB. The poor performance of the land surface temperature product of MOD11A2 in 2015 caused the large ETSSEBopYRB uncertainty at eight-day and monthly scales. Further evaluation of ET based on the SSEBop model for site measurements is needed.

representative concentration pathway scenarios during the period of 2021-2050 [41]. Xiong et al. (2019) adopted the three-temperature model to estimate the ET in a heterogeneous oasis in northwestern China, and the mean absolute percent error values of the ET ranged from 9% to 25% over different landscapes [37]. The validation of ET derived from SSEBop in the semiarid basin of the Yellow River Basin (YRB) of China has not been done. A series of ecological restoration projects (e.g., Grain to Green Program) have been implemented since 2000 in the YRB, which have altered the hydrological cycle [42]. The long-term accurate estimation of ET in the YRB will help the government evaluate the results of the ecological restoration projects, assess the water resource carrying capacity, and adjust the future ecological recovery strategy.
Therefore, the main objective of this study is to evaluate the SSEBop model in the YRB (hereafter ETSSEBopYRB). The ETSSEBopYRB will be evaluated using water balance ET across the YRB. In addition, we compare the SSEBop ET performance against eight widely used coarse resolution ET products (500m ~0.1°). Finally, the uncertainty of ETSSEBopYRB is discussed from the perspective of the model input, model parameters, and validation method. This research is expected to reduce regional ET uncertainty.

Study Area
The Yellow River is the main source of water in northwestern and northern China. The YRB, a semiarid basin, is located between 95°E -119°E and 32°N -42°N, covering an area of approximately 7.95 × 10 5 km 2 ( Figure 1). The annual precipitation ranges from 140 mm yr −1 in the north to 1100 mm yr −1 in the east of the YRB, with an average value of 495.6 mm yr −1 . Precipitation is concentrated in the summer months. The irrigation agriculture is developed, and there are four major irrigation areas along the Yellow River. In the midstream of the region, the ecological environment is fragile and land degradation is severe [43]. A series of ecological restoration projects have been implemented, such as nature reserves, the Three-North Shelterbelt, and the Grain to Green Program, of which the area covered by the Grain to Green Program is the largest among all of the ecological restoration programs [44].

MODIS Data
Sixteen-day interval normalized vegetation index (NDVI) images (MOD13A2 and MYD13A2 version 6 (V6) products) and 8-day interval land surface temperature (LST) images (MOD11A2 and MYD11A2 version6 (V6) products) with a spatial resolution of 1 km covering the whole YRB were downloaded from the Level-1 and Atmosphere Archive & Distribution System (LAADS) Distributed Active Archive Center (DAAC) website (https://ladsweb.modaps.eosdis.nasa.gov/). MOD11A2 and MOD13A2 were produced from the Aqua satellite images during the years 2003-2015, while MYD11A2 and MYD13A2 were produced from Terra satellite images during the years 2003-2015. To simplify processing, we defaulted the 16-day NDVI values in MOD13A2 and MYD13A2, which occur in the median day. Over the short time period, there are small changes in NDVI. "Day of year VI pixel" provides the date of the NDVI value, which appears within 16 days. Therefore, we can avoid large errors by setting the date when the 16-day interval NDVI value appears in the middle, and do not use "day of year VI pixel". "Quality reliability of VI pixel" provides the quality of the pixel. In time series, we first used the Savitzky-Golay filtering to process the 23-period NDVI to obtain a new NDVI time series. For the new NDVI time series, we replaced the pixel values of the corresponding positions in the new time series with the high-quality pixel values of the original time series. The time resolution doubling algorithm [45] was used to fuse MOD13A2 and MYD13A2 during the years 2003-2015 to produce 8-day interval NDVI products. The missing values in MOD11A2 and MYD11A2 were filled by the adaptive window method [46], which constructs the relationship between the valid LST and NDVI, temperature, and elevation ( LST ∼ NDVI + Temperature + elevation), and uses the relationship to calculate the no data values in LST.

Meteorological Data
Daily meteorological data, including the atmospheric pressure (pa), temperature ( • C) at a height of 2m (maximum temperature ( • C), minimum temperature ( • C)and mean temperature ( • C)), wind speed (m s −1 ) at a height of 10 m, sunshine hours (hour), and relative humidity (%) from 2003 to 2015, were collected from the website of the National Meteorological Administration of China (http://data.cma.cn). The daily meteorological data were subjected to a good quality control via the website of the National Meteorological Administration of China. The quality control method includes: (1) valid range check; (2) extreme inspection of stations; (3) internal consistency check between the timing value, daily average value, and daily extreme value; (4) time consistency check; and (5) spatial consistency check. The total meteorological station number of sites is 163, distributed inside and around the YRB (Figure 1). The daily weather data were aggregated into an 8-day interval weather dataset using an averaging method. Combining climatologic and topographic data, the point climate data were interpolated at a 1-km resolution covering the whole YRB, using the thin-plate smoothing spline method provided in the ANUSPLIN 3.1 program [47][48][49]. The interpolated climate raster data were used to calculate reference evapotranspiration by the Penman-Monteith equation [50].

ET Products
Given the scale of the basins in this paper, eight ET products with spatial resolutions higher than 0.1 • (listed in Table 1) were collected to compare with ET SSEBopYRB . The algorithms used to produce eight ET products cover the mainstream ET estimation methods. The information about the time period, spatial resolution, data source, and spatial coverage of each ET product is listed in Table 1. Since the common time period for the ET products selected in this paper was 2003-2015, we took 2003-2015 as the time period for ET evaluation. The ET MTE is a china ET product that is produced by the upscaling method of a model tree ensemble with four kinds of data inputs: (1) remote sensing data of NDVI 3 g; (2) the eddy covariance technique observing ET data at 36 sites covering forest, shrub-land, grassland, wetland, and cropland; (3) the climate data for surface air temperature, precipitation, incoming solar radiation, and relative humidity; and (4) the vegetation distribution map [34]. The validation results showed that this product can capture ET well for nine different vegetation types. Compared with the global data-driven ET product [12], this product is less uncertain in China due to the use of more China flux tower data.
The ET CR is the China ET product that is estimated using the calibration-free nonlinear complementary relationship (CR) model, with climate variables (air and dew-point temperatures at a height of 2 m, wind speed at a height of 10 m, downward shortwave and downward longwave solar radiation) and remote sensing data (monthly land surface temperature, longwave broadband emissivity, and land surface albedo from MODIS) inputs [51]. Validation results from the 13 eddy covariance measurements and water-balance-derived ET for 10 major river basins showed that this product was reliable in China [51].
The ET MODIS is the global ET product that is based on the Penman-Monteith equation [52]. It is composed of information for the vegetation transpiration, soil evaporation, and evaporation from the wet canopy surface. The data inputs of ET MODIS are land cover (MOD12Q1), FPAR/Leaf area index (MOD15A2H), albedo (MOD43C1), and global Modern-Era Retrospective Analysis for Research and Applications (MERRA) meteorological data provided by the NASA Global Modeling and Assimilation Office (GMAO). This product has been applied in many regions [2,53]. Some studies have reported the shortcomings of ET MODIS , such as underestimating the ET of farmland [31,54].
The ET BESS is the global ET product that is produced using the simplified process-based model of the breathing earth system simulator (BESS), which can couple atmosphere and canopy radiative transfers, photosynthesis, and ET. The BESS adopts a quadratic form of the Penman-Monteith (PM) equation to calculate the two-leaf canopy latent heat flux [6]. Seven MODIS atmosphere (collection 6 MOD04_L2 aerosol, MOD06_L2 cloud, and MOD07_L2 atmosphere profile) and land (collection 5 MCD12Q1, MOD11A1, MOD06_L2, MCD15A2, and MCD43B3) products, four other satellite datasets, four reanalysis datasets, and three ancillary datasets were the input data for ET BESS . At the site scale, ET BESS agreed with FLUXNET observations, with R 2 = 0.67, and captured the majority of the seasonal variability [55].
ET PMLV2 is the global ET product that is based on the Penman-Monteith -Leuning model version 2. The main focus of the PM-based method is the accurate estimation of surface conductance (Gs), which describes the canopy-soil conductance to the water flux. Leuning et al. (2008) [56] and Zhang et al. (2010) [57] developed the biophysical model for Gs to account for canopy physiological processes and soil evaporation (resulting in PMLv1). Gan et al. (2018) [58] coupled vegetation transpiration with gross primary productivity using a biophysical canopy conductance (Gc) model in the PML model (resulting in PMLv2). Zhang et al. applied the PMlv2 model [59] at the global scale with 0.25 • resolution and 3-h Global Land Data Assimilation System (GLDAS-2.1 meteorological data, monthly atmospheric (CO2) concentration, LAI (MCD15A3H), albedo (MCD43A3), and surface emissivity data (MOD11A2) inputs. The ET PMLV2 method is well-calibrated against 8-day EC ET at 95 widely distributed flux towers for ten plant functional types [59].
Two global ET products based on the surface energy balance method were released by   [17] and Chen et al. (2019) [60]. The former is based on the SSEBop model (hereafter ET SSEBopGlo ) with the inputs of model-assimilated weather datasets and MODIS thermal images, while the latter is based on the surface energy balance system (hereafter ET SEBS ) with a column canopy-air turbulent diffusion method [60].
Considering that a single process-based algorithm cannot characterize global ET well, the Bayesian model averaging method was used to merge five process-based algorithms (hereafter ET GLASS ), including the MODIS latent heat flux product algorithm, the revised remote-sensing-based PM LE algorithm [61], the Priestley-Taylor-based LE algorithm [62], the modified satellite-based Priestley-Taylor LE algorithm [63], and the semiempirical Penman LE algorithm [64], to reduce the ET estimate uncertainty [65]. The production process of ET GLASS is based on the Global Land-Surface Satellite (GLASS) data.

Regional Database for ET Model Evaluation
Annual ET evaluation data from 2003 to 2015 were derived from the water balance approach (annual ET is the difference in annual precipitation and annual streamflow, ignoring the soil water storage change). Because the streamflow of the fourteen catchments ( Figure 1) is dominated by surface runoff, annual changes of soil water and groundwater are not likely to be large [39]. Therefore, ignoring the soil water storage change does not cause large water balance ET uncertainly in these catchments. Water balance ET (hereafter WB-ET) is an ordinary practice and a reliable method used to estimate basin-scale ET when streamflow and precipitation are accessible [10]. The annual streamflow of each hydrological site came from the Chinese Annual Hydrographic Yearbook of the Yellow River Basin (http://www.mwr.gov.cn/). The information on the 14 hydrological sites (total 182 samples) used for ET evaluation is listed in Table 2. The spatial distribution of the sites and the watershed extent are shown in Figure 1. The watershed boundaries are from previous studies [39,66]. The annual precipitation dataset was produced using ANUSPLIN3.1 to interpolate 507 precipitation points within and around the YRB ( Figure A4). In addition, we also collected three other precipitation datasets, including Tropical Rainfall Measuring Mission satellite precipitation data (TRMM3b42), which were derived from remote sensing satellite [67], reanalysis data of Climatic Research Unit (CRU) [68], as well as the precipitation product of the China meteorological forcing dataset (CMFD) [69], in order to analyze the uncertainty of WB-ET.  Figure 2 shows the framework for evaluating ET SSEBopYRB , which includes the input and output of the SSEBop model, evaluation of ET SSEBopYRB with water balance ET, comparison between ET SSEBopYRB and available eight ET products, and the uncertainty analysis of ET SSEBopYRB .  Figure 2 shows the framework for evaluating ETSSEBopYRB, which includes the input and output of the SSEBop model, evaluation of ETSSEBopYRB with water balance ET, comparison between ETSSEBopYRB and available eight ET products, and the uncertainty analysis of ETSSEBopYRB.

SSEBop Model
In previous surface energy balance methods (e.g., SEBAL), the ET is the residual of the energy balance terms (λET = − − , where H is the sensible heat (MJ m −2 day −1 ), G is the soil heat (MJ m −2 day −1 ), Rn is the net radiation (MJ m −2 day −1 ), and λ is the latent heat of evaporation of water). Among the four variables, the calculation process of H is complicated and especially depends on the setting of cold pixels and hot pixels. The SSEBop model assumes that the surface energy balance process is mostly driven by the available net radiation (Rn), and a decline in ET due to water stress and other factors can be quantified by differences in land surface temperature [8]. SSEBop directly estimates pixel-based ET by multiplying the ET fraction ( ) and reference ET ( ).
where is a coefficient that scales the grass reference ET into the level of a maximum ET experienced by an aerodynamically rougher crop; is the grass reference ET (mm day −1 ), which is calculated using the revised PM equation in YRB [50,70].

SSEBop Model
In previous surface energy balance methods (e.g., SEBAL), the ET is the residual of the energy balance terms (λET = R n − H − G, where H is the sensible heat (MJ m −2 day −1 ), G is the soil heat (MJ m −2 day −1 ), Rn is the net radiation (MJ m −2 day −1 ), and λ is the latent heat of evaporation of water). Among the four variables, the calculation process of H is complicated and especially depends on the setting of cold pixels and hot pixels. The SSEBop model assumes that the surface energy balance process is mostly driven by the available net radiation (Rn), and a decline in ET due to water stress and other factors can be quantified by differences in land surface temperature [8]. SSEBop directly estimates pixel-based ET by multiplying the ET fraction (ET f ) and reference ET (ET o ).
where k max is a coefficient that scales the grass reference ET into the level of a maximum ET experienced by an aerodynamically rougher crop; ET o is the grass reference ET (mm day −1 ), which is calculated using the revised PM equation in YRB [50,70].
where ∆ is the slope of the vapor pressure curve (kPa • C −1 ), γ is the psychrometric constant (kPa • C −1 ), Tem is the mean daily air temperature ( • C) at a height of 2 m, u 2 is the wind speed at a height of 2 m (m s −1 ), and VPD is the water vapor pressure difference (kPa). ET f is calculated as follows: where T s is the eight-day average satellite-observed LST of the pixel (K), T h is the estimated T s at the idealized reference "hot-dry" limit of the same pixel for the same time period, and T c is the estimated T s at the idealized "cold-wet" limit of the same pixel. The minimum ET f is set to 0.09, the maximum ET f is 1.05, and the ET f exceeding 1.3 is considered to be a pixel with cloud pollution and is set to "no data". The "no data" is filled through temporal interpolation [17]: where T a is the eight-day average near-surface maximum air temperature (K), c is a correction factor that relates T a to T s_cold ; T s_cold is the eight-day average LST on a well-watered, fully transpiring vegetation surface; T s_cold is obtained on all pixels where NDVI is greater than or equal to 0.8. If no pixels in an image in a certain period meet the NDVI conditions, the c value of that image is the median value of the effective c value within one year: where dT is a predefined temperature difference (K) between T h and T c for each pixel. The minimum value of dT is recommended as 6 K, which can reduce the ET uncertainty in cold climates [17]: where r ah is the aerodynamic resistance to heat flow from a hypothetical bare and dry surface (110 s m −1 ), ρ a is the air density (kg m −3 ), C p is the specific heat of the air at constant pressure (~1.013 kJ kg −1 K −1 ), and R n,cs is the clear-sky net radiation (MJ m −2 day −1 ), which is estimated using the series of equations in [50]: where R ns and R nl are the clear sky net shortwave radiation (MJ m −2 day −1 ) and clear sky longwave radiation (MJ m −2 day −1 ); α is the surface albedo, which is set to 0.23 in this study; R s and R a are the clear sky incoming solar radiation (MJ m −2 day −1 ) and extraterrestrial radiation (MJ m −2 day −1 ); z is the elevation (m); σ is the Stefan-Boltzmann constant (4.9039 10 −9 MJ K −4 m −2 day -1 ). T max and T min are the air maximum and minimum temperatures (K), and ea is the actual vapor pressure (kPa). More details about R a and ea can be found in [50].
The difference between the SSEBop model and other energy balance models is that it predefines a differential temperature (dT) as two extreme limiting surface temperatures-a cold temperature (T c ) for little or no sensible heat flux and hot temperature (T h ) for little or no latent heat flux, eliminating the subjectivity that could be introduced during manual hot and cold reference selection. More detailed information about the calculation process for SSEBop can be found in [8,17]. This study mainly used MOD11A2 and climate datasets to drive the SSEBop model to estimate the ET of the YRB.

Trend Analysis
We conducted linear regression based on the least squares method to detect the trend for annual ET. The regression coefficient was used as the annual ET change rate: where i is the sequential year range (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015), ET i is the annual ET in the year y i , and ET and y are the mean values. A positive or negative value predicts an increased or decreased rate of annual water yield. If the regression coefficient passes through the significance test (F test, p < 0.05), it shows a significant ascending or descending trend [71,72].

Performance of Estimated ET
Commonly used hydrological model evaluation metrics such as the coefficient of determination (R 2 ), root mean square error (RMSE), and relative percent error (RPE) [13] are used in this paper. The formula of RPE is as follows: where X is the simulated ET and Y is the observed ET. The closer the RPE (%) is to 0, the smaller the difference between the average of the observed and simulated values.

Evaluation of SSEBopYRB-ET on a Basin Scale
ET SSEBopYRB 's estimated annual ET performance for 14 small watersheds was lower than ET SSEBopGlo from the RMSE (88.30 mm yr −1 vs. 82.59 mm yr −1 ) and R 2 (0.49 vs. 0.52) values ( Figure 3). ET SSEBopGlo overestimated 3.01% of the annual WB-ET, while ET SSEBopYRB underestimated 3.38% of the annual WB-ET. Compared with the ET SEBS , ET MODIS , ET CR , and ET GLASS , ET SSEBopYRB had better performance when estimating the annual WB-ET, with a higher R 2 , lower RMSE, and lower absolute RPE. ET MODIS, ET SEBS , and ET CR highly underestimated the annual WB-ET, with RPE values of -19.97%, −28.73%, and −26.54%, respectively, while ET GLASS highly overestimated annual WB-ET (44.59%). As a widely used product, ET MODIS could only explain 37% of the annual WB-ET variability and had a much higher RMSE (121.75 mm yr −1 vs. 88.30 mm yr −1 ) than ET SSEBopYRB. ET MTE and ET PMLV2 had lower RMSE values than ET SSEBopYRB , but had smaller R 2 and higher absolute RPE values than ET SSEBopYRB . ET BESS had a lower RMSE (79.01 mm yr −1 vs. 88.30 mm yr −1 ) than ET SSEBopYRB , but ET BESS underestimated 11.62% of the annual WB-ET. We also evaluated ETSSEBopYRB and the other eight ET products from the perspective of the multiyear average ET using the multiyear average WB-ET on 14 small watersheds (total 14 samples). The result showed that ETSSEBopYRB performed better than six (ETSEBS, ETSSEBopGlo, ETGLASS, ETCR, ETMODIS, and ETMTE) out of the eight ET products ( Table 3). The R 2 , RMSE, and RPE values for ETSSEBopYRB were 0.85, 60.09 mm yr-1, and −3.38%, respectively, indicating that there is good agreement between ETSSEBopYRB and WB-ET on a multiyear scale. The R 2 of ETSSEBopYRB was slightly lower than that of ETBESS (0.87 vs. 0.85), but the RPE for ETBESS was higher than for ETSSEBopYRB. Although the RMSE for ETPMLv2 was lower than ETSSEBopYRB, ETPMLv2 could only explain 69% of the WB-ET variability, which was lower than ETSSEBopYRB.  We also evaluated ET SSEBopYRB and the other eight ET products from the perspective of the multiyear average ET using the multiyear average WB-ET on 14 small watersheds (total 14 samples). The result showed that ET SSEBopYRB performed better than six (ET SEBS, ET SSEBopGlo, ET GLASS, ET CR, ET MODIS, and ET MTE ) out of the eight ET products ( Table 3). The R 2 , RMSE, and RPE values for ET SSEBopYRB were 0.85, 60.09 mm yr-1, and −3.38%, respectively, indicating that there is good agreement between ET SSEBopYRB and WB-ET on a multiyear scale. The R 2 of ET SSEBopYRB was slightly lower than that of ET BESS (0.87 vs. 0.85), but the RPE for ET BESS was higher than for ET SSEBopYRB . Although the RMSE for ET PMLv2 was lower than ET SSEBopYRB , ET PMLv2 could only explain 69% of the WB-ET variability, which was lower than ET SSEBopYRB.

Spatial Pattern Comparison between ETSSEBopYRB and Other ET Products
The spatial pattern of the average annual ET SSEBopYRB during 2003-2015 in YRB exhibited a notable high value in the east and low value in the west (Figure 4), which is similar to the findings for ET SSEBopGlo. The main difference between ET SSEBopYRB and the other seven ET products was concentrated on the upper YRB and irrigated farmland. In the upper YRB, the mean annual ET estimated by ET SSEBopYRB and ET SSEBopGlo ranged from 200 to 300 mm yr −1 , while the estimated mean annual ET of the other seven ET products was higher than that of ET SSEBopYRB . For example, ET MODIS ranged from 600 to 700 mm yr −1 , while the ET CR ranged from 400 to 600 mm yr −1 in the upper YRB ( Figure 4). In the irrigated farmland located in the north ( Figure A1), the mean annual ET values for ET SSEBopYRB, ET SSEBopGlo , and ET GLASS ranged from 500 to 700 mm yr −1 , while the mean annual ET values estimated by the other six ET products ranged from 200 to 500 mm yr −1 .

Spatial Pattern Comparison between ETSSEBopYRB and Other ET Products
The spatial pattern of the average annual ETSSEBopYRB during 2003-2015 in YRB exhibited a notable high value in the east and low value in the west (Figure 4), which is similar to the findings for ETSSEBopGlo. The main difference between ETSSEBopYRB and the other seven ET products was concentrated on the upper YRB and irrigated farmland. In the upper YRB, the mean annual ET estimated by ETSSEBopYRB and ETSSEBopGlo ranged from 200 to 300 mm yr −1 , while the estimated mean annual ET of the other seven ET products was higher than that of ETSSEBopYRB. For example, ETMODIS ranged from 600 to 700 mm yr −1 , while the ETCR ranged from 400 to 600 mm yr −1 in the upper YRB ( Figure 4). In the irrigated farmland located in the north ( Figure A1), the mean annual ET values for ETSSEBopYRB, ETSSEBopGlo, and ETGLASS ranged from 500 to 700 mm yr −1 , while the mean annual ET values estimated by the other six ET products ranged from 200 to 500 mm yr −1 . Averaged over all vegetated areas (all land cover types except the barren type reported in Figure A1), the YRB mean annual ETSSEBopYRB weighted by area during the years 2003-2015 was 369.26 ± 29.72 mm yr −1 , which included 84.04% ± 8.27% of the mean annual precipitation ( Figure 5). This result was slightly higher than those of ETSEBS and ETCR and slightly lower than those of ETSSEBopGlo (375.63 mm yr −1 ) and ETBESS (395.32 mm yr −1 ). The mean annual ET estimates for the YRB in the ETMTE, ETPMLV2, and ETGLASS were higher than that in the ETSSEBopYRB and larger than the mean annual precipitation. Considering that the main land type in the YRB is grassland (Figure A1), it is Averaged over all vegetated areas (all land cover types except the barren type reported in Figure A1), the YRB mean annual ET SSEBopYRB weighted by area during the years 2003-2015 was 369.26 ± 29.72 mm yr −1 , which included 84.04% ± 8.27% of the mean annual precipitation ( Figure 5). This result was slightly higher than those of ET SEBS and ET CR and slightly lower than those of ET SSEBopGlo (375.63 mm yr −1 ) and ET BESS (395.32 mm yr −1 ). The mean annual ET estimates for the YRB in the ET MTE , ET PMLV2 , and ET GLASS were higher than that in the ET SSEBopYRB and larger than the mean annual precipitation. Considering that the main land type in the YRB is grassland (Figure A1), it is impossible for the average annual average ET to exceed the average annual precipitation. On a multiyear scale (e.g., 13 years), ignoring changes in groundwater in the YRB, the average annual ET of YRB derived from the water balance method using the run-off data from the Water Resources Bulletin of the YRB (http://www.yrcc.gov.cn/) is 375.86 mm. Therefore, ET MTE , ET PMLV2 , and ET GLASS highly overestimated the mean annual ET of the YRB, while the annual average ET of YRB estimated by ET SSEBopYRB was reasonable.  Figure 6 shows that the annual ET trend in the YRB estimated by ETSSEBopYRB shows an increasing trend in most areas (75.24%), of which the increase in the central part is more obvious, while the increase in the western part is lower. In the western YRB, the annual ET values in ETSSEBopYRB and ETSEBS mainly decreased, which is different from ETSSEBopYRB. In the irrigated farmland located in the north of the YRB, the annual ET in ETSSEBopYRB decreased, while the annual ET for other ET products increased. In the central parts of the YRB where the NDVI increased significantly ( Figure A2), all ET products showed that the annual ET increased. In the south of the YRB, the annual ET for ETSSEBopYRB decreased, which was similar to ETSSEBopGlo, ETCR, and ETGLASS and different from ETMODIS, ETPMLV2, and ETBESS.

Trend Comparison between ETSSEBopYRB and other ET Products
Overall, the annual ET of the YRB simulated by ETSSEBopYRB during 2003-2015 showed a significantly increased trend with a slope of 4.34 mm yr −1 , which was lower than the annual ET trends from ETMODIS and ETPMLV2 and higher than the annual ET trends from the other six ET products ( Table 4). The annual rate for ET change for the whole YRB estimated by ETSSEBopGlo was only 0.92 mm yr −1 , which was much lower than that of the ETSSEBopYRB. The reason for the different trends between ETSSEBopGlo and ETSSEBopYRB is that the forcing climate data are different. The annual ET trend type for ETSSEBopYRB is mainly a non-significant increase with a proportion of 62.50%, while the annual ET trend type for ETSSEBopGlo is mainly a non-significant increase (44.02%) and non-significant decrease (40.83%).  Figure 6 shows that the annual ET trend in the YRB estimated by ET SSEBopYRB shows an increasing trend in most areas (75.24%), of which the increase in the central part is more obvious, while the increase in the western part is lower. In the western YRB, the annual ET values in ET SSEBopYRB and ET SEBS mainly decreased, which is different from ET SSEBopYRB. In the irrigated farmland located in the north of the YRB, the annual ET in ET SSEBopYRB decreased, while the annual ET for other ET products increased. In the central parts of the YRB where the NDVI increased significantly ( Figure A2), all ET products showed that the annual ET increased. In the south of the YRB, the annual ET for ET SSEBopYRB decreased, which was similar to ET SSEBopGlo , ET CR , and ET GLASS and different from ET MODIS , ET PMLV2 , and ET BESS . Remote Sens. 2020, 12,

Discussion
The annual total ET trend in the YRB showed a significant increase for ETSSEBopYRB but no significant change for ETSSEBopGlo. When researchers directly apply the free global ET product simulated by SSEBop, especially the ET trend analysis, validation should be done if possible. In cases of flux data unavailability, the water balance method could be used or ET products could be compared with global ET products in their research area. Some researchers directly applied this product to analyze the relationships between ET variability and vegetation dynamics without validating the reliability of the product in the Nile Basin, Africa [26,27]. To further improve the estimation of ET, the uncertainty of ETSSEBopYRB in this research area should be discussed. Overall, the annual ET of the YRB simulated by ET SSEBopYRB during 2003-2015 showed a significantly increased trend with a slope of 4.34 mm yr −1 , which was lower than the annual ET trends from ET MODIS and ET PMLV2 and higher than the annual ET trends from the other six ET products ( Table 4). The annual rate for ET change for the whole YRB estimated by ET SSEBopGlo was only 0.92 mm yr −1 , which was much lower than that of the ET SSEBopYRB . The reason for the different trends between ET SSEBopGlo and ET SSEBopYRB is that the forcing climate data are different. The annual ET trend type for ET SSEBopYRB is mainly a non-significant increase with a proportion of 62.50%, while the annual ET trend type for ET SSEBopGlo is mainly a non-significant increase (44.02%) and non-significant decrease (40.83%).

Discussion
The annual total ET trend in the YRB showed a significant increase for ET SSEBopYRB but no significant change for ET SSEBopGlo . When researchers directly apply the free global ET product simulated by SSEBop, especially the ET trend analysis, validation should be done if possible. In cases of flux data unavailability, the water balance method could be used or ET products could be compared with global ET products in their research area. Some researchers directly applied this product to analyze the relationships between ET variability and vegetation dynamics without validating the reliability of the product in the Nile Basin, Africa [26,27]. To further improve the estimation of ET, the uncertainty of ET SSEBopYRB in this research area should be discussed.

The Effect of Potential ET Estimation on ET Uncertainty
The reference ET is one of the most sensitive factors in the SSEBop model [32]. In this paper, we use the PM formula [50] to calculate the reference ET with the eight-day average pressure, sunshine hours, wind speed, relative humidity, minimum temperature, maximum temperature, and mean temperature as inputs. The optimal interpolation method for each climate variable is different [73], but each climate variable was interpolated using the same thin-plate spline function interpolation method, which may cause reference errors and affect the ET. Because the reference ET is strongly related to the pan evaporation [50], the small observed pan evaporation in the station within the YRB from 2003 to 2015 was used to evaluate the reference ET. The results showed that the correlation coefficient of the eight-day average reference ET and small pan evaporation ranged from 0.86 to 0.95 (Figure 7), which indicated a strong relationship between the reference ET and the small pan evaporation. In the earlier research, three different methods, including the standard global PM formula [50], revised PM formula in China [74], and the revised PM formula in the YRB [70], were used to calculate the reference ET in the YRB. The difference between the three methods focuses on the radiation term. The difference in the annual reference ET of the YRB simulated by the three methods is very small ( Figure A3). Therefore, the uncertainty of the reference ET in this research cannot cause the large ET uncertainty simulated by the SSEBop model in the YRB from 2003 to 2015.

The Effect of Potential ET Estimation on ET Uncertainty
The reference ET is one of the most sensitive factors in the SSEBop model [32]. In this paper, we use the PM formula [50] to calculate the reference ET with the eight-day average pressure, sunshine hours, wind speed, relative humidity, minimum temperature, maximum temperature, and mean temperature as inputs. The optimal interpolation method for each climate variable is different [73], but each climate variable was interpolated using the same thin-plate spline function interpolation method, which may cause reference errors and affect the ET. Because the reference ET is strongly related to the pan evaporation [50], the small observed pan evaporation in the station within the YRB from 2003 to 2015 was used to evaluate the reference ET. The results showed that the correlation coefficient of the eight-day average reference ET and small pan evaporation ranged from 0.86 to 0.95 (Figure 7), which indicated a strong relationship between the reference ET and the small pan evaporation. In the earlier research, three different methods, including the standard global PM formula [50], revised PM formula in China [74], and the revised PM formula in the YRB [70], were used to calculate the reference ET in the YRB. The difference between the three methods focuses on the radiation term. The difference in the annual reference ET of the YRB simulated by the three methods is very small ( Figure A3). Therefore, the uncertainty of the reference ET in this research cannot cause the large ET uncertainty simulated by the SSEBop model in the YRB from 2003 to 2015.

Influence of the Land Surface Temperature Product on ET Uncertainty
The errors of LST images (MOD11A2) are generally within ±1 K in most areas, especially in the areas that have complicated biophysical conditions, or arid or semiarid climatic conditions, which contain errors of up to ±5 K [75]. The YRB is a typical arid and semiarid river basin, which results in large uncertainty in the LST. An uncertainty assessment of the SSEBop model implemented in the

Influence of the Land Surface Temperature Product on ET Uncertainty
The errors of LST images (MOD11A2) are generally within ±1 K in most areas, especially in the areas that have complicated biophysical conditions, or arid or semiarid climatic conditions, which contain errors of up to ±5 K [75]. The YRB is a typical arid and semiarid river basin, which results in large uncertainty in the LST. An uncertainty assessment of the SSEBop model implemented in the United States showed that an error of LST within a range of 0.35% (i.e., ±1 K) can lead to an ET error in the range of 20% [32]. We used the observed LST data from 94 meteorological stations evenly distributed across the YRB ( Figure A6)  We assumed that the above phenomenon was due to the large missing values in MOD11A2 in 2015. However, after performing statistical analysis of the missing MOD11A2 ratio for the whole YRB and meteorological observation sites, we found that our assumption was wrong, because the missing ratio in 2015 was the same as the missing ratio during the years 2003-2014 ( Figure A7). We concluded that the MOD11A2 product could not characterize the YRB's eight-day average LST variability in 2015.
Remote Sens. 2020, 12, x FOR PEER REVIEW 16 of 31 It should be noted that due to the scale mismatch between in situ measurements and satellite-based observations, it is relatively difficult to obtain true values for pixels, especially for terrestrial regions that are typically heterogeneous and non-isothermal terrestrial regions [76]. Figure A7 shows that the average missing LST value ratio (average of 46 ratios of missing LST values per year) in each image varies from 2.22% to 5.04% during the years 2003-2015. However, the missing LST ratio for some of the dates is as high as 25% or more ( Figure A7). The filling method for the LST missing value causes different errors [46]. This paper adopted the adaptive  Figure A8 shows that the eight-day average YRB ET f derived from MYD11A2 (hereafter ETf MYD ) and the eight-day average YRB ETf from MOD11A2 (hereafter ETf MOD ) were inconsistent for days 137-153 and days 297-337 in 2015. The abnormal LST of MOD11A2 in 2015 occurs during days 137-153 and days 169-217. Although there were abnormal MOD11A2 LST values for days 169-217, the ETf MOD value was normal and the same as ETf MYD , which is related to the ET f temporal interpolation method. The ET f temporal interpolation method had no effect on days 169-217 in 2015. Correspondingly, the intrayear change of the average YRB ET derived from MYD11A2 (hereafter ET SSEBopYRB, MYD ) was very close to the ET SSEBopYRB , which was derived from MOD11A2 during the years 2003-2014, except for 2015 at the 8-day scale and monthly scale (Figures A9 and A10). This is closely related with the intrayear change of the eight-day ET f . The abnormal ET SSEBopYRB value in 2015 occurred on days 137-153 at the eight-day scale and in May at the monthly scale. The 8-day LST abnormal data for MOD11A2 had a weak influence on the YRB annual ET ( Figure A11). Therefore, abnormal eight-day LST data have a great influence on the intravariability of ET, but have a small influence on the interannual variability.
It should be noted that due to the scale mismatch between in situ measurements and satellite-based observations, it is relatively difficult to obtain true values for pixels, especially for terrestrial regions that are typically heterogeneous and non-isothermal terrestrial regions [76]. Figure A7 shows that the average missing LST value ratio (average of 46 ratios of missing LST values per year) in each image varies from 2.22% to 5.04% during the years 2003-2015. However, the missing LST ratio for some of the dates is as high as 25% or more ( Figure A7). The filling method for the LST missing value causes different errors [46]. This paper adopted the adaptive window method, which considers the relationship between high-quality pixel LST values and influencing factors (e.g., elevation, NDVI, and air temperature) to fill the missing values [77].

Influence of Precipitation Products on ET Uncertainty
This paper used the difference between annual precipitation and run-off (WB-ET) as the annual evaluation data for ET SSEBopYRB . The uncertainty of the precipitation and run-off was approximately 10% for the water-balanced ET method [78]. Compared to the YRB precipitation site interpolation involved in previous studies [39,79], the annual precipitation was interpolated using more rain gauge stations (N = 507 vs. N = 134) ( Figure A4). With more site distributions, the annual precipitation uncertainty caused by the different interpolation methods, the thin-plate spline function, ordinary kriging, and inverse distance weight was small ( Figure A5). Cross validation results indicated that interpolated precipitation using the three methods accounted for 84.1% ± 1.7% of the annual precipitation observed at the sites ( Figure A5). The unexplained 15% may be due to the complexity of the land surface in the YRB, which greatly increases the interpolation difficulty.
There are also some free global coarse resolution precipitation products, such as TRMM3b42 derived from remote sensing satellites [67] and reanalysis data of CRU [68], while the Chinese precipitation product from the China meteorological forcing dataset (CMFD) integrates ground observations, satellite data, and reanalysis data [80]. ET SSEBopYRB can explain 38%, 47%, 50%, and 49% of the WB-ET based on the CMFD, CRU, TRMM3b42, and the interpolation of the annual precipitation, respectively, during the years 2003-2015. In the \ RMSE and RPE indices, there were lower RMSE and lower RPE values for WB-ET based on interpolation than for WB-ET based on the other three precipitation products (Figure 9). Therefore, the WB-ET used for evaluation will have a great impact on the evaluation results because of the differences in annual precipitation data.
Remote Sens. 2020, 12, x; doi: FOR PEER REVIEW www.mdpi.com/journal/remotesensing of the WB-ET based on the CMFD, CRU, TRMM3b42, and the interpolation of the annual precipitation, respectively, during the years 2003-2015. In the \ RMSE and RPE indices, there were lower RMSE and lower RPE values for WB-ET based on interpolation than for WB-ET based on the other three precipitation products ( Figure 9). Therefore, the WB-ET used for evaluation will have a great impact on the evaluation results because of the differences in annual precipitation data.

Influence of Model Parameters on ET Uncertainty
The model parameters of coefficient involving the magnitude of the estimated ET are additional sensitive and critical factors in the SSEBop model [32]. The coefficient factor was recommended to be set at 1.2 in the original research [8]. Later, Chen et al. (2016) recommended that the coefficient range should be from 0.95 to 1.2, and that it was acceptable to take the coefficient as 1.1, since its maximum error of 15% leads to an ET uncertainty of approximately 15% [32]. Senay et al. (2018) highly emphasized the correction of this factor in the applied research region using the WB-ET or site observation ET [17]. In this research, through the correction using WB-ET in 2003-2014, the coefficient was set as 1.218 ( Figure 10). When the coefficient was 1.2168, the RMSE (88.98 mmyr −1 ) between the WB-ET and ETSSEBopYRB (the LST input is MOD11A1) during the years 2003-2014 was lower than the other f setting, while the RPE (−4.49%) for ETSSEBopYRB was small.

Influence of Model Parameters on ET Uncertainty
The model parameters of coefficient k max involving the magnitude of the estimated ET are additional sensitive and critical factors in the SSEBop model [32]. The coefficient k max factor was recommended to be set at 1.2 in the original research [8]. Later, Chen et al. (2016) recommended that the coefficient k max range should be from 0.95 to 1.2, and that it was acceptable to take the coefficient k max as 1.1, since its maximum error of 15% leads to an ET uncertainty of approximately 15% [32]. Senay et al. (2018) highly emphasized the correction of this factor in the applied research region using the WB-ET or site observation ET [17]. In this research, through the correction using WB-ET in 2003-2014, the coefficient k max was set as 1.218 ( Figure 10). When the coefficient k max was 1.2168, the RMSE (88.98 mmyr −1 ) between the WB-ET and ET SSEBopYRB (the LST input is MOD11A1) during the years 2003-2014 was lower than the other f k max setting, while the RPE (−4.49%) for ET SSEBopYRB was small. Remote Sens. 2020, 12,

Influence of the Basin Size on ET Uncertainty
The fourteen basins have different basin sizes, varying from 1138 km 2 to 43,106 km 2 , which may influence the evaluation results. According to whether a watershed's area is larger or smaller than 10,000 km 2 , the 14 small watersheds were divided into two groups. The evaluation of the large basins ( Figure A12) showed that the nine ET products could only explain 22%-63% of the ET variability. For all ET products except ETMTE, the simulated ET could explain higher ET variability than if all fourteen basins were included, especially for ETSEBS (0. ) were small. In terms of the RPE, the changes for ETSSEBopYRB and ETSSEBopGlo were low, but were high in ETMODIS (−19.97% vs. −5.83%). Therefore, the basin size did not have a great impact on the evaluation of ETSSEBopYRB and ETSSEBopGlo, but did have a noticeable influence on the evaluation of ETMODIS.

Difference between ETSSEBopYRB and Eight Other ET Products
The results section showed the differences between the ETSSEBopYRB and the eight other ET products in capturing the annual ET for fourteen small catchments ( Figure 3, Table 3), estimating the average annual ET in YRB (Figure 4), and simulating the ET trend for the YRB ( Figure 5, Table  4).
ETSSEBopYRB and ETSSEBopGlo are both driven by the SSEBop model, however with different input datasets. A scatter plot based on the pixel values of the mean annual ET values shows that the R 2 is as high as 0.84 and the root mean square difference (RMSD) is as low as 24.04 mm yr −1 ( Figure A13). The small differences in the spatial distribution of the mean annual ET for the YRB during the years 2003-2015 are related to the same forcing data as MOD11A2 being used, the same methods (PM equation) being used to calculate the reference ET, and the same method (temporal interpolation)

Influence of the Basin Size on ET Uncertainty
The fourteen basins have different basin sizes, varying from 1138 km 2 to 43,106 km 2 , which may influence the evaluation results. According to whether a watershed's area is larger or smaller than 10,000 km 2 , the 14 small watersheds were divided into two groups. The evaluation of the large basins ( Figure A12 Therefore, the basin size did not have a great impact on the evaluation of ET SSEBopYRB and ET SSEBopGlo , but did have a noticeable influence on the evaluation of ET MODIS .

Difference between ET SSEBopYRB and Eight Other ET Products
The results section showed the differences between the ET SSEBopYRB and the eight other ET products in capturing the annual ET for fourteen small catchments ( Figure 3, Table 3), estimating the average annual ET in YRB (Figure 4), and simulating the ET trend for the YRB ( Figure 5, Table 4).
ET SSEBopYRB and ET SSEBopGlo are both driven by the SSEBop model, however with different input datasets. A scatter plot based on the pixel values of the mean annual ET values shows that the R 2 is as high as 0.84 and the root mean square difference (RMSD) is as low as 24.04 mm yr −1 ( Figure A13). The small differences in the spatial distribution of the mean annual ET for the YRB during the years 2003-2015 are related to the same forcing data as MOD11A2 being used, the same methods (PM equation) being used to calculate the reference ET, and the same method (temporal interpolation) being used to interpolate the no data value for ET f . The differences in the ET trends for the YRB are mainly due to the differences in the reference ET trends. The meteorological data used to calculate the reference ET for ET SSEBopGlo are derived from GLDAS, while the meteorological data used for ET SSEBopYRB were interpolated using regional dense station data.
The differences between ET SSEBopYRB and the other seven ET products are not just driven by model physics, but also forcing data. The forcing datasets for each ET product are described in Section 2.2.3. Eight-day MODIS LST and 16-day NDVI values are the main inputs used to quantify the land surface conditions in ET SSEBopYRB . The ET products that had similar remote sensing input data to ET SSEBopYRB were ET SEBS , ET CR , ET BESS , ET MTE , and ET GLASS . The daily MODIS LST, eight-day MODIS LST. and monthly MODIS LST were the important inputs for ET SEBS , ETBESS, and ET CR , respectively. ET MTE and ET GLASS used GIMMS NDVI and MODIS NDVI to represent land surface conditions, respectively. The ET products that had different remote sensing input data for ET SSEBopYRB were ET MODIS and ET PMLv2 . ET MODIS and ET PMLv2 used the eight-day MODIS LAI and FPAR data to characterize the land surface conditions. There was a correlation between NDVI and LAI-FPAR.
In terms of the meteorological forcing data, many ET products (e.g., ET MODIS ) rely upon global reanalysis data, which have their limitations. For example, Modern Era Retrospective Analysis for Research and Applications-meteorological data provided by the NASA Global Modeling and Assimilation Office, which has a spatial resolution of 1/2 • × 2/3 • , uses the same meteorological data as in ET MODIS and ET GLASS . ET BESS adopts National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data, while ET PMLv2 adopts the 0.25 • resolution and 3-h GLDAS-2.1 data. ET CR and ET MTE both use the 0.1 • CMFD data as the meteorological forcing data. The trend differences among these various climate datasets in the YRB affect the estimated ET trend.

Conclusions
We conducted a comparison of ET values derived from the SSEBop model using the water balance ET values in a highly spatially heterogeneous river basin (the YRB) and reached the following conclusions for the years 2003-2015: (1) ET SSEBopYRB and the other eight ET products were able to explain 23 to 52% of the variability in WB-ET for fourteen small catchments in YRB. ET SSEBopYRB had better agreement with WB-ET than ET SEBS , ET MODIS , ET CR , and ET GLASS , with lower RMSE (88.3 mm yr −1 vs. 121.7 mm yr −1 ), higher R 2 (0.49 vs. 0.43), and lower absolute RPE (−3.3% vs. −19.9%) values; (2) The free global ET product derived from the SSEBop model highly underestimated the annual total ET trend for the YRB. More validation regarding this product is required in other regions using site measurements (e.g., eddy covariance flux tower measurements); (3) The abnormal data in the land surface temperature products of MOD11A2 for 2015 limited the performance of the SSEBop model at the eight-day and monthly scales. Future studies will explore the use of MYD11A2; (4) At the basin scale, the uncertainties of the ET trends and spatial patterns are still large for different ET products. We need to further reduce the ET uncertainty to better serve water resource management and ecological restoration project construction purposes.

Conflicts of Interest:
The authors declare that they have no conflicts of interest.