Evaluation of Evapotranspiration Models Using Different LAI and Meteorological Forcing Data from 1982 to 2017

We evaluated the performance of three global evapotranspiration (ET) models at local, regional, and global scales using the multiple sets of leaf area index (LAI) and meteorological data from 1982 to 2017 and investigated the uncertainty in ET simulations from the model structure and forcing data. The three ET models were the Simple Terrestrial Hydrosphere model (SiTH) developed by our team, the Priestley–Taylor Jet Propulsion Laboratory model (PT-JPL), and the MODerate Resolution Imaging Spectroradiometer (MODIS) ET algorithm (MOD16). Comparing the observed with simulated monthly ET by the three models over 43 Fluxnet sites, we found that SiTH overestimated ET for forests with mean slope from 1.25 to 1.67, but it performed better than the other two models over short vegetation. MOD16 and PT-JPL models simulated well for forests but poorly in dryland biomes (slope = 0.25~0.55; R2 = 0.02~0.46). At the catchment scale, all models performed well, except for some tropical and high latitudinal catchments, with NSE values lower than 0 and RMSE and MAE values far beyond their mean values. At the global scale, SiTH highly overestimated ET in tropics, while PT-JPL slightly underestimated ET between 30°N and 60°N and MOD16 underestimated ET between 15°S and 30°S. Generally, the PT-JPL provided the better performance than SiTH and MOD16 models. This study also revealed that the estimated ET by SiTH and especially PT-JPL model were influenced by the uncertainty in meteorological data, and the estimated ET was performed better using MERRA-2 datasets for PT-JPL and using ERA5 datasets for SiTH. While the estimated ET by MOD16 were relatively sensitive to LAI data. In addition, our results suggested that the GLOBMAP and GIMMS datasets were more suitable for long-term ET simulations than the GLASS dataset.

Despite these progresses in model developments, there are still some insufficiencies in systematic inter-comparisons and evaluations of the model performances. First, there are few systematic assessments of the impact of forcing data uncertainties on global ET model performances [22,30]. As we known, both the vegetation characteristics (leaf area index) and meteorological variables (i.e., radiation, temperature, precipitation, and humidity) may have significant influences on model behaviors. For example, leaf area index (LAI) can influence the amount of absorbed solar radiation and its distributions between plant canopy and soil surface, which ultimately have significant influences on plant transpiration and soil evaporation [31][32][33]. The meteorological conditions regulate the atmospheric evaporation demand and moisture supply, which may have a great influence on ET as the result of global change [2,34]. Over the past 30 years, it has been well documented that there is a constant increase in LAI (Earth greening) [35][36][37], and continuous increasing in land temperature [38]. However, the increasing magnitude and their distributions in different LAI and meteorological datasets were also large [22,36,39]. Thus, it's urgently needed to systematically evaluate the impacts of the LAI and meteorological forcing data uncertainties on the estimates of ET. Second, the majority of previous studies have focused on evaluating the performances of one specific model over different sites or inter-comparing the performances of different models at single (or a few) locations. For example, Zhang et al. [16] evaluated the performance of the PT-JPL over 44 Fluxnet sites. Ershadi et al. [25] systemtically compared the performances of four ET models over different sites and boimes. To select the best candidate ET model for global applications, it is needed to comprehensively evaluate and intercompare the performances of different models at different spatial scales (local, regional and global) across different biomes and climate conditions. Third, there are still great uncertainties in the long-term changes in ET over the past few decades. For instance, some studies reported that global land ET increased from 1982 to 1998, and then there was a sharp decline until 2008 [2,[40][41][42]. Some studies suggested that an upward trend was observed from 1982 to 2000 and an obvious recovery in ET may have started from 2007 [4,43]. Other studies showed that the mean global annual values of terrestrial ET had no significant trend between 2001 and 2018 [44]. To clearly describe the multi-decadal trend in global terrestrial ET, we need to simulate ET using different models with multiple forcing datasets, so that we can properly assess the influences of model structures and forcing data uncertainties on long-term trend of land ET.
In this study, we intercompare the performances of three process-based ET models (MOD16, PT-JPL, and SiTH) at different spatial scales, and obtain a long-term trend of ET based on different combinations of LAI and meteorological datasets. Specifically, the goal of this study is to (i) evaluate the performance of three process-based ET models from local to regional and global scales using different forcing datasets, (ii) analyze the uncertainties due to different model structure and forcing data, and (iii) explore the uncertainty of long-term temporal ET trend in response to climate change and LAI increasing.

SiTH
The SiTH model proposed by Zhu et al. [18] is a relatively new satellite-based ET model at daily temporal resolution. Based on the framework of the groundwater-soil-plant-atmosphere continuum (GSPAC), SiTH uses well-established hydrological models to simulate important hydrological variables (i.e., groundwater, soil moisture, and runoff). Then, the potential evapotranspiration calculated by using P-T equation was constrained down to actual ET through the plant physiological factor and soil moisture conditions. In SiTH, the total ET (mm·day −1 ) consists of canopy interception evaporation (Ei, mm·day −1 ), soil evaporation (Es, mm·day −1 ), and vegetation transpiration (Et, mm·day −1 ). Soil evaporation is constrained to occur in the first soil layer, while plant transpiration can use both soil water and groundwater. There can be described as: the soil moisture of ith layer, θ c,i (m 3 m −3 ) is soil moisture below which plants start to endure water stress for soil layer i, and θ wp,i (m 3 m −3 ) is soil moisture at permanent wilting point.

MOD16
The MOD16 model, which was proposed by Mu et al. [10,11], estimated ET based on the P-M equation to calculate potential evaporation [45]. It distributes the available energy into the components of surface soil and vegetation through fractional total vegetation cover. Then, the soil evaporation includes the evaporation from the saturated soil surface and the moist soil surface. Furthermore, canopy water loss includes evaporation from the wet canopy surface and transpiration from the dry surface. Finally, it limits potential ET to actual ET through vegetation physiological factors and meteorological factors. In MOD16, evapotranspiration (E, mm·day −1 ) is equal to the sum of wet canopy evaporation (E wet , mm·day −1 ) and bare soil evaporation (E s , mm·day −1 ), and vegetation transpiration (E T , mm·day −1 ) at daytime and nighttime periods. The driving equations in their model are: where ρ is the air density (kg·m −3 ), Cp is the specific heat capacity of air (MJ·kg −1 ·°C −1 ), ε is the ratio of the molecular weight of water to dry air (i.e., 0.622), Pa is the atmospheric pressure (kPa), VPD is the atmospheric vapor-pressure deficit (kPa), r vc (s·m −1 ) and r hrc (s·m −1 ) represent the surface resistance and aerodynamic resistance to evaporated water on the wet canopy surface, respectively, f c is the fractional vegetation cover (dimensionless), r tot (s·m −1 ) is the total aerodynamic resistance, r as (s·m −1 ) the aerodynamic resistance at the soil surface, f wet is the relative surface wetness, f sm is the soil water constrain derived from Fisher et al. [12], r a (s·m −1 ) is the aerodynamic resistance between the mean canopy height and the air above the canopy, and r s (s·m −1 ) is the canopy surface resistance.

PT-JPL
PT-JPL is a relatively simple (input data and parameters are reduced), accurate model for estimating actual ET at monthly scale [12,25,28,46]. First, PT-JPL estimates potential evapotranspiration based on the P-T equation.Then, plant physiological and ecological constraints (i.e., LAI, green canopy ratio, vegetation temperature, and vegetation moisture) are used to limit potential plant transpiration and atmospheric constraints (vapor pressure deficit and relative humidity) to limit potential soil evaporation to actual ET. PT-JPL divides the actual evapotranspiration (LE, mm·day −1 ) into three components: canopy transpiration (LE c , mm·day −1 ), soil evaporation (LE s , mm·day −1 ), and interception evaporation (LE i , mm·day −1 ).
where f g is the green canopy fraction (unitless), f t is the plant temperature constraint (unitless), and f m is the plant moisture constraint (unitless). The bio-physiological constraint functions are calculated as: where the RH is the relative humidity (%), f APAR is the fraction of photosynthesis active radiation (PAR) absorbed by green vegetation cover, f IPAR is the fraction of PAR intercepted by canopy, T max is the air maxium temperature ( • C), and β is the sensitivity for f sm to VPD (kPa).

Input Data
The input data of the above three models includes leaf area index (LAI), net radiation (Rn), air temperature (Ta), precipitation (P), air pressure (Pa), relative humidity (RH) and land cover (LC) (Supplementary Table S1). To investigate the influences of vegetation and meteorological variables on ET estimations, three sets of LAI data and two sets of meteorological data were used in this study.
Two sets of meteorological reanalysis data based on observational data retrieval and assimilation are used in this study. The first one is the Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) from NASA's Global Modeling and Assimilation Office (https: //disc.sci.gsfc.nasa.gov/) [50]. It provides near-surface air pressure and temperature, specific humidity, precipitation, and net radiation at a spatial resolution of 0.5 • × 0.625 • on hourly temporal resolution from 1982 to 2017. The second is the latest ERA-5 produced by European Centre for Medium-Range Weather Forecast (ECMWF) (https://cds.climate.copernicus.eu/). This dataset includes near-surface air pressure and temperature, dew point temperature, precipitation, and net radiation spanning 1982 to 2017 at a spatial resolution of 31 km and an hour temporal resolution [51]. In addition, we use static land cover data from MCD12C1 in 2001 [52] because land cover changes are relatively small on a global scale with time [41]. All driving data was interpolated to a 0.25 • × 0.25 • spatial resolution based on the non-linear spatial interpolation method [53]. A total of 18 ensembles ET products are obtained from the three ET models with different combinations of inputs data (Supplementary Table S2).

Data Used for Evaluation
At the local scale, the FLUXNET 2015 (https://fluxnet.fluxdata.org) from global network of eddy-covariance towers was used for model evaluations [10][11][12]25,46]. Here, a total of 43 flux sites were selected to cover a wide range of biomes. These sites can be divided into seven different vegetation types on the basis of the IGBP classification, and included the cropland (CRO, seven sites), deciduous broad-leaved forest (DBF, four sites), evergreen broad-leaved forest (EBF, five sites), evergreen coniferous forest (ENF, eight sites), grass (GRA, 13 sites), mixed forest (MF, two sites), and open shrubland (OSH, four sites) ( Figure 1). These selected sites contained at least one complete annual variation cycle, and the mean energy balance closure of all sites was 0.76, falling in the acceptable range from 70% to 90% [54,55]. We did not gapfill fluxes data because our aim is to validate and evaluate the accuracy of model. The statistical parameters of the observed ET data at sites are presented in Supplementary Table  S3, which provided a summary of the annual mean, maximum, minimum, standard deviation (SD), skewness and kurtosis values of ET for 43 sites. The SD values varied from 6.22 to 48.90 mm/month. The skewness values ranged from −0.11 to 1.22, within the conventional acceptable limit of ± 2, indicating that the data were normally distributed. The kurtosis values at most sites were lower than 3. At the catchment scale, the water balance dataset for 32 major (i.e., >200,000 km 2 ) river catchments developed by Pan et al. [56] was used to evaluate the model performance at regional scale. This data sets merged a number of global datasets including in situ observations (CPC, CPU, GPCC, WM, and CRDC), remote sensing retrievals (GRACE), land surface model simulations (VIC), and global reanalyzes (ERA-interim) using data assimilation techniques. This dataset is considered to be the best available water budget dataset [57][58], and includes monthly precipitation, ET, streamflow, and the change in water storage from 1984 to 2006. The main information and statistical parameters of the observed ET data across 32 catchments were given in Supplementary Table S4. The SD values varied from 6.65 mm/month to 36.55 mm/month. The skewness values at most catchments were within the acceptable limit of ± 2, and the kurtosis values of most catchments were lower than 3.
At the global scale, the model tree ensembles (MTE) product [15] spans from 1982 to 2011 at monthly temporal resolution and 0.5° spatial resolution. The MTE model integrates observed ET at the FLUXNET sites with satellite remote sensing and surface meteorological data in a machinelearning algorithm. It is widely used for the comparison and verification of model performances [4,41,59]. The Global Land Evaporation Amsterdam Model (GLEAM) calculates ET from satellite At the catchment scale, the water balance dataset for 32 major (i.e., >200,000 km 2 ) river catchments developed by Pan et al. [56] was used to evaluate the model performance at regional scale. This data sets merged a number of global datasets including in situ observations (CPC, CPU, GPCC, WM, and CRDC), remote sensing retrievals (GRACE and SEBS), land surface model simulations (VIC), and global reanalyzes (ERA-interim) using data assimilation techniques. This dataset is considered to be the best available water budget dataset [57,58], and includes monthly precipitation, ET, streamflow, and the change in water storage from 1984 to 2006. The main information and statistical parameters of the observed ET data across 32 catchments were given in Supplementary Table S4. The SD values varied from 6.65 mm/month to 36.55 mm/month. The skewness values at most catchments were within the acceptable limit of ± 2, and the kurtosis values of most catchments were lower than 3.
At the global scale, the model tree ensembles (MTE) product [15] spans from 1982 to 2011 at monthly temporal resolution and 0.5 • spatial resolution. The MTE model integrates observed ET at the FLUXNET sites with satellite remote sensing and surface meteorological data in a machine-learning algorithm. It is widely used for the comparison and verification of model performances [4,41,59]. The Global Land Evaporation Amsterdam Model (GLEAM) calculates ET from satellite observations based on the P-T equation [17] and performed well in ET estimates [17,27].

Analysis Method
The statistical measures used to evaluate model performance include the slope of the regression line, the coefficient of determination (R 2 ), root-mean-square error (RMSE), mean absolute error (MAE), and the Nash-Sutcliffe efficiency coefficient (NSE) [60]. The R 2 describes the degree of co-linearity between observed and simulated values and ranges between 0 and 1. Generally, an R 2 > 0.5 is considered as acceptable performance [61]. The RMSE, MAE, and NSE were computed as: where O(t) is the observed ET, S(t) is the simulated ET, andŌ is the mean of observed values. The NSE is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance. NSE values range between −∞ to 1. When the NSE values is closer to 1, the simulation is better. The RMSE and MAE show the magnitude and variance in error between observed and simulated values, both ranging from 0 to +∞.

Model Evaluation at Local Scale
The model performances on monthly estimated ET driving by different inputs data were compared with the observations from 43 Fluxnet sites ( Figure 2). The SiTH model overestimated ET when using all the forcing datasets. The linear regression slope between observed and simulated ET ranged from 1.18 to 1.25, especially for the MERRA-2 (slope = 1.23~1.25). The R 2 between observed and estimated ET for SiTH were highest (R 2 > 0.73) among the three models, suggesting that the estimated ET by SiTH had a better consistency with the observed ET. When the same LAI data was used, the estimated ET by SiTH using ERA-5 data was better than that using MERRA-2 data (Figure 2). When the same meteorological data was used, there were little differences in simulated ET between different LAI datasets ( Figure 2). Thus, it seemed that the influences of the meteorological data on the performances of SiTH was larger than that of the LAI data ( Figure 2). For MOD16 model, the regression slopes between observed and estimated ET were closer to 1 (slope = 0.92~1.0). However, the R 2 between observed and simulated ET by MOD16 is generally lower than the other models (R 2 = 0.55~0.79), indicating the estimated ET by MOD16 had poor consistencies with observed ET, especially using GLOBMAP LAI data (R 2 = 0.55) (Figure 2a,b). Importantly, the uncertainty in LAI had higher influences on the performances of MOD16 model. For PT-JPL model, its performances using MERRA-2 (slope = 0.97~0.99, R 2 = 0.70~0.72) were much better than that using ERA-5 (slope = 0.71~0.72, R 2 = 0.61~0.66), indicating that meteorological data had greater influences on model performances than LAI data. Due to the differences in model structure and parameterization, the model performances in ET simulations varied over different land surfaces [22,25,[62][63][64][65]. Figure 3 shows the performances of the three models across the different biomes. For the forest biomes, the SiTH model generally overestimated ET, with the mean slope of 1.60, 1.25, 1.28, and 1.67 for ENF, EBF, DBF, and MF, respectively. On the contrary, the MOD16 and PT-JPL models performed relatively well at most forest biomes with the exception of MF, and the mean slope ranged from 0.97 to 1.04 and from 0.81 to 1.05 among these biomes, respectively. However, over the short vegetation types (i.e., GRA, CRO, and OSH), SiTH performed better than MOD16 and PT-JPL, with mean slope of 0.95, 1.01, and 0.95 for GRA, CRO, and OSH, respectively. It was also observed that the MOD16 and PT-JPL significantly underestimated ET in dryland biomes (OSH) (MOD16: slope = 0.25~0.2; PT-JPL: slope = 0.29~0.55) (Figure 3). The values of R 2 for the three models were higher than 0.5 over forest biomes, indicating good consistencies between observed and simulated ET. In addition, the SiTH model was satisfactory over short vegetations with R 2 ranging from 0. 64   Due to the differences in model structure and parameterization, the model performances in ET simulations varied over different land surfaces [22,25,[62][63][64][65]. Figure 3 shows the performances of the three models across the different biomes. For the forest biomes, the SiTH model generally overestimated ET, with the mean slope of 1.60, 1.25, 1.28, and 1.67 for ENF, EBF, DBF, and MF, respectively. On the contrary, the MOD16 and PT-JPL models performed relatively well at most forest biomes with the exception of MF, and the mean slope ranged from 0.97 to 1.04 and from 0.81 to 1.05 among these biomes, respectively. However, over the short vegetation types (i.e., GRA, CRO, and OSH), SiTH performed better than MOD16 and PT-JPL, with mean slope of 0.95, 1.01, and 0.95 for GRA, CRO, and OSH, respectively. It was also observed that the MOD16 and PT-JPL significantly underestimated ET in dryland biomes (OSH) (MOD16: slope = 0.25~0.2; PT-JPL: slope = 0.29~0.55) (Figure 3). The values of R 2 for the three models were higher than 0.5 over forest biomes, indicating good consistencies between observed and simulated ET. In addition, the SiTH model was satisfactory over short vegetations with R 2 ranging from 0.64 to 0.73. However, the MOD16 and PT-JPL models do not capture the ET dynamics in dryland biomes (MOD16: R 2 = 0.02~0.12; PT-JPL: R 2 = 0.12~0. 46). The SiTH model generated negative NSE over most forests, while the PT-JPL model had the NSE values closer to 1.
The SiTH model produced high NSE for short vegetations (mean NSE = 0.2~0.47), and the PT-JPL and MOD16 models had the NSE values lower than 0 in OSH. The values of RMSE and MAE for SiTH were significantly higher than that of MOD16 and PT-JPL over forests biomes. Generally, the PT-JPL model had lower values of RMSE and MAE than the other two models for the majority of biomes.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 22 and MAE for SiTH were significantly higher than that of MOD16 and PT-JPL over forests biomes. Generally, the PT-JPL model had lower values of RMSE and MAE than the other two models for the majority of biomes.

Model Evaluation at Catchment Scale
At regional scale, SiTH tended to overestimate ET driving by different LAI and meteorological data ( Figure 4). The regression slope between observed and estimated ET by SiTH ranged from 1.22 to 1.36. The values of R 2 for SiTH (0.89~0.90) were generally higher than that for MOD16 and PT-JPL. The MOD16 model also slightly overestimated ET with values of slope being 1.03 to 1.16, and the R 2 values were lower than the other two models (R 2 = 0.71~0.80). The PT-JPL model performed well with the slope ranging from 0.87 to 1.03 and R 2 varying from 0.79 to 0.86. The regression slope between observed and PT-JPL estimated ET ranged from 0.87 to 1.03 with R 2 varying from 0.79 to 0.86. For the SiTH and PT-JPL models, the meteorological data has relatively higher influences on simulated ET than LAI data. For the MOD16 model, the estimated ET showed relatively great differences by using different LAI datasets.

Model Evaluation at Catchment Scale
At regional scale, SiTH tended to overestimate ET driving by different LAI and meteorological data ( Figure 4). The regression slope between observed and estimated ET by SiTH ranged from 1.22 to 1.36. The values of R 2 for SiTH (0.89~0.90) were generally higher than that for MOD16 and PT-JPL. The MOD16 model also slightly overestimated ET with values of slope being 1.03 to 1.16, and the R 2 values were lower than the other two models (R 2 = 0.71~0.80). The PT-JPL model performed well with the slope ranging from 0.87 to 1.03 and R 2 varying from 0.79 to 0.86. The regression slope between observed and PT-JPL estimated ET ranged from 0.87 to 1.03 with R 2 varying from 0.79 to 0.86. For the SiTH and PT-JPL models, the meteorological data has relatively higher influences on simulated ET than LAI data. For the MOD16 model, the estimated ET showed relatively great differences by using different LAI datasets. . Scatter plots of the simulated ET by three models using different forcing datasets versus measured ET at 32 catchments. On the left panels, the same meteorological datasets of ERA5 and different LAI datasets of (a) GLOBMAP, (c) GLASS, and (e) GIMMS was used to estimate ET, respectively. On the right panels, the same meteorological datasets of MERRA-2 and different LAI datasets of (b) GLOBMAP, (d) GLASS, and (f) GIMMS was used to estimate ET, respectively. The black dotted line represents the 1:1 line.
The performances of the three models over each basin were compared in Figure 5. Both MOD16 and SiTH models overestimated ET over the majority of the catchments, while PT-JPL performs relatively well in most catchments. The estimated ET by three models (especially SiTH) shows high consistency with water balance ET, with the R 2 greater than 0.8 in the majority of catchments. However, the values of R 2 in three models are very low in several basins (Amazon, Congo, Mekong, MOD16 in Aral, Indus, Murray and Olenek). The PT-JPL model has a relative good performance in most catchments than the other two models, with greater NSE values (closer to 1), lower RMSE (mean RMSE = 12.44 ± 5.10 month −1 ), and MAE values (mean MAE = 9.40 ± 4.53 month −1 ) than SiTH (mean RMSE = 18.95 ± 9.02 month −1 , mean MAE = 14.36 ± 8.53 month −1 ), and MOD16 (mean RMSE = 20.10 ± 10.72 month −1 , mean MAE = 16.58 ± 9.33 month −1 ) in all basins. Generally, the three models performed poorly over catchments in the tropical rainforest areas (Amazon, Congo, and Mekong) and Zhujiang region. In addition, MOD16 also performed poorly in high latitudes regions (Indigirk, Olenek, and Yukon). This may be due to large discrepancies of LAI and meteorological data sets [36,39] or to a lack of a robust description of snow and ice evaporation at high latitudes. . Scatter plots of the simulated ET by three models using different forcing datasets versus measured ET at 32 catchments. On the left panels, the same meteorological datasets of ERA5 and different LAI datasets of (a) GLOBMAP, (c) GLASS, and (e) GIMMS was used to estimate ET, respectively. On the right panels, the same meteorological datasets of MERRA-2 and different LAI datasets of (b) GLOBMAP, (d) GLASS, and (f) GIMMS was used to estimate ET, respectively. The black dotted line represents the 1:1 line.
The performances of the three models over each basin were compared in Figure 5. Both MOD16 and SiTH models overestimated ET over the majority of the catchments, while PT-JPL performs relatively well in most catchments. The estimated ET by three models (especially SiTH) shows high consistency with water balance ET, with the R 2 greater than 0.8 in the majority of catchments. However, the values of R 2 in three models are very low in several basins (Amazon, Congo, Mekong, MOD16 in Aral, Indus, Murray and Olenek). The PT-JPL model has a relative good performance in most catchments than the other two models, with greater NSE values (closer to 1), lower RMSE (mean RMSE = 12.44 ± 5.10 month −1 ), and MAE values (mean MAE = 9.40 ± 4.53 month −1 ) than SiTH (mean RMSE = 18.95 ± 9.02 month −1 , mean MAE = 14.36 ± 8.53 month −1 ), and MOD16 (mean RMSE = 20.10 ± 10.72 month −1 , mean MAE = 16.58 ± 9.33 month −1 ) in all basins. Generally, the three models performed poorly over catchments in the tropical rainforest areas (Amazon, Congo, and Mekong) and Zhujiang region. In addition, MOD16 also performed poorly in high latitudes regions (Indigirk, Olenek, and Yukon). This may be due to large discrepancies of LAI and meteorological data sets [36,39] or to a lack of a robust description of snow and ice evaporation at high latitudes.

Evaluation of ET at Global Scales
At global scale, we calculated the annual average ET during 1982-2011 using these models forced by six different combinations of inputs ( Figure 6). The spatial pattern of simulated ET by the three models were very similar. The highest annual ET was found over the amazon basin, the Congo rainforest and the southeast Asia near the equator, while the lowest annual ET were in the north Africa, most areas of central Asia, the southwestern United States, central and western Australia, and some parts of high latitude. The average total annual ET from 1982 to 2011 were (76.87 ± 2.98) × 10 3 km 3 , (71.68 ± 2.82) × 10 3 km 3 , and (61.25 ±1.92) × 10 3 km 3 for SiTH, MOD16, and PT-JPL model, respectively. These values fell in the ranges from 54.9 × 10 3 km 3 to 85 × 10 3 km 3 reported by previous studies [2,28,[66][67]. In addition, the mean annual global land ET calculated from MET and GLEAM during the same period were 63.34 × 10 3 km 3 and 65.7 × 10 3 km 3 , respectively. These values were slightly lower than that estimated by SiTH and MOD16 model, but very close to that estimated by PT-JPL model. The latitudinal average of ET simulated by the three model during 1982-2011 were also presented Figure 6 and showed similar latitudinal pattern. Relative high ET values were observed near the equator area and near 20 °N. However, the estimated ET in tropical regions by SiTH model was higher than that estimated by the other two models. The PT-JPL estimated ET was relatively low for the latitudes 30-60 °N, and estimated ET by MOD16 was low for the latitudes between 15 °S and 30 °S [22].

Evaluation of ET at Global Scales
At global scale, we calculated the average annual ET during 1982-2011 using these models forced by six different combinations of inputs ( Figure 6). The spatial pattern of simulated ET by the three models were very similar. The highest annual ET was found over the amazon basin, the Congo rainforest and the southeast Asia near the equator, while the lowest annual ET were in the north Africa, most areas of central Asia, the southwestern United States, central and western Australia, and some parts of high latitude. The average total annual ET from 1982 to 2011 were (76.87 ± 2.98) × 10 3 km 3 , (71.68 ± 2.82) × 10 3 km 3 , and (61.25 ±1.92) × 10 3 km 3 for SiTH, MOD16, and PT-JPL model, respectively. These values fell in the ranges from 54.9 × 10 3 km 3 to 85 × 10 3 km 3 reported by previous studies [2,28,66,67]. In addition, the mean annual global land ET calculated from MET and GLEAM during the same period were 63.34 × 10 3 km 3 and 65.7 × 10 3 km 3 , respectively. These values were slightly lower than that estimated by SiTH and MOD16 model, but very close to that estimated by PT-JPL model. The latitudinal average of ET simulated by the three model during 1982-2011 were also presented Figure 6 and showed similar latitudinal pattern. Relative high ET values were observed near the equator area and near 20 • N. However, the estimated ET in tropical regions by SiTH model was higher than that estimated by the other two models. The PT-JPL estimated ET was relatively low for the latitudes 30-60 • N, and estimated ET by MOD16 was low for the latitudes between 15 • S and 30 • S [22].    To evaluate the impact of LAI and meteorological data uncertainties on long-term ET trend, we classified 18 sets of ET products in Supplementary Table S2 into the following two categories: (1) the estimates of ET from three LAI datasets with different combinations of models and meteorological datasets, which were mainly used to investigate the influences of different LAI datasets on the ET estimations ( Figure 7b); and (2) the estimates of ET from two meteorological datasets with different combinations of models and LAI datasets, which were mainly used to investigate the influences of different meteorological datasets on ET estimations (Figure 7c). In Figure 7b, the estimates of ET using the GLASS dataset deviated from those using the GLOBMAP and GIMMS datasets. The estimated ET using the GLASS dataset were lower than those by the other two LAI datasets during the 1988-1993 period, while the estimated ET using the GLASS dataset were higher than those by the other two LAI datasets during 1999-2004 period. In Figure 7c, the estimates of ET using the MERRA-2 dataset were very similar to that using the ERA5 data, and their inter-quartile range of ET overlapped greatly and varied synchronously. Also, the median values of ensembles ET using the ERA5 and MERRA-2 datasets were consistent with the ET value estimated by MET.

Analyzing the Performance of Models
The three models with different structural complexity and process parameterizations have been developed to predict global ET. Hence, these models are expected to present various performance in estimating ET [21,43]. In this study, we found that the SiTH model overestimated ET over some specific biomes (i.e., forest) and in tropical regions. However, this model exhibited relatively high consistency with the observations (R 2 = 0.6~0.88). In this model, the P-T coefficient α is a dimensionless factor associated with the Bowen ratio to limit evaporation, and its value of 1.26 is derived from the data of daily fluxes at saturated land sites and open water [20]. Many studies revealed that the value of α for forests may be below 1. 26. For examples, Komatsu et al. [68] reported that the value of α is 0.83 ± 0.15 for deciduous forests and α = 0.63 ± 0.2 for coniferous forests. Cho et al. [69] discovered that the mean value of α for deciduous forests is 1.01 and for the coniferous forests is only 0.75. Luciana et al. [70] found that α values for forests are around 0.65. Therefore, the overestimates of ET by SiTH is due to the high α value used in this model for forest ecosystems. On the contrary, SiTH has a good performance over short vegetation ecosystems (i.e., grassland, cropland and shrubland). For the well-watered short vegetations, the value of α = 1.26 is confirmed in the literature [20]. Pereira et al. [71] reported that the value of α at the irrigated croplands ranged from 1.17 to 1.35. Some researchers found that the value of α close to the 1.26 over shrublands [72,73]. So, the SiTH performed relatively well over short vegetation ecosystems (i.e., grassland, cropland, and shrubland) (Figure 4). Notice that the α value may systematically vary on the daily and seasonal cycles [68,74,75]. So, it should take this variation into account in the future studies and optimize the value of α in SiTH for forests to improve its accuracy in ET simulations.
We also found that both PT-JPL and MOD16 performed poor in dryland biomes (OSH), which is consistent with the previous studies [22,25,27,29,46,59,[76][77][78]. In arid areas, soil moisture is the main factor to influences ET processes. However, the PT-JPL and MOD16 models used the atmospheric moisture conditions (i.e., air temperature, RH, and VPD) to reflect soil moisture constraints on ET [10][11][12] rather than directly using the soil moisture to constrain the ET. Novick et al. [79] reported that atmospheric moisture conditions may be significantly correlated with soil moisture at month to year timescales, but they tended to be nearly decoupled at the daily and hourly time scales. Thus, the PT-JPL and MOD16 models using the atmospheric moisture conditions to limit soil moisture may not properly describe the restricts of soil moisture on ET in arid regions. Recently, Purdy et al. [80] proposed to incorporate satellite-observed soil moisture into the PT-JPL model, and great improvements in model performances were obtained especially in water-limited regions. Furthermore, the soil moisture constraint was calculated as RH VPD/β in MOD16 and PT-JPL models. The parameter β is the sensitivity of soil moisture constraint to VPD, and plays an important role in accurate estimation of soil evaporation [12,27,46,59,76]. Zhang et al. [29,46] found that β was the most sensitive parameter, and its values in arid area were lower than that in humid regions, resulting in low soil evaporation due to soil moisture stress. On the contrary, the SiTH model directly uses soil water content to describe soil moisture limitation, and performs relatively well in arid areas [18]. Finally, plants have deep roots in arid regions [81,82] and can utilize deeper soil moisture or even groundwater to maintain growth [83]. However, only SiTH among the three models took the influences of groundwater into account during ET modeling.

Impact of the Uncertainties of Forcing Data on ET
Models tend to exhibit different behavior when forced with different input data [22,25,30,84]. From the evaluation of monthly estimated ET by the three models using different combinations of forcing datasets at local and regional scales (Figures 2 and 4), the SiTH model overestimated ET with all forcing datasets. Using different meteorological datasets, the differences of simulated ET by SiTH model were relatively larger than using different LAI datasets. A slight improvement in performances were observed by using ERA5 meteorological dataset. Also, it seemed that the estimated ET by MOD16 model was more sensitive to LAI datasets than meteorological datasets. For the PT-JPL model, the uncertainty in meteorological data had large influence on the estimated ET than LAI data, and relatively good performances were found using MERRA-2.
Moreover, the differences were found in estimated global ET anomalies by three models using different combination of forcing datasets, but the ensemble median of global ET anomalies agreed well with the MTE-estimated global land ET anomalies (Figure 7a). It indicates that the ensemble-model method can well capture the uncertainties in ET estimates [22,25,34,43]. Generally, the global ET shows an increasing trend from 1982 to 1998. After the summit in 1998, global ET shows a decreasing trend from 1998 to 2008 (Figure 7a). This agrees well with the results of previous studies [2,[40][41][42]. Some authors thought that the decline in global ET from 1998 to 2008 was caused by ENSO-induced anomalous dry conditions and consequent limited moisture supply, especially in the Southern Hemisphere [2,42]. However, the decline of ET was transient, and global land ET reached another summit in 2010 (Figure 7a). It demonstrated a transition from El Niño phase to the La Niña phase in 2010 with high precipitation [85], leading to a high ET [42]. Thus, the decline of ET after 1998 was a transient variation but not a constant decline signal. In the future, more studies are needed to figure out the mechanisms in controlling the long-term variations of global ET.
Comparing the global ET anomalies at annual scale under different forcing datasets, we found that the estimated ET anomalies using the GLASS dataset showed large inconsistency with those using the GIMMS and GLOBMAP LAI datasets. Thus, it seemed that the GLOBMAP and GIMMS datasets were more suitable for long-term ET simulations than the GLASS dataset. This is consistent with previous studies. For example, Jiang et al. [36] found that interannual variabilities of GLASS LAI products show large differences with other LAI products (i.e., GLOBMAP, GIMMS, and TCDR). Furthermore, previous studies found that vegetation greening is main driver to the multi-decadal ET trend since 1980s [40,[86][87][88][89]. Therefore, great disagreements among global long-term satellite LAI products can cause the large differences of estimated long-term ET trend. The differences of estimated ET anomalies using the different meteorological datasets was relatively small (Figure 7c). The ensemble median values of global ET using ERA5 and MERRA-2 datasets are in agreement with the MTE-estimated global land ET anomalies. Thus, it seemed that the influences of meteorological datasets on global ET estimates can be ignored partially due to their relatively good qualities.

Conclusions
In this study, we evaluated the performances of three process-based ET models in ET simulations at multiple scales by using various LAI and meteorological forcing datasets. The results showed that SiTH simulated well in dryland short vegetation ecosystems but overestimated ET in forest ecosystems because the P-T coefficient (α) may be set too high in this model. The PT-JPL and MOD16 models performed well in forests, but poorly in dryland biomes due to their improper description of soil moisture stress based on atmospheric moisture conditions. Similar model performances were observed at both catchment and global scales. To obtain proper long-term global ET estimates, the multi-model ensemble approach is a proper choice. We found that the ensembles median of global ET anomalies from different models and forcing datasets showed good consistency with that obtained by the MTE method. Generally, the uncertainty of LAI datasets has larger influences on the global ET estimates than the meteorological datasets. Thus, we should pay more attention to research global changes derived from the global LAI products. In further studies, we should optimize the P-T coefficient (α) over different vegetation types for SiTH to improve its accuracy in ET simulations over forest ecosystems. Finally, more studies are need to quantify the contributions of different driving factors to the variations in global ET and to figure out the mechanisms in controlling global ET changes.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/15/2473/s1, Table S1: Variables used as input to derive evapotranspiration (ET) in three models, Table S2: Details of the input datasets combinations for each ensemble members, Table S3: The monthly statistical parameters of the measured ET data at flux sites, and Table S4: The main characteristics and monthly statistical parameters at the 32 catchments.