A Bayesian Three-Cornered Hat (BTCH) Method: Improving the Terrestrial Evapotranspiration Estimation

: In this study, a Bayesian-based three-cornered hat (BTCH) method is developed to improve the estimation of terrestrial evapotranspiration (ET) by integrating multisource ET products without using any a priori knowledge. Ten long-term (30 years) gridded ET datasets from statistical or empirical, remotely-sensed, and land surface models over contiguous United States (CONUS) are integrated by the BTCH and ensemble mean (EM) methods. ET observations from eddy covariance towers (ET EC ) at AmeriFlux sites and ET values from the water balance method (ET WB ) are used to evaluate the BTCH- and EM-integrated ET estimates. Results indicate that BTCH performs better than EM and all the individual parent products. Moreover, the trend of BTCH-integrated ET estimates, and their inﬂuential factors (e.g., air temperature, normalized di ﬀ erential vegetation index, and precipitation) from 1982 to 2011 are analyzed by the Mann–Kendall method. Finally, the 30-year (1982 to 2011) total water storage anomaly (TWSA) in the Mississippi River Basin (MRB) is retrieved based on the BTCH-integrated ET estimates. The TWSA retrievals in this study agree well with those from the Gravity Recovery and Climate Experiment (GRACE). of time window on the accuracy of BTCH-integrated ET estimates is evaluated. The chosen time windows are 1, 2, 3, 6, and 12 months (i.e., ET integration by BTCH every 1, 2, 3, 6, and 12 months). Two-month time windows are December to January, February to March, . . . , and October to November. Three-month time windows are spring (April to June), summer (July to September), autumn (October to December), and winter (January to March). Six-month time windows are growth (May to October) and non-growth (November to April) seasons.


Introduction
Evapotranspiration (ET) refers to the amount of water vapor evaporated from the land surface to the atmosphere [1,2]. The accurate estimation of ET is required for understanding the water resource management, water cycle, and climate change [3][4][5][6]. ET can be estimated directly by flux tower systems (e.g., FLUXNET, AmeriFlux, EuroFlux, HiWATER, etc.) [7][8][9]. However, these measurements have a sparse distribution and limited time periods. Hence, several approaches have been developed to estimate the spatial and temporal variability of ET over the regional and global scales. The existing

Study Regions and Datasets
This study covers the contiguous United States (CONUS), ranging from 25.8 • N to 50.8 • N and 124.8 • W to 67.8 • W (Figure 1). The vegetation types of CONUS are forest, grassland, cropland, and shrubland. The land cover types are obtained from the MODIS land cover type product (MCD12Q1) (https://lpdaac.usgs.gov/products/mcd12q1v006/). The CONUS is composed of 12 Table 1. The energy closure imbalance of monthly ET data from 15 AmeriFlux sites was corrected by Jung et al. (2010) [10]. The locations of 12 RFCs and 15 AmeriFlux sites are shown in Figure 1.  Ten ET products (namely, GFET, GLEAM, and four NLDAS-2 ET, and four NLDAS-testbed ET datasets) are integrated by the BTCH method (Table 2). GFET is generated by the Max Planck Institute for Biogeochemistry in Germany (https://www.bgc-jena.mpg.de/geodb/projects/Data.php). This product is derived from a data-driven model using global FLUXNET eddy covariance measurements. The GLEAM ET dataset is estimated from satellite-based data (i.e., soil moisture, land surface temperature, vegetation optical depth, and snow water equivalents) using the Priestley-Taylor equation over a global scale with spatial resolution of 0.25 • [36,62,63]. The GLEAM ET products are available on the Global Land Evaporation Amsterdam Model archive (https://www.gleam.eu/).
The auxiliary data used in this study include precipitation (P), air temperature (Ta), runoff (R), TWSA, soil moisture (SM), and normalized differential vegetation index (NDVI) data. The monthly gridded P and Ta datasets are obtained from National Centers for Environmental Information (NCEI) (https://www.ncei.noaa.gov) [74,75]. The regional monthly runoff data are downloaded from the United States Geological Survey (USGS) water watch platform (http://waterwatch.usgs.gov/). The daily NDVI data are derived from the Advanced Very High-Resolution Radiometer (AVHRR) instrument onboard the series of the National Oceanic and Atmospheric Administration (NOAA) satellites with the spatial resolution of 0.05 • (https://climatedataguide.ucar.edu/climate-data/ndvi-normalized-differencevegetation-index-noaa-avhrr). The AVHRR NDVI data were further averaged yearly using the time-average method to avoid cloud contaminations. The TWSA data are obtained from the GRACE Tellus website with the spatial resolution of 1 • × 1 • [76,77] (https://grace.jpl.nasa.gov/). The monthly soil moisture (SM, 1 meter) and ET values from CLM4 are also used as references to evaluate the integrated ET products. The related regional data are summarized in Table 2.

Bayesian-Based Three-Cornered Hat (BTCH) Method
In this study, the BTCH method is utilized to generate an accurate ET product by combining various ET datasets [82,83]. The probability density function (PDF) for the ith ET product (ET i ) can be expressed as, where ET t is the true value of ET. ε i and σ i are a zero-mean white noise and error variance of the ith ET product, respectively. L(•) is the likelihood function. Similarly, the PDF for jth ET product (ET j ) can be expressed as, where ε j and σ j are a zero-mean white noise and error variance of the jth ET product. The maximum likelihood of true ET (ET t ) is the maximum value of its joint probability distribution, In order to obtain the maximum likelihood value of ET t in Equation (3), the cost function J is defined as, By setting the first variation of J(ET t ) to zero (i.e., δJ(ET t ) = 0), ET t can be obtained as, If we define ET t = w i ET i + w j ET j , then Similar to Equation (5), ET t can be obtained via ET t = w 1 ET 1 + ··· + w N ET N when there are N sets of ET products. The weight of each ET product (e.g., w k ) can be obtained by minimizing a similar cost function (Equation (4)) as, The error covariance (σ i ) of each ET product can be obtained using the three-cornered hat (TCH) method.
Since ET t is not available, the difference between (N-1) ET products and a reference ET product (ET R ) (chosen arbitrarily from N ET products) can be expressed as, where Y is the difference matrix with (N-1) rows and M columns (M is the total number of time samples in each ET product). The covariance matrix (S) of Y is defined as [84], where cov(•) is the covariance operator. The unknown covariance matrix of the individual noise R is related to S via, where J is the identity matrix, and can be defined as, and matrix R is defined as, where σ ij = cov(ε i , ε j ). There are N × (N + 1)/2 unknowns in Equation (10) (number of distinct elements of R), but there are only N × (N − 1)/2 equations (number of distinct elements of S). Thus, there remain N "free" parameters that must be reasonably determined to obtain a unique solution [84]. To determine the N free parameters, the following objective function is minimized based on the Kuhn-Tucker theorem.
The Kuhn-Tucker theorem is proposed by Galindo and Palacio (1999) [85], and the objective function is defined as, where K = N−1 det(S). The constraint function is given by, where Q is a diagonal matrix with elements σ 11 , ···, σ NN on its diagonal. These arrays can be calculated by minimizing Equation (13) through the initial condition iterations [86]. The square root of the diagonal elements of R (i.e., σ 11 , σ 22 , . . . , σ NN ) represent the relative uncertainty of each ET product [43].  [43] for detailed information on the TCH approach. The ensemble mean (EM) method is used to also integrate the ten long-term gridded ET products and generate an ET product. This method just simply calculates the average of ET products as follows,

Water Balance Budget Method
The water budget equation can be expressed as, where t is the time step (month), ET WB is the ET (mm/month) estimates from the water budget equation, TWSC is the total water storage change (mm/month), and P and R are the precipitation (mm/month) and runoff (mm/month), respectively. According to Zhang et al. (2010) [87] and Velpuri et al. (2013) [88], ET WB values can be used as the reference data to validate ET estimates at multiyear scale.

Long-Term Total Water Storage Anomaly Reconstruction
The regional TWSC data can be expressed as follows, where TWSA is the total water storage anomaly and, on the one hand, can be obtained from GRACE data. On the other hand, TWSC can be calculated from Equation (16) using P, R, and ET, and TWSA can be calculated via, As shown in Equation (18), the uncertainty of TWSA estimates is due to cumulative errors in TWSC estimates, and TWSC uncertainties can be attributed to errors in P, R, and ET. Thus, the PER (precipitation, evapotranspiration, and runoff) method proposed by Zeng, (1999) [89] is applied to Equation (18) to eliminate time accumulative errors, especially systematic bias, where P, ET, and R represents mean values of P, ET, and R, respectively. The Mann-Kendall method is used to analysis linear trend of the BTCH-integrated ET products [90,91]. The Mann-Kendall test is a non-parametric method to identify trends of time series and is widely used in previous studies [92][93][94][95].
The standard deviation (SD), root mean square deviation (RMSE), and correlation coefficients (R) metrics are used to access the performances of VDA approach, where P t and O t are the predicted and observed values at time step t, respectively. Partial correlation analysis [96,97] is used to obtain the correlation between collinear hydrologic variables (herein, Ta, P, and NDVI) that affect ET. The partial correlation analysis eliminates collinearity among the hydrologic variables (i.e., Ta, P, and NDVI) controlling ET. This analysis allows the impact of one or more variables (e.g., Ta and NDVI) to be eliminated when examining the relationship between another pair of variables (e.g., ET and P) [98,99]. Hence, the partial correlation coefficient quantifies the correlation between two variables (e.g., ET and P), controlling the other variables (i.e., Ta and NDVI).
The partial correlation between the variables x and y, which are conditioned on the variable z can be calculated by, where R xy,z is the partial correlation between x and y, given the control z. The variables R xy , R xz , R yz are the correlation between x and y, x and z, and y and z, respectively. More details on the formulation of the partial correlation method is provided by [96] and [97].    Figure 4 shows the statistics of BTCH-integrated, EM, and ten parent ET products as compared with flux tower measurements (ET EC ). As indicated, ET estimates from the BTCH method are closer to the observations as compared with other ET products. Since GFET is upscaled based on FLUXNET tower observations, it is also close to the AmeriFlux ET observations. The ET estimates from BTCH have higher R and lower RMSD as compared with those of EM and individual algorithms over the three land cover types. For DBF, ENF, and Crop, the RMSDs of ET estimates from BTCH are respectively 11.25, 15.13, and 17.25 mm/month, which are 31.28%, 20.16%, and 22.05% lower than those of 16.37, 18.95, and 22.13 mm/month from EM. ET WB values are also used to assess the accuracy of ET estimates from BTCH, EM, and ten individual models at the basin scale in Figure 5. As indicated, BTCH performs better than EM and other models. ET products from BTCH and Noah28 are closest to the ET WB estimates, followed by VIC412 and GFET. For the 30-year comparisons (1982 to 2011), the average RMSD of ET product from the BTCH method is 24.64 mm/year, which is 53.46% lower than the RMSD of 52.94 mm/year from the EM method. The weight (%) of each ET dataset has a key role in ET integration by the BTCH method. The weights of ten ET products over the main four land covers (cropland, grassland, ENF, and DBF) are shown in Figure 6. As indicated, GFET (22%), Noah28 (15%), and GLEAM (14%) have the largest weight over all land covers in the whole year. The relative contributions vary for different land cover types. For example, Noah28 has the highest weight (24%) over cropland; GFET has the highest weight (27%) over grassland; GFET and NoahMP36 have the highest weight (16%) over ENF; and GFET has the highest weight (24%) over DBF. GFET has the highest weight in different seasons. For different LSMs, Noah28 and Noah36 have the largest weight of 15% and 10% in a whole year, while VIC403 has lowest weight (5%). VIC412 increased weight from 5% to 8% as compared with VIC403 in whole year.

Relationships between BTCH-Integrated ET and Climate Factors
The Mann-Kendall trend test was used to investigate the temporal variations of BTCH-integrated ET along with dominant climate factors (i.e., air temperature, NDVI, and precipitation) in the CONUS from 1982 to 2011 (Figure 7). The black points represent 95% level of significance. The positive and negative values on the maps indicate increasing and decreasing trends, respectively. As shown, air temperature increased about 0.06°C/year over the southwest CONUS, and decreased~0.03°C/year over the northwest CONUS. NDVI shows a positive trend over the southeast and west coast, whereas a negative trend was found in southwest areas. The precipitation increased (decreased)~0.12 mm/year over the northeast (southeast) CONUS. As shown, there is a maximum of 0.04 mm/year growth (reduction) in the BTCH-integrated ET over center, northeast, and west coast (southwest) areas. The growing trend of ET could be caused by increased vegetation density (NDVI) and precipitation. The decreased air temperature in the north and west coast areas could be caused by the "cooling effect" of increased ET. In the southwest areas (e.g., Texas), ET, NDVI, and precipitation are all decreased, indicating the continuous drought in this area. In contrast, Ta is increased in the southwest area (e.g., Texas). Because the land surface is dry, the available energy at the surface is mostly dissipated through sensible heat flux, which heats the atmosphere. The results also show different trends of ET among the RFCs. Over Colorado and California-Nevada, the ET estimates show a small decrease over the 30-year period (1982 to 2011). While, there is a small increase in the ET estimates over Missouri, North Central, Mid-Atlantic, Northeast, Lower Mississippi, and Ohio over the same period. Texas is mainly covered by the West Gulf and Arkansas basins, and experienced the most extreme one-year drought on record in 2011. The extreme one-year drought in 2011 has a large impact on the ET estimates [100,101]. Milly and Dunne (2001) [102] showed that increased ET in Mississippi has both climatological and anthropogenic dimensions. Walter et al. (2004) [103] studied ET across the CONUS based on precipitation and stream discharge data and showed that the ET has increased over the past 50 years. This study also reported an increasing trend in ET over the Mississippi basin and CONUS from 1982 to 2011, which is consistent with those of [102] and [103].
Plots of 30-year-averaged ET against Ta, P, and NDVI over the 12 RFCs are shown in Figure 8. As shown, in general, ET grows by an increase in Ta, P, and NDVI over the 12 RFCs. ET estimates over Southeast, Lower Mississippi, and Ohio are higher than those of Colorado and California-Nevada because of higher vegetation density, precipitation, and air temperature. In the West Gulf, low vegetation density and precipitation limit ET in spite of high air temperature. These results are consistent with those of [23] and [104].  Figure 9 shows the partial correlation coefficients of the annual ET estimates from BTCH with NDVI, precipitation, and air temperature from 1982 to 2011. As indicated, the annual ET estimates is positively correlated with precipitation and NDVI in the southwest CONUS, implying that the rise in these variables increase ET, and vice versa. This is because at dry or slightly vegetated conditions (e.g., southwest areas, Figure 3), ET is water limited and is mainly controlled by the surface state variables (i.e., soil moisture and NDVI). The ET estimates are positively correlated with air temperature in the northwest, east, and southeast CONUS (wet or densely vegetated conditions, Figure 3). This happens because in wet or densely planted conditions (e.g., northwest, east, and southeast areas), ET is energy limited and is mainly influenced by atmospheric state variables (i.e., air temperature and specific humidity) [105,106]. Figure 10 shows time series of the annual BTCH-integrated ET estimates, soil moisture from CLM4, and precipitation anomalies over the twelve RFCs for 1982 to 2011. As indicated, ET anomalies are consistent with those of soil moisture and precipitation. ET anomalies rise with the increase of soil moisture and precipitation anomalies, and vice versa. Over relatively dry RFCs (i.e., Colorado and California-Nevada), ET increases rapidly with precipitation (e.g., 1983, 1998, and 2005). While, there is a lower consistency between ET and soil moisture over relatively wet RFCs (i.e., Northeast, Lower Mississippi). This is because over relatively dry RFCs, ET is water limited and is mainly controlled by the land surface state variable (i.e., soil moisture). Therefore, the coupling between ET and soil moisture (precipitation) is more robust. In contrast, over relatively wet RFCs, ET is energy limited and is mainly affected by the atmospheric state variables (i.e., air temperature and specific humidity). Hence, the coupling between ET and soil moisture (precipitation) becomes weak.

Long-Term Reconstruction of GRACE Total Water Storage Anomaly
Using BTCH-integrated ET, precipitation, and runoff datasets, the TWSA over the Mississippi River Basin (MRB) is estimated from Equation (18). The long-term (1982-2011) TWSA estimates from the PER method are shown in Figure 11 (top). In general, the TWS estimates increased by 0.06 mm/year from 1982 to 2011. TWS has a decreasing trend prior to 1993 as represented by the negative TWSA estimates. In contrast, TWS increases after 1993 as reflected in the positive TWSA retrievals. The retrieved TWSA is compared with the GRACE-derived TWSA from 2002 to 2011 over the MRB. As shown in Figure 11 (bottom), the TWSA retrievals agree well with those of GRACE in terms of magnitude, as well as seasonal and long-term variations. The RMSDs and R of TWSA estimates from PER method are 32.92 mm/month and 0.82. From 2002 to 2011, on average, PER TWS estimates increased by~0.32 (mm/year), which is close to the TWS increase of~0.38 (mm/year) from GRACE. The misfit between the GRACE and PER TWSA retrievals is due to uncertainties in the water budget balance components (i.e., ET, precipitation, and runoff) and GRACE data.

Time Window of BTCH Method
As indicated by   [43], uncertainties of ET estimates from 12 ET products (including one, three, and eight products from the machine learning model, remotely sensed observations, and land surface models, respectively) change with seasons. It is evident that the seasonal variations in the uncertainty of ET estimates from different methods affect ET integration by the BTCH method. Herein, the impact of time window on the accuracy of BTCH-integrated ET estimates is evaluated. The chosen time windows are 1, 2, 3, 6, and 12 months (i.e., ET integration by BTCH every 1, 2, 3, 6, and 12 months). Two-month time windows are December to January, February to March, . . . , and October to November. Three-month time windows are spring (April to June), summer (July to September), autumn (October to December), and winter (January to March). Six-month time windows are growth (May to October) and non-growth (November to April) seasons. Figure 12 shows the monthly ET estimates from BTCH for the six tested time windows over DBF, ENF, and cropland. As can be seen, the integrated ET products for different time windows are comparable in magnitude and temporal variation. The ET estimates increase with the vegetation growth and reach their maximum in mid-July. Then, ET reduces as the vegetation decays. As indicated, the results of BTCH method with time window are improved significantly as compared with the EM methods. ET estimates from BTCH with one-month and two-month time windows are closer to the observations as compared with those with three-month and six-month time windows. Over ENF, ET estimates are remarkably lower than observations during the whole year. For cropland and DBF, ET values are significantly underestimated during the second half of the year. This is because the Noah28, VIC403, NoahMP36, and VIC412 models typically underestimate ET and fail to capture its seasonal dynamics over DBF and ENF [43]. Overall, the BTCH method with the one-month time window provides the most accurate ET estimates. This is because the one-month time window is able to capture seasonal variations in uncertainties of ET estimates better than those of two-month, three-month, six-month, and 12-month time windows, and EM.  The ET derived from water balance method (ET WB ) and ET from CLM4 (ET CLM4 ) are used as the reference data to quantify different integration results. The results of EM are also shown for comparison. As indicated, the BTCH-integrated ET estimates are closer to observation, implying that BTCH outperforms EM. ET estimates from BTCH1 have the lowest RMSD and standard deviation, followed by those of BTCH2, BTCH3, BTCH6, and BTCH12. The seasonal variations in uncertainties of ET estimates can be efficiently used by the BTCH method to reduce the uncertainties in the integrated ET product.  Figure 14 shows uncertainty of ET retrievals from BTCH over the CONUS, for the five different time windows. For comparison, uncertainty of ET estimates from EM is also shown in Figure 14. The TCH method is used to calculate the uncertainties of ET estimates from BTCH and EM. As indicated, the uncertainties of ET retrievals from BTCH1 to BTCH12 are lower than those of EM. The ET product from BTCH1 has the lowest uncertainty, followed by BTCH2, BTCH3, BTCH6, and BTCH12. The individual ET products have lower uncertainties in the northwest as compared with the southeast and center of CONUS [43]. However, BTCH1 can take advantage of seasonal ET uncertainties and eliminate high uncertainties of the other BTCH and EM products.

Conclusions
In this study, a Bayesian-based three-cornered hat (BTCH) method is used to integrate ten long-term (30 years) gridded ET datasets (i.e., GFET, GLEAM, eight physical model outputs from NLDAS-2, and NLDAS-tested) without using any a priori knowledge.
The performance of the BTCH method is evaluated using ET measurements from AmeriFlux eddy covariance (ET EC ) stations and ET derived from the water balance method (ET WB ). The results show that BTCH is able to capture the seasonal variations of ET estimates more accurately than the ensemble mean (EM) method. The BTCH-integrated ET estimates have the RMSD of 14.54 mm/month, which is 24.07% lower than that of the RMSD of 19.15 mm/month from EM.
BTCH-integrated ET shows a maximum increase of~0.04 (mm/year) over the center, northeast, and west coast of CONUS and a maximum decrease of~0.04 (mm/year) over southwest areas. The positive trend of ET in the Mississippi River Basin from 1982 to 2011 is caused by increased vegetation density (NDVI) and precipitation. The decreased ET in the southwest areas is due to the reduction in NDVI and precipitation. The annual BTCH-integrated ET is positively correlated with precipitation and NDVI in dry or sparse vegetated conditions, while it is positively correlated with air temperature wet or densely vegetated conditions. This is because at dry or slightly vegetated conditions, ET is water limited and is mainly controlled by the surface state variables (i.e., soil moisture and NDVI). In contrast, at wet or densely planted conditions, ET is energy limited and is mainly influenced by atmospheric state variables (i.e., air temperature and specific humidity).
Finally, the long-term (1982-2011) total water storage anomaly (TWSA) is computed in the Mississippi River Basin (MRB) based on BTCH-integrated ET, using the PER method. The PER TWSA retrievals agree well with those of GRACE, and they both show an increasing trend over MRB from 1982 to 2011. Using the reconstructed long-term TWSA, the estimated TWS values increased by 0.06 (mm/year) over the MRB from 1982 to 2011.