Modelling High-Resolution Actual Evapotranspiration through Sentinel-2 and Sentinel-3 Data Fusion

: The Sentinel-2 and Sentinel-3 satellite constellation contains most of the spatial, temporal and spectral characteristics required for accurate, ﬁeld-scale actual evapotranspiration (ET) estimation. The one remaining major challenge is the spatial scale mismatch between the thermal-infrared observations acquired by the Sentinel-3 satellites at around 1 km resolution and the multispectral shortwave observations acquired by the Sentinel-2 satellite at around 20 m resolution. In this study we evaluate a number of approaches for bridging this gap by improving the spatial resolution of the thermal images. The resulting data is then used as input into three ET models, working under different assumptions: TSEB, METRIC and ESVEP. Latent, sensible and ground heat ﬂuxes as well as net radiation produced by the models at 20 m resolution are validated against observations coming from 11 ﬂux towers located in various land covers and climatological conditions. The results show that using the sharpened high-resolution thermal data as input for the TSEB model is a sound approach with relative root mean square error of instantaneous latent heat ﬂux of around 30% in agricultural areas. The proposed methodology is a promising solution to the lack of thermal data with high spatio-temporal resolution required for ﬁeld-scale ET modelling and can ﬁll this data gap until next generation of thermal satellites are launched.


Introduction
The fluxes of water (e.g., evapotranspiration-ET) and energy (e.g., of latent and sensible heat) at the surface of the Earth are critical to quantify for many applications in the fields of climatology, meteorology, hydrology and agronomy. Easy access to reliable estimations of ET is considered a key requirement within natural resource management, and if ET can be estimated accurately enough it holds a vast potential to assist in the current attempts of meeting the UN Sustainable Development Goals (SDG), e.g., SDG2-zero hunger, or SDG6-clean water and sanitation (https://sustainabledevelopment.un.org, last accessed 10 December 2018).
Water and energy fluxes show large spatio-temporal variability since they are highly dependent not only on the meteorological conditions, but also on different characteristics and properties of the land surface, such as soil moisture/water availability, land cover type and amount of vegetation biomass and its health. Remote sensing data can provide spatially-distributed information about relevant land surface states and properties used to model the relevant fluxes and hence this technology addresses Finally, like all remote sensing retrievals, satellite radiometric temperature is prone to uncertainty due to sensor noise, surface emissivity and atmospheric effects. To overcome this issue in ET estimation, several methods have been proposed based on either contextual models [39][40][41], by constraining the ET range between hot (no ET) and cold (potential ET) pixels [31,32], or using time-differenced morning temperature rise [42,43]. Regarding the contextual methods, all of them require homogeneous forcing and coupling between land surface/atmosphere which is a disadvantage when applied at large scales. In addition, those models assume that the coldest pixel in the image means potential transpiration, and the hottest pixel means zero transpiration which is not always the case (e.g., in humid and sub-humid areas).
In this study we will evaluate three different ET models driven by Sentinel-2 and Sentinel-3 imagery: METRIC [32] is a one source energy balance model that is less sensitive to heat transfer coefficient parametrizing than other OSEB model such as SEBS [33]; TSEB-PT [36] as a widely used two source energy balance model; and ESVEP [44] as a hybrid contextual-two source energy balance model.

Materials and Methods
In this section we first describe the evaluated ET models (Section 2.1), before presenting the validation sites (Section 2.2) and the input data sources (Section 2.3).

Description of ET Models
The energy balance can be expressed as (1) where net radiation R n is a key element in the energy budget of the land surface as it determines the available energy that the land utilises for water evapotranspiration (latent heat flux, λE) and for heating up the overlying air layer (sensible heat flux, H). Equation (1) assumes that other energy terms (heat advection, heat storage in the canopy layer, and energy for photosynthesis) are negligible. Since ET is the combined process of soil evaporation and canopy transpiration, R n can be also be partitioned into soil (R n,S ) and canopy net radiation (R n,C ), with both sensible and latent heat flux also partitioned between soil (i.e., evaporation process) and canopy (transpiration). Using remote sensing data to derive R n has proven to be a sound alternative to ground-based measurements of both shortwave and longwave net radiation. Different approaches have been proposed to estimate surface albedo, ranging from empirical relationships between ground measured albedo and the different reflective bands in satellite [45] to more physically based methods relying on modeling the surface anisotropic effects [46,47]. Indeed, one of the major challenges when estimating albedo with satellite remote sensing data is that such sensors typically measure the outgoing radiance at a given direction while the estimation of albedo needs to account for the outgoing radiance in all the directions of the hemisphere [48,49]. Methods based on the modelling of those bidirectional effects have proven to be effective to overcome this challenge. In this study, we use a method for retrieving soil and canopy shortwave net radiation, proposed by Kustas and Norman [50], based on the different spectral behaviour of soil and vegetation for the photosynthetically active radiation (PAR) and near infrared (NIR) regions of the spectrum. Such approach is based on the radiative transfer model (RTM) described in Campbell and Norman [51] to obtain estimates of soil and canopy albedo as well as canopy transmittance in the PAR and NIR. This approach requires as inputs Leaf Area Index and leaf inclination distribution [52], the different bihemispherical reflectances and transmittances of soil and a single leaf, and the proportion of diffuse irradiance. However, this approach assumes homogeneous canopies and it requires some corrections when dealing with clumped canopies [53], which were also used in this study.
On the other hand, longwave net radiation is primarily driven by the thermal radiation emitted by the surface, which depends on surface emissivity and skin temperature following the Stefan-Boltzman law. Besides, Kirchoff's law can be applied to derive the atmospheric longwave radiation that is absorbed by the surface. When running OSEB models (i.e., METRIC), only those two components are taken into account. However, when using TSEB models (i.e., TSEB-PT and ESVEP), surface anisotropy can also be accounted for when estimating the net longwave radiation, considering that leaves and soil have different temperatures and hence emit different amounts of thermal radiation [50].
Soil heat flux G is usually assumed to be a ratio of the soil net radiation. Choudhury et al. [54], Bastiaanssen et al. [31] suggested that G is ca. 35% of net radiation of the soil around midday hours and this is the approach used in this study by TSEB models. Since net radiation of the soil cannot be computed when using OSEB models, a specific equation suggested by Bastiaanssen et al. [31] is used instead.
In all three evaluated models, λE is estimated as residual of Equation (1). The main difference between the models is in the way in which H is calculated, as briefly described in the sections below.

Mapping Evapotranspiration at High Resolution with Internalized Calibration, METRIC
Sensible heat flux in METRIC [32] is derived in a contextual manner by finding hot and cold pixels (Equation (2)).
where δT is the estimated gradient between aerodynamic and air temperature, estimated as a linear equation function of T rad with c and m parameters are linearly solved from expressing Equation (2b) from two cold and hot endpoints: METRIC scales λE between these two hot (T hot ) and cold (T cold ) endmembers based on a linear relationship between actual ET and reference ET using the standardised ASCE Penman-Monteith equation for an ideal alfalfa field [55]. Therefore, METRIC, as opposed to SEBAL [31], does not assume zero sensible heat flux at the cold pixel, which can have a positive impact on model accuracy at well watered areas under large vapour pressure deficit conditions. According to Allen et al. [32], cold pixels yield a 5% larger ET than the reference ET (λE cold = 1.05λE re f ), but earlier in the season and off-season, cold pixel ET is instead a function of fractional cover or NDVI: λE cold /λE re f = f (NDV I). On the other hand, METRIC overcomes the issue of estimating kB −1 by computing R AH using the profile at two different heights above z 0H . Finally the authors stated the need for either computing an "excess resistance" in aerodynamically rough and dry surfaces when using the δT calibration performed over agricultural areas, or calibrating different δT slopes at different land covers/environmental conditions [32].
For Equation (2) to hold true, δT and H require constant wind speed at the application domain, so the model uses wind speed at blending height to overcome this issue. It also requires constant irradiance and air temperature, i.e., δT changes are only either due to root-zone soil moisture or aerodynamic roughness. Furthermore, the model requires heterogeneity in hydrologic and vegetation conditions and therefore we applied METRIC over two different vegetation domains, short vegetation (crops, grass and shrubs) and tall vegetation (broadleaved and conifer forests as well as wooded savannas). Finally, METRIC is sensitive to the definition of hot and cold pixels. Several different methodologies to find those endmember values were proposed, which can be especially challenging in heterogeneous areas where pixels become mixed at coarse spatial resolution. In our case we adopted the Exhaustive Search Algorithm solution proposed by Bhattarai et al. [56]. Taylor Two-Source Energy Balance Model, TSEB-PT   Two-source energy balance models treat the land surface as two layers, soil and canopy, contributing to the energy and water fluxes (Equation (4))

Priestley-
where soil (canopy) sensible heat flux is computed from the gradient between the soil (canopy) temperature (T S and T C respectively) and the air temperature at the sink-source height (equivalent to T 0 ). In the TSEB-PT model [28,36,53], an electrical circuit analogy is used in which H from soil and canopy are estimated based on three aerodynamic resistances to heat transport arranged in a series network. Since T C and T S are unknown a priori, they are estimated in an iterative process in which it is first assumed that green canopy (expressed as the fraction of LAI that is green, f g ) transpires a potential rate based on Priestley-Taylor formulation [36]: where α PT is the Priestley and Taylor [57] coefficient, ∆ is the slope of the vapour pressure to air temperature curve and γ is the psychrometric constant. Then the canopy transpiration is sequentially reduced (i.e., α PT < 1.26) until realistic fluxes are obtained (λE C ≥ 0 and λE S ≥ 0). TSEB-PT probably is the model that requires most accurate retrievals of physical inputs (LAI and T rad ), and studies already reported larger uncertainty in senescent vegetation (i.e., f g < 1) and very dense (high LAI) or tall vegetation [43,58]. It is more complex than METRIC and therefore has a large number of parameters and modelling options. Finally, the Priestley-Taylor formulation was shown to produce larger uncertainty in high advection conditions, cases in which initializing λE C with a Penman-Monteith formulation showed better results [37]. Combining TSEB-PT model with the disaggregation approach (described in Section 1) results in a disTSEB model [23].

End-Member-Based Soil and Vegetation Energy Partitioning, ESVEP
ESVEP is based on a trapezoid T rad − f cover framework, in which it considers fluxes acting in a "parallel" soil and canopy system [44]. As in TSEB-PT, ESVEP partitions T rad as a linear weight of emitted radiance. Other similar models to ESVEP are HTEM [59] and TTEM [60], but ESVEP solves the trapezoid in a pixel-per-pixel basis overcoming the need for homogeneous weather forcing and roughness (Equation (6a)).
where r a is the aerodynamic resistance, r b,dry and r b,wet are resistances of dry and wet canopy respectively, ρ a is the density or air, C p is specific heat capacity at constant pressure, γ is psychrometric constant and vpd is vapour pressure deficit of the air.

Validation Sites
Year 2017 measurements from eleven eddy covariance (EC) sites were used in this study, covering a wide range of land cover types and climates. Sites are summarised in Table 1 and data used in validation included the four components of net radiation R n (shortwave/longwave downwelling/upwelling), soil heat flux G, sensible heat flux H, and latent heat flux λE. In addition, friction velocity, Monin-Obukhov length, and wind direction data from the EC system was used to estimate the satellite pixel footprint contribution [61,62] to the turbulent fluxes at the satellite overpass. Validation sites comprise 5 agricultural sites, both irrigated and rainfed, including row crops (e.g., Sierra Loma vineyard) and an olive grove (Taous). In addition, two sites over grassland, one humid meadow (Grillenburg) and a semi-arid steppe (Kendall Grassland), one in conifer (Hyltemossa) and one in broadleaved forests (Harvard Forest) are also included in the validation list. Finally, complex heterogeneous landscapes are represented by two wooded savannas (Dahra and Majadas de Tiétar). From all these sites, 3 are in Mediterranean climate, and two more in semi-arid climates, whereas the rest of the sites are located in temperate climates. Error metrics included mean bias error (∑ (Obs. − Pred.)/N), root mean squared error (RMSE = ∑ (Obs. − Pred.) 2 /N), relative RMSE (RMSE/Obs), and Pearson correlation coefficient between observed and predicted. Due to the lack of energy closure in the eddy covariance data, unless otherwise stated all metrics were computed after adding the energy balance residual (residual = R n,EC − G EC − λE EC − H EC ) to the latent heat flux, taking the assumption that errors in measurements of λE are larger than errors in the measurements of H [68].

Input Data Sources
The input data required to run the evapotranspiration models came from three main and two ancillary sources. The main sources were: Sentinel-2 shortwave observations, Sentinel-3 thermal observations and European Center for Medium-range Weather Forecasts (ECMWF) ERA-5 meteorological reanalysis data. The ancillary sources were: a digital elevation model (DEM) from the Shuttle Radar Topography Mission (SRTM) satellite, and land cover map created as part of the ESA Climate Change Initiative (CCI).

Satellite Data
The main satellite data inputs come from the Sentinel-2 (both A and B) and Sentinel-3 (A only) satellites. Sentinel-2 and Sentinel-3 were chosen as the primary satellite data sources for this study for a number of reasons. Firstly, as mentioned previously, when treated as a constellation those satellites are able to satisfy the need for temporally dense observations at high spatial resolution and with good radiometric accuracies. Secondly, they are part of the Copernicus programme, meaning that there is a guaranteed long-term continuity of data into the future. Thirdly, the data from those satellites is free and open and will remain so, again due to being part of the Copernicus programme.
High-resolution shortwave observations needed to characterise the state of vegetation in the evapotranspiration model as well as to sharpen TIR data were obtained by the MultiSpectral Instrument (MSI) on board of the Sentinel-2A & 2B satellites. MSI acquires reflectance information in 13 spectral bands (with central wavelength ranging from 444 nm to 2202 nm) with a spatial resolution of 10 m, 20 m, or 60 m (depending the spectral band) and global geometric revisit of at least 5 days when both satellites are considered [6]. The MSI sensor has 3 spectral bands in the leaf-pigment sensitive red-edge part of the electromagnetic spectrum and two bands in water-content sensitive shortwave-infrared part of the spectrum, in addition to the more traditional visible and near-infrared bands, which makes it well suited for vegetation mapping and monitoring [69]. For each of the validation sites, all Sentinel-2 images for year 2017 were visually scanned and the ones which were cloud, fog and shadow free in the closest vicinity of the flux towers locations were selected for processing.
L1C top of the atmosphere images were converted to bottom-of-atmosphere (BOA) reflectances (L2A) using the Sen2Cor atmospheric correction processor [70] v2.5.5. BOA reflectance values were then used as input to the Biophysical Processor [71] available in the SNAP software v6.0.1 (step.esa.int-last accessed 28 November 2018) in order to obtain effective values of green Leaf Area Index (LAI), Fractional Vegetation Cover (FVC), Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), Canopy Chlorophyll Content (CCC) and Canopy Water Content (CWC). The outputs of the SNAP Biophysical Processor have been validated in a number of studies, with good performance in herbaceous crops [72,73] but overestimation of LAI in bare-soil cases and underestimation of LAI in forests [74]. Those inaccuracies could have an impact on the results of this study in semi-arid and forested areas.
The fraction of vegetation which is green and transpiring ( f g ) was estimated based on Fisher et al. [75] (Equation (7)): where FIPAR is the fraction of photosynthetically active radiation intercepted by green and brown vegetation. FAPAR was obtained from the biophysical processor as described above, while FIPAR was derived iteratively from Equation (8) of Campbell and Norman [51]: where θ is the solar zenith angle at the time of the S2 overpass, and PAI is the plant area index with initial PAI equal to LAI and in subsequent iterations until f g converges. Two assumptions made in Equation (8) are that all intercepted PAR comes from the solar beams, and that both FAPAR and FIPAR are computed from a canopy with a spherical leaf inclination distribution. Indeed, from the the average leaf angle histogram, from which the training database was built in Weiss and Baret [71], most training cases in the Biophysical processor correspond to a spherical distribution (mode at 60 • leaf angle). Equation (9) was subsequently used within the land surface models to convert LAI, which was assumed to represent green LAI [71], into PAI. Afterwards, PAI, leaf bi-hemispherical reflectance and transmittance, together with constant values for soil reflectance in the visible (V IS = [400-700] nm, ρ soil,V IS = 0.15) and near infrared (N IR = [700-2500] nm, ρ soil,N IR = 0.25) were used to quantify the shortwave net radiation of the soil and canopy. Leaf chlorophyll concentration (i.e., C a+b = CCC/LAI) was used to derive the leaf bihemispherical reflectance and transmittance in the visible spectrum after a curve fitting of 45,000 ProspectD [76] simulations. Likewise, equivalent water thickness (i.e., C w = CWC/LAI) was used to retrieve leaf bihemispherical reflectance and transmittance in the NIR spectral region. The thermal data needed to drive the evapotranspiration model was obtained from the Sea and Land Surface Temperature Radiometer (SLSTR) on board of the Sentinel-3A satellite [5]. SLSTR contains 3 thermal infrared (TIR) channels (with two dynamic range settings-for fire monitoring and for sea/land surface temperature monitoring) with 1 km spatial resolution and less than two days temporal resolution with one satellite (less than one day with both A and B satellites) at the equator. For each selected S2 scene, all the S3 scenes falling on the day of S2 overpass or within ten days before and after, were selected for processing. In the current study we used the L2A Land Surface Temperature (LST) product as downloaded from the Copernicus Open Access Hub (https://scihub.copernicus.eu/-last accessed 10 September 2019). The accuracy of this product is reported to be below 1 K when comparing against in situ radiometer measurements and independent operational reference products [77].
Finally, the parameters in the ET models that could not be directly retrieved from shortwave observations (e.g., vegetation height or leaf inclination angle) were set based on a land cover map and a look-up table (see Table 2). The CCI landcover map from 2015 [78], which was produced with a global coverage and 300 m spatial resolution, was used as the initial input layer before being reclassified into the smaller number of classes as shown in Table 2. Out of the parameters set according to the look-up table, the vegetation height (h C ) has the largest influence on the modelled fluxes as it effects aerodynamic roughness [79,80]. Therefore in herbaceous classes where it can change throughout the growing season (grasslands and croplands) it was scaled with PAI using a power law, with maximum value h C,max indicated in Table 2 reached at a prescribed maximum PAI PAI max (5 in croplands and 4 in grasslands) and a minimum value set to 10% of the maximum value. Table 2. Land cover based Look-Up-Table for ancillary parameters used in ET models. CCI-LC is the land cover code for the ESA's CCI land cover legend (http://maps.elie.ucl.ac.be/CCI/viewer/ download/CCI-LC_Maps_Legend.pdf-last accessed 13 April 2020); h C,max is the maximum canopy height occurring when PAI reaches PAI max ; f C is fraction of the ground occupied by a clumped canopy ( f C = 1 for a homogeneous canopy); w C is canopy shape parameter, representing the canopy width to canopy height ratio; l w is the average leaf size; χ Campbell [52] leaf angle distribution parameter.
The meteorological data used in this study consists of air temperature at 2 m, dew point temperature at 2 m, wind speed at 100 m, surface pressure, total column water vapour (TCWV), aerosol optical thickness (AOT) at 550 nm and surface geopotential. Those inputs are obtained from the ECMWF ERA5 reanalysis ensemble means dataset [81]. The only exception was AOT which come from the Copernicus Atmosphere Monitoring Service (CAMS) forecast dataset [82], since it is not included in ERA5. Inputs at the time of the satellite overpass are computed by linear interpolation between the previous and posterior reanalysis timestep. Due to the low spatial resolution of the air temperature and wind speed fields (tens of kilometers) they are assumed to represent the surface conditions derived from conditions above the blending height (100 m above the surface) rather then the actual surface conditions. Therefore, air temperature at 100 m is calculated using the 2 m estimate, ECMWF surface geopotential, SRTM DEM and lapse rate for moist air. Those 100 m estimates are then used as inputs into the land surface flux models. AOT together with TCWV, surface pressure, SRTM DEM elevation and solar zenith angle at the time of Sentinel-3 satellite overpass were used to estimate the instantaneous shortwave irradiance on a horizontal surface at the satellite overpass [83,84].

Thermal Data Sharpening Approach
The thermal data sharpening approach used in this study is based on ensemble of modified decision trees. The basic scheme of the method (Figure 1) is based on Gao et al. [8] and has been previously applied by Guzinski and Nieto [4] to sharpen thermal data to be used as input to evapotranspiration models. Each S3 scene is matched with an S2 scene acquired at most ten days before or after the S3 acquisition and the regression model used for sharpening is derived specifically for each scene pair. Briefly, the atmospherically corrected Sentinel-2 shortwave data (all the 10 m and 20 m spectral bands) with a spatial resolution of 20 m is resampled to match the pixel sampling of the SLSTR sensor (around 1 km spatial resolution). Concurrently, the SRTM DEM is used to derive slope and aspect maps which, together with S3 overpass time, are used to estimate the solar irradiance incident angle of a flat tilted surface. The DEM and the solar angle maps are also resampled to the SLSTR resolution. A multivariate regression model is then trained with the three resampled datasets used as predictors and the T rad used as the dependent variable. The selection of training samples is performed automatically by estimating the coefficient of variation (CV) of all the high-resolution pixels falling within one low-resolution pixel and selecting 80% of pixels with lowest CV. The regression model is based on bagging ensemble [85] of decision trees. The decision trees are additionally modified such that all samples within a regression tree leaf node are fitted with a multivariate linear model, as proposed by [8].
The regression models are trained on the whole S2 tile (100 km by 100 km) as well as on subsets of 30 by 30 S3 pixels in a moving window fashion. Once they are trained they are also applied on the whole scene and on each window. The bias between the predicted high-resolution T rad pixels aggregated to the low-resolution and the original low-resolution T rad is calculated and the outputs of the whole-scene and moving-window regressions are combined based on a weight inversely proportional to the bias [8]. Finally, the T rad predicted by the regression model is corrected by comparing the emitted longwave radiance of the sharpened fine T rad versus the original coarse T rad . A bias-corrected T rad is therefore re-calculated by adding an offset all fine scale pixels falling within coarse scale pixel in order to remove any residual bias. This is done to ensure the conservation of energy between the two thermal images with different spatial resolutions [8]. The output of the sharpening is a 20 m representation of the LST.
This image sharpening approach relies on the direct or indirect relationship that different regions of the shortwave spectrum have with the radiometric temperature and/or the ET process. For instance the temperature of denser canopies, with higher contrast between visible and near-infrared bands, is lower than the temperature of bare soils [86,87]. In addition, surfaces with higher water content (i.e., larger absorption in the short-wave infrared) have a larger evaporative capability and hence lower temperature [88]. Likewise, higher chlorophyll concentrations (i.e., larger absorption in the red and red-edge regions) might lead to higher light and water use efficiency and hence lower temperatures. The information contained in the DEM (i.e., the altitude and solar illumination conditions) also has a direct relationship with radiometric temperature, with sunlit areas having higher temperatures than shaded ones and lower altitude sufraces having higher temperatures than higher altitude surfaces.

Results
The overall performance of the tested models using sharpened temperatures from Decision Trees regressor (hereinafter T rad,DT ) is shown in Table 3. Scatter plots of modelled versus measured fluxes for all the validation sites are in the Supplement. We removed all the cases in which the S3 image was contaminated by clouds in the vicinity of the flux towers or in which the SLSTR view zenith angle was larger than 45 degrees. In addition, we filtered all cases where estimated R n ≤ 50 W m −2 , assuming that noisy outputs will be produced under low available energy, as well as those yielding unrealistic fluxes during daytime (≤−500 W m −2 and ≥1000 W m −2 ). After filtering the data, more than 400 cases were available overall for the following analyses. However, it is worth noting that ESVEP yielded significantly fewer valid retrievals. This issue might be due to the fact that ESVEP's end-member estimation equations were designed and parametrised for herbaceous crops [44] while in this study they were applied to varied land-covers. All models returned a similar performance regarding the estimation of R n , with mean bias between −10 and −24 W m −2 , RMSE ranging between 49 and 59 W m −2 and r above 0.91. This similar behaviour is explained by the fact that all models share the same approach and same inputs in modelling net shortwave radiation, which is the component with larger magnitude of R n . Likewise, G showed similar behaviour as well, but in this case G METRIC is computed differently as it is a function of surface R n [31,32] as opposed to TSEB and ESVEP where, as two-source models, G is computed from R n,S [36,44].
The main differences in model performance are therefore in the estimation of turbulent fluxes (i.e., sensible and latent heat fluxes), and TSEB (TSEB-PT and disTSEB) usually produced most accurate estimates in terms of RMSE (≈80 W m −2 , 45% relative error, in H; and ≈90 W m −2 , 45% relative error, in λE) and higher correlation between observed and predicted values (≈0.67 for H and ≈0.76 for λE). disTSEB performs slightly better than TSEB-PT but the difference is not significant. For METRIC and ESVEP, the RMSE values are in all cases higher than 120 W m −2 (going as high as 220 W m −2 in case of H modelled with ESVEP) and with lower correlation (≤0.47).
The choice of closing the energy balance gap in field measurements by assigning it to λE has influence on the above results. Therefore, in Table 4 we also present the accuracy statistics of the turbulent fluxes when Bowen ratio is preserved during the energy gap closure procedure. The overall ranking of the models is preserved with the TSEB models still obtaining the lowest RMSE and highest correlation coefficients. However, the differences between the models (particularly in case of RMSE) are not as large as in Table 3. In particular the RMSE of the TSEB models increases significantly while there is a decrease in r, while the influence of closure method on the other two models is much weaker with the RMSE of ESVEP even decreasing slightly. In subsequent analysis we always assign the residual energy to λE. Table 3. Error metrics for METRIC, TSEB-PT, disTSEB (TSEB-PT with flux disaggregation) and ESVEP modelled fluxes using Decision Tree sharpened temperatures and closing the energy balance gap in field measurements by assigning residual energy to latent heat flux. N, number of valid cases; Obs.; mean of observed values (W m −2 ); bias, mean difference between predicted and observed (W m −2 ); MAE, Mean Absolute Error (W m −2 ), RMSE, Root Mean Square Error (W m −2 ); rRMSE, Relative RMSE (-); r, Pearson correlation coefficient (-). In order to evaluate the model sensitivity and uncertainty to different vegetation types, we have split the results of Table 3 into four main vegetation types, depending on differences in aerodynamic roughness, horizontal homogeneity and/or seasonal dynamics/senescence (i.e., croplands, grasslands, savannas and forests, Table 5). Similar to the overall results, the TSEB models output most accurate turbulent fluxes across all four vegetation types. They obtain the best results for H in grassland (RMSE ≈ 70 W m −2 , r ≈ 0.8) and for λE in cropland (RMSE ≈ 80 W m −2 , r ≈ 0.75). In grassland and cropland TSEB-PT and disTSEB produce very similar fluxes while in savanna disTSEB improves the accuracy of modelled H and λE by up to 10 W m −2 . METRIC has its best overall performance in savanna (RMSE of 132 W m −2 and r of 0.43 for H; RMSE of 99 W m −2 and r of 0.61 for λE) followed by cropland while ESVEP produces inaccurate H in all vegetation types (rRMSE > 1) and its best overall λE in grassland. It should also be noted that RMSE of R n is for all models double in savanna (≈65 W m −2 ) than in the other land cover types. This is due to vegetation being most sparse at those sites meaning that uncertainties in estimation of albedo and emissivity of soil have the biggest influence on shortwave and longwave net radiation respectively. Finally, very few valid cases are available to evaluate the forest sites and hence the results are not very conclusive, with the TSEB models again outperforming the METRIC and ESVEP. The agriculture class was further split into herbaceous and woody types, with results shown in Table 6. The former sub-class represents crops such as corn, soybean or wheat while the latter represents olive groves and vineyards. TSEB models produce the most consistent results for both types of crops, although somewhat surprisingly the RMSE of λE in woody crops (76-79 W m −2 ) is significantly lower than in herbaceous crops (91-93 W m −2 ), while opposite is the case for RSME of H (69-71 W m −2 in herbaceous crops and 91-94 W m −2 ). rRMSE of λE in both agricultural sub-classes was 0.32 which is of the same magnitude as energy closure gap at the validation sites (e.g., the mean value at CH was 0.34 at the times at which fluxes were modelled). METRIC is very clearly performing better in woody crops, while ESVEP obtains better results for H in herbaceous crops and better results for λE in woody crops. It is also worth noting that R n and G showed larger relative errors in woody crops than in herbaceous crops, since woody canopies are more complex and therefore more difficult to capture by the models and/or parametrizations used [89,90]. Finally, Table 7 lists the model performance depending on whether sites are under Mediterranean and semi-arid climate (i.e., water limited sites), or sites under temperate climate (i.e., energy limited sites). First of all it is worth noting that due to cloud coverage conditions, more valid cases are obtained over semi-arid conditions than in temperate areas. TSEB models showed similar range of errors in both climatic conditions, with RMSE in λE at around 85 W and 99 W m −2 for semi-arid and temperate conditions, and correspondingly around 80 and 70 W m −2 for H. ESVEP and METRIC yielded more varying results between climates, with METRIC producing more accurate estimates of both H and λE in semi-arid conditions and ESVEP showing better performance for H in temperate climates and better performance for λE in semi-arid climates.

ET Model Intercomparison
Overall results listed in Table 3 show that TSEB models produced more robust estimates of both sensible and latent heat fluxes, with lower errors around 80 to 90 W m −2 and larger correlation coefficient, while at the same time returning more valid cases than the other two models, METRIC and ESVEP. Those errors are within the expected and reported errors in literature, e.g., Kalma et al. [2] showed errors in λE ranging between 24 and 105 W m −2 for a wide range of models, Chirouze et al. [58] reported errors for TSEB > 100 W m −2 in a semi-arid area of Mexico, and 50 W m −2 errors are reported in Tang et al. [35]. Choi et al. [91] found TSEB-PT and METRIC produced similar errors of 54 W m −2 in a watershed in Iowa, US. However it is worth noting that most of the reported errors in these studies [35,58,91,92] used actual surface temperature at high spatial resolution (e.g., Landsat or ASTER), whereas in this study we used low resolution temperature sharpened to high spatial resolution, which provides an additional input uncertainty to the models. For that reason, Section 4.2 is dedicated to this issue in depth.
TSEB-PT was developed trying to solve some of the issues in sparse vegetation and semi-arid conditions previously raised by less complex models [36], and therefore it adapts better to a wider range of climatic and vegetation conditions [1] as it was shown in Tables 5 and 7. METRIC, on the other hand, was primarily designed for standard crops and requires concomitant presence of stressed and well watered-full vegetation conditions within the scene itself. This more often happens in semi-arid climate where irrigated crops and rainfed crops and natural vegetation are present. Those cases in which, either due to the increased presence of clouds (i.e., fewer available pixels in the scene) or in regions where these hot and cold pixels cannot be simultaneously found, METRIC would produce more uncertain retrieval, as already pointed by Choi et al. [91] and Tang et al. [35] (in humid or sub-humid areas), or even would not produce any valid data. Similarly, ESVEP was designed and tested in an agricultural area located in a subhumid and monsoon climate [44] and therefore certain assumptions and parameterizations taken in that model might not transfer well to other vegetation or climatic conditions.
Despite of TSEB being the model with the largest required amount of input data, this study proposed several new approaches to retrieve some of those inputs operationally, with special focus on exploiting the spectral capabilities of Sentinel-2, in particular the bands in the red-edge region that is sensitive to leaf pigments. A simple empirical approach relating leaf bihemispherical reflectance and transmittance with the leaf biochemical properties resulted in accurate estimates of net radiation. More importantly, due to the larger uncertainty of TSEB models over senescent vegetation, we derived a method to obtain both total LAI and its green fraction based on Fisher et al. [75] FAPAR/FIPAR relationship. Nevertheless, more research is needed to systematically derive other vegetation properties such as canopy height/aerodynamic roughness or vegetation clumping.
Finally, it is worth pointing out that even in situ EC measurements are prone to uncertainty as is confirmed for instance by the usual energy imbalance in those systems. Particularly we found a larger disagreement between observed and predicted net radiation in Dahra (see Figures in the Supplement). We hypothesise that this could be due to two possible reasons. Firstly, our modelled irradiance, with depends on TCWV and aerosol optical thickness, could be more noisy at Dahra than the other sites, due to unaccounted dust aerosols in that site placed over the Sahel. The second issue might be the actual R n measurements, as in this site only a NR-lite (Kipp & Zonen, Netherlands) is available to measure global R n that might be less accurate than the radiometers at the other sites, which are measuring the four components of radiation. In addition, Harvard Forest site lacks in situ G measurements, which effects the energy balance closure correction. This issue together with the fact that very few cases are available in forests (Table 5), leads us to avoid strong conclusions regarding the performance of the models in forested areas.

Sharpening and Disaggregation
As was previously mentioned, thermal sharpening relates empirically or semi-empirically coarse resolution surface temperature with fine resolution multispectral and other ancillary data. This technique could be a sound alternative to the lack high resolution thermal imagery for operational activities. However, previous studies in thermal sharpening have reported some uncertainties when compared to actual T rad temperatures, with errors ranging up to 3.5 K [7][8][9]11,12,14]. Therefore, for some applications requiring ET estimates at higher accuracy (i.e., precision agriculture), sharpening might not be considered as a suitable substitute of T rad but complementary to it, such as in the fusion approach by Knipper et al. [93].
In order to reduce flux retrieval errors with sharpened T rad inputs, we also tested a flux disaggregation method [10,23]. Our results listed in Tables 3-7 show that disTSEB model, i.e., coarse S3 TSEB-PT fluxes disaggregated with fluxes derived with TSEB-PT and fine resolution sharpened T rad , yielded only modest improvement (5 W m −2 reduction in RMSE in case of H and only 1 W m −2 in case of λE) to the TSEB-PT model, i.e., running TSEB directly on the sharpened T rad imagery. The one exception was at the savanna sites (see Table 5) where using disaggregation reduced the errors in H by around 12% and errors in λE by around 10%. However, previous studies have shown the robustness of this approach to overcome limitations of the likely less reliable fine resolution T rad images [4,93,94]. Furthermore, coarse input data must be produced beforehand for thermal sharpening and hence it is readily available for running the models at coarse resolution, which indeed is computationally inexpensive given the much lower number of pixels within a scene. Therefore, flux disaggregation would still be recommended when running TSEB-PT with sharpened temperatures.
In addition, the sharpening of a coarse resolution T rad image using fine resolution images acquired on different days, with a maximum of 10 days offset, might lead to additional uncertainties. This is caused by the fact that some changes in either land cover properties, (e.g., vegetation growth, harvests, fires) or moisture conditions (e.g., rainfall or irrigation) might happen between the Sentinel 2 and 3 acquisitions. Figure 2 shows that at a general level (all validation sites taken together) this does not appear to be a significant issue as the error does not increase as the day offset between thermal and shortwave acquisitions gets larger. Particularly relevant in this analysis is H since it is the energy component that is directly related to T rad , and hence more prone to errors in sharpening. However, more studies should be conducted to look at the effect of the day offset in particular situations, e.g., in crops during senescence or with localized irrigation patterns. It might be also worth to investigate using high-resolution radar data (e.g., from Sentinel-1), which is sensitive to soil moisture, in the thermal data sharpening approach [95]. Furthermore, the Landsat family of satellites could also be utilised during the sharpening since they acquire thermal data at around 100 m spatial resolution although at 8 days (two satellites) to 16 days (single satellite) temporal resolution. Using observations from those satellites would both increase the temporal density of the high-resolution data but also capture physical processes and properties which are not reflected in the shortwave data, such as near soil surface soil moisture and soil evaporative efficiency.
Finally, some studies have reported larger errors than in this study, but they were using coarser resolution imagery [43]. This is probably due to the scale mismatch between the coarse pixel estimate and the footprint of the EC towers' measurements. We evaluated this by comparing fluxes modelled at Sentinel-2 (i.e., with sharpened T rad ) and Sentinel-3 spatial resolutions against measurements form towers. This was done for all the sites put together and also for the validation sites split into two categories: those in which the tower is located in a landscape feature too small to have significant effect on the original resolution Sentinel-3 T rad (category "small" containing CH, KL, GR and SE sites), and those where the opposite is true (category "large" containing SL, DA, HF, HTM, MT TA and KG sites). The results (Table 8) indicate that using sharpened T rad is most important when modelling H in the "small" category. However, the correlation of high-resolution fluxes against tower measurements is in almost all the cases higher than that of low-resolution fluxes and rRMSE is lower or the same in case of turbulent fluxes. Therefore, even though sharpened T rad might be more prone to errors than actual high-resolution T rad , it is still a good option for downscaling fluxes for model validation [4], addressing therefore the vegetation cover variability within coarse resolution pixels. Nevertheless, there is still an open question on how feasible thermal sharpening is for early detection of water stress at small scales, compared to using high resolution thermal imagery. This issue is especially relevant for precision irrigation tasks and therefore future studies should address this topic.

Effects of Ancillary Inputs
Ancillary data is required to characterise the canopy structure, since it affects both the radiation transmission through the canopy [53], and hence albedo and radiation partitioning, as well as the surface aerodynamic properties [79]. In this study we have used a static land cover map at global scale to assign some standard values to each land cover type (Table 2). However, the large difference in spatial resolution between the S2 data and CCI map can lead to visible artefacts in the output fluxes when modelled at 20 m resolution, especially on the edges of two classes with different vegetation properties (e.g., croplands and forests). However, those spatial artefacts seem not to have any influence on the validation results. Nevertheless, some discrepancies were found between the land cover type flagged by the map and the actual type at the validation sites. In Majadas de Tiétar, CCI-LC flagged the site as cropland (CCI-LC = 11), thus h c,MAX = 0.5 m, f c = 1 and l w = 0.02 m, but actually this site is a savanna with 8 m tress at 20% coverage (CCI-LC = 30). In addition, the prescribed values that were assigned in Table 2 are very general, as they are trying to fit a global-based land cover legend. Therefore they can significantly deviate from the site's actual values. Indeed, all croplands were assumed to be not clumped ( f c = 1) although row crops, like the vineyard in Sierra Loma, or orchards like the olive grove in Taous have very different canopy structure compared to a standard crop. Therefore, a significant improvement could be expected if a more area-specific surface characteristics parametrization was used, either using some ancillary remote sensing like SAR imagery or LiDAR or a regional/local oriented land cover classification.
To conclude, atmospheric forcing from numerical weather prediction models might add some uncertainty to the ET model compared to using local meteorological data, specially for precision agriculture where access to local agrometeorological stations is possible. In this study we relied on ERA5 reanalysis data and despite large discrepancy between spatial resolution of ERA5 (tens of kilometres) and point scale measurements from the towers there is a strong agreement for the most important meteorological parameters (Figure 3). Instantaneous shortwave irradiance, which was computed at the Sentinel 3 overpass time using ECMWF AOT and TCWV dataset, showed no systematic bias but a RMSE of 29 W m −2 . This in turns directly effects on the accuracy of R n (Tables 3-7), and indirectly that of λE since it is estimated as a residual of the energy balance. Therefore, errors in λE could be significantly reduced if more accurate inputs of irradiance were used, especially over temperate areas (i.e., radiation limited) which are more sensitive to uncertainty in available energy. On the other hand, the errors in both air temperature (RMSE = 1.8 K) and windspeed (RMSE = 1.3 m s −1 ) affect mainly on the retrievals of sensible heat flux. This issue become more relevant in the estimation of λE over semi-arid (i.e., water limited) areas. For near-real-time applications it is necessary to use forecast or analysis data, instead of the ensemble mean reanalysis data, and those issues could become more evident.

Conclusions
The multispectral shortwave images acquired by the Sentinel-2 satellites at 10-20 m spatial resolution are highly suitable for characterizing vegetation in order to derive inputs required for evapotranspiration models. At the same time, thermal data acquired daily by Sentinel-3 satellites is also suitable as input to the ET models. However, its low spatial resolution (1 km) needs to be increased if the models are to be run at the scale of predominant landscape features, which is usually on the order of tens of meters. This study evaluated three thermal-based remote sensing ET models (METRIC, ESVEP and TSEB-PT) and a thermal sharpening method (ensembles of modified Decision Trees) in order to derive methodology for operational estimates of water and energy fluxes using Sentinel data and applicable for the whole globe. Further evolution of the thermal sharpening methodology by using other data sources with high spatial resolution and variable temporal resolutions, e.g., Sentinel-1 radar [95] or Landsat thermal observations [96], is planned.
TSEB-PT produced overall the most accurate estimates in terms of sensible heat and latent heat (i.e., evapotranspiration) fluxes, being robust in different land covers and climates. Additional disaggregation step further improved TSEB-PT output accuracy in savanna ecosystems. Without any site-specific tuning and relying only on global datasets the methodology achieved RMSE of 80-90 W m −2 for modelled instantaneous H and λE across eleven validation sites located in different land cover classes and climatic conditions. In an agricultural setting the modelled fluxes were more accurate with rRMSE of λE of around 0.3 which is of the same magnitude as uncertainty of the measured turbulent fluxes from the validation dataset. Until a new generation of thermal satellites are launched [97], the proposed methodology will be useful solution for overcoming the lack of thermal data with high spatio-temporal resolution required for operational ET modelling at field scale.
Supplementary Materials: The following are available at http://www.mdpi.com/2072-4292/12/9/1433/s1, Figure S1: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Sierra Loma, Figure S2: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Choptank, Figure S3: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Dahra, Figure S4: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Grillenburg, Figure S5: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Harvard, Figure S6: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Hyltemossa, Figure S7: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Klingenberg, Figure S8: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Majadas, Figure S9: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Selhausen, Figure S10: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Taous, Figure S11: Model scatterplot of predicted vs. EC observed using sharpened Trad with Decision Trees for Kendall Grassland.