Estimating Evapotranspiration over Heterogeneous Surface with Sentinel-2 and Sentinel-3 Data: A Case Study in Heihe River Basin

: Evapotranspiration (ET) is an important part of surface–atmosphere interactions, con-necting the transfer of matter and energy. Land surface heterogeneity is a natural attribute of the Earth’s surface and is an inevitable problem in calculating ET with coarse resolution remote sensing data, which results in signiﬁcant error in the ET estimation. This study aims to explore the effect and applicability of the evaporative fraction and area fraction (EFAF) method for correcting 1 km coarse resolution ET. In this study we use the input parameter upscaling (IPUS) algorithm to estimate energy ﬂuxes and the EFAF method to correct ET estimates. Five ground stations in the midstream and downstream regions of the Heihe River Basin (HRB) were used to validate the latent heat ﬂux (LE) calculated by the IPUS algorithm and EFAF method. The evaluation results show that the performance of the EFAF method is superior to that of the IPUS algorithm, with the coefﬁcient of determination (R 2 ) increasing, the root mean square error (RMSE) decreasing, and the mean bias error (MBE) decreasing by 17 W/m 2 on average. In general, the EFAF method is suitable for correcting the deviation in LE estimated based on Sentinel data caused by land surface heterogeneity and can be applied to obtain accurate estimates of ET.


Introduction
Evapotranspiration (ET) connects the fluxes of water with energy and is a critical component of hydrological and energy balance models [1]. Access to reliable estimates of ET is important to many applications, including in the fields of climatology, meteorology, and agronomy [2]. For example, as a key indicator of crop growth, ET can be used to calculate the amount of water that is consumed by crops, so it holds vast potential for agricultural irrigation [3][4][5][6]. Climate change can cause changes in temperature and wind speed, which will change the processes of the water cycle and energy balance. Therefore, ET can be used as a "sensor" of climate change to reflect global climate conditions [7]. Currently, the ET estimation models may be broadly classified as [8]: (1) fully physically models [9][10][11], including the surface energy balance system (SEBA) proposed by Su [12], Shuttleworth and Wallace model [13], two-source model (TSM) [14], and Penman-Monteith algorithms [15][16][17]; (2) semi-physically models, such as land surface temperature-vegetation index space algorithms [18][19][20], Priestley-Taylor algorithms [21][22][23], and the surface energy balance algorithm for land (SEBAL) model [24]; (3) black-box models based on artificial neural networks, empirical relationships, and fuzzy and genetic algorithms [25,26], such as the complementary relationship areal evapotranspiration (CARE) model [27]. These algorithms and models have been widely applied.
However, land surface heterogeneity is an inevitable problem in calculating ET with coarse resolution remote sensing data [28][29][30], and mainly includes different surface landscapes, surface variables, and surface topography [31,32]. Many studies have shown that land surface heterogeneity can cause errors in latent heat flux (LE) estimations [33]. For example, the topographic effect can bring noise to ET retrieval [34]. This study mainly focuses on the inhomogeneity of landscapes. On the one hand, different surface landscapes cause changes in the structure of the atmospheric boundary layer and physical processes, such as sensible heat flux (H) and LE [35]. On the other hand, many models for calculating ET are built on a fine scale [36], assuming that the land surface is flat and evenly distributed [37]. Nonetheless, the condition of the land surface is extremely complicated, and mixed pixels are probably produced by land surface heterogeneity [31]. When estimating coarse-resolution and medium-resolution ET, the parameters of the dominant landscape in a mixed pixel become representative of the entire pixel, resulting in inaccurate ET results [38]. Thus, it is necessary to address the problem of land surface heterogeneity to obtain a more accurate ET.
Some studies have demonstrated that the heterogeneous surface causing biased estimates of LE can be addressed through: (1) a correction-factor compensation method, which adds the correction factors to the surface variables that are required by the ET model, causing the surface variables to be as close to the true value as possible, thus achieving the purpose of correcting LE [39]. This method is simple to apply, but the correction factors and correction coefficients vary by regions [38]; (2) a temperature downscaling method, which downscales the coarse-resolution land surface temperature (LST) to a high-resolution LST using statistical-based models or neural networks models to further calculate highresolution LE values [40][41][42]. The disadvantage of this method is that new errors will be generated on the LST during the resampling or downscaling process, which will result in issues with the LE estimates; (3) an area-weighting method, which uses high-resolution land cover classification data to decompose the mixed pixels of the large-resolution data, and calculates the roughness length and H based on subpixel landscapes, then weighs the H of each sub-pixel according to the area ratio to obtain the H of the entire mixed pixel. Eventually, LE is corrected through correcting H. This method can improve the accuracy of estimating LE to a certain extent but does not fundamentally solve the heterogeneity of the land surface [43,44]; (4) an evaporative fraction and area fraction (EFAF) method. Li et al. first applied the EFAF method to correct the deviation of the LE with a spatial resolution of 300 m and proved its feasibility [45]. This method assumes that the available energy (AE) of different land types in a mixed pixel is equal to the AE of the entire mixed pixel, and that the evaporation fraction (EF) of a certain land type in a mixed pixel has some relationship with the EF of the neighboring pure pixels of the same land type. It corrects the error of ET estimates by decomposing mixed pixels to obtain the EF of the sub-pixel and selecting the EF of nearest pure pixels to replace the EF of the sub-pixel.
In recent years, algorithms for correcting biased ET estimations caused by land surface heterogeneity have mainly focused on the temperature downscaling method [2,46,47]. By obtaining high-resolution ET, the mixed pixels in low-resolution data are reduced, and the influence of the heterogeneous surface on ET is reduced [44]. However, most algorithms for estimating low-resolution ET ignore the problem of surface heterogeneity [48], and low-resolution ET products retrieved from satellite data, such as AVHRR, MODIS, and Himawari, are also not corrected [49,50], so it is necessary to take land surface heterogeneity into account and develop a correcting method for ET estimations with data of coarse resolution. Since correction coefficients are difficult to extend to a large area, the applicability of the correction-factor compensation method is limited [39]. Compared with the area weighting method, the EFAF method utilizes EF with good stability as a bridge to correct ET, and does not need to calculate H, so it can directly correct low-resolution ET products in combination with high-resolution land classification data. At present, this method is applied to only HJ-1 data with a 300 m spatial resolution, and its applicability to other data is not yet known. Therefore, the study chooses Sentinel images with a short release time as the study data in order to explore the effect and applicability of the EFAF method for correcting 1 km coarse resolution ET.

Study Area
The study areas are located in the midstream ( Figure 1a) and downstream (Figure 1b) regions of the HRB. The Heihe River is the second largest inland river in China, which lies in the regions of northwest China, as shown in Figure 1. The latitude of the HRB ranges from 37 • 42 to 42 • 42 N and longitude 97 • 06 to 102 • 00 E, having a unique natural landscape, with land cover types ranging from "ice and snow to frozen soil, forests, rivers, lakes, oases, and the Gobi desert" from south to north [51]. The HRB covers an area of approximately 143,000 km 2 , bordering Ejina Banner in the north, Qilian Mountain in the south, and Dahuang Mountain in the east. Precipitation is scarce in the downstream region, with an annual precipitation of less than 45 mm, and the land in this area is mostly barren land or shrubland. The main crops in the midstream region of the HRB are wheat and maize, and the average annual rainfall and average annual temperature are 136.8 mm and 7.5 • C, respectively [5].   [52], respectively, having an orbital phase difference of 180 • and operating in a sunsynchronous orbit with a height of 786 km and an inclination of 98.5 • (https://scihub. copernicus.eu, last accessed: 6 December 2020). A multispectral instrument (MSI) is mounted on the Sentinel-2 satellite, with an imaging width of 290 km, including 13 spectral bands, and a spatial resolution of 10 m, 20 m, or 60 m [53]. Visible and near-infrared bands (Band 2, Band 3, Band 4 and Band 8) are a sensitive canopy structure and suited for vegetation mapping and monitoring [2]. Three spectral bands (Band 5, Band 6 and Band 7) in the leaf-pigment are sensitive to the red-edge part of the electromagnetic spectrum, and bands at 60 m are mainly designed for atmospheric corrections and cloud screening (Band 1 is sensitive to aerosols retrieval; Band 9 is sensitive to water vapor correction, and Band 10 is sensitive to cirrus detection), which have adequate resolution to capture the spatial variability of the atmospheric geophysical parameters [52]. The revisit period of each satellite is 10 days, and the revisit period of both satellites running at the same time is shortened to 5 days. The European Space Agency (ESA) provides Sentinel-2 level-1C products, which can be converted to bottom-of-atmosphere (BOA) reflectance (level-2A) using the Sen2Cor algorithm of atmospheric correction in the SNAP software [53,54]. Then, BOA reflectance values are used as input data to calculate vegetation biophysical parameters, such as the normalized difference vegetation index (NDVI) and fractional canopy coverage (FVC) [55]. The overpass time of Sentinel-2 in the study area is at 12:05 p.m., and the overpass time of Sentinel-3 is uncertain. Therefore, in this study, 18 highquality Sentinel-3 images during vegetation growing period that were acquired on the same days as the Sentinel-2 images and had an overpass time at approximately 12 p.m. Sentinel-3 is composed of the Sentinel-3A and Sentinel-3B satellites, which were successfully launched in February 2016 and April 2018 [56], respectively. These satellites have a sun-synchronous orbit with an orbital height of 815 km and an orbital inclination of 98.65 • and are equipped with an ocean and land color instrument (OLCI), a sea and land surface temperature radiometer (SLSTR), a synthetic aperture radar altimeter (SRAL), and a microwave radiometer (MWR) [57]. The SLSTR can provide reliable global sea surface temperature and land surface temperature data in the forward (55 • ) observation mode and vertical (0 • ) observation mode. With its single field of view width (1400 km), the temporal resolution is less than one day, considering the 3A and 3B satellites. In addition, the SLSTR has six VIS-NIR/shortwave infrared channels (S1-S6, spatial resolution: 500 m) and three thermal infrared channels (S7-S9, spatial resolution: 1 km) [58].

Auxiliary Data
The auxiliary data required include land cover classification data and weather-driven data. The classification system of the land cover classification data referred to the International Geosphere-Biosphere Program (IGBP) classification system and the FROM_LC classification system and divided the land cover into 10 categories: cropland, forest, grassland, shrubland, wetland, water bodies, buildings, barren land, glacier, and snow, with an overall accuracy of 92.19% [59][60][61]. The spatial resolution of this dataset is 30 m, and the temporal resolution is one year from 2015 to 2019. In this study, the weather-driven data consisted of wind speed at 10 m, dew point temperature at 2 m, air temperature at 2 m, and surface pressure and total column water vapor data that were obtained from the ERA5 datasets (https://cds.climate.copernicus.eu/, accessed: 12 October 2020). The spatial resolution of the ERA5 datasets was 10 km [62], which was downscaled to 1 km using the bilinear interpolation method to match the spatial resolution of Sentinel-3 LST data, and were used as input parameters to calculate H.

Ground Measurements
The ground measurement data come from the "National Qinghai-Tibet Plateau Scientific Data Center" (http://data.tpdc.ac.cn, accessed: 20 November 2020), including eddy covariance system (EC) data and meteorological data, which are used to validate the results of the retrieved energy balance components [63]. The temporal period of the EC data, which mainly include H, LE, and wind speed et al., was 30 min. Since the observation conditions assume that the land surface is uniform and the atmospheric conditions are neutral, the actual surface conditions are very complicated, and the observed data cannot truly reflect the turbulent characteristics of the region; therefore, the data needed to be corrected, including eliminating the abnormal points, correcting the delay time, rotating the coordinates, and correcting the frequency response. The meteorological data were measured by the automatic weather stations (AWS) with a ten minutes interval, including the net radiation (Rn), soil heat flux(G), meteorological parameters, soil temperature, and humidity. The processing of the meteorological data included removing duplicate data, deleting abnormal data, and unifying the time format [64]. Three ground stations (Zhangye, Daman, and Huazhaizi) in the HRB midstream region and two flux sites (Huangmo and Sidaoqiao) in the HRB downstream region were equipped with EC and AWS. This study extracted ground data according to the Sentinel overpass time. The specific information is shown in Table 1.

Evaporative Fraction and Area Fraction (EFAF) Method
The EFAF method only needs low-resolution energy fluxes and high-resolution land classification data as input data, and its principle is: using high-resolution land classification data to decompose mixed pixels of the coarse-resolution data, and then correcting the EF values of the sub-pixels in the mixed pixels [45], thereby selecting the EF of nearest pure pixels to replace the EF of the sub-pixel to correct the LE values of the mixed pixels. The specific process is as follows: The EF is usually defined by LE and AE (Rn-G) [65] as follows: where EF is the evaporative fraction, LE is latent heat flux, Rn is net radiation, and G is soil heat flux.
Assuming that turbulence occurs only in the vertical direction, the LE of a mixed pixel is weighted by the LE of the sub-pixel, as shown in Equation (2): LE is the accurate LE of mixed pixels, i denotes the sub-pixel, s i is the area ratio of the sub-pixel i in the mixed pixel, LE i represents the LE of sub-pixel i in a mixed pixel, and R n − G i and EF i are the AE and EF of sub-pixel i.
It is assumed that the AE of each sub-pixel is equal and equal to the AE of the entire mixed pixel; therefore, Equation (2) can be transformed into: LE is the LE of a mixed pixel under the assumptions, which has a minor error with LE.
Combined with Equation (1), Equation (3) can be transformed into: where EF is the EF of a mixed pixel assuming that the AE of sub-pixels are equal in that mixed pixel. Assuming that the EF of sub-pixel i of a certain land type in a mixed pixel is equal to the EF of the nearest pure pixels of this land cover type, the EF of sub-pixel i can be replaced by the EF of the pure pixels to obtain the EF of the mixed pixel.
Incorporating Equation (4) into Equation (3), Equation (5) is obtained: Equation (5) shows that the EFAF method uses the EF of the sub-pixels in the mixed pixel as a bridge and achieves the goal of correcting LE by correcting the EF.
The following is a simple example to illustrate the EFAF method. In Figure 2, A, B, and C denote different land cover types, where pixel 1, pixel 2, and pixel 4 are pure pixels, whereas pixel 3 is a mixed pixel. The EF of A in pixel 1, EF of B in pixel 2, and EF of C in pixel 4 are used to replace the EF of A, EF of B, and EF of C, respectively, in pixel 3; thus, the EF of pixel 3 is obtained, which is then multiplied by the AE of pixel 3 to acquire the corrected LE. The flowchart for correcting the LE based on the EFAF method is shown in Figure 3. First, calculate a 1 km spatial resolution EF through energy fluxes. Then, resample the 30 m spatial resolution land classification data to 1 km according to the area ratio, and count pure pixels and mixed pixels in the 1 km land classification data. Meanwhile, obtain the sub-pixels land cover types and their area ratios according to the mixed pixels. The next step is to obtain the EF of each pure pixel through the EF data and the pure pixel data, and to then replace the EF of the sub-pixel in the mixed pixel with the EF of the nearest pure pixels of same land cover type, so that the corrected EF of the mixed pixel is obtained after area weighting. When the spatial resolution of the data is large, there is a possibility that no pure pixels can be found, so new measures need to be adopted. This topic is discussed in Section 6.2. Finally, the corrected EF is multiplied by the AE to obtain the corrected LE.

Parameter Retrieving
The parameters required by the ET model are calculated using Sentinel-2 and Sentinel-3 data.
NDVI is an essential component of any study aiming to investigate ecosystem services, especially those where vegetation, water, and biodiversity are involved [66]. For example, NDVI is a sensitive indicator of canopy structure, which can monitor vegetation growth [67], and its utility in indicating the soil moisture has been well demonstrated over the past two decades [68]. The NDVI products derived from moderate-resolution imaging spectroradiometer (MODIS) [69], advanced very high-resolution radiometer (AVHRR) [70] and Himawari-8 [71] have provided moderate-to low-resolution NDVI time series globally, and have been widely applied in studying dynamic monitoring, vegetation phenology, and global change [72]. However, widespread contamination (due to aerosols, clouds, etc.) and the long revisit period of these high-resolution satellites induce large gaps in the NDVI time series data and limit their application in related studies [73]. From the Sentinel-2 level-2A data, NDVI was retrieved through the B4 band (red) and B8 band (near-infrared), as shown in Equation (6).
where ρ nir is the reflectance in the near-infrared band and ρ red is the reflectance in the red band. The calculation of the FVC and leaf area index (LAI) was carried out through the Biophysical Processor in SNAP software. First, the Sentinel-2 level-2A product needed to be resampled by SNAP software to unify the spatial resolution of each band to 10 m, and then eight band data (B3-B7, B8a, B11, and B12) were input into the Biophysical Processor module in SNAP software. Finally, FVC, LAI, and their quality indicators were automatically obtained [54]. The Biophysical Processor module combined the PROSPECT+SAIL coupled radiation transmission model with an artificial neural network [2]. Its procedure to derive LAI and FVC was: using the PROSPECT model to simulate the upward and downward radiant flux of leaves to obtain the reflectance and transmittance of vegetation leaves; using the SAIL model to combine the reflectance and transmittance of vegetation leaves with environmental parameters, such as soil reflectivity and solar altitude angle, as input data, thus obtaining the vegetation canopy reflectance data. The vegetation canopy reflectance and eleven parameters, such as the zenith angle, were used as the input layer, and LAI and FVC were obtained through the back propagation artificial neural network [55].
The surface albedo was calculated from Sentinel-2 data using an algorithm (Equation (7)) proposed by Bonafoni (2020) [74]. The algorithm was simple, directly converting the surface reflectance to surface albedo, and required images acquired in clear and cloudless conditions. The root mean square error (RMSE) of this algorithm was good, with an average error of approximately 0.02.
However, it should be noted that Sentinel-2 BOA reflectance spectra were overestimated using the Sen2Cor algorithm [75], which affects the Rn and the albedo [76].
The algorithm for retrieving LST using Sentinel-3 data was provided by Li [77]. First, the vegetation cover method was used to estimate the surface emissivity, and then a modified generalized split-window (SW) algorithm was used to calculate LST. The algorithm was used to produce synergized quantitative (MuSyQ) LST products using MODIS TIR data. Sentinel-3 LST derived from this algorithm was validated with ground observed data from five sites in the study area, and the evaluation results shown that Sentinel-3 LST was overestimated, with the coefficient of determination (R 2 ) being 0.94, MBE being 2.7 K, and RMSE being 3.47 K. Zhou (2007) [78] proposed a new empirical model with nonlinear parameterization for calculating downward longwave radiation (DLR) based on the original Zhou-Cess algorithm, which needs 2 m air temperature and column water vapor (CWV) from ERA5 datasets as input data. As a result of the convenience of the input data, using this model to calculate DLR is easy, and it would expand the applicability of remote sensing in the field of energy balance. The retrieved DLR was validated using ground measurement data in the study area, and its evaluation results show that the accuracy of the DLR was good. The MBE is 9.1 W/m 2 and RMSE is 36.5 W/m 2 .
Zhang [79] provided an effective algorithm for calculating downward shortwave radiation (DSR) from Sentinel-2 data by employing look-up tables (LUT). First, the radiation transmission model was used to construct a DSR array corresponding to different atmospheric and surface conditions, and the dimensions of the DSR array were reduced and interpolated to obtain the look-up table. Then, Sentinel-2 B2, B3 and B4 bands were used to calculate the aerosol optical depth (AOD) and cloud optical depth (COD). Finally, the DSR was retrieved using the LUT according to the AOD and COD. The estimated DSR was compared against ground measurements collected from five stations in the HRB midstream region and downstream region, and the validation results achieved R 2 values of 0.8, MBE of 18.1 W/m 2 , and RMSE of 37.05 W/m 2 .

ET Estimation
Before using the EFAF method to correct the LE, the Rn, G, H, and LE are estimated using an input parameter upscaling (IPUS) algorithm. The spatial resolutions of TIR data and VIS-NIR data are quite different, and there are no TIR data that can match the high spatial resolution of VNIR data. Therefore, Sentinel-2 and Sentinel-3 data face the problem of spatial scale mismatching. The IPUS algorithm is an energy balance singlesource model that upscales high-resolution parameters to the same spatial resolution as LST data to calculate energy components [31], thus solving the problem of mismatches in the resolutions of satellite data.
Rn controls the AE used by vegetation and land for ET and is calculated based on incoming radiation and outgoing radiation, as shown in Equation (8).
where S d denotes DSR; α, ε s , and σ are the land albedo, land emissivity, and Stephen-Boltzmann constant, respectively; L d is DLR; and T rad is LST.
For water bodies, G is usually equal to a ratio of Rn (0.226). For buildings such as towns and roads, G is derived using an algorithm (Equation (9)) from the objective hysteresis model (OHM) [80,81]: where a 1 is 0.85, a 2 is 0.32 and its unit is in hours, and a 3 is −28.5 and its unit is in W/m 2 . In other cases, G is obtained based on its relationship with FVC as follows: where the value of Γ s is 0.05 for full vegetation, the value of Γ c is 0.315 for bare soil, and f c is FVC [12]. Equation (11), which requires many parameters and is based on the gradient diffusion theory, is used to calculate H [82]: where H is sensible heat flux, ρ represents the density of air, c p denotes specific heat of air constant pressure, T aero and T a are the aerodynamic surface temperature and the air temperature at the reference height, respectively, and γ a is the aerodynamic resistance, which was calculated through the Monin-Obukhov similarity theory (MOST) utilizing a stability correction function [83,84]. The zero-plane displacement height and roughness length were parameterized by the schemes proposed by Choudhury and Monteith [85]. The reference height is generally 2 m above the land surface [44], so T a is air temperature at 2 m above the land surface, which is provided by ERA5 datasets. Due to the special surface of the buildings, LE was assumed to be zero over impervious surfaces and the H of urban areas was calculated by subtracting G and LE from Rn (H = Rn − G − LE) [45,86].
In other cases, the LE is estimated as the residual of the energy balance (Equation (12)). More details can be found in the study by Peng et al. (2016) [31].
The ET values estimated by the IPUS algorithm were used as input data for the EFAF method and compared with the corrected ET values.  Figure 5d,e, it can be seen that the LE has obvious changes in local areas after being corrected. Figure 6 is a scatter plot of the LE based on the IPUS algorithm and EFAE method on 14 August 2019 for different land types in the study area, and the land cover types are mainly cropland, shrubland, and barren land. Among them, the cropland is mainly distributed in the midstream region, and its LE has generally decreased. In contrast, shrubland is mainly distributed in the downstream region, and its LE varies greatly, ranging from approximately −120 W/m 2 to 120 W/m 2 . Barren land is in the midstream and downstream regions, and its LE has both increased and decreased. It should be noted that the ERA5 datasets are inappropriate over built-up surfaces [87,88] and the ET is based on evaporation from soil and transpiration from vegetation [89]; thus, the LE of buildings is not considered here.    Figure 7 is a box plot of the EFs of different feature types throughout the study area on 14 August 2019. The blue boxes represent the EFs of the mixed pixels before correction, the yellow boxes are the EFs of the searched nearest pure pixels corresponding to the sub-pixels of the mixed pixels before correction, and the green boxes represent the distribution of the EFs of the mixed pixels after correction. In general, the EFs of the pure pixels in the cropland were slightly lower than those of the sub-pixels in mixed pixels, which will reduce the EFs of the cropland pixels after being corrected. Moreover, cropland has the largest EF exception of the water bodies, so the sub-pixels of other ground features in cropland mixed pixels will also reduce the EF of the cropland pixels. Therefore, the EF and LE values will eventually decrease. Table 2 shows the LEs and EFs of three pixels with different underlying surfaces, and Table 3 shows the area ratios and changes in the EFs of the sub-pixels in the mixed pixels. The pixel (Figure 8, pixel 1) in which the Zhangye site is located is used as an example to illustrate the changes in the LE in the mixed cropland pixels. When the IPUS algorithm estimated the LE, the underlying surface of the entire pixel was regarded as cropland, and the estimation of EF was 0.99 and the LE was 492.06 W/m 2 , which were higher than the observed data. In fact, pixel 1 contained seven types of ground features, among which, the EF and LE of forest and wetland did not change because they did not have corresponding pixels in the 1 km land classification data. The EFs of pure pixels of cropland, grassland, and barren land were all less than 0.99. The values of the EF and LE ultimately decreased after area weighting, and the overestimation of the LE in pixel 1 was partially corrected.

Pixel
Underlying Surface  Shrubland is distributed in the downstream region, and its mixed pixels are mainly mixed with barren land, which will reduce the EFs of the shrubland pixels after being corrected. When the EF of the nearest found pure shrubland pixels was smaller than the EF of the shrubland sub-pixel in the mixed pixel, the EF of the mixed pixel decreased, and the LE also decreased. When the area ratio of shrubland in the mixed pixels was much larger than that of barren land, and if the EF of the nearest found pure shrubland pixels was greater than the EF of the shrubland sub-pixel, the EF and LE of the mixed pixel became larger. The pixel (Figure 8, pixel 2) in which the Sidaoqiao site is located is used as an example to show the change in LE in a shrubland mixed pixel. The LE of pixel 2 estimated by the IPUS algorithm was 393.5 W/m 2 , which was lower than the ground station value of 437.38 W/m 2 . Pixel 2 contains barren land and shrubland, the area ratios of which are 0.3294 and 0.6706, respectively, and the EFs of the searched nearest pure pixels are 0.46 and 0.87, respectively. The corrected LE was 398.97 W/m 2 , which was closer to the observed data; thus, the underestimation of LE in pixel 2 was partially corrected.

LE-IPUS (W/m 2 ) EF-IPUS LE-EFAF (W/m 2 ) EF-EFAF
In barren land, the difference between the average EF of the pure pixels and the average EF of the mixed pixel was −0.2, which will reduce the EFs of the barren land pixels after being corrected. The EFs of the cropland and shrubland were larger than those of the barren land, and most of the barren land was mixed with these two types of land cover, thus increasing the EF in barren land mixed pixels. When the area ratio of the barren land in mixed pixels was much larger than that of the other land cover types, and if the EF of the sub-pixel of barren land is greater than the EF of the nearest pure pixels of barren land, the EF and LE of the mixed pixel decreased, otherwise, the EF and LE increased. When the area ratio of the mixed pixels dominated by barren land was slightly more than that of the cropland and shrubland, and if the EF of the barren land pure pixel was smaller than that of sub-pixel, and the EF of pure pixel of cropland and shrubland is greater than that of the sub-pixel, and the EF change in the barren land was greater than that of the cropland and shrubland, the EF and LE of the mixed pixels became smaller. Otherwise, the EF and LE became larger. The change in the LE of the pixel (Figure 8, pixel 3) where the Huazhaizi site is located on 14 August 2019 explains the correction process applied for the barren land. The area ratio of the barren land in pixel 3 was much higher than that of the cropland, so the impact of cropland on the change in the EF was negligible. The EF of the searched nearest pure pixels was smaller than the estimated EF of the pixel, so the EF and LE of pixel 3 decreased.
The box plot and examples of these three pixels prove that the changes in LE are reasonable, showing that the EFAF method is suitable for correcting the error of LE estimated with 1 km resolution Sentinel data caused by land surface heterogeneity.

Validation
The ground measurement data were corrected by the Bowen ratio method for energy closure and then compared against the estimated Rn, H, and LE values. The results are shown in Figure 9. The R 2 , RMSE, and MBE of Rn were 0.83, 31.24 W/m 2 , and 7.53 W/m 2 , respectively, and the R 2 , RMSE, and MBE of H were 0.79, 51.44 W/m 2 , and −17.00 W/m 2 , respectively. Compared with those of the Rn and H, the RMSE and MBE of the LE were larger, with the RMSE ranging between 60 and 80 W/m 2 and MBE ranging between −15 and 60 W/m 2 . Due to the fact that the LE was calculated through the remainder of the energy balance, the errors in the Rn and H accumulated to the LE. Figure 9 shows that these errors are within a reasonable range. The LE value of the pixel where the Huangmo site is located did not change because this pixel is a pure pixel. The EFAF method corrects only mixed pixels and does not correct pure pixels. After using the EFAF method to correct the LE estimated by the IPUS algorithm, the ground data and the estimated values were closer to the 1:1 line, indicating that the accuracy of the LE improved.  Table 4 shows the R 2 , MBE, and RMSE of the observation compared with the LE estimated by the IPUS algorithm and EFAF method. The LEs of the Zhangye and Daman sites were generally high. The EFAF method added the contributions of other land cover types into the mixed pixels, so the overestimation of LE was corrected. The error in the LST at the Huazhaizi site was relatively large, which led to the estimated Rn being higher and the estimated H being lower than ground data. The errors in the Rn and H accumulated to the LE, ultimately causing the error in the LE to be relatively large. The performance of the EFAF method was superior to that of the IPUS algorithm, with the R 2 increasing and RMSE decreasing, the MBE decreasing by 17 W/m 2 on average, and the LE value becoming more concentrated. Section 5.1 shows that the EFAF method performs well when correcting the LE, and the corrected LE values are closer than the LE values before correction to the observed values. Therefore, the EAFA method can address the problem of land surface heterogeneity and can correct the error of LE with coarse resolution remote sensing and obtain more accurate LE values than the IPUS algorithm. It is worth pointing out that the scale matching between the coarse pixel estimate and the source area of the EC measurements needs to be addressed. In this study, the pixel where the Huangmo site is located is a pure pixel and the pixels where the other four sites (Zhangye, Daman, Huazhaizi, and Sidaoqiao) are located are mixed pixels. Research shows that the main EC source areas were within a radius of 250 m [64]. It can be known from the positions of the four sites in the pixels where they are located: the source areas of the Huazhaizi and Sidaoqiao sites are in the pixels where the sites are located. The locations of the Zhangye and Daman sites are at the edge of the mixed pixels, and their source areas include the adjacent pixels areas of the pixels where the sites are located. Therefore, when verifying with the ground measurements, the estimations of the pixel where the site is located and the estimations of the adjacent pixels need to be averaged, and then compared with the EC measurements.
In addition, given that reliable ground measurements are restricted by many factors, such as complex conditions (for example, topography and unfavorable weather) [64], 30 min EC estimates have large uncertainty and may influence the validation results of LE. To further justify the EFAF method, daily LE measurements were aggregated using a range of time series LE estimates based on the time at which Rn shifted from positive to negative values. The LE calculated by the IPUS algorithm and the LE corrected by the EFAF method [45] were extrapolated to the daily scale based on the EF method for validation. The results are shown in Table 5. It can be seen that the MBE varies from −1 to 3 MJ/m 2 , the RMSE ranges from 2.5 to 3.5 MJ/m 2 , and the error of the validation results of the EFAF method is smaller. In general, the daily LE based on the EFAF method is more consistent with the EC measurements than the daily LE estimated on the IPUS algorithm. High-resolution land classification data are the basis for identifying pure pixels [45] and mixed pixels and decomposing mixed pixels in coarse-resolution data, which play a particularly vital role in the EFAF method. Therefore, it is necessary to analyze the influence of land classification data on the EFAF method.
Because the EFs of the sub-pixels in the mixed pixel are replaced by the EFs of the searched nearest pure pixels, it is necessary to calculate the average EFs of the pure pixel before correction for different land types when analyzing the sensitivity of the land cover map. The average EF values of the pure pixels of cropland, barren land, and shrubland on 14 August 2019, calculated before correction for the entire study area, were 0.82, 0.26, and 0.81, respectively, and the EFs of the water bodies and buildings were defined as 1 and 0, respectively. In addition, the average AE of all of the pixels in the entire study area was determined to be 463.98 W/m 2 .
Misclassification caused the EF and area ratios of sub-pixels within mixed pixels to be incorrect, which further led to deviations in the LE. Table 6 shows the difference between the EF and LE when the land cover types are misclassified. In this table, "+" means overestima-tion and "−" means underestimation. The error caused by the misclassification of cropland and shrubland was the smallest, because these land cover types have similar phenological conditions. Incorrectly classifying barren land as cropland and shrubland caused the LE to be overestimated by 259.83 W/m 2 and 255.19 W/m 2 , respectively. Furthermore, the EF and LE were generally underestimated when water bodies were incorrectly classified as other land cover types because their EFs were the largest. Conversely, the EF and LE were generally overestimated when buildings were misclassified as other land cover types, as the EF of buildings was the smallest. The error caused by the incorrect classification of water bodies and buildings was relatively large, but their spectral characteristics were obviously different from those of the other land cover types, so the probability of misclassification was low. When using the EFAF method to correct LE values, it is necessary to select the EFs of the pure pixels to replace the EFs of the sub-pixels in the mixed pixels. The selection of the pure pixels plays an important role in correcting the LE values.
The choice of pure pixels is closely related to the spatial scale of the images being used. The higher the image spatial resolution, the greater the number of pure pixels, the shorter the search distance for pure pixels, and the greater the probability that pure pixels can be found. Li [34] used the EFAF method to correct LE values at a spatial resolution of 300 m, whereas this research needs to correct LE values at a resolution of 1000 m. The high-resolution land classification data in the HRB midstream region were resampled to 100 m ( Figure 10, left), 300 m ( Figure 10, middle), and 1000 m (Figure 10, right) in order to compare and analyze the availability of pure pixels at different spatial scales. The proportions of pure pixels in the land classification data with a spatial resolution of 100 m and 300 m were both greater than 40% (Table 7), and a large number of pure pixels were distributed around the mixed pixels; therefore, the sub-pixel can easily find the corresponding same land cover type pure pixels. In contrast, when the spatial resolution was 1000 m, pure pixels were available for only barren land. The cropland, which accounts for a large proportion of the study area, had no pure pixels, so the EF of the cropland sub-pixels in the mixed pixels could not be replaced by the EF of pure pixels, and LE was not corrected or changed. Therefore, the EFAF method of selecting pure pixels at a coarse resolution needs to be adjusted.  When pure pixels are not found within a limited search range, the standard for identifying pure pixels needs to be lowered according to the spatial resolution. Cropland provides an example of the process of adjusting the strategy for selecting pure pixels. There were no pure pixels of cropland in the entire study area, and a total of 573 pixels included sub-pixels of cultivated land. The adjusted area ratio of pure pixels should be as close to 1 as possible, so, starting from 1 and decreasing by 0.01, the number of pixels with different area ratios was counted. The number of pixels with cropland area ratios greater than 0.99, 0.98, and 0.97 was 26, 74, and 116, respectively. The EFAF method does not correct the LEs of pure pixels. Therefore, if pixels with an area ratio greater than 0.97 are regarded as pure pixels, the LEs of 20% of the pixels will not change. This ratio is obviously unsuitable. In addition, the number of pixels with an area ratio greater than 0.99 was less than 5%. If the number of pure pixels is too small, the distances between sub-pixels and the searched nearest pure pixels may be too large. Although pure pixels are found, their phenological conditions may be very different from those in the mixed pixels. Therefore, pixels with a ratio of cropland greater than 0.98 were selected as pure pixels, and the result is shown in Figure 11. After adjusting the strategy of searching for pure pixels, pixel 1 (Figure 11, top) could search for pure pixels at a search distance of six cells, and pixel 2 (Figure 11, bottom) could find pure pixels at a search distance of two cells. The EFs of the cropland sub-pixels in the study area were replaced by the EFs of the pure pixels within 10 cells, so as to achieve the purpose of correcting LE. The criteria for finding pure pixels for other land cover types are similar to those for cropland. For a land cover type that has no pixels in the entire study area, its sub-pixel EF would not change.

Uncertainty of Energy Balance Closure Method with EFAF Method
The sum of the H and LE observed by the five EC sites is less than the available energy (AE, the difference between the Rn and G), and energy imbalances are common in ground measurement data [90]; thus, this study used the Bowen ratio closure method for energy closure on the ground measurement data. Traditionally, both the Bowen ratio closure method and residual LE closure method can be used for energy closure. The Bowen ratio closure method assumes that the ratio of H to LE (Bowen ratio) is correctly measured by the EC system, and then uses the Bowen ratio to divide the AE so that the values of H and LE can be adjusted [91]. The residual LE closure method assumes that H is accurately measured, and LE is obtained by subtracting H from the AE and serves as boundary conditions to account for the extreme cases [92,93]. Both methods have advantages and characteristics, and there is no evidence that either method performs better than the other [90]. To better contextualize the uncertainties of closure methods on the present study, some discussion about the differences between the Bowen ratio closure method and residual LE closure method is warranted.
The ground measurement data were corrected by the Bowen ratio closure method and residual LE closure method and then compared against the estimations of H and LE based on remote sensing data. The validation results of the two methods are presented in Table 8. The sum of the ground measurement data H and LE is less than the difference between the Rn and G; thus, the Bowen-ratio-corrected LE (BRLE) and H (BRH) increase, and the residual-LE-corrected LE (RELE) increase, whereas H (REH) does not change. Compared with the H estimated from remote sensing data, the MBE of BRH is −17.00 W/m 2 , indicating that the estimated His less than the mean of BRH. Since the BRH is greater than REH, the MBE between estimations of H and REH increases, being 15.49 W/m 2 . The error between the BRH and estimations of H is smaller than the LEH and estimations of H. The BRLE is less than RELE; thus, the MBE of BRLE is greater than the MBE of RELE. Compared with the LE based on the IPUS algorithm and EFAF method, the RMSE of BRLE is approximately 20 W/m 2 higher than that of RELE, whereas the MBE of BRLE is approximately 10 W/m 2 lower than that of RELE. Moreover, the error of the LE corrected by the EFAF method is smaller than the error of the LE estimated on the IPUS algorithm corrected by either the Bowen ratio closure method or residual LE closure method. Overall, regardless of whether the ground measurement data are corrected by the Bowen ratio closure method or residual LE closure method, the performance of the EFAF method is superior to that of the IPUS algorithm.

Conclusions
As a satellite that has been launched in the past two years, Sentinel data have great advantages in temporal and spatial resolution. Images from these satellites can be continuously acquired for a long period of time in the future and have great application potential for the long-term detection of the surface energy balance. In this research, Sentinel-2 and Sentinel-3 data provided the parameters required by the IPUS algorithm to estimate LE. The IPUS algorithm assumes that there is only one land cover type in a single pixel when calculating the LE, whereas the EFAF method considers that there may be multiple land cover types in pixels and adds the contributions of sub-pixels of different land cover types to the mixed pixels, thereby correcting estimations of the LE of mixed pixels. This study proved that the EFAF method is suitable for correcting the deviation in LE estimated based on Sentinel data caused by land surface heterogeneity. The performance of the EFAF method was superior to that of the IPUS algorithm. Furthermore, the results of the EFAF method showed highly consistent observations and showed that it can obtain more accurate LE values. In addition, the EF changes little during the day and has good stability, allowing the instantaneous EF estimated from remote sensing images to represent daily EF. However, the applicability of the EFAF method is closely related to the spatial resolution of remote sensing data. The feasibility of applying the EFAF method is limited when the spatial resolution of images is too large; thus, the algorithm needs to be further adjusted.