Assessing the Effects of Spatial Scales on Regional Evapotranspiration Estimation by the SEBAL Model and Multiple Satellite Datasets: A Case Study in the Agro-Pastoral Ecotone, Northwestern China

: Evapotranspiration (ET) estimation is important for understanding energy exchanges and water cycles. Remote sensing (RS) is the main method used to obtain ET data over large scales. However, owing to surface heterogeneities and different model algorithms, ET estimated from RS products with different spatial resolutions can cause signiﬁcant uncertainties, whose causes need to be thoroughly analyzed. In this study, the Surface Energy Balance Algorithm for Land (SEBAL) model was selected to explore spatial resolution inﬂuences on ET simulations. Three satellite datasets (Landsat Thematic Mapper (TM), Moderate Resolution Imaging Spectroradiometer (MODIS), and Advanced Very High-Resolution Radiometer (AVHRR)) were selected to independently estimate ET in SEBAL model to identify the inﬂuence of the spatial scale on ET estimation, and analyze the effects and causes of scale aggregation. Results indicated that: (1) the spatial distributions of ET estimated from the three satellite datasets were similar, with the MODIS-based ET having the largest uncertainty; and (2) aggregating input parameters had limited changes in the net radiation and soil heat ﬂuxes. However, errors in the sensible heat and latent heat ﬂuxes were relatively larger, which were caused by changes in the selection of hot and cold pixels and the NDVI and surface albedo parameters during scale aggregation. The scale errors caused by the model mechanisms were larger than those caused by the land use/cover pattern in the SEBAL model. Overall, this study highlights the impact of spatial scale on ET and provides a better understanding of the scale aggregation effect on ET estimation by RS.


Introduction
It has been widely recognized that many hydrological processes and energy exchanges are scale-dependent [1][2][3][4]. Scaling issues impact our ability to accurately model the exchanges of water and energy across the surface-atmosphere interface [5]. Several studies have illustrated that scale changes in hydrological models produce results that are considerably biased due to land surface spatial heterogeneities (e.g., variations in topography, land use/cover, and soil properties) [6][7][8][9]. Evapotranspiration (ET) is a key linchpin of the Earth's hydrological cycle and energy balance system [10]. The accurate estimation of regional ET is crucial for water resources management, agricultural production, and ecosystem protection, particularly in water-scarce regions [11][12][13]. Owing to the limitations of the equipment and resource constraint, ground measurements of ET over large scales are nearly impossible. Remote sensing (RS) provides an ideal method for determining and mapping regional ET at different spatial scales [14][15][16]. However, considering the spatial heterogeneities of land surface properties, as well as the non-linear processes in the model algorithms, the spatial resolution of RS affects the accuracy of ET estimates in regional and larger scale modeling [17][18][19]. In order to improve the accuracy of the ET estimates and to correct the scale errors in ET simulations, it is critical to examine the scale effect on ET simulations from RS under heterogeneous surfaces [17,[20][21][22][23].
The spatial resolution of the ET simulations based on RS is dependent on the satellite sensor type. Currently, satellite sensors with different spatial, temporal, and spectral resolutions have been widely used to estimate ET over multiple spatial scales [17,24,25]. These data can be divided into two categories: high spatial resolution with a poor temporal resolution (e.g., Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) ( [26][27][28]. Most satellite-based ET algorithms were initially developed at high spatial resolutions and hence require homogeneous conditions across a single pixel, whereas RS data contain mixed pixels, especially in low resolution datasets [29][30][31]. In addition, the parameters and processes that are important and predictive on a fine scale introduce uncertainty or significant error when being applied to a coarser scale [4,[32][33][34][35]. Therefore, a noteworthy issue in RS is how ET estimates differ based on data from satellite sensors with different spatial resolutions. To address such issues, several studies have assessed the effect of different satellite sensor types on ET simulations. Most of these studies have focused primarily on Landsat, ASTER, and MODIS sensors, and have reported good agreements in the spatial distribution patterns of the ET estimations [25,36]. However, studies have also indicated that there are discrepancies among the various spatial resolution data sources [17,18,25,31,[36][37][38]. For example, McCabe and Wood [37] used Landsat-ETM, ASTER, and MODIS data to estimate ET and found consistency between the higher-resolution data sources (Landsat-TM and ASTER); however, the MODIS-based estimations were unable to discriminate the influence of land surface heterogeneity at a field scale, but correctly estimated the mean values. Sharma et al. [18] found that a regression model explained 91% of the variability in the Landsat-based crop evapotranspiration (ETc) data, and~31% of the variability in the MODIS-based ETc data, which is higher than the measured ETc from the Bowen Ratio Energy Balance System. Landsat TM, with high spatial resolution (30 m) and MODIS with high temporal resolution (1-2 d) are widely used sensors that obtain local and regional ET information. AVHRR, another widely used sensor since the 1980s, has been used to estimate long-term ET but its accuracy has not been thoroughly analyzed in estimating regional ET [11,39,40].
In order to explore the effect of spatial resolution on ET estimation, until now, various studies have discussed the effect of the spatial resolution of input satellite data on ET estimation by a number of ET models, including the Surface Energy Balance System (SEBS) developed by Su [41] ( [18,30,37]); Surface Energy Balance Algorithm for Land (SEBAL) developed by Bastiaanssen et al. [42] ( [17,20,21,25,34,36]); Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC) developed by Allen et al. [43] ( [27]); Two Source Energy Balance (TSEB) developed by Norman et al. [44] ( [38,45]), and Simplified Surface Energy Balance Index (S-SEBI) developed by Roerink et al. [46] ( [47]), among others. These studies have pointed out that for a given area, a decrease in the number of pixels led to a loss of information, which changes the statistical and spatial characteristics of the data during the aggregation process. They have also reported that aggregation strategies, land surfaces and model mechanisms can produce different errors for ET estimation. Moreover, the aggregation of the sensible heat fluxes could cause significant Remote Sens. 2021, 13, 1524 3 of 22 errors over a heterogeneous area in the energy balance-based model. In addition, several studies have attempted to identify the reasons for the significant errors in the aggregated results. Ershadi et al. [30] and Sharma et al. [18] postulated that the aggregation processes in ET simulations may be caused by decreases in aerodynamic resistance in the energy balance model. However, Wang et al. [31] argued that, although aerodynamic resistance was excluded from the three-temperature (3T) model, due to surface heterogeneity, the spatial scale effect was still present. Ramírez-Cuesta et al. [27] indicated that surface roughness parameters and land use characteristics were the two main causes of the aggregation effect in the METRIC model. Despite these studies, in general, the aggregation effect and its causes for ET estimation remain unclear and require further study [31]. Therefore, it is necessary to comprehensively examine the changes in aggregation errors and clarify the causes of such errors in the aggregation process.
The SEBAL model, with a relatively high simulation accuracy and minimal groundbased data, is one of the most widely applied models for ET estimation [42,48]. Most of the input parameters used (e.g., Normalized Difference Vegetation Index (NDVI), land surface temperature, albedo, emissivity, and roughness length) in the SEBAL model to simulate ET are derived from raw satellite data with different spatial, spectral, and radiometric resolutions. It is therefore necessary to identify and evaluate the performance of the SEBAL model for different satellite datasets with various spatial resolutions.
In this study, the Landsat TM, MODIS, and AVHRR images and the SEBAL model were selected to analyze the effects of spatial resolution on ET estimations. The specific objectives were: (1) assessing the consistency of the SEBAL model in estimating ET using different satellite datasets; (2) quantifying the aggregation errors in the SEBAL model, and (3) identifying the main factors for scale aggregation errors. The findings of this study can provide insights for reducing scaling errors and in turn improving the accuracy of ET estimates over regional scales.

Study Area
The Yuyang District is located at 37.81 • -38.92 • N, 108.94 • -110.41 • E (Figure 1), Shaanxi Province, China, and covers a total area of~7000 km 2 . This area is part of the ecologically fragile transition zone between the desert-grassland area and the Loess Plateau region [13,[49][50][51][52]. The region has a semi-arid continental monsoon climate, with an average annual temperature of 8.4 • C and an average annual precipitation of 402 mm that gradually decreases from the southeast to northwest. Approximately half of the annual precipitation occurs from July to September [53]. The altitude varies from 836 m to 1385 m above sea level, with an average of 1183 m. The main soil types are steppe aeolian sandy soil in the western part of the region, and Loess soil in the eastern part. The main land cover types (Figure 1b) are grassland (43.1%); barren land (30.6%); farmland (20.9%); wood land (3.93%); built-up land (0.85%); and water body (0.62%).

Datasets and Processing
The input data in the SEBAL model include meteorological data, satellite images, and digital elevation model (DEM) data. The satellite images were selected with less than 5% cloud coverage. Furthermore, because the in situ flux data used for model validation were measured in 2011, we chose the satellite overpass time on August 7, 2011. These data were processed using a unified projection coordinate system in the WGS-1984 coordinate system (zone M 49).

Datasets and Processing
The input data in the SEBAL model include meteorological data, satellite images digital elevation model (DEM) data. The satellite images were selected with less tha cloud coverage. Furthermore, because the in situ flux data used for model validation measured in 2011, we chose the satellite overpass time on August 7, 2011. These data processed using a unified projection coordinate system in the WGS-1984 coordinate tem (zone M 49).

Remote Sensing Datasets
Landsat TM data were downloaded from the Geospatial Data Cloud website, o Chinese Academy of Sciences (http://www.gscloud.cn, accessed on 20 May 2019). Th agery covering the study area is composed of two TM images (path 127, rows 33 an on August 7, 2011. These data were corrected with a level 1 T (terrain corrected). processing of the data was mainly performed in ENVI 5.3 (ESRI). The digital numbe ues were converted to radiance using radiation calibration, and the Fast Line-of-Sigh mospheric Analysis of Spectral Hypercubes (FLAASH) atmospheric correction wa lized to eliminate the effects of atmosphere and light on ground reflections [54].
MODIS data were retrieved from the Land Processes Distributed Active Ar

Remote Sensing Datasets
Landsat TM data were downloaded from the Geospatial Data Cloud website, of the Chinese Academy of Sciences (http://www.gscloud.cn, accessed on 20 May 2019). The imagery covering the study area is composed of two TM images (path 127, rows 33 and 34) on 7 August 2011. These data were corrected with a level 1 T (terrain corrected). Pre-processing of the data was mainly performed in ENVI 5.3 (ESRI). The digital number values were converted to radiance using radiation calibration, and the Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) atmospheric correction was utilized to eliminate the effects of atmosphere and light on ground reflections [54].
MODIS data were retrieved from the Land Processes Distributed Active Archive Center (LP DAAC) (https://lpdaac.usgs.gov/, accessed on 10 July 2019), including MOD09GA and MOD11A1, which include surface reflectance at a 500 m spatial resolution and surface temperature at a 1 km spatial resolution. The original MODIS products were in HDF-EOS format and sinusoidal projection, and MODIS Reprojection Tools (MRT) was used for projection transformation, format conversion, and re-sampling to convert the data into the GeoTiff format. The input data were further processed by clipping and raster calculation using the ArcGIS 10.2 (ESRI) software package to be compatible with the SEBAL model requirements. AVHRR data were obtained from the United States National Oceanic and Atmospheric Administration Satellite Active Archive (https://www.avl.class.noaa.gov/saa/products/ welcome, accessed on 23 July 2019) in high-resolution picture transmission format, with a spatial resolution of 1.1 km, and five bands (as shown in Table 1). The first two bands (0.55-0.68 µm and 0.75-1.00 µm, respectively) are visible light and near-infrared, which are mainly used to calculate NDVI, surface albedo, and other related coefficients. The other three bands are thermal infrared and were used to estimate land surface temperatures [55]. The data processing in this study includes radiation corrections, geometric corrections, and clipping in the ENVI 5.3 (ESRI) software package.

Meteorological and Flux Data
Meteorological data, including daily air temperatures (T a ) and wind speeds (U), were used as forcing data for the SEBAL model. Data from eight meteorological stations (Etuoke, Hengshan, Jingbian, Shenmu, Xingxian, Yulin, Suide, YLEOS) in the study area collected on 7 August 2011 (Figure 1a), were downloaded from Chinese National Meteorological Information Center (https://data.cma.cn/, accessed on 7 May 2019). These data were interpolated using inverse distance weighting methods to obtain their spatial distributions for the study area.
Flux data were obtained from the YLEOS (38 • 26 43.6" N, 109 • 28 2.7" E, 1233 m elevation). A detailed description and introduction to these data can be found in Gong et al. [53].

Other Data
As a model input parameter, DEM data were obtained from the Geospatial Data Cloud site, of the Chinese Academy of Sciences (http://www.gscloud.cn, accessed on 3 June 2019). The original spatial resolution of the DEM is 30 m, to match the spatial resolution of the satellite data, the nearest neighbor method is used to resample the data to 500 m and 1000 m, respectively, for the SEBAL model input. Land use/cover data were used to specify the surface roughness parameter [48] and to assist in the selection of hot and cold pixels. We utilized a 2010 land-use map with a 30 m spatial resolution, from the Resource and Environment Cloud Platform, Chinese Academy of Sciences (http://www.resdc.cn/, accessed on 20 June 2019).

SEBAL Model
The SEBAL model was developed based on the surface energy balance equation [42]. A detailed description of the model can be found in Bastiaanssen et al. [42,48,56] and Allen et al. [57]. A flowchart for the SEBAL model algorithms is shown in Figure 2. ET value was calculated from the instantaneous ET based on the EF [42]: where Rn24 is the net radiant flux throughout the day (W/m 2 ), G24 is the soil h throughout the day (W/m 2 ), 86,400 is the number of seconds per day, and EF is t orative fraction.  The energy balance equation can be expressed as follows: where R n is the net radiation flux, G is the soil heat flux, H is the sensible heat flux, and LE is the latent heat flux. All units are in W/m 2 . The pixel-by-pixel net radiant flux calculation was obtained by subtracting the outgoing radiant components from the incoming ones [42,58]: where R swd is incoming (downward) shortwave radiation (W/m 2 ), α is the surface albedo (dimensionless), R lwd is downward longwave radiation (W/m 2 ), R lud is outgoing (upward) longwave radiation (W/m 2 ), and ε g is the surface thermal emissivity (dimensionless). G represents the rate of heat storage in the soil, plant, and water in heat transfer, is relatively difficult to estimate directly based on satellite data, and it can be calculated from the empirical relationship between R n , land surface temperature, NDVI, and surface albedo, as follows [42,59]: where T s is the land surface temperature (K). H is considered proportional to the ratio between the surface-air temperature difference (dT) and bulk aerodynamic resistance to heat transport (r ah ) [42], and it is obtained from Equation (4): where ρ is the air density (kg/m 3 ), C p is the air specific heat at a constant pressure (1004 J/kg/K); dT is the temperature difference between heights z 1 and z 2 (K); z 1 , specified as 0.01 m, is the height above the zero-plane displacement height of the crop canopy; z 2 , as 2 m, is the distance above the zero-plane displacement, but below the height of the surface boundary layer. Due to the two unknown variables (dT and r ah ) in the equation, in this algorithm, two anchor pixels (referred to as hot and cold pixels) were chosen based on their land surface temperatures, and then the variables of dT and r ah were calculated using a trial-and-error method in an iterative process. The cold pixels were selected from areas with an adequate water supply and lush vegetation and fully irrigated area (high NDVI, low temperature value and low albedo), and sensible heat flux is negligible, LE ≈ R n − G. Hot pixels were selected from sparsely vegetated and desert areas (high temperature, high albedo and low NDVI), and the latent heat flux is negligible, H ≈ R n − G. Bastiaanssen et al. [48] suggested that the cold pixels can be identified in open water bodies or in well-irrigated agricultural fields, and that the hot pixels can be chosen in bare soil surface. In this study, for each image, the specific extreme pixel selection was based on the land-use map, albedo, T s , and NDVI. r ah is dependent on surface roughness, wind speed, and atmospheric stability [60]. It is calculated as where k is the von Karman's constant (0.41), U x is the wind speed (m/s) at height z x , u* is the friction velocity (m/s), and z om is the momentum roughness length for each pixel (m), which can be calculated as Equation (7) [61]: The LE is the residual of the energy balance equation (Equation (1)). The evaporative fraction (EF), defined as the ratio of LE and the available radiant energy, has been shown to be approximately constant during the daytime [62,63]. The EF can be calculated as: Then, the instantaneous ET (ET inst ) (mm/h) value is calculated using the following equation: where λ is the latent heat of vaporization (J/kg). Therefore, cumulative daytime evaporation losses can be accurately estimated by multiplying an instantaneous EF estimate by the cumulative daytime R n [63]. The daily ET value was calculated from the instantaneous ET based on the EF [42]: where R n24 is the net radiant flux throughout the day (W/m 2 ), G 24 is the soil heat flux throughout the day (W/m 2 ), 86,400 is the number of seconds per day, and EF is the evaporative fraction.

Aggregation Process and Statistical Indicators
We chose a simple averaging method as the aggregation scheme, which has the advantage of not only preserving the spatial distribution of ET, but also maintaining the most statistically and spatially predictable behaviors, recommended by Sridhar et al. [4], Hong et al. [17], Wang et al. [31], and Bian and Butler [64]. Figure 3 shows the aggregation procedures in this study. First, we evaluated the consistency of the three satellite sensors used for ET simulation to understand the performances of the data sources with different spatial resolutions in the SEBAL model. Second, to assess the impact of spatial scale changes on ET simulation, the input parameters from the original 30 m × 30 m (Landsat TM) spatial resolution were aggregated to a coarser resolution (1200 m × 1200 m) at intervals of 30 m using the simple averaging scheme. The aggregated parameters were then used to calculate ET by the SEBAL model. Third, as land surface heterogeneity is the main cause for aggregation errors [5,65], in order to explore the causes of scale aggregation, the heterogeneous and homogeneous areas were chosen on a pixel scale to calculate ET over different spatial resolutions, independently.
Hong et al. [17], Wang et al. [31], and Bian and Butler [64]. Figure 3 shows the aggregation procedures in this study. First, we evaluated the con sistency of the three satellite sensors used for ET simulation to understand the perfor mances of the data sources with different spatial resolutions in the SEBAL model. Second to assess the impact of spatial scale changes on ET simulation, the input parameters from the original 30 m × 30 m (Landsat TM) spatial resolution were aggregated to a coarse resolution (1200 m × 1200 m) at intervals of 30 m using the simple averaging scheme. Th aggregated parameters were then used to calculate ET by the SEBAL model. Third, as lan surface heterogeneity is the main cause for aggregation errors [5,65], in order to explor the causes of scale aggregation, the heterogeneous and homogeneous areas were chose on a pixel scale to calculate ET over different spatial resolutions, independently.
To evaluate the aggregation effect, each aggregated result was compared with th original 30 m resolution data, and a set of evaluation criteria was applied to describe th changes in ET simulations due to scale aggregation, including spatial mean, coefficient o variation (CV), relative error (RE), mean relative error (MRE), absolute error (AE) [66 and spatial distribution characteristics. The relative spatial mean (μ) and relative spatia standard deviation (σ) were used to quantify the changes in the key input parameters fo each aggregated resolution [17,30]: where M0 and σ0 are the spatial mean and standard deviation of the original paramete spatial resolution (30 m), respectively, Mi and σi are the spatial mean and standard devia tion of the aggregated spatial resolution, respectively, and i represents each aggregate spatial resolution.  To evaluate the aggregation effect, each aggregated result was compared with the original 30 m resolution data, and a set of evaluation criteria was applied to describe the changes in ET simulations due to scale aggregation, including spatial mean, coefficient of variation (CV), relative error (RE), mean relative error (MRE), absolute error (AE) [66], and spatial distribution characteristics. The relative spatial mean (µ) and relative spatial Remote Sens. 2021, 13, 1524 9 of 22 standard deviation (σ) were used to quantify the changes in the key input parameters for each aggregated resolution [17,30]: where M 0 and σ 0 are the spatial mean and standard deviation of the original parameter spatial resolution (30 m), respectively, M i and σ i are the spatial mean and standard deviation of the aggregated spatial resolution, respectively, and i represents each aggregated spatial resolution.

Determination of Homogeneous and Heterogeneous Area
We selected the regions of interest with the most and least types of land use/cover, and then calculated the landscape pattern indices to determine the relatively heterogeneous and homogeneous regions to analyze the uncertainty of the simulated ET caused by scale aggregation on different land pattern complexities. The selected sample areas correspond to 25 km 2 (5 km × 5 km) lots (Figure 1d). Six landscape pattern indices were selected to indicate the surface features: number of patches (NP), largest patch index (LPI), mean shape index (SHAPE_MN), contagion index (CONTAG), Shannon's diversity index (SHDI) and patch richness index (PR) [4,67,68]. Detailed descriptions of these indices are shown in Table 2.  SHAPE ≥ 1 SHAPE = 1 when the patch is a square and increases without limit as the patch shape becomes more irregular.   [68]. Note: NP represents number of patches, SHAPE represents shape index, CONTAG represents contagion index, LPI represents largest patch index, SHDI represents Shannon's diversity index, and PR represents patch richness index.

CONTAG
The statistical results for the different land use/cover pattern are shown in Table 3. In the homogeneous area, the land use/cover pattern was mainly determined by a single land use/cover type (barren land, 81.75%, followed by grassland, 15.53%). The landscape indices indicate that the homogeneous area is characterized by fewer numbers of patches (NP = 23), the landscape fragmentation degree is lower, the geometrical shape is simple, with an LPI of 81.70%, and the landscape diversity is lower (SHDI = 0.55). In contrast, the heterogeneous area (shown in Figure 1d) clearly has a relatively complex land surface, comprising of mainly farmland (47.90%), and grassland (50.80%), with barren land and built-up land patches mixed together. Table 3 shows that the heterogeneous area has a larger number of the patches (NP = 43), the landscape fragmentation degree is higher, the geometrical shape is more complex (LPI = 28.17%) and the landscape diversity is higher (SHDI = 0.76). In order to check the consistency of the SEBAL performance for the three datasets, we compared the estimated ET daily with the observed ET daily from the eddy covariance data by extracting the pixel value corresponding to the geographic location of the flux tower site. AE and RE were selected to evaluate the model reliability. Table 4 shows the errors between the pixel values of the simulated ET daily and the in situ observation. The observed ET daily was 1.40 mm/d on 7 August 2011. All of the simulated values underestimated the ET daily values compared with the observed ET daily value. The simulated ET AVHRR , with a RE of 3.60% and an AE of −0.05 mm/d, was the closest to the observed values. The simulated ET Landsat had a relatively high agreement with the flux observation, with an AE of −0.17 mm/d and an RE of 11.98%. However, the simulated ET MODIS had the largest error, which was more than twice the RE between the high-resolution daily estimated ET and the observational data (AE = −0.46 mm/d, RE = 32.90%).  Figure 4 shows the spatial distribution of the simulated ET based on the different sensors. Although the input parameters were derived from the three satellite sensors with different spatial resolutions, the ET simulated using the SEBAL model exhibited similar spatial distribution patterns. Higher heterogeneity and ET values were distributed in the high vegetation density of woods and crops in the southeastern part of the study area. In contrast, lower heterogeneity and ET values were distributed in the grassland and barren land in the northwestern part of the study area (Figures 1b and 4). Landsat TM, which has a spatial resolution of 30 m, can capture the detailed textural features of the land surfaces. The spatial ET patterns based on the Landsat TM data in the mountainous areas with complex terrain in the southeastern part of the study area were well captured compared to the spatial patterns obtained from the MODIS and AVHRR data. For example, the narrow Wuding River valley has a large area of farmland with higher ET values, and can be easily observed in the Landsat TM images, whereas the low spatial resolutions of the ET maps derived from the MODIS and AVHRR data contained mixed pixels and were unable to reflect the detailed heterogeneity of the land surface, making the river valley difficult to observe on the derived ET maps.

R PEER REVIEW 11 of 22
high vegetation density of woods and crops in the southeastern part of the study area. In contrast, lower heterogeneity and ET values were distributed in the grassland and barren land in the northwestern part of the study area (Figure 1b and Figure 4). Landsat TM, which has a spatial resolution of 30 m, can capture the detailed textural features of the land surfaces. The spatial ET patterns based on the Landsat TM data in the mountainous areas with complex terrain in the southeastern part of the study area were well captured compared to the spatial patterns obtained from the MODIS and AVHRR data. For example, the narrow Wuding River valley has a large area of farmland with higher ET values, and can be easily observed in the Landsat TM images, whereas the low spatial resolutions of the ET maps derived from the MODIS and AVHRR data contained mixed pixels and were unable to reflect the detailed heterogeneity of the land surface, making the river valley difficult to observe on the derived ET maps. Figure 5 presents the frequency distribution and statistical analysis (mean, maximum, minimum, and standard deviation) of the three datasets. It shows a bimodal distribution among the three datasets. The histogram shapes of ETLandsat and ETAVHRR were similar and had two peak frequencies at 1.2-1.4 mm/d and 2.6-2.8 mm/d, respectively. However, the ETMODIS histogram had peak frequencies at approximately 1.0-1.2 mm/d and 2.8-3.0 mm/d. From the spatial statistical characteristics, the maximum value of ETLandsat was 4.33 mm/d, followed by ETAVHRR (3.42 mm/d), and ETMODIS (3.28 mm/d). The maximum ET values of the three datasets indicated a gradually decreasing trend from finer to coarser spatial resolution datasets.
In summary, the above analyses indicated that the different datasets affected the ET values and statistical characteristics. The coarser spatial resolution data had the larger uncertainty and the MODIS data underestimated the ET value with the largest discrepancies.      In summary, the above analyses indicated that the different datasets affected the ET values and statistical characteristics. The coarser spatial resolution data had the larger uncertainty and the MODIS data underestimated the ET value with the largest discrepancies.

Effects of Aggregation on the Energy Balance Components and ET Estimation
In order to evaluate the effect of input parameter aggregation on the results of the SEBAL model, we used simple averaging schemes to aggregate the input parameters from the original 30 m × 30 m spatial resolution to 1200 m × 1200 m at intervals of 30 m to test the responses of the four energy-balance components to changes in spatial scale, including R n , G, H, and LE. The spatial means, MRE, and CV were used to quantify the variations of the four fluxes during the aggregation process. Figure 6 shows the statistical results of the SEBAL input parameter aggregation. The spatial mean and spatial heterogeneity of the NDVI and albedo showed larger changes than other parameters. However, the spatial distribution of the T s and U were obtained by interpolation, showing limited changes for spatial scale aggregation. Figure 7 shows the statistical characteristics of the spatial mean and spatial heterogeneity of the fluxes calculated at different aggregation levels. There was a decreasing trend in the spatial mean values of R n when finer resolution data were aggregated to a coarser resolution, ranging from 562.96 W/m 2 to 562.46 W/m 2 and the MRE was −0.06%. In contrast, the spatial mean value of G tended to increase, ranging from 75.33 W/m 2 to 76.21 W/m 2 . G was slightly overestimated when the finer scale data were aggregated to a coarser scale, and the MRE was 0.82%. All errors exhibited a relatively steady increase. In terms of spatial variability, a relatively small CV was present in the R n and G (6.68-8.02%, 9.65-12.49%, respectively). The spatial CV of R n and G showed similar variations: a decreasing tendency when the scale was aggregated, indicating a decrease in spatial heterogeneity. Overall, scale aggregation had limited effects on the R n and G.
In contrast, the scale aggregation had greater effects and more significant variations on the H and LE than on the R n and G. The spatial mean values and CV had fluctuating variations. The spatial mean value of the H initially increased, then decreased, ranging from 209.80 W/m 2 to 247.96 W/m 2 . The MRE of the H was 3.03% and RE reached a maximum of 10.88% at a spatial resolution of 300 m. The spatial mean value of the LE first decreased, and then increased, ranging from 238.97 W/m 2 to 276.58 W/m 2 , and the MRE was −2.80%. The RE reached a maximum of −9.46% at a spatial resolution of 300 m.      To evaluate the error variation for scale aggregation, we compared the estimated fluxes at different scales with the observed data. We used the Bowen ratio adjustment method reported in Twine et al. [69] to modify the latent and sensible heat fluxes to solve the energy balance closure issue of the eddy covariance flux. Figure 8 shows the AE between the modified flux observation from the YLEOS and the simulated fluxes at different spatial resolutions. The results indicate that the R n and G values are similar to the original values (at 30 m resolution) and showed relatively little error variation in the scale aggregation (AE values:~10-20 W/m 2 and~−15 W/m 2 , respectively). However, when the pixels were enlarged, the differences between the tower flux data and model output of the H yielded a relative larger error, i.e., AE values increased from 20 W/m 2 to 60 W/m 2 . The LE error showed a similar tendency to the H, but in the opposite direction, i.e., AE decreased from 0 W/m 2 to −30 W/m 2 . This implies that the H is the main source of the error during the scale aggregation.    Figure 9 shows the statistics of the daily ET estimations with the scale aggregation. The spatial means of the daily ET initially decreased, then increased with the scale aggregation, ranging from 1.74 mm/d to 1.96 mm/d. A spatial resolution of 300 m was a critical point in the ET simulation scale aggregation. In addition, the spatial heterogeneity increased (CV ranged from 37.05% to 46.12%) with the spatial scale aggregation. We also found that the largest ET values corresponded to smaller spatial heterogeneities. This implies that spatial aggregation affects the SEBAL model simulation processes and performance in a complex way.

The Effects of Land Use/Cover Pattern on the ET Estimation during the Aggregation Process
The aggregation effect is mainly caused by non-linear processes in the model mechanisms and surface heterogeneity [17,18] as discussed in Section 3.2. To demonstrate the effect of land use/cover pattern on the ET estimation, two comparison groups (homogeneous and heterogeneous areas) were used to illustrate the changes in the ET estimation during the aggregation process. We selected the following representative nodes of the spatial resolution aggregation results to compare the spatial patterns of the ET estimates:

The effects of Land Use/Cover Pattern on the ET Estimation during the Aggregation Process
The aggregation effect is mainly caused by non-linear processes in the model mech anisms and surface heterogeneity [17,18] as discussed in Section 3.2. To demonstrate th effect of land use/cover pattern on the ET estimation, two comparison groups (homoge neous and heterogeneous areas) were used to illustrate the changes in the ET estimatio during the aggregation process. We selected the following representative nodes of th spatial resolution aggregation results to compare the spatial patterns of the ET estimates 30 m (original), 60 m, 120 m, 240 m, 480 m, and 960 m (coarse). Table 5 shows the four energy balance components of the statistical analysis result over the homogeneous and heterogeneous areas at different spatial resolutions. The R and G were minimally affected by the different land use/cover pattern with the scale ag gregation, and the maximum absolute error was less than 1 W/m 2 . The maximum error o the H in the heterogeneous area was −35.99 W/m 2 at the spatial resolution of 120 m, whil the maximum error of the H in the homogeneous area was −33.78 W/m 2 at the spatia resolution of 240 m.  Table 5 shows the four energy balance components of the statistical analysis results over the homogeneous and heterogeneous areas at different spatial resolutions. The R n and G were minimally affected by the different land use/cover pattern with the scale aggregation, and the maximum absolute error was less than 1 W/m 2 . The maximum error of the H in the heterogeneous area was −35.99 W/m 2 at the spatial resolution of 120 m, while the maximum error of the H in the homogeneous area was −33.78 W/m 2 at the spatial resolution of 240 m.  Figure 10 shows the spatial distributions and basic statistics of the simulated ET on the different surfaces at different spatial scales. We found that the maximum ET values decreased from fine to coarser resolutions, independent of the land use/cover type. Higher spatial resolutions of 30 m, 60 m, and 120 m preserved the detailed spatial patterns and textural features. For spatial resolutions of 240 m, 480 m, and 960 m, the small surface features disappeared and the spatial structures were obscured, resulting in larger statistical errors in the ET estimation. This indicates that lower spatial resolutions, fewer pixels, and higher spatial variabilities, all contributed to the uncertainty in the ET estimation. Particularly, at a spatial resolution of 960 m, much spatial information was lost, and the spatial patterns, ranges, and magnitudes of ET were degraded.

Estimation of the ET from the Three Different Satellite Sensors
Multiple satellites have been launched to obtain Earth observations, with sensors with different spatial, temporal, spectral, and radiation resolutions would usually cause uncertainties in estimating surface fluxes. The ET estimated using multiple sensors (ETLandsat, ETMODIS, and ETAVHRR) with different spatial resolutions were compared in this study. As shown in Section 3.1, both the spatial characterization and frequency distribution of the results demonstrate the similar results between the Landsat TM and AVHRR In the homogeneous area, patches of the grassland with higher ET values were spatially smaller, while the barren land covered a larger area and had lower ET values. After spatial pixel aggregation, the grassland patches lost clear boundaries and produced drastically high and low ET values. The peak of the histogram shifted to a low-value area. The spatial means decreased from 1.24 mm/d at the spatial resolution of 30 m to 1.06 mm/d at the spatial resolution of 240 m. The spatial heterogeneity also increased with the scale aggregation and the standard deviation increased from 0.37 mm/d to 0.77 mm/d. Likewise, in the heterogeneous area, farmland and grassland were staggered after the scale aggregation, the peak of the histogram shifted to the high-value area, the mean and maximum ET values decreased, and the standard deviation increased from 0.41 mm/d at the 30 m spatial resolution to 0.74 mm/d at the spatial resolution of 960 m, which indicates that the pixel aggregation increased both ET estimation errors and spatial heterogeneity.

Estimation of the ET from the Three Different Satellite Sensors
Multiple satellites have been launched to obtain Earth observations, with sensors with different spatial, temporal, spectral, and radiation resolutions would usually cause uncertainties in estimating surface fluxes. The ET estimated using multiple sensors (ET Landsat , ET MODIS , and ET AVHRR ) with different spatial resolutions were compared in this study. As shown in Section 3.1, both the spatial characterization and frequency distribution of the results demonstrate the similar results between the Landsat TM and AVHRR datasets. However, the MODIS data had the largest uncertainty. The results are similar to those of other studies [14,37,70]. The largest uncertainty in the ET MODIS could be caused by the following reasons: (1) the original thermal infrared band of MODIS has a spatial resolution of 1 km, and re-sampling may increase its spatial variability; (2) the influence of wavelength range and input parameter calculation methods resulted in the lower ET values; and (3) the difference of about 0.26 mm/d in the spatial mean between the ET MODIS and ET Landsat is consistent with the 30 min time gap between the overpass times of the two satellites, the time gap leads to the variation in the EF, causing the difference in the daily ET.
The validation of the RS-based ET at large scales is difficult due to discrepancies in spatial resolution between the ground measurements and remote sensing datasets [30,70]. Generally, input parameters with a higher resolution correspond to the ET estimates with higher accuracy [18,71]. In this study, we observed that finer spatial resolutions did not match the highest ET accuracies. Gaur et al. [47] reported that a high spatial resolution may lead to the underestimation of ET values, and thus, a higher resolution dataset was not necessarily the best choice. This is caused by the mismatch between the flux tower site position and the pixel. The main contributing source area of the YLEOS determined by the footprint model was approximately 1263 m wide and 1682 m long [53], which matches the spatial resolution (1.1 km) of the AVHRR pixels and explains the reason for the higher simulation accuracy of the AVHRR data. Therefore, verifications of the ET simulations with in situ observations must consider the match of the spatial scales.

Aggregation Effects on the Energy Balance Flux Simulation
Land-atmosphere interaction processes respond non-linearly to changes in scale, and land surface heterogeneities make the scaling process more complicated [4,27,34]. The effect of the spatial resolution on the energy balance components in the SEBAL model was assessed in this study. The results indicate that the R n and G has the limited changes in scale, while the H and LE yielded relatively larger errors with the spatial scale aggregation. This was also reported by Su et al. [34], Ershadi et al. [30], and Ramírez-Cuesta et al. [27]. The R n is computed as the algebraic sum of the incoming and outgoing short wave and long wave radiation at the surface, and is closely related to the T s , T a , albedo, and NDVI (Equation (2) and Figure 2). Small changes in the T s and T a values were observed at an aggregated resolution ( Figure 6). The increases in the albedo and NDVI resulted in a small increase in the R n . The G was calculated with the empirical relation using the R n , albedo, and NDVI (Equation (3)), which are linearly combined, producing similar small decreasing trends.
Since aggregation tends to average out small surface features, it will introduce more mixed pixels. The difference between the aggregated imagery and the original fineresolution imagery increases with the aggregation level [18]. This explains why the CV of H and LE increased with the pixel size. As the SEBAL algorithm is non-linear, the changes in the spatial resolution could impact the turbulent heat flux calculations, resulting in a spatial scale discrepancy [34]. A relatively larger error in the aggregation found in the H could be explained by the combined effect of the aerodynamic resistance on heat transfer and the aggregated roughness properties of the surface (z om ) in the SEBAL model ( Figure 2 and Equations (4)- (7)). However, the U and T s underwent relatively small changes, suggesting that the aerodynamic resistance could be the reason for the changes in the H when the scale was aggregated. Differences in the H at different pixel resolutions highlight the important role of z om parameterization in the flux estimation, which is directly related to the changes in NDVI.
In addition, the acquisition of aerodynamic resistance requires local calibration by the user to select cold and hot pixels in specific image scenes using multiple iterative steps, which exhibits a highly nonlinear relationship. A sensitivity analysis of the SEBAL model confirmed that the changes in the H are very sensitive to the selection of hot and cold pixels [21,72]. Compared with the low-resolution data, the high-resolution satellite data have fewer mixed pixels and the extremely hot and cold pixels are more readily identified. When the scale is aggregated, (1) it changes the locations of the hot and cold pixels in the T s images at the various spatial resolutions; (2) it introduces more land surface information, causing the value of the hot and cold pixels to decrease as the scale is enlarged, which could result in large deviations in the slope and intercept of the linear relationship between dT and T s (Equations (4)-(6) and Figure 11), resulting in a larger error in the H. The non-linear changes in the cold and hot pixel values also explain why the multiple high and low values were present in the ET trend during the scale aggregation. Ts images at the various spatial resolutions; (2) it introduces more land surface information, causing the value of the hot and cold pixels to decrease as the scale is enlarged, which could result in large deviations in the slope and intercept of the linear relationship between dT and Ts (Equations (4)-(6) and Figure 11), resulting in a larger error in the H. The non-linear changes in the cold and hot pixel values also explain why the multiple high and low values were present in the ET trend during the scale aggregation.
In general, the variation in the surface temperature for the selected extreme pixels (hot pixels and cold pixels) at the different spatial resolution are a major reason for the ET uncertainty in the ET simulation. In general, the variation in the surface temperature for the selected extreme pixels (hot pixels and cold pixels) at the different spatial resolution are a major reason for the ET uncertainty in the ET simulation.

Influence of the Land Use/Cover Pattern on the ET Estimates during the Aggregation Process
ET estimates using remote sensing data are largely affected by plant structure, canopy characteristics, and the proportion of bare soil present in a pixel [47]. A scale change could alter the composition of land cover in a pixel, which creates differences in the ET variance. This study indicates that the land use/cover pattern has obvious effects on the ET simulation using the different RS products. In addition, we also determined that a different pattern of land use/cover exhibited different aggregation results. As Figures 1d and 10 show in the homogeneous areas, the aggregation of the small grassland patches with higher ET values to large barren land patches with lower ET values decreased the maximum value of the ET, which in turn decreases the spatial average values of the ET. The aggregation process in the heterogeneous areas is dominated by the mixing of the high ET farmland and low ET grassland patches, increasing the maximum value of the ET during the aggregation, and in turn raising the overall spatial means of the ET. We also found that the land cover has a much greater impact on the H and LE than on the R n and G (Table 5). Unexpectedly, we found that regardless of the complexity of the land use/cover pattern, the entire land surface affected the ET aggregation results. This is because the aggregation increases the number of mixed pixels over the homogeneous and heterogeneous areas and affects the selection of the hot and cold pixels, resulting in the changes to the ET simulation. This illustrates that the effect of the model mechanisms is greater than the impact of the land use/cover pattern in the SEBAL model simulation.

Limitations and Outlook
In this study, we examined the effect of spatial aggregation on the ET simulations. However, the transit times of satellites, spectral resolutions and parameter estimation methods may also affect the analysis of the multi-sensor estimated ET values. Future studies should choose datasets with the same or closer transit times to evaluate the scale aggregation effects. In addition, there is only one in situ flux site available in the study area of the present study. If feasible, in situ data from more flux sites need to be used in the verification of simulation results in future studies.

Conclusions
In this study, we investigated the spatial resolution influences on the ET estimation using the SEBAL model. The main conclusions of this study are as follows: 1.
The spatial distribution patterns of the ET estimates using Landsat TM, MODIS, and AVHRR datasets were similar, with MODIS having the largest uncertainties. The validation of the simulated ET must consider matching spatial scales of the satellite datasets.

2.
With scale aggregation, the spatial average values of the R n decreased slightly, while the G increased, and the spatial variabilities of both R n and G all decreased. The spatial mean values of the H and LE were relatively larger, but the relative error was less than 20%. This was due to the changes in the selection of the locations of the hot and cold pixels in the model mechanisms as well as the NDVI and surface albedo input parameters in the SEBAL model.

3.
Surface heterogeneity affected the spatial scale aggregation results, and differences in the land use/cover pattern led to the different aggregation results. However, the scale errors caused by the SEBAL model mechanisms were larger than that contributed by the land use/cover pattern during the scale aggregation.