Estimating Daily Actual Evapotranspiration at a Landsat-Like Scale Utilizing Simulated and Remote Sensing Surface Temperature

: Actual evapotranspiration (ET) with high spatiotemporal resolution is very important for the research on agricultural water resource management and the water cycle processes, and it is helpful to realize precision agriculture and smart agriculture, and provides critical references for agricultural layout planning. Due to the impact of the clouds, weather environment, and the orbital period of optical satellite, there are difﬁculties in providing daily remote sensing data that are not contaminated by clouds for estimating daily ET with high spatial-temporal resolution. By improving the enhanced spatial and temporal adaptive reﬂectance fusion model (ESTARFM), this manuscript proposes the method to fuse high temporal and low spatial resolution Weather Research and Forecasting (WRF) model surface skin temperature (TSK) with the low temporal and high spatial resolution remote sensing surface temperature for obtaining high spatiotemporal resolution daily surface temperature to be used in the estimation of the high spatial resolution daily ET (ET_WRF HR ). The distinction of this study from the previous literatures can be summarized as the novel application of the fusion of WRF-simulated TSK and remote sensing surface temperature, giving full play to the availability of model surface skin temperature data at any time and region, making up for the shortcomings of the remote sensing data, and combining the high spatial resolution of remote sensing data to obtain ET with high spatial (Landsat-like scale) and temporal (daily) resolution. The ET_WRF HR were cross-validated and quantitatively veriﬁed with MODIS ET products (MOD16) and observations (ET_Obs) from eddy covariance system. Results showed that ET_WRF HR not only better reﬂects the difference and dynamic evolution process of ET for different land types but also better identiﬁes the details of various ﬁne geographical objects. It also represented a high correlation with the ET_Obs by the R 2 amount reaching 0.9186. Besides, the RMSE and BIAS between ET_WRF HR and the ET_Obs are obtained as 0.77 mm/d and − 0.08 mm/d respectively. High R 2 , as well as the small RMSE and BIAS amounts, indicate that ET_WRF HR has achieved a very good performance.


Introduction
Estimation of actual evapotranspiration (ET) with high spatiotemporal resolution not only conduces to the macro-control of water resources and the research of water cycle process, but also helps to realize precision agriculture and smart agriculture, and provides important references for urban and agricultural layout planning [1][2][3]. In the long-term 2 of 19 and stable monitoring environment of meteorological stations, as long as the instruments do not fail, the long-term series of ET based on a single site is usually easy to achieve. Very high temporal resolution and relatively accurate results can be obtained by the advantage of traditional methods to calculate ET [4][5][6][7][8]. Remote sensing data makes the estimation of regional ET more efficient and convenient compared with the site data [9][10][11]. Surface energy balance system (SEBS) model based on remote sensing technology has been widely used in recent years to estimate ET at different time and space scales [12][13][14][15][16]. By ignoring the water demand for the organisms and the effect of near-surface advection, the latent heat flux is estimated as the residual of the energy balance equation [17,18]. The calculation of sensible heat flux based on the difference between surface temperature and air temperature is the core of SEBS model [19][20][21]. Using satellite-borne thermal imaging sensors to record the thermal infrared information of ground features for retrieval of surface temperature has become an effective method for dynamically monitoring spatial distribution of surface temperature in a large area [22][23][24][25], making up for the inadequacy of traditional monitoring methods in obtaining regional surface temperature [26]. However, due to their vulnerability to clouds, weather environment (such as cloudy, haze, or sandstorm) and revisit cycles, most optical remote sensing satellites cannot provide continuous daily high-quality monitoring data in any area, which makes it a difficulty to obtain continuous daily high-resolution ET using remote sensing data from only a single source [27].
Effective fusion algorithms [28][29][30][31][32] or temporal upscaling and reconstruction methods [33][34][35][36][37][38][39] have been used to construct continuous time-series of surface temperature based on remote sensing data for the calculation of ET. Some high temporal resolution ET products are released for such purpose, such as MODIS ET product (MOD16A2) with a temporal resolution of 8 days and a spatial resolution of 500 m [37]; ETWatch products, published by Heihe Plan Data Management Center (HPDMC), with monthly temporal resolution and 1 km spatial resolution [38]; ETMonitor products, published by HPDMC, with a temporal resolution of 8 days and a spatial resolution of 250 m [40]; Daily ET Datasets (2012), published by HPDMC, with daily temporal resolution and 1 km spatial resolution. Although these achieved good results, the spatial resolution of these products does not exceed 100 m, and not many temporal resolutions can reach continuous daily intervals [31,35,41,42].
An important difficulty in using SEBS model to estimate daily ET with high spatialtemporal resolution is to provide daily surface temperature with high spatial-temporal resolution. With this original intention, we tried to develop a scheme to achieve continuous daily high spatiotemporal resolution ET through the fusion of surface skin temperature (TSK) simulated by Weather Research and Forecasting (WRF) model and remote sensing land surface temperature (LST) retrieved from Landsat thermal infrared band. Wang et al. [43] have offered an NDVI-based ET correction method which forms a complete scheme for estimating reliable daily ET based on SEBS using daily TSK. On this basis, this paper will further explore a feasible scheme to estimate higher spatial-temporal daily actual ET. The ESTARFM data fusion algorithm is improved by adding a normalization module that is adapted to obtain daily surface temperature data with high spatiotemporal resolution. For such purpose, TSK with high time resolution and low spatial resolution from WRF and LST with low time resolution and high spatial resolution from Landsat satellite images were merged, giving full play to the advantages of the high temporal resolution of the model surface skin temperature and the high spatial resolution of Landsat surface temperature to realize the reliable estimation of continuous high spatial (Landsatlike scale) and temporal (daily) resolution ET during the growth season (May 1-September 29) over Zhangye Oasis in 2015. It should be noted that this research is regarded as an in-depth extension of Wang et al. [43], therefore, the study area and WRF model experiment design are the same as in Wang et al. [43], from which the difference is that this research will provide a scheme to obtain higher spatial-temporal resolution daily ET.

Description of the Study Area
The Heihe River Basin (HRB) is located in the middle of the Hexi Corridor in the arid and semi-arid regions of northwest China. It is about 390 km wide from east to west and 510 km long from north to south. It covers an area of approximately 143,000 km 2 , which is the second-largest inland river basin in China (Figure 1, left) [44]. With a total length of 928 km, the mainstream receives snowmelt supply from Qilian Mountain in the south and moistens the vast area of Ejina Banner in the north. Zhangye Oasis area (46 km × 46 km), shown in the right panel of Figure 1, is the study area, which is located in the HRB midstream where the desert-oasis landscape exists. The predominant crops in this region are mainly wheat and maize [31]. The annual average temperature, rainfall, and evaporation potentials for the study area are 7.5 • C, 136.8 mm, and 1840.1 mm respectively [45].
13, x FOR PEER REVIEW 3 of 19

Description of the Study Area
The Heihe River Basin (HRB) is located in the middle of the Hexi Corridor in the arid and semi-arid regions of northwest China. It is about 390 km wide from east to west and 510 km long from north to south. It covers an area of approximately 143,000 km 2 , which is the second-largest inland river basin in China (Figure 1, left) [44]. With a total length of 928 km, the mainstream receives snowmelt supply from Qilian Mountain in the south and moistens the vast area of Ejina Banner in the north. Zhangye Oasis area (46 km × 46 km), shown in the right panel of Figure 1, is the study area, which is located in the HRB midstream where the desert-oasis landscape exists. The predominant crops in this region are mainly wheat and maize [31]. The annual average temperature, rainfall, and evaporation potentials for the study area are 7.5 °C, 136.8 mm, and 1840.1 mm respectively [45].

Ground Measurements
Meteorological variables including wind speed, air temperature, relative humidity, air pressure, and solar net radiation are required for the estimation of daily ET, and 2 cm soil temperature (Ts) is used for surface temperature validation. The observations are collected from four automatic weather stations (AWS) (Figure 1), which were established in the midstream of HRB during the Heihe Watershed Allied Telemetry Experimental Research (HiWATER) program conducted in 2015 [46]. The underlying surface and the geographic information of the sites are summarized in Table 1.
Each AWS is equipped with sensors to collect meteorological data with 10-min intervals, among which, Daman and Huazhaizi Desert stations are additionally equipped with an eddy covariance system (EC) to collect data every 30 min. Meteorological data covering 1 May-29 September 2015, as the study period, were downloaded from Heihe Plan Data Management Center (HPDMC, http://www.heihedata.org/). Due to instrument failure, meteorological data at Huazhaizi Desert station was only available from 11 June to 29 September 2015. The meteorological data closest to the revisit time of MODIS satellite (10:00-12:00) is considered for interpolation using Inverse Distance Weighting (IDW) method by taking the elevation into consideration.

Ground Measurements
Meteorological variables including wind speed, air temperature, relative humidity, air pressure, and solar net radiation are required for the estimation of daily ET, and 2 cm soil temperature (T s ) is used for surface temperature validation. The observations are collected from four automatic weather stations (AWS) (Figure 1), which were established in the midstream of HRB during the Heihe Watershed Allied Telemetry Experimental Research (HiWATER) program conducted in 2015 [46]. The underlying surface and the geographic information of the sites are summarized in Table 1.  29 September 2015. The meteorological data closest to the revisit time of MODIS satellite (10:00-12:00) is considered for interpolation using Inverse Distance Weighting (IDW) method by taking the elevation into consideration.

Satellite Data
A total of 16 Landsat images (including Landsat7 and Landsat8) with no or a small portion of clouds during the growing season of Zhangye Oasis were screened out to retrieve high spatial resolution NDVI and LST data. The Neighborhood similar pixel interpolator (NSPI) algorithm [47] was used to fill the Landsat7 ETM+ image strips which were caused by the failure of scan line corrector (SLC) onboard Landsat7 satellite on 31 May 2003 [48], and then the modified NSPI algorithm [49] was adopted to remove the clouds in the Landsat7 ETM+ image and Landsat8 OLI image. The radiation transmission method [50] was used to retrieve LST, and then the retrieved LST was downscaled with the cubic convolution method to obtain the 30 m LST. The 30 m LST would then be used as one of the preparation data for reconstruction of continuous daily surface temperature by fusing with WRF TSK extracted at the same or the closest time of the Landsat satellite revisit. The MOD09A1 reflectance product with a time resolution of 8 days during the growing season is used to retrieve time-series of NDVI with a spatial resolution of 500 m. The 30 m and 500 m NDVI are used to obtain NDVI with an interval of 8 days and spatial resolution of 30 m based on the Enhanced Spatial and Temporal Adaptive Reflectance Fusion Model (ESTARFM), and 30 m Digital Elevation Model (DEM) data from ASTER GDEM V2 is used to provide terrain information. The satellite data specifications used in the study are listed in Table 2. The hourly TSK simulated by the WRF model during the growing season of Zhangye oasis was adopted for the study. The resolution of the land surface information elements of the WRF model is too coarse, data generation time is too old to accurately describe the current surface characteristics, and data accuracy are not high in China. For this reason, the land surface information elements of WRF model including soil type, soil hydraulic properties table, land cover type, terrain height, leaf area index, surface albedo, and annual maximum snow albedo are updated and improved as the first step before performing the WRF simulation experiments. The TSK closest to the revisit time of the Landsat satellite is extracted for the fusion process with the Landsat surface temperature.

WRF Model Experiments
WRFv3.9.1 is utilized for the three nested domains namely D01, D02, and D03 with horizontal grid spacing of 25 km, 5 km, and 1 km respectively ( Figure 2). The outermost mesh for D01 is slightly larger than the HRB with center longitude and latitude of 100 • E and 40 • N respectively, while D02 is smaller than the HRB. D03 represents the Zhangye oasis area, the study area of this research. The simulation period is selected as 1 May-30 September 2015. The model output results are set hourly while the spin-up time for the first 48 h to stabilize the model. Table 3 summarizes the WRF configuration and physical parameterization for the study area.

Modifications to ESTARFM Fusion Algorithm
ESTARFM is an algorithm for the reconstruction of high spatial and temporal resolution data under various surface conditions including uniform and non-uniform surface [51], which is considered for the fusion of high temporal and low spatial resolution MODIS reflectance with low temporal and high spatial resolution Landsat reflectance to benefit from the advantage of high temporal resolution MODIS and high spatial resolution Landsat data [51][52][53].
This manuscript improved ESTARFM algorithm by adding a normalization module to obtain the daily surface temperature with high spatiotemporal resolution; water bodies and wetland are excluded when performing normalization. One of the key processes is to fuse TSK with high temporal and low spatial resolution with Landsat surface temperature with low temporal and high spatial resolution, and the fusion results are then normalized based on TSK to obtain surface temperature with high spatiotemporal resolution. TSK and

Modifications to ESTARFM Fusion Algorithm
ESTARFM is an algorithm for the reconstruction of high spatial and temporal resolution data under various surface conditions including uniform and non-uniform surface [51], which is considered for the fusion of high temporal and low spatial resolution MODIS reflectance with low temporal and high spatial resolution Landsat reflectance to benefit from the advantage of high temporal resolution MODIS and high spatial resolution Landsat data [51][52][53].
This manuscript improved ESTARFM algorithm by adding a normalization module to obtain the daily surface temperature with high spatiotemporal resolution; water bodies and wetland are excluded when performing normalization. One of the key processes is to fuse TSK with high temporal and low spatial resolution with Landsat surface temperature with low temporal and high spatial resolution, and the fusion results are then normalized based on TSK to obtain surface temperature with high spatiotemporal resolution. TSK and Landsat surface temperatures should be resampled to the same resolution and coordinate system before running ESTARFM. The assumptions are considered to be satisfied as the surface temperature change rate a can be thought as stable and the surface temperature linearly changes from t m to t n during a short time period [51].
WRF TSK needs to be pre-processed before the execution of the algorithm to ensure that the prepared data for model operation have the same dimensions as Landsat LST. The lines and the samples keep the same and the bilinear interpolation method is selected for resampling to reduce the impact of geographic reference errors. The process of acquiring surface temperature with the high spatiotemporal resolution is provided in Figure 3. The first step is to screen out the similar pixels as the one in the center of the window based on the provided Landsat surface temperature data at date t m and t n . The second step is to input the TSK of t m and t n which has been resampled to the same spatial resolution and coordinate system as Landsat surface temperature to calculate the similar pixel weight. In the third step, the conversion coefficient between Landsat surface temperature and WRF TSK is solved by the weighted least square method. The fourth step is to provide the WRF TSK at date t p to output the fusion result with the same spatial resolution as t p Landsat surface temperature. Finally, based on re-sampled WRF TSK, the Equation (1) is used to normalize the fusion results obtained in the fourth step according to different types of underlying surfaces to retrieve high-resolution surface temperature data.
where F j is the surface temperature of the No.j land cover type after normalization; F (DOY,j) is the surface temperature of the No.j land cover type that came from the fusion of TSK and the surface temperature retrieved by remote sensing; is the No.i pixel value in the No.j land cover type after the TSK is resampled to the same resolution as the remote sensing surface temperature; N is the total number of pixels of the No.j land cover type.
that the prepared data for model operation have the same dimensions as Land The lines and the samples keep the same and the bilinear interpolation method is for resampling to reduce the impact of geographic reference errors. The process o ing surface temperature with the high spatiotemporal resolution is provided in The first step is to screen out the similar pixels as the one in the center of the based on the provided Landsat surface temperature data at date and . Th step is to input the TSK of and which has been resampled to the same sp olution and coordinate system as Landsat surface temperature to calculate the pixel weight. In the third step, the conversion coefficient between Landsat surf perature and WRF TSK is solved by the weighted least square method. The fourt to provide the WRF TSK at date to output the fusion result with the same spa lution as Landsat surface temperature. Finally, based on re-sampled WRF Equation (1) is used to normalize the fusion results obtained in the fourth step a to different types of underlying surfaces to retrieve high-resolution surface tem data.
where is the surface temperature of the No. land cover type after no tion; ( , ) is the surface temperature of the No. land cover type that came from sion of TSK and the surface temperature retrieved by remote sensing; ( , ) is the N value of ( , ) ; ( , ) is the No. pixel value in the No. land cover type after th resampled to the same resolution as the remote sensing surface temperature; N is number of pixels of the No. land cover type.
The relevant parameters are set accordingly while the model is executed stance, the half-width of the window is set to 25 so the actual size of the window + 1 = 51 pixels according to the land cover data provided by HPDMC. The total ca number of terminal pixels is set to 7 to improve the calculation speed. Lastly, t mum number of similar pixels to be screened out is set to 20. The image is crop blocks with the row and column size of 500 × 500 and then the calculation is perf  pixels according to the land cover data provided by HPDMC. The total categories number of terminal pixels is set to 7 to improve the calculation speed. Lastly, the maximum number of similar pixels to be screened out is set to 20. The image is cropped into blocks with the row and column size of 500 × 500 and then the calculation is performed.

ET Estimation Scheme for High Spatiotemporal Surface Temperature
Remote sensing surface temperature is an important input of the SEBS model for estimating ET. It has been declared in Wang et al. [43] that ET obtained by using WRF TSK to directly replace remote sensing surface temperature input SEBS model is generally high in the whole area. Furthermore, the NDVI correction method is proposed to estimate reliable ET, which is also adopted in this research. To achieve daily ET with high spatialtemporal resolution, ET is estimated as the first step based on the SEBS model by using the surface temperature with high spatiotemporal resolution obtained by the fusion algorithm. Then, multiply the ET obtained in the first step by NDVI, i.e., Equation (7) in Wang et al. [43], while the correction target does not include the water area.
The key of SEBS model is to use the difference between the surface and air temperature to calculate the sensible heat flux through iteration (see Equation (2)), and then calculate the daily actual ET (see Equation (4)) according to the energy balance residual (see Equation (3)). The details about the algorithm process of the SEBS model in the estimation of the ET are referred to Su [17].
where R n is the net radiation; G 0 is the soil heat flux; H is the turbulent sensible heat flux; λE daily is the turbulent latent heat flux; λ is the latent heat of the vaporization and E daily is the daily actual ET; ρ is air density; C p is air specific heat; r ah is the aerodynamic resistance to heat transport; T s is the surface temperature; T a is the air temperature near the surface at a reference height of 2 m; Λ 24 0 is the daily evaporative fraction; ρ w is the density of water.

Evaluation Method
The statistical indicators including the Root Mean Square Error (RMSE), the bias (BIAS), and the coefficient of determination (R 2 ) are adopted to evaluate the accuracy of the simulation including WRF TSK and estimated ET as the calculation formulas as follows [54]: where S i is No.i day simulation and O i is No.i day measurement. S and O represents the mean of simulations and measurements respectively while n is the number of days.

Evaluation of High Spatiotemporal Resolution Surface Temperature
Both Sohrabinia et al. [55] and Wang et al. [43] used 2 cm soil temperature (T s ) to verify WRF TSK. T s was also used to evaluate the surface temperature with high spatial and temporal resolution obtained by fusion (TSK f ) in this research. Verification results in Figure  4 show that TSK f have good correlation with T s at each site. R 2 is obtained to be 0.6467, 0.7102 and 0.7613 respectively. RMSE and BIAS between TSK f and T s at Daman are the smallest, being 2.26 • C and −0.34 • C, respectively. Dynamic evolution trend and correlation ( Figure 5) between WRF TSK (before fusion) and T s was compared to infer the possible reason for high RMSE found at Heihe Remote Sensing Station and Huazhaizi. Results show that the BIAS of WRF TSK at Heihe Remote Sensing Station and Huazhaizi Station are both negative, and have high RMSE, indicating that WRF TSK is generally lower than T s , which may lead to high RMSE of the TSK f . The surface skin temperature (TSK) is simulated by Noah Land Surface Model (LSM) in WRF model, in which soil and vegetation are regarded as a whole [56], therefore, the simulation of TSK in vegetation and non-vegetation areas may show different accuracy. Accurate land heterogeneity will conduct reliable and accurate numerical modeling [57,58], improving the accuracy, timeliness and spatial resolution of parameters (such as soil texture, soil water content, and soil hydrological parameter tables) related to TSK simulation in the Noah LSM and is one of the important ways to enhance the accuracy of WRF TSK. Moreover, mathematical fitting methods or bias correction can be used to calibrate the simulated data (including WRF TSK and Landsat LST) is an efficient means to improve the accuracy of the TSK f . The BIAS between TSK f and T s at the three sites are all less than zero, which also indicates that TSK f is generally lower than T s at the three sites. Comparison with the observations shows that TSK f have achieved a good fusion effect, greatly improved the spatial resolution of the WRF TSK, and will be used for the estimation of daily ET with high spatiotemporal resolution.  [56], therefore, the simulation of TSK in vegetation and non-vegetation areas may show different accuracy. Accurate land heterogeneity will conduct reliable and accurate numerical modeling [57,58], improving the accuracy, timeliness and spatial resolution of parameters (such as soil texture, soil water content, and soil hydrological parameter tables) related to TSK simulation in the Noah LSM and is one of the important ways to enhance the accuracy of WRF TSK. Moreover, mathematical fitting methods or bias correction can be used to calibrate the simulated data (including WRF TSK and Landsat LST) is an efficient means to improve the accuracy of the TSKf. The BIAS between TSKf and Ts at the three sites are all less than zero, which also indicates that TSKf is generally lower than Ts at the three sites. Comparison with the observations shows that TSKf have achieved a good fusion effect, greatly improved the spatial resolution of the WRF TSK, and will be used for the estimation of daily ET with high spatiotemporal resolution.    Figure 6 gives out spatial variation of high spatiotemporal ET throughout the whole growing season, and the water bodies maintain a higher ET. The bare land exhibits lower ET amounts and the farmlands present significant changes over time. Since the period was still in the stage of crop planting in early May, the farmland as a whole represented a lower ET and then increased relatively by the end of May. As the growth of crops and the gradual increase of temperature, ET of farmland increased significantly in June and formed a strong contrast with the lower ET for the bare land regions. ET is generally higher in July and August when the vegetation grows vigorously. It reaches the maximum at the end of July and early August and then begins to decline as the crops gradually mature to the end of August. ET began to decrease significantly in September with the gradual harvesting of crops, the cessation of irrigation, and the drop in temperature. The end of September was the period holding the lowest ET during the entire growing season when ET of farmland and bare land in the whole oasis area basically had no significant difference in spatial variation.  Figure 6 gives out spatial variation of high spatiotemporal ET throughout the whole growing season, and the water bodies maintain a higher ET. The bare land exhibits lower ET amounts and the farmlands present significant changes over time. Since the period was still in the stage of crop planting in early May, the farmland as a whole represented a lower ET and then increased relatively by the end of May. As the growth of crops and the gradual increase of temperature, ET of farmland increased significantly in June and formed a strong contrast with the lower ET for the bare land regions. ET is generally higher in July and August when the vegetation grows vigorously. It reaches the maximum at the end of July and early August and then begins to decline as the crops gradually mature to the end of August. ET began to decrease significantly in September with the gradual harvesting of crops, the cessation of irrigation, and the drop in temperature. The end of September was the period holding the lowest ET during the entire growing season when ET of farmland and bare land in the whole oasis area basically had no significant difference in spatial variation.   Figure 7 displays the dynamic evolution of the averaged daily ET (ET avg ) variation of various land cover types for the growing season. Overall, the ET avg variation of various land cover types showed a clear stratification phenomenon, in which the ET avg of water bodies has always remained the highest throughout the growing season while the dynamic change is not significant. The ET avg of the deciduous broadleaf forest is close to the evergreen needle leaf forest and both of them are slightly higher than that of croplands. As a vegetation type, the ET avg of the deciduous broadleaf forest, evergreen needle leaf forest, croplands, and grasslands changed significantly over time, presenting a very similar consistent trend. The ET avg of urban and built-up is lower than that of grasslands but it is higher than that of barren or sparsely vegetated, the lowest ET avg of which is commonly observed in the growing season. During the whole growing season, ET avg of water bodies keep high level all the time. They begin to decline at the end of the growing season probably due to the decrease of air temperature. The rest of the land types showed a similar trend throughout the growing season. ET avg was relatively low at the beginning of the growing season in early May. It started to accelerate by the end of May and continued till mid-June and then the growth rate decreased. By the vigorous vegetation growth in July, ET avg gradually reached to the maximum and began to decline at the end of August. The ET avg in various land types decreased significantly and eventually dropped to the lowest level in September. The highest ET avg of various land types appeared in DOY200-220 (19 July-8 August).  Figure 7 displays the dynamic evolution of the averaged daily ET (ETavg) variation of various land cover types for the growing season. Overall, the ETavg variation of various land cover types showed a clear stratification phenomenon, in which the ETavg of water bodies has always remained the highest throughout the growing season while the dynamic change is not significant. The ETavg of the deciduous broadleaf forest is close to the evergreen needle leaf forest and both of them are slightly higher than that of croplands. As a vegetation type, the ETavg of the deciduous broadleaf forest, evergreen needle leaf forest, croplands, and grasslands changed significantly over time, presenting a very similar consistent trend. The ETavg of urban and built-up is lower than that of grasslands but it is higher than that of barren or sparsely vegetated, the lowest ETavg of which is commonly observed in the growing season. During the whole growing season, ETavg of water bodies keep high level all the time. They begin to decline at the end of the growing season probably due to the decrease of air temperature. The rest of the land types showed a similar trend throughout the growing season. ETavg was relatively low at the beginning of the growing season in early May. It started to accelerate by the end of May and continued till mid-June and then the growth rate decreased. By the vigorous vegetation growth in July, ETavg gradually reached to the maximum and began to decline at the end of August. The ETavg in various land types decreased significantly and eventually dropped to the lowest level in September. The highest ETavg of various land types appeared in DOY200-220 (19 July-8 August).

Discussion
Evaluation of high spatiotemporal resolution ET (i.e., ET_WRFHR) variation is mainly carried out from qualitative and quantitative aspects. The qualitative evaluation includes a comparison with available ET products and the reflection of ET_WRFHR on the refinement of ground features and the dynamic evolution process. However, no ET products with such a high spatial and temporal resolution covering the study area during the growing season have been released internationally, for which the MODIS ET product

Discussion
Evaluation of high spatiotemporal resolution ET (i.e., ET_WRF HR ) variation is mainly carried out from qualitative and quantitative aspects. The qualitative evaluation includes a comparison with available ET products and the reflection of ET_WRF HR on the refinement of ground features and the dynamic evolution process. However, no ET products with such a high spatial and temporal resolution covering the study area during the growing season have been released internationally, for which the MODIS ET product (MOD16A2, i.e., ET_MOD16) was adopted to compare with ET_WRF HR from the perspective of spatial distribution and details. Finally, the ET_WRF HR was quantitatively verified based on ET observations.

Comparison with MOD16 in Identifying Fine Features
ET_MOD16 product from 28 July to 4 August of 2015 is selected as the vegetation growth is vigorous and ET is strong in this period. The ET_WRF HR is accumulated for the same period to get 8 days of ET (denoted as ET_WRF HR _8d) to maintain the same time scale with ET_MOD16. Five target areas (Figure 8) are designated for the comparison of refined manifestations. The ET_MOD16 and ET_WRF HR _8d present similar spatial patterns for the area considered. ET_MOD16 provides no ET values in the non-vegetation area while ET_WRF HR _8d reflects ET of various ground features with a higher spatial resolution.

Comparison with MOD16 in Identifying Fine Features
ET_MOD16 product from 28 July to 4 August of 2015 is selected as the vegetation growth is vigorous and ET is strong in this period. The ET_WRFHR is accumulated for the same period to get 8 days of ET (denoted as ET_WRFHR_8d) to maintain the same time scale with ET_MOD16. Five target areas (Figure 8) are designated for the comparison of refined manifestations. The ET_MOD16 and ET_WRFHR_8d present similar spatial patterns for the area considered. ET_MOD16 provides no ET values in the non-vegetation area while ET_WRFHR_8d reflects ET of various ground features with a higher spatial resolution.
The finer scale comparison of the five targets is provided in Figure 9. Whether it is circular irrigation area (Target1 and Target2), lake (Target3 and Target4) or fine features between lakes like the non-aqueous zone in the center of the lake (Target3), narrow zone between the lakes (Target4), or farmland planting structure, ET_MOD16 product cannot effectively identify them, while ET_WRFHR_8d product can better describe these fine features. The cross comparison clearly shows that ET_WRFHR_8d behave with a distinct advantage over ET_MOD16 in identifying the heterogeneity of the underlying surface. The finer scale comparison of the five targets is provided in Figure 9. Whether it is circular irrigation area (Target1 and Target2), lake (Target3 and Target4) or fine features between lakes like the non-aqueous zone in the center of the lake (Target3), narrow zone between the lakes (Target4), or farmland planting structure, ET_MOD16 product cannot effectively identify them, while ET_WRF HR _8d product can better describe these fine features. The cross comparison clearly shows that ET_WRF HR _8d behave with a distinct advantage over ET_MOD16 in identifying the heterogeneity of the underlying surface.

The Refined Embodiment of High Spatiotemporal ET
The refinement embodiment of ET_WRF HR on a daily scale is elaborated from the aspects of spatial distribution and dynamic evolution in this section. Figure 10 clearly shows that ET_WRF HR can better reflect the contour of circular irrigation area in Target2. It can also identify the non-vegetation area and different irrigation areas as well as the fine boundaries between the vegetation area and the non-vegetation area within the irrigation area (see the red line in Figure 10). Target2 (August 4) ET_MOD16 ET_WRFHR_8d

The Refined Embodiment of High Spatiotemporal ET
The refinement embodiment of ET_WRFHR on a daily scale is elaborated from the aspects of spatial distribution and dynamic evolution in this section. (shown by the yellow line in Figure 11), and both Figures 10 and 11 indicate that ET_WRFHR also represents a good reflection of the roads between the irrigation area and farmland, which generally reflects the refined presentation of ET_WRFHR on the spatial distribution difference of ET.

ET Dynamic Evolution of Target Geography Objects
The ET_WRFHR for the Target1 in 25 July and 4 August 2015 are provided in Figure  12. It shows that the ring-shaped features in the yellow circle on July 25 present a certain amount of ET and are successfully recognized by ET_WRFHR. Dynamic changes of the farmlands in the 25 July and 4 August images indicate that the vegetation coverage for some of the circular irrigation farmlands decreased significantly. Contrary, vegetation coverage in the red circle did not change and the ET_WRFHR product reflects this phenomenon very well. There are red stripes in the ET_WRFHR product on 4 August. These are caused by the incomplete stripe removal of the Landsat7 ETM+ satellite image and they do not affect the analysis of the refinement of ET_WRFHR.  Figure 11 shows that ET_WRF HR presents a good performance in the identification of interphase distributed farmland (red box) and the farmland with dense planting structure (shown by the yellow line in Figure 11), and both Figures 10 and 11 indicate that ET_WRF HR also represents a good reflection of the roads between the irrigation area and farmland, which generally reflects the refined presentation of ET_WRF HR on the spatial distribution difference of ET.
R PEER REVIEW 14 of 19 (shown by the yellow line in Figure 11), and both Figures 10 and 11 indicate that ET_WRFHR also represents a good reflection of the roads between the irrigation area and farmland, which generally reflects the refined presentation of ET_WRFHR on the spatial distribution difference of ET.

ET Dynamic Evolution of Target Geography Objects
The ET_WRFHR for the Target1 in 25 July and 4 August 2015 are provided in Figure  12. It shows that the ring-shaped features in the yellow circle on July 25 present a certain amount of ET and are successfully recognized by ET_WRFHR. Dynamic changes of the farmlands in the 25 July and 4 August images indicate that the vegetation coverage for some of the circular irrigation farmlands decreased significantly. Contrary, vegetation coverage in the red circle did not change and the ET_WRFHR product reflects this phenomenon very well. There are red stripes in the ET_WRFHR product on 4 August. These are caused by the incomplete stripe removal of the Landsat7 ETM+ satellite image and they do not affect the analysis of the refinement of ET_WRFHR.

ET Dynamic Evolution of Target Geography Objects
The ET_WRF HR for the Target1 in 25 July and 4 August 2015 are provided in Figure 12. It shows that the ring-shaped features in the yellow circle on July 25 present a certain amount of ET and are successfully recognized by ET_WRF HR . Dynamic changes of the farmlands in the 25 July and 4 August images indicate that the vegetation coverage for some of the circular irrigation farmlands decreased significantly. Contrary, vegetation coverage in the red circle did not change and the ET_WRF HR product reflects this phenomenon very well. There are red stripes in the ET_WRF HR product on 4 August. These are caused by the incomplete stripe removal of the Landsat7 ETM+ satellite image and they do not affect the analysis of the refinement of ET_WRF HR .

Quantitative Comparison with ET_MOD16
ET_MOD16 and ET_WRFHR_8d were compared and verified on an 8-day scale and the same spatial scale ( Figure 13). Results demonstrate that ET_WRFHR_8d has a very high correlation with the observations (ET_Obs_8d), the coefficient of determination (R 2 ) reaches 0.9426, and the RMSE and BIAS are relatively low-respectively 5.58 mm/8 d and −0.63 mm/8 d-showing very good performance. In contrast, the correlation between ET_MOD16 and ET_Obs_8d is not so good as the former, whose R 2 is 0.6765, and shows relatively high RMSE and BIAS, which are 11.37 mm/8 d and −6.57 mm/8 d, respectively. Higher BIAS indicates ET_MOD16 underestimates the ET overall, while the lower BIAS indicates that ET_WRFHR_8d comes closer to the observations than ET_MOD16.

Quantitative Verification with Observations
ET_WRFHR is compared with the ET observations (ET_Obs) in Figure 14. ET_WRFHR and ET_Obs keep a consistent dynamic evolution trend throughout the growing season ( Figure 14a) in each station. ET_WRFHR provided relatively lower ET amounts compared to ET_Obs in Huazhaizi Desert Station, while the difference between the two is not significant. Figure 14b shows the scatter diagram of the ET_WRFHR and ET_Obs for both sites. R 2 reached 0.9186, while relatively small RMSE and BIAS amounts are obtained as 0.77 mm and −0.08 mm respectively. The higher R 2 , as well as the smaller RMSE and BIAS, indicate that ET_WRFHR provides pretty good ET estimation. However, some underestimations were found in high values, while some overestimations were observed in low values. According to the results in Figure 4 and the analysis of the principle of SEBS, the possible reason for this phenomenon is that the surface temperature obtained by fusion is underestimated in low vegetation coverage areas (such as Huazhaizi Station), which makes the difference between the surface and air temperature smaller, leading to lower

Quantitative Comparison with ET_MOD16
ET_MOD16 and ET_WRF HR _8d were compared and verified on an 8-day scale and the same spatial scale ( Figure 13). Results demonstrate that ET_WRF HR _8d has a very high correlation with the observations (ET_Obs_8d), the coefficient of determination (R 2 ) reaches 0.9426, and the RMSE and BIAS are relatively low-respectively 5.58 mm/8 d and −0.63 mm/8 d-showing very good performance. In contrast, the correlation between ET_MOD16 and ET_Obs_8d is not so good as the former, whose R 2 is 0.6765, and shows relatively high RMSE and BIAS, which are 11.37 mm/8 d and −6.57 mm/8 d, respectively. Higher BIAS indicates ET_MOD16 underestimates the ET overall, while the lower BIAS indicates that ET_WRFHR_8d comes closer to the observations than ET_MOD16.

Quantitative Verification with Observations
ET_WRF HR is compared with the ET observations (ET_Obs) in Figure 14. ET_WRF HR and ET_Obs keep a consistent dynamic evolution trend throughout the growing season ( Figure 14a) in each station. ET_WRF HR provided relatively lower ET amounts compared to ET_Obs in Huazhaizi Desert Station, while the difference between the two is not significant. Figure 14b shows the scatter diagram of the ET_WRF HR and ET_Obs for both sites. R 2 reached 0.9186, while relatively small RMSE and BIAS amounts are obtained as 0.77 mm and −0.08 mm respectively. The higher R 2 , as well as the smaller RMSE and BIAS, indicate that ET_WRF HR provides pretty good ET estimation. However, some underestimations were found in high values, while some overestimations were observed in low values. According to the results in Figure 4 and the analysis of the principle of SEBS, the possible reason for this phenomenon is that the surface temperature obtained by fusion is underestimated in low vegetation coverage areas (such as Huazhaizi Station), which makes the difference between the surface and air temperature smaller, leading to lower sensible heat flux, so that the latent heat flux reaches too high; while in dense vegetation coverage areas (such as Daman Station), the situation may be just the opposite.

Conclusions
It is often difficult to provide continuous daily remote sensing surface temperature due to the limitation of the environment, weather, and the orbital period of the optical remote sensing satellite when estimating continuous daily ET with high spatiotemporal resolution. Improvement of the ESTARFM data fusion algorithm is proposed in this research by adding a normalization module to achieve high spatiotemporal resolution surface temperature for estimating the continuous daily high spatial resolution ET. The most important difference of this approach between previous studies on estimating high spatiotemporal resolution ET is the novel application of the WRF model surface skin temperature, which gives full play to the availability of model surface temperature data at any

Conclusions
It is often difficult to provide continuous daily remote sensing surface temperature due to the limitation of the environment, weather, and the orbital period of the optical remote sensing satellite when estimating continuous daily ET with high spatiotemporal resolution. Improvement of the ESTARFM data fusion algorithm is proposed in this research by adding a normalization module to achieve high spatiotemporal resolution surface temperature for estimating the continuous daily high spatial resolution ET. The most important difference of this approach between previous studies on estimating high spatiotemporal resolution ET is the novel application of the WRF model surface skin temperature, which gives full play to the availability of model surface temperature data at any time and region, makes up for the shortcomings of the remote sensing data, and combines with the advantages of the high spatial resolution of remote sensing surface temperature.
Comparisons showed that the ET_WRF HR is better than the MODIS ET products in terms of temporal and spatial resolution. ET_WRF HR better reflects ET details for different land types and fine geographical objects, accurately captures the dynamic evolution of ET of various geographical objects and indicates a greater advantage in identifying the heterogeneity of the underlying surface.
Quantitative evaluation results showed that the ET_WRF HR and the ET observations not only kept a consistent dynamic evolution trend throughout the growing season but also hold very good correlation. High R 2 , as well as the small RMSE and BIAS, indicate that the ET_WRF HR achieved good performance. The study area for this research is located in arid and semi-arid areas, soil water stress is relatively strong and the contribution of vegetation transpiration to ET is relatively large. The scheme of NDVI correction achieved good results under this climatic condition. Findings in this study are expected to guide future work to test and verify the applicability of the same scheme in larger research areas and other climatic conditions such as the humid climatic region.