Effects of Distinguishing Vegetation Types on the Estimates of Remotely Sensed Evapotranspiration in Arid Regions

Abstract: Accurate estimates of evapotranspiration (ET) in arid ecosystems are important for sustainable water resource management because of competing water demands between human and ecological environments. Several empirical remotely sensed ET models have been constructed, and their potential for regional scale ET estimation in arid ecosystems has been demonstrated. Generally, these models were built using combined measured ET and corresponding remotely sensed and meteorological data from diverse sites. However, these sites usually contain different or mixed vegetation types, and little information is available on the estimation uncertainty introduced by combining different vegetation types from diverse sites. In this study, we employed the most widely used of these models and recalibrated it using datasets from two typical vegetation types (the shrub Tamarix ramosissima and the arbor Populus euphratica) in arid ecosystems of northwestern China. The recalibration was performed in two ways: using combined datasets from the two vegetation types, and using a single dataset from a specific vegetation type. By comparing the performance of the two methods for Tamarix ramosissima and Populus euphratica, we investigated the accuracy of ET estimation at the site scale and the difference in annual ET estimation at the regional scale. The results showed that the estimation accuracy of daily, monthly, and yearly ET was improved by distinguishing the vegetation types. The method based on the combined vegetation types strongly affected the estimation accuracy of annual ET: it overestimated annual ET by about 9.19% for Tamarix ramosissima and underestimated annual ET by about 11.50% for Populus euphratica. Furthermore, a substantial difference in annual ET estimation at the regional scale was found between the two methods. The higher the vegetation coverage, the greater the difference in annual ET. Our results provide valuable information for evaluating the estimation accuracy of regional scale ET using empirical remotely sensed ET models for arid ecosystems.


Introduction
Evapotranspiration (ET) is the water transferred from land surfaces to the atmosphere through surface evaporation and plant transpiration [1]. ET is of great significance to a wide range of water-related research and applications [2,3]. Accurate estimates of regional scale ET are needed for sustainable water resource management, particularly for arid ecosystems, because of competing demands for water resources among agricultural irrigation, public and domestic needs, industrial production, and ecological environments [4,5]. In recent decades, several empirical remotely sensed ET models have been developed [6][7][8][9][10][11][12][13][14] and their potential for regional scale ET estimation in arid ecosystems has been demonstrated [15][16][17][18][19][20][21][22][23][24][25][26]. These models extrapolate ET observed or estimated at the site scale to the regional scale based on an empirical relationship constructed at the local site scale, which relates daily ET from eddy covariance or Bowen ratio flux towers to vegetation indices (VIs) and meteorological data [27][28][29]. Because the empirical relationship constructed at the site scale is the foundation of the whole approach, its estimation accuracy directly affects regional scale ET estimation over long time scales. Therefore, it is necessary to evaluate and quantify the estimation uncertainty of these empirical remotely sensed ET models at the site and regional scales.
These empirical remotely sensed ET models are usually built at the site scale in the following steps: (1) observing the site-specific daily ET using eddy covariance or Bowen ratio flux towers; (2) collating corresponding remotely sensed and meteorological data such as the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), average daily land surface temperature (Ts,d), nighttime land surface temperature (Ts,n), daily reference evapotranspiration (ET0), and maximum daily air temperature (Ta,m); (3) combining daily datasets from diverse sites and different vegetation types; and (4) selecting an appropriate model structure, calibrating the model's empirical parameters by regression analysis, and validating the model. Some representative models are summarized in Table 1.

Table 1. Representative empirical remotely sensed ET models.

| Model | Vegetation types | Reference |
|---|---|---|
| … | … | [7] |
| $ET = a(1 - e^{-b \times EVI}) + c\,e^{d \times T_{s,n}} + e$ | Mesquite woodland; Sacaton grassland; Mixed mesquite/sacaton shrubland | [9] |
| $ET = a \times ET_{0\text{-}BC} \times EVI^{*}$ | Saltcedar; Cottonwood; Arrowweed; Quailbush shrubs; Screwbean Mesquite; Alfalfa | [10] |
| $ET = ET_0\,[a(1 - e^{-b \times EVI}) - c]$ | Mesquite woodland; Mesquite shrubland; Sacaton grass; Alfalfa; Cotton; Saltcedar; Reed; Crops | [11] |
| $ET = a(1 - e^{-b \times EVI^{*}}) \times e^{c \times T_{s,n}} + d$ | Mesquite woodland; Mesquite shrubland; Sacaton grassland; Bunchgrass; Mesquite savannah | [12] |
| $ET = a / (1 + e^{-(ET_0 \times EVI - b)/c})$ | Mesquite savanna; Grass; Forbs; Larrea tridentata; Parthenium incanum; Acacia constricta; E. lehmanniana; Prosopis velutina | [13] |

In these models, coefficients a, b, c, d, e, and f are the empirical parameters; ET0-BC and ET0 are the reference evapotranspiration calculated from the Blaney-Criddle formula and the Penman-Monteith equation [30], respectively; EVI* is the scaled EVI, which converts the lowest EVI (EVImin) in the dataset to 0 and the highest (EVImax) to 1 using the formula EVI* = 1 − (EVImax − EVI)/(EVImax − EVImin).
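The EVI scaling defined above can be illustrated with a minimal sketch. The function name `scale_evi` and the sample values are hypothetical and serve only to show how the formula behaves.

```python
def scale_evi(evi_values):
    """Scale EVI so the dataset minimum maps to 0 and the maximum to 1.

    Implements EVI* = 1 - (EVI_max - EVI) / (EVI_max - EVI_min).
    """
    evi_min = min(evi_values)
    evi_max = max(evi_values)
    if evi_max == evi_min:
        raise ValueError("EVI values are constant; scaling is undefined")
    return [1.0 - (evi_max - v) / (evi_max - evi_min) for v in evi_values]


# Hypothetical EVI samples (not from the study)
scaled = scale_evi([0.10, 0.25, 0.40])
```

The lowest value maps to 0, the highest to 1, and the midpoint to about 0.5.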
Notably, when the model's empirical parameters are obtained by regression analysis, the common practice is to calibrate empirical remotely sensed ET models using combined daily datasets from diverse sites that usually have different or mixed vegetation types. However, under the frequent hydrological fluctuations induced by natural flow variations in arid regions, different vegetation types have different water-consumption characteristics [31,32], water use strategies [33][34][35], and eco-physiological regulation mechanisms [36][37][38][39][40]. Whether combining datasets of different vegetation types affects the accuracy of ET estimation is a question that needs further analysis. The riparian forests in the arid areas of northwestern China are typically composed of two vegetation types: Tamarix ramosissima (shrub vegetation) and Populus euphratica (arbor vegetation) [41,42]. They are often characterized by a discontinuous, patchy spatial distribution, and have different water-consumption characteristics [31,32]. This particular surface condition makes it possible to calibrate empirical remotely sensed ET models using a dataset from a specific vegetation type. Using the models for a specific vegetation type as a benchmark, the estimation uncertainty of the commonly used models based on combined vegetation types can be evaluated and quantified.
In this study, we first employed the most popular one of these empirical remotely sensed ET models, and then recalibrated it based on combined vegetation types and specific vegetation type, respectively. Finally, we investigated and compared the estimation uncertainty of the two methods at the site and regional scales. The primary objectives of this study were to (1) evaluate the effects of distinguishing vegetation types on the accuracy of ET estimation at the site scale; (2) investigate the difference in annual ET estimation between the two methods at regional scale; and (3) improve the empirical parameters of the employed ET model for two typical vegetation types (Tamarix ramosissima and Populus euphratica) in arid ecosystems of northwestern China.

Models and Estimating Process
The model presented by Nagler et al. [11] in Table 1 (hereinafter referred to as the Nagler model) was chosen to conduct this study. The Nagler model is widely used and has a high estimation accuracy [18,20,23,43,44]. In the Nagler model, the term $(1 - e^{-b \times EVI})$ is derived from the Beer-Lambert Law, modified to predict the absorption of light by a canopy, with −b × EVI replacing LAI, and NDVI can be used in place of EVI [11]. The empirical parameters a, b, and c are obtained by regression analysis using the daily ET, ET0, and EVI (or NDVI) datasets.
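As a minimal sketch of the model structure described above, the function below evaluates ET = ET0 × [a(1 − e^(−b×VI)) − c] for one day. The function name `nagler_et` and the parameter values are illustrative assumptions, not the calibrated values of this study or of [11].

```python
import math


def nagler_et(et0, vi, a, b, c):
    """Daily ET (mm/d) from reference ET (mm/d) and a vegetation index.

    ET = ET0 * (a * (1 - exp(-b * VI)) - c), where VI is EVI or NDVI.
    No clamping is applied, so very low VI can give slightly negative ET.
    """
    return et0 * (a * (1.0 - math.exp(-b * vi)) - c)


# Hypothetical parameters, for illustration only
A, B, C = 1.5, 2.0, 0.1
et_sparse = nagler_et(5.0, 0.2, A, B, C)  # sparse canopy
et_dense = nagler_et(5.0, 0.4, A, B, C)   # denser canopy
```

With a fixed ET0, the estimate rises with the vegetation index and saturates as the exponential term approaches zero, which is the behavior the Beer-Lambert term is meant to capture.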
The daily ET, ET0, and NDVI datasets from Tamarix ramosissima and Populus euphratica in the lower reaches of the Tarim River in northwestern China were used to recalibrate the empirical parameters a, b, and c of the Nagler model. The datasets covered the period from mid-April 2013 to late October 2018. We divided the dataset of each vegetation type into a calibration dataset and a validation dataset. Taking into account the relative integrity of NDVI and daily ET data throughout the year, we selected the 2017 data of Tamarix ramosissima (21 samples) and the 2016 data of Populus euphratica (18 samples) as their respective validation datasets, and data from the other years (60 samples for Tamarix ramosissima and 62 samples for Populus euphratica) were used as calibration datasets. We recalibrated the empirical parameters a, b, and c using the two calibration datasets in three ways: (1) using the combined calibration datasets from Tamarix ramosissima and Populus euphratica; (2) using only the calibration dataset from Tamarix ramosissima; and (3) using only the calibration dataset from Populus euphratica. The models corresponding to these three cases were called the combined-vegetation-type model (CVTM), the specific-vegetation-type model for Tamarix ramosissima (SVTM-T), and the specific-vegetation-type model for Populus euphratica (SVTM-P), respectively. The latter two models were collectively referred to as the specific-vegetation-type model (SVTM).
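The three calibration cases can be sketched as follows. This is not the fitting routine used in the study (which relied on nonlinear least squares in OriginPro); it is a stand-in that exploits the fact that, for a fixed b, the model is linear in a and c, so b is searched on a grid and (a, c) are solved in closed form. All data below are synthetic and all names are hypothetical.

```python
import math


def model(et0, ndvi, a, b, c):
    return et0 * (a * (1.0 - math.exp(-b * ndvi)) - c)


def calibrate(samples, b_grid):
    """Least-squares fit of (a, b, c); samples are (et, et0, ndvi) tuples.

    Returns (a, b, c, rss) for the grid value of b with the lowest
    residual sum of squares (rss).
    """
    best = None
    y = [et for et, _, _ in samples]
    w = [-et0 for _, et0, _ in samples]            # regressor multiplying c
    for b in b_grid:
        u = [et0 * (1.0 - math.exp(-b * v)) for _, et0, v in samples]  # multiplies a
        suu = sum(ui * ui for ui in u)
        sww = sum(wi * wi for wi in w)
        suw = sum(ui * wi for ui, wi in zip(u, w))
        suy = sum(ui * yi for ui, yi in zip(u, y))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        det = suu * sww - suw * suw
        if abs(det) < 1e-12:                       # skip ill-conditioned b
            continue
        a = (suy * sww - suw * swy) / det          # 2x2 normal equations
        c = (suu * swy - suw * suy) / det
        rss = sum((yi - a * ui - c * wi) ** 2 for ui, wi, yi in zip(u, w, y))
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best


def synthetic(a, b, c):
    """Noise-free synthetic samples from known (hypothetical) parameters."""
    pairs = [(3.0, 0.10), (5.0, 0.20), (7.0, 0.35),
             (8.0, 0.45), (6.0, 0.30), (4.0, 0.15)]
    return [(model(et0, v, a, b, c), et0, v) for et0, v in pairs]


b_grid = [k / 20 for k in range(1, 201)]           # 0.05 to 10.0
tamarix = synthetic(1.2, 3.0, 0.05)                # stand-in for the Tamarix set
populus = synthetic(1.8, 2.0, 0.10)                # stand-in for the Populus set

svtm_t = calibrate(tamarix, b_grid)                # specific: Tamarix only
svtm_p = calibrate(populus, b_grid)                # specific: Populus only
cvtm = calibrate(tamarix + populus, b_grid)        # combined datasets
```

In this synthetic setting, each specific fit recovers its generating parameters, whereas the combined fit cannot match both datasets at once and is left with a clearly nonzero residual. That is the mechanism the CVTM versus SVTM comparison is designed to expose.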
Based on the CVTM and SVTM, ET at daily, monthly, and yearly scales can be estimated for Tamarix ramosissima and Populus euphratica at the site and regional scales. We first analyzed the accuracy of daily, monthly, and yearly ET from CVTM and SVTM based on the above-mentioned validation dataset at the site scale by comparing the simulation capabilities of CVTM and SVTM-T for Tamarix ramosissima, and the simulation capabilities of CVTM and SVTM-P for Populus euphratica. Subsequently, we investigated the difference in annual ET estimation between CVTM and SVTM-T for Tamarix ramosissima, and between CVTM and SVTM-P for Populus euphratica at regional scale. Ultimately, we could investigate and compare the estimation uncertainty of CVTM and SVTM for Tamarix ramosissima and Populus euphratica, respectively.

Site Description and Measurements
The study area is in the Tarim River Basin, which lies in northwestern China and is the largest continental river basin in China, with an area of 1.04 × 10⁶ km² [45] (Figure 1). The topography of the study area is relatively flat, and the climate is extremely arid [46]. According to the meteorological records of the Tikanlik Weather Station, the mean annual precipitation was 33.7 mm from 1957 to 2012. The observed maximum annual precipitation was 75.7 mm in 1974 and the minimum annual precipitation was 3.4 mm in 2001. In contrast, the annual potential evapotranspiration was as high as 2000 mm [47,48]. The narrow riparian forest in the lower reaches of the Tarim River is distributed within about 3 km of the river channel. Vegetation coverage is less than 0.2 in most of the riparian forest and generally decreases with increasing distance from the river channel [14]. The vegetation in this area consists mainly of Tamarix (Tamarix ramosissima) thicket and Populus (Populus euphratica) woodland. Both vegetation communities are phreatophytes that depend almost entirely on groundwater for survival because of the low precipitation [49][50][51]. Flux tower measurements were made at a Tamarix site and a Populus site (Figure 1); the instruments have been described in detail in [31,49].

Daily ET and ET0 Data
Sensible heat flux (H), latent heat flux (LE), net radiation (Rn), and soil heat flux (G) were measured continuously at each site using the EC system. Data were collected every 30 min during the entire study period. The data processing has been described in detail in [31,49].
The quality of the measured EC data was evaluated using the energy balance ratio (EBR) method [52]. Due to the sparse and low canopy, the canopy storage heat and the photosynthetic energy consumption were ignored during the analyses, and the EBR was calculated using Equation (1). Our results showed that the EBR was 0.84 for the whole study period. Thus, the measured data satisfied the accuracy requirements for further analysis [52].
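Equation (1) is not reproduced in this text. A minimal sketch of the energy balance ratio, assuming the common definition EBR = Σ(H + LE) / Σ(Rn − G) with storage terms ignored as stated above, is given below. The function name and the flux values are hypothetical.

```python
def energy_balance_ratio(h, le, rn, g):
    """EBR = sum(H + LE) / sum(Rn - G) over all records (all in W m^-2).

    Assumed form of Equation (1); canopy heat storage and photosynthetic
    energy consumption are ignored.
    """
    available = sum(r - s for r, s in zip(rn, g))
    if available == 0:
        raise ValueError("available energy sums to zero")
    turbulent = sum(hi + li for hi, li in zip(h, le))
    return turbulent / available


# Two hypothetical half-hourly records
ebr = energy_balance_ratio(h=[100.0, 150.0], le=[200.0, 220.0],
                           rn=[400.0, 450.0], g=[40.0, 50.0])
```

A value below 1, such as the 0.84 reported for the study period, means that the turbulent fluxes account for less than the measured available energy.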
The energy balance was forced to close by augmenting both H and LE while retaining the observed Bowen ratio [53,54]. To ensure the accuracy of the analysis, we did not perform gap filling, and only used data with all four components available. Hourly ET (mm h⁻¹) was calculated from LE_b, L, and ρ_w, where LE_b is the latent heat flux (J m⁻² h⁻¹) after energy balance closure; L is the latent heat of vaporization of water (2.45 kJ g⁻¹); and ρ_w is the water density (1 g cm⁻³). The daily ET (mm d⁻¹) was the sum of the hourly ET for one day. When the hourly ET was negative, it was set to zero. Daily ET0 was calculated from the meteorological data with the Penman-Monteith equation [30]. The meteorological data, including daily temperature, pressure, wind speed, actual duration of sunshine, and relative humidity, were from the Tikanlik Weather Station (Chinese National Weather Station number: 51765). Among China's National Weather Stations, this station is the closest to our study sites, at a distance of approximately 30-40 km.
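The closure and unit conversion described above can be sketched as follows. The hourly ET formula itself is not reproduced in this text, so the conversion below is an assumption built only from the stated units: dividing LE_b by L gives grams of water per m², dividing by ρ_w gives cm³ per m², and 1000 cm³ per m² corresponds to 1 mm of depth. Function names and flux values are hypothetical.

```python
L_V = 2450.0   # latent heat of vaporization, J per g (2.45 kJ/g)
RHO_W = 1.0    # water density, g per cm^3


def close_energy_balance(h, le, rn, g):
    """Scale H and LE so that H + LE = Rn - G, keeping the ratio H/LE."""
    turbulent = h + le
    if turbulent == 0:
        raise ValueError("H + LE is zero; the Bowen ratio is undefined")
    factor = (rn - g) / turbulent
    return h * factor, le * factor


def hourly_et_mm(le_b):
    """Hourly ET (mm) from closed latent heat (J m^-2 h^-1); negatives -> 0."""
    grams_per_m2 = le_b / L_V
    cm3_per_m2 = grams_per_m2 / RHO_W
    return max(cm3_per_m2 * 1e-3, 0.0)   # 1000 cm^3 per m^2 equals 1 mm


# Hypothetical hourly mean fluxes in W m^-2
h_b, le_b = close_energy_balance(h=100.0, le=200.0, rn=400.0, g=40.0)
et_hour = hourly_et_mm(le_b * 3600.0)    # W m^-2 over one hour -> J m^-2 h^-1
```

Daily ET would then be the sum of the 24 hourly values for days with complete data, in line with the decision not to gap-fill.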

Landsat OLI Imagery and Processing
We used all available Landsat OLI images during the study period. A total of 127 cloud free L1T images with a 30 m pixel resolution were selected and downloaded from the Global Visualization Viewer of the United States Geological Survey (USGS) [55]. The two sites fall in an area overlapped by two adjacent Landsat scenes (paths 141 and 142, row 32), thus resulting in near weekly coverage.
Landsat L1T product processing includes a systematic geometric correction, precision correction assisted by ground control chips, and the use of a digital elevation model (DEM) to correct parallax errors due to terrain relief [56]. Therefore, we did not perform a geometric correction during pre-processing. The original digital number (DN) values were first converted to absolute at-sensor radiances. The conversion was performed automatically by applying the OLI radiometric calibration parameters available in the ENVI (Environment for Visualizing Images) 5.3.1 software (Exelis Visual Information Solutions, Inc., Boulder, USA) to each band. Afterward, atmospheric correction was performed using the FLAASH (Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes) module of ENVI. This is a MODTRAN (Moderate Resolution Atmospheric Transmission) based algorithm that aims to reduce the extraneous path radiance affecting the pixel's at-sensor radiometry (i.e., adjacency and haze effects) while modeling the at-surface irradiance. To do so, the mid-latitude summer and sub-Arctic summer atmospheric models were used to define the water vapor amount based on a seasonal-latitude surface temperature model, and the rural aerosol model was selected to define the aerosol type. In addition, the "2-band (K-T)" option was selected for aerosol retrieval in the FLAASH module. After the atmospheric correction, the absolute at-sensor radiances were converted to a surface reflectance value for each pixel.

Derivation of NDVI
The NDVI was calculated for each image using the following equation [57]: $NDVI = (\rho_{NIR} - \rho_{Red}) / (\rho_{NIR} + \rho_{Red})$, where ρNIR is the near-infrared reflectance and ρRed is the red reflectance. After completing the NDVI calculation for all of the selected Landsat OLI images, we obtained the NDVI time series. To reduce the perturbations from varying atmospheric conditions and sun-sensor-surface viewing geometries, the NDVI time series need to be smoothed before use [58,59]. In the present study, the Savitzky-Golay filter [60] was used to smooth the NDVI time series because of its better performance in arid ecosystems of northwestern China [61]. Two parameters must be determined when the Savitzky-Golay filter is applied to NDVI time series smoothing. The first is the half-width of the smoothing window; a larger value usually produces a smoother result at the expense of flattening sharp peaks. The second is the degree of the smoothing polynomial, which is typically set in a range from 2 to 4. A smaller value will produce a smoother result, but may introduce bias; a higher value will reduce the filter bias, but may "over fit" the data and give a noisier result [62]. In this study, we used a value of three for both parameters according to the NDVI observations. To compare the simulation capabilities of CVTM and SVTM comprehensively using more observed daily ET data at the site scale, and to compare the annual ET estimation at the regional scale, daily NDVI time series for 2017 of the Tamarix transect and for 2016 of the Populus transect were needed. Therefore, we interpolated the two NDVI time series in a way that treated each pixel individually.
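The NDVI calculation and the smoothing step can be sketched as below. The sketch assumes evenly spaced samples and uses the standard Savitzky-Golay convolution weights for a 7-point window (half-width 3) with a cubic polynomial, (−2, 3, 6, 7, 6, 3, −2)/21. It leaves the first and last three points unchanged, which is an assumption, since the edge handling of the original processing is not stated. The Landsat series in this study is only near weekly, so this is an approximation of the procedure rather than a reproduction of it.

```python
SG_WEIGHTS = (-2, 3, 6, 7, 6, 3, -2)   # window 7 (half-width 3), cubic fit
SG_NORM = 21


def ndvi(nir, red):
    """NDVI from near-infrared and red surface reflectance."""
    return (nir - red) / (nir + red)


def savgol_smooth(series):
    """Smooth an evenly spaced series; edge points are left as they are."""
    half = len(SG_WEIGHTS) // 2
    smoothed = list(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG_WEIGHTS, window)) / SG_NORM
    return smoothed


# Hypothetical series: a flat NDVI of 0.2 with one spike of 0.5
noisy = [0.2] * 5 + [0.5] + [0.2] * 5
smoothed = savgol_smooth(noisy)
```

In this example the isolated spike is damped from 0.5 to 0.3, while any series that already follows a cubic or lower-order polynomial passes through the interior of the filter unchanged.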
In the process of NDVI time series interpolation, we divided the whole year into three stages according to local phenology and the filtered NDVI time series: Stage I was from the beginning of the year to the beginning of the growing season (DOY 1 to DOY 120); Stage II was the growing season (DOY 121 to DOY 300); and Stage III was from the end of the growing season to the end of the year (DOY 301 to DOY 365 or DOY 366). Stage II was further divided into the germination and leaf expansion period (GEP, DOY 121 to DOY 160), the peak growing season (PGS, DOY 161 to DOY 260), and the leaf senescence period (LSP, DOY 261 to DOY 300) [49]. For Stage II, a quadratic function was used to fit the filtered NDVI data points against DOY, and daily NDVI was simulated using the fitted quadratic function. For Stage I and Stage III, the daily NDVI was set to the corresponding stage average (Figure 2). To keep the fitted quadratic curve of Stage II as continuous as possible with the end point of Stage I and the starting point of Stage III, the quadratic function was forced to pass through the first and last data points of Stage II during fitting.
Remote Sens. 2019, 11, 2856
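The staged interpolation can be sketched for a single pixel as follows. A quadratic forced through two points has one remaining degree of freedom: it can be written as the chord through the two end points plus k(x − x0)(x − x1), and k follows from ordinary least squares. This parameterization, the function names, and the sample values are assumptions used for illustration; the original fitting procedure is not described in more detail than above.

```python
def fit_stage2_curve(doys, values):
    """Least-squares quadratic forced through the first and last points."""
    x0, x1 = doys[0], doys[-1]
    y0, y1 = values[0], values[-1]
    slope = (y1 - y0) / (x1 - x0)

    def chord(x):
        return y0 + slope * (x - x0)

    q = [(x - x0) * (x - x1) for x in doys]          # vanishes at both ends
    r = [y - chord(x) for x, y in zip(doys, values)]
    denom = sum(qi * qi for qi in q)
    k = sum(qi * ri for qi, ri in zip(q, r)) / denom if denom else 0.0

    def curve(x):
        return chord(x) + k * (x - x0) * (x - x1)

    return curve


def daily_ndvi(stage1_mean, stage3_mean, curve, days_in_year=365):
    """Daily NDVI: stage means outside DOY 121-300, fitted curve inside."""
    series = []
    for doy in range(1, days_in_year + 1):
        if doy <= 120:
            series.append(stage1_mean)
        elif doy <= 300:
            series.append(curve(doy))
        else:
            series.append(stage3_mean)
    return series


# Hypothetical filtered NDVI points within Stage II for one pixel
doys = [121, 150, 180, 210, 240, 270, 300]
values = [0.1 - 2e-5 * (d - 121) * (d - 300) for d in doys]
curve = fit_stage2_curve(doys, values)
series = daily_ndvi(stage1_mean=0.08, stage3_mean=0.09, curve=curve)
```

Applying the same two functions to every pixel of a transect yields the per-pixel daily NDVI needed for the annual ET estimates.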
Finally, we extracted the NDVI values over the footprint area (9 pixels, Figure 1e) around the Tamarix flux tower and over the footprint area (81 pixels, Figure 1f) around the Populus flux tower, respectively. The extraction was performed for both the filtered NDVI time series and the interpolated NDVI time series, and the extracted NDVI values were spatially averaged for analyses at the site scale. Figure 2 shows the comparison between the filtered NDVI and the interpolated daily NDVI throughout 2017 for the Tamarix site and throughout 2016 for the Populus site. In addition, the interpolated NDVI time series values for pixels covering the Tamarix transect (3446 pixels) and covering the Populus transect (7984 pixels) were extracted for the annual ET estimation of each transect.

Statistical Analyses
Statistical analyses were performed using the software OriginPro 2016 (OriginLab Corporation, Northampton, MA, USA). Nonlinear regression equations were fitted to data points by the least squares method, and goodness of fit is reported as the determination coefficient (R²), reduced chi-square values, and P-values for the regression coefficients. The reduced chi-square value equals the sum of the squared residuals divided by the degrees of freedom. Smaller reduced chi-square values represent a better curve fit. P < 0.001 was considered to be statistically significant.
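The reduced chi-square statistic defined above can be written as a short sketch. It assumes unweighted residuals and takes the degrees of freedom as the number of samples minus the number of fitted parameters (three for the Nagler model). The function name and numbers are hypothetical.

```python
def reduced_chi_square(observed, fitted, n_params):
    """Sum of squared residuals divided by (n_samples - n_params)."""
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more samples than fitted parameters")
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return rss / dof


# Hypothetical observed and fitted daily ET values (mm/d)
chi2_red = reduced_chi_square([1.0, 2.0, 3.0, 4.0, 5.0],
                              [1.1, 1.9, 3.2, 3.8, 5.0], n_params=3)
```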

Evaluation of Model Performance
The simulated daily ET values were compared with the values observed by the flux tower at each site. Model performance was evaluated with the determination coefficient (R², Equation (4)), root mean square error (RMSE, Equation (5)), Nash-Sutcliffe efficiency (NSE, Equation (6)), mean error (ME, Equation (7)), and maximum error (MaxError, Equation (8)). These metrics are among the most widely used to evaluate the performance of ET models [63,64].
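Equations (4) to (8) are not reproduced in this text. The sketch below uses common definitions of the five metrics and rests on three assumptions: R² is the squared Pearson correlation between simulated and observed values, ME is the mean of simulated minus observed (so a positive value means overestimation), and MaxError is the largest absolute deviation. The function name and data are hypothetical.

```python
import math


def evaluate(sim, obs):
    """Return R2, RMSE, NSE, ME, and MaxError for paired daily ET values."""
    n = len(obs)
    mu = sum(obs) / n                      # mean of observed values
    phi = sum(sim) / n                     # mean of simulated values
    cov = sum((o - mu) * (s - phi) for s, o in zip(sim, obs))
    var_o = sum((o - mu) ** 2 for o in obs)
    var_s = sum((s - phi) ** 2 for s in sim)
    sq_err = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return {
        "R2": cov ** 2 / (var_o * var_s),
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1.0 - sq_err / var_o,
        "ME": sum(s - o for s, o in zip(sim, obs)) / n,
        "MaxError": max(abs(s - o) for s, o in zip(sim, obs)),
    }


# Hypothetical example: simulation offset by +0.5 mm/d on every day
metrics = evaluate(sim=[1.5, 2.5, 3.5, 4.5], obs=[1.0, 2.0, 3.0, 4.0])
```

The example shows why several metrics are reported together: a constant offset leaves R² at 1 but lowers NSE to 0.8 and appears directly in ME and RMSE.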
In Equations (4) to (8), n is the total number of observation days; ETs,i is the simulated value on day i; ETo,i is the observed value on day i; and µ₀ and ϕ₀ are the means of the observed and simulated values, respectively.
Table 2 shows the calibration results for the CVTM, SVTM-T, and SVTM-P. The minimum determination coefficient (R²) among the models was 0.809, and the maximum reduced chi-square value was 0.492, indicating successful calibrations for all models. Notably, the parameters varied greatly between the CVTM and the SVTM-T, and the parameters of the CVTM were closer to those of the SVTM-P. The calibrated CVTM, SVTM-T, and SVTM-P were then validated using the validation datasets of the Tamarix site and the Populus site (Figure 3).
Overall, the SVTM-T and the SVTM-P performed better, with higher determination coefficients (R²) and NSE and relatively lower RMSE for both sites, whereas the CVTM performed relatively poorly and tended to underestimate high daily ET values and slightly overestimate low daily ET values. For the Tamarix site, the CVTM overestimated the daily ET values at the beginning of the growing season (Figure 4a). For the Populus site, the CVTM underestimated the daily ET values for the peak growing season (PGS) (Figure 4d).
Table 3 shows the comparison of the mean error (ME) and maximum error (MaxError) between the simulated daily ET from CVTM, SVTM-T, and SVTM-P and the observed values.
Results showed that both ME and MaxError for SVTM-T and SVTM-P were lower than those for CVTM, indicating the better performance of SVTM-T and SVTM-P. Interestingly, the ME of SVTM-T and SVTM-P showed a small positive bias for both sites, whereas the CVTM showed a small positive bias for the Tamarix site and a small negative bias for the Populus site.
Table 4 shows the comparison of the total amount of ET on the monthly and yearly scales. On the monthly scale, the SVTM-T and SVTM-P performed better than the CVTM, with lower residual errors of monthly average ET values. The CVTM clearly overestimated ET of the Tamarix site from March to May, and underestimated ET of the Populus site from June to September. On the yearly scale, the CVTM had larger residual errors than the SVTM-T and SVTM-P. The residual errors of the simulated annual ET from the SVTM-T and SVTM-P were 1.50% and 3.33%, respectively, while those from the CVTM were 9.19% and −11.50% for the Tamarix site and the Populus site, respectively.
Table 3. Comparison of mean error (ME) and MaxError for the CVTM, SVTM-T and the SVTM-P.

Columns: Sites; Dataset; N; ME (mm/d) for CVTM, SVTM-T, and SVTM-P; MaxError (mm/d) for CVTM, SVTM-T, and SVTM-P.
These results are complementary and indicate that distinguishing vegetation types can improve the accuracy of ET estimation at the site scale, and that combining datasets from different vegetation types introduced clear errors at the beginning of the growing season of Tamarix ramosissima and at the peak growing season of Populus euphratica.
Figures 5 and 6 show the difference in annual ET estimation between CVTM and SVTM for the Tamarix transect and the Populus transect, respectively. Overall, the annual ET estimated by CVTM and SVTM-T for the Tamarix transect (Figure 5b,c) and by CVTM and SVTM-P for the Populus transect (Figure 6b,c) showed similar spatial patterns. The annual ET decreased with decreasing vegetation coverage.
However, a substantial difference in annual ET estimation was revealed between CVTM and SVTM for both transects (Figures 5d and 6d). For pixels over the Tamarix transect (Figure 5d), the difference in annual ET (CVTM − SVTM-T) ranged from −333.94 to 160.31 mm yr⁻¹. For pixels over the Populus transect (Figure 6d), the difference in annual ET (CVTM − SVTM-P) ranged from −200.42 to 11.87 mm yr⁻¹. For pixels in the bare soil area, the annual ET differences of both the Tamarix transect and the Populus transect were positive, indicating that CVTM overestimated the annual ET of the bare soil area compared with the SVTM. For pixels in the vegetation area, the higher the vegetation coverage, the greater the difference in annual ET between CVTM and SVTM. The annual ET difference for pixels in the vegetation area of the Populus transect was negative, suggesting that CVTM underestimated annual ET in the vegetation area of the Populus transect compared with SVTM-P. Notably, for pixels in the vegetation area of the Tamarix transect, the annual ET difference was negative for pixels in the upper right corner with higher vegetation coverage, and positive for the others with lower vegetation coverage.
These characteristics of the annual ET difference for the vegetation area of the Tamarix transect suggest that, compared with SVTM-T, CVTM overestimated annual ET in areas with lower vegetation coverage and underestimated annual ET in areas with higher vegetation coverage. These results indicate that for areas with higher vegetation coverage of Tamarix ramosissima, the underestimation of high daily ET values by CVTM had a greater impact on the annual ET than the overestimation of low daily ET values by CVTM (Figure 4b), and that the opposite held for areas with lower vegetation coverage.
These results revealed that the empirical daily ET model built at the site scale played the dominant role in the accuracy of ET estimation, and that a minor difference in daily ET estimation at the site scale can lead to a large difference in annual ET estimation at the regional scale.
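The per-pixel comparison behind the difference maps can be sketched as follows: annual ET is the sum of daily model estimates, and the map value is the CVTM total minus the SVTM total. The parameter sets, the constant ET0, and the constant per-pixel NDVI below are hypothetical placeholders, not the calibrated values in Table 2 or the actual transect data. A single ET0 series is applied to all pixels, on the assumption that one weather station serves the whole transect.

```python
import math


def annual_et(et0_daily, ndvi_daily, params):
    """Annual ET (mm/yr) as the sum of daily Nagler-model estimates."""
    a, b, c = params
    return sum(et0 * (a * (1.0 - math.exp(-b * v)) - c)
               for et0, v in zip(et0_daily, ndvi_daily))


def annual_difference(et0_daily, ndvi_by_pixel, cvtm_params, svtm_params):
    """Per-pixel annual ET difference, CVTM minus SVTM (mm/yr)."""
    return [annual_et(et0_daily, s, cvtm_params)
            - annual_et(et0_daily, s, svtm_params)
            for s in ndvi_by_pixel]


# Hypothetical parameter sets (a, b, c), for illustration only
CVTM = (1.5, 2.0, 0.10)
SVTM = (1.8, 2.0, 0.10)

et0 = [5.0] * 365                                    # constant ET0, mm/d
pixels = [[0.0] * 365, [0.1] * 365, [0.3] * 365]     # bare to denser cover
diffs = annual_difference(et0, pixels, CVTM, SVTM)
```

With these placeholder parameters the difference is zero where NDVI is zero and grows in magnitude with NDVI, which illustrates how a small parameter difference accumulates into a large annual difference over densely vegetated pixels.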

Effects of Distinguishing Vegetation Types
The estimation accuracy of ET was improved by distinguishing vegetation types at the site scale in our study. The SVTM-T and SVTM-P achieved higher accuracy for the ET estimation of Tamarix ramosissima and Populus euphratica, respectively, than the CVTM at the daily, monthly, and yearly scales. Similarly, Nagler et al. [7] constructed an empirical remotely sensed ET model using datasets from five vegetation types (cottonwood, mesquite, saltcedar, giant sacaton, and arrowweed) along rivers in the western U.S., and then estimated ET for each vegetation type using the constructed model. They reported that the simulated and observed average daily ET during the growing season were within 25% of the 1:1 line for all vegetation types except arrowweed, for which the simulated average daily ET was 40% higher than the observed values. Nouri et al. [43] compared the ET estimates of the Nagler model with a detailed soil water balance analysis in an urban parkland in Australia, which was fully covered by kikuyu turf grasses and more than 60 species of trees and shrubs. They noted that the estimates of the Nagler model were not accurate at the monthly scale, but that errors cancelled out to give good agreement on an annual time step. Moreover, compared with the validation results of SVTM in this study, the empirical ET models constructed from combined vegetation types had relatively poor validation accuracy. For example, Nagler et al. [10] reported an error or uncertainty of about 20% in the average daily ET (6.2 mm d⁻¹ with RMSE = 1.2 mm d⁻¹); Nagler et al. [11] reported that the average error of annual ET was 5.5%, ranging from 2.9% to 9.3% across sites; and Glenn et al. [13] reported validation results with R² = 0.72. In contrast, Oliveira et al. [44] recalibrated the Nagler model using two years of data from a cerrado woodland site (arboreous cover of about 50-70%) in South America and validated it using another year of data. They reported that the recalibrated model showed significant agreement with the observed ET at the daily, monthly, and yearly scales. Taken together, these studies suggest that calibrating with combined vegetation types tends to produce biased ET estimates.
The CVTM overestimated ET in 2017 by about 9.19% for Tamarix ramosissima and underestimated ET in 2016 by about 11.50% for Populus euphratica at the site scale, mainly at the beginning of the growing season of Tamarix ramosissima and at the peak growing season of Populus euphratica, respectively. Furthermore, for annual ET at the regional scale, there was a considerable difference between the CVTM and the SVTM. Compared with SVTM-P, the CVTM underestimated annual ET for vegetated areas of the Populus transect. Compared with SVTM-T, the CVTM overestimated annual ET in areas with lower vegetation coverage and underestimated annual ET in areas with higher vegetation coverage within the Tamarix transect. These differences most likely resulted from the model parameters being calibrated using the combined datasets from different vegetation types, which led to biased daily ET estimation at the site scale, a bias that was ultimately amplified in annual ET estimation at the regional scale. In addition, Nagler et al. [7] estimated ET of five vegetation types with different water-consumption characteristics using a model based on combined vegetation types. Their estimation results indicated that the average daily ET of cottonwood, which has high water consumption, was underestimated, and that of arrowweed, which has low water consumption, was clearly overestimated. Therefore, when these empirical remotely sensed ET models are used to estimate regional scale ET for highly heterogeneous riparian zones over long time scales, the CVTM may underestimate ET for vegetation with high water consumption (for example, Populus euphratica) and tend to overestimate ET for vegetation with low water consumption (for example, Tamarix ramosissima).
After distinguishing vegetation types, the SVTM-T and the SVTM-P obtained very high estimation accuracy at the site scale. However, both the mean errors (ME) of daily ET (Table 3) and the annual ET from SVTM-T and SVTM-P (Table 4) showed a positive bias. Likewise, Nagler et al. [11] reported an average bias of 5.5% (ranging from 2.9% to 9.3% across sites) for annual ET estimation. Nouri et al. [43] reported that the annual ET estimated by the Nagler model had a bias of 4 mm yr⁻¹ compared with a detailed soil water balance analysis. Thus, the positive bias may be systematic and inherent to the structure of the Nagler model, a possibility that needs to be further tested.
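The error measures referred to above (ME, RMSE, and the relative bias of annual ET) can be sketched as follows. This is a minimal illustration using common textbook definitions, which are assumed here rather than taken from the study; the function names and the daily ET values are hypothetical and are not data from this work.

```python
import math

def mean_error(estimated, observed):
    """Mean error (ME); a positive value means overestimation on average."""
    return sum(e - o for e, o in zip(estimated, observed)) / len(observed)

def rmse(estimated, observed):
    """Root mean square error, in the same units as the inputs."""
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    return math.sqrt(sum(squared) / len(observed))

def percent_bias(estimated, observed):
    """Relative bias (%) of the summed ET, e.g. annual ET from daily values."""
    return 100.0 * (sum(estimated) - sum(observed)) / sum(observed)

# Hypothetical daily ET values in mm/d (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 2.5, 4.5]

print(mean_error(estimated, observed))    # 0.25
print(rmse(estimated, observed))          # 0.5
print(percent_bias(estimated, observed))  # 10.0
```

A positive ME together with a positive percent bias, as in this toy example, is the pattern described for SVTM-T and SVTM-P in Tables 3 and 4.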

Feasibility of the Application of SVTM
Essentially, the Nagler model is a modification of the crop coefficient method [6]. The crop coefficient approach assumes that the plant is growing under unstressed conditions [29]. In the Nagler model, the crop coefficient is replaced by a satellite-derived VI that provides information about the actual status of the canopy at the time of measurement. However, VI-based methods cannot detect early signs of plant moisture stress [29], which adds scatter and uncertainty to the ET estimation [6]. In arid ecosystems, plant transpiration dominates ET, and soil evaporation is weak and negligible due to the scant rainfall [6,49]. Moreover, phreatophytes mainly obtain water from the capillary fringe and groundwater [50,51], and thus have a relatively constant transpiration rate that is related to groundwater table depth and salinity [6]. These features of arid ecosystems therefore provide good conditions for the application of these empirical remotely sensed ET models. In this study, the very high estimation accuracy of the SVTM-T and SVTM-P at the site scale seems to be related to the suitable hydrological conditions of the two vegetation types in the study area. Yuan et al. [31] reported that the groundwater table depth is a decisive indicator of hydrological conditions in the study area. According to the thresholds reported in [65], the groundwater table depth during the study period (see Section 2.2) is suitable for the growth of vegetation in the lower reaches of the Tarim River.
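The idea of replacing the crop coefficient with a VI-derived term can be sketched as below. This is only an illustration under stated assumptions: the saturating exponential form, the clipping at zero, the function names, and the parameter values a, b, and c are all hypothetical placeholders, not the calibrated form or the parameters reported in Table 2.

```python
import math

def vi_coefficient(vi, a, b, c):
    """Hypothetical VI-based stand-in for the crop coefficient.

    Assumed form: a saturating exponential in the vegetation index,
    with empirical parameters a, b, c that must be calibrated.
    """
    return a * (1.0 - math.exp(-b * vi)) - c

def estimate_et(et0, vi, a, b, c):
    """Daily ET (mm/d) as reference ET0 (mm/d) scaled by the VI-based term.

    Negative values are clipped to zero (a choice made for this sketch).
    """
    return max(0.0, et0 * vi_coefficient(vi, a, b, c))

# Placeholder parameter sets standing in for two separately calibrated
# vegetation types (values invented for illustration only).
params_type_a = dict(a=1.2, b=2.0, c=0.1)
params_type_b = dict(a=1.6, b=2.0, c=0.1)

et_a = estimate_et(et0=5.0, vi=0.5, **params_type_a)
et_b = estimate_et(et0=5.0, vi=0.5, **params_type_b)
print(et_a, et_b)
```

With identical ET0 and VI inputs, the two parameter sets give different ET values, which is why a single parameter set fitted to combined data cannot match both vegetation types at once.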
When the SVTM is applied to other arid regions, the necessary prerequisite is to distinguish the different vegetation types. Vegetation in arid regions is relatively sparse and its spatial structure is usually simple. As a result, it is feasible to distinguish different vegetation types using remote sensing images [41,66,67], and some effective classification methods with high accuracy have been proposed [68,69]. Therefore, the estimation method based on a specific vegetation type proposed in this study is feasible in practical applications. It should be noted that the empirical parameters of these remotely sensed ET models must also be recalibrated for the specific vegetation type of concern.
The number of remote sensing images available is crucial to the application of these empirical remotely sensed ET models because it is the basis for obtaining high-quality NDVI time series data. Ruhoff et al. [70] reported that the vegetation index (NDVI) was the most important of the main inputs for ET estimation in the SEBAL algorithm. In this study, the Nagler model was used to estimate ET based on meteorological data and vegetation indices (NDVI) derived from Landsat OLI images. The Nagler model is an empirical ET model constructed by directly relating ET to vegetation indices (VIs) and meteorological data, and thus it differs from the SEBAL algorithm, which estimates ET using the energy balance equation. In addition, although we interpolated the NDVI time series during the comparison between the CVTM and the SVTM, we still obtained a satisfactory ET estimation (Figure 4). This may be related to the NDVI observations we used for interpolation. Since the study area is located in the overlap of two adjacent images, we had at least one NDVI observation every month throughout the year (Figure 2). Therefore, the number of images available seemed to have little impact on the ET estimation in this study. Nevertheless, the 16 day revisit cycle of the Landsat series satellites (e.g., TM/ETM+/OLI with 30 m spatial resolution) and frequent cloud contamination in some regions limit their application for frequent ET estimation using these empirical remotely sensed ET models [7]. On the other hand, MODIS (MODerate-resolution Imaging Spectroradiometer) has a high temporal resolution (1 day) with spatial resolutions of 250 m, 500 m, and 1000 m. However, at these spatial resolutions it is difficult to capture the spatial details necessary for monitoring land cover and ecosystem changes in heterogeneous areas, especially in narrow riparian corridors [71]. Hence, in the process of ET estimation, NDVI data with appropriate spatial and temporal resolution should be selected according to the research needs.
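Filling a daily NDVI series from sparse cloud-free acquisitions can be sketched as below. Linear interpolation is an assumption made for this illustration, since the interpolation scheme is not specified in this section; the function name, acquisition days, and NDVI values are hypothetical.

```python
from bisect import bisect_right

def interpolate_ndvi(obs_days, obs_ndvi, day):
    """Linearly interpolate NDVI for `day` from sparse observations.

    obs_days must be sorted in ascending order. Outside the observed
    range, the nearest observation is held constant (a sketch choice).
    """
    if day <= obs_days[0]:
        return obs_ndvi[0]
    if day >= obs_days[-1]:
        return obs_ndvi[-1]
    i = bisect_right(obs_days, day)  # obs_days[i - 1] <= day < obs_days[i]
    d0, d1 = obs_days[i - 1], obs_days[i]
    n0, n1 = obs_ndvi[i - 1], obs_ndvi[i]
    return n0 + (n1 - n0) * (day - d0) / (d1 - d0)

# Hypothetical acquisition days (day of year) and NDVI values,
# roughly one cloud-free observation per month.
obs_days = [10, 40, 70, 100]
obs_ndvi = [0.10, 0.20, 0.50, 0.40]

# Daily NDVI series for days 1 to 120.
daily_ndvi = [interpolate_ndvi(obs_days, obs_ndvi, d) for d in range(1, 121)]
print(len(daily_ndvi), daily_ndvi[24])
```

The denser the observations, the less the interpolated series depends on the assumed scheme, which is consistent with the limited impact of image availability noted for this study area.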

Conclusions
In this study, using the Nagler model [11] and the datasets from two typical vegetation types (Tamarix ramosissima and Populus euphratica) in arid ecosystems of northwestern China, we investigated the effects of distinguishing vegetation types on the estimates of remotely sensed ET at site and regional scales.
By distinguishing vegetation types, the accuracy of ET estimation was improved. Combining the vegetation types introduced errors in the ET estimation and had a great influence on the estimation accuracy of annual ET. Furthermore, the CVTM and the SVTM differed substantially in annual ET estimation at the regional scale, and the higher the vegetation coverage, the greater the difference in annual ET between them. In addition, the SVTM-T and SVTM-P had very high estimation accuracy for daily, monthly, and yearly ET at the site scale. Therefore, the calibration parameters for SVTM-T and SVTM-P in Table 2 are recommended for the ET estimation of Tamarix ramosissima and Populus euphratica in arid ecosystems of northwestern China. At the same time, the estimation method based on a specific vegetation type proposed in this study is feasible in practical applications in other arid ecosystems.
Further study is recommended to investigate the impacts of distinguishing vegetation types on the accuracy of ET estimation at the site scale using data from more sites under different vegetation types and to indirectly evaluate the regional scale ET estimation with the aid of the groundwater model using the measured groundwater level [72,73].
Acknowledgments: The authors appreciate the help from Pei Zhang, Xiaobo Yi, and Xuchao Zhu during field data collection. Special thanks go to Rui Sun for his help in the processing of remote sensing images. We are also grateful to all of the reviewers for their invaluable comments and suggestions, which contributed to improving the manuscript.