Remote Sensing of Evapotranspiration over the Central Arizona Irrigation and Drainage District, USA

: Knowledge of baseline water use for irrigated crops in the U.S. Southwest is important for understanding how much water is consumed under normal farm management and to help manage scarce resources. Remote sensing of evapotranspiration (ET) is an effective way to gain that knowledge: multispectral data can provide synoptic and time-repetitive estimates of crop-speciﬁc water conducting multi-model experiments during summer-months over sites with independent ground validation. A.N.F.; Validation, A.N.F., D.J.H.; Formal Analysis, A.N.F. and D.J.H.; A.N.F., D.J.H., L.B. and A.K.; Resources, A.N.F., D.J.H., L.B., A.K., W.E.L. and R. Strand; Data Curation, A.N.F. and D.J.H.; Writing—original Draft Preparation, A.N.F.; Writing—Review and Editing, A.N.F., D.J.H., L.B. and A.K.; Visualization, A.N.F.; Supervision, A.N.F.; Project Administration, A.N.F. and A.K.; Funding Acquisition, A.N.F., L.B. and A.K.

remote sensing data, while its chief drawbacks are needed for crop-specific calibrations and relative insensitivity to rapid changes in vegetation water status. Use and interest in vegetation index (VI) models to estimate are widespread for improving otherwise standardized ET estimates [19,20].
Site selection to test the models was also based on practical matters, namely identifying sites growing familiar crops and managed using practices familiar to the authors. Additionally, it was important that the site be spatially extensive, i.e., encompass areas large enough to include 10's to 100's of fields, and that it contribute significant regional economic impact. The chosen site was the Central Arizona Irrigation and Drainage District (CAIDD) a 35,450 ha, agricultural region in South Central Arizona, USA. Typical crops at CAIDD have been cotton, small grains, forage, and vegetables. The useful attributes for this site are long growing seasons, non-deficit irrigation, hot and mostly dry summers, and ample clear skies for remote sensing. CAIDD made water delivery data for 2008 available, hence that year became the assessment period. Crops modeled for ET were durum wheat, cotton, and alfalfa in plots ranging between 1.4 and 313.4 ha.

The Central Arizona Irrigation and Drainage District (CAIDD)
The CAIDD district is located in South Central Arizona and is a triangularly shaped region ∼35 km North-South and ∼26 km East-West at its southern boundary Figure 1. CAIDD lies 90 km to the southeast of Phoenix, and 80 km northwest of Tucson. CAIDD water deliveries were ∼339,207,000 m 3 /year (ed4.biz). Irrigation water is supplied in approximately equal portions from ground water wells (350), and Colorado River water via the Central Arizona Project (CAP). Water deliveries within CAIDD are conveyed by three main canals: Santa Rosa, Central Main, and South Main. On-farm deliveries are provided by canal turnouts; in 2008, there were 327. Per agreement with CAIDD directors, water volumes and release dates at each turn-out during 2008 were provided to the authors, but with the condition that proprietary water use information not be disclosed. The nearest weather station data for CAIDD was provided by the Coolidge AZMET [21] station, approximately 50 km to the North. Using a CAIDD-provided map and imagery from Google Earth TM (mention of trademarked or commercial products in this manuscript is for reader convenience, USDA provides no endorsement or recommendations for these products), CAIDD canals and turn-out locations were digitized and incorporated into an in-house GIS database. Using Google Earth TM imagery and Landsat 5 (TM5) scenes, boundaries of 754 active cropping areas were digitized as shape files within the ArcGIS editor environment. Considering the objective to assess average crop evapotranspiration across the district, a small sample (∼30) would likely be sufficient for analysis; however, a larger sample size was needed to offset errors in a multi-cropping environment where planted areas change shape and size seasonally. Thus, this study identified defined crop areas for most of the 2008 season at CAIDD. Using the USDA Cropland map for Arizona 2008 [22], crop mask files for wheat, cotton, and alfalfa were generated at its source 30 m resolution. Estimated crop classification accuracy exceeded 80%. Dominant crops at CAIDD include wheat, cotton, alfalfa, corn, and sorghum, with several combinations occurring within a year. Using the total area of digitized cropping areas, ∼22,000 ha (54,305 acres), and provided water delivery for 2008 of 374,506,700 m 3 (303,618 ac-ft), average irrigation water depth was 1700 mm (5.6 feet). Crop varieties grown at CAIDD in 2008 were not available.

Remote Sensing Data
This study evaluated ET over the CAIDD considering five main sources of remote sensing data collected in 2008: Landsat TM5, Landsat TM7, ASTER, MODIS Terra and Aqua. Additionally, the study implicitly used Resourcesat-1 (IRS-P6) and AWiFS via USDA Cropland crop cover maps.
The core data for the study were 22 scenes from TM5 (Table 1). Data were downloaded from the United States Geological Survey (USGS) Explorer website (https://earthexplorer.usgs.gov) as GeoTIFF files; geo-positioning errors relative to well-defined ground-reference structures such as irrigation canals were small and usually less than the nominal 30 m pixel size. In addition, 23 TM7 were also assessed, but, due to missing scan lines and inconsistent reflectance relative to TM5, their use was limited to assessment of land surface temperatures. Table 1. TM5 data used for the 2008 CAIDD study. Quality, representing scanline integrity, clouds, sky cover, solar elevation, and solar azimuth were provided by USGS. Weather data: 2 m air temperature (Tair), relative humidity (RH), wind speed (U), incoming solar radiation (Rsolar), and relative solar radiation (Rsolar/Clear Sky Solar Radiation; [23]), were derived from the Coolidge AZMET station https://cals.arizona.edu/azmet/az-data.htm. Normalized Difference Vegetation index (NDVI) maps were derived from 30 m red and near infrared (bands 3 and 4, respectively) obtained from the USGS LEDAPS data collection [24]. LEDAPS data are atmospherically corrected land surface reflectance, obviating the need to independently perform atmospheric corrections in the visible and near infrared bands. Land surface temperature (LST) maps were derived from 120 m TM5 band 6 and 60 m TM7 band 6 and resampled to 30 m by USGS (https://landsat.gsfc.nasa.gov/landsat-4-7-thermal-data-to-be-resampled-to-30meters/). Note that the spatial resampling to 30 m greatly simplifies the modeling process but does not address the inability of TM to resolve temperature variations at scales less than 120 m. The impact on ET estimates for large fields would be minimal, but for small fields ET effects are significant due to mixed pixels. Bare soil would be over-estimated, while full-cover ET would be under-estimated. Considering irrigated settings, the only reliable way to address this problem is to provide higher resolution sensors, but data fusion of thermal infrared TIR with VNIR data provides a meaningful way of downscaling coarse resolution LST data in the absence of such sensors [25]. In this CAIDD study, the majority of fields exceeded 100 m extents and data fusion was not used. LST values were computed by converting TM top-of-atmosphere radiances (L sensor,λ ) to land surface radiances (L sur f ,λ ) via Equation (1):

Index
The spectrally dependent atmospheric correction terms are τ λ (transmissivity), L ↑ λ (upwelling atmospheric radiance), L ↓ λ (downwelling radiance), λ (surface emissivity, 0.97 assumed), all obtained by using radiosonde atmospheric profile data from the National Oceanic and Atmospheric Administration (NOAA) FSL Tucson site [26] and the radiative transfer modeling package MODTRANv5 [27] (Spectral Sciences, Inc., Burlington, MA, USA). To convert L sur f into LST, a temperature-spectral radiance lookup table was used. This table was created by solving the Planck blackbody function for 0.2 µm steps, each weighted by the corresponding TM5 spectral response function F: Both TM5 and TM7 were downloaded, merged and geopositioned. Although TM data are nominally georegistered, on occasion, large positioning errors occurred and had to be corrected. Use of both TM5 and TM7 would provide 8-day samples under clear sky conditions. Unfortunately, two factors made use of TM7 difficult: inconsistent reflection patterns with respect to TM5, and the well-known gaps due to the broken scan line corrector. While the gaps could be accommodated when developing plot averages, reflection inconsistencies were troublesome. For relatively slowly developing crops such as cotton, reflection anomalies could be readily identified, but, for highly dynamic alfalfa cropping systems, where cuttings occur at intervals of ∼6 weeks, uncertainties were high as to whether the TM7 variations were real or artifacts.
To help validate LST data obtained from TM sensors, 7 clear to mostly clear, L1B ASTER [28] scenes covering all or parts of CAIDD were downloaded ( Table 2). Data from its 5 thermal infrared bands were processed in a similar way to that shown in Equations (1) and (2), but with additional steps to refine the surface temperature and emissivity estimates via an approach known as 'normalized emissivity' (NEM, [29]). By considering temperatures observed in different parts of the 8-11 µm thermal infrared window, one is able to improve LST accuracy over estimates provided by a single thermal band such as for TM5 and TM7 [30]. ASTER is mounted on the sun-synchronous Terra platform and has similar overpass times of day as for TM5 ( 18:20 vs. 17:46 UTC), though with different overpass days. This means that, while direct LST validations were not possible, it was reasonable to expect temperature differences to be on the order of a few • C. Lastly, 250 m red/near infrared MODIS from both the morning overpass platform Terra, and the afternoon overpass platform Aqua were downloaded and incorporated into the ET analysis routines. The relatively coarse spatial resolution of MODIS would normally preclude its consideration for this study-typical field sizes at CAIDD are ∼180 m-these data sets have superior temporal resolution relative to TM5 and may provide improved time precision of cropping activities such as planting and harvesting . 47 MOD09Q1 and 46 MYD09Q1, weekly composited, red/NIR surface reflectance data were downloaded and included in the CAIDD data set. Map tiles, sourced in sinusoidal grid, were re-projected to UTM zone 12 coordinates as GeoTIFF files using the MODIS Reprojection Tool (MRT, [31]).

Evapotranspiration Models
To assess the efficacy of different remote sensing approaches for estimating spatially distributed ET, three different models were implemented: the Satellite-Based Energy Balance for Mapping Evapotranspiration with Internalized Calibration (METRIC [32]), the Two-Source Energy Balance (TSEB [16]), and Evapotranspiration Mapping with Vegetation Index ( [17,18]). Summaries of and deviations from their standards are described below; complete and authoritative details for METRIC and TSEB can be found in the source citations. Implementation by the authors of METRIC and TSEB has been reported previously [10], while implementation and comparisons with VISW are new. To provide context for model estimates, three standardized, non-remote sensing ET estimates were included: Penman-Monteith reference evapotranspiration, ET0 [33,34]; FAO-56 ET ( [15] and referenced below in Section 2.3.3); and USDA-SW ET [35], crop specific consumptive water use estimates derived from gravimetric sampling at 1-foot intervals over multiple seasons within the Salt River valley of Central Arizona. The models obtain ET using distinctly different approaches: • METRIC: primarily relies upon spatial contrast within LST images, solves the surface energy balance, and resolves rapid changes in crop water status. • TSEB: uses explicit LST, air temperature data, and distinguishes between soil and canopy biophysical properties, solves the surface energy balance, and resolves rapid changes. • VISW: uses VI and local weather observations to estimate crop coefficients, potential ET (ET • ), and ET itself, without solving the surface energy balance, and only resolves changes over several days.

METRIC
ET estimation with METRIC is based on the assumption of surface energy balance: where ET is represented in its energy flux form, latent heat (LE), and equal to net radiation (R n ), minus soil heat flux (G) and sensible heat flux (H). Units for all components are W/m 2 ; sign convention is positive R n for incoming radiation, and positive G, H, LE for fluxes away from the soil/canopy surface.
METRIC estimates R n using satellite-based reflectance and thermal infrared data [32,36]: where the radiation components-all estimated using remotely sensed observations-are downwelling shortwave radiation (R ↓ s ), surface albedo (α), downwelling longwave radiation (R ↓ l ), upwelling longwave radiation (R ↑ l ), and land surface emissivity ( sur f ). METRIC estimates of G were derived from a SEBAL empirical relationship with surface albedo and NDVI [37]: where T sur f is surface temperature in Celsius. The CAIDD study obtained R n and G using Equations (4) and (5) and parameters as described in [32]. The remaining term on the right-hand side of Equation (3), H, is solved using METRIC's distinctive feature of representing vertical temperature gradients between the surface soil/vegetation canopy and the overlying atmosphere in terms of horizontal temperature gradients. The usual resistance equation for H is described: where ρ a is density of moist air, c p is specific heat of the moist air, ∆T is the air temperature gradient between the surface and overlying air, and r a is a resistance term representing the effectiveness of heat transport between the surface and overlying air. The crux of remotely sensed surface energy balance models is how they resolve ∆T and r a . In some models, such as TSEB (described below), ∆T is determined from radiometric surface temperature and ambient air temperature. The r a term is estimated from models of surface roughness, wind speed, and atmospheric stability. Following the SEBAL method, METRIC solves for these terms indirectly by selection of so-called 'cold' and 'hot' pixels that are presumed to represent reference H end-members. Knowing H for the extreme conditions allows estimation of H fluxes at all other pixels. These reference pixels are respectively denoted H cold and H hot , and are used together to compute the coefficients needed to convert LST observations into an apparent ∆T: where the parameters a and b are solved used the selected hot and cold pixels: ∆T hot and ∆T cold are temperature gradients derived from solutions of Equation (3) using reference conditions and observations. This approach simplifies and potentially increases accuracy in ET computations because biases due to errors in land surface temperature (LST) estimation are canceled by the hot/cold differencing. Thus, a reference cold pixel is assumed to represent ET at a standardized rate,1.05 * ET r , a rate typically corresponding to either a short (0.12 m) or tall (0.5 m) crop ET as defined in [23]; the additional 5% above reference ET accommodates conditions with evaporation at the soil surface. The surface energy balance equation for cold pixels becomes: where LE cold is the energy flux equivalent of 1.05 ET r . At the 'hot' pixel, METRIC allows residual evaporation based on antecedent surface moisture, and follows the pattern in Equation (10). For this study, however, antecedent moisture was considered unknown and difficult to accurately model. Thus, the traditional SEBAL approach was used, where LE was assumed to be zero: Having solved H, R n , and G, Equation (3) could then be solved, resulting in an estimate of instantaneous LE flux. Options to transform this estimate into daily ET include the use of diurnal incoming solar radiation [38] or use of a reference quantity, for example by assuming a constant evaporative fraction [39], i.e., presumption that the ratio LE R n −G is constant over most of the day. These options have been reviewed and discussed in [40]. The METRIC approach, and in this study, also used for TSEB, is to compute a reference ET fraction (ET r F): where ET inst in instantaneous ET (mm/s) computed: where λ v is latent heat of vaporization of water (J/kg) and ρ w is density of liquid water (kg/m 3 ).
Daily ET (ET d ) is then computed by multiplying ET r F by the cumulative ET • : A significant obstacle for wide deployment of METRIC is the stated need for trained experts knowledgeable in surface energy balance. This requirement not only limits the number of practitioners, but also introduces subjectivity and non-reproducibility of ET results [41]. This study assessed the need by implementing an alternative objective and reproducible approach based on the hypothesis that quantile selection of hot and cold pixels can return meaningful ET reference locations. The implementation is similar to that previously described in [41] who demonstrated the approach over almonds in a California study. Given a 30 m resolution TM scene, choose as cold reference pixels all those within 95-99% Normalized Difference Vegetation Index (NDVI) quantiles and also with 1-10% LST quantiles as illustrated in Figure 2. While the quantile selection could be fully automated, subjectivity when choosing quantiles was retained in order to balance the need for large (>1000) sample sizes while minimizing outliers. Next, for each selected cold pixel, neighboring pixels within a fixed search radius were scanned for maximal LST. The radius was set to 300 m, a value to approximate the typical CAIDD field size × 2 to improve the probability of finding bare soil pixels. Third, using all previously selected cold/hot value pairs, median hot and cold pixel values were computed and used to estimate the a and b coefficients in Equations (8) and (9). LST vs. NDVI two-dimensional histograms using TM7 data. Cold-pixel selection thresholds are illustrated using TM7 data from 21 June 2008 over CAIDD. Using co-registered LST and NDVI images, all pixels with the 95-99% NDVI quantile range and also within the 1-10% LST quantile range were included in the 'cold' pixel sample set.

TSEB
The TSEB model is a thermal-based remote sensing ET approach that is distinguished by its partitioning of surface fluxes into separate transpiration and evaporation components. This partitioning allows for differing transport resistances between soil, canopy, and atmosphere and accommodates use of radiometric instead of aerodynamic temperature. The separate components are obtained by remote sensing of fractional cover, where canopy and soil fractions are respectively f and 1 − f . In this study, f was obtained from NDVI observations following [42,43]: where NDV I max is the NDVI value corresponding to full canopy cover, and NDV I min the value for bare soil. ξ is a leaf angle distribution function parameter, ranging between 0.8 for planophile to >1.4 for erectophile canopy geometry. For the wheat and alfalfa plots, ξ was set to 1.4, and, for cotton, 1.0 was used. TSEB modeling of the canopy requires estimation of leaf area index (LAI) to obtain net radiation values using [42]: where β is a leaf orientation parameter. Due to uncertain calibration of Equation (16), this study used β = 0.67, which models randomly oriented leaves. For row crops, LAI estimates need to be reduced to adjust clumped vegetation at sub-pixels scales [44][45][46]. Lacking details for cotton plots at CAIDD, a nominal clumping factor of 0.85 was used. No clumping factor was applied to wheat or alfalfa plots.
Having solved f and LAI, the implementation of TSEB used weather data from the AZMET Coolidge station (with the exception for air temperature as noted below) and computed soil and canopy R n values. In an iterative solution-required to solve atmospheric stability equations-the soil heat flux (H soil ) was computed: where ∆, T is the soil-air temperature difference, and a single resistance term (used by METRIC) is replaced by two: one for the canopy-air transport, another for soil-air transport. A third resistance term (r x ) to regulate fluxes between soil and canopy is added to create a series resistance formulation (see [16] for details). Thus, soil evaporative heat flux (LE soil ) is computed by residual within the soil fraction: For the canopy flux fraction, TSEB models transpiration at close to potential rates as estimated by the Priestley-Taylor parameter α: where f G is the fraction of vegetation that is green (assumed to be 1.0 in this study), and R n,canopy is net radiation at the canopy. Complementary to the soil flux fraction, TSEB estimates canopy heat flux (H canopy ) by residual: when solutions of TSEB lead to negative LE canopy values, an indicator of condensation, the α term is adjusted until LE canopy equals or exceeds zero. Having obtained component latent fluxes, the total surface LE is: To obtain ET, LE fluxes from the soil and canopy sources are summed and then converted as noted in Equation (13). For the CAIDD study, the series network TSEB was implemented using methods in [16] with two exceptions. The first was to adopt median cold pixels (as computed for METRIC) for the near surface air temperature. This approach has been used in other contexts [47] and helps accommodate the fact that the nearest air temperature observations lay 50 km away from CAIDD and might have been significantly different. The second amendment was to set plant heights for wheat, cotton, and alfalfa to nominal heights as indicated in Table 3 and surface roughness over bare soil of 0.02 m. These simplifications were imposed because heights were unknown but approximations are needed to run TSEB.

VISW
The Vegetation Index for the South Western US (VISW) model is an empirical ET estimation approach based on experimental data collected in Arizona [17,18,[48][49][50], where wheat, cotton, and alfalfa crops were grown under controlled irrigation regimes and frequently monitored with proximal sensing of NDVI. The approach aims to use remotely sensed vegetation indices as a proxy for the crop coefficient k cb , thereby replacing standardized values with real-time observations. Use of empirically constrained vegetation indices to derive crop coefficients is a concept pursued previously by several researchers, including [14,[51][52][53][54][55]. The original concept for a crop coefficient is a single scaling factor: where ET • is potential evapotranspiration. In the FAO-56 [15] approach, the crop coefficient is re-partitioned into transpiration and evaporation components: where k cb , a basal crop coefficient, represents transpiration, while k e , represents surface evaporation component. The simplicity and efficacy for using crop coefficients to model, forecast, and manage crop water requirements have been noted for many years, notably documented in [15,51,56]. VISW in this study did not include soil evaporation (k e ) because doing so would require knowing irrigation dates at plot scales, information unavailable. Consequences for the simplification depends upon irrigation practices, crop type, crop stage, and time of year. The greatest impact could be low total ET estimates of 10-15% on weekly time steps (Hunsaker personal communication). However, the underestimation is less during winter or where crop canopy is full. Examples of k cb coefficients are illustrated for wheat, cotton, and a hypothetical alfalfa crop with six cuttings: A chief shortcoming of the crop coefficient methodology is its need for local calibration [15,56], which has led to suggestions by [53,54,57,58] that remote sensing observation of vegetation, via an index such as NDVI, could greatly alleviate this constraint. Adoption of VI-based ET estimation is becoming well-known and to a limited extent, commercialized or facilitated by governmental agencies at local (Crop Circle (Holland Scientific, Lincoln, NE); Greenseeker (Trimble, Sunnyvale, CA)) and regional scales (CIMIS California TOPS Satellite Irrigation Management Support (SIMS), [6], https: //ecocast.arc.nasa.gov/simsi). Vegetation indices are inherently slow reactors to changing surface conditions such drought or delayed irrigation and hence are not ideal for monitoring, forecasting or managing crop water use under water deficits. Thus, while future water shortfalls may change operations, we believe few farms in the US Southwest practice deficit irrigation. This means that the VI approaches calibrated under standard conditions could be a viable water management tool.
In this study, ET • is derived from the tall-crop (0.5 m) standardized ET equation as presented in [23]: where the notation is changed for convenience in this manuscript from 'ET SZ ' in [23] to the potential ET notation 'ET • '. The new terms are γ for the psychrometric 'constant', e s for saturation vapor pressure, e a for ambient vapor pressure, u 2 for 2-m wind speed, and numerator/denominator constants C n and C d accommodating different computational time steps and crop heights (citation Table 1 is partially reproduced here as Table 4). Computation of ET • in Equation (24) is done by following the procedures in [23] for hourly time steps. To provide a basis for comparison of ET results between models, the FAO-56 parameters for wheat, cotton, and alfalfa (Table 5) were defined based on documentation and common planting date practices in Central Arizona. The FAO-56 methodology is illustrated for wheat and cotton in Figure 3, but not for alfalfa: it is a multi-year crop with multiple and difficult-to-predict cutting times.   Transformation of vegetation indices to k cb is done empirically based on replicated field experiments of different crops and reflectance measurements with accurately calibrated radiometers or imagers. Due to variations in sensor spatial resolution, spectral band placement, sensor calibration, atmospheric clarity, and soil moisture conditions, raw NDVI values were not consistent between experiments without a normalization procedure. For the calibrations used in this study, normalization re-scales original NDVI values to lie between 0 and 1: where TM5 NDVI values were re-scaled to new values (ND * ) that lie between empirically chosen minimum (L l ) and maximum (L u ) limits, roughly defined as bare-soil and full-cover. For the three crops considered in this study, the transformation equations used are polynomials as shown in Figure 4. For wheat, a cubic equation [18,50] is used to accommodate significantly increased ET while NDVI approaches saturation: For cotton, a quadratic was adequate for experimental results [17]: For alfalfa, a quadratic was also used (Hunsaker, personal communication, based on experiments reported in [48]): NDVI normalization is needed to ensure that the full range of crop coefficients can be represented while viewing a range of different background soil colors and moisture [18]. To avoid biasing results based on local conditions, this study adopted a quantile selection approach, where all plot-mean NDVI values were aggregated by crop type-Wheat, Cotton, Alfalfa-and their resulting distributions were modeled with Beta functions. The Beta function is specified by two shape parameters, by convention identified as α and β. Because the Beta distribution domain spans 0-1, NDVI values were transformed from a nominal range found for the NDVI 2008 CAIDD data set, 0.2 to 0.85, to 0.0 to 1.0. This transformation step simplifies the function fitting process. Using empirical histograms for each crop, Beta shape parameters were determined in multiple steps. Beta shape parameters were computed by first determining sample means and variances, and then estimating α and β terms via the method of moments (Equation (29)): These estimates initialized their final determination with the maximum likelihood method, implemented via the R 'fitdistr' function found in the MASS package. Quantiles at the 10% and 90% probability levels were then computed and transformed back to source NDVI equivalents. Selection of these levels was subjective but not arbitrarily chosen: probabilities spanning a larger range of 5% and 95% were found to produce anomalously low crop coefficient values for wheat and alfalfa, while probabilities spanning a lesser range fail to reproduce representative crop coefficients for sparse cover. The normalization procedure is illustrated for wheat crops ( Figure 5). Using instances with early January planting (green lines, Figure 5A), an empirical histogram was generated ( Figure 5B), bins were re-scaled to lie between 0 and 1, a Beta distribution function was fit, quantiles at 10% and 90% probabilities identified, then indices transformed back to original NDVI space by inverting Equation (25). After the inversion, the recomputed quantiles are respectively the values for NDV I l and NDV I u . Resulting NDVI normalization parameters are shown in Table 6.

Results
In order to evaluate ET results, data quality of input weather and remote sensing needed to be assessed as they have a direct impact on the potential accuracy and variability of ET.

CAIDD Air Temperature
Accurate near surface air temperature is required by all three ET models, either as a constraint defining ET • or as a boundary condition for estimating sensible heat flux values. In 2008, no weather station was deployed within CAIDD, and the nearest station was at Coolidge [59]. To address this shortcoming, two approaches were adopted. For the thermally driven models (METRIC and TSEB), median air temperature maps were created from averaged LST values extracted from cool, well-watered vegetated canopy pixel values as identified from NDVI images. The aim was to achieve air temperature estimates within ±2 • C for every modeled satellite overpass time. Considering all available TM5 and TM7 data (respective spatial resolutions of 120 m and 60 m) and median values of automatically selected cold pixels (Figure 6), a linear model showed close correspondence between cool canopy temperatures obtained on clear sky days and AZMET Coolidge (423.0 m MSL) air temperatures adiabatically adjusted (T 2m ) for the mean CAIDD elevation (479.0 m MSL). Where correspondence was poor, as observed for four days-DOY 77, 285, 333, and 357-ET modeling was not done. These days were confirmed to be partly cloudy by USGS data (Table 1). Considering the high correlation between temperatures (Root Mean Squared Error (RMSE) ∼2.4 • C), the cold-pixel air temperature proxy was adopted for use with inputs needed for TSEB and METRIC, though not for VISW, where adiabatically adjusted AZMET air temperatures were used for ET • estimation.

Land Surface Temperature Assessment
Land surface temperature (LST) data, critical to performance of the TSEB and METRIC models, were atmospherically corrected using the radiative transfer model, MODTRAN (Spectral Sciences, Inc., Burlington, MA, USA), and atmospheric temperature and humidity profiles from Tucson radiosonde data. Considering 44 overpass times for TM5 and TM7 in 2008, results showed ∼4.5 • C temperature differences between the land surface and at-sensor values. Exceptions occurred on the same four overpass times as previously noted for estimating near surface air temperatures. Three relevant facts about atmospherically correction thermal infrared data are illustrated in Figure 7A. First, MODTRAN results show that requirements for accurate profiles are not demanding when LST values were ∼22 • C, which means that for these conditions standard profiles could be used to estimate corrections. On the other hand, accurate estimates of atmospheric conditions become increasingly important as LST values increase beyond ∼35 • C, in such cases correction errors could approach 8-10 • C when using inaccurate atmospheric profile input data. These Tucson model results show that at-sensor temperatures are almost always cooler than LST and that the range of observed temperatures is substantially less than actual surface conditions. This latter point is illustrated in Figure 7B, where the impact of temperature span reduction is seasonal. For 2008, during monsoonal conditions, satellite observations of LST are reduced by ∼6 • C.
Having performed atmospheric corrections to the TIR data, relative accuracy of TM5 data were assessed by comparing them with TM7 and ASTER data. Figure 8 shows the seasonal pattern for temperatures in 754 CAIDD plots: symbols represent mean values, bars represent ±1 standard deviation. None of the three sensors had coincidental overpass times so direct comparisons were not possible. However, a distinct LST bias was observed between TM5 and the other two sensors. Considering ASTER to be the most accurate, TM5 bias for 2008 was −3.8 • C (Table 7). This offset was applied to all TM5 images prior to running TSEB, though not for METRIC since one of its features is insensitivity to LST bias.

Evapotranspiration Results: Wheat
Remotely sensed ET estimates for durum wheat plots were assessed for 137 plots across CAIDD with planting dates spanning late 2007 to early 2008. Figure 9 shows daily estimates made from the three models, METRIC (red), TSEB (blue), and VISW (green), linearly modeled between from eight overpass days (days indicated by solid squares) to daily time steps. Had precipitation or other large weather events occurred between overpasses, a physically based infill method would need to be used for greater accuracy. VISW values were smoothed with Savitzky-Golay polynomials to approximate weekly ET patterns. The average of the models, using equal weighting, is shown as a heavy purple line. Standardized ET estimates are shown as symbols: FAO-56 (solid triangles), and USDA-SW ET values (open squares). Rainfall (×0.1 mm) is shown in light blue along the bottom.
Results show good agreement over the ∼150 growing days with peak ET of 8 mm/day reached by all models and supported by FAO-56 estimates. METRIC and TSEB estimates generally closely tracked within 1 mm, indicating no discrepancies between using contextual LST values vs. using best estimates of LST and near surface air temperatures. Discrepancies existed between thermal models and the VISW between late January and mid-February, where the latter produced low daily ET values, ∼1 mm/day, while METRIC and TSEB estimates ranged 2-3 mm/day, this difference could be due to the ability of the LST approaches to detect surface evaporation while VISW was modeled solely for transpiration. Considering consistency of estimates and their average, the USDA-SW model appears to be the outlier.
When daily ET values are accumulated (Figure 10), modeling consistency is confirmed, seasonal transpiration amounts were estimated to be 726.0, 842.8, 657.1 mm respectively for METRIC, TSEB, and VISW (Table 8). These values roughly agree with general estimates for durum wheat crop water requirements in Central Arizona to be 610 mm plus another 300-450 mm of water to accommodate irrigation inefficiencies [60]. If soil evaporation were included, VISW estimates would likely exceed 750 mm. Considering high model variabilities (∼85 mm), differences between models are likely not significant, except for FAO-56, which is relatively low by 175 mm (Table 8). Table 8. Cumulative ET results for three crops, six models, and model average. Seasonal crop water use in mm. Deviation from model average indicated in parentheses.

Evapotranspiration Results: Cotton
In contrast to winter-time estimates, results for ET modeling over cotton showed broad agreement with seasonal growth but significant inter-model biases ranging between 1 and 4 mm/day ( Figure 11). For days shortly after planting METRIC, VISW indicate 2 mm/day, while TSEB returned 4-5 mm/day, an outcome indicating that TSEB estimates high surface evaporation in April. Mid-season ET estimates were unexpectedly low, ∼6 mm/day, and much less than the 8-10 mm/day estimates provided by other models. Seasonal total ET results ( Figure 12) confirm estimation biases, with transpiration values of 803.6, 1142.8, 1001.3, respectively, for METRIC, TSEB, and VISW. These values fall within cotton ET obtained under detailed controlled experiments 2014 and 2015 in Maricopa, AZ, USA, where irrigation amounts ranged between 757-939 mm [61]. As for wheat, plot-to-plot variability of ET over cotton for METRIC was about twice that of TSEB. Plot variability for the VISW, however, was about the same as for TSEB, possibly indicative of close coupling of NDVI with LST values.
Because a greater number of cloudy days occurred for summer months, a background test using weekly MODIS Terra and Aqua data was conducted. Although the spatial resolution, 250 m for red and NIR bands, is questionable for fields typically less than 180 m in extent at CAIDD, the test could help answer the question about potential ET modeling improvement with more frequent satellite overpasses. Using the MOD09Q1 and MYD09Q1 climatology products, VISW was implemented, but without daily interpolation steps. Resulting ET estimates ( Figure 13) closely match higher spatial resolution TM5 VISW data, suggesting that frequent overpasses could be a key step for improving remote sensing modeling of ET.

Evapotranspiration Results: Alfalfa
ET estimates over alfalfa ( Figure 14) reflect a combination of the daily patterns observed over wheat and cotton, namely that there was close agreement between models in the cooler months, and significant biases in the summertime. Alfalfa is a year-round crop in Central Arizona with cuttings occurring approximately every two months, diagramatically illustrated in the hypothetical FAO-56 crop coefficient chart (Figure 4). Unfortunately for this study, temporal sampling by satellites was not sufficient to detect cutting dates with daily-weekly precision, and FAO-56 models were not run for alfalfa. Early and late year ET values ranged between 2 and 5 mm/day, while mid-summer values ranged 5-8 mm/day, less than seen in a canopy resistance study, 8-9 mm/day [62], and compatible with values obtained between September and November in 1983: a range of 4.2-7.4 mm/day reported from a Coolidge, AZ, USA study [63]. The USDA-SW model indicating peak ET values over 9 mm/day were not confirmed by remote sensing estimates. The USDA-SW model also predicts a late September ET depression from 9 to 7.5 mm/day on ∼DOY 270, a feature possibly detected by METRIC, but not confirmed by any of the other models. In general, remote sensing data overpasses were insufficiently frequent to resolve rapid ET fluctuations, meaning that the fluctuations present in the display are representations of local weather. With frequent overpasses and high spatial resolution-as observed by the Venµs satellite (https://www.theia-land.fr/en/products/venus), Sentinel 2 (https://sentinel.esa. int/web/sentinel/missions/sentinel-2) and Landsat (https://landsat.usgs.gov/landsat-science-dataproducts)-peak cover and harvest days could be detected and crop coefficients accurately mapped over time. However, TM periodicity was insufficient to resolve these patterns, which could mean a low-bias remote sensing of ET from all VI-based models. Cumulative ET estimates for alfalfa ( Figure 15) were the highest of the three examined crop types: 1317.4, 1746.1, and 1217.3 mm/day for METRIC, TSEB, and VISW respectively, in broad agreement with lysimeter data collected in Phoenix, AZ [64] and greater than observed at higher elevations in New Mexico, ∼1190 mm [65]. As for wheat results, plot-to-plot variability of ET, as estimated by METRIC and VISW were approximately twice that of TSEB estimates.

Discussion
Results from assessing multiple remote sensing data sets demonstrated strategies for handling sparse air temperatures, biased TIR satellite observation (critical for TSEB), automated LST end-member selection for the METRIC approach, and quantile selection of NDVI extremes to help standardize implementation of VI-based models. The study found that LST data over well-watered, cool canopies were reasonable proxies for near surface air temperatures, easing ET modeling problems when no local weather station is available; this addresses the problem of unavailable nearby air temperature data needed for modeling the surface energy balance. Eventually consistent, accurate, and finely gridded weather data may make this approach unnecessary. The study implemented a cross-comparison analysis of sensors-TM5, TM7, and ASTER-to improve radiometric accuracy. This improvement is similar to findings reported in [25], where multiple sensors were utilized for data fusion. By analyzing LST trends over all of 2008 observed by TM5 with respect to TM7 and ASTER, we were able to justify a 3.7 • C LST correction to TM5, a change that significantly reduced TSEB's ET difference with respect to the model average. Using atmospherically corrected TM5 data, we used an LST/NDVI quantile selection method to identify so-called 'cold' pixels, and then used those anchor points to search the neighborhood of each for 'hot' pixels. Very large (tens of thousands) samples were generated, greatly exceeding capabilities of manual end-member picking. Lastly, a related quantile selection approach for vegetation index normalization was implemented and tested. The approach is initially subjective in setting quantiles, but is objective and can be replicated for subsequent NDVI normalization. Unresolved in this study is an understanding of the impact of spatial resolution between sensors used to calibrate k cb estimates and sensors used to employ them. Data underpinning calibrations in this study are based on footprints less than 2 m in diameter with NDVI values exceeding 0.9, while satellite-based footprints were 30 m where NDVI values uncommonly exceeded 0.65. As new satellites with finer spatial and temporal sampling, as for example provided by the Venµs and ECOSTRESS missions, the scale divergence problem will be greatly reduced.
Results from this remote sensing study at CAIDD showed ET differences and uncertainties were affected by model and crop. As shown in Table 8, relative to model averaging results for cumulative ET values for all crops, METRIC was low ∼100 mm, TSEB high ∼190 mm, and VISW low ∼90 mm. These respectively represent apparent differences −10%, +18%, and −9%. For wheat, a winter crop at CAIDD, the METRIC model had superior performance with a −16 mm difference, while TSEB returned +100 mm and VISW −85 mm. For cotton, VISW came closest to ensemble average with a 19 mm bias, while METRIC and TSEB returned differences of −180 mm and +160 mm, respectively. For alfalfa, all models diverge from the ensemble average of 1427 mm by over 100 mm with differences consistent with those observed for the other two crops, i.e., METRIC and VISW were relatively low ET estimators while TSEB was relatively high. For METRIC in particular, this result is at odds with findings reported by [66], who report anomalously high ET values for a sorghum crop. Absent ground validation data, the significance of the model differences is unclear, but may be indicative of problems underlying model assumptions and sensitivities to inaccurate or infrequent inputs. In the case of METRIC, having seasonal dependency was not expected, suggesting that the horizontal temperature differencing methodology be checked for LST dependency. For TSEB, efforts to reduce TM5 LST bias effects may have been insufficient, possibly meaning that bias corrections should be adjusted on a scene-by-scene basis: bias LST corrections of 1-2 • C could readily remove any ET bias between TSEB and model averages.
Considering model differences exhibited in this study were larger than desired-one could seek differences in the range of 10%-future implementation strategies that could be considered are adopting ensemble modeling as a standard practice and creating shared standardized data sets to be used for model verification. Inputs, outputs, parameter settings, and intermediate results from runs against such data sets could be provided as a part of the publication and documentation. Currently, ET models results are reported based on the author's own implementation with no apparent means to verify consistency with other implementations of the same model namesake. Tracking down inconsistencies due to implementation differences could significantly improve community ET modeling results. In this study, a simple to implement, equally weighted model averaging approach, using three models, showed a benefit: outlier estimates were reduced and total ET estimates appear to track ET values known from experience. An additional benefit for averaging is a de-emphasis of model-intercomparisons, with a focus on accepting that each model has contributed complementary information. There are many documented remote sensing ET models that could be added to an ensemble framework, and doing so would add credibility to average ET estimates. Such a framework might not be practical on single-user platforms, but with rapidly improving IT technology and cloud-based computing, as being proposed by the OpenET organization, model sample sizes could be readily increased to 10 or more while remaining accessible for multiple users. The other strategy to address ET output differences-incorporation of model verification-needs community support, cooperation, and extensive sharing of common and well-documented data sets.

Conclusions
An ET study using remote sensing models was performed over irrigated farms in South Central Arizona and quantified daily and cumulative water use for wheat, cotton, and alfalfa. Multiple tests to evaluate input data quality were performed. Actions were taken to adjust biases due to atmospheric and sensor effects, while standardized procedures were established for LST end-member selection and normalizing vegetation indices to remove arbitrariness. Three models were implemented, two surface energy balance approaches, TSEB and METRIC, and a vegetation index approach, VISW. Differences had a seasonal dependence, where all models agreed within 18% for winter durum wheat plots, while differed up to 35% during summertime cotton and alfalfa trials. Considering that the three remote sensing models modeled ET in distinctly different ways, model averaging was found to be an objective and simple way to summarize divergent ET estimates. Future experiments will incorporate ground validation of ET to provide an independent assessment of model accuracy. Acknowledgments: The CAIDD kindly shared maps and water use data used in this study.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: