Evaluating the Temperature Di ff erence Parameter in the SSEBop Model with Satellite-Observed Land Surface Temperature Data

The Operational Simplified Surface Energy Balance (SSEBop) model uses the principle of satellite psychrometry to produce spatially explicit actual evapotranspiration (ETa) with remotely sensed and weather data. The temperature difference (dT) in the model is a predefined parameter quantifying the difference between surface temperature at bare soil and air temperature at canopy level. Because dT is derived from the average-sky net radiation based primarily on climate data, validation of the dT estimation is critical for assuring a high-quality ETa product. We used the Moderate Resolution Imaging Spectroradiometer (MODIS) data to evaluate the SSEBop dT estimation for the conterminous United States. MODIS data (2008–2017) were processed to compute the 10-year average land surface temperature (LST) and normalized difference vegetation index (NDVI) at 1 km resolution and 8-day interval. The observed dT (dTo) was computed from the LST difference between hot (NDVI < 0.25) and cold (NDVI > 0.7) pixels within each 2◦ × 2◦ sampling block. There were enough hot and cold pixels within each block to create dTo timeseries in the West Coast and South-Central regions. The comparison of dTo and modeled dT (dTm) showed high agreement, with a bias of 0.8 K and a correlation coefficient of 0.88 on average. This study concludes that the dTm estimation from the SSEBop model is reliable, which further assures the accuracy of the ETa estimation.


Introduction
As a process of transferring moisture from the land surface into the atmosphere, evapotranspiration (ET) is an important element in the hydrologic cycle of the Earth.Measurement and estimation of ET are essential and important for investigations and applications in hydrology, ecology, climatology, agriculture, and water resources management.However, accurate estimation of ET remains a challenge because ET is driven by numerous complexed factors including radiation, vapor-pressure deficit, temperature, soil moisture, and plant stomatal conductance [1][2][3][4].Although direct ET measurement can be implemented using water budget methods (e.g., lysimeter, soil moisture depletion, and field experiment plot) and water vapor transfer methods (e.g., Bowen ratio energy budget and eddy covariance), the scale of the ET measurement is ground-based.Spatially explicit estimation of ET over a basin or a region relies on models that integrate ground information and atmospheric observations with remotely sensed data.Satellite remote sensing techniques have become increasingly attractive because they provide spatially and temporally contiguous observations from visible, infrared, and thermal infrared wavelengths for retrieving land surface parameters used for the energy balance and ET modeling.
Since the 1990s, various remote-sensing-based ET estimation models have been developed for estimating actual ET (ETa) using the surface energy balance method.Those well-known models include Surface Energy Balance Index (SEBI) [5], Two-Source Energy Balance (TSEB) [6], Surface Energy Balance Algorithm for Land (SEBAL) [7,8], Simplified Surface Energy Balance Index (S-SEBI) [9], Surface Energy Balance System (SEBS) [10], Mapping Evapotranspiration at high Resolution with Internalized Calibration (METRIC) [11,12], and Operational Simplified Surface Energy Balance (SSEBop) [13,14].Some ET models are designed with high complexity to approximate the actual hydrological mechanisms and processes for increasing model accuracy, which require more input data and measurements to calibrate parameters.However, the complexity may influence the accuracy of the model output due to the potential errors in the input data and parameters.High requirement for input data also usually impedes the applications of ET estimation at regional or national scale.Other models have simplified structures, which eliminate less important processes in computation.Although these models may sacrifice accuracy to some extent, the simplicity benefits the model in operational applications over a large area.
The Simplified Surface Energy Balance (SSEB) method developed by Senay et al., 2007 [15] is a simplified thermal-based surface energy model.The model requires setting up uniform hydro-climatic regions where climatic conditions are comparable for the hot and cold reference pixels to limit the impact of confounding factors that cause changes in land surface temperature (LST).The SSEB formulation was enhanced by Senay et al., 2011a [16] to handle the impact of topography on surface temperature using a lapse rate correction factor.Later, Senay et al., 2011b [17] introduced a revised SSEB that handles both elevation and latitude effect on surface temperature using the difference between LST and air temperature.Furthermore, the SSEB was parameterized for operational applications and renamed to SSEBop (Operational SSEB) by Senay et al., 2013 [13].The SSEBop model is formulated based on the principle of satellite psychrometry, which uses LST measured from satellite sensors to estimate the psychrometric surface equivalents of dry-bulb and wet-bulb air temperatures to calculate ET fractions and ETa for every pixel in an image.The SSEBop model uses predefined hot and cold reference boundary limits for each location and time, so that the estimation of ET fractions can be operationally implemented using the SSEB approach [13].SSEBop is one of the simplest surface energy balance models for large-scale operational ET estimation because the model directly solves for latent heat flux without the need to solve for other surface energy balance components such as net radiation, sensible heat, and ground heat fluxes [13,14].
To meet the needs for water resource use and management, the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center routinely produces historical and near-real time ETa maps at regional, national, and global level using the SSEBop model.Those ETa products include (1) global 1 km ETa and ETa anomaly (ratio of concurrent ETa to long-term median value) maps (2003 to present) at dekadal (10-day), monthly, and yearly intervals [18], (2) nationwide 1 km ETa and ETa anomaly maps for the conterminous United States (CONUS) (2001 to present) at 8-day, and monthly, and yearly intervals [18], and (3) monthly and yearly 30 m ETa maps for some focus areas such as the Upper Rio Grande Basin [18] and the southwestern CONUS [19].
The SSEBop ETa products have been comprehensively validated at field scale using in situ flux tower data and at basin scale using water balance approaches.Comparisons of the SSEBop ETa against flux tower data were found to show reasonable accuracies across different landscapes [13,14,[20][21][22][23].For example, validation of the 1 km SSEBop ETa product with individual flux towers had R 2 of 0.70-0.97across the CONUS [13].Validation of the 100 m SSEBop ETa estimates in the Colorado River Basin using eddy covariance flux towers showed good accuracy levels with R 2 ranging from 0.74 to 0.95 [14].Validations of the SSEBop ETa data at basin scale were also found to be reliable in previous studies [14,20,21] and comparable with other existing global ET datasets, such as the MOD16 ET [24,25] and the FLUXNET upscaling ET obtained by combining global flux tower records and machine-learning algorithm [26].Furthermore, the SSEBop ETa product was found to be reliable in multi-model comparison studies across various landscapes and regions [27][28][29][30][31].Although previous studies indicated a certain level of uncertainties of SSEBop ETa data in point scale in some locations, the data were more accurate at basin scale with uncertainties as low as 15-20% [22].
The SSEBop model computes ETa as a product of ET fraction generated from LST and reference ET using two key parameters, i.e., air temperature (T a ) for establishing a maximum ET limit condition (cold boundary) and predefined temperature difference (dT) between dry bare surface and canopy surface [13].In the SSEBop model, dT is calculated from the surface energy balance equation under average-sky conditions for dry bare soil where ET is assumed to be zero, and sensible heat is assumed to be maximum [13,14].Thus, dT is determined based on the assumption that the difference between the hot and cold values remains nearly constant from year-to-year for a given location and period under average-sky conditions [13,21].Because dT is a key parameter in the SSEBop model, validation of the dT estimation is critical for assuring the accuracy of the ETa products.dT represents the LST difference between dry/hot bare soil pixels and wet/cold vegetated field pixels (assumed to be equal to canopy level air temperature).Direct validation of dT data would be impractical from field measurements because of the difficulty in obtaining air temperatures of dry/hot soil and wet/cold canopy at the same location, especially across a large ET mapping area.However, observations from satellite sensors provide a convenient and flexible tool for measuring LST for dry/hot and wet/cold pixels, which can be used to estimate dT and compare it with the modeled dT.
The objective of this study was to evaluate the SSEBop dT estimates using satellite LST observations.We collected and analyzed the Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance data (2008-2017) to determine hot and cold pixels based on the normalized difference vegetation index (NDVI).We used the MODIS LST data to compute actual LST for hot and cold pixels to simulate observed dT, which was further compared with the dT data generated from the SSEBop model.We performed the evaluation of the dT products for sampling blocks across the CONUS.The degree of agreement between the satellite-observed dT and modeled dT was quantified to understand the accuracy of the SSEBop ET products.
As mentioned above, validations of the SSEBop estimates have been conducted through using flux tower data, flux tower-based gridded ETa estimates, and basin-scale water balance data [20][21][22][23] and through comparison with existing well-established ETa model estimates [27][28][29][30][31].Although flux towers provide direct ETa measurements, the limited number of tower locations in space hampers wide and comprehensive use of the data for the model validation.Other gridded ETa products, either estimated from remote sensing-based models or from flux tower upscaling algorithms, are useful for evaluating the SSEBop product at pixel level.These modeled ETa products can be utilized for ETa data evaluation through cross-model or cross-data comparisons, but they cannot be referred to as the ground-observed references in the data validation.Unlike the ETa estimates, the dT parameter generated from the SSEBop model can be directly compared with satellite-observed LST at pixel level.Because MODIS contains abundant collections of LST and other land surface products, we choose to validate the dT data instead of the ETa data in this study.This method provides an auxiliary approach for evaluating the SSEBop model in addition to the previous model validation investigations [20][21][22][23][27][28][29][30][31].

Study Area
Our study area encompasses the CONUS with a map extent of 24-50 • N and 66-126 • W (Figure 1).The topography of the CONUS is featured by moderate elevations consisting of hills and low mountains in the east, low elevations in the vast central plains (i.e., the Great Plains), and high elevations with high rugged mountain ranges in the west.According to MODIS land cover data (MCD12Q1) [32], the primary classes of vegetated land in the CONUS are cropland (19.8%), forest (15.6%), shrubland and savanna (25.7%), grassland (34.3%), and wetland (0.6%).Based on the 30-year (1981-2010) climatology data [33], the annual average air temperature is 11.6 • C and total precipitation is 780 mm, calculated as the spatial means across the CONUS.The climate varies greatly, ranging from humid-subtropical in the Southeast, humid-continental in the Northeast and Midwest, semiarid in the western Great Plains, arid in the southwestern deserts, to Mediterranean along the western coast.1).

MODIS Data
We used a suite of MODIS datasets (Collection 6) including Land Surface Reflectance, Land Surface Temperature, Active Fire, and Land Cover to compute the observed dT parameter (Table 1).All the MODIS datasets were obtained from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) managed by the Land Processes Distributed Active Archive Center (LP DAAC) [34].We downloaded the MODIS data for the CONUS map extent in the geographic coordinate system.The data included 1850 images of four MODIS products for the years from 2008 to 2017 (Table 1).1).

MODIS Data
We used a suite of MODIS datasets (Collection 6) including Land Surface Reflectance, Land Surface Temperature, Active Fire, and Land Cover to compute the observed dT parameter (Table 1).All the MODIS datasets were obtained from the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) managed by the Land Processes Distributed Active Archive Center (LP DAAC) [34].We downloaded the MODIS data for the CONUS map extent in the geographic coordinate system.The data included 1850 images of four MODIS products for the years from 2008 to 2017 (Table 1).The Aqua MODIS Surface Reflectance is the MYD09A1 level-3 product containing surface reflectance at 500 m resolution for bands 1 (620-670 nm), 2 (841-876 nm), 3 (459-479 nm), 4 (545-565 nm), 5 (1230-1250 nm), 6 (1628-1652 nm), and 7 (2105-2155 nm).The dataset is an 8-day composite, with each pixel containing the best possible observation selected based on high observation coverage, low view angle, and absence of clouds, cloud shadow, or aerosol during an 8-day period.We used the MYD09A1 State Quality Assurance (QA) layer to extract cloud, cloud shadow, cirrus, and snow/ice data for each 8-day composite.
The Aqua MODIS Land Surface Temperature is the MYD11A2 level-3 product that provides 8-day composites of LST and emissivity at 1 km resolution.Each pixel value in the data is a simple average of all LST observed within the 8-day period.The daytime LST in Kelvin degree was extracted for this study.
The MODIS Active Fire is the MYD14A2 level-3 product at 8-day interval and 1 km resolution.The 8-day composite contains fire pixels detected in each pixel during the 8-day period.The Fire Mask layer in the MYD14A2 includes three categories of confidence levels: Low confidence, nominal confidence, and high confidence.We used the nominal and high confidence levels to identify the active fire pixels.
The Aqua MCD12Q1 (Collection 6) product provides the Terra and Aqua combined yearly land cover type at 500 m resolution.We used the International Geosphere-Biosphere Programme (IGBP) classification scheme, or Land Cover Type 1 in the MCD12Q1 data to identify vegetation cover.The vegetation cover includes evergreen forest, deciduous forest, mixed forest, closed shrubland, open shrubland, woody savanna, savanna, permanent wetland, cropland, and cropland/natural vegetation mosaics, while excluding urban and built-up land, permanent snow and ice, barren, and water bodies.

Modeled dT Data
In the SSEBop model, dT is an intermediate product derived from the surface energy balance equation [13]: where Rn is the net radiation, r ah is the aerodynamic resistance to heat transfer, p a is the air density, and C p is the specific heat of air.The most important assumption of the model is that the surface energy balance process is mostly driven by available net radiation, and a decline in ET due to water stress and other factors can be quantified by differences in LST [13,14].Complete descriptions of the SSEBop model and the dT computation are found in detail in Senay et al., 2013 [13] and Senay 2018 [14].Besides a set of constant parameters, the most important variables as the input to the dT computation are the minimum air temperature (T min ) and maximum air temperature (T max ) derived from the Daymet dataset [35].Daymet is a collection of 1 km gridded estimates of daily weather parameters, including maximum and minimum air temperature, precipitation, water vapor pressure, shortwave radiation, and snow water equivalent, updated from 1980 through the most recent full calendar year for North America [35].The Daymet dataset is archived and distributed through the Oak Ridge National Laboratory Distributed Active Archive Center [36].For this study, we calculated the 10-year average (2001-2010) of daily 1 km T min and T max for the CONUS extent.The dT product is a 1 km gridded dataset estimated yearly at the daily time step, while the dT value is static for a given pixel and date.To compare the modeled data with the 8-day interval MODIS data, we converted the dT data from daily interval to 8-day mean values.To distinguish the two types of dT, we denote the MODIS-observed dT as dT o and SSEBop model-estimated dT as dT m hereafter.

Processing of MODIS Data
We processed 5 sets of MODIS data (2008 to 2017) including surface reflectance, surface reflectance QA, LST, active fire, and land cover (Table 1).All the datasets at 500 m resolution were remapped to 1 km (0.008333 • ) resolution in the geographic coordinate system.The bilinear resampling method was used for continuous data (surface reflectance), and the dominant resampling method was used for thematic data (surface reflectance QA and land cover).We created the NDVI dataset at 1 km using surface reflectance: where R NIR is the reflectance of the near-infrared band and R red is the reflectance of the red band.We chose NDVI instead of other vegetation indices because the quantitative relationships between NDVI and vegetation density have been well established from rich literature and from our ecological modeling experiences.We extracted the cloud, cloud shadow, cirrus, and snow/ice layers from the MYD09A1 Surface Reflectance QA data and the fire layer from the MYD14A2 Fire Mask data.By combining all the cloud, cloud shadow, cirrus, snow/ice, and fire layers, we created a general noise mask dataset at an 8-day interval and 1 km resolution.The NDVI and LST datasets were filtered using the noise mask to remove all poor-quality pixels from the two datasets.
Using the denoised NDVI dataset, we calculated the 10-year average (2008-2017) NDVI for each 8-day interval from 1 January to 31 December at pixel level.Because some years had missing values due to noise (clouds etc.), we calculated the temporal average NDVI for the pixels that had more than half of the total 10-year observations.We applied the same averaging method to the denoised LST data and created a 10-year average LST dataset at 8-day interval.Figure 1 shows examples of the 10-year average NDVI and LST images for the interval of 12-19 July.
The dT calculation uses the LST values selected for two different sites (described in detail in the next section).Because of the possible difference in elevation between the two sites, especially in mountain areas, LST needs to be corrected for the elevation effect with a lapse rate.The lapse rate is the rate at which temperature decreases with an increase in elevation.In this study, we used a constant lapse rate of 0.0065 K/m to correct for the elevation effect on LST [16]: where LSTc (K) is the elevation-corrected LST (K), Г (K/m) is the lapse rate, and z is the elevation (m).By applying the lapse rate to the 10-year average LST dataset, we created an elevation-corrected LST dataset.The elevation data were obtained from Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) product developed by the USGS and the National Geospatial-Intelligence Agency [37].
From the GMTED2010 product suite, we chose the mean elevation product with 30 arc-second (about 1 km) resolution using for the LST correction.We used the yearly MCD12Q1 land cover data (2008-2017) to create a stable vegetation cover map.First, all vegetation classes in Land Cover Type 1, including forest, shrubland, savanna, wetland, and cropland, were combined into a single class "vegetation".The vegetation cover output was simplified to a binary map consisting of "vegetation" and "non-vegetation" for each year.Then the vegetation/non-vegetation maps of 10 years were temporally merged into a stable vegetation cover map.In the stable vegetation map, a pixel is labeled as "vegetation" only if the pixel is consistently "vegetation" through all 10 years in the vegetation cover maps.

Calculation of Observed dT
By using the 10-year average timeseries NDVI and the stable vegetation cover data, we created yearly timeseries maps of hot and cold pixels at an 8-day interval.Here, we defined the hot pixel as NDVI < 0.25 and the cold pixel as NDVI > 0.7 for vegetation cover.The thresholds of 0.25 and 0.7 for identifying cold and hot pixels were based on our knowledge and experience with the use of satellite-derived vegetation indices in ET modeling and mapping.The LSTc values were retrieved for all hot and cold pixels from the timeseries maps.We used a square block of 2 • × 2 • as the sampling unit (approximately 172 km × 222 km at 40 • 00'N, 100 • 00'W in the geographic coordinate system) (Figure 1).There were 390 sampling blocks in the entire study area, with each block containing 240 × 240 pixels.Within each sampling block, we calculated the mean and standard deviation of LSTc for all hot pixels and all cold pixels.We experimented with different sizes of the sampling blocks, for example, 0.5 Although the smaller block reduced the impact of spatial variabilities in climatology and land cover on sampling, it decreased the probability of obtaining a desired number of hot and cold pixels within each block.
The observed dT was computed as the difference between mean LSTc for hot pixels and mean LSTc for cold pixels within a 2 • × 2 • sampling block: where dT o is the observed dT, T h is the mean LSTc value for all hot pixels in a sampling block, and T c is the mean LSTc value for all cold pixels in a sampling block.To avoid outliers in the LSTc, we required each sampling block to contain a sufficient number of both hot pixels and cold pixels that were not less than 30 counts.Thus, we created the yearly dT o timeseries at an 8-day interval for all sampling blocks where enough hot and cold pixels existed.

Comparison of Observed dT and Modelled dT
Like the dT o , we calculated the mean and standard deviation values for dT m for each same sampling block using the yearly dT m timeseries data.Therefore, the mean dT m values were compared with the mean dT o values for the yearly timeseries at 8-day interval for each sampling block, if the dT o values existed.In the data comparison, we used the metrics such as bias, root mean square error (RMSE), and Pearson correlation coefficient (r).The bias and RMSE are defined as where X is the dT o value, Y is the dT m value, and n is the number of the observations in the timeseries.When the dT o calculated from LST is used to validate the dT m simulated from the model, the dT o is referred to as the reference data.Here we assume that the reference data are less erroneous compared to the model outputs.However, because of satellite observation errors, uncertainties exist in MODIS LST, NDVI, and all other products, which will affect the data validation results.More details about the MODIS data uncertainties and the influences are discussed in Section 5.1.

Maps of Hot and Cold Pixels
Using the 10-year average NDVI data and the stable vegetation cover map, we created maps of hot and cold pixels at 8-day interval through the year.Figure 2 demonstrates some examples of the maps of the hot and cold pixels for different months from March to November.In the maps, we notice that the hot pixels are primarily distributed in the western CONUS and the cold pixels are mostly in the eastern CONUS.The area of hot pixels is relatively consistent over different seasons, although the area is bigger in the growing season (May to October) than the non-growing season.The area of cold pixels changes dramatically throughout the year.Because of the low productivity and snow cover, there are much fewer cold pixels in the non-growing season (November to April).However, cold pixels are abundant in the central, northeastern, southeastern, and the Pacific-western CONUS in the growing season.At all locations where both hot and cold pixels existed, we extracted LSTc values from the maps through the yearly timeseries, which was used to generate the maps of T h and T c .
Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 16 the eastern CONUS.The area of hot pixels is relatively consistent over different seasons, although the area is bigger in the growing season (May to October) than the non-growing season.The area of cold pixels changes dramatically throughout the year.Because of the low productivity and snow cover, there are much fewer cold pixels in the non-growing season (November to April).However, cold pixels are abundant in the central, northeastern, southeastern, and the Pacific-western CONUS in the growing season.At all locations where both hot and cold pixels existed, we extracted LSTc values from the maps through the yearly timeseries, which was used to generate the maps of Th and Tc.

Creation of dTo and dTm Timeseries
For each sampling block, we used the Th and Tc values to calculate the dTo values (Equation 4), which were then compared with the dTm values extracted from the same block.However, there were only a small number of blocks that contained 30 or more hot pixels and cold pixels, located mainly in

Creation of dT o and dT m Timeseries
For each sampling block, we used the T h and T c values to calculate the dT o values (Equation 4), which were then compared with the dT m values extracted from the same block.However, there were only a small number of blocks that contained 30 or more hot pixels and cold pixels, located mainly in the West Coast area, encompassing the states of Washington, Oregon, Idaho, and California (Figure 3a).Thus, we created the dT o timeseries at 8-day interval throughout the year only for the blocks on the West Coast (Figure 4).
Although there were rarely good sampling blocks in other areas except for the West Coast, we were able to combine two blocks that are close to each other.We used one block containing ≥30 hot pixels and another nearby block containing ≥30 cold pixels to calculate dT o values.In the South-Central CONUS, mainly in Kansas, Oklahoma, and Texas, we found several pairs of blocks containing more than 30 hot or cold pixels that can be combined to derive dT o (Figure 3b).To reduce the impact of the air temperature difference between two different blocks, we combined the blocks of hot and cold pixels from the same longitude.That is, when we selected a block of hot pixels in the west side of the region, we chose a nearby block of cold pixels directly to the east.This way, we calculated dT o values based two blocks and generated dT o timeseries at 8-day interval throughout the year (Figure 5).
The complete dT o timeseries consists of 46 observations (or 8-day intervals) for each sampling block.However, due to possible snow impacts on the MODIS data, we exclude the observations from the winter season (December to February).Therefore, the timeseries covers 35 observations from the 8th to 42nd 8-day interval (26 February to 2 December).the West Coast area, encompassing the states of Washington, Oregon, Idaho, and California (Figure 3a).Thus, we created the dTo timeseries at 8-day interval throughout the year only for the blocks on the West Coast (Figure 4).Although there were rarely good sampling blocks in other areas except for the West Coast, we were able to combine two blocks that are close to each other.We used one block containing ≥30 hot pixels and another nearby block containing ≥30 cold pixels to calculate dTo values.In the South-Central CONUS, mainly in Kansas, Oklahoma, and Texas, we found several pairs of blocks containing more than 30 hot or cold pixels that can be combined to derive dTo (Figure 3b).To reduce the impact of the air temperature difference between two different blocks, we combined the blocks of hot and cold pixels from the same longitude.That is, when we selected a block of hot pixels in the west side of the region, we chose a nearby block of cold pixels directly to the east.This way, we calculated dTo values based two blocks and generated dTo timeseries at 8-day interval throughout the year (Figure 5).
The complete dTo timeseries consists of 46 observations (or 8-day intervals) for each sampling block.However, due to possible snow impacts on the MODIS data, we exclude the observations from the winter season (December to February).Therefore, the timeseries covers 35 observations from the 8 th to 42 nd 8-day interval (26 February to 2 December).

Agreement between dTo and dTm in the West Coast Region
Figure 3a shows the locations of the sampling blocks in the West Coast region.There are 16 blocks that meet the two requirements as the samples for generating dTo timeseries: (1) each block

Agreement between dTo and dTm in the West Coast Region
Figure 3a shows the locations of the sampling blocks in the West Coast region.There are 16 blocks that meet the two requirements as the samples for generating dTo timeseries: (1) each block

Agreement between dT o and dT m in the West Coast Region
Figure 3a shows the locations of the sampling blocks in the West Coast region.There are 16 blocks that meet the two requirements as the samples for generating dT o timeseries: (1) each block contains at least 30 hot pixels and at least 30 cold pixels, and (2) the annual timeseries contains at least 2/3 of the total 35 observations (i.e., at least 24 observations) for each block.The timeseries plots of dT o and dT m for each block on the West Coast are shown in Figure 4. Comparing the temporal dT o and dT m timeseries, we found good agreement between the two variables.The overall magnitudes of dT o and dT m values are very close, although dT o shows more temporal variations than dT m .Both dT o and dT m also demonstrate strong seasonal patterns that show higher values in the summer with a peak between day of year 170-200 (late June to middle July) and lower values in spring and autumn.Most blocks present high agreement between dT o and dT m , particularly some blocks such as W07, W11, W15, and W16 showing very high agreement.Some blocks, however, show noticeable difference between dT o and dT m , for example, W01, W03-W06, and W13.The discrepancy between dT o and dT m for those blocks is probably caused by the difference in the land cover types and elevations between hot and cold pixels.Uncertainties in the MODIS NDVI and LST data may be another reason for the unstable dT o values, which are discussed in more detail in Section 5.1.
Table 2 presents the statistics of the comparison between dT o and dT m for the sampling blocks in the West Coast region.The overall difference between dT o and dT m is low as measured with the bias (1.0 K), RMSE (3.6 K), and r value (0.89).Of all sampling blocks in this region, W15 in California shows the best agreement between dT o and dT m , which has a bias of 0.8 K, RMSE of 1.6 K, and r of 0.98.

Agreement between dT o and dT m in the South-Central Region
For the South-Central region, because there were not enough hot and cold pixels within one single block, we combined one block containing all hot pixels and another containing all cold pixels to calculate dT o .Figure 3b shows six pairs of hot-pixel and cold-pixel blocks in this region to generate dT o timeseries.All hot-pixel blocks are from the western part of the region, and all cold-pixel blocks are from the eastern part.
Figure 5 shows the dT o and dT m timeseries plots, which demonstrate reasonably high agreement between dT o and dT m for all six pairs of blocks.The overall magnitudes of dT o and dT m values are very close, but the temporal variations of dT o are higher than dT m .Like dT m , dT o shows seasonal patterns with peaks in the growing season and lower values in spring and autumn.Compared with the West Coast region, the temporal dT o changes in the South-central region show more apparent local variations.This is probably caused by the relatively long distance between two blocks used for calculating dT o in this region.Table 3 provides the statistics of the agreement between dT o and dT m timeseries for the sampling blocks, including the bias, RSME, and r between two variables for the six pairs of blocks.The overall difference between dT o and dT m is low, indicated by the bias (0.5K), RMSE (1.8K), and r value (0.88).We noticed the high RMSEs for some sampling blocks from both regions, which indicate high random errors of the dT m for those sites, but the random error could also stem from the dT o calculated from MODIS LST, especially when the satellite observations are contaminated by clouds, cloud shadows, and snow etc.Our analysis was not able to distinguish between dT m random error and dT o random error.Although we assume that the dT o based on land surface measurements has the least error compared to the model outputs, satellite-based observations are subject to measurement error to a certain degree.As we compared the timeseries plots of dT o and dT m (Figures 4 and 5), dT o showed more temporal variabilities than dT m , which implied higher instability and inconsistency of the dT o data.In this case, we tend to consider that the dT o contributes more random error to the RMSE values than the dT m for some blocks.

Uncertainties of dT o Estimates
In this study, we used dT o as a reference dataset to evaluate and validate dT m .However, because dT o is derived from satellite data, there are several factors that may introduce uncertainties to the dT o observations.First, the MODIS LST data are subject to measurement and estimation errors that can be transferred to the dT o calculation.Second, MODIS NDVI data contain errors caused by background soil, atmospheric effect, nonlinear relationship to vegetation fraction, and saturation for dense vegetation cover, etc.Because NDVI values of 0.25 and 0.7 are the thresholds for identifying hot and cold pixels in this study, errors in NDVI will result in confusion in the counts of hot and cold pixels within the sampling block.Third, contaminations from clouds, cloud shadows, snow cover, and fires may influence the quality of the LST and NDVI images.Although we created a suite of data masks based on the MODIS reflectance QA and fire data to eliminate the poor-quality pixels, residuals of clouds, cloud shadows, snow cover, and fires may still exist.Fourth, the difference in temperature between different elevations may cause errors in the dT o estimation.For each sampling block, the hot and cold pixels may be located at different elevations, which will induce bias in dT o estimates if the elevation difference is substantial.Although this issue is partially resolved by using the lapse rate (Equation 3) to correct the LST in the study, the lapse rate we used is a general rate.For a specific site, the lapse rate is affected by the variations in the local temperature profile.Lastly, as the dT o values are calculated from hot and cold pixels located at different sites within a sampling block, errors in dT o may be induced if the hot and cold pixels are located at different LST gradients.The impact of the LST gradient is more noticeable in the South-Central region where the dT o calculation uses the hot and cold pixels from two separate blocks.Among all those factors, uncertainty in LST is the most influential one in the accuracy of the dT o calculation.Validation and evaluation of MODIS LST data have been well conducted and documented previously [38][39][40][41].For example, Coll et al., 2009 validated the MODIS LST (Collection 5) product over rice fields in Spain and forest sites in Germany using the temperature-based and the radiance-based methods [42].Their analysis showed an average bias of −0.3 to −0.4 K and RMSE of 0.6-0.7 K between MODIS LST and in situ LST using the two validation methods.Guillevic et al., 2012 validated the MODIS LST (Collection 5) product at an agricultural testing site in Illinois, USA, using a land surface modeling approach that scaled up ground LST measurements from field and weather stations to satellite data resolution [39].They found an absolute error of 0.0 K and RMSE of 2.0 K for the MODIS LST product.Li et al., 2014 evaluated MODIS LST product (MOD11_L2 and MYD_L2, Collection 5) in an arid area of Northwest China and reported that the product underestimated the LST for the barren surface sites, with biases varying from −3.13 K to 0.31 K and RMSEs varying from 1.04 to 3.16 K [43].Krishnan et al., 2015 compared the MODIS LST (MOD11A and MYD11A, Collection 5) data with LST measurements from infrared temperature sensors at micrometeorological tower sites and from aircraft campaigns over a mixed grassland forest site in Tennessee, USA [44].Their study showed that the MODIS LST (including both daytime and nighttime data) had a bias of −0.56 K and RMSE of 2.84 K compared to the tower-based LST records at the site.Since 2016, the newest collection (Collection 6) MODIS LST data have been produced and are available for the user community.Collection 6 has implemented refinements, resulting in better data quality than Collection 5. Duan et al., 2018 validated the MODIS LST Collection 6 product using 38 validation sites over eight worldwide land cover types [38].They found good agreement between MODIS LST and radiance-based LST values, with bias and RMSE values less than 1 K for most sites during the daytime and nighttime.As a summary of the previous studies, the bias of the MODIS LST data is generally lower than 1 K, but the random error can be up to 3 K.In our data analysis, as we calculated 10-year average LST, the temporal random errors in LST can be reduced to a certain degree.

Error Sources and Ranges of dT m Estimation
In the SSEBop model, the predefined dT m parameter is derived from the surface energy balance equation under the average sky condition where latent heat flux and ground heat flux on a daily time scale are assumed to be zero, and sensible heat is assumed to be maximum [14].According to the SSEBop formulation, the uncertainties of dT m depend not only on multiple assumptions, but also on the errors of input parameters and variables, including net radiation (Rn) on the surface, aerodynamic resistance to heat transfer (r ah ), air density (ρ a ), and specific heat air (C p ). Rn is the difference between net shortwave radiation (Rns) and net longwave radiation (Rnl), which is the function of albedo (α), incoming solar radiation (R s ), extraterrestrial solar radiation (R a ), minimum and maximum air temperature (T min and T max ), and elevation (z).Chen et al., 2016 summarized the error sources and ranges of the input parameters and variables in the SSEBop algorithm [21].For example, the error ranges are 0.5-3% for α, 1.2-10% for R a , 5-15% for T min and T max , and 0.6-3.6% for r ah (0.6-3.6%) [21].Based on the statistical analysis of the uncertainty and sensitivity of the SSEBop model using the eddy covariance measurements at 42 AmeriFlux tower sites across the CONUS, dT m is sensitive to errors of R a , α, r ah , C p , T max , T min , and z, listed in order from the most to the least sensitive [21].Their study indicates that the reduction of errors from input variables (i.e., ETa and LST) and key parameters (i.e., dT m and T max ) can significantly improve the SSEBop ETa calculation.Their study also suggests that error of the dT m be constrained to be less than ±2 to ±5 K for the growing season and less than ±1 to ±2.5 K for the non-growing season in order to limit the relative error of ETa estimation lower than 20% [21].Our current study on the evaluation of the dT m estimates with satellite-based dT o indicates that the error of the dT m is generally lower than the basic requirements for the SSEBop model.

Conclusions
The high-quality ET product generated by the SSEBop model relies decisively on the accuracy of the dT m estimates.In this study, we evaluated the dT m data using satellite observations including NDVI and LST to identify hot and cold pixels and retrieved dT o from those pixels.We generated the seasonal dT o and dT m timeseries for each sampling block across the CONUS and made comparisons between the two variables.Only in the West Coast region was there a good number of sampling blocks containing enough hot and cold pixels for generating dT o timeseries.Meanwhile, in the South-Central region, we used nearby sampling blocks that contain only hot or cold pixels to create the dT o timeseries.As a result, the evaluation of the dT m product was conducted by comparing it with the dT o values for all available sampling blocks in the West Coast and South-Central regions.The results showed high agreement between the seasonal dT o and dT m timeseries in both regions.The quantitative comparison between dT o and dT m indicated bias = 1.0 K, RMSE = 3.6 K, and r = 0.89 for the West Coast region, and bias = 0.5 K, RMSE = 1.8 K, and r = 0.88 for the South-Central region.Through this analysis, we conclude that the dT m estimation from the SSEBop model is reliable, which further assures the desirable accuracy of the ETa estimation.
As one of the two key parameters in the SSEBop model, dT defines the hot/dry limit of the model using theoretical calculations.The theoretical determination of dT enables the SSEBop model to be operationalized and applied to regional and global scales by eliminating the need for manually collecting the dry limit from images.However, the capability for the automation does not compromise the accuracy of the parameter, which could have a marked influence on the magnitude of ETa estimation.Using a commonly observed value of dT = 20 K for the peak growing season in the Southwestern United States, an error of 2 K from dT estimation could translate into a 10% error in ET estimation [13,21].Thus, reasonable agreement between theoretical and observed dT values, as discovered in this study, will not only save time in ET modeling but also improve the consistency and reliability of the ET estimation at multi-spatiotemporal scales.
Remote Sens. 2019, 11, x FOR PEER REVIEW 4 of 16 semiarid in the western Great Plains, arid in the southwestern deserts, to Mediterranean along the western coast.

Figure 2 .
Figure 2. Maps of hot and cold pixels created based on the 10-year average (2008-2017) LST and NDVI images.The label of each map indicates the dates of the beginning and the end of the 8-day interval.

Figure 2 .
Figure 2. Maps of hot and cold pixels created based on the 10-year average (2008-2017) LST and NDVI images.The label of each map indicates the dates of the beginning and the end of the 8-day interval.

Figure 3 .
Figure 3. Sampling blocks used for retrieving LST from hot and cold pixels to calculate observed dT (dTo).The date of the hot and cold pixels in the map is 12-19 July.(a) West Coast region.(b) South-Central region.

Figure 3 .
Figure 3. Sampling blocks used for retrieving LST from hot and cold pixels to calculate observed dT (dT o ).The date of the hot and cold pixels in the map is 12-19 July.(a) West Coast region.(b) South-Central region.

Figure 4 .
Figure 4. Timeseries plots of dTo and dT (dTm) for the sample blocks in the West Coast region.The label in each plot indicates the name of the sampling block shown in Figure 3.

Figure 5 .
Figure 5. Timeseries plots of dTo and dTm for the sample blocks in the South-Central region.The labels in each plot indicates the paired sampling blocks shown in Figure 3.

Figure 4 .
Figure 4. Timeseries plots of dT o and dT (dT m ) for the sample blocks in the West Coast region.The label in each plot indicates the name of the sampling block shown in Figure 3.

Figure 4 .
Figure 4. Timeseries plots of dTo and dT (dTm) for the sample blocks in the West Coast region.The label in each plot indicates the name of the sampling block shown in Figure 3.

Figure 5 .
Figure 5. Timeseries plots of dTo and dTm for the sample blocks in the South-Central region.The labels in each plot indicates the paired sampling blocks shown in Figure 3.

Figure 5 .
Figure 5. Timeseries plots of dT o and dT m for the sample blocks in the South-Central region.The labels in each plot indicates the paired sampling blocks shown in Figure 3.

Table 1 .
MODIS datasets used for developing observed temperature difference (dT).

Table 1 .
MODIS datasets used for developing observed temperature difference (dT).

Table 2 .
Agreement between modeled and observed dT values for the sample blocks in the West Coast region.

Table 3 .
Agreement between modeled and observed dT values for the sampling blocks in the South-central region.