SSEBop Evapotranspiration Estimates Using Synthetically Derived Landsat Data from the Continuous Change Detection and Classification Algorithm

: The operational Simplified Surface Energy Balance (SSEBop) model has been utilized to generate gridded evapotranspiration data from Landsat images. These estimates are primarily driven by two sources of information: reference evapotranspiration and Landsat land surface temperature (LST) values. Hence, SSEBop is limited by the availability of Landsat data. Here, in this proof-of-concept paper, we utilize the Continuous Change Detection and Classification (CCDC) algorithm to generate synthetic Landsat data, which are then used as input for SSEBop to generate evapotranspiration estimates for six target areas in the continental United States, representing forests, shrublands, and irrigated agriculture. These synthetic land cover data are then used to generate the LST data required for SSEBop evapotranspiration estimates. The synthetic LST, evaporative fractions, and evapotranspiration data from CCDC closely mirror the phenological cycles in the observed Landsat data. Across the six sites, the median correlation in seasonal LST was 0.79, and the median correlation in seasonal evapotranspiration was 0.8. The median root mean squared error (RMSE) values were 2.82 ◦ C for LST and 0.50 mm/day for actual evapotranspiration. CCDC predictions typically underestimate the average evapotranspiration by less than 1 mm/day. The average performance of the CCDC evaporative fractions, and corresponding evapotranspiration estimates, were much better than the initial LST estimates and, therefore, promising. Future work could include bias correction to improve CCDC’s ability to accurately reproduce synthetic Landsat data during the summer, allowing for more accurate evapotranspiration estimates, and determining the ability of SSEBop to predict regional evapotranspiration at seasonal timescales based on projected land cover change from CCDC.


Introduction
Understanding spatial and temporal patterns in actual evapotranspiration (ETa) through remotely sensed data products is of increasing importance for a variety of environmental applications in the continental United States.First and foremost, satellite-derived ETa data are an integral part of drought monitoring [1][2][3].Furthermore, utilizing remotely sensed ETa data in agricultural regions can help inform stakeholders managing evaporative water loss, allowing them to optimize irrigation [4].Gridded ETa data have been used to monitor the hydrological balance in relation to forest fires [5].Additionally, gridded ETa datasets have been used to produce an evapotranspiration climatology for the state of Indiana for analyzing the regional water budget and drought's effects on cereal production [6].
To meet these increasing needs for spatial evapotranspiration data, a variety of gridded evapotranspiration data products have been developed to utilize remote sensing data from a variety of satellite platforms.These satellite-derived evapotranspiration models typically use thermal data from satellites, such as GOES, as is the case for the Atmosphere-Land Exchange Inverse (ALEXI) model [7], or Landsat for the operational Simplified Surface Energy Balance (SSEBop) model [8].Recently, ALEXI and SSEBop have been aggregated together with several other remote-sensing-based evapotranspiration models to produce the OpenET project, which provides evapotranspiration data for stakeholders in the western United States [9].Additionally, NASA has recently deployed ECOSTRESS onboard the International Space Station to capture evapotranspiration data at 70 m spatial resolution with a 1-to 5-day repeat time [10].Landsat provides higher spatial resolution at 30 m but is still limited by its 16-day repeat time.
Despite the clear need for quality ETa estimates, hydrological forecasters typically rely on reference evapotranspiration (ET 0 ) when making predictions.Several different approaches have been successfully utilized to forecast ET 0 for agricultural applications, including probabilistic forecasting [11], numerical weather prediction [12,13], wavelet regression models [14], and machine learning [15,16].Additionally, ET 0 forecasts can lead to improved drought forecasts [17].However, ET 0 is typically modeled on large spatial scales [18] that may not be useful for individual stakeholders.Furthermore, ET 0 measures potential atmospheric water demand, which varies greatly from ETa in water-stressed environments.Hence, more localized forecasts of water losses through ETa are needed.Here, we take the first step necessary for combining modeled ET 0 with Landsat data to provide ETa estimates at 30 m spatial resolution and sub-seasonal timescales.
This proof-of-concept paper uses synthetic Normalized Difference Vegetation Index (NDVI) and thermal infrared Landsat data produced with the Continuous Change Detection and Classification (CCDC) algorithm [19] to generate ETa estimates at a spatial resolution of 30 m on daily timestamps using the SSEBop algorithm.On a per-pixel basis, we use CCDC's harmonic fitting algorithm to calculate synthetic surface reflectance values and brightness temperatures corresponding to Landsat overpasses at six target areas in the western United States.These synthetic surface reflectance values and brightness temperatures are then used to generate synthetic NDVI and land surface temperature (LST) data, which are input into SSEBop to derive synthetic ETa estimates.The CCDC-derived ETa estimates are validated against ETa estimates from Landsat data in six target areas, representing croplands, forest, and shrublands in the western United States.The work presented in this paper paves the way for using synthetic Landsat data to generate ETa estimates between Landsat overpasses, thus allowing for daily ETa estimates from SSEBop and addressing the limited temporal resolution offered by Landsat.Now that this paper validates the ability of CCDC to generate accurate synthetic ETa estimates through SSEBop, future work can focus on generating synthetic ETa estimates between Landsat overflies, with the goal of extending CCDC into the future to generate sub-seasonal ETa forecasts.

Calculation of ETa Data from SSEBop
For this study, we used the most recent edition of SSEBop, which uses the Forcing and Normalizing Operation (FANO) approach [20].SSEBop was initially outlined by Senay et al. [8] and uses the Surface Energy Balance Algorithm for Land [21] as its theoretical basis to derive ETa by comparing the difference between hot and cold pixels in remotely sensed surface temperatures [22].The current FANO approach allows for the accurate derivation of cold boundary (wet bulb) LST from Landsat data utilizing localized LST and NDVI values [20].From there, the evaporative fraction (ETf ) was calculated using Equation (1): where T s is the observed satellite LST, T c is the wet bulb surface temperature, and γ s is the surface psychrometric constant derived from ERA 5, or the fifth-generation ECMWF atmospheric reanalysis of the global climate [23].The ETf is then multiplied by the alfalfa reference evapotranspiration (ET r ), as determined by gridMet [24], to obtain the actual evapotranspiration (ETa; Equation (2)): The main advantage to SSEBop is that as a two-parameter model, it is easy to deploy and less prone to uncertainties that could arise from a more complex model with additional parameters.Additionally, an advantage of using this surface temperature gradient approach between hot and cold pixels in satellite images is that it does not require absolutely accurate LST, but rather depends on the magnitude of the difference between hot and cold pixels [22], as illustrated in Equation (1).Conversely, as a simple model, SSEBop does not generate other components of the energy balance, such as the sensible and ground heat fluxes.An additional drawback to SSEBop is the limited temporal resolution resulting from Landsat's 8-to 16-day repeat time and cloud cover [25].Nonetheless, SSEBop has been shown to provide accurate ETa estimates for irrigated crops in Brazil [26,27] and Egypt [28], in addition to the continental United States [20,29,30].
Here, SSEBop was run on Landsat Collection 2 Level-2 [31], which corresponds to six target areas in the western United States (discussed below).All Landsat scenes from 2019 for each of the six target areas were utilized to generate ETa estimates for comparison to the synthetic ETa estimates produced in this study.Landsat scenes with greater than 70 percent cloud clover, as identified by the Landsat Level-2 QA pixel band, or that only contained part of a target area, were excluded from this study.Additionally, this same cloud mask was used to remove all corresponding cloud-contaminated pixels from the synthetic datasets to avoid biasing the comparisons.These scenes were selected to provide a good representation of water-limited areas in the western United States, where SSEBop is widely used, while also providing a small enough sample size so that individual scenes could be visualized and analyzed in detail.

Landsat and Synthetic CCDC Data
This study utilized Analysis Ready Data (ARD) from Landsat 8 [32].Landsat ARD incorporates several preprocessing techniques that standardize data across the Landsat archive and allow for the scientific analysis of Landsat data.First, the ARD normalizes top-of-atmosphere reflectance and brightness temperature values with respect to the cosine of the zenith angle [33].Additionally, atmospheric corrections for the ARD from Landsat 8 are implemented using the Landsat Surface Reflectance Code [34] and contain a QA band that allows for the filtering of cloud-contaminated pixels [33].Further, Landsat ARD data follow the Committee on Earth Observation Satellites ARD surface reflectance Product Family Specifications [35].
The initial input for the synthetic ETa was land cover data from the Land Change Monitoring Assessment and Projection (LCMAP) project, which were derived from the CCDC algorithm.LCMAP utilizes ARD [36] after cloud-contaminated pixels have been removed using the Fmask algorithm [37,38].To generate LCMAP data, CCDC fits a harmonic model to each individual band in the Landsat ARD to generate per-pixel estimates [19].The CCDC harmonic model (Equation (3)) can then be used to generate synthetic Landsat images and predict surface reflectance for any given date [39].
Here, the harmonic models produced during the LCMAP Collection 1.2 run of CCDC were used to generate synthetic Landsat data for six target areas (discussed below) on dates corresponding to every Landsat flyover for each target area during 2019.On a per-pixel basis, synthetic surface reflectance values and brightness temperatures were calculated from CCDC using the full harmonic model shown in Equation (3) [39]: Remote Sens. 2024, 16, 1297 4 of 17 where, x: Julian date i: the ith Landsat band (i = 1, 2, 3, 4, 5, and 7) T: number of days per year (T = 365.25)a 0,i : coefficient for overall value for the ith Landsat band a 1,i , b 1,i : coefficients for intra-annual change for the ith Landsat band c 1,i : coefficient for inter-annual change (slope) for the ith Landsat band a 2,i , b 2,i : coefficients for intra-annual bimodal change for the ith Landsat band a 3,i , b 3,i : coefficients for intra-annual trimodal change for the ith Landsat band ρ(i,x) full : predicted value for the ith Landsat band at Julian date x Equation (3) was used to generate synthetic Landsat data in the thermal infrared, near-infrared (NIR), green, and red bands, which are required for running SSEBop.These pixel-level data were then assembled into raster stacks representing a time series of synthetic data that correspond to the days of Landsat overpasses at each target area.The thermal band was used to produce synthetic brightness temperatures, while the red and NIR bands were used to produce NDVI: (NIR − Red)/(NIR + Red), and the green and NIR bands were used to produce the Normalized Difference Water Index (NDWI): (Green − NIR)/(Green + NIR).A water mask was generated from the NDWI data for pixels with NDWI values greater than 0.1.These synthetic datasets were then uploaded into Google Earth Engine for integration into SSEBop model version 0.2.6 within Google Earth Engine [30].Brightness temperatures were converted to LST [22] and Tc using the FANO process [20].SSEBop was run on the resulting synthetic LST rasters with the NDWI and water masks used to identify and model open water and wet areas to prevent those pixels from being included in the Tc parameterization.Finally, the pixels of synthetic data that corresponded to cloudcontaminated pixels in the Landsat data were masked out to allow for a direct one-to-one comparison between the two datasets.

Site Selection and Study Areas
To determine how well synthetic data can be used to produce ETa estimates with SSEBop, we selected six target areas in the western United States that represent economically important land cover types of forests, irrigated croplands, and shrubland.The locations of the six target areas and their associated ecoregions [40] are shown in Figure 1, and the target areas themselves are depicted in Figure 2.For each target area, a 30-by-30 km grid of 30 m synthetic data was generated for the thermal infrared, red, green, and NIR bands required to generate the input rasters for SSEBop.For this proof-of-concept paper, the small size of the target areas limited variations in land cover, allowing for a more focused evaluation of Landsat and synthetic ETa estimates.Additionally, the target areas were intentionally kept small to limit the computational resources needed for developing multilayer synthetic datasets and to facilitate data transfers to Google Earth Engine for SSEBop.For each target site, synthetic reflectance and thermal data were generated for every day in 2019 that corresponded to Landsat flyover with less than 70% cloud cover.To avoid uncertainties in the final year of the time series, the year 2019 was selected, because LCMAP Collection 1.2 extends through 2020.
Individual target areas were selected in areas where SSEBop has been shown to perform well and validated against flux tower data [8,30], including the Central Valley of California and the Colorado River Basin [29].The land cover type in each target area was determined using data from the USGS's LCMAP database [41].In the case of agricultural areas, the United States Department of Agriculture's (USDA) 2019 Crop Scape data layer [42] was utilized to identify the crops growing in each target area.The first target area is an almond plantation in the Central Valley of California (Kern County), just northwest of Bakersville.In addition to almonds, small amounts of grapes, alfalfa, pistachios, peanuts, and corn also fall within this Californian target area.The second target area is an agricultural region in the lower Colorado River Basin, south of Phoenix, Arizona (Pinal County), and it is composed mainly of alfalfa, cotton, and corn, surrounded by desert shrubland.The third target area is in the upper Colorado River basin in southwestern Colorado (Dolores County), and it consists almost entirely of evergreen forests.The fourth target area is northeast of Eugene, Oregon (Linn County), and it consists almost entirely of evergreen forests.The fifth target area consists mostly of shrubland and a small amount of evergreen forest in the Santa Anna Pueblo Forest, just north of Albuquerque, New Mexico (Sandoval County), and it encompasses a small stretch of the Rio Grande River.The sixth and final target area is in the southwestern corner of Idaho (Owyhee County) and consists entirely of shrublands.The county-wide average temperatures and total precipitation for 2019 at each target area are presented in Table 1.
[42] was utilized to identify the crops growing in each target area.The first target area is an almond plantation in the Central Valley of California (Kern County), just northwest of Bakersville.In addition to almonds, small amounts of grapes, alfalfa, pistachios, peanuts, and corn also fall within this Californian target area.The second target area is an agricultural region in the lower Colorado River Basin, south of Phoenix, Arizona (Pinal County), and it is composed mainly of alfalfa, cotton, and corn, surrounded by desert shrubland.The third target area is in the upper Colorado River basin in southwestern Colorado (Dolores County), and it consists almost entirely of evergreen forests.The fourth target area is northeast of Eugene, Oregon (Linn County), and it consists almost entirely of evergreen forests.The fifth target area consists mostly of shrubland and a small amount of evergreen forest in the Santa Anna Pueblo Forest, just north of Albuquerque, New Mexico (Sandoval County), and it encompasses a small stretch of the Rio Grande River.The sixth and final target area is in the southwestern corner of Idaho (Owyhee County) and consists entirely of shrublands.The county-wide average temperatures and total precipitation for 2019 at each target area are presented in Table 1.

Land Surface Temperature
The first check on the synthetic data was an examination of how well they reproduce LST in comparison to observed LST from Landsat.Target-area-wide spatial averages of synthetic LST and observed LST are shown in Figure 3.The synthetic LSTs followed the general annual cycle in surface temperatures observed for 2019 for the four southernmost target areas in Arizona, California, Colorado, and New Mexico.However, the synthetic LSTs underestimated the average maximum summer temperatures for each target area and overestimated the average winter and spring surface temperatures in the Arizona and Colorado target areas.This general estimation of summertime LST stems from the lasso

Land Surface Temperature
The first check on the synthetic data was an examination of how well they reproduce LST in comparison to observed LST from Landsat.Target-area-wide spatial averages of synthetic LST and observed LST are shown in Figure 3.The synthetic LSTs followed the general annual cycle in surface temperatures observed for 2019 for the four southernmost target areas in Arizona, California, Colorado, and New Mexico.However, the synthetic LSTs underestimated the average maximum summer temperatures for each target area and overestimated the average winter and spring surface temperatures in the Arizona and Colorado target areas.This general estimation of summertime LST stems from the lasso regression algorithm [43] used by CCDC that smooths under the summertime peaks in the observed Landsat surface brightness temperatures used to generate the synthetic LST estimates.Then, as expected, the synthetic LST did not capture temperature variations in relation to weather.For the Idaho and Oregon target areas, the synthetic LST did not fully capture the seasonal cycles observed in the Landsat data; however, this did not negatively affect the ETf and ETa estimates (discussed below).
Per-pixel comparisons between the annual averages for the six target areas (Figure 4) and their differences between their overall means (Table 2) showed that synthetic LST generally underestimated the observed LST from Landsat data.This underestimate in the synthetic LST was particularly evident in the California, New Mexico, and Oregon target areas.The Arizona target area showed two clear signals for each of its land cover types.The cropland in Arizona followed the typical trend of the synthetic LST data underestimating the observed LST, whereas it overestimated LST in the surrounding desert regions.Generally, for the Colorado target area, the synthetic LST closely matched the observed LST.However, on the eastern edge of the forested Colorado target area are two regions of dense forest, where the synthetic LST overestimated the observed LST, resulting in the cluster of outliers shown in Figure 4. Idaho stood out as being the only target area where the synthetic LST overestimated the observed LST.However, because ETf estimation depends on the difference between T s and T c (Equation ( 1)) and that exact LST, CCDC was still able to produce reliable ETf estimates (discussed below).The root mean squared errors (RMSE) between the CCDC and Landsat 2019 average LSTs are shown in Table 3.
Remote Sens. 2024, 16, 1297 7 of 18 regression algorithm [43] used by CCDC that smooths under the summertime peaks in the observed Landsat surface brightness temperatures used to generate the synthetic LST estimates.Then, as expected, the synthetic LST did not capture temperature variations in relation to weather.For the Idaho and Oregon target areas, the synthetic LST did not fully capture the seasonal cycles observed in the Landsat data; however, this did not negatively affect the ETf and ETa estimates (discussed below).Per-pixel comparisons between the annual averages for the six target areas (Figure 4) and their differences between their overall means (Table 2) showed that synthetic LST generally underestimated the observed LST from Landsat data.This underestimate in the synthetic LST was particularly evident in the California, New Mexico, and Oregon target areas.The Arizona target area showed two clear signals for each of its land cover types.The cropland in Arizona followed the typical trend of the synthetic LST data underestimating the observed LST, whereas it overestimated LST in the surrounding desert regions.Generally, for the Colorado target area, the synthetic LST closely matched the observed LST.However, on the eastern edge of the forested Colorado target area are two regions of dense forest, where the synthetic LST overestimated the observed LST, resulting in the cluster of outliers shown in Figure 4. Idaho stood out as being the only target area where the synthetic LST overestimated the observed LST.However, because ETf estimation depends on the difference between Ts and Tc (Equation ( 1)) and that exact LST, CCDC was still able to produce reliable ETf estimates (discussed below).The root mean squared errors (RMSE) between the CCDC and Landsat 2019 average LSTs are shown in Table 3.

Evapotranspiration Fraction
The calculation of ETf (Equation ( 1)) normalizes the LST data into a unitless metric, leading to a much closer match between the synthetic data and those of the observations from Landsat, and minimizes the effects of weather.The synthetic ETf generally matched the annual cycles across 2019 (Figure 5) and only slightly underestimated the ETf from Landsat, particularly for the summer months.Even at the Idaho target area, where the synthetic LST estimates did not closely match the observed LST, the synthetic ETf closely matched the Landsat ETf observations.At the forested target area in Oregon, in addition to underestimating the observed ETf there were also variations in the synthetic ETf that were not apparent in the observed ETf as the result of noise in the synthetic LST data (Figure 3).These underestimates in ETf appeared due to the steeper regression slope in the inverse relationship between the synthetic LST and NDVI data than was apparent in the observations, leading to an underestimation of T c (Equation ( 1)) in the synthetic data through the FANO approach [20].
Target-area-wide averages of ETf for 2019 (Figure 6) also showed that synthetic data can produce reasonable ETf estimates and minimize the differences between annual averages of the synthetic data and Landsat observations (Table 2).As shown in Figure 6, the synthetic ETf more closely matched the Landsat observations along the 1:1 line for the New Mexico and California target areas than it did for LST.For the forested Oregon target Remote Sens. 2024, 16, 1297 9 of 17 area, the synthetic ETf underestimated the Landsat observations while closely matching the observed ETf in the Colorado target area.In the Idaho target area, the bulk of the observations followed the 1:1 line; however, there were a considerable number of outliers where the synthetic ETf overestimated the Landsat observations.These outliers appeared to be associated with the steep terrain surrounding the rivers in the Idaho target area.The Arizona target area showed a good fit between the synthetic ETf and Landsat observations for the croplands and an overestimation for the desert regions within the target area.Finally, the magnitude of the RMSE between the synthetic ETf and observed ETf (Table 3) was considerably smaller than that for the LST, indicating that the synthetic ETf estimates were a much better match to the Landsat estimates than LST.
can produce reasonable ETf estimates and minimize the differences between annual av ages of the synthetic data and Landsat observations (Table 2).As shown in Figure 6, synthetic ETf more closely matched the Landsat observations along the 1:1 line for New Mexico and California target areas than it did for LST.For the forested Oregon tar area, the synthetic ETf underestimated the Landsat observations while closely match the observed ETf in the Colorado target area.In the Idaho target area, the bulk of the servations followed the 1:1 line; however, there were a considerable number of outl where the synthetic ETf overestimated the Landsat observations.These outliers appea to be associated with the steep terrain surrounding the rivers in the Idaho target area.Arizona target area showed a good fit between the synthetic ETf and Landsat observati for the croplands and an overestimation for the desert regions within the target area.nally, the magnitude of the RMSE between the synthetic ETf and observed ETf (Tabl was considerably smaller than that for the LST, indicating that the synthetic ETf estima were a much better match to the Landsat estimates than LST.

Actual Evapotranspiration
Side-by-side comparisons of synthetic ETa and observed ETa estimates from Landsat (Figure 7) showed that the synthetically derived data can accurately depict regions of high and low ETa.However, as was the case with LST and ETf, the synthetic ETa typically underestimated the observed ETa.The one exception to this general underestimation by the synthetic ETa data occurred in the desert areas of the Arizona target area, where the synthetic ETa overestimated the observed ETa (Figure 7 and Table 2), while still underestimating ETa in the agricultural areas.

Actual Evapotranspiration
Side-by-side comparisons of synthetic ETa and observed ETa estimates from Landsat (Figure 7) showed that the synthetically derived data can accurately depict regions of high and low ETa.However, as was the case with LST and ETf, the synthetic ETa typically underestimated the observed ETa.The one exception to this general underestimation by the synthetic ETa data occurred in the desert areas of the Arizona target area, where the synthetic ETa overestimated the observed ETa (Figure 7 and Table 2), while still underestimating ETa in the agricultural areas.Time series of target area averages (Figure 8) of synthetic and Landsat-derived ETa showed that the synthetic ETa from CCDC accurately depicted seasonal cycles at all six target areas.Additionally, because the ETf was multiplied by gridMet reference evapotranspiration (Equation ( 2)), the synthetic ETa depicted variations due to weather that Time series of target area averages (Figure 8) of synthetic and Landsat-derived ETa showed that the synthetic ETa from CCDC accurately depicted seasonal cycles at all six target areas.Additionally, because the ETf was multiplied by gridMet reference evapotranspiration (Equation ( 2)), the synthetic ETa depicted variations due to weather that were present in the Landsat ETa estimates.However, the synthetic ETa generally underestimated the observed ETa during June, July, and August, with the greatest differences in the Idaho and New Mexico target areas.The forested Colorado target area showed an underestimation by the synthetic ETa in the winter months at the start of 2019, likely the result of winter snow cover.The cropland target areas of California and Arizona showed the closest matches between the synthetic ETa and Landsat observations.The synthetic ETa for the forested Oregon target area closely matched the Landsat-derived ETa, unlike the evaporative fraction, due to the ETr values from gridMet.At the annual timescale, the 2019 averages of synthetic ETa generally matched those of the observed ETa (Figure 9); however, there was a slight underestimate by the synthetic ETa (Table 2).The synthetic ETa data performed best for the agricultural regions in the California and Arizona target areas.The synthetic ETa slightly underestimated the observed ETa for the shrubland target areas of Idaho and New Mexico.The underestimation of synthetic ETa was most apparent in the forested target area in Colorado and shrubland target area in New Mexico (Table 2).Finally, the overestimation by the synthetic ETa in the Arizona desert region can be seen in the lower left of the Arizona panel in Figure 9, based on the data averaged across all of 2019.

Discussion
Synthetic LST and NDVI values from CCDC led to reasonable synthetic ETa estimates at all six target areas for the non-cloud-contaminated pixels.Although CCDC can produce synthetic land cover data for cloud-contaminated pixels, the derived LST and ETa estimates were unreliable.Clouds decreased both LST and ETa by blocking incoming solar radiation and reducing net radiation.The CCDC algorithm does not account for the influence of clouds on net radiation.Thus, we elected to mask out the cloud-contaminated pixels in the synthetic datasets, as indicated by the data gaps in Figures 3, 5 and 8, to focus on the reliable cloud-free data.
These synthetic ETa estimates were more accurate than their corresponding synthetic LST estimates because SSEBop depends on the magnitude of the difference between the hot and cold pixels in each target area to calculate the ETf (Equation (1)) and not the absolute LST values.Additionally, Equation (1) normalized the synthetic LST data and Landsat LST data into the ETf, which ranges from 0 to 1, and further minimized the differences between the synthetic and Landsat data, as indicated by the decrease in RMSE between LST and ETf (Table 3).The accurate synthetic ETf data then led to accurate synthetic ETa estimates when multiplied by the reference evapotranspiration from gridMet (Equation ( 2)).
At sufficiently large temporal and spatial scales, differences between the average synthetic ETa and average Landsat ETa estimates were less than 1 mm/day.Although accurate at large spatial and temporal scales, these synthetic ETa estimates should not be used for daily weather forecasting or at small spatial scales for individual agricultural fields.The harmonic regression used by CCDC does not capture the variations in ETa that result from irrigation (Figure 10).At both the California and Arizona cropland target areas, irrigation of individual fields shortly before the Landsat flyover resulted in lower LST values, which led to an increase in ETa that was not depicted in the synthetic LST or ETa data.Similarly, fields that were not recently irrigated exhibited higher LSTs and lower ETa values than those predicted by the synthetic datasets from CCDC.However, these spatial variations from irrigation can be averaged across the entire 900 km 2 target area (Figure 8) to produce phenologically accurate ETa estimates across the entire year (Figure 9).Additionally, annual averages of the synthetic ETa data produced spatially consistent ETa estimates (Figure 7).The spatial anomalies between the synthetic ETa estimates and observed ETa were less apparent in the shrubland and forest land cover types than in the croplands, as no irrigation occurred in shrubland and forest land.Additionally, the accuracy of the synthetic ETa data had a seasonal component.Across all six target areas, the synthetic ETa values most closely matched the Landsat ETa estimates during the spring and fall, while peak ETa was underestimated during the summer months.Future work could focus on improving the harmonic CCDC model used for producing synthetic LST and NDVI data to better match summertime Landsat observations before eventually testing its ability to make sub-seasonal ETa forecasts.More accurate synthetic LST and NDVI data may result in a regression line that more closely resembles that in the observed data and allow for more accurate synthetic ETf estimates through FANO [20], ultimately leading to more accurate synthetic ETa data.To improve the accuracy of the synthetic summer and winter time LST and NDVI values, the synthetic harmonic regression could be rerun with an improved fitting algorithm that better captures the seasonal peaks and troughs in the Landsat data.Another modeling approach that could be applied to improve the synthetic ETa data is to begin with the minimum and maximum values as fixed parameters, and then estimate the sub-annual changes as damped oscillations between these extrema [44].Rerunning the harmonic regression on a shorter timeframe, as opposed to the LCMAP study period, may improve its ability to capture Future work could focus on improving the harmonic CCDC model used for producing synthetic LST and NDVI data to better match summertime Landsat observations before eventually testing its ability to make sub-seasonal ETa forecasts.More accurate synthetic LST and NDVI data may result in a regression line that more closely resembles that in the observed data and allow for more accurate synthetic ETf estimates through FANO [20], ultimately leading to more accurate synthetic ETa data.To improve the accuracy of the synthetic summer and winter time LST and NDVI values, the synthetic harmonic regression could be rerun with an improved fitting algorithm that better captures the seasonal peaks and troughs in the Landsat data.Another modeling approach that could be applied to improve the synthetic ETa data is to begin with the minimum and maximum values as fixed parameters, and then estimate the sub-annual changes as damped oscillations between these extrema [44].Rerunning the harmonic regression on a shorter timeframe, as opposed to the LCMAP study period, may improve its ability to capture peak summertime values and thus eliminate the underestimation of summer ETa.Another approach that could improve CCDC's ability to generate accurate synthetic data for SSEBop would be to run the harmonic regression on Landsat-derived LST data [45] to generate synthetic LST, as opposed to using the synthetic surface brightness temperatures to generate the synthetic LST values required for SSEBop.
It is important to note that this research has identified several potential problems that may arise when transitioning to an operations setting, in which a continuous stream of ETa estimates is provided to end users.One large issue is related to systematic bias in T s , T c , ETf, and/or ETa.In addition to potential improvements in the CCDC algorithm, as discussed above, future research could also explore more detailed, long-period (1980s to present) ETa estimates for a few key sites.A deeper historical record could enable more detailed evaluation of the causes of biases.The other main issue that has been raised in this paper is the potential for very large ETa errors when there is a mis-match between the irrigation status of farmland.To potentially address this issue, future research could explore the possibility of dynamically updating the CCDC data to more closely represent current conditions.
The work presented here illustrates the ability of synthetic land cover data from CCDC to generate accurate ETa estimates, paving the way for new approaches to generate nearterm ETa forecasts.Once the fitting algorithm has been improved, the harmonic regression model can be used to generate synthetic land surface data to produce SSEBop-derived ETa estimates between Landsat overpasses and validated against flux tower observations.An immediate application of the improved synthetic ETa estimates would be to combine them with real data to investigate irrigation practices in arid regions, because subtracting the real data from the synthetic data clearly highlights irrigation patterns (Figure 10).Then, to make sub-seasonal ETa predictions, the harmonic regression model could be combined with meteorological imputes from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) [46] and extended six weeks into the future with a machine learning approach.Combining the improved harmonic regression model for detecting land change along with PRISM data through a machine learning approach may lead to high-quality land cover predictions that could ultimately result in quality ETa forecasts when coupled with SSEBop.

Conclusions
In conclusion, this study used CCDC data to generate synthetic Landsat data that were then used to initialize SSEBop.The CCDC-derived synthetic ETa estimates were validated against Landsat estimates at six target areas in the western United States, representing forests, croplands, and shrubland land cover types.Synthetic ETf estimates provided much closer matches to the Landsat data than those observed in LST because the ETf estimations depend on the relative distributions in LST rather than the absolute magnitude in LST, which substantially enhances performance.The synthetic ETa accurately depicted regions of high and low evapotranspiration and seasonal variations in ETa.However, the synthetic ETa typically underestimated peak summer ETa, while providing a much closer match to the Landsat ETa estimates in fall and spring.Additionally, the synthetic ETa does not reflect ETa variations due to changing irrigation practices from irrigation in agricultural areas.Future work could focus improving the harmonic fitting algorithm to better capture summertime ETa values and combine CCDC with meteorological data through machine learning to provide enhanced projections of ETa.Using CCDC to generate ETa forecasts may allow for enhanced drought prediction, allowing stakeholders and decision-makers to make more informed water management decisions in the increasingly water-stressed western United States.

Figure 1 .
Figure 1.The locations of the six target areas (white dots) in the western United States and their associated ecoregions.The Level I ecoregion data are available at ecologicalregions.info/htm/na_eco.htm(accessed on 23 February 2024).

Figure 1 .
Figure 1.The locations of the six target areas (white dots) in the western United States and their associated ecoregions.The Level I ecoregion data are available at ecologicalregions.info/htm/na_eco.htm(accessed on 23 February 2024).

Table 1 .
The county-wide mean temperatures for 2019, along with the mean annual temperature for the 1901-2000 period and the 2019 total precipitation for the six target areas.The 1901-2000 mean precipitation is the average of the yearly total precipitation amounts for 1901-2000.Climate data were courtesy of NOAA National Centers for Environmental information, Climate at a Glance: County Rankings, from: https://www.ncei.noaa.gov/access/monitoring/climate-at-aglance/county/rankings(accessed on 6 March 2024).

Figure 2 .
Figure 2. Red, green, and blue composite Landsat images of the six target areas.The Arizona target area (a) consists of irrigated croplands and desert.The California target area (b) consist mostly of irrigated almond plantations.Both the Colorado (c) and Oregon (d) target areas are largely forested.Then, both the Idaho (e) and New Mexico (f) target areas are shrubland.

Figure 2 .
Figure 2. Red, green, and blue composite Landsat images of the six target areas.The Arizona target area (a) consists of irrigated croplands and desert.The California target area (b) consist mostly of irrigated almond plantations.Both the Colorado (c) and Oregon (d) target areas are largely forested.Then, both the Idaho (e) and New Mexico (f) target areas are shrubland.

Figure 3 .
Figure 3. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) land surface temperature (LST) estimates for 2019.The yellow difference line represents the Landsat data minus the synthetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were the result of applying the same Landsat cloud mask to both datasets.

Figure 3 .Table 2 .
Figure 3. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) land surface temperature (LST) estimates for 2019.The yellow difference line represents the Landsat data minus the synthetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were the result of applying the same Landsat cloud mask to both datasets.Table 2.The differences between the 2019 target-area-wide averages in the synthetic Continuous Change Detection and Classification (CCDC) data and the Landsat observations for the land surface temperature (LST), evaporative fraction (ETf ), and actual evapotranspiration (ETa) at each target area.

Figure 4 .
Figure 4. Heat plots of synthetic land surface temperature (LST), derived from Continuous Chan Detection and Classification (CCDC) and Landsat LST estimations for 2019, showing a per-pi comparison for each target area.The blue 1:1 line represents the hypothetical perfect fit betw observed LST and synthetic LST.Note the logarithmic scale, with purple indicating high pi counts and orange indicating low pixel counts.Linear regression results are indicated by the values and are significant at p < 0.001.

Figure 4 .
Figure 4. Heat plots of synthetic land surface temperature (LST), derived from Continuous Change Detection and Classification (CCDC) and Landsat LST estimations for 2019, showing a per-pixel comparison for each target area.The blue 1:1 line represents the hypothetical perfect fit between observed LST and synthetic LST.Note the logarithmic scale, with purple indicating high pixel counts and orange indicating low pixel counts.Linear regression results are indicated by the R 2 values and are significant at p < 0.001.

Figure 5 .
Figure 5. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) evaporative f tion (ETf) estimates for 2019.The yellow difference line represents the Landsat data minus the s thetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were result of applying the same Landsat cloud mask to both datasets.

Figure 5 .
Figure 5. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) evaporative fraction (ETf ) estimates for 2019.The yellow difference line represents the Landsat data minus the synthetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were the result of applying the same Landsat cloud mask to both datasets.

Figure 6 .
Figure 6.Heat plots of synthetic evaporative fraction (ETf), derived from Continuous Change Detection and Classification (CCDC) and Landsat ETf estimations for 2019, showing a per-pixel comparison for each target area.The blue 1:1 line represents the hypothetical perfect fit between observed ETf and synthetic ETf.Note the logarithmic scale, with green indicating high pixel counts and brown indicating low pixel counts.Linear regression results are indicated by the R 2 values and are significant at p < 0.001.

Figure 6 .
Figure 6.Heat plots of synthetic evaporative fraction (ETf ), derived from Continuous Change Detection and Classification (CCDC) and Landsat ETf estimations for 2019, showing a per-pixel comparison for each target area.The blue 1:1 line represents the hypothetical perfect fit between observed ETf and synthetic ETf.Note the logarithmic scale, with green indicating high pixel counts and brown indicating low pixel counts.Linear regression results are indicated by the R 2 values and are significant at p < 0.001.Remote Sens. 2024, 16, 1297 11 of 18

Figure 7 .
Figure 7. Side-by-side comparisons of the synthetic actual evapotranspiration (ETa) data (left) derived from the Continuous Change Detection and Classification (CCDC) and the observed ETa from Landsat (right).The Arizona (top) data correspond to a Landsat overpass on 5 July 2019, and New Mexico (bottom) data are from an overpass on 30 June 2019.The location of each target area is indicated by the plus sign in the bottom left corner.

Figure 7 .
Figure 7. Side-by-side comparisons of the synthetic actual evapotranspiration (ETa) data (left) derived from the Continuous Change Detection and Classification (CCDC) and the observed ETa from Landsat (right).The Arizona (top) data correspond to a Landsat overpass on 5 July 2019, and New Mexico (bottom) data are from an overpass on 30 June 2019.The location of each target area is indicated by the plus sign in the bottom left corner.

Figure 8 .
Figure 8. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) actual evapotranspiration (ETa) estimates for 2019.The yellow difference line represents the Landsat data minus the synthetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were the result of applying the same Landsat cloud mask to both datasets.

Figure 8 .
Figure 8. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) actual evapotranspiration (ETa) estimates for 2019.The yellow difference line represents the Landsat data minus the synthetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were the result of applying the same Landsat cloud mask to both datasets.

Figure 8 .
Figure 8. Target-area-wide averages of the Landsat (purple) and synthetic (cyan) actual evapotranspiration (ETa) estimates for 2019.The yellow difference line represents the Landsat data minus the synthetic data at each timestep.Note that the gaps in both the Landsat and synthetic data were the result of applying the same Landsat cloud mask to both datasets.

Figure 9 .
Figure 9. Heat plots of synthetic actual evapotranspiration (ETa), derived from Continuous Change Detection and Classification (CCDC), and Landsat ETa estimations for 2019, showing a per-pixel comparison for each target area.The blue 1:1 line represents the hypothetical perfect fit between observed ETa and synthetic ETa.Note the logarithmic scale, with green indicating high pixel counts

Figure 9 .
Figure 9. Heat plots of synthetic actual evapotranspiration (ETa), derived from Continuous Change Detection and Classification (CCDC), and Landsat ETa estimations for 2019, showing a per-pixel comparison for each target area.The blue 1:1 line represents the hypothetical perfect fit between observed ETa and synthetic ETa.Note the logarithmic scale, with green indicating high pixel counts and purple indicating low pixel counts.Linear regression results are indicated by the R 2 values and are significant at p < 0.001.

Figure 10 .
Figure 10.Actual evapotranspiration (ETa) bias map showing the synthetic ETa estimates minus the Landsat ETa estimates for 21 July 2019, showing the large amount of spatial variation due to irrigation.The synthetic data underestimated ETa for fields that had recently been irrigated, and high ETa in the Landsat data (blue) and overestimated ETa (red) in agricultural fields that had not been irrigated recently.

Figure 10 .
Figure 10.Actual evapotranspiration (ETa) bias map showing the synthetic ETa estimates minus the Landsat ETa estimates for 21 July 2019, showing the large amount of spatial variation due to irrigation.The synthetic data underestimated ETa for fields that had recently been irrigated, and high ETa in the Landsat data (blue) and overestimated ETa (red) in agricultural fields that had not been irrigated recently.

Table 1 .
The county-wide mean temperatures for 2019, along with the mean annual temperature for the 1901-2000 period and the 2019 total precipitation for the six target areas.The 1901-2000 mean precipitation is the average of the yearly total precipitation amounts for 1901-2000.Climate data were courtesy of NOAA National Centers for Environmental information, Climate at a Glance: County Rankings, from: https://www.ncei.noaa.gov/access/monitoring/climate-at-aglance/county/rankings(accessed on 6 March 2024).

Table 3 .
Root mean squared error (RMSE) between the synthetic Continuous Change Detection and Classification (CCDC) data and Landsat observations for the 2019 averages of land surface temperature (LST), evaporative fraction (ETf ), and actual evapotranspiration (ETa).

Table 2 .
The differences between the 2019 target-area-wide averages in the synthetic Continuo Change Detection and Classification (CCDC) data and the Landsat observations for the land surf temperature (LST), evaporative fraction (ETf), and actual evapotranspiration (ETa) at each tar area.

Table 3 .
Root mean squared error (RMSE) between the synthetic Continuous Change Detection a Classification (CCDC) data and Landsat observations for the 2019 averages of land surface temp ature (LST), evaporative fraction (ETf), and actual evapotranspiration (ETa).