Characterizing Land Surface Phenology and Exotic Annual Grasses in Dryland Ecosystems Using Landsat and Sentinel-2 Data in Harmony

: Invasive annual grasses, such as cheatgrass (Bromus tectorum L.), have proliferated in dryland ecosystems of the western United States, promoting increased ﬁre activity and reduced biodiversity that can be detrimental to socio-environmental systems. Monitoring exotic annual grass cover and dynamics over large areas requires the use of remote sensing that can support early detection and rapid response initiatives. However, few studies have leveraged remote sensing technologies and computing frameworks capable of providing rangeland managers with maps of exotic annual grass cover at relatively high spatiotemporal resolutions and near real-time latencies. Here, we developed a system for automated mapping of invasive annual grass (%) cover using in situ observations, harmonized Landsat and Sentinel-2 (HLS) data, maps of biophysical variables, and machine learning techniques. A robust and automated cloud, cloud shadow, water, and snow / ice masking procedure (mean overall accuracy > 81%) was implemented using time-series outlier detection and data mining techniques prior to spatiotemporal interpolation of HLS data via regression tree models ( r = 0.94; mean absolute error (MAE) = 0.02). Weekly, cloud-free normalized di ﬀ erence vegetation index (NDVI) image composites (2016–2018) were used to construct a suite of spectral and phenological metrics (e.g., start and end of season dates), consistent with information derived from Moderate Resolution Image Spectroradiometer (MODIS) data. These metrics were incorporated into a data mining framework that accurately ( r = 0.83; MAE = 11) modeled and mapped exotic annual grass (%) cover throughout dryland ecosystems in the western United States at a native, 30-m spatial resolution. Our results show that inclusion of weekly HLS time-series data and derived indicators improves our ability to map exotic annual grass cover, as compared to distribution models that use MODIS products or monthly, seasonal, or annual HLS composites as primary inputs. This research ﬁlls a critical gap in our ability to e ﬀ ectively assess, manage, and monitor drylands by providing a framework that allows for an accurate and timely depiction of land surface phenology and exotic annual grass cover at spatial and temporal resolutions that are meaningful to local resource managers.


Introduction
Increasing spread of invasive vegetation, such as cheatgrass (Bromus tectorum L.), can contribute to ecosystem degradation and increased disturbance frequency within dryland ecosystems that cover roughly  [28] and Bureau of Land Management Assessment, Inventory, and Monitoring (AIM) plots (2016)(2017). Black lines indicate state boundaries and inset shows Sentinel-2 tiles and drylands in the United States. Drylands are defined by an aridity index [29], where the average ratio of annual precipitation to potential evapotranspiration (1970 to 2000) is between 0.03 and 0.5.
We developed an automated masking procedure leveraging HLS imagery and decision tree classification models because of known issues with the standard (v1.4) HLS masking layers. Training examples needed for decision tree classifiers were harvested using quality assurance (QA) band information and spectral thresholding for each Landsat and Sentinel-2 scene. Pixels identified to be clouds/shadows/water/snow/ice in the QA bands and those with a Normalized Difference Wetness Index (NDWI) ≤0.05 were given a value of 0. Conversely, pixels with high-quality information and NDWI values >0.05 were give a value of 1 because qualitative analysis indicates this to be a reasonable threshold for improving shadow and water detection. Coincident spectral information (all available bands) was extracted for randomly sampled pixel locations (n = 2000) within five Normalized Difference Vegetation Index (NDVI) clusters (natural breaks) in each scene. We used a time-series outlier filtering approach to further curate the model training databases because of identified omission errors in the standard HLS masks, where we removed NDWI values that exceeded ±25% of the median coincident seasonal value, with seasons being defined as five time periods (i.e., day of year ≤180; 180 to 200; 200 to 220; 220 to 240; ≥240).
Binary, decision classification models were developed using the open-source version (GNU) of the See5 software algorithm [30] and optimal hyperparameters (i.e., number of cases at each leaf node, tree size, and pruning confidence) were determined using f-fold cross validation. After model calibration, the models were used to classify pixels within each image. Manual-image interpretations (n = 2600) of non-clear and clear pixels were made and randomly distributed throughout space and time for an independent validation of the classified images, with an additional target sample (n = 120) within areas where there was disagreement between standard HLS mask and decision tree model outputs. We report overall, user, and producer accuracies within confusion matrices, using estimators for stratified (standard HLS mask) and simple (decision tree mask) random sampling, to depict the robustness of the classifiers on harmonized Landsat-8 and Sentinel-2 imagery [31]. The total number  [28] and Bureau of Land Management Assessment, Inventory, and Monitoring (AIM) plots (2016)(2017). Black lines indicate state boundaries and inset shows Sentinel-2 tiles and drylands in the United States. Drylands are defined by an aridity index [29], where the average ratio of annual precipitation to potential evapotranspiration (1970 to 2000) is between 0.03 and 0.5.
We developed an automated masking procedure leveraging HLS imagery and decision tree classification models because of known issues with the standard (v1.4) HLS masking layers. Training examples needed for decision tree classifiers were harvested using quality assurance (QA) band information and spectral thresholding for each Landsat and Sentinel-2 scene. Pixels identified to be clouds/shadows/water/snow/ice in the QA bands and those with a Normalized Difference Wetness Index (NDWI) ≤0.05 were given a value of 0. Conversely, pixels with high-quality information and NDWI values >0.05 were give a value of 1 because qualitative analysis indicates this to be a reasonable threshold for improving shadow and water detection. Coincident spectral information (all available bands) was extracted for randomly sampled pixel locations (n = 2000) within five Normalized Difference Vegetation Index (NDVI) clusters (natural breaks) in each scene. We used a time-series outlier filtering approach to further curate the model training databases because of identified omission errors in the standard HLS masks, where we removed NDWI values that exceeded ±25% of the median coincident seasonal value, with seasons being defined as five time periods (i.e., day of year ≤180; 180 to 200; 200 to 220; 220 to 240; ≥240).
Binary, decision classification models were developed using the open-source version (GNU) of the See5 software algorithm [30] and optimal hyperparameters (i.e., number of cases at each leaf node, tree size, and pruning confidence) were determined using f -fold cross validation. After model calibration, the models were used to classify pixels within each image. Manual-image interpretations (n = 2600) of non-clear and clear pixels were made and randomly distributed throughout space and time for an independent validation of the classified images, with an additional target sample (n = 120) within areas where there was disagreement between standard HLS mask and decision tree model outputs. We report overall, user, and producer accuracies within confusion matrices, using estimators for stratified (standard HLS mask) and simple (decision tree mask) random sampling, to depict the robustness of the classifiers on harmonized Landsat-8 and Sentinel-2 imagery [31]. The total number of valid HLS observations at each pixel in the study area is shown in Figure 1 and is a useful tool for error diagnostics.

Automated Time-Series Modeling and Metrics
We used a regression tree modeling framework (similar to [32]) for the spatial estimation of weekly NDVI values derived from HLS data acquired from 2016 to 2018. This framework relies on: (1) the generation of spatial inputs (e.g., weekly HLS composites, week of year, and year raster data) to serve as covariates within regression tree models; (2) extraction of model tuning, training, and testing data from raw NDVI image composites; (3) hyperparameter optimization (i.e., number of rules and committees); and (4) application of calibrated and validated models to predict NDVI at each pixel and time step (e.g., weekly), as described in more detail below.
As a part of the workflow, we developed weekly NDVI composites using a pixel-based imaging compositing method to remove highly correlated data within the same week and to impute missing values using clear HLS data. Here, missing values were imputed using a weighted sum approach like [33], where a combined time and confidence weight (TC t ) was calculated as: where d is the day of year the target scene t was acquired, n is the day of year the adjacent scenes (2-3 week search window) were acquired, and q is the likelihood of an unobstructed pixel as derived from the See5 mask confidence, such that q is near 1 and 0 for clear and non-clear pixels, respectively. This approach differs slightly from [32] because of data availability issues present in our larger spatiotemporal domain. An NDVI composite was then constructed as: where NDVI t is the NDVI composite image for scene t, N is the number of scenes combined from adjacent weeks (2-3 weeks), and TC t is the weights for t described above. We then binned each scene into weeks and took the median pixel values to develop a weekly time-series dataset, herein denoted as HLS int , that served as spatial inputs into regression tree models. Regression tree model training samples (n = 25,000) were harvested from the raw, weekly composites using a stratified sampling design and NDVI clustering with five natural breaks for each composite. Raw HLS data served as the dependent variable in the regression tree models, and HLS int , week of year, and year of data served as predictor variables. Predictor NDVI values were replaced with a weekly global mean when the corresponding data were used as a reference, to prevent data leakage and force the model to make predictions based on other inputs. Regression tree models were trained and validated using time-series data from over 8 M and 2 M independent locations, respectively, where individual models were developed for 6 tiles using an overlapping-grid design to overcome computational hurdles and to mitigate spatial autocorrelation affects.
We compared weekly HLS int , regression tree predicted (HLS RT ), and expediated MODIS [34] (eMODIS) data via cross-correlation analysis to assess similarities between each of the annual time series. We report model accuracy and comparison statistics (i.e., Pearson correlation coefficient (r), root mean square rrror (RMSE)) for the entire study domain, using 5-km grid summaries, and by level III ecoregions (i.e., Blue Mountains, Eastern Cascades Slopes and Foothills, Cascades, Sierra Nevada, Central Basin and Range, Idaho Batholith, Sierra Nevada, Columbia Plateau, Central and Northern Basin and Ranges, Central California Foothills and Coastal Mountains, and Snake River Plain [28]). For ecoregion level comparisons of the various time-series data, we summarized NDVI values within homogenous areas (coefficient of variation <20%), as calculated from standard deviation and mean focal scans (9 × 9 pixels) of the average 30 m growing season NDVI, to align with the scale of the Remote Sens. 2020, 12, 725 5 of 17 weekly eMODIS products. Comparisons with eMODIS data allow for an additional independent assessment of estimated seasonal variations in NDVI and phenological transition dates.
A suite of time-series and phenological metrics was generated from the weekly HLS RT and eMODIS time-series datasets to help discriminate exotic from native vegetation and for regional comparisons. Common vegetation phenological metrics (e.g., date and NDVI value at onset of greenness, date and value of brown down, date and value of peak NDVI) were developed from individual years of weekly NDVI data (week 7-week 42) using the CropPhenology package in R [35], as average timings and indices of the absolute maximum and minimum values are useful for cheatgrass mapping [24]. Derived phenological metrics were not compared to phenocam observations because the goal was to solely develop metrics that are useful for regional exotic annual grass mapping. We compiled biophysical (i.e., Natural Resources Conservation Service Gridded Soil Survey Geographic (gSSURGO) [36] soil texture fraction and organic matter content in the first 30 cm of the soil; National Elevation Dataset (NED) [37]), and spectral data (Section 2.2) to serve as covariates within our predictive models because such factors are known to control or depict cheatgrass cover dynamics [11]. Spectral data were extracted coincident with the timing of the field observations (e.g., 2016 predictors extracted for 2016 in situ observations) because exotic annual grass cover can vary tenfold between wet and dry years [38,39] such as those evaluated in our study area ( Figure S1).
We normalized the distribution of observations by percentage cover using a boot-strapping approach, where we reshuffled the model database while undersampling (retaining) those plots with zero (greater than zero) coverage of exotic annual grass cover, because of the prevalence of zero percent exotic annual grass cover observations in the field database. We then developed parsimonious regression tree models (n = 10) that were optimized using f -fold cross validation and the resampled databases from 2016-2017. Regression tree ensembles were constructed using an open-source version (GNU) of Cubist software and relevant environmental covariates as determined using tree-based feature selection algorithms and importance weights in sequential models. We performed a sensitivity analysis using different sets of NDVI composites (i.e., week + phenometrics, month, season, annual) to identify the optimal number and frequency of remote sensing inputs. We reported cross validation errors for each model and standard accuracy metrics across the study domain. The calibrated distribution models were then used to predict cheatgrass fractional cover for each permutation. Finally, we computed the pixel-wise median, standard deviation, and 2.5 and 97.5 percentiles of annual grass cover among the suite of permutations. This yielded both a best estimate (median) and 95% confidence interval for exotic annual grass cover at 30 m resolution for the study domain for all 3 years and the resulting maps were validated using independent observations (n = 112) distributed throughout the spatiotemporal domain.

Image Masking and Validation
Manual image interpretations and classification tree output comparisons were favorable (Table 1), with mean overall accuracies of 85 ± 2% and 81 ± 2% for Landsat-8 and Sentinel-2 imagery, respectively. Standard HLS masks had slightly lower overall accuracies of 83% ± 2% and 78% ± 2% for Landsat-8 and Sentinel-2 imagery, respectively (Table 1). Mean overall accuracies within areas of model disagreement suggest a 50% overall increase in model accuracies when comparing independent interpretations to decision tree models (overall accuracy = 77 ± 7%) and standard HLS (38 ± 9%) masks. While a limited amount of independent test data was used in this comparison, this demonstrates that optimized, semi-supervised classification models significantly enhance our ability to distinguish clear and non-clear data within our study area.

Time-Series Modeling and Metrics
The use of regression tree ensembles for spatiotemporal interpolation of spectral information is straightforward to implement and results in very high accuracies from the perspective of hold-out spectral data (Table S1). A paucity of observations, specifically long-term data gaps, result in increased error rates when using simpler smoothing methods, as compared to estimates made by complex regression tree models, which strongly influence the ultimate determination of phenological metrics and fractional exotic annual grass cover at a pixel ( Figure 2). In dryland ecosystems, inter-annual variability of vegetation productivity distinct from land cover change is prominent in rangelands dominated by cheatgrass because of its link to rainfall [26,40]. Overall, we found stronger correlations between HLS RT and eMODIS data as compared to eMODIS and HLS int data ( Table 2) generated using per-pixel compositing methods that are often used to construct cloud-free time-series data for input into models of land surface phenology or land cover change (e.g., [33,41]).
Cross-correlation analysis of eMODIS and HLS RT time-series data revealed good agreement between the datasets across evergreen forests and rangelands within the study area ( Figure 3; average r = 0.74 ± 0.21; n =~36,000), despite differences in how snow was handled in each time series (i.e., masked out in the HLS time series), with an average lag of 1 week between the time series (eMODIS lagging HLS data). Inclusion of snow-impacted pixels in MODIS time series can adversely impact start of season estimates within coniferous forests that typically have little inter-annual variation in NDVI [42] and are commonly found in western ecoregions of the study area (e.g., Cascades, Sierra Nevada), hence the reason for the exclusion of snow in the HLS dataset. Correspondingly, cross-correlation analysis indicates better agreement for years with relatively less precipitation (i.e., 2016 and 2018; Figure S1), despite the lack of Sentinel-2B observations in 2016, and within homogeneous rangelands less impacted by snow (r = 0.91; Table 2). For comparison, Robinson et al. [43] used a per-pixel and climatology-driven approach to produce 16-day Landsat composites and found a lower correspondence with MODIS data within homogenous evergreen forests and shrublands (average r =~0.65) within the conterminous United States . Enhanced estimates of inter-annual NDVIs in the current study are mainly attributed to shorter revisit times of HLS data, the robust regression tree compositing and masking approach used here, and/or the known spatial inaccuracies of climatology data previously used for aiding interpolations of Landsat NDVI data. Cross-correlation analysis of eMODIS and HLSRT time-series data revealed good agreement between the datasets across evergreen forests and rangelands within the study area (Figure 3; average r = 0.74 ± 0.21; n = ~36,000), despite differences in how snow was handled in each time series (i.e., masked out in the HLS time series), with an average lag of 1 week between the time series (eMODIS lagging HLS data). Inclusion of snow-impacted pixels in MODIS time series can adversely impact start of season estimates within coniferous forests that typically have little inter-annual variation in NDVI [42] and are commonly found in western ecoregions of the study area (e.g., Cascades, Sierra Nevada), hence the reason for the exclusion of snow in the HLS dataset. Correspondingly, crosscorrelation analysis indicates better agreement for years with relatively less precipitation (i.e., 2016 and 2018; Figure S1), despite the lack of Sentinel-2B observations in 2016, and within homogeneous rangelands less impacted by snow (r = 0.91; Table 2). For comparison, Robinson et al. [43] used a per-  Table 2. Cross-correlation (Pearson's r) between eMODIS, regression tree estimated (HLS RT ), and interpolated harmonized Landsat and Sentinel-2 (HLS int ) data within rangelands stratified by level III ecoregions [28]. Bold values indicate r's greater than adjacent r values. Ecoregions substantially impacted by snowfall (i.e., Cascades, Sierra Nevada) are excluded from overall summaries. pixel and climatology-driven approach to produce 16-day Landsat composites and found a lower correspondence with MODIS data within homogenous evergreen forests and shrublands (average r = ~0.65) within the conterminous United States . Enhanced estimates of inter-annual NDVIs in the current study are mainly attributed to shorter revisit times of HLS data, the robust regression tree compositing and masking approach used here, and/or the known spatial inaccuracies of climatology data previously used for aiding interpolations of Landsat NDVI data.  Table 2. Cross-correlation (Pearson's r) between eMODIS, regression tree estimated (HLSRT), and interpolated harmonized Landsat and Sentinel-2 (HLSint) data within rangelands stratified by level III ecoregions [28]. Bold values indicate r's greater than adjacent r values. Ecoregions substantially impacted by snowfall (i.e., Cascades, Sierra Nevada) are excluded from overall summaries. Robust estimates of inter-annual variations in NDVI are a prerequisite for estimating phenological transition dates (phenophases), such as the start of the growing season. Comparisons between phenometrics derived from HLS RT and eMODIS data suggest further agreement between seasonal trajectories of NDVI across the spatial domain ( Figure 4). For example, estimated mean start of season, maximum, and end of season NDVI values show good agreement with eMODIS-derived data, with Pearson's correlation, and mean absolute errors (MAEs) ranging from 0.72 to 0.98 and from 0.02 to 0.08, respectively. Estimated timings followed similar patterns, with HLS-derived metrics generally indicating earlier start of season, maximum, and end of season timings as compared to eMODIS which is generally consistent with previous interpretations [32]. For comparison, Zhou et al. [44] used a weighted least-squares approach for temporal smoothing of HLS data and a delayed moving average technique to estimate the start of season date in grasslands along the eastern border of North Dakota and western Minnesota, with much higher data densities as compared to this study, and found fair agreement with MODIS-derived date estimates (RMSE = 11.6 days).   Phenometrics are summarized for homogeneous pixels (coefficient of variation <20%), as calculated from standard deviation and mean focal scans (9 × 9 pixels) of the average 30-m growing season NDVI, as estimated from regression tree models. Pearson's correlation coefficient (r) and mean absolute error (MAE) are shown for all years combined. Data were fit using ordinary least-squares regression (dashed line) and the 1:1 line is shown as a solid black line. NDVI is scaled (100× + 100) to an 8-bit unsigned integer.

Exotic Annual Grass Cover Modeling, Mapping, and Validation
Our automated workflow produced exotic annual grass (%) cover maps (2016-2018) and associated estimates of model variance at 30-m spatial resolution across the study domain ( Figure 5). Training and testing Pearson's correlation coefficient (0.83 and 0.64), MAE (11% and 14% cover), and relative MAE (rMAE; 0.50 and 0.70) indicate fair agreement between modeled and observed percent cover values ( Figure 6). These errors are either substantially lower or equivalent to those found in similar mapping efforts. For example, Jones et al. [15] generated estimates of annual, fractional forb, and grass cover using a random forest model that integrated some 30,000 field observations, seasonal Landsat image composites (1984 to 2017), and maps of abiotic and climatic factors and reported lower correlations between modeled and observed values when using out-of-bag summaries (r = 0.70) and independent test data (r = 0.44). Likewise, Xian and colleagues [16] used a regression tree model and Landsat Operational Land Imager (OLI) mosaics and reported fair (r = 0.56) agreement between independent field observations and model estimates of annual herbaceous cover. Similar levels of agreement (r = 0.50) were found when comparing 250-m annual invasive herbaceous cover maps, as derived from a regression tree model that used remotely sensed estimates of invasive annual herbaceous cover as model training data and MODIS data as a predictor, with six years of independent AIM plot data, despite large discrepancies in the scale of the AIM plots and spatial inputs [18]. Our automated workflow produced exotic annual grass (%) cover maps (2016-2018) and associated estimates of model variance at 30-m spatial resolution across the study domain ( Figure 5). Training and testing Pearson's correlation coefficient (0.83 and 0.64), MAE (11% and 14% cover), and relative MAE (rMAE; 0.50 and 0.70) indicate fair agreement between modeled and observed percent cover values (Figure 6). These errors are either substantially lower or equivalent to those found in similar mapping efforts. For example, Jones et al. [15] generated estimates of annual, fractional forb, and grass cover using a random forest model that integrated some 30,000 field observations, seasonal Landsat image composites (1984 to 2017), and maps of abiotic and climatic factors and reported lower correlations between modeled and observed values when using out-of-bag summaries (r = 0.70) and independent test data (r = 0.44). Likewise, Xian and colleagues [16] used a regression tree model and Landsat Operational Land Imager (OLI) mosaics and reported fair (r = 0.56) agreement between independent field observations and model estimates of annual herbaceous cover. Similar levels of agreement (r = 0.50) were found when comparing 250-m annual invasive herbaceous cover maps, as derived from a regression tree model that used remotely sensed estimates of invasive annual herbaceous cover as model training data and MODIS data as a predictor, with six years of independent AIM plot data, despite large discrepancies in the scale of the AIM plots and spatial inputs [18].  Model variances were significantly associated with plotted and mapped cover values in our study (p-value < 0.01; r = 0.61; slope = 0.06; Figure 5), and partially attributed to positively skewed observation distributions [15], despite the data normalization and boot-strap approach used here. However, shuffling the model training data while undersampling the lower bounds of the data had a bias-reducing effect as compared to nonparametric modeling schemes that did not assign weights to rarer observations (e.g., [15,46]). Similarly, Henderson and colleagues [14] used a probability-based threshold and linear correction procedure in a univariate random forest modeling scheme, which required accurate and unbiased presence-absence/probability models developed for a user-defined threshold of cover, to construct relatively unbiased estimates of annual invasive grass cover in a portion of the Great Basin. Model variances were significantly associated with plotted and mapped cover values in our study (p-value < 0.01; r = 0.61; slope = 0.06; Figure 5), and partially attributed to positively skewed observation distributions [15], despite the data normalization and boot-strap approach used here. However, shuffling the model training data while undersampling the lower bounds of the data had a bias-reducing effect as compared to nonparametric modeling schemes that did not assign weights to rarer observations (e.g., [15,46]). Similarly, Henderson and colleagues [14] used a probability-based threshold and linear correction procedure in a univariate random forest modeling scheme, which required accurate and unbiased presence-absence/probability models developed for a user-defined threshold of cover, to construct relatively unbiased estimates of annual invasive grass cover in a portion of the Great Basin.
The robustness of our distribution model was largely contingent upon the number and quality of the ground truth data, which should be collected at scales congruent with remote sensing inputs, and the predictive capacity of the environmental covariates. Sensitivity analysis revealed that models which made use of weekly NDVI composites and phenometrics significantly outperformed those that made use of monthly, seasonal, or annual image composites and ancillary data (Figure 7). This finding suggests that the temporal granularity of the remote sensing inputs significantly controls the accuracy of the species distribution models, with the timing of key phenological transitions (e.g., start of green-up and brown-down) and associated weekly spectral values being heavily used across model development (Table S2). The robustness of our distribution model was largely contingent upon the number and quality of the ground truth data, which should be collected at scales congruent with remote sensing inputs, and the predictive capacity of the environmental covariates. Sensitivity analysis revealed that models which made use of weekly NDVI composites and phenometrics significantly outperformed those that made use of monthly, seasonal, or annual image composites and ancillary data (Figure 7). This finding suggests that the temporal granularity of the remote sensing inputs significantly controls the accuracy of the species distribution models, with the timing of key phenological transitions (e.g., start of green-up and brown-down) and associated weekly spectral values being heavily used across model development (Table S2).

Time Series Modeling, Metrics, and Image Masking
Remote sensing of vegetation productivity and land surface phenology in dryland ecosystems can be immensely challenging due to pervasive cloudiness during spring months, high amounts of bare ground and biological soil crusts, high spatial heterogeneity, low vegetation signal-to-noise ratios, and irregular growing seasons related to seasonal rainfall and frequent periods of drought [47]. To circumvent data availability and scaling issues, we developed a data fusion approach that leveraged HLS data (i.e., Sentinel-2A and B, Landsat 8) and machine-learning techniques to construct weekly, cloud-free NDVI image composites for an accurate representation of seasonal cycles of vegetation greenness and estimates of phenological transition dates. Identification of clear and non-clear pixel data (i.e., data quality screening) is a prerequisite for any satellite-based remote sensing analysis and the use of optimized decision tree models resulted in improved accuracies as compared to standard HLS masks in this study (Table 1). We suspect the standard HLS masks represent a more global solution based on a physics-based pixel quality identification while our semi-supervised approach allows for scene and date-specific tuning for improved cloud, shadow, water, and snow/ice identification. It is important to note, however, that cloud and cloud shadow detection algorithms were recently improved for Landsat and Sentinel-2 data (Fmask 4.0; [48]) and such data can easily be assimilated into our workflow in the future.
We demonstrate that our regression tree compositing method is more robust than per-pixel approaches that are typically used to construct time-series data for input into models of land surface phenology or land cover change ( Figure 2 and Table 2). A key advantage of using a global versus pixel-based modeling scheme is the ability to leverage data across the spatiotemporal domain, as opposed strictly to the temporal domain, to make predictions at each pixel and time step. While our approach was mainly developed for monitoring precipitation-driven ecosystems with abrupt and flashy growth cycles, it is transferable to more temperature-driven ecosystems where vegetation synchronizes its growth with relatively consistent seasonal climatic changes using environmental cues (e.g., temperature, photoperiod). Regardless, when insufficient high-quality data are available for training during the beginning and end of the growing season, or when there are high levels of undetected noise in the data (e.g., due to inaccurate masking algorithms), the resulting function fits can still be unrealistic and may require further processing or filtering (e.g., Savitzky-Golay filter, Hampel filter) to diminish noise or artifacts. Since the advancement of remote sensing of land surface phenology is vital for better understanding drylands [49], future research should explore the benefits of using an ensemble of methodologies and spectral indices (e.g., Soil-Adjusted Vegetation Index) to characterize phenological timings (e.g., [50]), leverage residuals from curve fitting procedures to calculate measures of statistical uncertainty in the phenophases dates and associated spectral values, and make comparisons with ground-based optical measurements acquired at scales commensurate with remote sensing inputs.

Exotic Annual Grass Cover Modeling, Mapping, and Validation
Our automated workflow produced maps of exotic annual grass (%) cover associated estimates of model variance throughout dryland ecosystems of the western United States at a native, 30-m spatial resolution. Comparisons with independent field observations and error assessments reported in previous studies suggest improved accuracies that can be partially attributed to: (1) the enhanced temporal revisit times and resolutions of the HLS dataset, which allowed for the development of nearly seamless weekly NDVI data ( Figure 7); (2) consistent usage of sampling protocols across the model testing and training datasets; and/or (3) the use of other relevant ancillary data related to exotic annual grass cover (i.e., phenometrics, soils, topography). Moreover, while there are existing systems that use multi-date satellite imagery to develop fractional cover maps of annual invasive herbaceous cover [19,21,51], such approaches are generally not suited for developing near real-time estimates. These approaches, which are subject to data latency issues or a lack of an automated workflow, have made use of coarse spatial-resolution inputs (e.g., eMODIS), downscaled MODIS products, or composites constructed from Landsat data with coarse temporal granularity (i.e., 16 day), and have not capitalized on the synergistic use of seamless Landsat-8 and Sentinel-2 data for improved characterization of invasive annual grass dynamics.
With regards to scaling issues and sampling errors, we opportunistically used BLM AIM plot data that were not optimally aligned with the scale of the remote sensing inputs and were collected using point-intercept sampling rather than a complete enumeration/inventory with no sampling errors. To circumvent scaling issues, we initially gave higher weights to observations that fell within relatively homogenous areas, as identified by calculating the coefficient of variation of growing season NDVI data within a 3 × 3 moving window, but this did not improve model accuracies from the perspective of hold-out data. Likewise, previous research has shown that it can be important to train machine-learning models on mixed pixels that can occupy a large portion of the spatiotemporal domain [52]. Future studies may benefit by tackling scaling issues in other ways; for example, by upscaling more localized in situ observations (with no sampling errors) using high-resolution imagery collected via suborbital (e.g., unmanned aerial vehicle (UAV)) or orbital (e.g., CubeSat, DigitalGlobe) sensors, and then using those estimates as training data within distribution models that use moderate-resolution imagery inputs (e.g., [48]) while also tracking uncertainties throughout the entire prediction process.

Conclusions
In this study, we developed a fully automated and scalable workflow that integrates field observations, harmonized Landsat and Sentinel-2 (HLS) data, maps of landscape conditions (i.e., soils, topography), and machine-learning techniques to construct a time series of fractional estimates of exotic annual grass cover and associated uncertainties across dryland ecoregions in the western United States [53]. In doing so, we also show that Landsat 8 and Sentinel-2 data can be used in harmony to generate weekly cloud-free image composites of NDVI data and estimates of phenological transition dates comparable to those derived from eMODIS data but at scales (30 m) more relevant to local resource managers. The development and public release of spatially explicit maps of exotic annual grass (%) cover and weekly NDVI can be important assets to land managers and scientists. This workflow and derived information will be immensely useful for numerous science activities and investigations such as monitoring and modeling historical and future land-surface dynamics and biophysical conditions (e.g., biomass, drought), detecting and analyzing fuel breaks in sagebrush ecosystems [54], controlling and quantifying fire behavior, targeting grazing, mapping other rangeland components (e.g., % bare ground, shrubs, herbs), and enhancing aerial herbicide applications. Timely, accurate, and spatially detailed depictions of land surface phenology and exotic annual grass cover serve as foundations for better understanding and managing dryland systems in the face of a changing climate and disturbance regimes.
Author Contributions: N.J.P. and B.K.W. conceived and designed the study. N.J.P. wrote the original manuscript. D.D. and S.P. were responsible for automating a large portion of the data processing workflow with input from N.J.P. and B.K.W, and lastly, S.P.B. and Z.W. contributed to scientific writing and interpretations. All authors have read and agree to the published version of the manuscript.