Spatiotemporal Analysis of Landsat-8 and Sentinel-2 Data to Support Monitoring of Dryland Ecosystems

Drylands are the habitat and source of livelihood for about two fifths of the world’s population and are highly susceptible to climate and anthropogenic change. To understand the vulnerability of drylands to changing environmental conditions, land managers need to effectively monitor rates of past change and remote sensing offers a cost-effective means to assess and manage these vast landscapes. Here, we present a novel approach to accurately monitor land-surface phenology in drylands of the Western United States using a regression tree modeling framework that combined information collected by the Operational Land Imager (OLI) onboard Landsat 8 and the Multispectral Instrument (MSI) onboard Sentinel-2. This highly-automatable approach allowed us to precisely characterize seasonal variations in spectral vegetation indices with substantial agreement between observed and predicted values (R2 = 0.98; Mean Absolute Error = 0.01). Derived phenology curves agreed with independent eMODIS phenological signatures of major land cover types (average r-value = 0.86), cheatgrass cover (average r-value = 0.96), and growing season proxies for vegetation productivity (R2 = 0.88), although a systematic bias towards earlier maturity and senescence indicates enhanced monitoring capabilities associated with the use of harmonized Landsat-8 Sentinel-2 data. Overall, our results demonstrate that observations made by the MSI and OLI can be used in conjunction to accurately characterize land-surface phenology and exclusion of imagery from either sensor drastically reduces our ability to monitor dryland environments. Given the declines in MODIS performance and forthcoming decommission with no equivalent replacement planned, data fusion approaches that integrate observations from multispectral sensors will be needed to effectively monitor dryland ecosystems. While the synthetic image stacks are expected to be locally useful, the technical approach can serve a wide variety of applications such as invasive species and drought monitoring, habitat mapping, production of phenology metrics, and land-cover change modeling.


Introduction
Drylands occupy approximately 41% of Earth's land surface and are currently home to over 38% of the world's population [1].Consequently, drylands exert significant controls on socio-environmental systems, including global energy, water, and carbon cycles [2].Enhanced warming has promoted dryland expansion throughout portions of the world [3], by increasing evaporative demand and decreasing soil moisture, and strongly modifies the capacity of dryland ecosystems to sequester carbon through vegetation shifts and warming effects on photosynthesis and decomposition [4].The spread of cheatgrass (Bromus tectorum L.) has also negatively impacted biodiversity and productivity of drylands in the western United States [5].Because vegetation dynamics exert a strong control on water, energy, and biogeochemical budgets in drylands [6], knowledge of land-surface phenology is vital for effectively monitoring dryland ecosystems and is important for both the research community and the policy community in the management of Earth's landscapes.Here, land-surface phenology is defined as the spatiotemporal development of vegetated land surfaces as imaged by spaceborne sensors.
Satellite data provide a cost-effective means for understanding land-surface phenology and vegetation characteristics across large areas.For example, Moderate Resolution Image Spectroradiometer (MODIS) data have been used to estimate vegetation type and biomass, leaf area index, primary productivity, and biophysical variables that influence climate [7][8][9].Likewise, Landsat data have been used to characterize global-scale processes such as changes in forest cover [10] and surface waters [11] at unprecedented scales.Free and open access of the Landsat archive has resulted in a deluge of new methods and data products in phenological research [12,13].However, the low temporal availability of Landsat data, due in large part to cloud and shadow coverage, continues to be a major barrier for monitoring phenology and land-cover change [14].
To overcome data availability issues, Landsat data have been fused with information collected by satellites with higher temporal resolution (e.g., MODIS) to create high-temporal, cloud-free time series [15][16][17].Likewise, Landsat images from multiple years (and sensors) can be merged into a single time series, but such methods need to consider inter-annual differences in phenology, land surface changes, and different sensor characteristics [14,18].More recently, data collected by the Operational Landsat Imagery (OLI) onboard Landsat-8 have been fused with freely-available information collected by the Multispectral Instrument (MSI) onboard Sentinel-2 [19].The combination of MSI and OLI data provides a global mean revisit interval of approximately five days [20], thereby substantially increasing monitoring capabilities of the Earth's surface, but few studies have proposed methodologies for integrating these datasets to provide cloud-free surface observations and time series data needed to monitor Earth's surface.
Here, we integrate OLI and MSI data into a novel, regression tree modeling framework to develop spatially-explicit estimates of Landsat-like Normalized Difference Vegetation Index (NDVI) throughout an arid and semi-arid environment in the Western United States.Model results were compared to observations of land and cheatgrass cover and independent data collected by the MODIS onboard Aqua.In doing so, we address the following questions: (1) Are surface reflectance data from Landsat-8 OLI and Sentinel-2 MSI data comparable in this region?(2) Can Landsat-8 OLI and Sentinel-2 MSI data be used in conjunction to accurately characterize seasonal vegetation index signatures within dryland environments?(3) To what extent does excluding MSI data impact our ability to effectively monitor land-surface phenology in drylands?This research fills a critical gap in the understanding of current monitoring capabilities of arid and semi-arid environments that cover a substantial portion of the globe.Findings are relevant to future land imaging missions (e.g., Landsat 10) and are useful for understanding key requirements and attributes needed to effectively monitor water-limited ecosystems.
Remote Sens. 2018, 10, x FOR PEER REVIEW 3 of 15 11 °C and from 160 to 810 mm, respectively, largely reflecting local elevation differences (1339-3781 m).Cheatgrass is relatively common in the lowlands of the study area [22].

Harmonized Landsat-8 Sentinel-2 Data
Harmonized Landsat-Sentinel-2 (HLS) data were used in this study and are consistent surface reflectance products derived from observations made by the Multispectral Instrument (MSI) onboard Sentinel-2 and the Operational Land Imager (OLI) onboard Landsat 8.The HLS products were generated using orthorectified top-of-atmosphere (TOA) reflectance data and have undergone atmospheric correction, geometric resampling (e.g., 30 m spatial resolution), geographic registration, BRDF normalization, and band-pass adjustments have been made to MSI data.Readers are referred to the product user's guide for a full description of the HLS data processing [23].Only data from Weeks 9 (start date 6 March 2016) to 47 (end date 24 November 2016) were considered in our analysis to avoid periods of snow cover and to capture the entire growing season.

eMODIS Data
The 250 m expedited MODIS (eMODIS) weekly NDVI composites [24] were used in this study to facilitate comparisons with derived growing season phenology.The eMODIS NDVI composites used data quality information to minimize errors associated with clouds, view angles and snow observation for each weekly NDVI composite.Reference [7] found degradation in MODIS Collect 5 Terra NDVI after about 2007 so MODIS Aqua Collect 6 data were used in this study.Instability in the Collect 6 surface reflectance, often associated with low red reflectance, caused sporadic ephemeral spikes in NDVI.We developed an algorithm to identify these erroneous NDVI spikes and treat them as cloud contaminated pixels.The de-spike algorithm flagged weekly NDVI values as cloud contaminated if the current week NDVI was at least ten percent greater than the maximum NDVI from the prior and post two weeks, and the difference from the current NDVI and both the prior and post weekly NDVI values was greater than 20%.The weekly NDVI time series data were temporally smoothed (five-week) to interpolate NDVI values across ephemeral drops typically associated with cloudy weeks [25], but the raw data and associated quality masks were used within cross-correlation analysis because smoothing and different data observation densities can shift resulting phenocurves.

Harmonized Landsat-8 Sentinel-2 Data
Harmonized Landsat-Sentinel-2 (HLS) data were used in this study and are consistent surface reflectance products derived from observations made by the Multispectral Instrument (MSI) onboard Sentinel-2 and the Operational Land Imager (OLI) onboard Landsat 8.The HLS products were generated using orthorectified top-of-atmosphere (TOA) reflectance data and have undergone atmospheric correction, geometric resampling (e.g., 30 m spatial resolution), geographic registration, BRDF normalization, and band-pass adjustments have been made to MSI data.Readers are referred to the product user's guide for a full description of the HLS data processing [23].Only data from Weeks 9 (start date 6 March 2016) to 47 (end date 24 November 2016) were considered in our analysis to avoid periods of snow cover and to capture the entire growing season.

eMODIS Data
The 250 m expedited MODIS (eMODIS) weekly NDVI composites [24] were used in this study to facilitate comparisons with derived growing season phenology.The eMODIS NDVI composites used data quality information to minimize errors associated with clouds, view angles and snow observation for each weekly NDVI composite.Reference [7] found degradation in MODIS Collect 5 Terra NDVI after about 2007 so MODIS Aqua Collect 6 data were used in this study.Instability in the Collect 6 surface reflectance, often associated with low red reflectance, caused sporadic ephemeral spikes in NDVI.We developed an algorithm to identify these erroneous NDVI spikes and treat them as cloud contaminated pixels.The de-spike algorithm flagged weekly NDVI values as cloud contaminated if the current week NDVI was at least ten percent greater than the maximum NDVI from the prior and post two weeks, and the difference from the current NDVI and both the prior and post weekly NDVI values was greater than 20%.The weekly NDVI time series data were temporally smoothed (five-week) to interpolate NDVI values across ephemeral drops typically associated with cloudy weeks [25], but the raw data and associated quality masks were used within cross-correlation analysis because smoothing and different data observation densities can shift resulting phenocurves.To facilitate eMODIS comparisons with modeled NDVI (see Section 3.3.2),weekly eMODIS NDVI values were downscaled to 30-m using approaches based on regression tree models that are developed at a 250 m resolution (Landsat NDVI is averaged up to 250 m resolution) to predict eMODIS 250-m NDVI from Landsat imagery [17,26].Regression tree models are then applied to the original 30 m Landsat spectral band data to produce an eMODIS equivalent NDVI image at the 30 m resolution.

Methods
The overall workflow for our analysis consisted of ten different steps (Figure 2) with two major components (i.e., pre-processing and time-series analysis).Pre-processing consisted of filtering imagery using standard quality metrics (i.e., cloud and shadow cover ≤ 80%, and data quality > 30%) and date of year (May-September 2016), applying image masks created by decision tree models, and cross-sensor comparisons.Time series analysis consisted of regression tree model calibration and validation for the interpolation of multi-spectral indices calculated from HLS data.Time-series predictions were compared to independent information derived from eMODIS data and estimates of cheatgrass cover to ensure phenology curves were logical.Specific details for each processing step are discussed below.To facilitate eMODIS comparisons with modeled NDVI (see Section 3.3.2),weekly eMODIS NDVI values were downscaled to 30-m using approaches based on regression tree models that are developed at a 250 m resolution (Landsat NDVI is averaged up to 250 m resolution) to predict eMODIS 250-m NDVI from Landsat imagery [17,26].Regression tree models are then applied to the original 30 m Landsat spectral band data to produce an eMODIS equivalent NDVI image at the 30 m resolution.

Methods
The overall workflow for our analysis consisted of ten different steps (Figure 2) with two major components (i.e., pre-processing and time-series analysis).Pre-processing consisted of filtering imagery using standard quality metrics (i.e., cloud and shadow cover ≤ 80%, and data quality > 30%) and date of year (May-September 2016), applying image masks created by decision tree models, and cross-sensor comparisons.Time series analysis consisted of regression tree model calibration and validation for the interpolation of multi-spectral indices calculated from HLS data.Time-series predictions were compared to independent information derived from eMODIS data and estimates of cheatgrass cover to ensure phenology curves were logical.Specific details for each processing step are discussed below.

Data Masks
The mask layers distributed with the HLS data, as created using the FMask algorithm [27] and the Landsat 8 Surface Reflectance Code (LaSRC) cloud masking algorithm [28], were of insufficient quality for our time series analysis due to their inability to differentiate common features from others (i.e., playas from clouds and shadows from evergreen trees; Figure 3).This was particularly apparent for Landsat masks despite the use of the cirrus and thermal bands in the LaSRC cloud masking algorithm [29].Subsequently, we developed separate decision tree models for MSI and OLI data trained on about two million data points (distributed through space and time) to develop robust masks for clouds, shadows, water, and snow/ice pixels.Decision tree models were developed using a commercial version of the See5 software (See5: An Informal Tutorial, 27 December 2017, http://rulequest.com/see5-win.html) and validated using random hold-out samples (20%).Calibrated models were then applied to each image to mask irrelevant data.The accuracies of the masks were assessed using a random-hold out sample of interpretations of cloud, shadows, water, snow/ice that were distributed through space and time.Independent interpretations and masking layers were used to construct confusion matrices to compute overall and class specific accuracies.

Data Masks
The mask layers distributed with the HLS data, as created using the FMask algorithm [27] and the Landsat 8 Surface Reflectance Code (LaSRC) cloud masking algorithm [28], were of insufficient quality for our time series analysis due to their inability to differentiate common features from others (i.e., playas from clouds and shadows from evergreen trees; Figure 3).This was particularly apparent for Landsat masks despite the use of the cirrus and thermal bands in the LaSRC cloud masking algorithm [29].Subsequently, we developed separate decision tree models for MSI and OLI data trained on about two million data points (distributed through space and time) to develop robust masks for clouds, shadows, water, and snow/ice pixels.Decision tree models were developed using a commercial version of the See5 software (See5: An Informal Tutorial, 27 December 2017, http://rulequest.com/see5-win.html) and validated using random hold-out samples (20%).Calibrated models were then applied to each image to mask irrelevant data.The accuracies of the masks were assessed using a random-hold out sample of interpretations of cloud, shadows, water, snow/ice that were distributed through space and time.Independent interpretations and masking layers were used to construct confusion matrices to compute overall and class specific accuracies.and 2) overlaid by mask layers created using LaSRC (Top Middle [28]), FMask (Bottom Middle [27]); and (Right) decision tree models.

Cross-Sensor Calibration
We compared the radiometric characteristics of HLS bands using one-day-apart images acquired over the study area, to ensure HLS band-pass adjustments were of sufficient quality for our analysis.Small time differences reduce uncertainties related to radiometric bias caused by external factors.The selected MSI and OLI image pairs (Table 1) were generally unaffected by clouds and were further screened using masks developed for this study.We sampled approximately 50,000 different pixels across the site and compared surface reflectance data from equivalent bands (i.e., blue, green, red, near-infrared (NIR), and short-wave infrared (SWIR) 2) and the normalized difference vegetation index (NDVI) calculated using Equation ( 1) and bands B8A and B04 for MSI imagery as suggested by others [30].

Cross-Sensor Calibration
We compared the radiometric characteristics of HLS bands using one-day-apart images acquired over the study area, to ensure HLS band-pass adjustments were of sufficient quality for our analysis.Small time differences reduce uncertainties related to radiometric bias caused by external factors.The selected MSI and OLI image pairs (Table 1) were generally unaffected by clouds and were further screened using masks developed for this study.We sampled approximately 50,000 different pixels across the site and compared surface reflectance data from equivalent bands (i.e., blue, green, red, near-infrared (NIR), and short-wave infrared (SWIR) 2) and the normalized difference vegetation index (NDVI) calculated using Equation (1) and bands B8A and B04 for MSI imagery as suggested by others [30].

Cross-Sensor Calibration
We compared the radiometric characteristics of HLS bands using one-day-apart images acquired over the study area, to ensure HLS band-pass adjustments were of sufficient quality for our analysis.Small time differences reduce uncertainties related to radiometric bias caused by external factors.The selected MSI and OLI image pairs (Table 1) were generally unaffected by clouds and were further screened using masks developed for this study.We sampled approximately 50,000 different pixels across the site and compared surface reflectance data from equivalent bands (i.e., blue, green, red, near-infrared (NIR), and short-wave infrared (SWIR) 2) and the normalized difference vegetation index (NDVI) calculated using Equation (1) and bands B8A and B04 for MSI imagery as suggested by others [30].

Database Development and Modeling
Model development and testing databases were constructed by equally sampling the range of observed, mean NDVI values throughout the study period, where approximately 50,000 pixels were randomly selected and repeatedly sampled for each acquisition date.Spatial coordinates, Julian Day, and HLS-NDVI data served as explanatory variables within our modeling framework.Masked pixels were flagged as missing values in the modeling databases and retained their raw values within the imagery.Likewise, NDVI data calculated from OLI imagery were assumed missing for time steps when the values were used as the dependent variable, to avoid circularity that would result in a perfect prediction.The inclusion of spatial coordinates and day of year (Julian Day) information insured that spatial and temporal autocorrelation effects were minimized.The functional form of the model, for one case or period, can be written as: where t is time or Julian Day.Two sets of models were developed to predict NDVI throughout the study period, one including and excluding MSI data as an independent variable (Model LS and Model L ), using a boosted regression tree ensemble approach and the commercial version of the C4.5 algorithm [31].Regression trees are nonparametric models that hierarchically subdivide data using a set of conditional rules, while minimizing an objective function (e.g., Mean Absolute Error (MAE)), and fit linear regression models in each subspace.Regression tree models can approximate any function of a continuous variable, can handle missing data, and have been used for time series prediction [32].Model calibration and testing was done using cross-validation procedures and a random-hold out sample (20%), respectively.After model calibration, the rules and associated linear equations were applied to the environmental covariates to develop spatially-explicit estimates (30-m spatial resolution) of Landsat equivalent, or modeled, NDVI at weekly intervals throughout the study period.

Product Comparisons
Weekly normalized difference vegetation index (NDVI) data, as predicted from the regression tree models, were compared to: (1) the weekly 250-m time series of masked eMODIS data using cross-correlation analysis (i.e., 1-to-1 mapping and lagged); (2) average growing season (April-September) NDVI (250-m) calculated from the smoothed eMODIS time series; and (3) downscaled (30-m) eMODIS products (NDVI at Weeks 22 and 23) using weighted least square regression analysis.Weekly eMODIS and predicted NDVI data were summarized and compared for major land cover types ( [21]; resampled to 250-m using nearest neighbor interpolation) and percentage cheatgrass cover prediction (250-m) made throughout the study area [22], to determine if the modeled data could logically capture land-surface dynamics throughout the growing season.The average growing season NDVI was computed from each dataset and compared at the native spatial resolution of the reference datasets (i.e., 250-m) using weighted (by the inverse of the coefficient of variation calculated using 3 × 3 moving window approach on the reference dataset) least squares regression analysis for about 10,000 pixels distributed throughout the study area.These comparisons were conducted using weekly interpolations by models with and without Sentinel-2A data, to assess the added benefits of including Sentinel data in the modeling framework.Statistical metrics (i.e., Adjusted R 2 , MAE, relative MAE, and Root Mean Square Error (RMSE)) were used to determine agreement and assess model performance.

Data Masks
The overall accuracy for both decision tree masking models (i.e., OLI and MSI) was above 99%, with similar commission and omission error rates, despite that lack of cirrus and thermal bands in MSI data (Table 2).These accuracy metrics far exceed those of the HLS mask layers, as calculated from a subset of these interpretations made on three dates of imagery for each sensor, where OLI and MSI masks (converted to binary) had an overall accuracy of 76% and 89%, respectively (Table S1).

Cross-Sensor Calibration
Analysis of one-day-apart acquisitions (Figure 4) showed strong 1:1 agreement for NDVI, blue, green, red, and near-infrared bands (R 2 from 0.89 to 0.98), with minor bias only observed in short-wave infrared 2 spectrum (R 2 = 0.95), after accounting for contaminated pixels (i.e., cloud, shadow, snow, and water).This suggests that Sentinel-2 MSI data can be used in conjunction with Landsat-8 OLI data, without further band-pass adjustments, with minimal impacts on NDVI time series analysis discussed below.

Time-Series Analysis
Overall, there was a very strong correlation (R 2 = 0.99) between Model LS predictions and randomly-withheld Landsat-NDVI observations, with little difference in error metrics for each acquisition date (Table 3).Correlations between observed and predicted values were lowest (R 2 > 0.92) for the first time step (model not constrained by data prior to Day of Year 64) and for three consecutive acquisitions dates (i.e., Day of Year 112-144) with high cloud and shadow cover (R 2 = 0.96).There was little to no decrease in predictive performance when Sentinel-2 MSI data were excluded as an explanatory variable during Model L development (Table S2).While Model L performs well on withheld Landsat observation throughout the year, this does not necessarily mean it accurately characterizes greenness curves throughout the entire year, as is demonstrated below.
We found strong correlations between modeled NDVI and coarser eMODIS observations for major land cover types in the study area (Figure 5), with substantially higher agreement between Model LS and eMODIS time series, as compared to estimates made by Model L .The poorest agreement between Model LS and eMODIS NDVI (r-value of 0.67) correspond to time series in barren land cover types that are subject to water impoundment, as was identified from the time series data.The highest agreements occurred in Shrub/Scrub and Evergreen Forests (r-value = 0.94), followed by Pasture/Hay (0.92), and Grassland/Herbaceous (0.83) land cover types.Interestingly, Model LS NDVI values peaked earlier (~2 weeks) than eMODIS values for grasslands and shrubs, although the maximum correlation was obtained using 1-to-1 mapping.Agreement between eMODIS and Model L estimates were highest for Evergreen Forests (r-value = 0.63), followed by Barren (0.59), Pasture/Hay (0.55), Grassland/Herbaceous (0.49), and Shrub/Scrub (0.19).
Figure 5.Time series of Harmonized Landsat-8 Sentinel-2 (HLS), predicted, and eMODIS Normalized Difference Vegetation Index (NDVI  100) separated by major land cover classes [21].ModelLS was generated using both Landsat-8 and Sentinel-2A inputs, while ModelL was generated using only Landsat-8 inputs.Each time series is calculated from all pixels (excluding masked pixels in the original HLS time series) representing largely homogenous areas (coefficient of variation < 40%) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.
When estimates are stratified by percentage cheatgrass cover across the entire study area, we find good agreement between ModelLS and eMODIS time series data, where peak NDVI and similarity metrics vary as a function of percentage cheatgrass cover (Figure 6).That is, maximum correlations were found for sites with low (r-value = 0.95), moderate (0.96), and high cheatgrass cover (0.98) assuming no lag.Conversely, there was very poor agreement between ModelL and eMODIS time series data within low (r-value = 0.10), moderate (0.14), and high cheatgrass cover (0.33) (Figure 6).
When estimates are averaged across growing season months, we find substantially higher agreement between ModelLS and eMODIS values, as compared to ModelL estimates (R 2 = 0.88 versus R 2 = 0.72; Figure 7).We also found much higher agreement between weekly eMODIS and interpolated data (ModelLS) for two weeks (Weeks 22 and 23) during peak-growing season, as compared to downscaled eMODIS products (Figure 8).The slight lag between eMODIS and modeled NDVI data  [21].Model LS was generated using both Landsat-8 and Sentinel-2A inputs, while Model L was generated using only Landsat-8 inputs.Each time series is calculated from all pixels (excluding masked pixels in the original HLS time series) representing largely homogenous areas (coefficient of variation < 40%) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.
When estimates are stratified by percentage cheatgrass cover across the entire study area, we find good agreement between Model LS and eMODIS time series data, where peak NDVI and similarity metrics vary as a function of percentage cheatgrass cover (Figure 6).That is, maximum correlations were found for sites with low (r-value = 0.95), moderate (0.96), and high cheatgrass cover (0.98) assuming no lag.Conversely, there was very poor agreement between Model L and eMODIS time series data within low (r-value = 0.10), moderate (0.14), and high cheatgrass cover (0.33) (Figure 6).
When estimates are averaged across growing season months, we find substantially higher agreement between Model LS and eMODIS values, as compared to Model L estimates (R 2 = 0.88 versus R 2 = 0.72; Figure 7).We also found much higher agreement between weekly eMODIS and interpolated data (Model LS ) for two weeks (Weeks 22 and 23) during peak-growing season, as compared to downscaled eMODIS products (Figure 8).The slight lag between eMODIS and modeled NDVI data are manifested in this comparison as a systematically low bias for modeled NDVI during these growing season weeks.
are manifested in this comparison as a systematically low bias for modeled NDVI during these growing season weeks.Figure 6.Time series of Harmonized Landsat-8 Sentinel-2 (HLS), predicted, and eMODIS Normalized Difference Vegetation Index (NDVI  100) separated by percentage cheatgrass cover [22].ModelLS was generated using both Landsat-8 and Sentinel-2A inputs, while ModelL was generated using only Landsat-8 inputs.Each time series was from all pixels (excluding masked areas in the original HLS time series) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.
Figure 7. eMODIS versus predicted (upscaled to 250-m spatial resolution using bilinear interpolation) growing season (April-September) Normalized Difference Vegetation Index (NDVI  100) for randomly selected pixels (n ~ 10,000).ModelLS was generated using both harmonized Landsat-8 and Sentinel-2A inputs, while ModelL was generated using only harmonized Landsat-8 inputs.Darker regions represent larger numbers of sampled pixels.Dashed-green line is 1-to-1 and the red line is the ordinary-least squares regression fit weighted by the inverse of the coefficient of variation (CV) determined from a focal scan of the eMODIS Growing Season NDVI product (e.g., more weight to homogenous areas [larger points]).Figure 6.Time series of Harmonized Landsat-8 Sentinel-2 (HLS), predicted, and eMODIS Normalized Difference Vegetation Index (NDVI × 100) separated by percentage cheatgrass cover [22].Model LS was generated using both Landsat-8 and Sentinel-2A inputs, while Model L was generated using only Landsat-8 inputs.Each time series was from all pixels (excluding masked areas in the original HLS time series) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.Difference Vegetation Index (NDVI  100) separated by percentage cheatgrass cover [22].ModelLS was generated using both Landsat-8 and Sentinel-2A inputs, while ModelL was generated using only Landsat-8 inputs.Each time series was from all pixels (excluding masked areas in the original HLS time series) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.

Discussion
In this research, we presented a novel approach to accurately monitor land-surface phenology and generate spatially-explicit estimates of surface reflectance data using a regression tree modeling framework that combined information collected by Landsat OLI and Sentinel MSI.Overall, we found that data collected from these sensors were largely compatible, but the extent to which HLS pre-processing ensured compatibility is outside the scope of this study.To effectively characterize seasonal vegetation index curves, we developed robust cloud masks because HLS masking products were of insufficient quality.For a thorough review of operational cloud masking algorithms and their associated performance, we refer readers to reference [29].We demonstrate that the inclusion of Sentinel MSI data into our modeling framework substantially improves our ability to track and represent land-surface dynamics in water-limited environments, due in large part to the fact that additional observations allow us to seamlessly fill observational gaps due to clouds and shadows.
Our methodology has advantages over previously published approaches and is highly automatable.First, our approach explicitly mitigates spatial and temporal autocorrelation effects, by integrating spatial and temporal covariates (i.e., latitude, longitude, Julian date, and spectral reflectance from other dates) directly into the global model, which per-pixel approaches typically ignore (e.g., [14,33]).This is particularly import for monitoring semi-arid environments because vegetation growth can vary widely in response to heterogeneous, precipitation effects.Second, this model does not make use of eMODIS data, as we rely solely on observations collected by OLI and MSI sensors, and it outperformed Landsat-eMODIS downscaling approaches even though eMODIS data are used in the data fusion framework [17,26].Third, this modeling framework is very flexible and can easily integrate other covariates where needed (e.g., climatic data, spectral and temporal information (e.g., Year)).Finally, disturbances (e.g., wildfires) are also captured by this modeling approach, which simpler interpolation techniques may not adequately resolve.
Future directions for the modeling component of this work include: (1) integration of other Landsat and Sentinel data (e.g., Landsat ETM+, Sentinel-2B) to better constrain models; (2) generating spatially-explicit estimates of other surface reflectance data (e.g., red-edge) and uncertainties from an ensemble of model realizations; (3) normalization of model training distributions as a function of time, to account for differences in observation density throughout the time series; and (4) assimilation of leave-one-image-out validation procedures within model calibration to independently assess model interpolation precision rather than relying on MODIS time series.We also recognize ongoing efforts by the HLS project and others to improve cloud masking algorithms for MSI data, which should be useful for expediting processing time within our overall workflow.
On average, we found that agreement between modeled and eMODIS temporal profiles of NDVI was highest for shrub/scrub and evergreen forests.Evergreen forests typically have little change in greenness from season to season and, thus, abrupt increases in NDVI within the earlier portion of the time series (Figure 5) may reflect snowmelt effects that could impact similarity metrics.Modeled estimates of NDVI were systematically lower than those of the eMODIS dataset across most land cover types, which has been previously attributed to different sensor characteristics (e.g., spatial resolution) [34,35].The use of cross-calibrated Sentinel data as a dependent variable within our modeling framework could enhance our ability depict seasonal vegetation dynamics, particularly when a paucity of Landsat data exists.While our methodology was applied to only one year of growing season imagery, we believe that it will likely be of great value for understanding large, inter-annual variations in land-surface phenology that are commonly associated with dryland ecosystems.
Our results also hint at a systematic bias toward earlier timing of maturity and senescence of grass and shrub/scrub cover types (Figures 5 and 6), which is not caused by inadequate density of observations in either time series but rather the enhanced capacity for tracking phenology using moderate resolution data than has been previously possible with coarser resolution data (e.g., MODIS).This finding agrees with those of references [18,36], where phenological transition dates (i.e., green-up, start-of-season, and maturity) extracted from Landsat time series were generally earlier than those from derived from MODIS and Visible Infrared Imaging Radiometer Suite (VIIRS) phenocurves.Others have also found a two-week lag when comparing MODIS NDVI to flux tower gross photosynthesis in rangeland ecosystems [37].This has enormous implications for both the research community and the policy community for the management of dryland ecosystems.For example, these data allow us to detect abrupt increases in vegetation greenness associated with cheatgrass growth at fine spatial and temporal resolutions, which will allow land-use managers to take proactive steps in mitigating its impact on ecosystems and human communities.
As is demonstrated in this research, remote sensing is a cost-effective tool to describe vegetation dynamics at broad scales.Time series of vegetation indices provide the foundation from which land-surface phenology metrics (e.g., greenup, leaf-out, start of spring) are constructed [38,39].Thus, it is vital that phenological products are derived from high quality time series data, such as those described in in this manuscript.While phenology metrics have been routinely generated from coarse-resolution data [40], data derived solely from moderate-resolution imagery (e.g., Landsat and Sentinel-2) are beginning to emerge [41].Given the declines in MODIS performance and forthcoming decommission with no equivalent replacement planned, data fusion approaches that integrate observations from multispectral sensors (e.g., Landsat, Sentinel-2A and -2B) will be needed to effectively monitor dryland ecosystems.

Conclusions
In this study, we presented an innovative methodology to accurately monitor land-surface phenology in drylands of the Western United States.Our data fusion approach combined information collected by the Operational Land Imager (OLI) onboard Landsat 8 and the Multispectral Instrument (MSI) onboard Sentinel-2 using a regression tree modeling framework.This highly-automatable work flow allowed us to precisely characterize seasonal dynamics in vegetation indices with considerable agreement between spatially-explicit estimates and independent observations.We provide evidence for enhanced monitoring capabilities associated with the fusion of OLI and MSI time series, as opposed to solely using coarser resolution data or downscaling techniques that rely on a combination of Landsat and MODIS, where higher-resolution imagery allowed for early detection of plant maturity and senescence.We demonstrate that observations made by MSI and OLI can be used in amalgamation and exclusion of imagery from either sensor impairs our ability to precisely characterize vegetation dynamics in dryland environments.Other environmental applications, such as habitat and land cover mapping, will likely benefit from our time series products because they accurately capture the state of the vegetation at a spatial resolution that is meaningful to local resource managers.This research fills a critical gap in the understanding of current monitoring capabilities of arid and semi-arid environments that cover a substantial portion of the globe.
Author Contributions: N.J.P and B.K.W. conceived and designed the study; N.J.P. performed the analysis; and N.J.P., B.K.W., and Z.W. wrote the paper and interpreted the results.

Figure 4 .
Figure 4. Comparison of surface reflectance data from Harmonized Landsat-Sentinel-2 (HLS) products for one-day-apart acquisitions.Darker regions represent larger numbers of points (n = 50,000).Dashed-green line is 1-to-1 and the red line is the ordinary least squares (OLS) regression fit showing proportional bias.

Figure 4 .
Figure 4. Comparison of surface reflectance data from Harmonized Landsat-Sentinel-2 (HLS) products for one-day-apart acquisitions.Darker regions represent larger numbers of points (n = 50,000).Dashed-green line is 1-to-1 and the red line is the ordinary least squares (OLS) regression fit showing proportional bias.

Figure 4 .
Figure 4. Comparison of surface reflectance data from Harmonized Landsat-Sentinel-2 (HLS) products for one-day-apart acquisitions.Darker regions represent larger numbers of points (n = 50,000).Dashed-green line is 1-to-1 and the red line is the ordinary least squares (OLS) regression fit showing proportional bias.

Figure 5 .
Figure 5.Time series of Harmonized Landsat-8 Sentinel-2 (HLS), predicted, and eMODIS Normalized Difference Vegetation Index (NDVI × 100) separated by major land cover classes[21].Model LS was generated using both Landsat-8 and Sentinel-2A inputs, while Model L was generated using only Landsat-8 inputs.Each time series is calculated from all pixels (excluding masked pixels in the original HLS time series) representing largely homogenous areas (coefficient of variation < 40%) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.
Remote Sens. 2018, 10, x FOR PEER REVIEW 10 of 15 are manifested in this comparison as a systematically low bias for modeled NDVI during these growing season weeks.

Figure 6 .
Figure 6.Time series of Harmonized Landsat-8 Sentinel-2 (HLS), predicted, and eMODIS Normalized Difference Vegetation Index (NDVI  100) separated by percentage cheatgrass cover[22].ModelLS was generated using both Landsat-8 and Sentinel-2A inputs, while ModelL was generated using only Landsat-8 inputs.Each time series was from all pixels (excluding masked areas in the original HLS time series) and smoothed using a three-week sliding window approach.Timesteps where HLS data coverage was less than 70% are not shown.

Figure 7 .
Figure 7. eMODIS versus predicted (upscaled to 250-m spatial resolution using bilinear interpolation) growing season (April-September) Normalized Difference Vegetation Index (NDVI  100) for randomly selected pixels (n ~ 10,000).ModelLS was generated using both harmonized Landsat-8 and Sentinel-2A inputs, while ModelL was generated using only harmonized Landsat-8 inputs.Darker regions represent larger numbers of sampled pixels.Dashed-green line is 1-to-1 and the red line is the ordinary-least squares regression fit weighted by the inverse of the coefficient of variation (CV) determined from a focal scan of the eMODIS Growing Season NDVI product (e.g., more weight to homogenous areas [larger points]).

Figure 7 .
Figure 7. eMODIS versus predicted (upscaled to 250-m spatial resolution using bilinear interpolation) growing season (April-September) Normalized Difference Vegetation Index (NDVI × 100) for randomly selected pixels (n ~10,000).Model LS was generated using both harmonized Landsat-8 and Sentinel-2A inputs, while Model L was generated using only harmonized Landsat-8 inputs.Darker regions represent larger numbers of sampled pixels.Dashed-green line is 1-to-1 and the red line is the ordinary-least squares regression fit weighted by the inverse of the coefficient of variation (CV) determined from a focal scan of the eMODIS Growing Season NDVI product (e.g., more weight to homogenous areas [larger points]).

Figure 8 .
Figure 8. eMODIS versus predicted (upscaled to 250-m spatial resolution using bilinear interpolation) normalized difference vegetation index (NDVI  100) for randomly selected pixels (n ~ 10,000) during Weeks 22 and 23.ModelLS was generated using both harmonized Landsat-8 and Sentinel-2A inputs, while downscaled NDVI was generated using regression tree models with harmonized Landsat-8 and eMODIS inputs.Darker regions represent larger numbers of sampled points.Dashed-green line is 1-to-1 and the red line is the ordinary-least squares regression fit weighted by the inverse of the coefficient of variation (CV) determined from a focal scan of the eMODIS growing season NDVI product (e.g., more weight to homogenous areas [larger points]).

Figure 8 .
Figure 8. eMODIS versus predicted (upscaled to 250-m spatial resolution using bilinear interpolation) normalized difference vegetation index (NDVI × 100) for randomly selected pixels (n ~10,000) during Weeks 22 and 23.Model LS was generated using both harmonized Landsat-8 and Sentinel-2A inputs, while downscaled NDVI was generated using regression tree models with harmonized Landsat-8 and eMODIS inputs.Darker regions represent larger numbers of sampled points.Dashed-green line is 1-to-1 and the red line is the ordinary-least squares regression fit weighted by the inverse of the coefficient of variation (CV) determined from a focal scan of the eMODIS growing season NDVI product (e.g., more weight to homogenous areas [larger points]).

Table 1 .
Harmonized Landsat-8 OLI and Sentinel-2 MSI images (n = 30) acquired in 2016 and used within the study.Imagery used in one-day-apart comparisons; † † Refers to percentage of pixels where quality assessment band equals zero.

Table 2 .
Confusion matrices constructed from random hold-out samples of observed versus mapped values for binary (i.e., cloud, shadow, snow/ice, water (yes), or clear (no)) decision tree masking models developed for (top) Landsat-8 OLI and (bottom) Sentinel-2 MSI data throughout the study period.

Table 3 .
Statistical metrics derived from observed and predicted (Model LS ) normalized difference vegetation index (NDVI × 100) values for a random hold-out sample.= Mean Absolute Error; rMAE = relative Mean Absolute Error; RMSE = Root-mean Squared Error. MAE

Table 3 .
Statistical metrics derived from observed and predicted (ModelLS) normalized difference vegetation index (NDVI  100) values for a random hold-out sample.