Revisiting the Past: Replicability of a Historic Long-Term Vegetation Dynamics Assessment in the Era of Big Data Analytics

: Open and analysis-ready data, as well as methodological and technical advancements have resulted in an unprecedented capability for observing the Earth’s land surfaces. Over 10 years ago, Landsat time series analyses were inevitably limited to a few expensive images from carefully selected acquisition dates. Yet, such a static selection may have introduced uncertainties when spatial or inter-annual variability in seasonal vegetation growth were large. As seminal pre-open-data-era papers are still heavily cited, variations of their workﬂows are still widely used, too. Thus, here we quantitatively assessed the level of agreement between an approach using carefully selected images and a state-of-the-art analysis that uses all available images. We reproduced a representative case study from the year 2003 that for the ﬁrst time used annual Landsat time series to assess long-term vegetation dynamics in a semi-arid Mediterranean ecosystem in Crete, Greece. We replicated this assessment using all available data paired with a time series method based on land surface phenology metrics. Results differed fundamentally because the volatile timing of statically selected images relative to the phenological cycle introduced systematic uncertainty. We further applied lessons learned to arrive at a more nuanced and information-enriched vegetation dynamics description by decomposing vegetation cover into woody and herbaceous components, followed by a syndrome-based classiﬁcation of change and trend parameters. This allowed for a more reliable interpretation of vegetation changes and even permitted us to disentangle certain land-use change processes with opposite trajectories in the vegetation components that were not observable when solely analyzing total vegetation cover. The long-term budget of net cover change revealed that vegetation cover of both components has increased at large and that this process was mainly driven by gradual processes. We conclude that study designs based on static image selection strategies should be critically evaluated in the light of current data availability, analytical capabilities, and with regards to the ecosystem under investigation. We recommend using all available data and taking advantage of phenology-based approaches that remove the selection bias and hence reduce uncertainties in results.


Introduction
Earth observation (EO) data have long been used to monitor land cover, land cover change, and long-term modifications of the land surface [1][2][3][4]. Among all the satellite constellations that were launched into space [5], the Landsat program occupies the leading focused on abrupt changes in long time series, whereas the BFAST model aims at capturing both changes and long-term trends. Harmonic models, however, are better suited for forecasting than for data reconstruction [28], and have problems in areas with large inter-annual variations in the timing of phenological processes, e.g., in drylands [25]. In these cases, Jönsson et al. [29] recommend smoothing filters that preserve intra-seasonal variations, which are commonly used prior to deriving land surface phenology (LSP) metrics, e.g., [30][31][32][33]. Annual LSP metrics are ecologically meaningful parameters that describe key aspects of the phenological cycle (e.g., date and value of peaking time, or the onset of vegetation growth). This feature space enables physical interpretability and the use of conceptually simpler models (as opposed to models that additionally model seasonal components) to capture long-term trends, e.g., [34][35][36], or may potentially be used as surrogate input feature space with algorithms that account for both abrupt and gradual change processes, e.g., [37,38].
Nonetheless, quantitative knowledge about whether results of static approaches are comparable to data-driven approaches that effectively circumvent static selection strategies, hence making the analysis robust to these factors, are mostly missing (but see [39,40] for a comparison of forest change algorithms in the United States). In addition, papers reporting on reproducing or replicating previously published remote sensing research are missing at large, although "one of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it" [41]. We thus deem it essential to replicate past case studies with all currently available data to assess whether their results still hold true and better understand limitations of static selection related to satellite data time series and long-term change analyses. This is especially important as such seminal landmark papers can still be heavily cited and referenced, or their workflows can be adopted in ongoing research. To this end, we replicate a representative case study in the light of a dramatically different data availability and processing reality.
To aid the reader, we here reprint the definitions of the National Academies of Sciences, Engineering, and Medicine for reproducibility and replicability [41]: reproducibility means "obtaining consistent results using the same input data, computational steps, methods, code, and conditions of analysis"; replicability means "obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data". Thus, a remote sensing example for reproducibility may be achieving a quasi-identical land cover classification when using the same data archive and method (e.g., classifier) as described in a previous paper. These reproduced results can, e.g., be used as a comparison baseline for new research, wherein a different dataset and/or a different method (e.g., a different classifier) may provide replicability by proving the same hypothesis about the process.

Case Study: Understanding Grazing Pressure in the European Mediterranean from Landsat Time Series
One of the first studies using the multi-decadal and multi-sensor time series of Landsat targeted the analysis of grazing pressure for the island of Crete, Greece [42,43]. Crete's elevation gradients extend from sea level to 2500 m a.s.l. and cover a vast range of Mediterranean ecosystems. At the same time, remote areas at high altitudes are embedded in intensively and extensively managed agricultural systems, also allowing the study of diverse land use [44]. One of the major concerns on humans impacting Mediterranean lands is related to grazing [45,46]. In many respects, research still struggles in answering the question of if Mediterranean grazing lands, such as the rangelands of Crete, are notoriously over-grazed and what implications such over-grazing may have for the diverse ecosystem services at stake [47,48].
Hostert et al. [42] employed for the first time a rigorously inter-calibrated 20-year time series of Landsat MSS and TM imagery from 1977 to 1996 to quantify how rangeland vegetation in the mountainous grazing lands of Crete changed over time, which was later extended to 2006 by Sonnenschein et al. [49]; please note that although published in 2011, this study still followed a pre-open-data-era study design. At the time of their study, data were provided as pixel-wise radiance data without ortho-correction. Standardized pre-processing schemes were not operationalized yet, and each image was independently and iteratively corrected with considerable user interaction. A rigorous pre-processing protocol enabling level-2 time-series was therefore developed from scratch, including steps for ortho-correction [50], atmospheric and topographic correction [51,52], and interactive cloud screening [42].
However, even when following such a well-defined protocol, human productivity, data availability, and/or cost was still the limiting factor, as the free and open data access to Landsat imagery was not implemented until 2008 [13,53,54] and a majority of historic scenes over Europe were not ingested into the consolidated and cross-calibrated USGS archive until 2018 due to missing payload correction data [14]. Data availability was restricted to a few scenes, largely limited by the financial resources needed to acquire a time series [12], and the lack of operational cloud screening and compositing algorithms that would have allowed to include partially cloud-covered datasets to the analyses [15,[55][56][57].
We here ask if past time-series analyses from Landsat data would yield the same or at least similar results when being replicated with today's state-of-the-art approaches based on the wealth of freely available Landsat data. To answer this question, we reproduce the original study, yet with the constraint of not including the Landsat MSS data for the period before 1984. The 80 m resolution, 4-band, 6-bit MSS data would introduce a bias in our comparisons; the integration of MSS into fully operationalized workflows is only advancing as of late [58], yet still precluding the use of the state-of-the-art workflows and algorithms we need to employ here.
We also enlarge the study area to all semi-natural areas of Crete (as compared to the restricted original study, only covering the mountains of central Crete). Against this background, we create a comparative study design that allows contrasting the methodology of Landsat studies from limited historic time series and methodologies with today's data wealth and contemporary methodologies.

Objectives
We are specifically interested in whether the results of these past studies are replicable when applying a more advanced processing scheme that uses all available data and accounts for inter-annual variability in vegetation growth. To achieve this, we will firstly reproduce the past studies with a comparable methodology and data setup (although acknowledging that full reproducibility could not be obtained following the definition above; reasons are outlined in the previous subsection, e.g., the exclusion of Landsat MSS data). Secondly, we will replicate the results by applying a data-driven, state-of-the-art methodology to the complete Landsat data record that temporally overlaps with the reference studies. Based on this comparative study, we will answer the research question "How do results of past and new time-series analysis approaches compare and what drives potential disagreement?" Subsequently, we will apply lessons learned to arrive at a more nuanced and informationenriched long-term vegetation change assessment for semi-arid ecosystems by separating changes of woody and herbaceous vegetation.

Landsat
We acquired all high-quality Landsat images over Crete for the years 1984-2006 following Sonnenschein et al. [49]. Crete is covered by six Worldwide Reference System 2 (WRS-2) scenes. We used the Level 1, Collection 1, Tier 1, terrain precision corrected dataset, comprised of Landsat 4, Landsat 5, and Landsat 7 imagery. Images with cloud coverage > 70% were discarded. A total of 2816 images were downloaded from Google's Landsat Cloud Storage-Bucket (https://console.cloud.google.com/storage/browser/gcppublic-data-landsat, accessed on 18 September 2021) via the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE; v. 3.5) [59]. Figure 1 shows the number of clear-sky observations for each pixel (after preprocessing, see Section 3.1.1), i.e., filtered for clouds, cloud shadows, and snow. In general, clear-sky data availability is very high due to remote sensing-friendly weather conditions in the Mediterranean. Fewer observations are available for mountainous areas due to orographic cloud formation, and potentially due to persistent false-positive cloud detections over the bright massifs, c.f. [60], which, however, is uncritical since these bare rock surfaces are out of the scope of the study at hand. Data availability for the southern part of the island is higher than for the northern part due to advective clouds from dominant northwestern low-pressure tracks [61]. In the central and most eastern part of Crete (Iraklio and Sitia provinces), data availability is highest due to the presence of lateral orbital overlaps, which effectively double observation frequency.

Auxiliary Data
The 1-arc Second Digital Elevation Model (DEM) observed by the Shuttle Radar Topography Mission (SRTM) was used for preprocessing. Gaps were filled using the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) DEM. A pre-compiled water vapor database [62,63] was used for correcting gaseous absorption during atmospheric correction. Similar to Sonnenschein et al. [49], we used the CORINE land cover map 2006 (http://dataservice.eea.europa.eu, accessed on 18 September 2021) to restrict our analysis to the 'semi-natural' class and used the DEM to exclude the alpine zone (>1500 m a.s.l.).

Methods
Most processing outlined in this section was performed using FORCE [59], freely available from https://github.com/davidfrantz/force, accessed on 18 September 2021. As outlined in the objectives, we processed and analyzed the data with two different methods, one to reproduce the past state-of-the-art with few input data (hereby denoted as "Vegetation Dynamics 1.0": VD 1.0), and the second to replicate with state-of-the-art methods and all available data ("Vegetation Dynamics 2.0": VD 2.0-in the sense of [64]). After downloading the data, the complete VD 2.0 processing was performed in two steps only, equaling two user interactions ( Figure 2). The first step was the generation of Analysis Ready Data (ARD), i.e., data preprocessed to allow immediate analysis with a minimum of additional user effort and guaranteed interoperability both through time and with other datasets (http://ceos.org/ard/, accessed on 18 September 2021). The second step comprised all higher-level processing carried out in the temporal domain, enabled by the data cube structure of the ARD that allows for immediate and efficient access to the temporal sequence of each pixel. The VD 1.0 processing was performed in FORCE, too, although some more user interactions were necessary to reproduce a workflow comparable to [42,49]. Postprocessing and validation were performed in R [65]. The complete workflow is outlined in Figure 2. All mathematical symbols used throughout the paper are listed in Table 1.  All Landsat images were converted to ARD using the FORCE Level 2 Processing System (v. 3.5). Clouds and cloud shadows were identified using a modified version of the Fmask algorithm [57,[66][67][68]. The cloud masking is currently assessed in the Cloud Masking Inter-comparison Exercise (CMIX, https://earth.esa.int/web/sppa/meetings-workshops/ hosted-and-co-sponsored-meetings/acix-ii-cmix, accessed on 18 September 2021) as well as in [69]. The FORCE atmospheric correction [68] branched from the AtcPro algorithm [50] used in the reference studies, and was re-implemented to enable bulk-processing of large image volumes [70]; thus a high level of comparability is ensured. The images were radiometrically standardized using radiative transfer modeling [71], including object-based Aerosol Optical Depth (AOD) estimation over dense dark vegetation [72] and water features [73]. Water vapor absorption was corrected using nearly concurrent MODIS water vapor estimates [74] or a long-term climatology as an approximation [62]; the effect of using the climatology as a surrogate was recently assessed in [63]. AOD estimation, as well as surface reflectance retrieval, were assessed in the Atmospheric Correction Inter-comparison Exercise (ACIX) [75], as well as in the ongoing 2nd edition (ACIX-II, https://earth.esa. int/web/sppa/meetings-workshops/hosted-and-co-sponsored-meetings/acix-ii-cmix, accessed on 18 September 2021). Topographic correction was performed with an enhanced C-correction, wherein the C-factor was estimated for each pixel in the image, then propagated through the spectrum using radiative transfer [76]. Three kernels of increasing size were used to approximate the background reflectance for adjacency effect correction [77]. Nadir BRDF-adjusted reflectance was retrieved using a global set of MODIS-derived BRDF kernel parameters [78]. Eventually, all images were re-projected to a single coordinate system (EPSG:3035) and then split into 30 × 30 km 2 image chips using a custom grid, thus representing data cubes [79]. This data cube concept allows for non-redundant data storage, efficient data access, as well as simplified extraction of data and information, which is required for the higher level processing outlined in the following sections.

Dataset Reduction for Vegetation Dynamics 1.0
Before the opening of the Landsat archive, VD 1.0 studies were limited to few Landsat images that needed to be handpicked carefully. Sonnenschein et al. [49] manually selected 14 Landsat TM/ETM+ images from 1984-2005 based on cloud cover and seasonal constraints with a maximum gap of one year between acquisitions ( Table 2). They aimed to pick images at peak vegetation vigor, which was assumed to approximately correspond to mid-May to mid-June in the mountainous rangelands of Crete. The seasonal validity of each selected image was compared with a 10-day AVHRR NDVI time series [80] for selected plots in these rangelands. To mimic this hand-picking of images, but for the whole of Crete, we used a selection algorithm on the full ARD dataset. We generated pixel-based composites with a parametric weighting scheme [15,22] with the FORCE Higher Level Processing System (v. 3.5) using the seasonal proxy as the only determinant, i.e., the closest observation to a target date was used to populate each pixel in the composite. The selected acquisition dates (Table 2) were used as temporal targets, and a narrow compositing period of ±10 days was used to restrain the selection of observations to dates that are as close as possible to the target dates. For most pixels, the exact date was selected, with secondary peaks at +7, −1, and −9 days, which are related to Landsat's orbital paths. Note that these composites still contain some clouds as only the seasonal suitability was used as the selection criterion. This was done to ensure that the compositing algorithm is biased towards selecting full images as this matches the manual selection process from the reference studies. Still, clouds were masked out for the analysis using the automated cloud masks, whereas clouds were manually identified in the reference studies. Table 2. Target dates used for Vegetation Dynamics 1.0 and pixel share of selected observations relative to the target dates (∆D).

Step 2: Time Series Analysis
The second step was performed with the FORCE Higher Level Processing System (v. 3.7), which includes a time series analysis submodule that may be configured to chain the following sub-sections in a single execution with one user interaction only.

Spectral Unmixing
For each acquisition date-the full dataset for VD 2.0, the reduced dataset for VD 1.0-an ecologically reasonable index for representing vegetation cover was computed based on the bottom-of-atmosphere reflectance spectra. Spectral Mixture Analysis (SMA, see for example, Somers et al. [81] for an overview) is superior compared to vegetation indices in quantifying sparse vegetation cover over bright background materials [82,83]. For comparability with Hostert et al. [42], and as we deem conventional SMA a straightforward means to generate consistent vegetation fractions for long and dense time series, we employed a linear unmixing approach following: wherein band-wise reflectance ρ b is explained as the summarized product of m endmember reflectance spectra ρ i,b with their respective fractions F i . A least-squares optimization was performed to minimize the model error E b [83], while both enforcing non-negativity of fractions F i [84], and honoring the sum-to-one constraint For congruency with Hostert et al. [42], we selected the same reference endmembers from field measurements (Table 3), i.e., rock (limestone with lichen), soil (Luvisol), photosynthetically active vegetation (PV, Quercus coccifera), and spectrometric shade for scaling the fractions for physical interpretability (Equation (3)). The four-endmember model was carefully selected considering the inherent dimensionality of Landsat TM data [42]. The limitation to four endmembers leads to non-photosynthetic vegetation (NPV) falling into the soil fraction component. This does not affect our comparison, given that the interpretation focuses on the trends in photosynthetic active vegetation here. The derived PV fraction was used for all further analyses. For simplicity, we will use f without subscript to refer to the shade-normalized PV fraction for the remainder of the manuscript. For the VD 2.0 analysis, additional processing was performed to generate annual time series devoid of seasonal effects that may arise from the static selection.
With LGAC data migration and data restoration of imagery lacking payload-correction data, it is generally feasible to extract the timing of specific phenological key points from the full data record on a per-pixel basis, i.e., to derive land surface phenology (LSP) metrics directly [32], or to downscale coarse resolution LSP metrics using data fusion techniques [85].
In a first step, the quality bit product was used to remove all poor-quality observations, i.e., pixels flagged as cloud, cloud shadow, snow, as well as saturated or sub-zero reflectance values. Acknowledging that cloud, and especially cloud shadow masks, are not perfectly accurate, a time series-based outlier detection followed by an inlier restoration was performed based on the noise estimation proposed by Vermote et al. [86]. Given a triplet of successive observations, the absolute residual r between the central observation i and linear interpolation between its neighbors is given by: where ρ is the reflectance of the shortest wavelength, i.e., blue, which commonly is most severely affected by remaining noise due to atmospheric effects; t is the time measured in continuous days. The noise ε of a time series with n observations is estimated as: We iteratively removed the observation i with the largest residual r i,max , if r i,max /ε exceeded a threshold. Subsequently, we restored all observations that were eliminated via the quality bit product, if r/ε was below a threshold. We set these thresholds to 3 and 1, respectively, which were found to be reasonable values based on practical experience.
Subsequently, the f time series were interpolated using an ensemble of Radial Basis Function convolution (i.e., smoothing) filters as described by Schwieder et al. [87], where the observations in the kernel are weighted according to a Gaussian distribution: with t being the time of the observation, ∆t the observation time relative to the kernel center, and σ the full-width-at-half-maximum of the Gaussian bell-shaped curve, all values given in days. We used three kernel widths σ with 8, 16, and 32 days, as e.g., employed by Rufin et al. [88]. A kernel cutoff value that preserves 95% of the area under the Gaussian curve was used. The estimates from the three kernels were eventually aggregated using a weighted average, wherein the weights correspond to the data available within each kernel. This weighting strategy gives preference to kernels with a higher data density for the final value estimation. Due to the implemented kernel cutoff value, no-data values occasionally remained during the winter season when data gaps are substantially larger than the employed kernel widths, and as such, we subsequently performed a linear interpolation to strictly enforce equidistance and absence of data gaps. LSP metrics for each year between 1984 and 2006 were then derived from the interpolated time series using a polar transformation-based methodology based on the concept proposed by Brooks et al. [33], partially reproduced here for the sake of readability. The time series is transformed into polar space by converting DOY to radians: Polar coordinates are then transformed to Cartesian space ( Figure 3) using: x r, f = f t · cos r t y r, f = f t · sin r t (8) and the long-term polar average vector [ r , f ] is computed from the average Cartesian coordinates [ x , y ] as: The long-term average vector points to the direction of the central tendency of annual peak cover. As phenological timing can substantially vary from pixel to pixel due to environmental and biophysical characteristics, the typical start of the phenological year θ is computed as the diametric opposite angle of [ r , f ], which points into the direction of least annual cover: This long-term measure of the start of the phenological year is used to structure the time series into phenological years, i.e., annual slices that contain a full season. Besides spatial variability, inter-annual variability may be substantial when analyzing highly volatile, precipitation-driven phenologies, such as in the Mediterranean. As such, we further improved on the Brooks method by fine-tuning the start of the phenological year for each season. For this, we use the initial long-term start of the phenological year θ , and re-estimate θ s for each season s (Equations (9)- (11)). This is achieved by considering θ s as the diametric opposite angle to the seasonal average vector [r s , f s ], which is computed from the average of all Cartesian coordinates [x s , y s ] in the given season (i.e., the annual slices between θ ). The updated values θ s are then used to restructure the time series into phenological years with a dynamic, yet reasonably constrained, phenological offset.
The observations before θ 1 and after θ ns in the first and last season of the time series, respectively, need to be discarded as they do not represent a full phenological cycle; thus resulting in ns = ny − 1 seasons, where ny is the number of years in the time series. If θ is located in the first half of the year, we did not obtain LSP metrics for the last year; or the 1st year if otherwise. Once the time series was structured into years, the computation of LSP metrics was straightforward, and a multitude of parameters that describe the timing of phenological key events, seasonal variability, seasonal bimodality, etc. were implemented in FORCE (see https://force-eo.readthedocs.io/en/latest/components/higher-level/tsa/ format.html#phenology, accessed on 18 September 2021, for a complete list).
Our reference studies aimed at selecting images at peak vegetation vigor. Thus, in a first step, we focused on the value of peak of season (VPS) metric, which is the maximum fractional cover for each year f VPS,y (see Figure 3) representing the fractional cover of all photosynthetically active vegetation types (PV). On Crete, vegetation growth below the alpine zone is not limited by temperature, but by water availability: woody vegetation is predominantly evergreen, and thus a fairly constant photosynthetically active fractional cover persists throughout the year, while the seasonal growth in woody vegetation is relatively small. In contrast, herbaceous vegetation has a strong seasonal component, where photosynthetically active fractional cover rapidly increases in response to winter rainfall and then quickly turns into NPV at the beginning of the hot and dry summer. Thus, in the second step, we additionally selected LSP metrics that approximate the seasonal growth of these vegetation components, and consequently offer a decomposition of the total PV cover into woody and herbaceous components. The value of base-level (VBL) is defined as the minimum fractional cover of each year f VBL,y , computed as the average of the first fractional cover value after θ s and θ s+1 , and hence approximates the fairly stable fractional cover of woody vegetation. The value of seasonal amplitude (VSA) is the difference between VPS and VBL: f VSA,y , thus f VPS,y = f VBL,y + f VSA,y . It approximates the seasonal growth of PV cover and is thus well aligned with the seasonal dynamics of herbaceous vegetation. It further follows that the remainder represents other cover types (soil, rock, or non-photosynthetically active vegetation), i.e., F NS = 1.0 − (f VBL,y + f VSA,y ).

Trend Analysis
For both VD 1.0 and VD 2.0, pixel-based linear trend analyses were performed on the annual time series: with regression intercept a and slope b. For VD 1.0, the trend analysis was performed on f derived from the reduced ARD dataset. We accounted for shifts in acquisition times by setting the regression time scale to fractional years since 1984, i.e., the middle of 1984 and the beginning of 1986 are encoded as y = 0.5 and y = 2.0, respectively. For VD 2.0, the trend analysis was performed on the annual f VPS values, thus the time scale for this analysis was simply given in years since 1984. As identical change magnitudes (trend b) have a substantially different meaning depending on the initial cover (offset a), we further normalized the trends to represent cover change CC (%), which is the long-term cover loss/gain measured relative to the initial cover:

Trend + Change Analysis
Disturbances in the woody vegetation component result in a non-linear trajectory of vegetation cover, and as such, linear trend analysis alone is ill-suited for these cases. Consequently, we further used an extension of the Change, Aftereffect, and Trend (CAT) transformation [37]. The original CAT transformation produces three parameters, which were primarily chosen for a synoptic visualization of landscape change processes as RGB composite. Given an annual time series of a vegetation index, Change is the maximum absolute difference between consecutive years; Aftereffect is the mean value after the change; and Trend is the slope of a linear regression applied to the full annual time series.
We complemented the CAT concept with a more exhaustive set of change and trend parameters including three change parameters: change, year of change, and percentage loss, wherein change only considers negative changes and percentage loss is the change relative to the initial cover as estimated by the offset of a linear regression applied to the full annual time series (analogous to Equation (13)). In addition, a full set of trend parameters is generated for each part of the time series, i.e., before and after the change, as well as for the complete time series.

Postprocessing: Vegetation Dynamics Syndrome Classification
For the woody vegetation, we classified the VBL-based CAT parameters into nine classes using a simple expert-based decision tree (Figure 4) similar to the land-use change syndrome approach [34,35]. If the initial cover was above 25% and if more than 25% of the initial cover was lost abruptly, the pixel was labeled as being disturbed. These pixels were further divided into two disturbance severity classes (mildly and severely disturbed), as well as into three directional classes based on thresholds as used in Hostert et al. [42] of the trend component for the "after disturbance" segment, i.e., mild/severe disturbances followed by a decline, stagnation, or recovery of fractional cover. For non-disturbed pixels, three classes were labeled accordingly, but using the trend component of the full time series, i.e., steady decrease, steady increase, or stable fractional cover. The herbaceous vegetation cover has high inter-annual variability in response to the rainfall sum of the preceding rainy season. In addition, disturbance (e.g., fire) rarely results in a permanent loss of herbaceous vegetation, as it recovers quickly. As we are only investigating semi-natural areas, land cover changes from semi-natural to, for example, built-up or agriculture, are excluded. Consequently, only the gradual conditions were used for the herbaceous cover, i.e., the VSA-based CAT parameters were split into three gradual change classes (Figure 4). The combination of both layers allows for a more nuanced and information-enriched long-term vegetation change assessment for both the woody and herbaceous vegetation. Additionally, the absolute net cover change (m 2 ) was computed for each pixel and component. For pixels subject to gradual processes, the net cover change was computed as the absolute gain/loss for the complete period: For disturbed pixels, the net cover change was computed as the sum of the three parts of the time series following the CAT transform: CC net,disturbed = CC net,be f ore − CC net,change + CC net,a f ter , with: CC net,be f ore = b be f ore ·ny be f ore ·900 m 2 CC net,change = (Change)·900 m 2 CC net,a f ter = b a f ter ·ny a f ter ·900 m 2

Validation
Validation data on vegetation components and their fractional cover, as well as longterm trends thereof, are unfortunately not available for the study area and cannot easily be derived from existing data sources. This especially applies as a result of the historic investigation period with unavailability of very high-resolution imagery, which would be needed at multiple time steps. Moreover, the mixed pixel spectral signatures dominating our feature space, and subtle changes thereof, prevent the development of quantitative reference data from the medium-resolution data themselves. A discussion on challenges considering a reliable, historic validation can be found in Hostert et al. [43]. In order to still provide guidance on the reliability of our used methods, we implemented a verification strategy based on interactive time series interpretation. As we needed to validate and compare different information layers of both quantitative and qualitative datatypes, we drew a random sample across Crete and evaluated all layers at the same point locations. We used Poisson Disc Sampling [89], a sampling technique that provides a uniform sample distribution across the study area with minimum distance between points; N = 977 samples were obtained. A trained expert was handed validation sheets with time series of the employed Landsat data, superimposed with the derived trends for the VD1.0 and VD2.0 analyses, peak of season, base level, and seasonal amplitude metrics, as well as detected changes and trends of the woody and herbaceous vegetation components. Annual subsets of best available pixel composites were plotted to provide spatial context. The expert assessed whether the outputs of the algorithm were obviously erroneous to the human eye, e.g., whether an illogical trend was computed, the magnitude was unreasonable, or whether a spurious change was detected. This may include incorrect trend magnitude or even trend direction due to misalignment of the VD1.0 date selection with the actual timing of peak phenology, exaggerated or reduced trend magnitudes which can be caused by incorrect offsets in result of missing data towards the beginning or end of the time series, or spurious change caused by missing or erroneous data. The assessment either confirmed or rejected the results of (1) the VD 1.0 and VD 2.0 trend analysis on total vegetation cover,

Results and Discussion
All maps presented in the following are available through an interactive web viewer: https://ows.geo.hu-berlin.de/webviewer/crete/ accessed on 18 September 2021; note that the vegetation components map for 1984, i.e., the first year of available data, is incomplete as a full phenological cycle need to exist to extract LSP metrics (see Section 3.2.2). The dataset produced in this study can be accessed at https://zenodo.org/record/5902672# .YfENz-rMKCo, accessed on 18 September 2021.

Comparison of Long-Term Vegetation Change between Vegetation Dynamics 1.0 and Vegetation Dynamics 2.0
The long-term vegetation cover change maps of the reproduced VD 1.0 indicate large areas with severe cover loss, Figure 5a. Overall, net change over Crete is negative (Table 4) with 50% and 26% of semi-natural areas showing signs of vegetation decline and increase, respectively. Although not numerically comparable (due to several reasons discussed in Section 1.1), these numbers resemble the findings in Hostert et al. [42], who found that over 40% of the mountain area of central Crete (Psiloritis) showed a decreasing trend of vegetation cover. Spatial patterns are also very similar to the maps published by Sonnenschein et al. [49], both indicating that general reproducibility with the past papers is given from a thematic perspective with a focus on overgrazing and that similar conclusions would have been drawn as before.  However, the results from VD 2.0 are fundamentally different to those from VD 1.0 (Figure 5b). Severe cover loss is limited to small areas and the magnitude of cover change is smaller compared to VD 1.0. Overall, net cover change is positive with 18% of areas showing a decreasing trend of vegetation cover, while half (50%) of all semi-natural areas exhibit an increase ( Table 4).
As the Psiloritis mountains were the primary study area in our reference studies, Figure 6 closes in on this area, and further depicts the difference in cover change ∆CC between both methods:  Figure 7. White areas are non-natural areas or above 1500 m a.s.l., which were out of scope of this study.
∆CC can assume the same value for different combinations of trend directions; as such, Figure 6c ∆CC is styled with four separate color bars. The first two color bars represent areas where the trend direction is consistent, the last two color bars represent areas with opposite trend directions. Figure 7 shows representative pixel time series, along with trends, for the six different colors in Figure 6c. The comparison of VD 1.0 and VD 2.0 reveals several general observations:

•
For both-negative trends, VD 2.0 rarely indicates a stronger decline (red in Figure 6c); Figure 7a shows a pixel where a fire occurred. The disagreement between methods is due to several reasons: (1) in the year where the change happened, no cloud-free image was available in VD 1.0, thus the fresh disturbance scar is not included in the regression; (2) before the change, the statically selected images coincide more closely with minimal cover, whereas they coincide more closely with peak cover thereafter-presumably due to a change in vegetation composition after the disturbance. • More commonly, however, VD 1.0 shows stronger negative trends (blue in Figure 6c, cf. Figure 7b). The trajectory shows large inter-annual variability with a low annual minimum of PV fractions, thus pointing to herbaceous-dominated vegetation composition. For such pixels, the static VD 1.0 image selection results in a volatile timing of the observation relative to the phenological cycle-sometimes close to peak cover, sometimes close to minimal cover. Thus, the resulting trend needs to be considered error-prone and potentially spurious. However, it is also apparent that, although the peak vegetation cover is rather stable over the long period (VD 2.0), there still is change related to a decreasing annual minimum of the cover, which is largely balanced by an increase in seasonal amplitude. This could be caused by an increase in herbaceous cover at the expense of a woody cover. • Green colors in Figure 6c represent pixels where both trends are positive with a stronger increase in VD 2.0; teal colors in Figure 6c represent pixels with a negative trend in VD 1.0 and a positive trend in VD 2.0. The corresponding time series (cf. Figure 7e,f) reveal that there is indeed an increase in peak cover (VD 2.0). The VD 1.0 analysis, however, indicates a slight increase only (Figure 7e) or a strong decline (Figure 7f), which is caused by a systematic shift in the timing of the statically selected images relative to the phenological cycle as a consequence of changes in vegetation composition towards a higher share of herbaceous cover, i.e., earlier peak cover that quickly turns into NPV at the beginning of the dry Mediterranean summer. Our validation supports the above observations. Only 32.9% of samples in case of VD 1.0 were confirmed, whereas 98.7% of samples were positively evaluated for VD 2.0, which: corroborates the finding that static image selection is volatile in areas where interannual or spatial variability in phenology is high, and 2.
confirms the robustness of a data-driven approach using phenological metrics.
However, it also needs to be noted that the expert could not arrive at a decision in 4.4% of cases (for both VD 1.0 and VD 2.0), primarily because disturbances were present, in which case the expert could not reliably judge the results of a monotonic trend fitted through the full time series.

Information-Enriched Long-Term Vegetation Change
The LSP-based decomposition of total photosynthetically active vegetation cover into herbaceous and woody components is visualized in Figure 8a for the year 1985. Compared to the total vegetation cover (the sum of blue and green), this split offers another layer of interpretation that readily depicts areas with different vegetation compositions. It is important to note that this split does not produce meaningful results when the underlying assumptions are violated (cf. Section 3.2.2, e.g., for a deciduous forest with a substantial seasonal component, as well as lush grasslands in more protected pockets that do not entirely dry out during summer. These areas are very rare in Crete though, thus we believe that the decomposition provides reliable results at large. The decomposition is highly informative when paired with a trend-and-change analysis, followed by syndrome-based labeling of vegetation dynamics classes (Figure 8b,c; Table 4). Half of Crete's semi-natural areas (i.e., cells of 30 × 30 km 2 ) do not show substantial changes in woody cover (52%), whereas 8.5% and 19% are subject to gradual decreases and increases, respectively. The remaining 20% were disturbed, and almost half of these were on a recovery trajectory thereafter. The ongoing decline after a disturbance is fairly rare. More than half of all areas do not show strong changes in the herbaceous cover (56.5%), while 15.5% and 28% of the pixels show signs of decreasing and increasing cover, respectively.
The accuracy assessment of the syndromes reveals that severe disturbances in the woody vegetation, as well as the absence of a disturbance were detected with high quality. Mild disturbances, however, were detected less reliably such that our expert only confirmed 62% of these algorithmic decisions. In addition, 3.68% of all samples were not labeled as the expert could not make a reliable decision. The trend of the woody vegetation was only assessed for samples where the change detection was positively evaluated, thus reducing the sample size by 99; another eight samples were not labeled as the expert could not make a reliable decision. The trend component, both for the total length of the time series (in case no change was detected), and for the time after a disturbance were confirmed to be of high quality. The same is true for the trend analysis of the herbaceous vegetation. In both cases, the algorithmic decision was rejected in less than 1% of samples.
It is acknowledged that our validation is liberal in nature, as the interpreter had access to the algorithm's decision and was tasked to reject decisions that did not match with human perception. Furthermore, this validation should only be interpreted in a qualitative sense, although our feature space (subpixel fractions of vegetation, annual LSP metrics), and observed change phenomena (long-term trends) are quantitative. Thus, it is to be expected that a more conservative validation with an independent and quantitative reference dataset would result in somewhat lower accuracies. However, such data on vegetation components and their fractional cover, as well as long-term trends thereof, are unfortunately not available for the study area, especially when considering the historic investigation period. This is similar to many other regions, where fractional cover reference data do not exist and cannot be created retrospectively either due to the non-existence of high resolution images at required temporal granularity, with Australia being an exception in terms of field data generation (but still for rather contemporary periods) [90]. Nevertheless, our quality assessment still underlines the robustness of this approach, and it is rooted in ecological principles of the seasonal pattern of photosynthetically active vegetation cover of woody and herbaceous vegetation components in semi-arid areas, see e.g., [91]. We therefore presume that our methodology is transferable to similar ecoregions with comparable data availability (although tuning of the endmember model might be necessary), e.g., to Australia, South Africa, or the Mediterranean at large. Due to the consistency of the contemporary and upcoming data from Landsat 8, Landsat 9, as well as Sentinel-2A/B and Sentinel-2C/D with the existing Landsat legacy data records, extending our approach temporally to the current era will be straightforward, too, although the change analysis may need to be extended (or interchanged with other tools, e.g., BFAST) to accommodate for more than one breakpoint. Our approach could also be extended using other novel approaches, e.g., by incorporating object-based segmentation of phenologically similar pixels to increase spatial coherence of spatially similar pixels [92] or by incorporating new insights of statistical trend estimates in large remote sensing data [93]. Entirely different drivers may relate to changes predominantly occurring in the herbaceous or woody cover, or a combination thereof (Figure 9). Thirty-eight percent of the areas with steadily decreasing woody cover (yellow left) do not show substantial changes in the herbaceous cover (gray right); 56% show signs of increasing herbaceous cover (green right). Thus, when only analyzing total vegetation cover as measured via the peak of season metric (Figure 5b), only the first case (yellow left, gray right) was observable. In the second case (yellow left, green right), the long-term decrease in woody cover was compensated by the increase in the herbaceous component. Consequently, the overall absence of decreasing total vegetation cover is not synonymous with the absence of vegetation dynamics.   Table 5.
Similarly, the change-and-trend analysis on decomposed woody and herbaceous cover allows making more informed interpretations of the vegetation dynamics concerning the ecological meaning of related changes. As an example, areas with steadily increasing woody cover (green left) unanimously show no (gray right, 56%) or negative (yellow right, 32%) changes in the herbaceous vegetation. This might be caused by shrub encroachment, which might be considered a degradation process [94]. However, whether the decrease in the herbaceous cover is due to selective grazing or occlusion of the herbaceous layer by a growing canopy cannot be answered with this approach.
Disturbances in the woody vegetation (bottom left), e.g., caused by fire, do not show changes in the herbaceous cover in 47% of pixels, which-as described in Section 3.3-is because herbaceous vegetation quickly recovers after disturbance events. Secondarily, 34% of disturbed areas show signs of increasing herbaceous cover, which could point to areas where shrubs are mechanically removed to provide a grass-dominated vegetation composition-most likely as fodder for sheep and goats [95]. Therefore, the synoptic view on both change trajectories plays an important role to more reliably interpret vegetation changes concerning their trend direction, as well as to disentangle land uses that, e.g., cause a transition in vegetation composition from shrubs to grasses, which is not observable at all when analyzing peak vegetation only.
Next to the ecology-based syndrome classification, the absolute change in the cover is also important to consider as the previous assessment overestimated the actual vegetation cover that was lost or has accumulated. As an example, more biomass may be lost in productive areas than compared to high ranges with sparse vegetation cover. Thus, we additionally computed the absolute sub-pixel net cover change in m 2 over the complete period for each vegetation component ( Figure 10). The net cover change varies regionally and is often pointing in opposite directions in the woody and herbaceous components (c.f. the rather stable total vegetation change in Figure 5b). As an example, a greening of the herbaceous layer in southern Crete is apparent while the higher ranges are more subject to an increase in woody vegetation. This decomposition is critical for carbon budgeting as, e.g., a loss in the woody vegetation is associated with a larger release of CO 2 as compared to the same amount of area lost in herbs (see e.g., [96] for biomass values). The total cover budget for all semi-natural areas in Crete ( Figure 11) reveals that gains exceed losses for both herbaceous and woody components, thus resulting in a net gain of vegetation cover. This is in line with a general greening trend in the entire Mediterranean [97]. The gain is unanimously a result of the gradual long-term processes. The disturbances in the woody cover are fairly neutral overall (when considering the entire island): the cover lost through disturbances without regrowth is compensated by areas where the regrowth exceeds the magnitude of the disturbance.

Conclusions
We reproduced a past case study on land-use change induced land degradation in a Mediterranean ecosystem ("Vegetation Dynamics 1.0" [42]), which was conducted before the opening of the Landsat archive, and well before the remigration of European data holdings into the main data archive, thus inevitably being limited by data scarcity. Now, for the first time, the complete cross-calibrated data record over Europe was available and allowed to replicate this study with a state-of-the-art, data-driven, phenology-adaptive methodology based on the peak of season metric ("Vegetation Dynamics 2.0"). Following our findings, we conclude that trends derived from a Vegetation Dynamics 1.0 methodology are systematically prone to uncertainty due to a volatile timing of statically selected images relative to the spatially and inter-annually variable phenological cycle-sometimes close to peak vegetation cover, while other times close to minimal vegetation cover. This is especially critical if: • a disturbance in the woody vegetation happened, • a transition from/to woody/herbaceous vegetation took place, or • inter-annual variability in seasonal herbaceous vegetation cover was high.
As expected, the majority of semi-natural areas in our study area were subject to at least one of these conditions. The employed Vegetation Dynamics 2.0 methodology yielded more robust results. Nonetheless, results also highlighted that: • linear regression is too simplistic a tool to assess long-term vegetation cover change when stand-replacing disturbances in the woody vegetation cannot be ruled out, and that • peak vegetation cover is not the optimal parameter to analyze.
A more nuanced and meaningful characterization could be achieved by further decomposing the total vegetation cover into woody and herbaceous components using the base level and seasonal amplitude metrics, respectively. The synoptic view on both change trajectories eventually allowed for: • a more reliable interpretation of vegetation changes with respect to their trend direction and ecological meaning, • to disentangle certain land-use change processes with opposite trajectories in the vegetation components that were unobservable when analyzing total vegetation cover, • generating a long-term budget of net cover change, which revealed that vegetation cover of both components has increased at large, mainly due to gradual processes.
For the Earth observation-based assessment of long-term vegetation dynamics in semi-arid Mediterranean ecosystems, we conclude that static image selection study designs should be critically evaluated in the light of current data availability, analytical capabilities, and with regards to the ecosystem under investigation. This applies both to studies before the opening of the Landsat archive, and to contemporary papers implementing static selection. Although we only investigated one particular case study in the Mediterranean, we nevertheless postulate that similar findings would also apply to other cases and ecosystems. We recommend using all available data and take advantage of phenology-based approaches that remove the selection bias and hence reduce uncertainties in results.