Long-Term Satellite Image Time-Series for Land Use/Land Cover Change Detection Using Refined Open Source Data in a Rural Region

The increasing availability and volume of remote sensing data, such as Landsat satellite images, have allowed the multidimensional analysis of land use/land cover (LULC) changes. However, the performance of image classification is highly dependent on the quality and quantity of the training set and its temporal continuity, which may affect the accuracy of the classification and bias the analysis of the LULC changes. In this study, we intended to apply a long-term LULC analysis in a rural region based on a Landsat time series of 21 years (1995 to 2015). Here, we investigated the use of open LULC source data to provide training samples and the application of the K-means clustering technique to refine the broad range of spectral signatures for each LULC class. Experiments were conducted on a predominantly rural region characterized by a mixed agro-silvo-pastoral environment. The open source data of the official Portuguese LULC map (Carta de Uso e Ocupação do Solo, COS) from 1995, 2007, 2010, and 2015 were integrated to generate the training samples for the entire period of analysis. The time series was computed from Landsat data based on the normalized difference vegetation index and normalized difference water index, using 221 Landsat images. The Time-Weighted Dynamic Time Warping (TWDTW) classifier was used, since it accounts for LULC-type seasonality and has already achieved promising overall accuracy values for classifications based on time series. The results revealed that the proposed method was efficient in classifying a long-term satellite time-series with an overall accuracy of 76%, providing insights into the main LULC changes that occurred over 21 years.


Introduction
The increased availability of free and analysis-ready remote sensing (RS) data, such as Landsat, supports dynamic analysis in time and space [1,2], since it allows researchers to perform multidimensional classifications of land use/land cover (LULC) based on satellite image time series [3][4][5].In fact, the classification methods involving multitemporal remotely sensed images are becoming a trend in LULC change detection [6] and also in detailed LULC classification approaches [7], since these are reported to achieve better results than single-date classification methods [8][9][10].Particularly due to the developed and improved tools and algorithms that give priority to time dimension in satellite data analysis [11,12], the changes and disturbances of LULC classes can now be more easily detected.
As Zhu and Woodcock [6] emphasized, the ability to accurately identify LULC changes using remote sensing data depends on an algorithm that can use high spatial resolution data and is based on a multitemporal analysis.However, the multitemporal analysis application can have some limitations-for example, vegetation phenology, sun angles, clouds, sensor errors, among others [6].
Likewise, concerning the well-established, advanced, nonparametric classifiers in remote sensing literature [8,13], such as the support vector machine (SVM) and the random forest (RF) [14], it has been recognized that some challenges still remain regarding LULC classifications based on time series [15].These challenges are: (i) the inexistence/existence and the quality of the samples required to train the algorithm, (ii) the irregular phenological signatures of different LULC types through time, and (iii) the absence of a data temporal continuum [15].
The Dynamic Time Warping (DTW) classifier has been demonstrated to be a capable solution to deal with some of these challenges [15,16].Since it can "fill-in" temporal gaps in the remote sensing time series (e.g., cloudy images), it has been successfully applied in satellite imagery time-series analysis [17,18].However, the distinctive phenological cycle of each LULC class requires an equilibrium between shape matching and temporal alignment [19,20], which is why Maus et al. [21] improved the DTW algorithm.Maus et al. [21] proposed the Time-Weighted Dynamic Time Warping (TWDTW) method that includes time-weighting to account for seasonality.This yields better time series to benefit the LULC mapping.In the research of Maus et al. [18], this method was applied to the LULC classification of a tropical forest area from MODIS enhanced vegetation index (EVI) data.It had the highest accuracy for the forest, pastures, single cropping and double cropping classes' classification and an overall accuracy of about 87%.Belgiu and Csillik [22] evaluated the TWDTW method for mapping different cropland types in two European cities (in Italy and Romania) and one American city (in California) from Sentinel-2 normalized difference vegetation index (NDVI) time series.In their research, this method achieved a higher overall accuracy when classifying different cropland classes in the two European study areas compared to the classification results of the RF method [22].Also, the TWDTW method proved it could produce good output in terms of mapping accuracy, even with few training samples [15,23].
In addition, it is not only the absence of LULC reference data to train the algorithm that can be detrimental to the image classification results but also the lack of representative samples for each LULC class [24].If we look into the literature regarding the acquisition of training samples, they are usually acquired from field surveys or expert knowledge through the visual interpretation of other products-for example, high-resolution images from Google Earth and aerial photographs [24].However, collecting training samples from fieldwork involves high associated costs, in terms of money and time, and collecting training samples through visual interpretations can be difficult, due to the lack of available products, causing biases in the representation of the samples [25].Various approaches have been used to overcome the lack insufficient training samples, such as semi-supervised learning or, more recently, active learning [26,27].However, both these approaches are dependent on preexisting labelled samples, which require user expertise and proprietary software.In addition, in a long-term multitemporal analysis approach, it would be very unlikely to have samples for each of the years under study (especially, if the analysis had just begun in the current year).Accordingly, it is essential to explore ways to obtain numerous quality training samples to allow more accurate remote sensing classification.

Context and Background
High economic pressures have been strongly influencing agricultural practices over the years [28].Considering the importance of agricultural land, it is urgent to have information regarding the dynamics of LULC changes at regional levels over time to promote parsimonious use of the available resources [29][30][31].However, the ability to identify LULC changes accurately depends on a multitemporal analysis [6,8].Nevertheless, how can we evaluate the LULC dynamics over time when there is scarce reference LULC data for certain regions?
In addition, the agricultural areas can be characterized by different cropping calendars (due to different agricultural practices, intensive and super-intensive) and field geometries (with the existence of both fragmented parcels and larger and compact parcels) and can be influenced by the climate conditions and LULC management [18][19][20].These characteristics, together with the influences of the complex spectral properties of the different vegetation types [32], point to some of the challenges present in a multitemporal remote sensing application.
In this study, we address some of these challenges.We start by establishing a methodology to develop a multitemporal analysis of the LULC changes and thereby provide essential information regarding the extent of the changes.The goal of this study was twofold: (i) access free LULC data to be used as training samples for a long-term satellite image time-series classification; and (ii) identify the LULC changes that occurred from 1995 to 2015 in a rural region characterized by a mixed agro-silvo-pastoral environment in the municipality of Beja, Portugal.This region is characterized by a set of LULC complex patterns typical of the Iberian Peninsula.It is a Mediterranean agro-forestry system [33,34] that includes a vast landscape of intermingling cultures, such as wheat, olive groves, vineyards, cork oak forests and pastures, which have a high economic importance in the Portuguese agricultural industry [35].Therefore, we first sought to integrate the free official Portuguese LULC maps (Carta de Uso e Ocupação do Solo, COS) to produce a data source for training purposes.Second, we sought to investigate the potential of a k-means clustering technique [25,36,37] to refine the broad range of spectral signatures for each LULC class of the training data.Third, we tested the TWDTW algorithm on a Landsat imagery time-series classification, using the refined sample source.Finally, we evaluated the extent of the changes over 21 years.
Our research is organized as follows.Section 2 describes the study area and our data.The applied methodology is detailed in Section 3. The multitemporal satellite imagery classification and the results of the LULC changes are described in Section 4. Section 5 elaborates on the discussion.Finally, the main conclusions of our research are described in Section 6.

Brief Introduction to the Study Area
The municipality of Beja, in the Alentejo region of Portugal, with an area of about 1145 km 2 , was selected for the study (see Figure 1).Different crops characterize this region, but the dominant land use is a mixed agro-silvo-pastoral environment [33,34].The different agricultural practices require different cropping calendars and field geometries, making the landscape of this region distinctive, with the existence of both fragmented land parcels and larger and more compact ones.This region has a Mediterranean climate, influenced by its distance to the coast (average high temperatures are around 33 • C in August and 14 • C in January) [38].The mean annual precipitation is around 558 mm, concentrated in the months of November to January.Beja is characterized by a low population density (31 inhabitants per km 2 ).The Guadiana River is the most important watercourse in this region; it crosses the west border of the municipality.The Alqueva Dam was built in 2002 on the Guadiana River, creating the largest artificial lake in Europe.However, it was only fully operational in 2012 in providing water for agriculture services [32].

Satellite Image Collection and Preprocessing
Landsat data is the longest and most informative temporal report of the earth.It is the only data source that gives a continuous image of the earth's resources with a high spatial resolution (30 m) spanning nearly 40 years [39].To obtain vast temporal continuity data and compose our long time series, we downloaded available Landsat data (with less than 50% of cloud cover) from 1995 through 2015 from the official website of the United States Geological Survey (https://earthexplorer.usgs.gov/).This included Landsat 4-5 Level-2 and Landsat 8 Level-2 data (path 203, row 34).We were working with data that spanned 21 years, for a total of 221 scenes (see Table 1).As shown in Table 1, there were no available Landsat images for 2012, either from Landsat 4-5 Level-2 or Landsat 8 Level-2.Only the Landsat 7 sensor had information for 2012, but unfortunately, as is commonly known, all Landsat 7 scenes have data gaps for this area (due to a technical problem with the satellite) [40].

Satellite Image Collection and Preprocessing
Landsat data is the longest and most informative temporal report of the earth.It is the only data source that gives a continuous image of the earth's resources with a high spatial resolution (30 m) spanning nearly 40 years [39].To obtain vast temporal continuity data and compose our long time series, we downloaded available Landsat data (with less than 50% of cloud cover) from 1995 through 2015 from the official website of the United States Geological Survey (https://earthexplorer.usgs.gov/).This included Landsat 4-5 Level-2 and Landsat 8 Level-2 data (path 203, row 34).We were working with data that spanned 21 years, for a total of 221 scenes (see Table 1).As shown in Table 1, there were no available Landsat images for 2012, either from Landsat 4-5 Level-2 or Landsat 8 Level-2.Only the Landsat 7 sensor had information for 2012, but unfortunately, as is commonly known, all Landsat 7 scenes have data gaps for this area (due to a technical problem with the satellite) [40].
As we were using Landsat Level-2 Surface Reflectance images, no atmospheric correction was needed.Also, no further normalization was required since the TWDTW uses a Generalized Additive Model (GAM) with a spline smooth function and a logistic regression for time weight (see Section 3.4).Furthermore, no masking for clouds or shadows was performed since the TWDTW classifier can "fill in" temporal gaps [17,18].The ETRS_1989_Portugal_TM06 coordinate system was applied to the image datasets, respecting the 30 x 30 m spatial resolution.As we were using Landsat Level-2 Surface Reflectance images, no atmospheric correction was needed.Also, no further normalization was required since the TWDTW uses a Generalized Additive Model (GAM) with a spline smooth function and a logistic regression for time weight (see Section 3.4).Furthermore, no masking for clouds or shadows was performed since the TWDTW classifier can "fill in" temporal gaps [17,18].The ETRS_1989_Portugal_TM06 coordinate system was applied to the image datasets, respecting the 30 x 30 m spatial resolution.

Official Portuguese Land Cover Map (COS) Data
The COS maps of 1995, 2007, 2010, and 2015 used to derive the training samples were produced by the Portuguese General Directorate for Territorial Development (DGT).These datasets are freely available for download on the DGT website (http://mapas.dgterritorio.pt/geoportal/catalogo.html).The COS maps are in a vector format (polygons) with a cartographic accuracy of 0.5 m and a minimum mapping unit (MMU) of 1 hectare and are produced by photointerpretation (i.e., over orthophoto maps) [41].COS uses a hierarchical, a priori nomenclature system, having five disaggregation levels.A DGT report [41] describes in detail the COS nomenclature characteristics.

Methods
The first step of this research was to download the Landsat Level-2 imagery data from 1995 to 2015.Second, we used the data to compute the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI).Then, we combined all the scenes to create one raster time-series file for each index.Third, using the COS maps, we identified the most representative LULC classes in the study area and randomly generated sample points for each year to be used in the training.Fourth, we refined the training samples by class and by year using the k-means clustering technique.Finally, in the fifth step, we classified the Landsat imagery time series using the TWDTW algorithm and evaluated the classification accuracy (see Figure 2).

Official Portuguese Land Cover Map (COS) Data
The COS maps of 1995, 2007, 2010, and 2015 used to derive the training samples were produced by the Portuguese General Directorate for Territorial Development (DGT).These datasets are freely available for download on the DGT website (http://mapas.dgterritorio.pt/geoportal/catalogo.html).The COS maps are in a vector format (polygons) with a cartographic accuracy of 0.5 m and a minimum mapping unit (MMU) of 1 hectare and are produced by photointerpretation (i.e., over orthophoto maps) [41].COS uses a hierarchical, a priori nomenclature system, having five disaggregation levels.A DGT report [41] describes in detail the COS nomenclature characteristics.

Methods
The first step of this research was to download the Landsat Level-2 imagery data from 1995 to 2015.Second, we used the data to compute the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI).Then, we combined all the scenes to create one raster time-series file for each index.Third, using the COS maps, we identified the most representative LULC classes in the study area and randomly generated sample points for each year to be used in the training.Fourth, we refined the training samples by class and by year using the k-means clustering technique.Finally, in the fifth step, we classified the Landsat imagery time series using the TWDTW algorithm and evaluated the classification accuracy (see Figure 2).

Derivation of NDVI and NDWI from Landsat Imagery
The municipality of Beja has a heterogeneous biophysical environment, so to improve the identification of the different vegetation classes using Landsat imagery seasonal time series [10,42,43], we computed two indexed time series, NDVI and NDWI.To derive these indices, we used the equations in Table 2.
The normalized difference vegetation index (NDVI) is the most commonly used satellite-based measure to globally monitor vegetation [44] due to its robust and easy interpretation.This index considers the distinctive shape of the vegetation reflectance curve and is able to screen dynamic

Derivation of NDVI and NDWI from Landsat Imagery
The municipality of Beja has a heterogeneous biophysical environment, so to improve the identification of the different vegetation classes using Landsat imagery seasonal time series [10,42,43], we computed two indexed time series, NDVI and NDWI.To derive these indices, we used the equations in Table 2.
The normalized difference vegetation index (NDVI) is the most commonly used satellite-based measure to globally monitor vegetation [44] due to its robust and easy interpretation.This index considers the distinctive shape of the vegetation reflectance curve and is able to screen dynamic specifications of vegetation and/or vegetation health [45].Despite the existence of several other vegetation indices, the ability of the NDVI index to monitor vegetation for seasonal and inter-annual changes has already been sufficiently substantiated in prior studies [44,46].The normalized difference water index (NDWI) was used as an additional aid, since it was conceived to outline open water features and improve their identification through RS imagery [47,48].According to Gao [47], this index complements NDVI by improving in the identification of liquid water content of vegetation, whereas NDVI just accounts for the vegetation's greenness [49].

Training Sample Derivation
The COS maps for 1995, 2007, 2010, and 2015 were used to produce a sample source for training purposes.As the first step, we statistically analyzed the COS maps to understand what were the main LULC classes in the study area and how they changed over time.The identification of the major LULC classes was based on the second level of nomenclature of the COS maps.We found the predominant temporary crops (51%) in 1995 (Table 3).However, in later years (2007, 2010 and 2015), the coverage of temporary crops declined by almost 8%, though they still represented the largest cropland use (43%).The heterogeneous agricultural areas represented the second largest LULC class over the 21-year period (19% in 1995 and 2015; 20% in 2007 and 2010).Indeed, during this time interval, there was an increase in forested areas (11% in 1995; 12% in 2007; and 13% in 2010 and 2015).Despite the decrease in pasture coverage between 2010 and 2015 by 2%, this LULC class covered about 10% of the study area in 1995 and 12% in 2007.Although permanent crops represent a small amount of LULC in the Beja municipality, they increased substantially (by 7%) between 1995 and 2015.Artificial surfaces and wetlands represent minor LULC classes in this municipality.The results revealed the predominance of agricultural land (>82%) in the Beja municipality over the 21-year period.Nevertheless, the second largest class (heterogeneous agricultural areas) encompassed different mosaics: (i) areas of temporary crops and/or permanent crops on the same parcel; (ii) temporary crops cultivated under agroforestry systems; (iii) meadows and/or permanent crops, and (iv) landscapes in which crops and pastures intermix with natural vegetation [41].Indeed, this was a highly heterogeneous class that may include each of the other agricultural classes that predominate in the Beja municipality.When analyzing this class in more detail (looking at the COS maps' third level of nomenclature), we found a predominance of temporary crops cultivated under agroforestry systems (>85%) over the 21-year period.
Therefore, we decided to mix the temporary crops and the heterogeneous agricultural areas classes.As such, five major LULC classes were identified as the most representative of the selected study area: (1) temporary crops with agroforestry areas (including the temporary crops and the heterogeneous agricultural areas); (2) permanent cropland; (3) permanent pastures; (4) forest; and (5) water.Also, the temporary crops with agroforestry areas could encompass impervious surfaces, since in Beja, the dispersed settlements had relatively small dimensions with houses far apart from each other and surrounded by bare soils and/or croplands.The impervious surfaces were less than 2% of the whole Beja municipality area and have not significantly changed throughout the years.
As the second step, we randomly generated the training sample points for each of the five LULC classes previously identified (see the COS nomenclature codes in Table 4).However, we decided to a priori reduce the polygon areas of the COS maps, using a QGIS Python plugin called "Buffer by Percentage".We followed this approach, because we knew training samples had low accuracies when their proximity to the polygon's boundaries increased [50].As we were using the shape and area of each COS map polygon to produce the training samples, we sought to ensure a relationship of magnitude between the MMUs of both data sources (the COS map and the Landsat imagery) [51].This ensured the generated samples related to one cell superimposed to just one polygon.Since the COS has an MMU of 1 hectare and the Landsat imagery cell size was 900 m 2 , we reduced the COS polygon areas to 11% of the original.Consequently, 2000 training samples were generated for the heterogeneous agriculture class (since this area occupied more than 50% of the study area) and 1000 training samples were generated for each of the remaining LULC classes of the four COS maps (1995, 2007, 2010, and 2015) making 6000 samples per year.Freshwater surfaces, including watercourses and water plans (both natural and artificial)

Training Sample Refinement
The created sample source had the following limitations: (1) the sample source was acquired from vector-based LULC data (the COS maps); (2) no training samples acquired from fieldwork were integrated; and (3) there was a high probability of spectral-temporal signature confusion among LULC classes, due to the heterogeneous biophysical environment of the study area.As such, we decided to apply the k-means clustering technique [25,36,37] to improve the sample source.
Cluster analysis methods are normally used in an attempt to detect homogeneous groups.The rationale behind implementing a clustering method to refine the training set is that each generated training sample per class should be grouped to either one or, at most, two clusters.This is due to the Freshwater surfaces, including watercourses and water plans (both natural and artificial)

Training Sample Refinement
The created sample source had the following limitations: (1) the sample source was acquired from vector-based LULC data (the COS maps); (2) no training samples acquired from fieldwork were integrated; and (3) there was a high probability of spectral-temporal signature confusion among LULC classes, due to the heterogeneous biophysical environment of the study area.As such, we decided to apply the k-means clustering technique [25,36,37] to improve the sample source.
Cluster analysis methods are normally used in an attempt to detect homogeneous groups.The rationale behind implementing a clustering method to refine the training set is that each generated training sample per class should be grouped to either one or, at most, two clusters.This is due to the Freshwater surfaces, including watercourses and water plans (both natural and artificial)

Training Sample Refinement
The created sample source had the following limitations: (1) the sample source was acquired from vector-based LULC data (the COS maps); (2) no training samples acquired from fieldwork were integrated; and (3) there was a high probability of spectral-temporal signature confusion among LULC classes, due to the heterogeneous biophysical environment of the study area.As such, we decided to apply the k-means clustering technique [25,36,37] to improve the sample source.
Cluster analysis methods are normally used in an attempt to detect homogeneous groups.The rationale behind implementing a clustering method to refine the training set is that each generated training sample per class should be grouped to either one or, at most, two clusters.This is due to the Freshwater surfaces, including watercourses and water plans (both natural and artificial)

Training Sample Refinement
The created sample source had the following limitations: (1) the sample source was acquired from vector-based LULC data (the COS maps); (2) no training samples acquired from fieldwork were integrated; and (3) there was a high probability of spectral-temporal signature confusion among LULC classes, due to the heterogeneous biophysical environment of the study area.As such, we decided to apply the k-means clustering technique [25,36,37] to improve the sample source.
Cluster analysis methods are normally used in an attempt to detect homogeneous groups.The rationale behind implementing a clustering method to refine the training set is that each generated training sample per class should be grouped to either one or, at most, two clusters.This is due to the

Training Sample Refinement
The created sample source had the following limitations: (1) the sample source was acquired from vector-based LULC data (the COS maps); (2) no training samples acquired from fieldwork were integrated; and (3) there was a high probability of spectral-temporal signature confusion among LULC classes, due to the heterogeneous biophysical environment of the study area.As such, we decided to apply the k-means clustering technique [25,36,37] to improve the sample source.
Cluster analysis methods are normally used in an attempt to detect homogeneous groups.The rationale behind implementing a clustering method to refine the training set is that each generated training sample per class should be grouped to either one or, at most, two clusters.This is due to the fact that each generated training sample should have a similar spectral signature, such as the one belonging to their LULC class.To verify this hypothesis, we used the R package tclust [52].
The R package tclust offers an adequate cluster scatter matrix constraint, leading to a wide range of clustering procedures and avoiding the occurrence of spurious non-interesting clusters.Also, it deals with noisy data and provides graphical exploratory tools (see Figure 3) to help the user make sensible choices regarding trimming the outliers and choosing the number of clusters [52].These options were the main reason why this package was chosen.
The clustering process is schematically portrayed in Figure 4 and is essentially divided into three distinct processes: (1) classification-trimmed likelihood calculation (ctlcurves); (2) cluster computation; and (3) mean discriminant factor value calculation (DiscrFact).The variables used for the creation of clusters were the NDVI and the NDWI indices (the same variables that would be used for the image classification).
The R package tclust offers an adequate cluster scatter matrix constraint, leading to a wide range of clustering procedures and avoiding the occurrence of spurious non-interesting clusters.Also, it deals with noisy data and provides graphical exploratory tools (see Figure 3) to help the user make sensible choices regarding trimming the outliers and choosing the number of clusters [52].These options were the main reason why this package was chosen.The clustering process is schematically portrayed in Figure 4 and is essentially divided into three distinct processes: 1) classification-trimmed likelihood calculation (ctlcurves); 2) cluster computation; and 3) mean discriminant factor value calculation (DiscrFact).The variables used for the creation of clusters were the NDVI and the NDWI indices (the same variables that would be used for the image classification).The ctlcurves function approximates the classification-trimmed likelihoods by successively applying the tclust function to a sequence of values, k (number of clusters) and α (fraction of the most outlying data).The calculation of ctlcurves shows the implications of increasing/decreasing k-values in the fraction of the most outlying samples (limited, in this case, to 30% of the sample universe).
To determine the most adequate parameters to include in the tclust function, the following logic was employed: 1. Was it possible to create only one cluster that could well represent each LULC class reality?If so, then only one cluster was created.
2. When more than one cluster had to be created, what was the minimal number of clusters that should be created?In this case, we selected the minimum number of clusters possible, trimming up to the maximum of 30% of the sample universe.When this strategy was put into practice, two and, in rare cases, three clusters were created.The ctlcurves function approximates the classification-trimmed likelihoods by successively applying the tclust function to a sequence of values, k (number of clusters) and α (fraction of the most outlying data).The calculation of ctlcurves shows the implications of increasing/decreasing k-values in the fraction of the most outlying samples (limited, in this case, to 30% of the sample universe).
To determine the most adequate parameters to include in the tclust function, the following logic was employed: 1. Was it possible to create only one cluster that could well represent each LULC class reality?If so, then only one cluster was created.
2. When more than one cluster had to be created, what was the minimal number of clusters that should be created?In this case, we selected the minimum number of clusters possible, trimming up to the maximum of 30% of the sample universe.When this strategy was put into practice, two and, in rare cases, three clusters were created.
To make sure that in both hypotheses we were left with well-determined clusters, the quality of the cluster assignments and the trimming was evaluated by applying the function DiscrFact.This function helped: (1) to discover if the clusters were well-determined, and (2) to select the cluster with the least doubtful assignments to represent the reality of the respective class [53].
The parameters k and α for the tclust function that provided (among other information) the cluster assignment to each observation, differed according to the results generated by the ctlcurves and DiscrFact functions.The remaining parameters were left at their default values, with the exception of restr that was set to "eigen" to simultaneously control the relative group sizes and the deviation from sphericity in each cluster.Also, the equal.weightsparameter was set to "TRUE" to avoid the creation of a well-determined cluster that actually does not represent the LULC class.This way, we achieved heterogeneous clusters as much as possible, since no LULC class had a pure and unique spectral signature.For a correct classification, we needed some variability in the range of values to avoid eliminating potentially important information [52].The restr.fact parameter was increased until no artificial restriction was necessary to create the clusters [53] (1 being the most restrictive case).
Finally, the tclust package was used to refine the training points generated for each class and for each of the four years.In Table 5, we present the final number of training samples per class for each year after the clustering analysis.

Dynamic Time Warping and Time-Weighted Dynamic Time Warping Methods
Sakoe and Chiba [54] developed the Dynamic Time Warping (DTW) method for speech recognition applications, and it was later introduced for use in satellite imagery time series analysis [15][16][17][18].However, DTW proved to be unfit for satellite imagery time-series analysis, because "it disregards the temporal range when trying to find the best pair matches between time series" [21].The distinctive cycle of each LULC class requires an equilibrium between shape matching and temporal alignment to be employed in LULC classification using remote sensing time-series images [19,20].
To build the time-series analysis, the TWDTW algorithm maps the derived indices (NDVI and NDWI) to a three-dimensional array in space and time, where each pixel location is associated with a sequence of measurements.To match LULC classes to the subintervals (i.e., years) of a long-term satellite image time series and classify phenological cycles, this method uses both cycle amplitude and phase information.The spectral-temporal patterns are defined by applying a GAM [18,21] with a smoothing function of y = s(x), where the function s corresponds to a spline model, x is time, and y is each satellite band.
To set the temporal constraints of the TWDTW algorithm, we applied a logistic weight function, since this provides more accurate results than a linear function [18,21].This logistic function uses a small weight for short time warps and substantial weight for larger time warps, with a midpoint (beta) and steepness (alpha).It is formally given by (Equation ( 1)) [18,21]: where g(t 1 , t 2 ) is the elapsed time (in days) between date t 1 (in the pattern) and date t 2 (in the time series).The purpose of this function is to control the time warp.If a large seasonal difference exists between the pattern and its matching point in the time series, an additional cost is added to the distance measure (DTW).The g function constraint drives time warping and ensures that the time-series alignment is reliant on the seasons.This is particularly valuable for identifying temporary crops and in differentiating pasture from agriculture.To achieve an LULC classification, and as recommended in Maus et al. [21], we assign the logistic function α = −0.1 and β = 100 to weight the function.This implies adding a time weight to the DTW with a low penalty for time warps smaller than 60 days and a higher penalty for larger time warps.We also set the temporal pattern frequency to eight days to ensure, at least, one Landsat image was considered (i.e., a temporal resolution of eight days).
Since LULC classifications are extremely dependent on the quality of spectro-temporal patterns, we performed a k-fold cross-validation [55].This function uses the training samples to generate the spectral-temporal patterns.The results of this classification are used in the accuracy calculation.In this study, we obtained the accuracy for each data partition using 100 different data partitions, each of them with 10% of the training samples randomly selected for training and 90% for validation.All calculations and analyses were accomplished using the R programming language and operating environment [56].Maus et al. [21] implemented the TWDTW method for LULC classifications using satellite imagery time series on an open-source R package (dtwSat, for a package overview see [21]).

Classification Accuracy Assessment
Several metrics commonly used in remote sensing were calculated, including the kappa index [57], the overall and individual user and producer accuracies [58], the error matrix (confusion matrix), and the confidence intervals for the accuracy measures (error tolerances) [59].The kappa index estimates how good the classification is when compared to a randomly generated image.The overall accuracy measure indicates total classification performance as a percentage by dividing the number of samples correctly classified by the total number of samples.Individual user accuracy measures the probability that any classified pixel will actually match the samples, while individual producer accuracy provides the probability that a particular LULC class is classified as such in the samples.The error matrix indicates the samples correctly (diagonal) and incorrectly classified for each LULC class.

LULC Classification and Analysis
The distance measure is employed to produce categorical land cover maps, where the classification result is based on the most similar pattern for each period.Classification results for each period can be visualized in Figure 5.The results reveal the predominance of temporary crops with agroforestry areas in the Beja municipality.Beja is, in fact, characterized by extensive agriculture with complex cultivation patterns, where significant areas are occupied by temporary crops with natural vegetation and agroforestry areas.The substantial coverage of temporary crops with agroforestry areas confirm the importance of cereal production (particularly wheat) in this municipality, as well as the relevance of cork oak exploitation in the vast landscape of agroforestry areas (named Montado in Portugal and Dehesa in Spain) [34,60].Over the 21-year period, the biggest change noticed was the substantial increase of the permanent cropland through scattered parcels, especially after the 2010-2011 period.Although this class represents a small coverage in Beja, this considerable increase suggests a change focused on intensive farming without fallow.
We found that permanent pastures represented a minor LULC class and were mainly concentrated near forested areas.Indeed, the forested areas remain mainly in the southeastern and northeastern part of the municipality, possibly due to the topographic conditions (plateaus) and the proximity to the Guadiana River. Figure 6 helps to visualize in detail the main significant changes throughout the 21 years under analysis.Different occupation areas for each class were observed, which was in agreement with the LULC class spatial distribution.The increase of the permanent cropland was clear from 2011 forward, occupying an area of about 16% in 2015 in contrast with the 1995 occupation area of 8% (Figure 7).The water areas had a similar pattern of consistent increases, up from about 7% of the total area in 1995 to 11% in 2015, with a significant increase in the 2010-2011 period.Regarding the agricultural classes, the temporary crops with agroforestry areas class was clearly prevalent, occupying a total area close to 63% in 1995.However, this class had a significant area decrease from 1995 to 2015 (about 11%).The permanent pastures group did not vary greatly from 1995 to 2015; it always occupied an area of about 12%-13%.Forested areas did vary more over the 21 years analyzed, the area occupied ranged between 11% (maximum area) and 8% (minimum area).In particular, from 2003 to 2010-2011, little or no change in the LULC was detected, perhaps due to the measures brought by the 2003 Common Agriculture Policy (CAP) that more driven by social integration and less to supporting production [32].Different occupation areas for each class were observed, which was in agreement with the LULC class spatial distribution.The increase of the permanent cropland was clear from 2011 forward, occupying an area of about 16% in 2015 in contrast with the 1995 occupation area of 8% (Figure 7).The water areas had a similar pattern of consistent increases, up from about 7% of the total area in 1995 to 11% in 2015, with a significant increase in the 2010-2011 period.Regarding the agricultural classes, the temporary crops with agroforestry areas class was clearly prevalent, occupying a total area close to 63% in 1995.However, this class had a significant area decrease from 1995 to 2015 (about 11%).The permanent pastures group did not vary greatly from 1995 to 2015; it always occupied an area of about 12%-13%.Forested areas did vary more over the 21 years analyzed, the area occupied ranged between 11% (maximum area) and 8% (minimum area).In particular, from 2003 to 2010-2011, little or no change in the LULC was detected, perhaps due to the measures brought by the 2003 Common Agriculture Policy (CAP) that more driven by social integration and less to supporting production [32].

LULC Classification Accuracy Summary
A set of metrics-including overall user and producer accuracy, the kappa index, and an error matrix (confusion matrix) were calculated.The results of the assessment were gathered for the full period (i.e., the total LULC mapping area was the same as the surface area times the number of maps), but it is also possible to evaluate and visualize each period separately.The accuracy assessment was computed using the validation samples (90% of the total samples, with a 95% confidence level), obtaining an overall classification accuracy of 75.96%.11%).The permanent pastures group did not vary greatly from 1995 to 2015; it always occupied an area of about 12%-13%.Forested areas did vary more over the 21 years analyzed, the area occupied ranged between 11% (maximum area) and 8% (minimum area).In particular, from 2003 to 2010-2011, little or no change in the LULC was detected, perhaps due to the measures brought by the 2003 Common Agriculture Policy (CAP) that more driven by social integration and less to supporting production [32].Table 6 presents the error matrix in proportion to the classification map.Only the water class had almost all the samples correctly classified.It is noteworthy that the misclassified samples were mistaken as forest.In this region, forested areas are characterized by hardwood forest that grows along water lines and open spaces with little or no vegetation.Regarding forested areas, there was some confusion between this class and the agricultural classes.Still, only about 4% of the total samples were misclassified.However, between the agricultural classes, the temporary crops with agroforestry areas samples that were misclassified were mostly mistaken as permanent cropland or permanent pastures (about one-quarter of this class's total sample size).In contrary, the permanent cropland was mostly classified as temporary crops with agroforestry areas (almost 40% of the total sample size was wrongly classified), but it also presented some confusion with the forest class (about 5% of the samples).Similarly, there was some confusion in the permanent pastures class with the temporary crops with agroforestry areas (about 27% of the samples) and also with forest (6% of the samples).In the class-based analysis, we observed a wide range of accuracy values.The water class was the most accurate, with both use and producer accuracy above 99%, suggesting high confidence and sensitivity of the method for this class.The forest class followed the same pattern as the water class, with high user accuracy (96%) and producer accuracy (88%).However, regarding the agricultural classes, they did not present consistent accuracy values.The temporary crops with agroforestry areas class was the most accurate with a user rate above 74% and a producer rate above 87%, which suggests the pixels classified as temporary crops with agroforestry areas closely matched those in the COS.In contrast, the permanent cropland and permanent pastures classes presented user accuracy values around 56% and 66%, respectively, while the producer accuracy values were even lower (43% for permanent cropland and 39% for permanent pastures).Overall, a kappa coefficient of 0.62 indicated a substantial agreement between the classification and the samples.

Discussion
Over the years, the remote sensing applications for rural areas have become extremely important [61], since they support the monitoring and mapping of the LULC changes of agricultural land.Some studies have emphasized the usefulness of using remote sensing data for agricultural land change classification [62,63], cropland estimation/scenario forecast [64][65][66], and vegetation health and crop production monitoring [67,68].The ongoing abandonment of agricultural land [69][70][71], and the fact that these areas have been threatened by socioeconomic and biophysical factors [72,73] demonstrates the need to identify and characterize the spatiotemporal changes.Accordingly, in this study, we applied a long-term LULC analysis in a rural region based on Landsat time series of 21 years.However, it involved overcoming a number of challenges.

The Implications of a Long-Term Landsat Time Series
For case studies at a regional scale, Landsat TM/ETM+ has been one of the most frequently used products due to its medium/high spatial resolution (30 m), large temporal extent (since the 1970s), and free and analysis-ready availability [1,2].However, in this study there were some limitations that were identified regarding the use of the Landsat data.
Having a long time-series span without interruptions was not possible, since there were no images available for the year 2012.Only the Landsat 7 sensor had information relative to 2012, but unfortunately, as is commonly known, many Landsat 7 scenes have data gaps (the result of a technical problem with the satellite) [40].Although there are several approaches to handling the Landsat 7 sensor error problem, all the processes are time-consuming, and the gaps may not be completely eliminated [74].In addition, to have a large temporally continuous dataset and create our long time series, we had to combine a set of images of Landsat 4-5 Level-2 and Landsat 8 Level-2 data.As such, the absence of images for 2012 and the use of two different sensors were limitations of this study.
Clouds are often a problem in processing satellite imagery.In our study, the temporal continuity was not affected, as we could use all scenes regardless of the presence of clouds.However, when using other classifiers, a long-term Landsat time series analysis can be difficult, due to the high number of scenes with the presence of clouds, which can affect the temporal continuity and the integrity of the classification.

The Generation of Training Samples from Official LULC Maps
One challenge associated with a long-term application is the lack of a sufficient and representative training sample.In fact, the accuracy of an image classification result is highly dependent on the quality and quantity of the training set used to train the classification model [75,76].For most supervised classifiers (such as a maximum likelihood classification, multilayer perceptron, SVM, or RF) not having sufficient and representative training data can be detrimental to the image classification results [24].Furthermore, in a long-term multitemporal image classification result, it would be very unlikely to have samples for each of the years under study.
Most studies about image classification usually mention the use of field surveys or visual interpretations to obtain training samples [24].How can a long-term classification be done when both options are not feasible?The approach presented in this study provides a path toward addressing this problem by exploring a way to generate a set of samples for training purposes using free LULC data (the COS maps) from different years.Even if money and time were available to collect training samples from fieldwork, the development of an acceptable automated alternative would allow those resources to be used in other ways.
In this exploratory study, we tested the use of the COS maps to produce a sample source for training purposes for various reasons: (i) the LULC data was produced by a governmental institution (DGT); (ii) it had a medium spatial resolution; (iii) it was the data with most temporal continuity in Portugal; and (iv) it was one of the most used data sources for LULC analysis in Portugal [77].However, generating training samples from a LULC map (such as the COS) produced by a governmental institution that we considered a reliable source of information still had associated problems.For example, following a previous work [50] we decided to reduce the COS map polygon areas before the generation of random points since the authors of the cited work concluded that the boundaries of the polygons of such sources can have low accuracy.In addition, bias in the representation of the samples can exist if there is no prior knowledge regarding the major LULC types present in the study area.This makes it very difficult to know which LULC classes from the COS should be used to generate the training samples associated with the most representative LULC classes of the study area.We had to support the identification of the major LULC classes present in our study area by a statistical analysis of the coverage of the COS LULC classes.
In addition, even though we were able to generate 6000 training samples for each year in the study, this set was irregularly distributed over time, since it only represented the information of the LULC for four of the 21 years under analysis.This was, in fact, the major constraint we had, and, once again, it depicted the limitations associated with this type of remote sensing application.

The LULC Types and Temporal Phenological Signatures Diversity
In our study, we decided to apply the k-means clustering technique to refine the training samples, since they were acquired using an explorative approach without integrating any ground-truth data and there was mixed agro-silvo-pastoral ecosystem present in our study area.Moreover, Viana et al. [37] had already demonstrated the training samples refinement using the k-means clustering technique improves the image classification's overall accuracy (+8%).
Interestingly, Allen et al. [32] and Senf et al. [78] had already mentioned the difficulty of classifying the different LULC types of a region with a heterogeneous biophysical environment due to the wide range of spectral signatures for each LULC class.Our word with the Beja municipal region was no different; the permanent cropland class is an example of a class with a complex spectral pattern.This class was mostly constituted of olive groves, and the production techniques of olive groves have changed over the past decade.This region has always been characterized by open olive grove crops, but recent foreign direct investment in super-intensive olive grove plantations has resulted in a different landscape.It is easy to find intensive and super-intensive olive grove production, which is a particular type with its own spectral signature characteristics that differ from the "usual" open olive grove production (since the trees are small, have less space between them, and have no grass beneath).As such, the same LULC class can have different spectral signatures, which helps explain the low accuracy values for the permanent cropland class and the confusion between the remaining agricultural classes.
As mentioned before, the Beja municipality has a heterogeneous land cover area (with the existence of both fragmented parcels and larger and compact parcels) with different crop phenological cycles and agricultural practices.In addition, the most misclassification was between the permanent cropland and temporary crops with agroforestry areas presumably due to how olive groves (when not super-intensively produced) and especially the vineyard planting was organized (in line with spaces of bare soil between them).Also, the presence and extent of Montado in the Beja municipality can easily explain the misclassification among each of the three agricultural classes, since this class is mostly characterized by a herbaceous understory, and can, therefore, be easily mistaken with olive groves or pastures [32].
In particular, Allen et al. [32] have emphasized the lack of research focusing on the detection of LULC changes in the Alentejo region, based on identifying the extent of more than one type of LULC at a large temporal scale.The approaches of Calvão and Palmeirim [79] and Godinho et al. [60] to identify the LULC changes in Alentejo were focused on single classes (semi-deciduous scrub, and Montado, respectively).As a matter of fact, Allen et al. [32] were the first to assess the extent of the changes in the different LULC classes present in an agro-silvo-pastoral environment and recognized the limitations of such analysis.Our study reveals similar drawbacks that emerged in the classification confusion among the agricultural classes present in our study area.
Nevertheless, the final result is a starting point for understanding the LULC dynamics for 21 years-confirming the impact some political decisions have had in this region (such as the construction of the Alqueva Dam in 2002 that filled its reservoir until 2012) and the lack of control of the type of agricultural activity that has had a significant impact on soil quality due to the super-intensive production for permanent crops like the case of the olive groves.

The TWDTW Method for Long-Term Time-Series Classification
This study also introduced the use of the TWDTW method for long-term classification using the Landsat time series.The TWDTW had already successfully achieved high accuracy values for satellite time series, using MODIS imagery [18,21] and Sentinel-2 imagery [22], showing to a certain extent the capacity of this classifier for handling multidimensional analysis.In this study, the assessment revealed an overall good accuracy of about 76%.Although the water and forest classes had contributed to this overall value, the accuracy values obtained for each of the three agricultural classes (the agricultural classes were prevalent in the study area, occupying about 80% of the total area) were problematic.Certainly, the 24% misclassification rate among the LULC classes could have a statistical influence in the results, particularly in relation to the extent of the changes in classes over the 21-year period.However, our results are in accordance with the statistical analysis we performed with the COS maps (Section 3.2), which allowed identifying the main LULC classes in the study area and how they changed over time.
It is noteworthy that this misclassification among each of the three agricultural classes was already expected, due to the complex spectral properties of the different vegetation types present in these large agricultural classes.This is especially true for temporary crops with agroforestry areas.They have a complex cultivation pattern encompassing temporary irrigated and rain-fed crops that are produced and harvested in different seasons followed by bare soil afterwards that can be easily confused with the permanent cropland and permanent pastures.Furthermore, the classifications are complicated by the more recent super-intensive production for permanent crops like the case of the olive groves [80].Intensification of the permanent cropland is very likely associated with increased exploitation of water resources, following the construction of Alqueva Dam.The increased expansion of water bodies is also apparently attributable to the construction of this Dam [32].
In addition, to improve the identification of each LULC type and increase the classification accuracy, we decided to compute two-time series indices (NDVI and NDWI) and perform the cluster analysis using the spectral signatures of the training samples for all LULC classes.However, we recognize that there may be other indices that can be used to further improve the results, such as the enhanced vegetation index (EVI) [17,63,68], the generalized soil adjusted vegetation index (SAVI) [81], or the Modified Soil-adjusted Vegetation Index (MSAVI) [82].These should be considered in further studies.
Nevertheless, our results, in line with Maus et al. [18], and Belgiu and Csillik [22], highlight the potential of this method to classify complex ecosystems, especially if we take into account our methodology's major constraint, the sample set irregularly distributed over time.Yet, it is important to highlight the processing time associated to create the categorical LULC map for each period when using the dtwSat package.The TWDTW method processing time (using an Intel ® Core™ i7-6700K CPU with a 4.00 GHz clock and 32 GB of RAM) was almost 16 hours when processing in parallel [21].
This can be an issue to take into account, especially if we want to reproduce this approach for larger areas or if the available hardware has limited resources.

Conclusions
In this study, the TWDTW method was used to classify a heterogeneous area located in the southeast of Portugal, providing insights into the main changes that occurred over 21 years.This time series, derived from Landsat data, facilitated the inter-annual change comparisons in cropland type activity in an area of predominantly mixed cropland.If we consider the above-mentioned limitations (absence of images for 2012, the use of two different satellite sensors, and the training samples irregularly distributed over time) and the overall good classification accuracy value obtained, we are truly convinced that in the near future, this classifier will become one of the most used for time-series classification, especially because it is an open-source and free method for remote sensing time-series analysis.
The cluster analysis performed on R was efficient and straightforward.It proved itself a promising approach capable of detecting homogeneous training sample groups for each class.It guaranteed an acceptable level of quality, as we collected the samples based on vector cartography, the polygons constituted homogeneous entities, and the salt-and-pepper effect proliferated in the satellite images.
Also, we acknowledge that with the COS data we could have already had a notion of the major transformations, but we would not have had seen the trends from year to year and been able to quantify them.In addition, the time gap of the COS data was very large, especially between the first and the second maps (12 years apart).Also, this methodology can now be replicated for other study areas, using other LULC data (such as CORINE Land Cover (CLC), since it is available for most of Europe and for the years 1990, 2000, 2006, 2012, and 2018).Also, for more recent years, the OpenStreetMap data can be a reliable alternative since this source of voluntary geographic information has a level of accuracy similar to the COS.
For future studies, it would be important to try to disaggregate the three major agricultural classes into subclasses to make it possible to describe the type of crop in detail.Ultimately, we conducted a study, using only open data and open software that has allowed gaining more insight regarding a relatively large rural area (>1000 km 2 ).It is an area where the LULC changes have been dynamic, and the data provide support to improve cropland management with sustainable practices and policies.

Figure 2 .
Figure 2. Workflow for the long-term time series LULC methodology.

Figure 2 .
Figure 2. Workflow for the long-term time series LULC methodology.

Figure 3 .
Figure 3. Example of graphical display based on the discriminant factors (Df) values from the tclust cluster solution obtained with k = 2, α = 0.25 for one of the LULC classes.Larger Df values indicate not very "well-determined" clusters.

Figure 3 .
Figure 3. Example of graphical display based on the discriminant factors (Df) values from the tclust cluster solution obtained with k = 2, α = 0.25 for one of the LULC classes.Larger Df values indicate not very "well-determined" clusters.Remote Sens. 2019, 11, x FOR PEER REVIEW 10 of 23

Table 1 .
Frequency by year of all available Landsat Level 2 data between 1995 and 2015 years.

Table 1 .
Frequency by year of all available Landsat Level 2 data between 1995 and 2015 years.

Table 3 .
Descriptive statistics of the COS maps.

Table 4 .
Land use/land cover classes' characteristics.
2.1.,2.4.4Temporary crops (non-irrigated and permanently irrigated crops); complex cultivation patterns (herbaceous understory); land principally occupied by agriculture, with significant areas of natural vegetation; agroforestry areas.Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 23 Forest 3.1.,3.2., 3.3 Areas occupied by tree clusters resulting from natural regeneration or planting Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of Forest 3.1.,3.2., 3.3 Areas occupied by tree clusters resulting from natural regeneration or planting Crops occupied the land for a long period and had a nonrotating regime (olive groves, vineyards) Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 23 Forest 3.1.,3.2., 3.3 Areas occupied by tree clusters resulting from natural regeneration or planting Freshwater surfaces, including watercourses Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of Forest 3.1.,3.2., 3.3 Areas occupied by tree clusters resulting from natural regeneration or planting Freshwater surfaces, including watercourses Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 23 Forest 3.1.,3.2., 3.3 Areas occupied by tree clusters resulting from natural regeneration or planting Water 5.1.,5.2.Freshwater surfaces, including watercourses Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of Forest 3.1.,3.2., 3.3 Areas occupied by tree clusters resulting from natural regeneration or planting Water 5.1.,5.2.Freshwater surfaces, including watercourses

Table 5 .
Final number of samples per class.

Table 6 .
Error matrix of the map classification task (full period).