Intra-Annual Sentinel-2 Time-Series Supporting Grassland Habitat Discrimination

: The present study aims to discriminate four semi-arid grassland habitats in a Mediterranean Natura 2000 site, Southern Italy, involving 6210/E1.263, 62A0/E1.55, 6220/E1.434 and X/E1.61-E1.C2-E1.C4 (according to Annex I of the European Habitat Directive/EUropean Nature Information System (EUNIS) taxonomies). For this purpose, an intra-annual time-series of 30 Sentinel-2 images, embedding phenology information, were investigated for 2018. The methodology adopted was based on a two-stage workﬂow employing a Support Vector Machine classiﬁer. In the ﬁrst stage only four Sentinel-2 multi-season images were analyzed, to provide an updated land cover map from where the grassland layer was extracted. The layer obtained was then used for masking the input features to the second stage. The latter stage discriminated the four grassland habitats by analyzing several input features conﬁgurations. These included multiple spectral indices selected from the time-series and the Digital Terrain Model. The results obtained from the different input conﬁgurations selected were compared to evaluate if the phenology information from time-series could improve grassland habitats discrimination. The highest F1 values (95.25% and 80.27%) were achieved for 6210/E1.263 and 6220/E1.434, respectively, whereas the results remained stable (97,33%) for 62A0/E1.55 and quite low (75,97%) for X/E1.61-E1.C2-E1.C4. However, since for all the four habitats analyzed no single conﬁguration resulted effective, a Majority Vote algorithm was applied to achieve a reduction in classiﬁcation uncertainty. Semi-natural and natural dry grasslands and scrubland facies on calcareous substrates (Festuco-Brometalia) ; type_2— Eastern sub-mediterranean dry grasslands (Scorzoner-atalia villosae) ; type_3— Pseudo-steppe with grasses and annuals of the Thero-Brachypodietea and type_4— Mediterranean subnitrophilous grass communities, thistle ﬁelds and giant fennel (Ferula)


Introduction
Natural and semi-natural grassland ecosystems represent one of the largest landscape units in the terrestrial system and, as such, they are essential contributors to global biodiversity [1]. Evidence has been reported that the biodiversity of grasslands can be linked with productivity [2,3]. Semi-natural grasslands of the Western Palearctic region are considered among the most species-rich habitats in the world [4]. However, in the last few decades, widespread proof has been reported about the impoverishment of grassland biodiversity and related ecosystem services. This has been mainly due to land use change, agricultural intensification, land abandonment and urbanization [5][6][7]. In addition, the change of climate occurring to the globe can threaten grassland behavior [8]. Hence there is a need for earth observation monitoring to support both ecological and economic decision making. for type_1 resulted always lower than 53%, independent of the specific combination of input features used. Zlinszky et al. (2014) [26] have used LIght Detection And Ranging (LIDAR) data from two different aerial campaigns for mapping grassland habitats in Hungary, including type_1 habitat and lowland hay meadows by adopting the Random Forest classifier. For type_1 habitat, they obtained an OA equal to 75% and UA and PA values equal to 66.4% and 78.6%, respectively.
To address issue (c), i.e., grassland habitats' spectral similarity, airborne hyperspectral images have been used for the mapping of plants species [1], flower types [27] and species richness [28][29][30]. Marcinkowska-Ochtyra et al. (2019) [31] have analyzed three multitemporal hyperspectral images combined with Digital Terrain Model (DTM) information for mapping three grassland habitats, including type_1 habitat, in Poland. By applying a Random Forest classifier, the best results were obtained when hyperspectral data were integrated with five topographic indices (bottom flatness, ridge top flatness, topographic position, wetness index, modified catchment area) from an available DTM. They have obtained the F1-measure of 84.5% for type_1. However due to the high costs of airborne hyperspectral data their availability has remained limited.
For mapping grassland habitats, another useful approach has used the acquisition of intra-annual time-series of satellite data acquired at high temporal, spatial and spectral resolutions. Such data can take advantage of the high temporal frequency of acquisition to detect phenological differences in vegetation classes useful for their discrimination.
Generally, rather than using the entire available set of spectral bands, spectral indices have been extracted from the time-series. For example, Wen et al. (2010) [32] have analyzed the single spectral index Enhanced Vegetation Index (EVI) from a MODerate-resolution Imaging Spectroradiometer (MODIS) time-series for mapping six grassland types in China. They have obtained an OA value equal to 68% by using an ISODATA procedure to identify homogenous regions followed by a decision tree classifier. Schuster et al. (2015) [1] have used three different vegetation indices from a one-year time-series obtained by considering 21 RapidEye images acquired over three years (2009)(2010)(2011) for discriminating seven grassland habitats in Germany. The results obtained from the optical time-series, with single or multiple indices, were compared with those from a TerraSAR-X time-series. The authors applied the SVM classifier reporting the highest OA value (91.7%) when three indices were considered: Normalized Difference Vegetation Index (NDVI), NDVI-RE-R (where the NIR band is substituted by the Red-Edge band in the NDVI formula) and NDVI-NIR-RE (where the red band is replaced by the Red-Edge band in the NDVI formula) [1]. Marginal differ-ences were reported with the results (91%) obtained from TerraSAR-X data, with slight differences in specific habitat accuracy values.
For grasslands mapping, recent studies have investigated Sentinel-2 time-series by the EU Copernicus and European Space Agency (ESA) programs [33,34]. These data are characterized by both high spatial resolution (10 m for bands 2, 3, 4 and 8; 20 m for bands 5, 6, 7, 8a, B11 and B12; 60 m for bands 1, 9 and 10) and high revisiting time (less than 5 days in tandem acquisition). Fauvel et al. (2020) [35] have combined time-series of NDVI index from Sentinel-2 with polarization data from Sentinel-1 SAR time-series for mapping grassland plant diversity in France. Rapinel et al. (2019) [36], instead, have compared the results obtained from the spectral bands of an intra-annual Sentinel-2 time-series (12 images) with the ones from single-date and single-band datasets. These data were used to map seven floodplain grassland plant communities. In order to reduce the original feature dimensionality (120 bands), the authors adopted a recursive feature elimination (RFE) algorithm for retaining the most discriminating bands (26) as input to the classifier. The SVM finding, thus, obtained (78%) was better than both the ones from a single date Sentinel-2 image (67%) and from a single band of the same image (70%). Both Fauvel et al. (2020) and Rapinel et al. (2019) have analyzed only pixels of the Sentinel-2 time-series belonging to the grasslands layer. This layer has been obtained from ancillary data used for the masking of Sentinel-2 images. However, the ancillary data (e.g., the agricultural Land Parcel Information System) generally are neither regularly updated nor have been made available at the same scale of the satellite data used for habitat mapping.
In this study, the spectral indices will be selected to ensure the coverage of the full spectral range offered by Sentinel-2 data, as proposed by Große-Stoltenberg et al. (2016) [37] for mapping the four selected habitat classes. By investigating the detection of invasive species from field-spectra, these authors have suggested the selection of an appropriate set of spectral indices able to cover the full spectral range of the sensor used. In so doing, they obtained results comparable with those from the whole set of spectral bands but were able to reduce the features number.
The approach proposed in our study will be based on a two-stage procedure for habitat mapping. A supervised data-driven SVM classifier will be used in both the first and second stage [38][39][40]. The SVM classifier will be preferred to deep learning algorithms since the latter require relatively large datasets to work well, and an adequate infrastructure to be trained in reasonable time. With respect to Random Forest (RF), Grabska et al. (2020) [41] have argued that RF may struggle with as well as less-common classes and corresponding imbalanced training data [42], while SVM may perform better in this case.
The first stage of the system will provide an updated Land Cover (LC) map obtained from four multi-season Sentinel-2 images. The output, thus, obtained will be then used for the masking of an intra-annual Sentinel-2 time-series. Only the pixels of the time-series belonging to the grasslands class will be used as input to the second classification stage. As a result, the grassland layer will be extracted at the same date and spatial resolution of the final habitat map.
The second stage will be used to discriminate the four grassland habitats. Several classification configurations will be investigated within this stage by changing the input features, including also the available Digital Terrain Model (DTM). The results obtained from the different input configurations will be compared to evaluate if the phenology information from time-series can improve grassland habitats discrimination.
A Majority Vote algorithm will be applied to combine the results from all the configurations and reduce classification uncertainty.
The findings reported could encourage regular automatic mapping of grassland habitats for conservation and restoration purposes.

Study Area and Grassland Habitats Characterization
Our study area is located in the Mediterranean basin within the Apulia region, Southern Italy. The area covers nearly 800 km 2 within the Natura 2000 "Murgia Alta" protected area (IT9120007) (Figure 1). This is a Site of Community Importance (SCI) and in addition a Special Protection Area (SPA) which has included a National Park since 2004. for the masking of Sentinel-2 images. However, the ancillary data (e.g., the agricultural Land Parcel Information System) generally are neither regularly updated nor have been made available at the same scale of the satellite data used for habitat mapping. In this study, the spectral indices will be selected to ensure the coverage of the full spectral range offered by Sentinel-2 data, as proposed by Große-Stoltenberg et al. (2016) [37] for mapping the four selected habitat classes. By investigating the detection of invasive species from field-spectra, these authors have suggested the selection of an appropriate set of spectral indices able to cover the full spectral range of the sensor used. In so doing, they obtained results comparable with those from the whole set of spectral bands but were able to reduce the features number.
The approach proposed in our study will be based on a two-stage procedure for habitat mapping. A supervised data-driven SVM classifier will be used in both the first and second stage [38][39][40]. The SVM classifier will be preferred to deep learning algorithms since the latter require relatively large datasets to work well, and an adequate infrastructure to be trained in reasonable time. With respect to Random Forest (RF), Grabska et al. (2020) [41] have argued that RF may struggle with as well as less-common classes and corresponding imbalanced training data [42], while SVM may perform better in this case.
The first stage of the system will provide an updated Land Cover (LC) map obtained from four multi-season Sentinel-2 images. The output, thus, obtained will be then used for the masking of an intra-annual Sentinel-2 time-series. Only the pixels of the time-series belonging to the grasslands class will be used as input to the second classification stage. As a result, the grassland layer will be extracted at the same date and spatial resolution of the final habitat map.
The second stage will be used to discriminate the four grassland habitats. Several classification configurations will be investigated within this stage by changing the input features, including also the available Digital Terrain Model (DTM). The results obtained from the different input configurations will be compared to evaluate if the phenology information from time-series can improve grassland habitats discrimination.
A Majority Vote algorithm will be applied to combine the results from all the configurations and reduce classification uncertainty.
The findings reported could encourage regular automatic mapping of grassland habitats for conservation and restoration purposes.

Study Area and Grassland Habitats Characterization
Our study area is located in the Mediterranean basin within the Apulia region, Southern Italy. The area covers nearly 800 km 2 within the Natura 2000 "Murgia Alta" protected area (IT9120007) (Figure 1). This is a Site of Community Importance (SCI) and in addition a Special Protection Area (SPA) which has included a National Park since 2004.  The altitude of the area is from 285 to 680 m above the sea level and its climate is mesomediterranean oceanic subcontinental pluviseasonal with dry to sub-humid ombrotype [43]. The site is characterized by a typical Mediterranean agro-pastoral landscape with millennial land-use history mainly occupied by semi-natural rocky dry grasslands, traditionally used as extensive pastures, while forest vegetation consists only of residual patches of downy oak (Quercus pubescens s.l.) woodlands and Aleppo pine (Pinus halepensis) plantations [44]. In "Murgia Alta" the same semi-natural grassland ecosystem hosts numerous regionally endemic and generally rare species, but also many with trans-Adriatic distribution [43]. This area is considered of crucial importance for the conservation of wildlife and priority species [10].
During the last three decades, this unique ecosystem has been exposed to tremendous impacts and accelerated processes of habitat degradation, fragmentation and biotic contamination (i.e., woody encroachment), both within and next to its borders. As a result of the combined pressures, ecosystems are threatened or even in danger of destruction [45]. The pressures related to degradation range from: • the Common Agricultural Policy (CAP) which has driven the transformation of grassland pastures into agricultural (cereal crops intensification) areas by stone (rock) graining (clearance) that has induced soil erosion and sediment deposition in the aquifer system; • the illegal waste and toxic mud dumping on transformed areas has caused heavy metal contamination of soils and aquifers; • the increasing of traditional legal and illegal mining activities, wind farms infrastructures and arson [46,47]; • the below long-term average rainfall as a result of climate change; • the spreading of invasive species [44,45,48].
According to the list of habitats reported in Annex I of the European Habitat Directive [9] and in EUNIS habitat taxonomies [49], the four grassland habitat types considered in "Murgia Alta" are listed in Table 1.The target habitats' detailed description is provided in the text reported hereafter. In the text, the nomenclature of species and sintaxa follow Bartolucci et al. (2018) [50] and Biondi and Blasi (2015) [51], respectively. A set of reference polygons related to 9 LC classes was analyzed in the first stage of the workflow. The polygons were collected through, both, in-field campaigns and through visual interpretation of available orthophotos (2016-2018). For each class (i.e., stratum), a "stratified" sampling was adopted to ensure a minimum number of polygons-samples [53].
For ground data collection, an open source mobile application (App) was used during our in-field campaigns. This mobile App was developed within the framework of the Horizon 2020 ECOPOTENTIAL Project (www.ecopotential-project.eu). This App can gather ground truth according to Food and Agriculture Organization-Land Cover Classification System 2 (FAO-LCCS2) land cover class taxonomy [54][55][56]. By integrating additional ancillary data (e.g., lithology, soil aspects, soil surfaces), the FAO-LCCS2 taxonomy can provide a framework which may be useful to translate any LC class into different habitats [56][57][58][59][60][61][62].
For the second stage of the workflow, georeferenced surveys of the vegetation were carried out using the phytosociological method of the Sigmatista School in Zurich-Montpellier [63,64]. This method uses a floristic-statistical method based on the survey of the complete floristic composition for the plant community investigated. This approach is recognized at EU level [65,66] since it allows a precise diagnosis for many habitats of the Directive and in particular for grassland habitats. For our work, the sampling was first stratified, i.e., the relevés were carried out randomly in areas previously identified on the basis of their physiognomic and structural homogeneity. Then, all the surveys were subjected to a multivariate numerical classification by using the coverage values transformed according to the scale proposed by van der Maarel, 1979 [67]. Following this hierarchical classification, the average linkage method (UPGMA) [68] was used on the basis of the dissimilarity matrix. The latter was calculated with the reciprocal of the similarity ratio function for coverage data [64]. The clusters, thus, obtained were then attributed to the different phytosociological syntaxa on the basis of the characteristic species combina-Remote Sens. 2021, 13, 277 7 of 29 tion. The different plant community types, thus identified and defined, were consequently attributed to the habitats of both the EU Directive on the basis of the "Interpretation Manual of European Union habitats" [68] and the "Manuale Italiano di Interpretazione degli Habitats" [51,66,69]. Thereafter, a polygon related to a homogenous area around each relevés was identified by a visual interpretation of the available orthophoto (2018). Due to the high time-consuming process involved in the recognition and collection of the ground data, the number of reference polygons resulted to be rather small.
The data reported on Table 1 are related to four different habitat grasslands classes which correspond to the same LC class of semi-natural grasslands. In addition, a habitat map obtained by in-field campaigns and visual interpretation of a 2018 orthophoto has been recently released (2020) for the whole region by Apulia authorities. Figure 2 shows some close-ups of ground truth polygons overlaid on an orthophoto (2m spatial resolution), for the different habitat classes investigated. of European Union habitats" [68] and the "Manuale Italiano di Interpretazione degli Habitats" [51,66,69]. Thereafter, a polygon related to a homogenous area around each relevés was identified by a visual interpretation of the available orthophoto (2018). Due to the high time-consuming process involved in the recognition and collection of the ground data, the number of reference polygons resulted to be rather small. The data reported on Table 1 are related to four different habitat grasslands classes which correspond to the same LC class of semi-natural grasslands. In addition, a habitat map obtained by in-field campaigns and visual interpretation of a 2018 orthophoto has been recently released (2020) for the whole region by Apulia authorities. Figure 2 shows some close-ups of ground truth polygons overlaid on an orthophoto (2m spatial resolution), for the different habitat classes investigated.

Satellite Data
For the whole year 2018, an intra-annual time-series of Sentinel-2A/2B images was freely downloaded from the United States Geological Survey (USGS) EarthExplorer portal [70]. Only the images with less than 10% cloud cover on the full tile were investigated while the ones which showed further residual clouds in the study area were discarded. As a result, 30 clouds free images were collected, thus, no clouds/clouds shadows masking was needed. Figure 3 shows the distribution, per month, of the 30 images examined. Only in March were there no clouds free images. The dates of each image are reported in Table  A1 of Appendix A. The entire study area was covered by the tile 33TXF from the two orbits R036 and R079. Since not all images selected were available as L2A products, level-1C products

Satellite Data
For the whole year 2018, an intra-annual time-series of Sentinel-2A/2B images was freely downloaded from the United States Geological Survey (USGS) EarthExplorer portal [70]. Only the images with less than 10% cloud cover on the full tile were investigated while the ones which showed further residual clouds in the study area were discarded. As a result, 30 clouds free images were collected, thus, no clouds/clouds shadows masking was needed. Figure 3 shows the distribution, per month, of the 30 images examined. Only in March were there no clouds free images. The dates of each image are reported in Table A1 of Appendix A.
Remote Sens. 2021, 13, x FOR PEER REVIEW 8 of 30 in March were there no clouds free images. The dates of each image are reported in Table  A1 of Appendix A. The entire study area was covered by the tile 33TXF from the two orbits R036 and R079. Since not all images selected were available as L2A products, level-1C products were downloaded. Hence, the images in the time series to be analyzed were already geometrically orthorectified, co-registered and in at Top Of Atmosphere (TOA) reflectance. For each image, the Level-2A surface reflectance product was generated, from level-1C, The entire study area was covered by the tile 33TXF from the two orbits R036 and R079. Since not all images selected were available as L2A products, level-1C products were downloaded. Hence, the images in the time series to be analyzed were already geometrically orthorectified, co-registered and in at Top Of Atmosphere (TOA) reflectance. For each image, the Level-2A surface reflectance product was generated, from level-1C, by using the standalone version of the Sen2Cor processor, as suggested in [71] for atmospheric correction.
To better discriminate habitats type_1 and type_4, a DTM with 8 m spatial resolution was downloaded from the Apulia Region portal [72]. The elevation information was, thus, obtained from this source.

Satellite Data Pre-Processing
A chain for the time-series pre-processing was implemented in the RStudio open source environment. For each image, the workflow consisted of the following steps: 1.
cropping according to the area of interest; 2.
spectral index extraction at the native spatial bands resolution; 3.
bilinear resampling to 10 m, when necessary. Indeed, 10 m was the final resolution adopted in our work. Different resampling approaches were tested. The bilinear one resulted the right compromise between minimization of artifacts and distortions introduced in the image as well as in the computational complexity.
Each of the spectral indices were stacked as a data-cube of intra-annual time-series in a raster file format. Table 2 shows the spectral vegetation indices analyzed singularly or in combination. Table 2. List of spectral indices analyzed. R stands for the Reflectance signal. The Sentinel-2 bands are reported in brackets.

Name
Formula Reference

Green Normalized Difference
Vegetation Index [81] The choice of these indices is based on the need to analyze the whole spectral range, from VIS to NIR, of the Sentinel-2 data. With reference to the spectral response of a vegetation pattern, we tried to take into account the information associated with the presence of a high reflectance peak in the NIR and other lower peaks in the green and SWIR. In so doing, Green Normalized Difference Vegetation Index (GNDVI) was selected to analyze the reflectance vegetation peak in the VIS spectrum region. This index is related to the photosynthetic activity and determines water and nitrogen plant uptake into the crop canopy [73,74]. As is well known, Modified Soil Adjusted Vegetation Index (MSAVI) is related to the highest peak of the vegetation reflectance in the NIR spectrum. Such an index can maximize the reduction in soil effects on the vegetation signature [74,75]. Thus, MSAVI was preferred to the widely used NDVI [48,51]. By considering Normalized Burn Ratio (NBR), the SWIR spectral information was also included. This index is correlated to water, carbon and nitrogen content as well as to vegetation health [76][77][78][79].
Further indices related to the red-edge portion of the Sentinel-2 spectrum were also introduced. The Normalized Difference Red-Edge (NDRE) [80] and Red-Edge Position (REP) [81] time-series spectral indices were extracted. All these indices were used as input to the second stage both singularly and in combinations. Only those indices providing a significant contribution to the habitat discrimination were used in combination. The results of different input configurations discussed in the present paper are reported in Table 3.

Two-Stage Algorithm for Habitat Mapping
To discriminate the four grassland habitats listed in Table 1, a two-stage classification approach was developed. Figure 4 shows the two-stage algorithm workflow designed for the grassland habitats mapping. The first stage was intended to extract the grassland layer used as a pre-filter for the input features to the second stage. Two different classifiers were fed as input with a multi-season Sentinel-2 dataset. One classifier was trained for a binary (grasslands/no grasslands) classification problem and the other one for a multi-class problem. The two grassland maps obtained were compared. The second stage focuses on the discrimination of grassland habitats. These habitats were searched within the grassland pixels recognized in the first stage LC map. For the two stages, the SVM classifier, implemented in ENVI 4.8, was used [82].
SVM is a non-linear supervised classifier aimed at finding the optimal separating hyperplane that can divide the input dataset into the predefined number of classes [83,84]. SVM classifiers are particularly well-suited when small training data sets are available with no underlying assumptions of statistical distribution [84][85][86]. Thus, SVM was adopted for our study due to the small number of training samples available. Following the recommendations reported in [87], in our study a radial basis function was selected as kernel type while the penalty parameter chosen was taken as 100. The gamma in kernel function was chosen as the inverse of the band numbers used in the data input [88,89]. Such choices were in agreement with previous research conducted by Othman and Gloaguen [90] and Yang [89].

First Stage
The first stage was based on a multi-class, data-driven, pixel-based, classification whose aim was to detect the grassland layer on the scene. The SVM classifier was trained with ground references from 9 LC classes, including natural grassland, labeled according to FAO-LCCS2 taxonomy [57][58][59][60][61][62]. For training the SVM classifier and subsequent validation of the output map, the available reference polygons were divided into training and validation datasets randomly per each LC class. As a result, 70% of each LC class sample was used for training and 30% for the validation, corresponding to 157,245 and 65,548 The second stage focuses on the discrimination of grassland habitats. These habitats were searched within the grassland pixels recognized in the first stage LC map. For the two stages, the SVM classifier, implemented in ENVI 4.8, was used [82].
SVM is a non-linear supervised classifier aimed at finding the optimal separating hyperplane that can divide the input dataset into the predefined number of classes [83,84]. SVM classifiers are particularly well-suited when small training data sets are available with no underlying assumptions of statistical distribution [84][85][86]. Thus, SVM was adopted for our study due to the small number of training samples available. Following the recommendations reported in [87], in our study a radial basis function was selected as kernel type while the penalty parameter chosen was taken as 100. The gamma in kernel function was chosen as the inverse of the band numbers used in the data input [88,89]. Such choices were in agreement with previous research conducted by Othman and Gloaguen [90] and Yang [89].

First Stage
The first stage was based on a multi-class, data-driven, pixel-based, classification whose aim was to detect the grassland layer on the scene. The SVM classifier was trained with ground references from 9 LC classes, including natural grassland, labeled according to FAO-LCCS2 taxonomy [57][58][59][60][61][62]. For training the SVM classifier and subsequent validation of the output map, the available reference polygons were divided into training and validation datasets randomly per each LC class. As a result, 70% of each LC class sample was used for training and 30% for the validation, corresponding to 157,245 and 65,548 pixels, respectively.
The reference polygons were distributed in the scene according to the real presence of each class on the ground. A percentage of almost 50% for natural grassland reference polygons was considered, both in the training and validation sets, since this was the dominant LC in the scene. The cultivated herbaceous vegetation was represented by almost 23% of reference polygons since this was the second major class in the scene. The remaining area was covered by both natural needle-leaved evergreen and no-herbaceous cultivated classes.
First, the classifier was fed with the input features from the stack of four multi-season images (10 bands for each image: 40 layers). Then, from the LC map obtained, the natural grassland layer was extracted to be used for the masking of the second stage input features. The pixels of the grassland layer were coded as "Natural Terrestrial Vegetation/Herbaceous", according to LCCS2 taxonomy [62].

Second Stage
The second stage was employed for detecting the four different grassland habitat classes through a further data-driven, pixel-based SVM classification. Only the pixels belonging to the grassland layer found in the first stage were analyzed. For each class, the available habitat reference pixels were randomly divided into 75% and 25% to train and validate the classifier, respectively. In Table 3, the first column describes the set of input configurations adopted in the second stage. For each configuration, the second and third columns of the same table report the acronyms and the specific input features used, respectively. The output of the "4_scenes" configuration, which combines the pre-peak, peak, and post-peak of biomass with dry-season values, was compared with the ones obtained from different spectral index time-series. The latter allowed the evaluation of their inherent phenology information to improve habitat discrimination.
Since MSAVI can represent the contribution of the highest peak of the reflectance vegetation in the NIR spectrum region, the intra-annual time series of such an index was used in the first classification. For comparison purposes, GNDVI and NBR single-spectral index configurations were also analyzed. However, the findings obtained singularly are not reported in this paper since they always resulted lower than the ones from MSAVI. Nevertheless, the latter indices were successively stacked with MSAVI and used as input features to the classifier in order to enlarge the spectral coverage understudy (third and fourth configurations in Table 3). An additional configuration, which included the three spectral indices (GNDVI, MSAVI and NBR) and the DTM available, was analyzed. The fifth configuration clearly involved a computational cost related to 90 spectral layers (from the three indices) plus the DTM data used. To further reduce computational costs, the input features of the first configuration, "4_scenes", were combined with those of only one single spectral index, i.e., MSAVI. The choice allowed the combination of the phenology (from a single-spectral index time-series) with four multi-seasonal information. The seventh configuration included the DTM layer for possible improvements related to quota information. All the analyses were carried out in a QGIS open source environment using R open source language.

Accuracy Asssessment
The protocol described in [91,92] and implemented in [93] was adopted for the classification accuracy assessment. This protocol uses the information obtained from the traditional Confusion Matrix [94] to estimate the corrected mapped area for each class and to construct confidence intervals reflecting the uncertainty of the estimates obtained.
In this framework, the sample error matrix is reported in terms of: (a) the unbiased stratified estimator of the proportion of area (p ij ) in each cell i,j of the matrix; (b) OA, UA and Producer's Accuracy (PA), with standard errors estimates [91]. In addition, the F1measure was computed as the harmonic mean of producer and user accuracies [95][96][97][98].
According to [91,92], when map categories are reported as rows (i) and the reference categories are shown as columns (j), A tot represents the total mapped area of the scene, A mapped,i is the mapped area (hectares, ha) of category i in the map and W i = A mapped,i /A tot is the proportion of the mapped area as category i, p ij is then: where n ij corresponds to sample count. The unbiased stratified estimator of the area of category j can be obtained as: where A corrected,j can be viewed as an "error-adjusted" estimator of area because it includes the area of omission error of category j and leaves out the area of commission error. The estimated standard error of the estimated proportion of area is: Finally, the standard error of the stratified area estimate can be expressed as: and an approximate 95% confidence interval for A corrected,j is: For each class, the measure of stratified area estimator (A corrected ) is based on the assumption that the reference validation data used for the computation of the confusion matrix can represent a proportion of the whole area [91,92].

Results
The investigation has been carried out to evaluate the effectiveness of an intra-annual Sentinel-2 time-series of spectral indices to improve the mapping of four Mediterranean grassland habitats in the "Murgia Alta" site. The findings, obtained from each of the configurations reported in Table 3, are presented separately below.

First Stage: Grassland Layer Extraction
In order to provide the grassland layer from the multi-seasonal input images, both a binary (grassland/no grassland) and a multi-class classification were carried out. The findings are shown in Table 4. The multi-class SVM UA value was higher than the one obtained from the binary classifier: 98.29% and 96.11%, respectively. The mapping from the binary SVM classifier provided a grassland layer extension greater than the one obtained from the multi-class: 233 km 2 and 223 km 2 , respectively. From visual interpretation of the orthophoto available, some overestimation areas appeared as misclassifications of grassland instead of crops in Figure 5. The multi-class SVM UA value was higher than the one obtained from the binary classifier: 98.29% and 96.11%, respectively. The mapping from the binary SVM classifier provided a grassland layer extension greater than the one obtained from the multi-class: 233 km 2 and 223 km 2 , respectively. From visual interpretation of the orthophoto available, some overestimation areas appeared as misclassifications of grassland instead of crops in Figure 5. The multi-class LC map ( Figure 6) was thus used to extract the grassland layer to be fed as input to the second stage.   The multi-class LC map ( Figure 6) was thus used to extract the grassland layer to be fed as input to the second stage. The multi-class SVM UA value was higher than the one obtained from the binary classifier: 98.29% and 96.11%, respectively. The mapping from the binary SVM classifier provided a grassland layer extension greater than the one obtained from the multi-class: 233 km 2 and 223 km 2 , respectively. From visual interpretation of the orthophoto available, some overestimation areas appeared as misclassifications of grassland instead of crops in Figure 5. The multi-class LC map ( Figure 6) was thus used to extract the grassland layer to be fed as input to the second stage.    Table 5 shows the nine LC classes mapped and their legend according to the FAO-LCCS2 taxonomy along with the percentage values of UA and PA. Such values can be considered quite high. Instead, misclassifications of small "Artificial Surfaces/BuiltUp" objects (i.e., sheds for depositing agriculture tools and narrow not asphalted roads) with barren land patches produced a lower UA value (61.51%). The specific UA and PA values of the grassland class (Figure 7a) resulted to 98.29% ± 0.04% and 99.65% ± 0.03%, respectively. Artificial or Natural Waterbodies/Water B27-B28/A1.A5 100 100 The specific UA and PA values of the grassland class (Figure 7a) resulted to 98.29%±0.04% and 99.65%±0.03%, respectively.  Figure 7b,c, respectively. As is well known, CLC is based on a different taxonomy (than the one used in our map, i.e., FAO-LCCS2). Thus, both 231 "Pastures" and 321 "Natural Grasslands" classes were combined to represent the CLC grassland layer. This combination resulted in a CLC layer covering 210 km 2 . Instead, the grassland layer from Copernicus 2018 covers an area of 250 km 2 . The latter product features the same resolution (10 m) as our map. The results obtained evidenced a slight overestimation by Copernicus with respect to our product (223 km 2 ).
The accuracy of both CLC and Copernicus grassland layers was evaluated by using the same grassland validation set. The results are reported in Table 6.   Figure 7b,c, respectively. As is well known, CLC is based on a different taxonomy (than the one used in our map, i.e., FAO-LCCS2). Thus, both 231 "Pastures" and 321 "Natural Grasslands" classes were combined to represent the CLC grassland layer. This combination resulted in a CLC layer covering 210 km 2 . Instead, the grassland layer from Copernicus 2018 covers an area of 250 km 2 . The latter product features the same resolution (10 m) as our map. The results obtained evidenced a slight overestimation by Copernicus with respect to our product (223 km 2 ).
The accuracy of both CLC and Copernicus grassland layers was evaluated by using the same grassland validation set. The results are reported in Table 6. Finally, the natural grassland layer, from the first stage, was used for the masking of the second stage input features.

Second Stage: Habitat Discrimination
The findings obtained in terms of OA, UA, PA, F1-measure, A mapped and A corrected are reported in Table 7 according to the different input configurations listed in Table 3. Instead, Figure 8 compares the different F1-values obtained in the form of a histogram.  As evidenced in Table 7, the F1-measure values obtained for type_1 and type_2 habitats resulted quite high (89.17% and 97.29%, respectively) for the "4_scenes" configuration. Instead, the F1 value for type_4 habitat resulted rather low (57.81%) due to a PA of 41.23%±0.01%. This habitat was misclassified as type_2. For the latter habitat, the Areamapped was overestimated of +646.4 ha with respect to the estimated corrected one (18875.77 ha ± 63.94 ha). Thus, discriminating type_4 would have required improvements. In fact, both PA (57.49% ± 3.76%) and F1-values (72.82%) were found to be rather low for As evidenced in Table 7, the F1-measure values obtained for type_1 and type_2 habitats resulted quite high (89.17% and 97.29%, respectively) for the "4_scenes" configuration. Instead, the F1 value for type_4 habitat resulted rather low (57.81%) due to a PA of 41.23%±0.01%. This habitat was misclassified as type_2. For the latter habitat, the Area mapped was overestimated of +646.4 ha with respect to the estimated corrected one (18875.77 ha ± 63.94 ha). Thus, discriminating type_4 would have required improvements. In fact, both PA (57.49% ± 3.76%) and F1-values (72.82%) were found to be rather low for habitat type_3. The area of this habitat resulted in an underestimation of 210.76 ha with respect to the estimated corrected value (500. 66 ± 24.39). This underestimation appears to be due to misclassifications of both type_1 and type_2, thus also requiring improvements.
The configurations based on single-spectral index intra-annual time-series, i.e., "GNDVI", "MSAVI", "NBR", "NDRE" and "REP" were then considered for the required improvements. These time-series provide the contribution of phenology information.
For each habitat class, better PA and UA values were obtained from "MSAVI" configuration with respect to the other ones (i.e., "GNDVI", "NBR", "NDRE", "REP"). Hence, only the findings from "MSAVI" are reported in Table 7. When compared to "4_scenes", the MSAVI intra-annual time-series provided an improvement of the F1-measure from 57.81% to 69.90% for type_4 habitat. Instead, the F1-measure resulted slightly reduced for two classes, i.e., from 89.11% to 84.60% for type_1 and from 72.82% to 68.34% for type_3. For type_2, the F1-measure remained quite stable (about 97%). This trend remained stable throughout the experiments carried out.
Pairs of spectral indices time-series were then combined to enlarge the spectrum portion analyzed. The MSAVI time-series with either GNDVI or NBR performed better than the one from single-spectral index configurations for habitats type_3 and type_4. With respect to the reference "4_scenes" results, the "MSAVI+NBR" configuration (Table 7) provided an increase in F1-measure for type_3, from 72.82% to 76.55%, and for type_4, from 57.81% to 75.88% (Table 7). This configuration performed better than "MSAVI+GNDVI". The addition of NDRE or REP spectral indices yielded no improvement in accuracy of habitat classification, thus they were not exploited in combination with other indices.
The time-series of the three-spectral indices, "GNDVI+MSAVI+NBR", covering the whole spectral region, were subsequently analyzed. With respect to the previous twospectral indices configurations, slight improvements in the F1-measure were achieved for all the habitat classes. Whereas, compared to the "4_scenes" results, a significant increase in the F1-measure was achieved for habitat type_3, from 72.82% to 77.95%, and for habitat type_4, from 57.81% to 80.31%. A slight decrease was obtained, instead, for habitat type_1, from 89.11% to 86.09%.
By adding the elevation information from the DTM layer to the three "GNDVI+MSAVI +NBR" indices, some increase was achieved for habitats type_1 and type_4. With respect to the "4_scenes" configuration, a slight improvement of F1-measure was obtained for type_1 from 89.11% to 90.84%, whereas a more significant was achieved, from 57.81% to 82.21% for type_4.
Thereafter, the integration of "4_scenes" with "MSAVI" input features was tested to evaluate the advantage of combining dense temporal information, from an intra-annual single-index time-series, with the spectral bands of the four multi-season images. By reducing the input dimensionality from 90 (30 images per each of the three indices) to 70 (10 bands per each of the four images, plus 30 MSAVI values), the computation complexity decreased. With respect to the "4_scenes" configuration, the F1-measure improved from 89.11% to 93.73%, from 72.82% to 79.97% and from 57.81% to 73.95%, respectively, for three habitats, i.e., type_1, type_3 and type_4. Whereas, the F1 value remained stable for class type_2.
Finally, the DTM layer was added to the previous configuration ("4_scenes + MSAVI"). The OA value obtained remained almost the same (95.45% ± 0.30%), whereas the F1measure was the highest among the whole set of configurations for type_1 and type_3, as reported in Table 7. Compared to the "4_scenes" results, the F1 values improved from 89.11% to 95.25%, from 72.82% to 80.27% and from 57.81% to 75.97%, respectively, for type_1, type_3 and type_4. However, the F1 value remained stable for class type_2 (Table 7). Figure 9 shows some close-ups of the output maps obtained from the seven configurations analyzed for habitats type_1, type_3 and type_4.
Finally, the DTM layer was added to the previous configuration ("4_scenes + MSAVI"). The OA value obtained remained almost the same (95.45% ± 0.30%), whereas the F1-measure was the highest among the whole set of configurations for type_1 and type_3, as reported in Table 7. Compared to the "4_scenes" results, the F1 values improved from 89.11% to 95.25%, from 72.82% to 80.27% and from 57.81% to 75.97%, respectively, for type_1, type_3 and type_4. However, the F1 value remained stable for class type_2 (Table 7). Figure 9 shows some close-ups of the output maps obtained from the seven configurations analyzed for habitats type_1, type_3 and type_4.  Figure 9 shows examples of output grassland habitats varying with the input features of the classifier. For habitat type_1, the addition of the DTM to both "GNDVI+MSAVI+NBR" and "4_scenes + MSAVI" improved the recognition of the patch (Figure 9a). For habitat type_3, the best result appeared the one from "4_scenes + MSAVI" (Figure 9b). For habitat type_4 ( Figure   Figure 9. Close-ups from output maps from the input configurations analyzed. Each habitat reference polygon overlaid on the maps. A red frame highlights the best result. Patches for habitat class: (a) Type_1; (b) Type_3; (c) Type_4. Figure 9 shows examples of output grassland habitats varying with the input features of the classifier. For habitat type_1, the addition of the DTM to both "GNDVI+MSAVI+NBR" and "4_scenes + MSAVI" improved the recognition of the patch (Figure 9a). For habitat type_3, the best result appeared the one from "4_scenes + MSAVI" (Figure 9b). For habitat type_4 ( Figure 9c) the best discrimination occurred with "GNDVI+MSAVI+NBR+DTM".
In order to provide quantitative evaluations of the whole scene classification, the corrected stratified area estimator (per each class) and its standard error within 95% confidence interval values are also reported in Table 7. The values found allowed quantification of habitat over/under-estimations over the whole area mapped. Several problems of over/underestimation are evidenced in this Table. For type_1 habitat, the Area mapped was overestimated with respect to the Area corrected in all the configurations ("plus" sign in brackets, Table 7). The Area mapped reported the lowest value (1418.27 ha) when "MSAVI" configuration was analyzed, while the highest value (2481.36 ha) was obtained from "4_scenes + MSAVI"configuration.
Habitat type_2 is the dominant class in the scene and thus the most represented in the training samples. For this habitat, the highest Area mapped and overestimation values (19,808.83 ha and + 608.10 ha) were obtained from the "MSAVI", whereas the lowest values (18,092.50 ha and + 467.83) were achieved from the "GNDVI + MSAVI + NBR + DTM" configuration.
Habitats type_3 and type_4 were underestimated in all the configurations ("minus" sign in brackets, Table 7). These habitats were misclassified not only with the dominant class type_2 but also with the type_1. For type_3 the highest Area mapped value (429.30 ha) was obtained from "MSAVI+NBR" while the lowest (289.90 ha) value was achieved from "4_scenes". For type_4 the highest (1751.57 ha) and the lowest (449.57 ha) Area mapped values resulted from "GNDVI + MSAVI + NBR + DTM" and "4_scenes" configurations, respectively. For all the configurations explored, the findings reported required an improvement to reduce the over/underestimations described. For this purpose, a combination of the classifiers outputs was carried out through a majority vote (MV) technique [99].

Combination of Different Configurations
For each pixel, the MV algorithm can read the label obtained from the seven configurations and then may assign the pixel analyzed to the most frequent class found. In the present study, we chose as the most frequent class the label assigned by at least four classifiers out of seven. The OA obtained was 95.72% ± 0.29%. The criteria of the most frequent class were unsatisfactory for only 0.9% of the pixels classified. To each pixel an uncertainty value was assigned. This value was defined as the number of classifiers in disagreement with the final MV output class. Table 8 reports the OA obtained by the validation pixels which can satisfy each uncertainty value. As can be noted, to a decrease in the uncertainty corresponds an increase in the OA value. As a result, the overestimation of habitat type_1 was reduced. For each class, Table 9 compares the habitat coverage obtained by MV with the one from the best single configuration found, i.e., "4_scenes + MSAVI + DTM".  Figures 10 and 11 report some close-ups which evidence the reduction in habitat type_1 overestimation through the MV. Specifically, Figure 10 shows the reduction in type_1 pixels performed by the MV. As can be noted, the MV confirms type_1 patches classified by more than four classifiers but cleans up the remaining type_1 pixels in favor of habitat type_2. The MV impact is also evident in Figure 11. In this figure, pixels misclassified as type_4 by the "4_scenes+MSAVI+DTM" configuration were mostly correctly assigned as type_2 (Figure 11b). The related uncertainty values are shown in Figure 11c.  10 and 11 report some close-ups which evidence the reduction in habitat type_1 overestimation through the MV. Specifically, Figure 10 shows the reduction in type_1 pixels performed by the MV. As can be noted, the MV confirms type_1 patches classified by more than four classifiers but cleans up the remaining type_1 pixels in favor of habitat type_2. The MV impact is also evident in Figure 11. In this figure, pixels misclassified as type_4 by the "4_scenes+MSAVI+DTM" configuration were mostly correctly assigned as type_2 (Figure 11b). The related uncertainty values are shown in Figure 11c.   Figure 12 reports an example of a habitat type_4 patch. For this patch, the classifier, "4_scenes+MSAVI+DTM" provides a better habitat classification than MV. The latter classifier increased the pixels belonging to type_2 habitat.  Figure 12 reports an example of a habitat type_4 patch. For this patch, the classifier, "4_scenes+MSAVI+DTM" provides a better habitat classification than MV. The latter classifier increased the pixels belonging to type_2 habitat. Figure 11. Close-ups from classified maps obtained by: (a) the best classifier configuration ("4_scenes + MSAVI + DTM"); (b) the MV; (c) uncertainty map with blue polygons evidencing reference habitat type_2 patches on the ground. Figure 12 reports an example of a habitat type_4 patch. For this patch, the classifier, "4_scenes+MSAVI+DTM" provides a better habitat classification than MV. The latter classifier increased the pixels belonging to type_2 habitat. The MV behavior found in the last case evidenced the failure to solve misclassifications associated to a specific class, as also evidenced in [100,101]. In our study, the failure concerning habitat type_4 may be due to several issues: first, a non-adequate reference training collection; second, the semantic habitat description adopted for habitat discrimination. Thus, these issues may have influenced the performance of all classifiers.
In previous papers, the grassland layer was obtained by ancillary data which were not always either updated or available at the same spatial scale [13,32,35,36]. Thus, in such studies, the grassland layer accuracies reported could have affected the quality of the The MV behavior found in the last case evidenced the failure to solve misclassifications associated to a specific class, as also evidenced in [100,101]. In our study, the failure concerning habitat type_4 may be due to several issues: first, a non-adequate reference training collection; second, the semantic habitat description adopted for habitat discrimination. Thus, these issues may have influenced the performance of all classifiers.
In previous papers, the grassland layer was obtained by ancillary data which were not always either updated or available at the same spatial scale [13,32,35,36]. Thus, in such studies, the grassland layer accuracies reported could have affected the quality of the habitat mapping. In this paper, instead, the first stage of the classification system was employed to provide a grassland layer and a habitat map obtained at the same date.
In the first stage of the proposed system, a binary (grassland/non-grassland) and a multi-class approach were first tested and compared for the extraction of the grassland layer. The findings revealed slight differences between the two approaches with higher accuracy and lower misclassifications obtained by the multi-class classifier. Specifically, the multi-class classification produced a grassland layer with high OA, UA and PA of 98.94%, 98.31% and 99.60%, respectively. This layer was extracted from the same images used for habitat mapping in the second stage.
The grassland layer can also be extracted from a CLC map. An additional opportunity is offered by the recent availability of the Copernicus grassland service. However, these grassland products are not yearly updated. The To evaluate the available grassland services, the layer obtained from our first stage was thus compared with the grassland layer extracted by the existing CLC map and the high resolution Copernicus grassland one (10 m), both available for 2018. The quantitative comparison results ( Table 6) have evidenced that the best grasslands OA was achieved by the SVM (98.94 ± 0.08) classifier of our first stage system with four multi-season images as input. A qualitative analysis of specific areas in the three outputs compared can evidence some important misclassifications found in both the CLC and the Copernicus layers. For example, Figure 13a shows a woodland patch surrounded by grasslands in an orthophoto. Figure 13b-d show the grasslands pixels as obtained by the three products compared: our first stage output, the CLC and the Copernicus grasslands layers, respectively. As can be observed, both our first stage classifier (Figure 13b) and the Copernicus service (Figure 13d) do correctly identify the grasslands pixels, whereas the CLC (Figure 13c) overestimated such pixels also covering the woodland area. To evaluate the available grassland services, the layer obtained from our first stage was thus compared with the grassland layer extracted by the existing CLC map and the high resolution Copernicus grassland one (10 m), both available for 2018. The quantitative comparison results ( Table 6) have evidenced that the best grasslands OA was achieved by the SVM (98.94 ± 0.08) classifier of our first stage system with four multi-season images as input. A qualitative analysis of specific areas in the three outputs compared can evidence some important misclassifications found in both the CLC and the Copernicus layers. For example, Figure 13a shows a woodland patch surrounded by grasslands in an orthophoto. Figure 13b-d show the grasslands pixels as obtained by the three products compared: our first stage output, the CLC and the Copernicus grasslands layers, respectively. As can be observed, both our first stage classifier (Figure 13b) and the Copernicus service ( Figure  13d) do correctly identify the grasslands pixels, whereas the CLC (Figure 13c) overestimated such pixels also covering the woodland area. An additional example is provided in Figure 13e-h. Specifically, Figure 13e shows an agricultural area in the orthophoto. As can be noted in Figure 13f, our first stage classifier misclassified only a few (natural) grasslands pixels (label A12/A2 in Table 5) in the cultivated area. Figure 13g shows correctly, that the CLC did not detect any grassland pixels in the same area. This is due to the inclusion of land use information in CLC map as well as to its lower resolution with respect to our product (10 m). Figure 13h, instead, evidences a misclassification of grasslands in the Copernicus layer (20 m). Such a result is due to the fact that the Copernicus service can provide only LC information. An additional example is provided in Figure 13e-h. Specifically, Figure 13e shows an agricultural area in the orthophoto. As can be noted in Figure 13f, our first stage classifier misclassified only a few (natural) grasslands pixels (label A12/A2 in Table 5) in the cultivated area. Figure 13g shows correctly, that the CLC did not detect any grassland pixels in the same area. This is due to the inclusion of land use information in CLC map as well as to its lower resolution with respect to our product (10 m). Figure 13h, instead, evidences a misclassification of grasslands in the Copernicus layer (20 m). Such a result is due to the fact that the Copernicus service can provide only LC information.
For the first-stage output of the system, the findings have confirmed both the usefulness of high resolution (10 m) data and the multi-class analysis adopted, with respect to the binary approach. Thus, the proposed first-stage setting has adequately extracted the grassland layer to be used in the second stage for habitat mapping. In addition, this layer can be updated whenever a new habitat map is required, at the same spatial and temporal image resolution.
For the second stage of the system, the introduction of times-series of spectral indices improved habitat discrimination. The results obtained were better than the ones by analyzing only the four multi-season data sets (first configuration, Table 7) including the biomass pre-peak (30 January), peak (20 April), post-peak (27 October) and the dry season (19 July) for grasslands. The hypothesis justifying these findings can be the fact that plant communities composing the habitats under study may not contemporarily reach these phases, as also reported by Rapinel et al. (2019). For example, the therophytic communities, including type_3 and part of type_4, are composed by different species with different times of biomass peak. Some of them can reach the peak in April, others toward the end of May (Table 1).
By adding the MSAVI time series to the "4_scenes" configuration, we intensified phenology information. The choice to add MSAVI time series was due to the better results achieved with respect to GNDVI and NBR, evaluated singularly (Section 3.2). In addition, MSAVI can represent the contribution of the reflectance vegetation highest peak in the NIR spectrum region.
Following the suggestion by Groβe-Stoltenbeg et al. [37], a three-indice time series (i.e., GNDVI+MSAVI+NBR) covering the whole Sentinel-2 spectral range from VIS to NIR, was analyzed. The results achieved were quite comparable in terms of F1 measurements with the ones obtained from our "4_scenes+MSAVI" configuration. The OA value reported from these two configurations (93.97% and 95.29%, respectively) were better than the one (91%) obtained by Schuster et al. (2015) [1]. These authors extracted three spectral indices (NDVI, NDVI-RE-R, NDVI-NIR-RE) from a one-year Rapid-Eye time-series obtained spanning over three vegetation periods from April 2009 to September 2011 to achieve a rather dense and relatively homogenous intra-annual time series. However, the spectral indices chosen by these authors and the satellite data used were different from ours.
As evidenced by Favel et al. (2019), DTM can yield management information useful for the discrimination of habitats characterized by similar phenology and spectral features. In particular, in our study area, lower quotas are more accessible for pastures. Thus, they are subjected to sheep grazing. Such living stock feeding can advantage or inhibit the presence of plant species, which characterize specific habitats. Thus, when DTM information was added to configurations selected a further improvement was achieved for OA values and in particular for F1 findings (Table 7). Specifically, F1 values for classes type_1 and type_3 obtained from "4_scenes + MSAVI + DTM" (70 input layers) were 95.25% and 80.27%, respectively. These values were better than the ones resulting from "GNDVI + MSAVI + NBR + DTM" (90 input layers), i.e., 90.84% and 73.02%, respectively. These results may be due to the well-known over-fitting phenomena which occurs when the dimension of input configuration is rather high with respect to the small size of the available training set [101].
With respect to habitat type_1, our results are in agreement with the ones obtained by Marcinkowska-Ochtyra (2019) through the integration of DTM with hyperspectral data. In terms of UA and PA, our findings (86.09% and 96.15%), are better than the ones reported by Zlinszky et al. (2014) (66.4% and 78.6%), obtained by using Lidar data from two different campaigns.
For each input pixel, the "4_scenes + MSAVI + DTM" configuration involved 71 values, i.e., 10 bands for each of the four images analyzed, plus 30 MSAVI values from the time series and the specific pixel elevation value. Instead, the "GNDVI+MSAVI+NBR+DTM", involved 91 values, i.e., three index values per 30 images and elevation. Thus, the former configuration resulted in a reduction in computational costs. This reduction did not require the application of a pre-processing phase for feature extraction as done by Rapinel et al. (2019). These authors analyzed the phenology information embedded in the whole set of spectral bands of a Sentinel-2 time-series (12 images) instead of spectral indices. By reducing the input bands from 120 to 26, they mapped seven grassland plant communities, thus reporting an OA of 78%. This result is lower than the one obtained from both our "GNDVI + MSAVI + NBR + DTM" and "4_scenes + MSAVI + DTM" configurations (Table 7).
Thus, phenology information from Sentinel-2 time series can better discriminate grassland habitats with different times of increase and decrease speed of photosynthesis, start/end of growing season, duration of photosynthetic activity [102,103]. Such metrics, which are generally collected through in-field campaigns, appear embedded in the time series indices.
This finding suggests that neither four multi-season images nor phenology information extracted from a single spectral index alone may be sufficient to discriminate the grassland habitats under study. Instead, the combined use of phenology and spectral bands of four multi-season images (last configuration) can guarantee the achievement of rather high F1-measures values for most of the habitat classes considered ( Table 7). The highest F1-measures accuracies could be obtained for type_1 and type_3 (95.25% and 80.27%, respectively) whereas the accuracy for type_2, that was already very high with the first "4_scenes" configuration (F1-measure equal to 97.29%), remained stable in all the experi-ments. For type_4 habitat the F1-measure was the lowest (75,97%) of the configurations analyzed. This might be due to the fact that species reach Biomass Peak in the first half of April, occasionally in late March. However, no cloud free images were available in March (Table A1, Appendix A) to extract MSAVI spectral index values in that month. Moreover, only two images were available from the first half of April (8 and 13 April). Actually, a rather denser intra-annual spectral index time-series would have helped in discriminating this habitat from the others in the Biomass Peak period. To this aim, gap filling techniques of clouded images should be applied in future work [78,104].
The application of the MV reduced the misclassifications of type_1. Thus, a slight improvement of the OA (up to 2%) with respect to the OA of each specific configuration was obtained by combining the outputs of all the configurations. These findings, together with the difficulty of identifying the optimal classifier for all four of the habitats, make the MV approach attractive for mapping. Equally evidenced by Foody et al., (2007) [105], although not yielding the most accurate classification, the MV can provide useful information on classification uncertainties. These values can be used for post-classification fieldwork aimed to collect additional training/validation reference samples and reduce in-field costs in future campaigns. Figure 14 shows a close-up of a woodland area where an attempt at reforestation must have failed. Overlaid on the orthophoto, the following maps are shown: Figure 14a, the regional map evidencing type_2 pixels located only in the surrounding woodland area; Figure 14b, the "GNDVI + MSAVI + NBR + DTM" map showing both type_2 and type_4 habitat pixels within the woody patch; Figure 14c, "4_scenes + MSAVI + DTM" map reporting, within the woody patch, type_2 pixels, but showing less type_4 pixels than in Figure 14b; Figure 14d, the MV map based on the convergence of ≥4 classifiers; such a map appears similar to the one in Figure 14c; Figure 14e, the uncertainty map of the area evidencing higher values for pixels assigned to type_4 habitat.
Besides the quantitative results reported above, the qualitative comparison carried out in Figure 14 evidences that the automatic two-stage mapping proposed in our study can recognize the presence of type_2 and type_4 in the area analyzed. In this area, in-field campaigns had not been carried out by the local expert involved in the regional mapping. In fact, to reduce in-field campaigns costs, the expert ecologists generally use visual interpretation of available orthophotos. However, such orthophotos generally cover only single season information. Thus, the approach proposed in our work can provide updated habitat maps and related uncertainty values on a large scale. In turn, these values may represent a valuable support for specific in-field investigations. The 2018 orthophoto close ups: (a) habitat map available, from Apulia Regional portal (2020); (b-d) output maps from "GNDVI + MSAVI + NBR + DTM", "4_scenes + MSAVI + DTM" and Majority Vote, respectively; (e) uncertainty map.
Besides the quantitative results reported above, the qualitative comparison carried out in Figure 14 evidences that the automatic two-stage mapping proposed in our study can recognize the presence of type_2 and type_4 in the area analyzed. In this area, infield campaigns had not been carried out by the local expert involved in the regional mapping. In fact, to reduce in-field campaigns costs, the expert ecologists generally use visual interpretation of available orthophotos. However, such orthophotos generally cover only single season information. Thus, the approach proposed in our work can provide updated habitat maps and related uncertainty values on a large scale. In turn, these values may represent a valuable support for specific in-field investigations.

Conclusions
The aim of the present study was to investigate the improvements that can be obtained by exploiting phenology information extracted from intra-annual indices time-series of Sentinel-2 data for mapping grassland habitats in "Murgia Alta". Based on multi-seasonal information from four Sentinel-2 images, the first classification stage provided an updated grassland layer. At the local scale, this layer showed less misclassified pixels than the ones from CLC and Copernicus services. In the second stage, the combination of four multiseasonal spectral band images with both the phenology embedded in the MSAVI spectral index time-series and DTM, provided more reliable habitats discrimination. Nevertheless, the Majority Vote ensemble of the different second-stage input configurations used offered uncertainty measures useful for addressing in-field campaigns for validation purposes. In a future study, to improve the discrimination of the habitats, gap filling techniques will be implemented for obtaining more dense time-series.
The habitat mapping of a Natura 2000 sites must be updated every 6 years, according to the European Habitat Directive. For large areas, this task may be achieved by developing cost/time-efficient automatic tools such as the ones proposed in the present study. The two-stage classification system proposed, which is based on the use of machine learning algorithms, could contribute to speeding up the design of both urgent conservation policies and sustainable regeneration actions of natural ecosystems. This would address both the drastic biodiversity loss and the ongoing climatic changes.
The availability of habitat maps updated both at high temporal and spatial resolutions is very important for decision making at local levels. Therefore, updated habitats maps could be useful variables for monitoring the achievement of Sustainable Development Goal 15 (SDG 15) aimed towards halting biodiversity loss (https://sdgs.un.org/goals/goal15). Acknowledgments: The authors are grateful to "Murgia Alta" National Park Authority for the remarkable ecological support and to Maria Tarantino for the patient reviewing of the English version of the paper.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.