Machine Learning Classification of Mediterranean Forest Habitats in Google Earth Engine Based on Seasonal Sentinel-2 Time-Series and Input Image Composition Optimisation

Abstract: The sustainable management of natural heritage is presently considered a global strategic issue. Owing to the ever-growing availability of free data and software, remote sensing (RS) techniques have been widely used to map, analyse, and monitor natural resources for conservation purposes. The need to adopt multi-scale and multi-temporal approaches to detect the different phenological aspects of different vegetation types and species has also emerged. The time-series composite image approach allows for capturing much of the spectral variability, but presents some criticalities (e.g., time-consuming research, downloading data, and the required storage space). To overcome these issues, the Google Earth Engine (GEE) has been proposed, a free cloud-based computational platform that allows users to access and process remotely sensed data at petabyte scales. The application was tested in a natural protected area in Calabria (South Italy), which is particularly representative of the Mediterranean mountain forest environment. In the research, random forest (RF), support vector machine (SVM), and classification and regression tree (CART) algorithms were used to perform supervised pixel-based classification based on Sentinel-2 images. A process to select the best input image for classification (considering seasonal composition strategies, statistical operators, band composition, and derived vegetation index (VI) information) was implemented. A set of accuracy indicators, including overall accuracy (OA) and the multi-class F-score (F_m), were computed to assess the results of the different classifications. GEE proved to be a reliable and powerful tool for the classification process. The best results (OA = 0.88 and F_m = 0.88) were achieved using RF with the summer image composite, adding three VIs (NDVI, EVI, and NBR) to the Sentinel-2 bands. SVM and CART produced OAs of 0.83 and 0.80, respectively.


Introduction
The contemporary management and conservation of natural heritage have assumed strategic importance globally. In the European Union, a key role is played by the Natura 2000 network, a network of high-natural-value sites to be protected, set up in the framework of two different, but integrated, European directives (79/409/EEC, the Birds Directive, and 92/43/EEC, the Habitat Directive). In this context, to ensure the achievement of the related conservation aims, a repeatable method is needed to monitor the changes in habitats and species occurring over time [1]. Moreover, each European Member State must report, every six years, the status of the protected habitats and species that fall within its borders [2]. Accordingly, knowledge of the spatial pattern of typical species is fundamental for habitat monitoring. Forests play a major role in nature conservation, but their monitoring demands considerable computational capability, as it involves processing massive volumes of data [14]. On the other hand, cloud-based platforms, that is, virtualised supercomputer infrastructures, offer a more user-friendly approach.
The Google Earth Engine (GEE; https://earthengine.google.com) [51] is a free cloud-based computational platform that uses Google's cloud infrastructure and a JavaScript-based language to access and process petabytes of remotely sensed data on a global scale. GEE takes advantage of Google's computational infrastructure to reduce operational time and provides a repository for script storage and sharing, allowing for broad collaboration between different users with minimal cost and equipment. The GEE data catalogue allows access to multiple satellite data and satellite-derived products [52].
Moreover, GEE offers different image collecting, analysis, processing, classification, and export packages, ensuring that users no longer have to depend solely on commercial software [14,52,53]. To date, GEE has been widely used for different mapping purposes; primarily to exploit its massive catalogue of images to capture time-series data or derive useful information to analyse phenomena over a long period. The capabilities of GEE, jointly with external software and applications, have been widely explored using different data and algorithms for a wide range of applications, such as forest mapping [26,54,55], LU/LC classification [56,57], fire severity analysis [58], forest disturbance mapping [59], forest defoliation assessment [60], surface water detection [61], mine mapping [62], snow [63] and shoreline detection [64], urban and rural settlement mapping [65,66], and species habitat monitoring [67].
This paper proposes a novel approach which is entirely developed in the GEE environment, exploiting its full potential to test different combinations of variables and obtain solid data collection to improve each classification step. Moreover, this approach has made it possible to exploit the strengths of both an unsupervised and supervised classification to optimise the acquisition of input data and then achieve the highest levels of accuracy.
The novelty of this research relies upon the proposed workflow for analysing different image and variable composites (i.e., seasonal image collections, reflectance bands, and VIs) to improve Mediterranean mountainous forest habitat classification accuracy. Moreover, the in-depth investigation of these data and their combination enabled us to optimise training and validation points selection, which is a delicate step in the classification process.
Furthermore, we aimed to make the analysis, classification, and mapping processes accessible to different users with different backgrounds and software and hardware availabilities. Therefore, it was crucial to test GEE's responses, in order to carry out the entire work process without using third-party applications.
Furthermore, all scripts can be shared, edited, and updated for rapid application, in order to support decision-making in planning and monitoring activities.
The research reported in this paper was conceived with three main objectives:
1. To test the potential of the GEE platform and Sentinel-2 data to classify forest habitats in a protected natural national park representative of the Mediterranean region, which includes remarkable Natura 2000 sites, performing the whole process inside the GEE code editor environment;
2. To test how different variables and their combinations, all available in GEE, can improve the classification performance (e.g., combinations of input images, bands, reflectance indices, and so on);
3. To compare and assess the performance of different machine learning classification algorithms, in terms of the obtained classification accuracy.
Our goal was to define a method which, although tested in a specific region, might be adapted and applied in a broader context concerning the same classification issue, using similar data types and for similar Mediterranean Natura 2000 forest habitats.

Materials and Methods
Our proposed method, overviewed in Figure 1 and applied in an Italian mountainous protected area, was based entirely on the use of the GEE cloud platform environment.

Figure 1. Workflow of the presented method. The first part shows the image pre-processing step to obtain the time-series. The second part shows all image processing steps to obtain image composites. The third part shows all the classification process components, while the fourth concerns the accuracy comparison step to obtain the final output (the last step). List of abbreviations: NDVI (normalised difference vegetation index); GNDVI (green normalised difference vegetation index); EVI (enhanced vegetation index); NBR (normalised burn ratio); NDII (normalised difference infrared index); DEM (digital elevation model); CART (classification and regression tree); RF (random forest); SVM (support vector machine); OA (overall accuracy).
Starting from the available images in the GEE catalogue, we built a consistent time-series. We tested how the use of different compositions of images, grouped according to their seasonality, together with different combinations of spectral bands and VIs, affected the classification result, leading to the identification of the best input image, that is, the one providing the highest accuracy. Finally, this image was used to test three different classifiers, in order to detect which of them provided the best result in terms of classification accuracy.
Our proposed method can be summarised as follows: image pre-processing, including image selection, filtering, and time-series building; image processing (derivation of seasonal information, image reduction); classification (training and validation points acquisition, best input image selection, supervised machine learning algorithms and parameters optimisation); accuracy assessment and comparison; and mapping.

Study Area
The study area where the method was applied is part of the Aspromonte National Park, in the Calabria region (Southern Italy; see Figure 2). This natural park was established in 1994 and has a total surface of 65,647.46 ha (www.parcoaspromonte.gov.it, last accessed on 5 November 2020).
Aspromonte National Park is located in the very south of Calabria, with altitudes ranging from 100 m a.s.l. to 1956 m a.s.l.; the highest peak is Mt. Montalto. Calabria is located at the extreme south of the Italian peninsula, covering about 15,000 km², and has an elongated shape, extending north to south and separating the Ionian Sea from the Tyrrhenian Sea. Calabria lies in the centre of the Mediterranean basin, and shows a typical, although diversified, Mediterranean climate [67]. The Apennine Mountains run along the central part of the region, marking climatic differences between the Ionian and Tyrrhenian sides [68].
This climate diversity is mirrored by an analogous diversity in vegetation composition and distribution [69], playing an essential role in defining and maintaining a natural habitat network [70]. This is why we decided to locate the study area in the Aspromonte National Park, where considerable forest biodiversity within a relatively small area can be observed, thus making LU/LC classification particularly challenging. Aspromonte is characterised by a gradient of natural vegetation, following the altitudinal belts from the Meso-Mediterranean Holm oak forests (Quercus ilex L.) and thermophilous oak forests (Quercus frainetto Ten., Quercus pubescens s.l.) to mountain black pine forests (Pinus nigra subsp. Calabrica Loud.) and beech forests (Fagus sylvatica L.) with fir (Abies alba Mill.). This vegetation complex contains many endemic (Calabrian) and sub-endemic (Calabrian-Sicilian) plant species [71].
IT9350133 "Monte Basilicò-Torrente Listì"; (iii) IT9350150 "Contrada Gornelle"; (iv) IT9350154 "Torrente Menta"; (v) IT9350155 "Montalto"; and (vi) IT9350180 "Contrada scala". These sites host several relevant forest habitats, such as Apennine beech forests with Taxus and Ilex (code 9210*), Apennine beech forests with Abies alba and beech forests with Abies nebrodensis (code 9220*), Castanea sativa woods (code 9260), Southern Apennine Abies alba forests (code 9510*), and (Sub-)Mediterranean pine forests with endemic black pines (code 9530*).

Following Chaves et al. [18], we used Sentinel-2 imagery because its bands are considered more suitable for vegetation analysis, thanks to its finer spatial resolution compared to other satellite images and its wavelength sensitivity to chlorophyll content and phenological states [72], making the VIs more accurate for LU/LC discrimination [73]. The payload carried on Sentinel-2 is a multispectral sensor with 13 bands at different ground resolutions (Table 1). Sentinel-2 data are available at two different levels, based on the atmospheric correction status of the images: Level 1C for top-of-atmosphere (TOA) images and Level 2A for bottom-of-atmosphere (BOA) reflectance. The former need to be atmospherically corrected in order to obtain useful reflectance values; this correction cannot be carried out in the GEE JavaScript-based environment, but requires third-party applications or the Python API [74-76]. In line with the research objectives, we used Level 2A images, which are already atmospherically corrected.
In the GEE code editor environment, we imported Sentinel-2 Level 2A images as an image collection (i.e., a set of Google Earth Engine images).

Image Filtering and Time-Series Extraction
To reduce the size of the image collection, a filtering process was needed. In this work, we filtered the image collection using three variables: covered location, time interval, and cloud percentage. The location of the images was given by the boundaries of the three square plots under study, which we imported into the GEE environment. The time period was fixed, starting from 28 March 2017 (the first Sentinel-2 BOA image available in GEE) to 31 July 2020, thus covering more than three consecutive years. Finally, we used the cloudy pixel percentage value, stored as metadata for each image, to select and extract only those images with a cloud cover of less than 10% over the study area.
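For illustration, the three filters can be sketched in plain Python over hypothetical scene metadata records; in GEE itself, the equivalent operations are `filterBounds`, `filterDate`, and a metadata filter on the `CLOUDY_PIXEL_PERCENTAGE` property. All field names and values below are illustrative assumptions, not real catalogue entries.

```python
from datetime import date

# Hypothetical metadata records standing in for Sentinel-2 scenes;
# "covers_plots" marks whether a scene intersects the study plots.
scenes = [
    {"date": date(2017, 3, 28), "cloud_pct": 4.2, "covers_plots": True},
    {"date": date(2018, 8, 2), "cloud_pct": 35.0, "covers_plots": True},
    {"date": date(2019, 7, 15), "cloud_pct": 1.1, "covers_plots": False},
    {"date": date(2020, 6, 30), "cloud_pct": 8.9, "covers_plots": True},
]

def filter_scenes(scenes, start, end, max_cloud_pct):
    """Keep scenes that cover the plots, fall inside the time window,
    and stay below the cloud-cover threshold."""
    return [
        s for s in scenes
        if s["covers_plots"]
        and start <= s["date"] <= end
        and s["cloud_pct"] < max_cloud_pct
    ]

kept = filter_scenes(scenes, date(2017, 3, 28), date(2020, 7, 31), 10.0)
```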

Vegetation Indices
Time-series can be used to extract the seasonal variability of each pixel, using the values of several VIs as indicators [42,77]. The link between VIs and vegetation phenological variability is more robust than that of single bands [78]. Computing different VIs for each image of a time-series can be a time-consuming activity. The GEE JavaScript-based code editor environment allows for the straightforward and simultaneous calculation of VIs for each time-series image.
All computed indices can be added as a separate band of each image. Six VIs, listed in Table 2, were taken into account in this work.
The first index was the normalised difference vegetation index (NDVI) [79], based on the normalised ratio of the near-infrared (NIR) and red bands. This is a well-known and widely used VI, which allows for the identification of photosynthesising vegetation by investigating the bands of higher chlorophyll absorption and reflectance. The NDVI can assume values between −1 and 1.
The green normalised difference vegetation index (GNDVI) [80] has been developed to estimate leaf chlorophyll concentration, and uses the green band instead of the red band in the NDVI formula.
The enhanced vegetation index (EVI) [81] involves the NIR, red, and blue bands. It has been developed to achieve better sensitivity in high biomass regions and be more responsive to different species' canopy structural variation by decoupling the canopy background.
The normalised ratio between the NIR and shortwave-infrared (SWIR) bands was used to compute two different indices: the normalised difference infrared index (NDII) [82] and the normalised burn ratio (NBR) [83]. Both indices consider the shortwave-infrared region, a portion of the spectrum that is sensitive to leaf water content; in general, SWIR reflectance decreases as leaf water content increases [84]. The NDII and NBR formulae differ in the SWIR wavelengths investigated [85]: the former investigates the region between 1550 nm and 1750 nm (Sentinel-2 B11), while the latter considers the region between 2050 nm and 2450 nm (Sentinel-2 B12).
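The VI formulas referenced above are standard in the literature. As an illustrative sketch (not the authors' GEE code), they can be computed per pixel as follows, assuming the Sentinel-2 band roles B2 = blue, B3 = green, B4 = red, B8 = NIR, B11 = SWIR1, B12 = SWIR2 and the customary EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1):

```python
def compute_vis(blue, green, red, nir, swir1, swir2):
    """Compute five of the VIs described above from per-pixel reflectances."""
    ndvi = (nir - red) / (nir + red)
    gndvi = (nir - green) / (nir + green)
    # EVI with the customary coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1.
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    ndii = (nir - swir1) / (nir + swir1)   # SWIR1 ~ Sentinel-2 B11
    nbr = (nir - swir2) / (nir + swir2)    # SWIR2 ~ Sentinel-2 B12
    return {"NDVI": ndvi, "GNDVI": gndvi, "EVI": evi, "NDII": ndii, "NBR": nbr}

# A plausible healthy-vegetation pixel (illustrative reflectances only).
vis = compute_vis(blue=0.03, green=0.06, red=0.05, nir=0.45, swir1=0.20, swir2=0.10)
```

The same functions apply unchanged to NumPy arrays, which is how a whole band would be processed at once.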

Image Reduction
All collected images were split into sub-time-series by month of acquisition and grouped to represent different seasonal periods. In this way, we obtained: (i) a winter set of images that highlights the no-leaf status of deciduous species, composed of images acquired in December-March (W_IC); (ii) a late spring-early summer set from April-June (Sp_IC), representing the period of vegetative restart and growth; (iii) a summer set, corresponding to the maximum potential vegetation activity, collected in July-September (Su_IC); and (iv) an autumn set, during the start of the senescence period, highlighted by images from October and November (A_IC), when the leaves of deciduous trees turn to yellow and red colours.
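The month-to-season grouping above can be expressed as a simple lookup. The set names (W_IC, Sp_IC, Su_IC, A_IC) follow the text; the function itself is only an illustrative sketch.

```python
# December-March -> winter; April-June -> spring; July-September -> summer;
# October-November -> autumn, as defined in the text.
SEASON_BY_MONTH = {
    12: "W_IC", 1: "W_IC", 2: "W_IC", 3: "W_IC",
    4: "Sp_IC", 5: "Sp_IC", 6: "Sp_IC",
    7: "Su_IC", 8: "Su_IC", 9: "Su_IC",
    10: "A_IC", 11: "A_IC",
}

def group_by_season(acquisition_months):
    """Split a list of acquisition months into the four seasonal sets."""
    groups = {"W_IC": [], "Sp_IC": [], "Su_IC": [], "A_IC": []}
    for m in acquisition_months:
        groups[SEASON_BY_MONTH[m]].append(m)
    return groups

groups = group_by_season([1, 4, 7, 10, 12])
```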
To perform the classification, each time-series had to be reduced to a single image synthesising the information of all selected images. The single images of the collection represented the input; the output, computed pixel-wise, was a new image in which each pixel value summarises the values of that pixel across all images in the collection at the given location. The final value was computed using a statistical operator; we used the median, mean, maximum, and minimum values of the pixels to perform the reduction.
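Pixel-wise reduction can be sketched with NumPy by stacking co-registered images and collapsing the stack along the image axis; this stands in for GEE's reducers and assumes the images share shape and alignment.

```python
import numpy as np

def reduce_collection(images, operator="median"):
    """Collapse a list of co-registered images into one image, pixel-wise,
    using the chosen statistical operator."""
    stack = np.stack(images, axis=0)  # shape: (n_images, rows, cols, ...)
    reducers = {"median": np.median, "mean": np.mean,
                "max": np.max, "min": np.min}
    return reducers[operator](stack, axis=0)

# Three toy 2x2 single-band images with constant values 1, 3, and 8.
images = [np.full((2, 2), v) for v in (1.0, 3.0, 8.0)]
median_img = reduce_collection(images, "median")
mean_img = reduce_collection(images, "mean")
```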

Classification
The classification can be performed with either pixel- or object-based approaches. It has been widely demonstrated that object-based approaches can provide more accurate results when using high- and very-high-resolution data [86-88]. In contrast, when the object's dimension is smaller than the pixel resolution (e.g., when a single-tree canopy is to be detected in images with 10 m spatial resolution), a pixel-based approach is preferable.
In this study, considering the resolution of the images used to carry out the supervised classification of LC, we adopted a pixel-based approach, which, according to Tamiminia et al. [14], is the approach most widely adopted in GEE by the scientific community.
Considering the characteristics of the area and our previous research experiences [89], we chose seven LC classes to perform the classification: water (Wa), bare soil (BS), roads (Ro), chestnut woodland (CW), beech woodland (BW), silver fir woodland (SW), and pine woodland (PW).

Unsupervised Clustering
The first classification step concerned the discrimination between the coniferous and broadleaf forest components, in order to derive homogeneous areas from which to collect SW and PW class training points in the subsequent process. To this end, an unsupervised classification was performed to highlight potential areas where these forest types are located, using W_IC to exploit the leaf-off condition and better distinguish conifers from deciduous broad-leaved woods. Among the various classification approaches, unsupervised image classification methods have the advantage of using the overall spectral content of an image [90]; in this case, the goal was to capture the thematic homogeneity represented by conifers. The unsupervised classification was performed in the GEE environment using the K-means clustering algorithm [91-93].
This classifier calculates class means that are evenly distributed in the data space and then iteratively clusters the data according to their similarity, using minimum spectral distance techniques [94]. Each iteration recalculates the means and reclassifies pixels with respect to the new means; all pixels are assigned to the nearest class using a distance threshold. In this case, the clustering was based on the Euclidean distance. Once the process was completed, a new layer was obtained, with homogeneous areas representing only conifers. The result was validated using ground truth data points.
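The clustering step can be illustrated with scikit-learn's KMeans on synthetic two-band "winter" spectra, standing in for the GEE clusterer; the two spectral groups and their band values are invented for the example and do not reproduce the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic, well-separated spectral groups standing in for conifer
# and deciduous broadleaf winter signatures (n_pixels x n_bands).
conifer = rng.normal(loc=[0.05, 0.25], scale=0.01, size=(100, 2))
broadleaf = rng.normal(loc=[0.15, 0.10], scale=0.01, size=(100, 2))
pixels = np.vstack([conifer, broadleaf])

# K-means alternates between assigning pixels to the nearest cluster mean
# (Euclidean distance) and recomputing the means, as described above.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
labels = km.labels_
```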

Determination of Training and Validation Points
To improve the quality of the final classification results, the choice of training and validation points is one of the most critical steps in the whole process [95]. The in-depth investigation of all the available images and the possibility of efficiently combining them to create seasonal ICs enabled us to optimise the selection of training and validation points, thus allowing us to use reference layers that highlighted different forest species behaviours (e.g., winter images allowed for better coniferous detection). The classification method used was pixel-based; therefore, training and validation points represented the defined LC classes to which they belonged, avoiding mixed pixel issues.
For each LC class, a set of 100 points (700 in total) was collected in the GEE map area environment using different reference layers (i.e., the unsupervised clustering output, in situ surveys, and the different ICs). Each set was formed of 25 training points (25% of the total) and 75 validation points (75% of the total) per class, giving a total of 175 training points and 525 validation points. All training points were collected through a visual approach, while the validation points were collected, on the one hand, using a visual approach justified by a thorough knowledge of the area and, on the other hand, only for the forest LC classes, using several points of known co-ordinates surveyed directly in the field. A total of 50 ground truth points were randomly collected in situ: 20 for the BW class and 10 each for the CW, SW, and PW classes. As mentioned previously (see Section 2.4.1), we used the map resulting from the unsupervised clustering, which detected the two main coniferous woodland species existing in the scene, as a reference layer for the collection of coniferous class training points. The winter composite image was used to collect the validation points, due to its capacity to highlight the species of interest. For the BS and Ro class points, we used the summer composite image as a reference layer for both training and validation points, avoiding the incorrect assignment of points caused by a temporary uncovering of the soil due to the regular growth cycles of vegetation. The CW and BW class training and validation points were collected using the autumn composite image; in this season, the two broadleaved species present different colours, due to the higher concentration of carotenoid and anthocyanin pigments during the senescence period [96]. Considering that these two species occupy different altitudinal belts, we also added the information provided by the 30 m spatial resolution digital elevation model (DEM) available in GEE.
Wa point detection did not need a specific image as a reference; thus, the Su_IC was used as the reference layer for both training and validation points.
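The 25/75 split per class can be sketched as below. Note that in the paper the points were placed by visual interpretation over different reference layers; the random split and the point identifiers here are purely illustrative stand-ins.

```python
import random

def split_points(points_by_class, n_train=25, seed=42):
    """Split each class's reference points into training and validation
    subsets (25 training and 75 validation points per class, as in the text)."""
    rng = random.Random(seed)
    train, valid = {}, {}
    for cls, pts in points_by_class.items():
        shuffled = pts[:]
        rng.shuffle(shuffled)
        train[cls] = shuffled[:n_train]
        valid[cls] = shuffled[n_train:]
    return train, valid

# 100 dummy point identifiers per LC class (7 classes, 700 points in total).
classes = ["Wa", "BS", "Ro", "CW", "BW", "SW", "PW"]
points = {c: [f"{c}_{i}" for i in range(100)] for c in classes}
train, valid = split_points(points)
```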

Machine Learning Classification Algorithms
The GEE environment integrates several different classifiers. We compared the performance of three of them, chosen according to their wide use and reliability in LC classification [11,14,86,[97][98][99][100][101]: random forest (RF), classification and regression tree (CART), and support vector machine (SVM).
RF [102] is an ensemble decision method based on a forest of many individual decision trees that operate together. Each tree is trained on a different training set, drawn from the original one using bootstrap aggregation (bagging), and an error estimate is generated using the observations that occur in the original data but not in the bootstrap sample [103].
RF in GEE allows for setting different parameters: the number of decision trees in the forest, the number of variables per split, the minimum leaf population, the bag fraction (i.e., the proportion of training data to be used in the creation of the next tree in the boost), the maximum number of nodes in each tree, and the randomisation seed.
CART [104,105], the building block of RF, is a single-tree decision classifier. At each node of the tree, one attribute splits the data into subsets based on the normalised information gain; the attribute with the highest value of normalised information gain defines the split [106]. In GEE, only two parameters can be set: the minimum leaf population and the maximum number of nodes.
SVM [107,108] is a non-parametric supervised classifier based on kernel functions. During training, SVM learns the boundary between training samples belonging to different classes by projecting them into a multidimensional space and finding one or more hyperplanes that maximise the separation of the training data between the pre-defined classes [109].
In the GEE environment, several parameters can be set: the decision procedure (voting or margin), the kernel type (linear, polynomial, sigmoid, or radial basis function), and the C parameter (which controls how much samples inside the margin contribute to the overall error). Depending on these choices, further parameters can be set (e.g., the kernel degree, if a polynomial function is chosen).
Different combinations of parameters were tested for each classifier with a trial-and-error approach, reporting and comparing the obtained accuracies.
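As an illustrative stand-in for this comparison (using scikit-learn rather than the GEE classifiers, and synthetic "pixel spectra" instead of the collected training points), the three algorithms can be trained and scored side by side:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
# Synthetic 4-band spectra for three well-separated classes (60 pixels each).
X = np.vstack([rng.normal(m, 0.02, size=(60, 4)) for m in (0.1, 0.2, 0.3)])
y = np.repeat([0, 1, 2], 60)
idx = rng.permutation(len(y))
X_tr, X_va = X[idx[:120]], X[idx[120:]]
y_tr, y_va = y[idx[:120]], y[idx[120:]]

classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0),
    "CART": DecisionTreeClassifier(random_state=0),  # a single decision tree
    "SVM": SVC(kernel="rbf", C=1.0),
}
oa = {name: accuracy_score(y_va, clf.fit(X_tr, y_tr).predict(X_va))
      for name, clf in classifiers.items()}
```

On this toy, cleanly separable dataset, all three classifiers score near-perfectly; the differences reported in the paper emerge only on real, spectrally overlapping classes.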

Choice of the Best Input Image for Classification
As highlighted by Phan et al. [110], the choice of the input image can influence the accuracy of the classification process. Different images lead to different accuracy values at the end of the classification process.
To assess the best achievable accuracy, we exploited GEE's potential, which allowed us to quickly test several different input images. These were differentiated according to the variables composing each of them, such as the band number, reflectance regions, VIs, and the statistic used to reduce the image collection (e.g., the mean, median, minimum, or maximum pixel value). A trial-and-error approach was carried out, testing how the use of one variable rather than another, and their possible combinations, affected the final classification result. To evaluate the effect of the different input images on the classification, we used the accuracy values and the out-of-bag (OOB) error estimate as indicators; the OOB error is an unbiased estimate of the actual prediction error of RF and other bagging-based machine learning algorithms.

Accuracy Assessment
For each classification test, an accuracy assessment was performed, considering, as indicators, the overall accuracy (OA), the user's accuracy (UA), and the producer's accuracy (PA).

The OA is the overall proportion of correct classification, given by the ratio between the number of correctly classified units and their total number [111], while UA and PA refer to single-class classification accuracies [112]. The UA is the ratio between the correctly classified units and all units classified into a given class, while the PA is the ratio between the number of correctly classified units and the number of validation units of a given class. The UA and PA were calculated for each LC class, and their mean values over all LC classes, UA_m (Equation (1)) and PA_m (Equation (2)), were derived:

UA_m = (1/n) Σ_{i=1}^{n} UA_i, (1)

PA_m = (1/n) Σ_{i=1}^{n} PA_i. (2)

These accuracy measures were used to calculate the F-score (F_i) [113,114] (Equation (3)) for each LC class and the multi-class F-score (F_m) [115] (Equation (4)), the latter representing a measure of the accuracy of the entire classification process. The F-score is the harmonic mean of recall and precision, which have the same meaning as PA and UA, respectively [87,101]:

F_i = 2 (UA_i × PA_i) / (UA_i + PA_i), (3)

F_m = (1/n) Σ_{i=1}^{n} F_i, (4)

where n is the number of LC classes.
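Assuming the usual confusion-matrix convention (rows = reference classes, columns = assigned classes), the indicators defined above can be sketched as:

```python
import numpy as np

def accuracy_metrics(cm):
    """OA, mean UA (UA_m), mean PA (PA_m), and multi-class F-score (F_m)
    from a confusion matrix cm[i, j]: reference class i assigned to class j."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    oa = correct.sum() / cm.sum()
    ua = correct / cm.sum(axis=0)  # correct / all units assigned to the class
    pa = correct / cm.sum(axis=1)  # correct / all reference units of the class
    f = 2 * ua * pa / (ua + pa)    # per-class F-score (harmonic mean)
    return oa, ua.mean(), pa.mean(), f.mean()

# Toy two-class confusion matrix: 40/50 and 30/50 correctly classified.
oa, ua_m, pa_m, f_m = accuracy_metrics([[40, 10], [20, 30]])
```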

Results
After filtering by date and location, the image collection consisted of 140 images. After applying the cloud cover threshold (maximum 5%), we obtained 33 images, as follows: (i) 11 images for the winter season, (ii) 9 images for spring, (iii) 8 images for summer, and (iv) 5 images for autumn. Referring to the acquisition year, we retrieved 5 images for 2017 (starting from the end of March), 11 images for 2018, 11 images for 2019, and 6 images for the first half of 2020 (until the end of July).

Best Input Image Composite (IC)
This section reports the main results obtained after an extensive comparison of different variables (see Section 2.4.4), carried out through a trial-and-error approach and aimed at highlighting the best input image composite in terms of image seasonality, reflectance bands, and VIs; the tested combinations are listed in Table 3.

Focusing on single LC classes, the best performance was obtained when classifying Wa, with an F_i of 1. Considering the forest classes, the best and worst F_i scores were obtained using the autumn input composite (A_IC), for classes CW (0.89) and PW (0.33), respectively. The UA, PA, and F_i values for each LC class and each examined IC, resulting from the trial-and-error approach, are displayed in Figure A1 (Appendix A).

Table 3. The different combinations of image composites (ICs) tested in the first step of the proposed method (i.e., best input image composite selection). The obtained overall accuracy (OA) and multi-class F-score (F_m) are reported for each tested IC. Moreover, the third column reports the out-of-bag (OOB) error estimate values for the random forest (RF) algorithm.

Concerning the statistical operator adopted to reduce the image collection, the results are given in Table 4. The best accuracy was obtained when reducing the images with the mean: all accuracy indicators were equal to 0.88. The UA, PA, and F_i values for each LC class and each statistical operator, resulting from the trial-and-error approach, are displayed in Figure A2 (Appendix A).

Table 4. The different statistical operators adopted to reduce the Su_IC input image, with the obtained overall accuracy (OA) and multi-class F-score (F_m). Moreover, the out-of-bag (OOB) error estimate values for the RF algorithm are reported.

Once we identified the best image composite, based on the accuracy results, we tested different band combinations for that image. We started by using only the visible bands and then added the NIR, red-edge (RE), and SWIR bands.

Statistics
The main results, shown in Table 5, highlight that the solution with all bands was the best input image, with an OOB error estimate of 0.01, an OA of 0.86, UA m and PA m equal to 0.87 and 0.86, respectively, and an F m of 0.87. As expected, the lowest result was obtained using the visible bands (OA = 0.68, UA m = 0.72, PA m = 0.68, and F m = 0.70).
The best-classified forest class was CW when using all bands, with an F_i of 0.85, while the worst score was reached for PW (0.49) when using just the visible bands. This image also gave the highest OOB error estimate (0.10). The UA, PA, and F_i values for each LC class and each spectral region investigated by the different bands, resulting from all the conducted iterations, are displayed in Figure A3 (Appendix A).
The computed VIs were added to the Su_IC all-bands image, and their contribution to the final result was investigated through the accuracy indicators and the OOB error estimate. The OOB error estimate remained stable (at 0.01) for all tests. Concerning the use of a single VI, the lowest accuracy values were reached when adding GNDVI or NDII to all bands, with the same results for OA (0.86) and F_m (0.87), while the highest values were reached with the other three VIs, which showed the same OA (0.87) and F_m (0.88). Although demonstrating identical mean accuracy values, these three combinations of bands and indices showed different F_i values for the various LC classes.

Table 5. The different combinations of bands adopted on the summer image composite (Su_IC) and the obtained accuracy information, expressed as overall accuracy (OA) and multi-class F-score (F_m). Moreover, the out-of-bag (OOB) error estimate values for the random forest (RF) algorithm are reported.

The best forest class score (0.85) was recorded for CW using all bands + NDVI, while the worst (0.79) was achieved for SW when adding the EVI layer to the IC and for BW when adding the NBR. These three VIs were then added together, reaching the best result, with all accuracy indicators equal to 0.88. Table 6 shows the main results of this step. For this combination of bands and VIs, CW registered the highest F_i among the forest classes, with a value of 0.85, while BW and SW had the lowest score (0.80). The UA, PA, and F_i values for each LC class with each VI taken into account are displayed in Figure A4 (Appendix A).

Input Bands
On the basis of the results of this process, the best input image for classification was obtained by reducing all the summer images using the mean statistical operator, considering all available Sentinel-2 bands, and adding NDVI, EVI, and NBR as VIs.

Classification Algorithms
As mentioned in Section 2.4.3, three machine learning algorithms were tested for their supervised pixel-based classification performance. RF obtained the best accuracy scores, with OA and F_m values of 0.88, followed by SVM, which registered values of 0.83 and 0.84 for OA and F_m, respectively.
The worst accuracy values were reached when performing classification with CART, obtaining an OA of 0.80 and an F m of 0.79. The classification results are shown in Figure 3.

Regarding the confusion matrices (Figure 4), it can be observed that all algorithms reached a satisfactory correspondence in classifying Wa (all validation points were correctly classified).
Considering forest LC classes, BW achieved the highest score in all classifiers, with a percentage of correctly assigned validation points of 93.33% for RF and CART and 90.67% for SVM. RF misassigned five points, assigning three to CW, one to Ro, and one to PW. SVM wrongly assigned five points to SW, one to PW, and one to Ro. CART misassigned two points to CW, two points to SW, and one point to BS.
RF reached its lowest score in classifying SW, misassigning 12 points to BW and 6 points to PW, for a percentage of correctly assigned points equal to 76% (57 points). SVM and CART highlighted the PW class as the worst, with percentages of correctly assigned points of 61.33% and 66.67%, respectively. SVM wrongly assigned 16 points to SW, 12 points to BW, and 1 point to Ro, while CART erroneously assigned 13 points to SW, 11 points to BW, and 1 point to Ro. Figure 5 shows the classified maps resulting from each adopted classifier and square plot.
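The per-class and global accuracy indicators used throughout (UA, PA, F i , OA, and F m , the latter taken here as the unweighted mean of the per-class F i ) all derive from the confusion matrix. A minimal sketch, with a purely illustrative 3x3 matrix (not the paper's actual validation data):

```python
# Per-class producer's accuracy (PA), user's accuracy (UA), and F-score (F_i),
# plus overall accuracy (OA) and multi-class F-score (F_m = mean of F_i),
# computed from a confusion matrix. Rows = reference (validation) classes,
# columns = classified classes. The matrix values are illustrative only.

def accuracy_metrics(matrix, labels):
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    oa = sum(matrix[i][i] for i in range(n)) / total
    per_class = {}
    for i, lab in enumerate(labels):
        row_sum = sum(matrix[i])                        # reference points of class i
        col_sum = sum(matrix[r][i] for r in range(n))   # points mapped as class i
        pa = matrix[i][i] / row_sum if row_sum else 0.0
        ua = matrix[i][i] / col_sum if col_sum else 0.0
        f_i = 2 * pa * ua / (pa + ua) if (pa + ua) else 0.0
        per_class[lab] = (pa, ua, f_i)
    f_m = sum(f for _, _, f in per_class.values()) / n
    return oa, f_m, per_class

labels = ["BW", "SW", "PW"]
matrix = [
    [70,  3,  2],   # 70/75 BW points correct (93.33%)
    [12, 57,  6],   # 57/75 SW points correct (76%)
    [ 5, 10, 60],
]
oa, f_m, per_class = accuracy_metrics(matrix, labels)
print(round(oa, 3), round(f_m, 3))  # -> 0.831 0.83
```

The row sums mimic the paper's 75-point validation sets per class, so the example reproduces figures such as the 76% PA for SW under RF.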

Discussion
Unlike other similar studies relying on third-party software [75,76,116-121], we performed the entire workflow inside the GEE cloud platform, using the JavaScript language to recall variables and functions in the code editor. This is a point of novelty of our research in this field. It is an attractive solution, because it allows different users with relatively low-cost devices to obtain complete and satisfactory results, otherwise obtainable only by using external components that require background knowledge and significant financial resources for high-performing PCs and/or commercial software. Our results show that the accuracy is generally high compared to that obtained in other similar studies, and completely in line with the accuracy highlighted in the review of Tamiminia et al. [14].
In previous studies, the most used data have been those from the Landsat constellation, followed by MODIS products (785 and 55 published papers, respectively, according to a review covering GEE's first use until 2020 [14]). We preferred Sentinel-2 images instead, due to their better spatial resolution (10 m for the visible and NIR bands, versus 30 m for Landsat) and the narrow bandwidths in the RE and NIR regions.
The classification accuracy is affected by the number and quality of training and validation points. We highlighted how exploiting the computational potential of GEE makes it possible, quickly and without the need to store big data, to use different seasonal ICs for the choice of training and validation points, which is generally considered the most delicate step in the classification process. We also highlighted that, when using the same set of reference data for training and validation, the choice of input image composition led to different results. We tested different composition methods (seasonal composition strategies, statistical operators, band composition, and derived VI information), in order to generate the best spectral input data for LC classification, investigating how this process may affect classification accuracy. Moreover, we tested the performance of three different classifiers (RF, SVM, and CART), using the same composite image as input.
According to the results summarised in Table 2, the worst accuracy results of W_IC may have been due to the non-simultaneous presence of all species on the ground, which generated errors in the LC classification, especially for BS and Ro classes. On the other hand, this composite was useful to discriminate coniferous training points, due to the absence of broadleaved trees in the scene. A progressive enhancement of accuracy values can be noticed with the other seasonal periods, representing the different phases of the senescence/growth cycle, reaching the best results in summer, when vegetative species are in their maximum growth period.
Although previous studies have approached the statistical operator issue in the image reduction process, highlighting the median as the most-used operator [48,111], after testing different solutions with a trial-and-error approach, we obtained the best results using mean pixel values.
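The reduction step itself is simple: the time-series of co-registered images is collapsed, per pixel and per band, by a statistical operator across dates. A toy sketch (the pixel values are invented for illustration) showing the trade-off between the two operators, with the caveat that the paper's trial-and-error favoured the mean composite for this area:

```python
# Per-pixel reduction of a seasonal time-series to one composite value,
# comparing the mean and median statistical operators. Values illustrative;
# the 0.9 sample mimics an undetected bright artefact (e.g. residual cloud).
from statistics import mean, median

# One pixel, one band, five acquisition dates within the season
series = [0.21, 0.23, 0.22, 0.90, 0.24]

mean_composite = mean(series)      # pulled upward by the outlier
median_composite = median(series)  # robust to the outlier
print(round(mean_composite, 3), median_composite)  # -> 0.36 0.23
```

In GEE the same reduction is applied image-wide with an `ee.Reducer` over the ImageCollection; the snippet only isolates the per-pixel arithmetic.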
It is interesting to highlight that the progressive adding of Sentinel-2 bands to the RGB bands (Table 4) produced a decrease in the OOB error estimate and an increase in accuracy indicators (OA and F m ). This was because these bands add useful spectral information, facilitating the detection of different behaviours of remotely sensed objects, such as the chlorophyll response (in different portions of the NIR and RE wavelengths) or the response of water content (in SWIR wavelengths). A bottleneck of this process can be produced by the spectral similarity among co-occurring species, which could affect the results of a classification process based on multispectral instead of hyperspectral images [122]. Hyperspectral images are more difficult to obtain and require more background knowledge and, so far, require third-party software for their elaboration.
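A toy numeric sketch of why the extra bands help: two classes with nearly identical visible reflectance become separable once a SWIR band is included. The spectra below are invented for illustration, not measured values from the study area:

```python
# Illustration of added spectral separability: Euclidean distance between
# two class mean spectra, with and without an extra (SWIR) band appended.
import math

def distance(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical mean spectra: [B2, B3, B4], then with B12 (SWIR) appended.
broadleaf_rgb = [0.04, 0.07, 0.05]
conifer_rgb   = [0.04, 0.06, 0.05]
broadleaf_all = broadleaf_rgb + [0.12]   # broadleaves: higher SWIR response
conifer_all   = conifer_rgb + [0.06]     # conifers: lower SWIR response

print(round(distance(broadleaf_rgb, conifer_rgb), 3))   # small spectral gap
print(round(distance(broadleaf_all, conifer_all), 3))   # larger spectral gap
```

The same logic explains the reported drop in OOB error as bands are added: each informative band widens the spectral gap the classifier can exploit, whereas spectrally similar co-occurring species keep that gap small regardless.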
The use of VIs can improve classification accuracy. We tested five VIs (Table 5), three of which (NDVI, EVI, and NBR) produced a better result, in terms of increased accuracy, than the other two (GNDVI and NDII). This was probably due to the spectral behaviour of the species present in the study area and the characteristics of the selected input images. Moreover, we tested both a composition with these three best indices and a composition with all indices, highlighting that, for classification purposes, a correct choice of VIs is needed and should be preferred to the mere addition of derived indices.
Concerning the classification process, the choice of the three adopted classifiers was in line with the findings of the review of Tamiminia et al. [14], who reported RF, CART, and SVM as those most adopted in previous GEE works (97, 26, and 21 published papers, respectively). Concerning the accuracy results, RF proved to be the best solution, in agreement with Rodriguez-Galiano et al. [123], who defined RF as superior to other classifiers.

Considering the total surface occupied by each LC class (Figure 4), the classified maps (Figure 5), and the composition in terms of percentage of each LC class (Figure 6), it can be observed that the three classifiers produced quite different results. In general, in the SVM-classified maps (Figure 5), the typical salt-and-pepper effect of pixel-based classification is more accentuated than in the RF and CART maps, especially for Plot 1. RF and CART gave a similar result for the Wa class, while SVM overestimated this LC class, assigning it almost double the area indicated by the other two classifiers. Furthermore, SVM overestimated SW and underestimated CW and BW; mainly CW, to which it assigned half the total surface assigned by RF. Concerning BS, the major overestimation was by CART, which assigned almost three times the surface assigned by RF. The total area occupied by the forest classes was overestimated by both SVM and CART. The reason for this is probably related to the intrinsic characteristics of RF which, as reported in the literature [124,125], is easy to train and less sensitive to the quality of training data; unlike SVM, which is very sensitive to mislabelled pixels [126,127]. The results also showed that, in our case, having used the same set of training and validation data, the poor performance of SVM may be due to the remarkably heterogeneous nature of the identified plots.
It is important to highlight that the obtained results were strongly linked to the spatial resolution of the available data in the GEE catalogue. It has been demonstrated that the results can be significantly improved when using higher-resolution sensors [128].
In general, all classifiers produced an acceptable accuracy value, demonstrating the reliability of GEE as a tool enabling access to bulk data efficiently, which can also perform the entire classification process.

Conclusions
The main aim of this work was to test the potential of the GEE platform and Sentinel-2 time-series to classify LC in a mountainous natural national park, which includes Natura 2000 sites hosting relevant protected forest habitats. The idea behind the entire work was to exploit GEE to obtain a solid data collection, investigating how different uses and combinations of variables can improve the classification performance, without needing to rely on third-party software. Furthermore, the performance of different classification algorithms, in terms of classification accuracy, was compared and evaluated.

By completing the entire classification process in a heterogeneous protected forest environment, such as that of the study area (Aspromonte National Park), we showed the reliability and versatility of GEE. Moreover, by efficiently managing a massive amount of RS data thanks to its cloud architecture, GEE avoids the use of external and often commercial (i.e., expensive) software. In this work, high accuracy was reached thanks to the careful collection of training and validation points, carried out by exploiting the opportunity to use different seasonal ICs, and to the various iterations of the trial-and-error approach used to achieve the best input image.

This work highlighted the possibility of implementing the entire classification process in this cloud environment, thus enabling the worldwide dissemination of useful primary data for decision-making and planning processes, even in those countries with limited access to the most current technological resources. The only remaining limitations are the need for a good internet connection and the limited availability of data with suitable resolution in the catalogue; improvement in this direction is desirable for both researchers and practitioners.