Mapping Land Cover Types for Highland Andean Ecosystems in Peru Using Google Earth Engine

: Highland Andean ecosystems sustain high levels of ﬂoral and faunal biodiversity in areas with diverse topography and provide varied ecosystem services, including the supply of water to cities and downstream agricultural valleys. Google (™) has developed a product speciﬁcally designed for mapping purposes (Earth Engine), which enables users to harness the computing power of a cloud-based solution in near-real time for land cover change mapping and monitoring. We explore the feasibility of using this platform for mapping land cover types in topographically complex terrain with highly mixed vegetation types (Nor Yauyos Cochas Landscape Reserve located in the central Andes of Peru) using classiﬁcation machine learning (ML) algorithms in combination with different sets of remote sensing data. The algorithms were trained using 3601 sampling pixels of (a) normalized spectral bands between the visible and near infrared spectrum of the Landsat 8 OLI sensor for the 2018 period, (b) spectral indices of vegetation, soil, water, snow, burned areas and bare ground and (c) topographic-derived indices (elevation, slope and aspect). Six ML algorithms were tested, including CART, random forest, gradient tree boosting, minimum distance, naïve Bayes and support vector machine. The results reveal that ML algorithms produce accurate classiﬁcations when spectral bands are used in conjunction with topographic indices, resulting in better discrimination among classes with similar spectral signatures such as pajonal (tussock grass-dominated cover) and short grasses or rocky groups, and moraines, agricultural and forested areas. The model with the highest explanatory power was obtained from the combination of spectral bands and topographic indices using the random forest algorithm (Kappa = 0.81). Our study presents a ﬁrst approach of its kind in topographically complex Cordilleran terrain and we show that GEE is particularly useful in large-scale land cover mapping and monitoring in mountainous ecosystems subject to rapid changes and conversions, with replicability and scalability to other areas with similar characteristics.


Introduction
The interaction between conservation and the use of natural resources in and around protected areas characterized by multiple conflicting land uses arises from multiple interacting causes and requires continuous monitoring of land cover associated with land use changes and conversions [1]. In order to understand and quantify the resulting landscape dynamics and patterns, it is necessary to have large-scale and repeat information on the integrity of ecosystems for the implementation and evaluation of management policies of natural resources [2]. In addition, land use and land cover (LULC) mapping of mountain ecosystems is very important for researchers and governments, given that these are global biodiversity hotspots that support the livelihoods of billions of people worldwide, and the sustainable management of these ecosystems is listed on the United Nations 2030 Agenda for sustainable development under the Sustainable Development Goal (SDG) 15.4 [3]. However, given the topographic, abiotic and biotic complexity and heterogeneity of mountain ecosystems globally, it remains challenging to quantify the dynamic nature of LULC [4], which leads to structural limitations in achieving SDG 15.4 and creating sustainable protection strategies [5]. As such, in-depth assessments of mountain ecosystem integrity and stability at regional scales continue to provide value towards achieving a global framework that can be applied towards monitoring progress towards SDG 15.4. High-elevation landscapes in Peru and across the Andean Cordillera present complex topography, altitudinal and environmental gradients that contain a variety of ecosystems and vegetation communities; more broadly characterized by multiple conflicting land uses (protected areas, grazing, agriculture, mining), with implications for water provision and the maintenance of other ecosystem services needed for agriculture and water supply for both local and coastal populations [6]. Across these high-value, sensitive ecosystems, it is becoming ever more important that stakeholders and decision-makers have the ability to monitor LULCC in near-real time and take advantage of the computational advances in geospatial technologies [7].
Optical remote sensing data represent a feasible and economical option for the continuous monitoring of land cover changes, conversions and transitions [8,9] given its ability to collect data remotely, at consistent temporal scales. When working with mid-resolution satellite imagery, most of the pixels contain a mixture of the reflected radiance of different land cover types [10], and it is very difficult to find fully homogeneous land cover pixels in semi-arid regions, especially for rare and degraded vegetation types growing in areas that are hard to access [11]. In addition, a diverse topography reduces the accuracy of land cover classification in complex terrain because it produces variations in surface illumination between shaded areas and those receiving direct sunlight, as result the reflectance values of land cover vary greatly within classes [12]. One approach to overcome these issues is to use advanced classification methods based on machine learning (ML) that can process high-dimensional data to avoid overfitting. ML has been used to map vegetation in mountains because they have good potential for accurately classifying natural land cover types [13,14]. Supervised classification is one of the most essential and classical machine learning techniques in remote sensing with the goal of optimizing the performance criteria of prediction using data samples and learning rules from a training set, that then get generalized to the total data set even on high-dimensional, complex data [15].
Furthermore, the increasing accessibility of spatial imagery and the development of cloud-based spatial analysis computational frameworks such as GEE enable the development and implementation of spatial analysis, such as land cover classification even in contexts where processing such large volumes of data had been previously prohibitive [16]. GEE's programming interface allows users to define, create and run custom algorithms based on their analysis needs [17]. GEE is a platform for geospatial and multitemporal analysis on a planetary scale, which by incorporating the computational capabilities of Google, represents a free access tool for the analysis of a wide variety of socio-environmental problems with great usability especially in less-developed country contexts [18]. GEE's catalog includes the complete archive of Landsat satellite imagery, Sentinel-1, 2 and 3 imagery, Moderate Resolution Imaging Spectrometer (MODIS), National Oceanographic and Atmospheric Administration Advanced Very High-Resolution Radiometer (NOAA AVHRR) imagery, as well as climate, topographic and land cover data, among others [18,19]. The existence of these freely available resources is especially critical for researchers in developing countries where the computational infrastructure used to be and oftentimes continues to be a limiting factor. GEE has been shown to provide reliable results when investigating ecological restoration policies [2], the conservation status of mountain ecosystems [5], high-alpine rangeland ecosystem change trajectories [20], plant functional types [13] and for mapping vegetation and land use types [21].
Since 1972, Landsat satellite missions have continuously provided a historical record of imagery that forms the basis for extracting land use and land cover data at global, regional and local scales [22]. The characteristics of Landsat images such as their spatial resolution of 30 m and their temporal resolution of 16 days [22], as well as their vast historical record have established the Landsat archive as a leader in mapping and monitoring of land cover and the ease of retrieval of this deep imagery data set via GEE is now revolutionizing how we approach mapping and monitoring LULC distribution [21,23].
The current study aimed to design and implement a repeatable and transferable GEE approach for mapping land cover classes in the Nor Yauyos Cochas Landscape Reserve (RPNYC acronym in Spanish), using remote sensing data derived from Landsat 8 OLI imagery and topographic-derived indices from ALOS PALSAR DEM. In this context, we demonstrate a first attempt at using remote sensing technologies that rely on optical sensors for mapping vegetation cover in Andean highland ecosystems employing cloud computing implemented in the Google Earth Engine (GEE) and machine learning (ML).

Study Area
RPNYC is located in the central Andes of Peru, in the regions of Lima and Junín ( Figure 1). The RPNYC has an area of 221,268.48 ha and is surrounded by a buffer zone of 109,503.20 ha, a significant area to monitor in the absence of remotely sensed data. In total, 62.1% of the surface of the reserve is located in the Lima region, Yauyos Province, while 37.9% is located in the Junín region [24]. oftentimes continues to be a limiting factor. GEE has been shown to provide reliable results when investigating ecological restoration policies [2], the conservation status of mountain ecosystems [5], high-alpine rangeland ecosystem change trajectories [20], plant functional types [13] and for mapping vegetation and land use types [21]. Since 1972, Landsat satellite missions have continuously provided a historical record of imagery that forms the basis for extracting land use and land cover data at global, regional and local scales [22]. The characteristics of Landsat images such as their spatial resolution of 30 m and their temporal resolution of 16 days [22], as well as their vast historical record have established the Landsat archive as a leader in mapping and monitoring of land cover and the ease of retrieval of this deep imagery data set via GEE is now revolutionizing how we approach mapping and monitoring LULC distribution [21,23].
The current study aimed to design and implement a repeatable and transferable GEE approach for mapping land cover classes in the Nor Yauyos Cochas Landscape Reserve (RPNYC acronym in Spanish), using remote sensing data derived from Landsat 8 OLI imagery and topographic-derived indices from ALOS PALSAR DEM. In this context, we demonstrate a first attempt at using remote sensing technologies that rely on optical sensors for mapping vegetation cover in Andean highland ecosystems employing cloud computing implemented in the Google Earth Engine (GEE) and machine learning (ML).

Study Area
RPNYC is located in the central Andes of Peru, in the regions of Lima and Junín (Figure 1). The RPNYC has an area of 221,268.48 ha and is surrounded by a buffer zone of 109,503.20 ha, a significant area to monitor in the absence of remotely sensed data. In total, 62.1% of the surface of the reserve is located in the Lima region, Yauyos Province, while 37.9% is located in the Junín region [24].  RPNYC was established by the National Natural Protected Areas Department (SER-NANP) in 2001 with the dual aim of preserving its ecological and cultural value, and promoting a harmonious and economically sustainable relationship with the rural communities surrounding it [25]. The reserve supports great biodiversity of flora and fauna, including environmentally sensitive grasslands, wetlands and high Andean forests, used by the local population for various agricultural activities [25], under regulation and supervision by SERNANP [26]. Water supply to the cities and agricultural valleys downstream represents one of the key ecosystem services supplied by RPNYC. Approximately 11 million people, including residents of the capital Lima, depend on the water originating in this reserve [26].
The RPNYC has an altitudinal gradient that ranges from 2630 to 5660 m above sea level (m.a.s.l). The physiography is dominated by landscapes of high mountains whose slopes present a very rugged topography, strongly divided by channels and deep valley bottoms. RPNYC covers six distinct biomes, with moisture regimes ranging from arid to rainforest, home to ten types of vegetation covers and a total of 330 plant species, such as trees, shrubs, grasses, succulents and epiphytes. The fauna consists of 75 species of birds, 15 species of mammals, 4 reptile species, 1 amphibian and 2 native fish species [24].
Extensive breeding and grazing of domestic livestock such as sheep, camelids, cattle and horses constitutes the main agricultural land use in the reserve. In the surrounding buffer communal lands, the sustainable management of vicuñas (Vicugna vicugna) is carried out, a wild species of South American camelid, of important cultural value for local populations, cataloged by the International Union for Conservation of Nature (IUCN) as Least Concern. Subsistence agricultural activities are carried out in the valley bottoms and gentler mountain slopes by means of terraced crops, similar to densely populated urban centers downstream [24].
The climate varies with altitude, generating multiple microclimates, with periods of rain between November and March (minimum of 2.05 mm / day in November and maximum of 4.48 mm/day in February), a transition period from April to October (1.68 mm/day in April and 1.90 mm/day in October) and a dry season between May and August (0.46 mm/day in August and 0.23 mm/day in July). The average temperature ranges from 3 to 7.5 • C, with lowest temperatures between May and August, and frost events between July and August [27].
Data from vegetation censuses and in-country assessments identify five native and endemic vegetation types of high biodiversity value: Bofedales, tall grass (tussock grassdominated cover), Puna mat grass, bushes and relict forests [24,28]. High-altitude areas lacking vegetation cover were classified as: bare soil, moraines, rocky areas, lagoons and glaciers and finally, areas of agricultural and urban land use were divided into reforested, agricultural and urban areas.

Methodological Framework
The methodological framework employed in this study is presented in Figure 2, and described in the following six methods subsections: Landsat image preprocessing (Section 2.3), training and validation sample collection (Section 2.4), classification feature input (Section 2.5), separability analysis (Section 2.6), classifiers (Section 2.7), accuracy assessment (Section 2.8) and calculation of land cover extents (Section 2.9).

Landsat Image Preprocessing
Due to the number of spectral bands, Landsat 8 Operational Land imager (OLI) surface reflectance images were used as the prime source of data for the study. A search for all USGS Surface Reflectance (SR) images from the Landsat 8 OLI/TIRS sensor covering RPNYC reserve (located within Worldwide Reference System 2, path 6, row 68, path 6, row 69, path 7, row 68, path 7, row 69) from January to June 2018 according to the field data collection dates was performed within the GEE's data pool. Pre-processing of Landsat 8 OLI/TIRS images consisted of application of quality bands to mask cloud and cloud shadows contamination, using a CFmask function [29] and verified with a final visual test. The composite imagery used in the ML classifications was assembled from 31 scenes (see Supplemental Materials, Table S1), through a median reducer based on GEE tools, to use the more frequent pixel value in the stack.

Landsat Image Preprocessing
Due to the number of spectral bands, Landsat 8 Operational Land imager (OLI) surface reflectance images were used as the prime source of data for the study. A search for all USGS Surface Reflectance (SR) images from the Landsat 8 OLI/TIRS sensor covering RPNYC reserve (located within Worldwide Reference System 2, path 6, row 68, path 6, row 69, path 7, row 68, path 7, row 69) from January to June 2018 according to the field data collection dates was performed within the GEE's data pool. Pre-processing of Landsat 8 OLI/TIRS images consisted of application of quality bands to mask cloud and cloud shadows contamination, using a CFmask function [29] and verified with a final visual test. The composite imagery used in the ML classifications was assembled from 31 scenes (see Supplemental Materials, Table S1), through a median reducer based on GEE tools, to use the more frequent pixel value in the stack.

Training and Validation Sample Collection
Ground reference data on vegetation composition within accessible regions of the RPNYC were collected during and after the rainy season 2018 (Jan-June). Relatively homogeneous areas of 90 × 90 m 2 were chosen as survey plots, on the ground, based on accessibility and visual inspection of patches connected by ranger trails to ensure sampling within the fourteen land cover classes shown in Figure 1.
At every plot, the dominant vegetation land cover class (Table 1) was determined based on dominant species cover through a rapid assessment process developed by the Rangeland Ecology ad Utilization Laboratory at the National Agrarian University La Mo-Spatial subset Landsat

Training and Validation Sample Collection
Ground reference data on vegetation composition within accessible regions of the RPNYC were collected during and after the rainy season 2018 (Jan-June). Relatively homogeneous areas of 90 × 90 m 2 were chosen as survey plots, on the ground, based on accessibility and visual inspection of patches connected by ranger trails to ensure sampling within the fourteen land cover classes shown in Figure 1.
At every plot, the dominant vegetation land cover class (Table 1) was determined based on dominant species cover through a rapid assessment process developed by the Rangeland Ecology ad Utilization Laboratory at the National Agrarian University La Molina, Peru (LEUP-UNALM acronym in Spanish). All the survey plot locations were recorded with a global navigation satellite system receiver (Garmin 62S) and subsequently checked for accuracy.
In order to complete training data sets for land cover classes, we used other reference sources such as the 2015 national vegetation map [30] (rocky areas, glaciers and bare soil classes), Google Earth imagery (pan-sharpened QuickBird, GeoEye and WorldView-2 imagery) and visual interpretation.
Training samples were generated from a total of 3601 pixels available (75% from field data, and 25% from Google Earth imagery and the 2015 national vegetation map) and the samples were split into a validation data set, defined by a stratified random sampling method for each cover class, to at least 30% of the total points. This procedure ensured independence between the training and validation data [31]. Table 1. Vegetation land cover classes identified in RPNYC [24,25].

Land Cover Description
Wetlands Areas with excess of water from high rates of orographic precipitation, dominated by cushion plants, distributed in low-slope areas and above 3800 m.a.s.l.

Tall grass
Areas composed chiefly of perennial grasses such as Festuca, Poa, Stipa and Calamagrostis species, distributed along the slopes and tops of hilly and mountains landscapes, between an altitude range of 3900 to 4900 m.a.s.l.

Short grass
Areas dominated by dwarf herbaceous forbs and cushion plants growing in areas of moderate water content. Erect shrubs, mosses and lichens are of minor importance. Spatial distribution similar to tall grass land cover.

Shrublands
Communities conformed by species of Baccharis and Parastrephia, distributed between an altitude range of 2750 to 3800 m.a.s.l.

Andean forest
Areas generally relegated to rocky slopes or ravines dominated by shrubs of the genus Polylepis associated with the genera: Buddleja, Clethra, Gynoxys, Podocarpus or Prumnopitys, distributed between a wide altitude range of 600 to 4100 m.a.s.l.

Classification Feature Input
Due to the extreme elevation range and steep slopes and its interaction with the vegetation cover in the RPNYC reserve, a reflectance normalization process was applied to Landsat 8 images by dividing each reflectance band by the reflectance sum of all bands [21,32], using band variables from Band 2 to Band 7.
Eight spectral indices were derived from the reflectance normalized Landsat spectral bands as shown in Table 2. Table 2. Extracted spectral indices from Landsat 8 imagery.

Bands
Wavelength (µm) NDVI is one of the most widely used vegetation indices thanks to its ability to discriminate vegetation/non-vegetation covers [33], while EVI can be useful in areas of high leaf area index (LAI) where NDVI may saturate [34]. MSAVI minimizes the effect of bare soil in the computation of the Soil-Adjusted Vegetation Index (SAVI) [35], whereas NWDI is useful for discriminating water from land due to its sensitivity to open water. [36] NDSI is related to the presence of snow in a pixel while ignoring cloud cover [37], NBR is designed to highlight burned areas and estimate fire severity [38], while NDBI highlights urban areas where there is typically a higher reflectance in the shortwave-infrared part of the spectrum [39]. Finally, SVVI is used to discriminate forest, secondary forest, savanna and agriculture classes that would otherwise be difficult to differentiate spectrally [40].
Given the importance of topographic variables in classifying vegetation cover classes, we utilized the ALOS PALSAR DEM data (30 m) [41] to derive slope (degrees, ranging from 0 to 90 • ), aspect (0 to 360 • ), transformed by taking the trigonometric sine values of aspect to avoid circular data values [42]. Sine values of aspect represent the degree of east-facing slopes, as the values range from 1 (i.e., east-facing) to -1 (west-facing).
From the Landsat data and ALOS PALSAR topographic derivatives, we defined six sets of stacked imagery for predictor inputs, in the following order: spectral bands; spectral indices; spectral bands + spectral indices; spectral bands + topographic-derived indices; spectral indexes + topographic-derived indices; and spectral bands + spectral indices + topographic-derived indices that were then used to implement classifiers on.

Separability Analysis
The spectral separability analysis was performed for all classification data set inputs, and the range of values generated by this method is 0.0 to 2.0, where a value of 0.0 means that the pair of classes have a perfect resemblance and the value of 2.0 indicates that the class pairs are completely different. Ideally, the inter-class separability value in the classification scheme used is more than 1.9. It is calculated using the followings formulas [43].
where x is first spectral signature vector; y is second spectral signature vector; Σx is covariance matrix of sample x; and Σy is covariance matrix of sample y. The Jeffries-Matusita distance is asymptotic to 2 when signatures are completely different, and tends to 0 when signatures are identical.

Classifiers
A variety of machine learning image classification methods have been used to map vegetation type on large mountain areas, such as, support vector machines (SVM) [44], decision tree (i.e., CART), naïve Bayes and random forest (RF) classifiers [21,45].
SVM is used for classification to determine a hyperplane that can divide training data into a predetermined number of categories, maximizing the distance from the data points of classes to an optimal separation vector of a hyperplane created from the variables [46].
Decision tree classifiers use variables subsequently to find split (partitioning) points and try to split subsets further with other variables, etc. Groups of pixels can be further divided based on tree growing and pruning parameters, until optimal classification is achieved [14].
Random Forest [47] classifiers construct multiple decision trees that are sampled independently during training, typically improving classification by vote results compared to a single decision tree model.
To implement classification, we used GEE's platform-a web-based IDE-through JavaScript Code Editor [18], which offers a variety of geospatial analysis tools and visualization interface. In these environments, we use six supervised machine learning classification algorithms available [48] shown in Table 3, and applied iteratively to all six data set stacks as indicated in Section 2.4.
For random forest and gradient tree boosting, we defined the number of generated decision trees of 100 leaving the other parameters by default; and for the other classifiers we used the default configuration. A total of 36 models were built between the combination of classifiers and input data.  [53] Note: Examples in bold represent machine learning algorithms used in GEE.

Accuracy Assessment
A confusion matrix and the accuracy assessment of each resulting classification output for the six computed algorithms on the six sets were computed using the 'caret' [54] and 'diffeR' [55] packages in R. We calculated the overall accuracy (OA), Kappa coefficient (K), producer accuracy (PA) and user accuracy (UA), complemented with a quantity disagreement (Q) and allocation disagreement (A) assessment [56]. We used these metrics combined to assess the quality of the resulting land cover maps. The OA determines the overall efficiency of an algorithm and is measured by dividing the total number of correctly classified samples by the total number of testing samples.
The K indicates the degree of agreement between the ground truth data and the predicted values, while the PA measures how well a pixel has been classified and includes the error of omission (the proportion of observed features on the ground that are erroneously excluded from a class). The UA measures the reliability of the map, informing how well the map represents what is on the ground and it includes the error of commission which refers to pixels erroneously included in a class [57].
Q is the amount of difference between the reference data and a comparison map that is due to the less than perfect match in the proportions of the categories, and A is the amount of difference between the reference data and a comparison map that is due to the less than optimal match in the spatial allocation of the categories, given the proportions of the categories in the reference and comparison maps (e.g., classifying built-up in a location where built-up development is not observed) [56].
Finally, we used variable importance which considers that every time a split of a node is made on a variable, the impurity criterion for the two descendent nodes is less than the parent node and adding up the decreases for each individual variable over all trees in the forest gives a fast variable importance that is often very consistent with the permutation importance measure. That provided us an additional means of assessing how each predictor variable enabled accuracy improvements in the optimized land cover classification model, in terms of a normalized percentage contribution.

Calculation of Land Cover Areas/Extents
During the final step, we created a mask of areas that were excluded from the analysis based on CFmask filter [29], which glacier pixels were misclassified as clouds, reclassifying them as glacier pixels. This process was performed using the "raster" package [58] in R 3.6 software. Then, we tabulated pixels by land cover class and estimated the cover area adjusted to the total area for the core and buffer zone of the RPNYC. The area results were compared with a published map from the environment minister in 2015 expressed in ha for all the RPNYC.

Separability Analysis
Considering the different input data sets, there are a small improvement on JM distance when topographic indices are added for all the land cover classes (Figure 3), showing the better separability when all the variables are included.
Considering the different input data sets, there are a small improvement on JM distance when topographic indices are added for all the land cover classes (Figure 3), showing the better separability when all the variables are included.
Due to the landscape heterogeneity, we consider that the JM distance has acceptable levels for the separability of the training areas, the next step is to conduct the classification process. Overall, class separability is adequate and would provide a fairly accurate classification.  Due to the landscape heterogeneity, we consider that the JM distance has acceptable levels for the separability of the training areas, the next step is to conduct the classification process. Overall, class separability is adequate and would provide a fairly accurate classification. Table 4 provides the accuracy estimates of the iterations between the ML classifiers available in GEE and the six sets of stacked imagery for predictor inputs of the study area. The results showed that for all the sets of iterations, the random forest classifier yielded the better performance (0.81), followed by gradient tree boosting; additionally, the accuracy increases dramatically for all the models when topographic-derived indices are added as predictors. There are no marked observed differences between predictors based on spectral bands and spectral indices separately. The lowest performance were the naïve Bayes and minimum distance classifiers. Table 5 depicts the confusion matrix obtained from the analysis of the validation data set for the best model identified (RF: spectral bands + topographic indices). Overall, the classification between vegetation land cover classes and non-vegetation is similar. In particular, the comparison between wetlands, short grasses and tall grasses presents more confusion errors. In contrast, the classification accuracies of water bodies and built-up areas are higher (both user's and producer's accuracies) than the other classes.

Accuracy Assessment Results
The integration of topographic-derived indices resulted in improvements in the Kappa coefficient for all models (0.08-0.61) and was the most important variable in predicting land cover types (elevation and slope), as shown in Figure 4, for the best three classification models (random forest).

Classification Result
Based on the previous results, we selected the best classification map and the spatial distribution of each land cover class is shown in Figure 5. We can recognize the spatial distribution of different land covers related to elevation, slope and aspect gradients, agriculture areas are distributed in the north and south side of the RPNYC in relative lowland areas, grazing lands dominates the high elevation around glaciers and water bodies in the basins head. Pixels classified as shadows were reclassified as NoData and not displayed on the final map. reference data. Off-diagonal elements in rows and columns, respectively, capture errors of com sion and omission. UA: user's accuracy; PA: producer's accuracy; OA: overall accuracy; K: K coefficient; Q: Quantity disagreement; A: Allocation disagreement. WT-Wetlands; TG-Tall SG-Short grass; SR-Shrublands; AF-Andean forest; BS-Bare soil; MR-Moraine; RC-R WB-Water bodies; GL-Glacier; SH-Shadows, FA-Forested areas; CR-Croplands; BUup.
The integration of topographic-derived indices resulted in improvements i Kappa coefficient for all models (0.08-0.61) and was the most important variable in dicting land cover types (elevation and slope), as shown in Figure 4, for the best classification models (random forest).

Classification Result
Based on the previous results, we selected the best classification map and the sp distribution of each land cover class is shown in Figure 5. We can recognize the sp distribution of different land covers related to elevation, slope and aspect gradients, culture areas are distributed in the north and south side of the RPNYC in relative low areas, grazing lands dominates the high elevation around glaciers and water bodies basins head. Pixels classified as shadows were reclassified as NoData and not disp on the final map. Regarding the distribution of the land covers for the two best maps for 2018 (Table 6), the reserve is dominated by a combination of tall grasses and short grasses along the core and buffer zone (56.26% and 55.06%, respectively), followed by a combination of no vegetation covers such as bare soil, moraine and rocky (28.88% and 31.76 %, respectively). Wetlands, glaciers and water bodies extend more in the buffer zone than in the core area, in contrast to Andean forests and shrublands which are more characteristic in the core zone (6.38%) than in the buffer zone (2.80%). With respect to forested and agricultural areas (0.93%), both are concentrated around built-up areas and along rivers into the core and buffer zones and represent 0.14% of the total surface of RPNYC.  Regarding the distribution of the land covers for the two best maps for 2018 (Table  6), the reserve is dominated by a combination of tall grasses and short grasses along the core and buffer zone (56.26% and 55.06%, respectively), followed by a combination of no vegetation covers such as bare soil, moraine and rocky (28.88% and 31.76 %, respectively). Wetlands, glaciers and water bodies extend more in the buffer zone than in the core area, in contrast to Andean forests and shrublands which are more characteristic in the core zone (6.38%) than in the buffer zone (2.80%). With respect to forested and agricultural areas (0.93%), both are concentrated around built-up areas and along rivers into the core and buffer zones and represent 0.14% of the total surface of RPNYC.

Discussion
The use of cloud computing platforms for geospatial big data analysis, such as Google Earth Engine, provide multitemporal data sets of satellite imagery and preprocessed spatial models and allow us to build specific workflows, in order to obtain detailed maps in short timeframes relative to traditional computing approaches, such as [59] who used >650,000 Landsat scenes and >1million central processing unit (CPU) hours to produce annual maps of tree cover gain and loss, performing these computations in just several days.
Our results show that is feasible to use the open-access GEE platform to implement land cover mapping in complex Andean ecosystems, based on composite multitemporal mosaics and field data information using machine learning classifiers, without having to purchase or download data and software [2,5,13,20,21].
These characteristics confer significant advantages that reduce the learning gap to process spatial data for technicians and resource managers for the task of spatially monitoring protected areas in an effective way. Another advantage is the feasibility to share the GEE code and apply them to another areas, with minimum modifications, considering a growing community of user's support, and discussion forums, with the only critical requirement being a stable internet connection.
For the more general purpose of updating land cover maps using visual interpretation of satellite imagery or semi-automatic updating through GIS analysis, it is feasible if the study area is not extensive [31]; however, natural protected areas are generally large areas. Then, GEE is particularly useful for these purposes.
In this study, we show that the inclusion of topographic indices improved the discrimination of the various cover classes by minimizing the terrain shadow effects and greatly decreasing the misclassification between water bodies and shadows. Thus, the overall classification accuracy of the land cover spatial models was improved when topographic indices were combined with spectral bands or spectral indices separately. However, no improvement was observed in the classification accuracy when spectral bands and indices are combined together, except the effect of topographic indices that can improve classification accuracy in general [2,60].
The best classifiers tested in this study were able to consistently minimize the effects of clouds and the derived no data pixels, and the accuracy metrics for the select model outperformed coincides with other studies that used machine learning classifiers, especially RF in distinguishing several vegetation types for heterogeneous and large regions such as RPNYC [21,61]. RF can cope with high-dimensional problems very easily thanks to the pruning strategy and is very beneficial by alleviating the often-reported overfitting problem of simple decision trees [62].
As shown in Table 5, confusion errors occurred among all classes, except water bodies. Notably, the highest confusion was found between tall and short grasses. This is particularly severe since both have very similar ecological and visual characteristics and are dominated by ecologically similar vegetation types of grasses [63]. In the field, we noted that the wetland class was found adjacent to tall grass and short grass areas, forming transition zones, which adds errors to the classification. Confusion was also common among moraines and glaciers, which is attributable to the proximity of both classes and the loss of glacial coverage [64].
Another consideration when interpreting the classification accuracies is the availability of the training samples for the supervised classification. As shown in Table 5, for example, wetlands have a larger number of training polygons compared to the forested areas class or croplands. This is because the RPNYC, since its foundation in 2001, has not allowed the expansion of the agricultural frontier and reforestation with exotic species [28]. On the other hand, the shadow class is usually found in physically smaller and inaccessible area relative to other classes, for example, in canyons.
Given that persistent cloud cover is an issue for large, high-altitude areas, we show that the issue can effectively be minimized using the multi-temporal composite and cloud masking approach such as CFmask function [29].
We find differences between the land cover selected map and an official land cover map [30], which was built based on 2011 Landsat 4-5 TM imagery. The principal differences are related to the information classes and proportion of areas, for example tall and short grasses are considered as one class, and the area of these two classes in our selected maps represents 75% of the official map. Similarly, we detected an area five times larger for wetlands. However, we found some similarities with respect to the water bodies and glacier extents.
The main limitations of this work mostly pertain to uncertainties between the Landsatderived land cover maps and the reference data. The reference data samples that were utilized during the classifier training and testing phases were limited in quantity due to the remote accessibility. Another major challenge we encountered was the mismatch in time between the available cloud-free Landsat images and the reference data. The MINAM reference map was produced three years earlier than the study period (2018) and used 657 verification points and 415 control points for the entire country, so the comparison with the maps generated from our work is tenuous. The field survey and the high spatial resolution reference images retrieved from Google Earth (2017) are minimally a year apart from the study period. This created difficulty in analyzing the classification products in conjunction with the available reference data set.

Conclusions
Our results demonstrate that the use of cloud-based computing and machine learning feasibly allows us to build an open-access, accurate and relatively gap-free approach for land cover mapping in a heterogeneous and large Andean ecosystem using freely available Landsat imagery. This approach can be utilized to advance natural resource planning and management in less-developed country contexts and is also useful in informing longterm monitoring of vegetation in remote protected areas by establishing key land cover classes for any reference year. Furthermore, establishing the extent and distribution of land cover classes along with their spectral and topographic characteristics may allow us to extrapolate these classifications to non-protected areas in similar ecosystems. Our work highlights the importance of the inclusion of topographic-derived indices, in the differentiation of complex land covers distributed in diverse and highly heterogeneous topography to improve classification accuracies unlike the inclusion of spectral indices. Finally, we established an approach for generating a reliable land cover map in a protected reserve most representative of Andean ecosystems, thus establishing a baseline for future change analyses and environmental planning. This work has implications for achieving SDG 15.4 focused on monitoring and assessing changes in mountain ecosystems and for the long-term monitoring and assessment of environmental changes and conservation efforts in mountains.