Using UAV Imagery to Detect and Map Woody Species Encroachment in a Subalpine Grassland: Advantages and Limits

: Woody species encroachment on grassland ecosystems is occurring worldwide with both negative and positive consequences for biodiversity conservation and ecosystem services. Remote sensing and image analysis represent useful tools for the monitoring of this process. In this paper, we aimed at evaluating quantitatively the potential of using high-resolution UAV imagery to monitor the encroachment process during its early development and at comparing the performance of manual and semi-automatic classiﬁcation methods. The RGB images of an abandoned subalpine grassland on the Western Italian Alps were acquired by drone and then classiﬁed through manual photo-interpretation, with both pixel- and object-based semi-automatic models, using machine-learning algorithms. The classiﬁcation techniques were applied at different resolution levels and tested for their accuracy against reference data including measurements of tree dimensions collected in the ﬁeld. Results showed that the most accurate method was the photo-interpretation ( ≈ 99%), followed by the pixel-based approach ( ≈ 86%) that was faster than the manual technique and more accurate than the object-based one ( ≈ 78%). The dimensional threshold for juvenile tree detection was lower for the photo-interpretation but comparable to the pixel-based one. Therefore, for the encroachment mapping at its early stages, the pixel-based approach proved to be a promising and pragmatic choice.


Introduction
Land-use and climate change are the main processes predicted to have major effects on biodiversity and the functioning of terrestrial ecosystems [1,2]. In montane and subalpine grasslands, below the treeline, land abandonment is the main driver of shrub and tree colonization [3]. In Europe, semi-natural grasslands, mainly used for foraging purposes, have been prevalent on south-facing slopes and managed to prevent tree establishment [4][5][6]. In the last few decades, socio-economic transformations in the Alps have led, in some areas, to the decline of livestock farming and agriculture and consequently to land abandonment [7,8]. This process opened the way to woody shrub and tree encroachment, which, in turn, has the potential to modify the ecosystem structure and function, with both negative and positive consequences for ecosystem services [9][10][11] and biodiversity [12,13]. Generally, the woody encroachment of grasslands causes an alteration of microclimatic conditions and a reduction of herbaceous cover, which contribute to the consequent decrease of species richness in the long-term, e.g., [12][13][14]. Nevertheless, this observation depends on the encroachment stage and the relative woody cover. In fact, many studies observed a peak in plant species diversity at intermediate levels of woody cover, especially during the so-called In order to evaluate quantitatively the potential and the advantages of using highresolution UAV data to monitor the encroachment process during the early stages of its development in a subalpine grassland, in this paper, our aim was to answer the following questions: (1) What is the accuracy of manual photo-interpretation during the early phases of the encroachment? (2) Can pixel-and/or object-based semi-automatic classification methods efficiently replace manual photo-interpretation? (3) What is the size of the smallest tree colonizing the grassland that can be detected by different classification methods? (4) Is it possible to detect a short-term (i.e., less than 10 years) temporal pattern of the encroachment through manual and semi-automatic classification?

Study Area and Vegetation Characteristics
The study was carried out in a subalpine grassland in the north-western Italian Alps. The site is an abandoned pasture that was extensively grazed until 2007 and is located in the Aosta Valley region, at a short distance from the village of Torgnon, at an elevation of 2160 m asl ( Figure 1). The terrain is characterized by an average slope of 4 • , which is south facing (195 • ), and the soil is classified as Cambisol (FAO/ISRIC/ISS). Furthermore, the site is characterized by an intra-alpine semi-continental climate, with a mean annual temperature of 3.1 • C and a mean annual precipitation of 880 mm. On average, from the end of October to late May, the site is covered by a thick snow cover (90 to 120 cm) which limits the growing period to about 5 months during the year [61].
Remote Sens. 2021, 13, x FOR PEER REVIEW 4 of 24 about the effects of climatic fluctuations on the encroachment rate, because of the lack of knowledge regarding the early stages of this ecological process in different climate and vegetation types. In order to evaluate quantitatively the potential and the advantages of using highresolution UAV data to monitor the encroachment process during the early stages of its development in a subalpine grassland, in this paper, our aim was to answer the following questions: (1) What is the accuracy of manual photo-interpretation during the early phases of the encroachment? (2) Can pixel-and/or object-based semi-automatic classification methods efficiently replace manual photo-interpretation? (3) What is the size of the smallest tree colonizing the grassland that can be detected by different classification methods? (4) Is it possible to detect a short-term (i.e., less than 10 years) temporal pattern of the encroachment through manual and semi-automatic classification?

Study Area and Vegetation Characteristics
The study was carried out in a subalpine grassland in the north-western Italian Alps. The site is an abandoned pasture that was extensively grazed until 2007 and is located in the Aosta Valley region, at a short distance from the village of Torgnon, at an elevation of 2160 m asl ( Figure 1). The terrain is characterized by an average slope of 4°, which is south facing (195°), and the soil is classified as Cambisol (FAO/ISRIC/ISS). Furthermore, the site is characterized by an intra-alpine semi-continental climate, with a mean annual temperature of 3.1 °C and a mean annual precipitation of 880 mm. On average, from the end of October to late May, the site is covered by a thick snow cover (90 to 120 cm) which limits the growing period to about 5 months during the year [61].  of Nardus stricta L., Festuca nigrescens All., Arnica montana L., Carex sempervirens Vill., Geum montanum L., Anthoxanthum alpinum L., Potentilla aurea L. and Trifolium alpinum L. This habitat occurs in almost all the EU member states, except for Estonia, Malta and Cyprus, and more than 40% of its area is located within the Alpine bioregion (Alps, Pyrenees and the Carpathian region), with 23% occurring in Italy [62]. As a consequence of land abandonment, in the last few decades, large areas of mountainous Nardus grasslands evolved into heath or shrub communities or were colonized by forests as secondary succession (i.e., woody plant encroachment) [62]. A simulation study conducted in a French subalpine grassland [27] predicted that abandoned grasslands could turn into woody communities within a few decades, with juveniles of Larix decidua Mill. appearing outside the initial population source after 20 years, and mature trees about 50 years after the land-use change.
Within the study area, the shrub species involved in the encroachment process are mainly Calluna vulgaris (L.) Hull, Juniperus communis L., Vaccinium myrtillus L. and V. uliginousm L. To a lesser extent, there are also individuals of Rhododendron ferrugineum L., Vaccinium vitis-idaea L. and Arcostaphylos uva-ursi (L.) Spreng. Regarding tree species, the area is being colonized only by Larix decidua dispersing from the surrounding forest. In this study, herbaceous and woody species were divided into four vegetation classes that were captured using high-resolution drone imagery: (1) larch trees, (2) brownish and (3) greenish shrubs, and (4) the area dominated by the herbaceous species characterizing the grassland community ( Figure 2). The study area is occupied by a species-rich Nardus grassland, a priority habitat (6230) listed in the Habitat Directive (92/43/CEE). The dominant vegetation of the area consists of Nardus stricta L., Festuca nigrescens All., Arnica montana L., Carex sempervirens Vill., Geum montanum L., Anthoxanthum alpinum L., Potentilla aurea L. and Trifolium alpinum L. This habitat occurs in almost all the EU member states, except for Estonia, Malta and Cyprus, and more than 40% of its area is located within the Alpine bioregion (Alps, Pyrenees and the Carpathian region), with 23% occurring in Italy [62]. As a consequence of land abandonment, in the last few decades, large areas of mountainous Nardus grasslands evolved into heath or shrub communities or were colonized by forests as secondary succession (i.e., woody plant encroachment) [62]. A simulation study conducted in a French subalpine grassland [27] predicted that abandoned grasslands could turn into woody communities within a few decades, with juveniles of Larix decidua Mill. appearing outside the initial population source after 20 years, and mature trees about 50 years after the land-use change.
Within the study area, the shrub species involved in the encroachment process are mainly Calluna vulgaris (L.) Hull, Juniperus communis L., Vaccinium myrtillus L. and V. uliginousm L. To a lesser extent, there are also individuals of Rhododendron ferrugineum L., Vaccinium vitis-idaea L. and Arcostaphylos uva-ursi (L.) Spreng. Regarding tree species, the area is being colonized only by Larix decidua dispersing from the surrounding forest. In this study, herbaceous and woody species were divided into four vegetation classes that were captured using high-resolution drone imagery: (1) larch trees, (2) brownish and (3) greenish shrubs, and (4) the area dominated by the herbaceous species characterizing the grassland community ( Figure 2).

Image Acquisition
RGB images were acquired in mid-October 2012 and 2018 because the senescent phase, which began in October and lasted for at least four weeks, was considered the best moment to achieve a good classification quality, according to the vegetation phenology. Indeed, at the onset of the growing season in May, the area was mainly grayish and brownish, except for evergreen species such as J. communis, whereas during most of the vegetation growth the whole area appeared green (Figure 3a). On the contrary, in October, during senescence, the highest rate of species differentiation occurred because larches became yellow, C. vulgaris and Vaccinium sp. were brownish, and J. communis remained a green and herbaceous species of the grassland, dominated by N. stricta, which appeared grayish (Figures 2 and 3b).

Image Acquisition
RGB images were acquired in mid-October 2012 and 2018 because the senescent phase, which began in October and lasted for at least four weeks, was considered the best moment to achieve a good classification quality, according to the vegetation phenology. Indeed, at the onset of the growing season in May, the area was mainly grayish and brownish, except for evergreen species such as J. communis, whereas during most of the vegetation growth the whole area appeared green (Figure 3a). On the contrary, in October, during senescence, the highest rate of species differentiation occurred because larches became yellow, C. vulgaris and Vaccinium sp. were brownish, and J. communis remained a green and herbaceous species of the grassland, dominated by N. stricta, which appeared grayish (Figures 2 and 3b).
Flight was carried out with a DJI™ Phantom 4 Pro equipped with a compact camera FC6310 (20 Mpixel; CMOS 1" sensor, FOV 84 • and focal length 8.8 mm/24 mm-35 mm format equivalent). The aspect ratio was kept at 3:2 (5472 × 3648 pixels). The images were captured on 4 October, between 1:30 and 1:45 PM. Flight planning was designed with 75% forward overlap and 70% side overlap at ground level. The photogrammetric block consisted of 12 strips (211 images), flown at about 30 m above ground level and at an average speed of 3 m/s, with a GSD of about 1.0 cm. Flight lines were oriented along the North-South direction. Each point in the whole area was covered by >9 images. For external orientation and bundle block adjustment, nine ground control points (GCPs) were Flight was carried out with a DJI™ Phantom 4 Pro equipped with a compact camera FC6310 (20 Mpixel; CMOS 1" sensor, FOV 84° and focal length 8.8 mm/24 mm-35 mm format equivalent). The aspect ratio was kept at 3:2 (5472 × 3648 pixels). The images were captured on 4 October, between 1:30 and 1:45 PM. Flight planning was designed with 75% forward overlap and 70% side overlap at ground level. The photogrammetric block consisted of 12 strips (211 images), flown at about 30 m above ground level and at an average speed of 3 m/s, with a GSD of about 1.0 cm. Flight lines were oriented along the North-South direction. Each point in the whole area was covered by >9 images. For external orientation and bundle block adjustment, nine ground control points (GCPs) were marked and measured by using Geomax Z35 Pro GNSS antennas in RTK base-rover mode; the base station was installed on a fixed and known point near the site.

Photogrammetric Data Processing
All the photogrammetric process was performed in PhotoScan Agisoft software (LLC Company, St. Petersburg, FL, USA), and high-resolution/high-accuracy outputs (DSM and RGB orthomosaic) were achieved. To investigate the image classification accuracy at different resolution levels, the original orthomosaic exported at high spatial resolution (1 cm; HR = high resolution) in 2018, was resized with nearest-neighbor interpolation from its original size (10,109 × 14,952) to medium (2022 × 2990) and low (505 × 748) scale, with a spatial resolution of 5 (MR = medium resolution) and 20 cm (LR = low resolution), respectively ( Figure 4). Regarding the 2012, only an LR original orthomosaic was available.

Ground Truth
To carry out a quantitative comparison between the "ground truth" and the data obtained through different classification methods, we collected data representing ground truth by performing a detailed vegetation survey in mid-October 2018, at the same time as image acquisition. We installed 47 transects 100-m long in the field. Transects were arranged every 3 m from the eastern to the western side of the study area that was located in a Cartesian coordinate system, where y-and x-axis were respectively orthogonal and

Photogrammetric Data Processing
All the photogrammetric process was performed in PhotoScan Agisoft software (LLC Company, St. Petersburg, FL, USA), and high-resolution/high-accuracy outputs (DSM and RGB orthomosaic) were achieved. To investigate the image classification accuracy at different resolution levels, the original orthomosaic exported at high spatial resolution (1 cm; HR = high resolution) in 2018, was resized with nearest-neighbor interpolation from its original size (10,109 × 14,952) to medium (2022 × 2990) and low (505 × 748) scale, with a spatial resolution of 5 (MR = medium resolution) and 20 cm (LR = low resolution), respectively ( Figure 4). Regarding the 2012, only an LR original orthomosaic was available. Flight was carried out with a DJI™ Phantom 4 Pro equipped with a compact camera FC6310 (20 Mpixel; CMOS 1" sensor, FOV 84° and focal length 8.8 mm/24 mm-35 mm format equivalent). The aspect ratio was kept at 3:2 (5472 × 3648 pixels). The images were captured on 4 October, between 1:30 and 1:45 PM. Flight planning was designed with 75% forward overlap and 70% side overlap at ground level. The photogrammetric block consisted of 12 strips (211 images), flown at about 30 m above ground level and at an average speed of 3 m/s, with a GSD of about 1.0 cm. Flight lines were oriented along the North-South direction. Each point in the whole area was covered by >9 images. For external orientation and bundle block adjustment, nine ground control points (GCPs) were marked and measured by using Geomax Z35 Pro GNSS antennas in RTK base-rover mode; the base station was installed on a fixed and known point near the site.

Photogrammetric Data Processing
All the photogrammetric process was performed in PhotoScan Agisoft software (LLC Company, St. Petersburg, FL, USA), and high-resolution/high-accuracy outputs (DSM and RGB orthomosaic) were achieved. To investigate the image classification accuracy at different resolution levels, the original orthomosaic exported at high spatial resolution (1 cm; HR = high resolution) in 2018, was resized with nearest-neighbor interpolation from its original size (10,109 × 14,952) to medium (2022 × 2990) and low (505 × 748) scale, with a spatial resolution of 5 (MR = medium resolution) and 20 cm (LR = low resolution), respectively ( Figure 4). Regarding the 2012, only an LR original orthomosaic was available.

Ground Truth
To carry out a quantitative comparison between the "ground truth" and the data obtained through different classification methods, we collected data representing ground truth by performing a detailed vegetation survey in mid-October 2018, at the same time as image acquisition. We installed 47 transects 100-m long in the field. Transects were arranged every 3 m from the eastern to the western side of the study area that was located in a Cartesian coordinate system, where y-and x-axis were respectively orthogonal and

Ground Truth
To carry out a quantitative comparison between the "ground truth" and the data obtained through different classification methods, we collected data representing ground truth by performing a detailed vegetation survey in mid-October 2018, at the same time as image acquisition. We installed 47 transects 100-m long in the field. Transects were arranged every 3 m from the eastern to the western side of the study area that was located in a Cartesian coordinate system, where yand x-axis were respectively orthogonal and parallel to the transect direction ( Figure 5a). Moving from the axis origin along the xdirection, we recorded the species and length of all the shrub individuals growing below each transect. At the same time, we geo-located all the larches growing on the right side of each transect, using Geomax Z35 Pro GNSS antennas in RTK base-rover mode, and for each individual, we measured trunk diameter at ground level and maximum height and mean canopy width. Therefore, at the end of the survey, we obtained a list including all the larches growing within the study area with their dimensions and all the shrubs intersecting transects with information about their species and length. This data was used, firstly, to estimate a dimensional threshold for larch identification through different methods and, Remote Sens. 2021, 13, 1239 7 of 23 secondly, to compare the linear percentage cover of shrub species obtained using transects, with their area percentage cover obtained by photo-interpretation. direction, we recorded the species and length of all the shrub individuals growing below each transect. At the same time, we geo-located all the larches growing on the right side of each transect, using Geomax Z35 Pro GNSS antennas in RTK base-rover mode, and for each individual, we measured trunk diameter at ground level and maximum height and mean canopy width. Therefore, at the end of the survey, we obtained a list including all the larches growing within the study area with their dimensions and all the shrubs intersecting transects with information about their species and length. This data was used, firstly, to estimate a dimensional threshold for larch identification through different methods and, secondly, to compare the linear percentage cover of shrub species obtained using transects, with their area percentage cover obtained by photo-interpretation.

Reference Data: Sampling and Response Design
To obtain high-quality reference data for classification accuracy assessment, other types of surveys were carried out in the field at the same time as image acquisition. In these cases, sample locations were identified using a stratified random sampling. The study area was divided into two discrete strata: grassland and shrubland, respectively dominated by herbaceous and woody plant species. The sample size for reference data was set to n = 120, after calculation through the following formula proposed by [63], which returned the minimum number of reference samples that are requested to achieve a specific overall accuracy (Equation (1)): where N is the number of units in the map data, Ô is the standard error of the estimated overall accuracy that we would like to achieve, Wi is the mapped proportion of the area of class i, and Si is the standard deviation of stratum i. Because N is typically a very large number, the second term in the denominator of Equation (1) can be ignored. The standard error of the estimated overall accuracy was set to 0.01, according to the values for target standard error suggested by [64]. The user's accuracy was set to 0.98 for both shrubland

Reference Data: Sampling and Response Design
To obtain high-quality reference data for classification accuracy assessment, other types of surveys were carried out in the field at the same time as image acquisition. In these cases, sample locations were identified using a stratified random sampling. The study area was divided into two discrete strata: grassland and shrubland, respectively dominated by herbaceous and woody plant species. The sample size for reference data was set to n = 120, after calculation through the following formula proposed by [63], which returned the minimum number of reference samples that are requested to achieve a specific overall accuracy (Equation (1)): where N is the number of units in the map data, Ô is the standard error of the estimated overall accuracy that we would like to achieve, W i is the mapped proportion of the area of class i, and S i is the standard deviation of stratum i. Because N is typically a very large number, the second term in the denominator of Equation (1) can be ignored. The standard error of the estimated overall accuracy was set to 0.01, according to the values for target standard error suggested by [64]. The user's accuracy was set to 0.98 for both shrubland and grassland stable strata. Sample allocation was carried out taking into account the mapped proportion of each stratum and their heterogeneity, so that 40 and 80 samples were randomly located, respectively, within the grassland and shrubland stratum. A minimum distance of 10 m was maintained among all the samples. We used pixels as the spatial unit of the sampling data. The minimum mapping unit (MMU) of the reference data was set to 100 cm 2 . Regarding the collection of reference data, we labelled the 120 sample plots, assigning to each sample one of the four vegetation classes used in this study ( Figure 2) by means of observation in the field. Furthermore, we geo-located the sample plots using Geomax Z35 Pro GNSS antennas in RTK base-rover mode to obtain high-quality spatial locations of the reference data ( Figure 5b).

Image Classification
Image analysis was carried out using two different approaches: (i) visual photointerpretation by a user expert on the study area and (ii) semi-automatic classification.
Photo-interpretation (PI) was manually carried out in QGIS 3.10.1 (QGIS Development Team, 2019) [65] by an operator interpreting the images according to their knowledge of the vegetation in the area. The operator created a one-point vector layer for larches (i.e., one point for each larch location) and one polygon vector layer for larch crowns and shrubs ( Figure 6b). The number of classes was set to four, according to the number of classes detected by the PI: (1) larches, (2) brownish shrubs, mainly represented by C. vulgaris, V. uliginosum and V. myrtillus, (3) greenish shrubs, including the evergreen species J. communis and R. ferrugineum, and (4) grassland. In order to estimate the size threshold for larch identification, the operator matched the locations of the photo-interpreted larches with field data. Furthermore, the PI was carried out on LR and HR images, to assess the effect of pixel dimension on classification accuracy. Six RF models were trained (3 × 2), one for each resolution level and one for each of the two different types of image: (1) original and (2) smoothed. We obtained the second type of image by running an iterative edge-preserving image smoothing algorithm, which, for any given pixel of the input image, returned a filtered value corresponding to the average spectral signature of the neighborhood pixels, that are both close in space and spectral signatures. More precisely, neighborhood pixels are those pixels spatially closer than the spatial radius parameter (spatialr), that in this study was set to 5, 20 or 100, respectively for low-, medium-and high-resolution images, and with spectral signature having Euclidean distance to the input pixel lower than the range radius (ranger) that was set to 15. Regarding OBIA, an additional step to obtain image objects was carried out, producing a segmentation for both original and smoothed images. Segmentation was performed in the Monteverdi module of Orfeo ToolBox repository and consisted of three steps: (i) smoothing (LSMSSegmentation), with the same parameters described above, (ii) small region merging (LSMSSmallRegionMerging), with a threshold value of 1, 16 and 400 pixels, respectively, for low-, medium-and high-resolution images, and (iii) vectorization (LSMSVectorization) to obtain mean and standard deviation values of the three bands for each image object. The threshold values for the region merging were set up according to the image resolution, in order to create objects with a fixed minimum surface area of 400 Semi-automatic classification was performed in the QGIS 3.10.1 environment, using a machine learning approach and making use of the algorithms available in OTB 7.1.0 (Orfeo ToolBox) and SCP 6.4.6 (Semi-automatic Classification) plugins. We carried out pixel-based (PBC) and object-based (OBIA) classifications (Figure 6c,d) of HR, MR and LR images. To perform PBC and OBIA, we used Random Forest (RF) [66] classification models, based on several studies that highlighted the efficiency and power of RF approaches in the classification of remote sensing imagery, e.g., [67,68]. In general, RF classifiers were used with satisfactory results in several mapping applications for different vegetation types, such as bog communities [69], invasive plant species [70] and, especially, for heterogeneous categories such as shrublands [71] and grasslands [46], which are the most difficult to classify. Moreover, RF classifiers were compared to other classifiers, such as support vector machine (SVM) and AdaBoost, which in some cases showed similar or even a higher classification accuracy [72][73][74][75]; RF classifiers are more user-friendly, and the algorithms are faster to train and more stable. We set model parameters as follows: maximum depth of the tree = 5; minimum number of samples in each node = 10; cluster possible values of a categorical variable into K = 4; maximum number of trees in the forest = 500; sufficient accuracy (OOB error) = 0.01; training and validation ratio = 0.8. RF models, were trained with five training regions per class. To obtain random training regions, random points were generated across the whole area, avoiding the overlap with the 120 plots constituting the reference data (see Section 2.5), and were manually labelled, according to the four vegetation classes described above. Afterwards, the training regions (hereafter ROI) were manually drawn around five points per class, randomly selected among all the manually labelled points.
Six RF models were trained (3 × 2), one for each resolution level and one for each of the two different types of image: (1) original and (2) smoothed. We obtained the second type of image by running an iterative edge-preserving image smoothing algorithm, which, for any given pixel of the input image, returned a filtered value corresponding to the average spectral signature of the neighborhood pixels, that are both close in space and spectral signatures. More precisely, neighborhood pixels are those pixels spatially closer than the spatial radius parameter (spatialr), that in this study was set to 5, 20 or 100, respectively for low-, medium-and high-resolution images, and with spectral signature having Euclidean distance to the input pixel lower than the range radius (ranger) that was set to 15. Regarding OBIA, an additional step to obtain image objects was carried out, producing a segmentation for both original and smoothed images. Segmentation was performed in the Monteverdi module of Orfeo ToolBox repository and consisted of three steps: (i) smoothing (LSMSSegmentation), with the same parameters described above, (ii) small region merging (LSMSSmallRegionMerging), with a threshold value of 1, 16 and 400 pixels, respectively, for low-, medium-and high-resolution images, and (iii) vectorization (LSMSVectorization) to obtain mean and standard deviation values of the three bands for each image object. The threshold values for the region merging were set up according to the image resolution, in order to create objects with a fixed minimum surface area of 400 cm 2 , which corresponds to the pixel area of the image at the lowest resolution (i.e., 20 cm). All these steps were made starting from HR, MR and LR images, which were then classified by means of an RF model, trained with five ROIs per class, as mentioned before.
The accuracy assessment was carried out in two steps on both "raw" and "cleaned" classification maps. A first accuracy assessment was preceded immediately by the classification process performed through the PBC and OBIA. Afterwards, all the classification maps obtained through PBC were post-processed to "clean" the classification output and try to improve the classification accuracy, which was assessed again thanks to a second accuracy assessment. Post-processing included raster filtering of PBC maps with a threshold value of 1, 16 and 400 pixels respectively for LR, MR and HR images. This step of the analyses aimed at reducing the potential salt and pepper effect (i.e., noise consisting in a scattered occurring of black and white pixels due to transmission errors) or the presence of noise due to small shadows in the raw classification maps, and at quantifying the effect of the filtering algorithm on the classification accuracy. The filtering step was not performed on OBIA classification maps where the same thresholds were already applied through the small-region-merging step. The whole workflow of image classification through the three methods described above was outlined in Figure 7.
Once we obtained the classification maps and performed the accuracy assessment, we compared, firstly the accuracy metrics of the three classification methods (i.e., PI, PBC and OBIA) and, secondly, the accuracy metrics obtained at different resolution levels. Moreover, to answer our question about the opportunity for using semi-automatic classification for the detection of encroachment advance over time, the three classification methods described above were also applied on the LR image of the study area acquired by drone in October 2012. The image was photo-interpreted by the same operator as before and then classified using the two most accurate models identified among PBC and OBIA models trained on LR images. Afterwards, the classification maps of 2012, obtained through the three different methods, were compared with the classification maps obtained through the same three methods applied on LR images of 2018 and a map of land cover change was generated for each classification method (see Figures S2-S7 for land cover change and classification maps).
Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of 24 filtering step was not performed on OBIA classification maps where the same thresholds were already applied through the small-region-merging step. The whole workflow of image classification through the three methods described above was outlined in Figure 7. Once we obtained the classification maps and performed the accuracy assessment, we compared, firstly the accuracy metrics of the three classification methods (i.e., PI, PBC and OBIA) and, secondly, the accuracy metrics obtained at different resolution levels.
Moreover, to answer our question about the opportunity for using semi-automatic classification for the detection of encroachment advance over time, the three classification methods described above were also applied on the LR image of the study area acquired by drone in October 2012. The image was photo-interpreted by the same operator as before and then classified using the two most accurate models identified among PBC and OBIA models trained on LR images. Afterwards, the classification maps of 2012, obtained through the three different methods, were compared with the classification maps obtained through the same three methods applied on LR images of 2018 and a map of land cover change was generated for each classification method (see Figures S2-S7 for land cover change and classification maps).

Analyses
The accuracy of each map classification, obtained through photo-interpretation and semi-automatic classification methods, was assessed with the algorithms available in SCP 6.4.6 (Semi-automatic Classification) plugin, which were used to derive three accuracy parameters from an area-based error matrix: (1) the Overall Accuracy (OA), indicating the proportion of the reference plots that was correctly mapped; (2) the Producer's Accuracy (PA), that is, the proportion of the area of reference class j that was mapped as class j; and (3) the User's Accuracy (UA), that is, the proportion of the area mapped as class i that had reference class i. Assuming that, in the error matrix, p ij represents the proportion of area that has map class i and reference class j, for an error matrix of q classes, the overall accuracy is whereas producer's accuracy of class j and user's accuracy of class i are respectively calculated with Equations (3) and (4): Finally, we performed the non-parametric Mann-Whitney's test (95% CI) to investigate the differences between the dimensions of the larches surveyed in the field and correctly classified or misclassified through the different methods. All the analyses, maps and graphs were produced in R 3.5.1 and QGIS 3.10.1. In particular, accuracy assessment was performed using the algorithms available in SCP 6.4.2 (Semi-automatic Classification Plugin).

Photo-Interpretation
The photo-interpretation (PI) of the low-resolution (LR) image, acquired in mid-October 2018, allowed us to detect and correctly classify 135 larches (CC-larches), corresponding to 25.4% of the 531 ground-surveyed larches (GS-larches), identified and measured in the field). The dimension of the CC-larches significantly differed from the GS-larches, in terms of trunk diameter, maximum height and mean canopy width. More precisely, the population of CC-larches showed higher trunk diameter (p < 0.001), maximum height (p < 0.001) and mean canopy width (p < 0.001) compared to the GS-larches (Table 1).  The undetected larches were sorted into two groups according to their position relative to the other larches: isolated and grouped larches. This classification allowed us to discriminate between larches that were not detected because of their small size and larches that were undetectable since they were part of a group, where distinguishing the individual canopy borders was not possible from the image. This latter type of larches was excluded from the analyses that we performed to define the PI resolution threshold. Among the 396 larches that were not detected through the photo-interpretation, 84.6% (n = 335) were isolated and not detectable due to their dimensions and were consequently misclassified (hereafter MC-larches). These MC-larches were used to define the PI resolution threshold. The MC-larches, as a subset of the GS-larches including the smallest individuals, were significantly smaller compared to all the GS-larches, in terms of trunk diameter (p < 0.001), maximum height (p < 0.001), and mean canopy width (p < 0.001) ( Table 1). The intersection between the density curves of CC-and MC-larches was used to define the threshold of the PI efficiency or, in other words, the size limit between detectable and undetectable larches. According to this approach, the threshold was set at 1.8 cm of trunk diameter, 45.7 cm of maximum height and 52.1 cm of mean canopy width (Figure 8a-c), meaning that the photo-interpretation of low-resolution (20 cm) images cannot guarantee the detection of larches smaller than this size. The undetected larches were sorted into two groups according to their position relative to the other larches: isolated and grouped larches. This classification allowed us to discriminate between larches that were not detected because of their small size and larches that were undetectable since they were part of a group, where distinguishing the individual canopy borders was not possible from the image. This latter type of larches was excluded from the analyses that we performed to define the PI resolution threshold. Among the 396 larches that were not detected through the photo-interpretation, 84.6% (n = 335) were isolated and not detectable due to their dimensions and were consequently misclassified (hereafter MC-larches). These MC-larches were used to define the PI resolution threshold. The MC-larches, as a subset of the GS-larches including the smallest individuals, were significantly smaller compared to all the GS-larches, in terms of trunk diameter (p < 0.001), maximum height (p < 0.001), and mean canopy width (p < 0.001) ( Table 1). The intersection between the density curves of CC-and MC-larches was used to define the threshold of the PI efficiency or, in other words, the size limit between detectable and undetectable larches. According to this approach, the threshold was set at 1.8 cm of trunk diameter, 45.7 cm of maximum height and 52.1 cm of mean canopy width (Figure 8a-c), meaning that the photo-interpretation of low-resolution (20 cm) images cannot guarantee the detection of larches smaller than this size.  Regarding the high-resolution (HR) image, 320 larches (60.3% of the GS-larches) were manually detected during the photo-interpretation. Similar to what we observed for the LR image, the dimension of the CC-larches differed significantly from the dimension of the GS-larches, in terms of trunk diameter, maximum height and mean canopy width. More precisely, the population of CC-larches showed higher trunk diameter (p < 0.001), maximum height (p < 0.001), and mean canopy width (p < 0.001) compared to the GS-larches (Table 1 and Figure S1a-c).
On the other hand, 39.7% (n = 229) of the GS-larches were not detected through PI. As seen above for the LR image, we selected the 134 isolated larches (58.5% of all the not-detected larches) that were not detected due to their small size and thus misclassified (MC-larches) and used them for further analyses. The MC-larches, as a subset of the GSlarches including the smallest individuals, were significantly smaller than the GS-larches, in terms of trunk diameter (p < 0.001), maximum height (p < 0.001) and mean canopy width (p < 0.001) ( Table 1 and Figure S1a-c). For the HR image, the PI threshold was set at 0.9 cm of diameter, 20.3 cm of maximum height and 26.5 cm of mean canopy width (Figure 8d-f), meaning that the PI of HR images cannot guarantee the detection of larches smaller than this size, that was much lower than the PI threshold of the LR image.
Regarding shrub species, since discriminating the species of the shrubs with similar color was nearly impossible without the support of a "ground truth", especially on the LR image, the shrubs that were included in the two classes distinguishable by color were: (i) brownish shrubs, including C. vulgaris + Vaccinium ssp., and (ii) greenish shrubs, including J. communis + R. ferrugineum, which correspond respectively to "Class 2" and "Class 3" of the classification categories.
The area of the polygons created around the detected shrubs was calculated to estimate the percentage cover of each class, which was then compared to the linear percentage cover obtained through the vegetation survey. We observed that the percentage cover of the two shrub classes within the study area was greatly similar to the linear percentage cover along transects. More specifically, brownish shrubs showed a linear percentage cover of 5.6% and a surface area of 5.4% or 8.1% obtained through the PI of, respectively, the LR and HR image, whereas greenish shrubs showed a linear percentage cover of 4.2% and 4.0% or 4.4% obtained through the PI of, respectively, the LR and HR image. Moreover, "Class 1" including larches and "Class 4" including areas dominated by herbaceous species showed a percentage cover of, respectively, 0.6% and 90.0% as resulted from the PI of the LR image and a percentage cover of, respectively, 0.7% and 86.8% as resulted from the PI of the HR image (Table S3).
Finally, the accuracy assessment of the PI against reference data showed an Overall Accuracy (OA) of 88.5% and 99.3%, respectively, for the PI of the LR and HR image. All the other accuracy metrics are reported in Table 2. Table 2. Results of the accuracy assessment of the image classification performed through photo-interpretation (PI); the accuracy metrics are reported for the PI of the high-resolution (1 cm; HR) and low-resolution (20 cm; LR) images and included the overall accuracy (OA), the producer's (PA) and user's accuracy (UA) of each classification class (1: larches; 2: brownish shrubs; 3: greenish shrubs; 4: grassland).

Semi-Automatic Classification Methods
The PBC method was more accurate than the OBIA approach for the classification of the images at all the resolution levels and for both the original and smoothed images. Moreover, the highest accuracy was observed for the PBC followed by the filtering of the classification maps. We found that the filtering step was able to increase the classification accuracy of all the PBC models by 3% to 15%. In particular the most accurate classification was the one performed through the PBC method followed by filtering on the HR smoothed image, which showed an overall accuracy of 86.3%. On the other hand, the highest accuracy reached by OBIA models was 77.7% (Figure 9 and Table S2). smoothed image, which showed an overall accuracy of 86.3%. On the other hand, the highest accuracy reached by OBIA models was 77.7% (Figure 9 and Table S2).
Generally, the effect of the image smoothing on the performance of the PBC models was positive for the classification of the HR image and negative for the LR image, respectively increasing and decreasing the overall accuracy of the classification of 2% to 5%. OBIA models did not show any positive effects of the smoothing process and showed higher values of overall accuracy when applied to the original images ( Figure 9 and Table S2). Regarding single classes, in general, "Class 1", including larches, showed the lowest values of the producer's accuracy (PA) and the highest values of the user's accuracy (UA), in both PBC and OBIA classification maps. As shown by the area-based error matrices of the three-best-performing PBC models (highlighted in grey in Table S2), the most common error was the misclassification of the larches ("Class 1") as grassland ("Class 4"). This observation was confirmed also for the OBIA classification maps that additionally showed a recurring misclassification between brownish ("Class 2") and greenish ("Class 3") shrubs.
According to Table S3, the percentage covers of each class obtained by the best performing PBC model were 0.4% for larches ("Class 1"); 12.8% and 1.5%, respectively, for brownish and greenish shrubs ("Class 2" and "Class 3") and 85.3% for the herbaceous cover of the grassland ("Class 3") (Table S2). Regarding larches, we found that only 28.8% of the GS-larches showed greater dimensions (Figure S1d-f) and were correctly classified by the PBC. We then compared the dimensional parameters of correctly classified and misclassified larches, and we extrapolated the classification threshold, as done before for photo-interpretation. The results of this analysis highlighted that PBC was not able to correctly classify larches smaller than 1.9 cm of diameter, 48.8 cm of maximum height and 55.5 cm of mean canopy width (Figure 10a-c). On the other hand, the percentage covers of each class, obtained by the best performing OBIA model according to Table S3, were 4.1% for larches ("Class 1"); 4.2% and 8.3%, respectively, for brownish and greenish shrubs ("Class 2" and "Class 3") and 83.4% for the herbaceous cover of the grassland ("Class 3"). Regarding larches, we found that only 33.0% of the GS-larches showed greater dimensions (Figure S1g-i) and were correctly classified by OBIA, and that this method was not able to correctly classify larches smaller than 1.9 cm of diameter, 50.0 cm of maximum height and 56.1 cm of mean canopy width (Figure 10d-f). Generally, the effect of the image smoothing on the performance of the PBC models was positive for the classification of the HR image and negative for the LR image, respectively increasing and decreasing the overall accuracy of the classification of 2% to 5%. OBIA models did not show any positive effects of the smoothing process and showed higher values of overall accuracy when applied to the original images ( Figure 9 and Table S2).
Regarding single classes, in general, "Class 1", including larches, showed the lowest values of the producer's accuracy (PA) and the highest values of the user's accuracy (UA), in both PBC and OBIA classification maps. As shown by the area-based error matrices of the three-best-performing PBC models (highlighted in grey in Table S2), the most common error was the misclassification of the larches ("Class 1") as grassland ("Class 4"). This observation was confirmed also for the OBIA classification maps that additionally showed a recurring misclassification between brownish ("Class 2") and greenish ("Class 3") shrubs.
According to Table S3, the percentage covers of each class obtained by the best performing PBC model were 0.4% for larches ("Class 1"); 12.8% and 1.5%, respectively, for brownish and greenish shrubs ("Class 2" and "Class 3") and 85.3% for the herbaceous cover of the grassland ("Class 3") (Table S2). Regarding larches, we found that only 28.8% of the GS-larches showed greater dimensions (Figure S1d-f) and were correctly classified by the PBC. We then compared the dimensional parameters of correctly classified and misclassified larches, and we extrapolated the classification threshold, as done before for photo-interpretation. The results of this analysis highlighted that PBC was not able to correctly classify larches smaller than 1.9 cm of diameter, 48.8 cm of maximum height and 55.5 cm of mean canopy width (Figure 10a-c). On the other hand, the percentage covers of each class, obtained by the best performing OBIA model according to Table S3, were 4.1% for larches ("Class 1"); 4.2% and 8.3%, respectively, for brownish and greenish shrubs ("Class 2" and "Class 3") and 83.4% for the herbaceous cover of the grassland ("Class 3"). Regarding larches, we found that only 33.0% of the GS-larches showed greater dimensions (Figure S1g-i) and were correctly classified by OBIA, and that this method was not able to correctly classify larches smaller than 1.9 cm of diameter, 50.0 cm of maximum height and 56.1 cm of mean canopy width (Figure 10d-f).

Land Cover Change
The map of land cover change obtained by the PI of the LR images acquired in 2012 and 2018 showed an increase of the shrubland cover on the grassland area of 6.5%, whereas 6.7% of the shrubland cover already observed in 2012 remained constant (Table 3). Considering the area that changed from shrubland to grassland cover, mainly due to the limits of the classification method, these results led to a general increase in the shrubland cover of 4.3%, from 8.9% in 2012 to 13.2% in 2018, while grassland cover decreased from 91.1% to 86.8% (Table 3). The most accurate pixel-based classification model was not able to detect this trend of change and, on the contrary, showed a substantially unvaried land cover, with 24.6% of shrubland cover in 2012 and 23.6% in 2018 (Table 3). Similar results were observed for the most accurate object-based classification model that even showed an increase of grassland cover, from 79% to 83.3%, and a consequent decrease of shrubland cover, from 21% to 16.6% (Table 3).

Land Cover Change
The map of land cover change obtained by the PI of the LR images acquired in 2012 and 2018 showed an increase of the shrubland cover on the grassland area of 6.5%, whereas 6.7% of the shrubland cover already observed in 2012 remained constant (Table  3). Considering the area that changed from shrubland to grassland cover, mainly due to the limits of the classification method, these results led to a general increase in the shrubland cover of 4.3%, from 8.9% in 2012 to 13.2% in 2018, while grassland cover decreased from 91.1% to 86.8% (Table 3). The most accurate pixel-based classification model was not able to detect this trend of change and, on the contrary, showed a substantially unvaried land cover, with 24.6% of shrubland cover in 2012 and 23.6% in 2018 (Table 3). Similar results were observed for the most accurate object-based classification model that even showed an increase of grassland cover, from 79% to 83.3%, and a consequent decrease of shrubland cover, from 21% to 16.6% (Table 3). Table 3. Land cover changes occurred in the study area from 2012 to 2018 as observed by the photo-interpretation (PI) of the LR images and by the most accurate pixel-based (PBC) and objectbased (OBIA) classifications. Land cover change was determined through the identification of four cover classes: constant grassland (G) and constant shrubland (S), representing the area that maintained the same classification label (i.e., grassland or shrubland) from 2012 to 2018, and two more classes, describing the changes from grassland cover to shrubland (G→S) and vice versa Figure 10. Density curve of the trunk diameter (a,d), maximum height (b,e) and mean canopy width (c,f) of the groundsurveyed larches (GS; solid black line), larches correctly classified (CC; solid orange line) and larches that were not detected and so misclassified (MC; solid grey line) through the semi-automatic classification methods. Density plots are reported for the most accurate pixel-based (a-c) and object-based (d-f) classification maps of the images acquired by drone in an abandoned subalpine grassland during mid-October 2018. The dashed red line marks the intersection between CC-and MC-larches density curves, whose x-value was set as the size threshold of the semi-automatic classification efficiency. Table 3. Land cover changes occurred in the study area from 2012 to 2018 as observed by the photointerpretation (PI) of the LR images and by the most accurate pixel-based (PBC) and object-based (OBIA) classifications. Land cover change was determined through the identification of four cover classes: constant grassland (G) and constant shrubland (S), representing the area that maintained the same classification label (i.e., grassland or shrubland) from 2012 to 2018, and two more classes, describing the changes from grassland cover to shrubland (G→S) and vice versa (S→G).

Discussion
This study described the application of UAV-acquired imagery to the monitoring of the early stages of shrub and tree colonization (i.e., woody encroachment) of an abandoned subalpine grassland that was previously grazed by cattle, keeping the area "clean" from woody plant species. In particular, we achieved the goal of describing a straightforward workflow for an accurate, cost-effective and readily accessible image classification approach based, with manual or semi-automatic techniques, which were applied at a very fine scale (<10 cm). The results showed that RGB images acquired at the resolution of 1 to 20 cm are adequate for mapping vegetation, at least at the life-form level (i.e., tree, shrub and herbaceous species), without the support of any other information concerning spectral resolution or object texture. Indeed, we reached good automated classification performance, with the most accurate model showing an overall accuracy of~86%.
Among the three classification methods evaluated in this study, i.e., manual photointerpretation, pixel-and object-based semi-automatic classifications, the former showed the best performance at both high (1 cm) and low (20 cm) resolution, with an overall accuracy of 99.3% and 88.5%, respectively. Thus, the accuracy level of the photo-interpretation was high and increased by more than 10%, moving from a resolution of 20 cm to 1 cm. On the contrary, we observed that the overall accuracy of the raw (i.e., not filtered) pixel-based classifications generally increased with decreasing image resolution, from a minimum of 69.2% (high resolution) to a maximum of 82.5% (low resolution). Despite the high-quality classification, this observation confirmed that pixel-based classification methods may run into several constraints when they are applied to very high spatial resolution imagery, for the occurrence of noise due to a salt-and-pepper effect or to the presence of small shadows, e.g., [52], as we observed in our study. Generally, our pixel-based classification models misclassified shadows projected by larches and other smaller shadows within the herbaceous cover as brownish or greenish shrubs ("Class 2" and "Class 3"). This misclassification led to an overestimation of the corresponding classes and generally of the shrub cover, as found also by Poznanovic et al. [76], who observed an overestimation of the western juniper cover in their study area, when performing pixel-based classification. The effects of the misclassification in the high-resolution image were limited by the application of a simple filtering algorithm that increased the accuracy of the high-resolution classification map from 71.3% to 86.3%. The overall accuracy values achieved in our study (i.e., >80%) were consistent with recent studies focused on pixel-based classification of satellite images to determine the vegetation cover of herbaceous-dominated areas under woody encroachment [77,78]. Moreover, to the best of our knowledge, only one recent study used very-high-resolution UAV imagery (2.4 cm) and pixel-based classification methods to investigate shrub encroachment directly, in a watershed dominated by perennial grasses [79]. Similar to our study, Durfee et al. [79] used data collected in the field to assess the overall accuracy of their classification, which was 76.6% for the RGB imagery. These results are not only in accordance with our study but helped us to demonstrate that high resolution UAV imagery can be a successful tool for reaching high levels of accuracy, even with simple RGB images.
According to the literature, some of the limitations of pixel-based classification methods can be overcome using an object-based approach, whose minimum processing unit is an object (or segment) instead of a pixel. Several studies demonstrated that object-based classification models can successfully classify very high spatial resolution imagery [52,80], but the choice between pixel-or object-based approaches does not follow rigid rules and depends on the aims of the study and on several characteristics of the available data and the study area, such as vegetation type and phenology, land cover heterogeneity and imagery features [52,[81][82][83]. For instance, in their study on grassland colonization by Juniperus virginiana in Nebraska, Filippelli et al. [84] showed that the object-based approach had the lowest error when the cover of the target vegetation was less than 10% and maintained a low number of errors, up to cover levels of 30%. On the other hand, alternative pixel-based methods, which are less dependent on the object-background contrast, performed better in areas with a high vegetation cover (>60%) [76]. According to these observations, the object-based approach was less suitable than the pixel-based approach for the classification of our study area, characterized by a target vegetation cover of nearly 100%. Indeed, the highest overall accuracy reached by this method was 77.7% in our study. This outcome was consistent with the results of the study performed by Ma et al. [82]. In their meta-analysis, they showed that land-cover type and heterogeneity had a significant effect on object-based classification accuracy, with vegetation and forest showing lower accuracy than other categories, such as agriculture and wetland study areas, and with more homogeneous areas showing higher accuracy than heterogeneous ones. Similar to what we observed for pixel-based classifications, the overall accuracy of object-based classifications decreased with increasing image resolution. Moreover, we found that the images that underwent a smoothing step were classified with higher accuracy than the original images by the pixel-based classification models, whereas the opposite trend was observed for the objectbased classification models. All these observations can be explained considering that image smoothing and resizing likely helped the pixel-based models in reducing error-rates generated by salt-and-pepper noise and small shadows. On the other hand, image smoothing flattened the spectral differences among the pixels and consequently hampered the image segmentation and the object creation.
A notable aspect of our study is that the testing of different image classification methods was not exclusively focused on the categorical aspects of the classification, like it is in most of the available literature about remote sensing applications for encroachment monitoring [40,41,85]. Instead, we used the field measurements of the larch dimensions (i.e., trunk diameter, maximum height and mean crown width) to evaluate a quantitative threshold of tree detection. In other words, by knowing the dimension of all the larches occurring within the grassland, we were able to determine not only the percentage of detected larches but also the smallest larch size that could be detected by each method. Among the three-dimensional variables that we measured in the field, the mean crown width was undoubtedly the most important parameter, having a crucial role in larch detection. Similar to what we observed in the classification accuracy, the photo-interpretation of the high-resolution image showed the best performance, being able to detect very small larches to a threshold of 0.9 cm of diameter, 20.3 cm of maximum height and 26.5 cm of mean canopy width. The minimum dimensions of detectable larches sharply increased for the photo-interpretation of the low-resolution image, which showed a threshold value of about twice as high as the one reported above, of 1.8 cm of diameter, 45.7 cm of maximum height and 52.1 cm of mean canopy width. This implies that, by moving from a resolution of 1 cm to a resolution of 20 cm, the photo-interpretation efficiency in tree detection was at least halved. In general, the cut off size of tree detection in our study was lower than the size of all the studies we were able to find in literature, e.g., [76,81,83,86], where the canopy area of non-detected shrubs and/or trees ranged from 1 to 1.6 m 2 . Interestingly, we found that the threshold values of the most accurate semi-automatic models were very close to the values observed for the photo-interpretation of the low-resolution image, which means that the performance of the semi-automatic classification models in larch detection was comparable to the classification carried out by the operator on the low-resolution image. In fact, the photo-interpretation of the low-resolution image was able to detect 25.4% of the field-surveyed larches, whereas the best performing pixel-and object-based classification models detected, respectively, 28.8% and 33.0% of all the larches occurring in the study area.
However, although the percentage of the larches detected by the semi-automatic classification was similar or even higher than the percentage of the larches detected by the photo-interpretation, the former method underestimated the tree cover by 0.4% when the pixel-based classification was applied and overestimated by 4.1% the same class in the case of the object-based classification when compared to the photo-interpretation (0.7%). A similar trend was observed for the shrub species, whose percentage cover was overestimated by 5.3% and 3.5%, respectively, by the pixel-and the object-based classification. Considering the percentage cover of all the woody plant species (i.e., larch + brownish shrubs + greenish shrubs) and taking the 9.7% of cover detected by the high-resolution photo-interpretation as a reference value, we observed that the overestimation increased from 13.2% of the low-resolution photo-interpretation to 14.7% and 16.6% of the pixel-and the object-based classification, respectively. This overestimation of the larch and shrub cover was the most likely reason why both the semi-automatic classification methods were not able to detect the land cover change occurring in the last six years, from 2012 to 2018. This can be explained by the fact that, even if the overall accuracy of the semi-automatic classifications was quite high, the omission error of the models was still higher than the percentage of the land cover change in terms of tree and shrub cover percentage. Furthermore, the differences between 2012 and 2018 in image acquisition and features likely increased the error in the estimation of the land cover change. Unfortunately, a high-resolution image of 2012 was not available and neither was a simultaneous vegetation survey, so that we could not properly investigate the accuracy of the photo-interpretation in the detection of the land cover change. For these reasons, we recommend further studies on the efficiency of different image analysis methods in the detection of the early stages of land cover change, due to encroachment. Nevertheless, according to the overall accuracy of the photo-interpretation, we suggest using this method rather than semi-automatic classification when investigating short-term (<10 years) or slow-rate changes of the land cover (1-10%) or generally when the expected land cover change to be smaller than the omission error of the semi-automatic classification models to be applied. Finally, according to the results of photo-interpretation, the encroachment rate in the study area from 2012 to 2018 was~1% per year, in terms of tree and shrub cover expansion. This result is consistent with other studies which found that the increase of woody species cover due to the encroachment process ranged from 0.9% to 2.3% per year [10,84,87].
Therefore, we demonstrated that RGB images acquired by drone at a resolution of 1 to 20 cm can be readily and accurately (overall accuracy > 80%) classified through a pixel-based approach, in order to investigate the vegetation cover of a subalpine grassland under recent encroachment. This method outperformed the object-based classification under the circumstances of our study, requiring a very fine scale classification. Although it was not an objective of this study, we hypothesized that the accuracy of the object-based classification could be improved by setting a smaller spatial radius for segmentation, in other words, by creating smaller objects. However, in that case, the time requested by the image analysis would significantly increase and, thus, further analyses would be needed to understand if it might be worth it in terms of overall accuracy. Moreover, the pixel-based classification was shown to have several advantages compared to the field observations and the photo-interpretation: It is faster and less laborious; it can be easily applied to wider areas and removes the set of problems linked to the subjectivity of manual classification. Nevertheless, we observed that, for a low percentage of land cover changes, the semiautomatic classification methods were not as efficient as photo-interpretation, at least when only RGB bands are considered. Therefore, in light of the above results, we suggest using a semi-automatic classification of RGB images when the main aim of the study is to describe the vegetation cover of several areas or long-term land cover changes. For instance, Malatesta et al. [87] successfully detected 27 years (1988-2015) of land cover change of a pastoral landscape in the central Apennines in Italy through a supervised object-based classification of satellite imagery. On the other hand, we suggest preferring more accurate methods, such as visual photo-interpretation, when the objective is to describe short-term land cover changes on a small scale.
Regardless of the classification method, the opportunity to substitute fieldwork with high-resolution image analysis in the study of vegetation dynamics has important advantages for the monitoring of land cover change under encroachment and, consequently, for management decisions and biodiversity conservation assessment [88]. Indeed, scientific knowledge may lead management decisions regarding the optimal land-use practice to control woody encroachment in terms of grazing or fire features, such as timing, intensity and frequency. Moreover, monitoring vegetation at high-resolution may help to identify not only the encroachment stage at which biodiversity is maximized but also the areas needing an urgent management intervention due to their critical conditions. The method developed in this study has the potential to help the environmental diagnostic and drive management decisions, thanks to its high flexibility, spatial and temporal resolution, objectivity and reproducibility. In fact, the UAV imagery analysis approach allows detailed information to be collected across a wide geographical area and with a high temporal resolution. For instance, Suess et al. [41] obtained a reliable annual time series of shrub cover changes, from high-resolution satellite imagery available for 32 years, but the resolution level they worked with would not be enough to investigate small shrub and tree individuals at the species level. The access to annual data on a small scale is essential for the study of the encroachment consequences on the structure and functioning of ecosystems. Indeed, monitoring the encroachment process frequently and on a small scale facilitates the detection of its onset and the investigation of the colonization pattern of the woody species in its early stages, which are still quite unknown. At the same time, the annual structural changes of the plant community may be correlated with the available annual data on the functioning of the ecosystems. Several instruments and methods for the monitoring of terrestrial ecosystem functioning became very common in the last few decades (e.g., eddy covariance towers, litterbag approach for decomposition, respiration chambers, NDVI sensors and phenocams). The integration of the functional data provided by these methods with the information on the structural changes may help to explain and predict the encroachment effects on the ecosystem processes and services. Furthermore, another advantage of having annual data from several areas is the opportunity to investigate the relationship between structural changes in the plant community and environmental factors, such as fire [78], land-use change [83] and climate change [85,89]. There is still a lack of knowledge on the effects of climate on the encroachment rate, and having a time series of structural and climatic data may help to explain this relationship and, consequently, to disentangle the effects of climate and land cover change on ecosystem functioning and services.
In this study, the acquisition schedule was set in order to successfully exploit the unique phenological characteristics of the vegetation growing in the study area. In fact, the RGB images were acquired in mid-October because the senescent phase immediately before snow cover establishment was characterized by the highest species diversification: larches became yellow, C. vulgaris and Vaccinium sp. were brownish, J. communis remained green, and the herbaceous species of the grassland, dominated by N. stricta, appeared grayish. Therefore, the accuracy level that we reached was strictly dependent on the timing of the image acquisition, confirming that, regardless of spectral resolution or classification method, phenology is a key issue for vegetation classification, especially when using RGB images, as already suggested by other studies, e.g., [47,52,79].
Finally, all the software used in the study is free, which represents a plus point for their employment in nature conservation or land management. In fact, image analysis is often performed using paid software, such as ArcGIS [78,79], eCognition [81,83] and MATLAB [77], whereas we carried out all the analyses using the free software R and QGIS. Regarding sensors, multispectral or hyperspectral UAV sensors would probably improve the resulting classification accuracy, but they would make the methodology much more expensive and complex [90]. Furthermore, thanks to its plasticity, the method developed in this study might be applied to map and monitor not only subalpine grasslands under encroachment but also several other worldwide areas characterized by different processes of vegetation dynamics, such as invasion of alien species, bush encroachment in the savannahs, algal blooms in aquatic ecosystems, or the upward treeline shift.

Conclusions
This study showed that it is possible to efficiently replace, or at least significantly reduce, fieldwork and visual photo-interpretation of images by combining high-resolution remote sensing offered by drones with a simple and accurate pixel-based classification workflow, in order to map the vegetation of a subalpine grassland under recent encroachment, at least at the level of life-form (i.e., tree, shrub and herbaceous species). We developed a straightforward workflow that can be carried out through cost-effective methods and free software, without requiring professional skills in programming and image analysis.
Moreover, we were able not only to compare the accuracy of three different classification methods (i.e., photo-interpretation, pixel-and object-based semi-automatic classification) but also to describe their quantitative limits in juvenile tree detection, by analyzing the dimensions of the detected and undetected trees thanks to a very detailed dataset collected in the field. In light of our results, pixel-based classification of the high-resolution image can be highly recommended for the study of the encroachment process, especially in long-term studies or in the case of several and widely distributed areas. However, we suggest avoiding semi-automatic classification methods for the investigation of short-term land cover change, at least when the omission error is higher than the percentage of land cover change, in which case we recommend using visual photo-interpretation. To conclude, encroachment is a widely distributed ecological process, with important consequences on ecosystem structure and functioning, and to our knowledge, this study was the first attempt to describe its early effects on vegetation structure at a very fine scale and exclusively using the RGB bands.