Evaluation of Different Machine Learning Algorithms for Scalable Classification of Tree Types and Tree Species Based on Sentinel-2 Data

Abstract: We use freely available Sentinel-2 data and forest inventory data to evaluate the potential of different machine-learning approaches to classify tree species in two forest regions in Bavaria, Germany. Atmospheric correction was applied to the level 1C data, resulting in true surface reflectance or bottom of atmosphere (BOA) output. We developed a semiautomatic workflow for the classification of coniferous (mainly spruce), beech and oak trees by evaluating different classification algorithms (object- and pixel-based) in an architecture optimized for distributed processing. A hierarchical approach was used to evaluate different band combinations and algorithms (Support Vector Machines (SVM) and Random Forest (RF)) for the separation of broad-leaved vs. coniferous trees. The Ebersberger forest was the main project region and the Freisinger forest was used in a transferability study. Accuracy assessment and training of the algorithms were based on inventory data; validation was conducted using an independent dataset. A confusion matrix, with User's and Producer's Accuracies as well as Overall Accuracies, was created for all analyses. In total, we tested 16 different classification setups for coniferous vs. broad-leaved trees, achieving the best performance of 97% for an object-based multitemporal SVM approach using only band 8 from three scenes (May, August and September). For the separation of beech and oak trees we evaluated 54 different setups; the best result achieved an accuracy of 91% for an object-based, multitemporal SVM approach using bands 8, 2 and 3 of the May scene for segmentation and all principal components of the August scene for classification. The transferability of the model was tested for the Freisinger forest and showed similar results. This project shows that Sentinel-2 had only marginally worse results than comparable commercial high-resolution satellite sensors and is well-suited for forest analysis on a tree-stand level.


Introduction
Analysis and classification of tree species and tree species groups has a long history in the field of remote sensing. Recently, climate discussions have become more prominent and forests are among the ecosystems most affected by climate change [1]. Therefore, an understanding of forest dynamics and quantitative methods to assess climatic impacts on species distributions are of great relevance. Further interest in analyzing forest structures is driven by users of environmental monitoring, spatial planning enforcement or ecosystem-oriented natural resources management systems [2,3]. There is a wide range of studies [4][5][6][7][8][9] focusing on the use of high or very high-resolution images (e.g., WorldView-2, RapidEye satellite imagery), but these datasets are often expensive and not freely available. In recent years, deca-metric-resolution imagery (e.g., Landsat) has become more easily accessible and often cost-free for a broad majority of users, leading to many research projects. With the launch of the Sentinel-2 series in 2015 (Sentinel-2A) [10][11][12], a new mission of free and open satellite data with land monitoring as its main objective, new possibilities for research came into existence. The 180° phased twin-satellite constellation was completed with the launch of Sentinel-2B in March 2017. In the context of forest analysis, the Sentinel-2 mission is very important due to its 10 m spatial resolution bands in the visible and near-infrared region (VNIR) as well as four bands (5, 6, 7, 8a) of 20 m resolution in the red-edge region of the electromagnetic spectrum and two bands (11, 12) of 20 m resolution in the shortwave infrared (SWIR) [13]. The red-edge region is especially interesting, as it is well known for vegetation analysis and Sentinel-2 offers more bands in this spectral range than comparable satellite missions like the Landsat series. Fassnacht et al. [14] highlight that the visible to shortwave infrared Sentinel-2 wavelength regions mainly coincide with absorption features of plant pigments and water, making it an ideal sensor for the analysis of vegetation characteristics. Thus, Sentinel-2 can set new standards for vegetation analysis with deca-metric-resolution imagery for areas that do not have a high complexity at a small scale. As the twin constellation of Sentinel-2 is phased at 180° in the same orbit, a high temporal revisit frequency of 5 days facilitates change detection analysis. It also has the advantage that many different scenes are available for one area of interest, which improves the chance of obtaining cloud-free data.
So far, only a few studies have used Sentinel-2 data for forest analysis. Immitzer et al. [15] used Sentinel-2 data for their research at two forest sites as well as cropland in Bavaria (Germany). They tested object-based image analysis (OBIA) and pixel-based (PB) methods and concluded that, for their study area, OBIA achieved slightly better overall accuracies (66.2% vs. 63.5%) for the forest classification of seven tree groups. They mentioned the red edge part of the spectrum and the shortwave infrared region as highly important for their study. Ng et al. [16] evaluated vegetation indices for the classification of vegetation using Sentinel-2 and Pléiades data. They mapped Prosopis (negative impacts on biodiversity) and Vachellia (positive biodiversity effects as a natural resource) trees, as Prosopis has strongly invaded Kenya in recent years. In their classification, they tested the Random Forest (RF) classification algorithm and concluded that the blue, green and near-infrared bands of Sentinel-2 were important for classification purposes. Compared to the commercial Pléiades data with a 2 m spatial resolution of the multispectral bands, Sentinel-2 data showed similar results in their study. First positive results with Sentinel-2 data were also reported by Hawrylo et al. [17] in a study about forest defoliation. They tested RF and Support Vector Machine (SVM) algorithms to investigate the defoliation of Scots pines in Poland and concluded that Sentinel-2 data were well-suited for the purpose. Immitzer et al. [18] also focused on the comparison of deca-metric- and deci-centi-metric-resolution data by comparing Landsat data with WorldView-2 data. The authors achieved good classification results for both satellite data types, but stated that the time of the image collection and the acquisition parameters had significant impacts on the results.
Further examples of studies using Sentinel-2 for vegetation monitoring and/or mapping include those by Addabbo et al. [19], Puletti et al. [20] and Sothe et al. [21]. Puletti et al. [20] focused on Mediterranean environments with good results for forest group classification, while Sothe et al. [21] compared Sentinel-2 and Landsat-8 data for classifying successional forest stages in Brazil.
In our study we build on these previous studies with a focus on machine learning, but also on providing a workflow that is as simple as possible and can be scaled for processing large datasets. This latter focus arose from the interest of state agencies and industry in workflows that can be applied to other, larger datasets. We focused on OBIA versus PB methods using RF and SVM classifiers implemented as so-called raster functions (tools that are optimized for distributed and dynamic processing) in the ArcGIS Platform technology.

Materials and Methods
The overall approach of this study is shown in Figure 1. All aspects of the workflow will be described in the following sections.

Study Area
We selected two forests in Germany where forest inventory data and orthophotos were available for training and validation purposes. Figure 2 shows the project areas, the Ebersberger and the Freisinger Forest. Both regions exhibit almost complete coverage of trees with only few glades, as well as patterns of forest management with rather homogeneous patches of spruce, oak and beech trees as dominant species. The occurrence of similar tree types within patches makes them well suited for analysis with deca-metric-resolution imagery like Sentinel-2. The characteristics of the test sites are described in detail by Immitzer et al. [15]. According to the Bavarian State Forest Enterprise [22], the annual temperature for the Ebersberger Forest is around 7.6 °C and the average annual precipitation is 850-950 mm, with around 500 mm between April and September. It is one of Germany's largest connected forests with an extent of 76.3 km². The Freisinger Forest covers nearly 9 km² and was used for a transferability study. The Ebersberger Forest is one of 10 state forest areas that are controlled by the forest enterprise Wasserburg. In 2015, the tillering ratio consisted of two-thirds coniferous trees and one-third broad-leaved tree types. Overall, spruce has a dominance of 56%, followed by beech with 11%. Pines reached 9%, whereas fir, larch and Douglas fir form a total of 1-2%. Oak trees account for 4% of all tree species in these forest areas.


Figure 2.
Map showing the two project regions (Ebersberger Forest and Freisinger Forest). The true color image is from the Sentinel-2 scene in September after atmospheric correction. The areas used to generate the training and testing points during the classification process are also shown. Note that not all areas used in the first step delineating coniferous vs. broad-leaved trees could be used on a species basis in the next step; hence the larger symbols for "Deciduous Forest" beneath the single tree species.

Data
In order to evaluate the potential of the Sentinel-2 data in an optimal scenario, several scenes were selected according to the criterion of minimum cloud cover for several stages of the vegetation period. Furthermore, data were selected for a year that was close to the date of collection of the inventory data. Specifications of the datasets are given in Table 1. Data were downloaded from the Copernicus Open Access Hub. Three different dates over the year were chosen for multitemporal analysis. The product level of all files is L1C, i.e., calibration of data to top of atmosphere (ToA) reflectance.
Sentinel-2 has 13 bands with three different spatial resolutions and different bandwidths (Figure 3). There are four bands with a 10 m resolution, six bands with a 20 m resolution and three bands with a 60 m resolution. It has a swath of 290 km and allows for an ideal revisit time of five days at the equator.
For training and validation, we used inventory data and orthophotos provided by the Bayerische Staatsforsten (acquired during a regular forest inventory by the Bavarian State Forest Enterprise). The inventory data contains actual data from 2014 to 2016 with information about the percentage of tree species. The circular plot size is 500 m² (radius: 12.62 m) in the Freisinger forest and 400 m² (radius: 11.28 m) in the Ebersberger forest. In total, eight different tree groups are included in the inventory data (Table 2). Spruce is the most abundant coniferous tree type, with only minor occurrences of pine, larch or fir. Oak and beech dominate the broad-leaved trees and all other minor trees are summarized within a mixed class.
We filtered all data to obtain only pure inventory circles that contain only one species in order to derive the training and testing data for the classification workflow (Table 2). In addition to the inventory data, orthophotos from the Bavarian Administration of Surveying with a 20 cm spatial resolution were used for visual validation of the training and testing samples. Two types of orthophotos were used: RGB and color infrared (CIR). This visual check helps to ensure the quality of the input data, which is crucially important for the performance of machine-learning algorithms.

Data Pre-Processing
As Sentinel-2 Level 1C data is provided in top of atmosphere reflectance, we used the Sen2Cor processor to derive surface reflectance data (bottom of atmosphere, BoA). Sen2Cor is a processor for Sentinel-2 Level 2A product generation and formatting; it performs the atmospheric, terrain and cirrus correction of the top of atmosphere Level 1C input data [23]. The terrain correction is optional and was not used for this study because the terrain is flat.
According to Hadjimitsis et al. [24], this is particularly important for vegetation analysis, as atmospheric interaction is stronger when the target surfaces consist of non-bright objects, as is the case for vegetation. Furthermore, in order to evaluate spectral properties of tree species, correction algorithms that lead to a material-dependent signature are highly important. The use of images without atmospheric correction can increase the uncertainty by up to 10% [24].
In Figure 2 the corrected Sentinel-2 scene of the Ebersberger and Freisinger Forest is shown for 29 September 2016 as an example for the good data quality. The image is clear and we observe no effect of clouds. The other images have a similar quality after correction.

Classification and Accuracy Assessment
The main goals of our study were to (1) evaluate Sentinel-2 data with its particular spectral bands and multitemporal characteristics, (2) use and compare different machine-learning algorithms and (3) design a semi-automated workflow that can be scaled and is optimized for processing big datasets. For this purpose, we used a hierarchical classification approach (Figure 1) and tested different band and scene combinations for PB and OBIA classifications. In a first step, we tested 14 combinations to differentiate between coniferous and broad-leaved tree types. Based on the best results acquired, we then tested 54 different approaches to distinguish between oak, beech and other broad-leaved trees within the broad-leaved subclass. Coniferous trees were not classified further, as the proportions of tree species other than spruce are minor and not enough training samples were available.
The ML algorithms used are an SVM classifier with a radial basis kernel function and an RF algorithm, both implemented in ArcGIS Pro as so-called raster functions. These functions, in comparison to standard geoprocessing tools, work dynamically (on-the-fly) and are well-suited for distributed processing on an Image Server. This is a requirement if we envisage industrial use of the tools on large datasets.
Remote Sens. 2018, 10, 1419
According to Pal and Mather [25], the RF classifier (see also Breiman [26] and Pal [27]) consists of a combination of tree classifiers where each classifier is generated using a random vector sampled independently from the input vector. It belongs to the ensemble methods of supervised learning, reduces overfitting effects and has become quite popular within the remote sensing community (cf. Belgiu and Drăgut [28]). The maximum number of trees we eventually used was 50 and the tree depth was set to 30. As these parameters also influence classification results, we used the same parameters, which we found to work well, for all band combinations and tests performed in this study.
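The ensemble principle described above can be illustrated with a deliberately simplified sketch. It is not the ArcGIS raster function used in this study: each "tree" here is a one-split stump rather than a tree of depth 30, and the function names, the reflectance values and the class labels are hypothetical. What it does share with RF is the bootstrap resampling, the random choice of a feature per tree and the majority vote over 50 trees.

```python
import random
from collections import Counter


def train_stump(X, y, feature):
    """Fit a one-split tree on one feature: choose the threshold with the
    fewest training errors, using the majority label on each side."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    _, t, l_lab, r_lab = best
    return feature, t, l_lab, r_lab


def train_forest(X, y, n_trees=50, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(len(X[0]))           # random feature
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Hypothetical two-band reflectances (not data from this study).
X = [[0.20 + 0.01 * i, 0.25 + 0.01 * i] for i in range(10)] + \
    [[0.40 + 0.01 * i, 0.45 + 0.01 * i] for i in range(10)]
y = ["coniferous"] * 10 + ["broad-leaved"] * 10
forest = train_forest(X, y, n_trees=50)
```

Because every stump sees a different resample, individual stumps may place their thresholds differently, but the vote over the whole ensemble is stable for samples that are clearly on one side.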
SVMs, described by Vapnik [29], are based on the principle of the Support Vector Classifier, a linear classifier. For non-linear problems, SVMs were developed that use different kernel functions, such as the radial basis function used in this study. These kernels enlarge the feature space, e.g., by using polynomial kernels of a given degree or a radial kernel [30]. For the radial kernel, the kernel value K is very small if the Euclidean distance between a test observation x* and a training observation xi is large. Training observations that are far from x* will therefore play essentially no role in the prediction of the class label of x*. Kernel functions allow classes that are not linearly separable in the original feature space to be separated by a linear hyperplane in the enlarged feature space [31]. Using kernels instead of explicitly enlarging the feature space with functions of the original features is computationally advantageous [30]. For distinct pairs i, i´, it is only necessary to compute K for the training observations xi and xi´ without working in the enlarged feature space, which is especially important when the enlarged feature space would be so large that computations become intractable. We tested both algorithms in an OBIA and a PB approach for comparison.
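The behaviour of the radial kernel can be sketched in a few lines. The example below uses the common form K(x, z) = exp(-gamma * ||x - z||²); the value of gamma and the reflectance vectors are hypothetical and are not the settings of the ArcGIS SVM implementation.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis kernel K(x, z) = exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical two-band reflectance vectors (not data from this study).
x_test = [0.30, 0.45]   # test observation x*
near = [0.31, 0.44]     # training observation close to x*
far = [0.05, 0.90]      # training observation far from x*

k_near = rbf_kernel(x_test, near, gamma=10.0)
k_far = rbf_kernel(x_test, far, gamma=10.0)
```

The kernel value for the nearby training observation is close to 1, while the value for the distant one is close to 0, which is why distant training observations contribute little to the predicted class of x*.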
For the object-based approaches, we used the mean shift algorithm [32] for segmentation. It replaces each discrete point with a finitely bounded continuous kernel of density and then groups points according to a global density estimator [33]. The kernel can be manually adapted by the user. Spectral detail can be set in a range between 1.0 and 20.0, with a higher value being appropriate for features that should be classified separately but have somewhat similar spectral characteristics. Another parameter is the spatial detail setting. Small values produce smooth outcomes between clustered areas, whereas higher values are more appropriate when small objects are observed and should be matched together. As we are dealing with small objects, high values were chosen for both parameters. The spectral detail value found to work well in this study was 15.5 and the spatial detail value was 15.
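The idea behind mean shift can be illustrated with a one-dimensional sketch using a flat kernel. This is only an illustration of the principle, not the ArcGIS segmentation tool: the `bandwidth` parameter, the function names and the input values are hypothetical, and no mapping to the spectral or spatial detail settings is implied.

```python
def mean_shift_1d(values, bandwidth, n_iter=50, tol=1e-6):
    """Move each value to the mean of all input values within `bandwidth`
    of its current position until it stops moving; return the final modes."""
    modes = []
    for v in values:
        m = v
        for _ in range(n_iter):
            window = [u for u in values if abs(u - m) <= bandwidth]
            new_m = sum(window) / len(window)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    return modes


def group_by_mode(modes, bandwidth):
    """Give points the same label if their modes lie within bandwidth / 2."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


# Hypothetical band 8 reflectances of six neighbouring pixels.
pixels = [0.10, 0.12, 0.11, 0.40, 0.42, 0.41]
labels = group_by_mode(mean_shift_1d(pixels, bandwidth=0.05), bandwidth=0.05)
```

In this toy example the first three pixels converge to one mode and the last three to another, so they form two segments.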
The first classification step consisted in separating broad-leaved and coniferous trees. Spectral profiles were analyzed with respect to absorption features and variability between the respective classes. Mean class spectra are shown in Figure 4. Significant differences in the reflectance values can be recognized in the red edge portion and the infrared region of the spectral signatures (most pronounced differences in bands 7-9). Figure 5 shows a multitemporal spectral signature (May, August and September) and it is obvious that differences are strongest in May, the flowering period of broad-leaved trees [34].
Based on the spectral characteristics, different band combinations were tested on the seasonal input images. Furthermore, OBIA vs. PB performance was tested and all 14 runs are summarized in Table 3. A total of 25 training regions and 10 independent test areas for accuracy assessment were defined based on the preprocessed inventory data. From these areas, samples were extracted by collecting the spectral signatures of each pixel contained in the circles. As only pixels completely within the circles were considered, the number of pixels per training/test area varies slightly, with four samples per area on average. The confusion matrix for the first classification step is presented in the results section (Table 4).
Based on the first classification step, a polygon mask for the broad-leaved trees was used to extract the area for the second step of the classification. Analogous to the initial classification step, 54 different classification settings were tested and are summarized in the results section (Table 5). In contrast to the previous step, spectral similarities are much higher between broad-leaved tree species (compare Figure 6). Thus, a multitemporal approach as well as dimension reduction using Principal Component Analysis (PCA) were used to improve results. The main goal of using principal components is to reduce noise and extract information from data. The first principal component captures the largest possible variance within a dataset, while further PCs contribute less [35]. PCA was computed in ArcGIS Pro for each input dataset. Furthermore, a vegetation index (NDVI) was included in the classification.
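As a short illustration of the vegetation index mentioned above, NDVI is conventionally computed as (NIR - Red) / (NIR + Red). For Sentinel-2 this usually means band 8 and band 4; the text does not state which bands were used here, so this band choice and the reflectance values below are assumptions.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 if both inputs are zero."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


# Hypothetical surface reflectances: (band 8, band 4) per pixel.
pixels = [(0.40, 0.10), (0.30, 0.30), (0.0, 0.0)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Dense, healthy vegetation reflects strongly in the near infrared and weakly in the red, which pushes the index towards 1, while equal reflectance in both bands gives 0.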
For the object-based approach, we first evaluated 30 different segmentation settings to generate an optimal segmented image as input for the ML algorithms. In total, we tested 54 different classification settings that are summarized in Table 5. This approach allowed us to first evaluate segmentation settings and then to optimize classification by comparing the two ML algorithms in an OBIA and PB approach on different bands, with or without additional variables such as PCs and NDVI. A total of 15 training areas and five reference areas for the creation of training and test samples were available for this second step. Samples were created as described in the previous step.
Different classification settings were chosen based on the spectral information but also the spatial resolution of the different bands (10 m vs. 20 m). For segmentation, for example, higher spatial resolution is important for better detail. Different multitemporal approaches were tested with a combination of the 20 m resolution red edge bands (bands 7, 6 and 5). Varying segmentation parameters were applied to the 22 May scene (with the highest spectral differences) to test their effects. The best segmentation result (see Table 3 part 1) was used as the segmented image for the OBIA for both ML algorithms and compared to the PB classification using the same bands, PCs and indices. In total, 54 classifications were conducted and compared to optimize the separation of broad-leaved tree species (oak, beech and other deciduous trees).
For accuracy assessment of all classification approaches, samples derived from the inventory areas (compare Table 2 and Figure 1) were randomly split into training and test samples. Each pixel whose center lay within an inventory circle was considered a sample. As metrics we calculated the confusion matrices with User's and Producer's accuracies, Kappa values and overall accuracies. Based on these metrics, all approaches were evaluated. We did not perform any additional statistical testing of whether the performance of the different ML algorithms is significantly different, as the performance of these algorithms depends strongly on sampling and the tuning of hyperparameters. In addition to this validation, we tested the overall workflow on a different area to better evaluate its performance and transferability: in this final step, a semi-automated workflow template was created and applied to the second study area, the Freisinger Forest, and the same metrics for accuracy assessment were calculated.
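The metrics named above follow directly from a confusion matrix. The sketch below assumes rows hold the classified labels and columns the reference labels, and that every class has at least one sample in each row and column; the function name and the example counts are hypothetical and are not results from this study.

```python
def accuracy_metrics(matrix):
    """Return overall accuracy, User's accuracies, Producer's accuracies
    and Cohen's Kappa for a square confusion matrix.

    matrix[i][j] = number of samples classified as class i (row)
    whose reference class is j (column).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diag = [matrix[i][i] for i in range(n)]

    overall = sum(diag) / total
    users = [diag[i] / row_tot[i] for i in range(n)]      # correct / classified
    producers = [diag[j] / col_tot[j] for j in range(n)]  # correct / reference
    # Agreement expected by chance, from the row and column totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa


# Hypothetical two-class example (coniferous, broad-leaved).
example = [[45, 5],
           [5, 45]]
overall, users, producers, kappa = accuracy_metrics(example)
```

For this example the overall accuracy is 0.9 and the chance agreement is 0.5, giving a Kappa of 0.8.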

Results
Results for the first classification step, the separation of broad-leaved and coniferous trees, are summarized in Table 3. A total of 14 different classification methods were tested and, except for one accuracy of only 74%, all other accuracies were above 80%, eight of them better than 90%. Comparing the results from OBIA and PB approaches, we do not find performance differences. However, a comparison of performance between the two machine-learning algorithms shows slightly better results for SVM than for the RF classifier. The confusion matrix for this classification step is presented in Table 4.
The best overall accuracy of 97% was reached in an object-based approach using the SVM classifier on only band 8 of each temporal image (22 May 2016, 9 August 2016, 29 September 2016). Similar results are also achieved by using bands 8, 4 and 3 of the spring scenes, the 10 m spatial resolution bands. Three spectral bands suffice to get high classification accuracies and using more bands does not improve results. This is due to significant differences in the blooming and autumn period between broad-leaved and coniferous trees in band 8 and the higher spatial resolution of band 8 (and bands 3 and 4) in comparison to other spectral bands. A combination of the infrared band with visible bands (2, 3 and 4) as well as a combination with the red edge region bands (in this case 6 and 7) did not result in an accuracy as high as only using the infrared band 8. Combining two red edge bands (6 and 7) with the infrared band also resulted in a high accuracy of over 95%. These results show that a very simple and therefore reproducible approach is ideal for achieving very good classification results for coniferous vs. broad-leaved trees. There is no need to include many bands, and therefore many variables, that might even reduce accuracies. Figure 7 shows the final classification result for the separation of coniferous and broad-leaved trees.
Furthermore, areas for training and testing samples are shown that were randomly split. The result for this first step of our hierarchical classification is shown in Figure 7 and the confusion matrix [36] that was calculated on 28 independent test samples derived from the inventory circles is shown in Table 4.
Results from the second classification step, separating single tree species (oak, beech, other broad-leaved trees) within the broad-leaved class, are summarized in Table 5 and shown in Figure 8. For the OBIA approach, the evaluation of changing segmentation parameters and bands showed that the best results were achieved by using the 22 May scene with a band combination of the infrared band (8) and the green (3) and blue (2) bands, a maximum number of pixels per segment of five and high values for spectral and spatial detail (15.5 and 15, respectively). The attributes for segmentation were mean digital value and active chromaticity color. Further parameters such as compactness or rectangularity did not result in better segmentation and therefore this segmentation was used for all OBIA classification approaches.
In contrast to the classification results for only coniferous vs. broad-leaved trees, the use of principal components 1-12 of the August scene achieved the highest overall accuracy of 91%. The use of principal components increased the accuracy by 2% compared to the untransformed August scene. The use of only the first five principal components for the same scene resulted in a similar accuracy of 90.5%, highlighting that the original Sentinel-2 image contains a lot of "noise" and/or redundant information. This is also indicated by the eigenvalues and factor loadings of the PCs. For all three times of the year, the eigenvalues of the first PC range between 87% and more than 90%. Most information of the Sentinel-2 image can thus be represented in only a few principal components. For all PCAs, band 8 contributes most to the first PC, highlighting its importance for classification. The statistics for the PCAs can be found as additional online material.
In general, we found that OBIA methods outperformed PB methods, especially for the combination of the infrared band with the green and blue bands. Even when indices such as the NDVI or PCs were included, the PB approach had slightly lower accuracies than the OBIA approach. This is probably due to the patchy nature of the forest structure and the overall setting, in which the endmembers are rather similar. PB methods perform best if there are enough sample points for all classes and these do not show too much collinearity.
From the ML perspective, slightly better results were obtained using the SVM. Of all settings tested, 18 classifications resulted in accuracies higher than 80%, all of them using an SVM classifier and the OBIA approach. The final classification result for this step is displayed in Figure 8. Table 6 shows the confusion matrix for the best result based on 19 independent test samples. Beech trees reach a 94% UA and a 79% PA, whereas oak trees reach 100% for both UA and PA. Other broad-leaved trees are represented with an 81% UA and a 94% PA. The Kappa value is 0.87 and indicates a very good classification result (see [37] for a detailed discussion of the suitability of the Kappa value).
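For readers less familiar with these measures, the following minimal sketch shows how overall accuracy, User's and Producer's Accuracies and the Kappa value can be derived from a confusion matrix. It assumes that rows hold the classified labels and columns the reference labels. The matrix is hypothetical and does not reproduce Table 6, and the function name `accuracy_metrics` is illustrative.

```python
def accuracy_metrics(matrix):
    """Compute overall accuracy, UA, PA and Cohen's Kappa.

    Assumption: matrix[i][j] counts samples classified as class i
    whose reference label is class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n_classes))
                  for c in range(n_classes)]
    diagonal = [matrix[i][i] for i in range(n_classes)]

    overall = sum(diagonal) / total
    # User's Accuracy: correct samples divided by all samples mapped to the class.
    users = [d / t if t else float("nan")
             for d, t in zip(diagonal, row_totals)]
    # Producer's Accuracy: correct samples divided by all reference samples.
    producers = [d / t if t else float("nan")
                 for d, t in zip(diagonal, col_totals)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "users": users,
            "producers": producers, "kappa": kappa}


# Hypothetical 3-class confusion matrix (not the values from this study).
example_matrix = [
    [8, 1, 1],
    [0, 10, 0],
    [2, 0, 8],
]
result = accuracy_metrics(example_matrix)
print(f"OA = {result['overall']:.3f}, Kappa = {result['kappa']:.3f}")
```

The sketch does not handle the degenerate case in which the expected agreement equals 1, which would require a separate guard.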
The final classification result was achieved by merging the tree species classes with the coniferous forest class. The result is shown in Figure 9 and includes all four tree species groups (Beech, Oak, Other broad-leaved trees and Coniferous trees). We performed another validation on the whole dataset (Table 7) with independent test samples from the inventory dataset (samples which were not used before).

Confusion matrix columns: Class Name, Beech Trees, Oak Trees, Other Broad-Leaved, Total, User's Accuracy, Kappa.
The overall accuracy is 88% with a Kappa value of 0.83. For beech trees, the UA is 94% but the PA is only 71%, with two samples assigned to the coniferous forest and four samples to the general broad-leaved category. Oak trees reach 100% for both UA and PA, whereas the mixed class "Other Broad-Leaved" has a UA of 81% and a PA of 81%, with some misclassification involving the beech and coniferous classes. Coniferous forest has a 100% PA and an 80% UA, with misclassification from beech trees and other broad-leaved trees. The very high PA and UA values, however, have to be interpreted carefully, as they depend on the sampling method. Because the samples stem from the available forest inventory data and some fieldwork, the validation depends on the distribution of these samples and is not completely random, as would be ideal for classification validation.

Transferability Study to the Freisinger Forest
Transferability is crucial in image classification if an approach is to be automated and applied with a fixed workflow. As one of our goals was to provide tools capable of scaling and distributed processing, we performed a transferability study of the final classification workflow in the test area, the Freisinger Forest. The same parameters as for the Ebersberger Forest were used. Table 8 shows the accuracies for the final classification of the Freisinger Forest (Figure 10). The overall accuracy is 85% with a Kappa value of 0.79. Beech trees have a UA of 56% and a PA of 71%. Oak trees reach a 100% UA and a 79% PA. Other broad-leaved trees are classified with an accuracy of 75% for the UA and 86% for the PA. The coniferous tree group achieves 100% accuracy for both the UA and the PA. Results are slightly worse than for the Ebersberger Forest but still very good. More misclassification occurs within the broad-leaved classes, probably due to the spectral similarity of the respective classes.

The transferability study was based on a semi-automatic workflow created in ArcGIS Pro in ModelBuilder and as a chain of raster functions for distributed processing. It is a ready-to-use toolset that can be applied to new datasets (Figure 11). Additionally, as described in the validation section, we created a mobile application using Collector for ArcGIS to validate some misclassified samples in the field. The mobile application synchronizes with the classification results as well as a predefined layer for collecting data in the field, which allows attaching photos. The app was especially useful for collecting data on misclassified samples in the field but can also be used to update inventory data. One example of misclassification is shown in Figure 12, where an inventory data point of the class coniferous was wrongly classified as "Other Broad-leaved Trees". When rechecking the point, the cause of the error is directly visible: there are indeed several coniferous trees, but maple trees and other broadleaf trees also occur in the undergrowth.

Discussion
Results of this study indicate a high potential of Sentinel-2 data for forest classification using a hierarchical semi-automatic workflow. For the classification of coniferous and broad-leaved forest types, a very simple combination of only three bands sufficed to obtain very good accuracies. Crucial bands were the red edge bands 6 and 7 and the infrared band 8. The best accuracy of 97% was obtained either by combining only the infrared band 8 of all three months, without any further bands, or by using bands 8, 4 and 3 of the spring scenes. This result agrees with the findings of Puletti et al. [20], who reported an overall accuracy of 86.2% using an RF classifier for coniferous vs. broad-leaved forest classification in the Mediterranean summer, with bands 2, 8 and 7 being most important. Band 8 also shows the highest factor loadings for the first and most important PC calculated from the images.
The coarser spatial resolution of the red edge bands (20 m) in comparison to the 10 m resolution of bands 8 and 2-4 seems to be a great disadvantage for classification. This highlights that, even for rather patchy forests, spatial resolution is as critical for the overall results as spectral differences.
The classification of broad-leaved tree species gave an overall accuracy of 91% for the three classes beech, oak and other broad-leaved trees. This high accuracy was achieved with an object-based approach and an SVM classifier. This agrees with findings from Sothe et al. [21] and others (e.g., Ma et al. [38], Maldonado and Weber [39]), who also found that SVM learning schemes perform well in OBIA. The segmentation was based on a mean shift algorithm using band 8 together with the green and blue bands of the May scene, which covers the flowering period of the respective trees. The SVM classifier included the August scene with 12 PCs in addition to the segmented image. The use of principal components increased the accuracy by 1.8%, whereas including the NDVI decreased the accuracy to 80.8%. Our results confirm the relevance of the NIR region that was proposed earlier by Fassnacht et al. [14]. The authors also explain the high relevance of the visible bands by the absorption of the photosynthetic pigments chlorophyll a and b. However, their analysis showed that it is not enough to use only the RGB bands. Instead, a combination of infrared and visible bands leads to high accuracies for broad-leaved tree classifications with Sentinel-2 imagery. These results could be confirmed and extended by the inclusion of PCs, which increased accuracies by several percent.
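As a reminder of how the NDVI mentioned above is defined, the following sketch computes it per pixel as (NIR - red) / (NIR + red). It assumes that band 8 serves as the NIR band and band 4 as the red band, which is the common choice for Sentinel-2 but is not stated explicitly in this study. The reflectance values and function names are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    denominator = nir + red
    if denominator == 0:
        # Convention chosen for this sketch: empty pixels get an NDVI of 0.
        return 0.0
    return (nir - red) / denominator


def ndvi_image(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical surface reflectance values for a 2 x 2 pixel window.
band8_nir = [[0.50, 0.40], [0.30, 0.00]]
band4_red = [[0.10, 0.10], [0.30, 0.00]]

index = ndvi_image(band8_nir, band4_red)
for row in index:
    print([round(value, 3) for value in row])
```

Because the index is a fixed combination of two bands that are already available to the classifier, adding it introduces no independent spectral information.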
The final classification results of this study, with overall accuracies of 88% and 85%, do not reach the 96% overall accuracy reported for high-resolution imagery like WorldView-2 [40] but are based on cost-free data. Whereas Wulder et al. [41] encountered difficulties in mapping tree species based on deca-metric-resolution imagery, our results indicate that Sentinel-2 is at least capable of providing good results for forests with a less complex structure and less intermingling of species. Wulder et al. also refer to Landsat data, which has a 30 m spatial resolution compared to the 10 m and 20 m resolution of the Sentinel-2 sensors. Thus, data has to be chosen according to the area, the forest complexity and the required accuracy.
With respect to the different ML algorithms, we found that the SVM classifier performed slightly better than the RF classifier. The good performance of SVM classifiers has also been reported for spectral classification studies in other fields [42][43][44]. However, there are several reasons for performance differences between ML algorithms. First of all, results depend on the training data (e.g., quality, number of training samples per class, characteristics of the ground truth). If RF, for example, is used with unbalanced training data, it tends to focus on the prediction accuracy of the prevailing classes, which might lead to lower accuracies in less represented classes [21]. Furthermore, the tuning of hyperparameters has an impact on classification results and depends on the data used. The slightly better performance of SVM in our study might be due to the specific datasets, as was also found in other studies [45][46][47]. Thus, given the small performance differences, both algorithms are well-suited for land cover classification in general and for forest classification in particular, as in this study.
Compared to commercial and often expensive very high-resolution data, results using Sentinel-2 are very good, even though the spatial resolution is much lower. The high spectral resolution in the red edge region of Sentinel-2, however, is an advantage compared to other platforms such as Landsat and was tested extensively in this study. Our results showed that the red edge region exhibits clearly visible differences in the spectral profiles, but the higher-resolution 10 m bands, especially band 8, still reached better accuracies. The lower spatial resolution (20 m) of the red edge region (bands 5 to 7) somewhat diminishes its capability to correctly classify vegetation on a species or even tree-patch level. For achieving good accuracies on a single-tree level, however, high spatial resolution as well as hyperspectral data are more adequate. For example, Dalponte et al. [6] found that the fusion of very high geometrical resolution multi-/hyperspectral images and light detection and ranging (LiDAR) data can result in accuracies of up to 93% for macro classes using ML algorithms. Even tree species classification at the individual tree crown level was shown to reach accuracies of 0.89 (kappa accuracy) for boreal forests using airborne hyperspectral data and LiDAR [48]. This kind of data is therefore more suitable for research on a single-tree level, while Sentinel-2 data is more appropriate on a tree-stand level in low-cost studies.
Besides optical sensors, there are also studies based on active systems only. Liang et al. [49], for example, investigated the possibility of classifying broad-leaved and coniferous tree types based on the first and last pulses of LiDAR data. They reached an overall accuracy of 89%, concluding that parameters like the branch structure influence the accuracy of LiDAR-based classification. Compared to our study with overall accuracies of more than 90%, we conclude that, depending on the study area and the scale, deca-metric-resolution optical sensors perform well for environments with low complexity compared to very high-resolution point-cloud data.
For future studies, however, additional data such as point-cloud or radar data might help to further improve results by combining the advantages of active and passive systems, as was also suggested by Stratoulias et al. [50]. Data fusion can help to overcome problems that arise with more complex forest structures where trees intermingle. Data that captures different elevation levels and internal structures and morphologies can provide valuable additional information.

Conclusions
Results of the present study indicate the high potential of Sentinel-2 data for applications in applied forestry and vegetation analysis despite the deca-metric spatial resolution. Our proposed workflow achieved overall classification accuracies of 88% and 85% for the study area and in a transferability study respectively, indicating its robustness and potential for scaling to a larger level. The design and backend technology allow the use on large datasets due to the concept of raster functions that are capable of on-the-fly and distributed processing.
A comparison of SVM and RF classifiers using PB and OBIA hierarchical classification approaches showed that OBIA approaches are better suited than PB approaches in this setting. The slightly better performance of the SVM classifier is probably due to the training data used, as discussed in the previous section. Along these lines, machine learning algorithms are strongly dependent on the quality of the training data (here: inventory data) as well as on the selected areas. Thus, future studies are needed to evaluate the workflow in other forest areas and to assess the effect of different forest structures and other tree species. Testing different band combinations to improve the classification results highlighted the importance of band 8 in combination with the red edge bands as well as the other 10 m resolution bands of the spring and summer scenes. The red edge region shows clearly visible differences in the spectral profiles, but the higher-resolution 10 m bands were crucially important for good results. The lower spatial resolution (20 m) of the red edge region (bands 5 to 7) somewhat diminishes its capability to correctly classify vegetation on a species or even tree-patch level, while using PCs of the summer scene together with the May scene gives the best results for broad-leaved tree types.
We conclude that the proposed design is well-suited to be used on larger areas with a similar forest structure and allows a streamlined workflow for applied forestry by providing analysis results directly to mobile applications for validation and data collection in the field. We showed that Sentinel-2 data is a suitable, cost-free alternative to commercial satellite data with higher spatial resolution for classifying trees at a stand level using machine-learning algorithms.