Classification of High-Mountain Vegetation Communities within a Diverse Giant Mountains Ecosystem Using Airborne APEX Hyperspectral Imagery

Mapping plant communities is a difficult and time-consuming endeavor. Methods relying on field surveys deliver high-quality data but are usually limited to relatively small areas. In this paper we apply airborne hyperspectral data to vegetation mapping in remote and hard-to-reach areas. We classified 22 vegetation communities in the Giant Mountains on 3.12-m Airborne Prism Experiment (APEX) hyperspectral images, registered in 288 spectral bands (10 September 2012). Support Vector Machines (SVM) were used as the classification algorithm. APEX data were corrected geometrically and atmospherically, and three dimensionality reduction methods were applied to select the best dataset. As reference we used a non-forest vegetation map of the vegetation communities of the Polish Karkonosze National Park from 2002, an orthophotomap and field survey data from 2013 and 2014. We obtained post-classification maps of 22 vegetation communities, lakes and areas without any vegetation. An iterative accuracy assessment repeated 100 times was used to obtain the most objective results for individual communities. The median overall accuracy (OA) was 84%. Fourteen out of twenty-four classes were classified with a producer accuracy (PA) of more than 80%, and sixteen out of twenty-four with a user accuracy (UA) of more than 80%. APEX data and SVM, combined with iterative accuracy assessment, are useful for classifying mountain communities. The results can support the management of both the Polish and Czech national parks by providing information about the diversity of communities across the whole transboundary area, and by helping to identify them, especially in an environment that is changing under human pressure.


Introduction
Mountain ecosystems are an important indicator of climate change [1] because, within a small altitude gradient, mountain vegetation goes from deciduous forests, through pine forests, dwarf pine thickets and grassland communities to tussock grassland communities including bryophytes and lichens [2]. The distribution of mountain plant communities is influenced by height above sea level, exposure, slope, substrate, the degree of development and moisture of soil, and the length of growing season and snow retention [3].
Traditional vegetation mapping is most often done in the field by methods following the Braun-Blanquet floristic approach and by interpreting aerial photographs [4]. These methods are time-consuming and require an investment of resources and labour. Studies and monitoring of high-mountain vegetation are made difficult by poor accessibility and the short vegetation period, and so remote sensing is becoming increasingly popular as a base for ecological analyses. Spectral characteristics reflect the physiological state of species and their anatomical and morphological structure, such as the content of photosynthetic pigments, water and cellulose, and the structural state of cells [5]. Therefore, hyperspectral techniques are a great support for traditional vegetation mapping [6][7][8][9]. The first vegetation classification of hyperspectral data was conducted by American researchers [10][11][12]. In Europe, hyperspectral data were used for classification of land cover, wetlands and Natura 2000 sites [13][14][15], tree species [16][17][18][19] and, less commonly, of plant communities [6,20]. There is still a lack of typical mountain vegetation mapping works using hyperspectral data [6,9,10]; some studies have covered upland areas [13,14], and the vast majority describe lowland areas [7,11,12,15,16].
The algorithms based on the idea of a single spectral signature representing a given spectral class for an analysed image pixel [21] are used to classify, inter alia, plant species [20]. In the case of plant communities consisting of different species, this would require the spectral characteristics of each species to be obtained. For high-resolution images, the object-oriented approach is useful, although proper image segmentation should be performed. This is complicated and often requires a more individual approach, limiting the automation desired in monitoring studies. With data of high spatial and spectral resolutions, detailed vegetation classification is successfully performed using non-parametric machine learning methods [22,23], which allow for high classification accuracies.
Comparative analyses of different hyperspectral data classification methods have evaluated Support Vector Machines (SVM) as one of the best [22][23][24][25], also in detailed vegetation analyses [8,26]. Moreover, SVM performs better in terms of speed and the ability to distinguish classes correctly when trained on a small training dataset [8] and a large number of bands [25].
Evaluation of classification reliability should exclude the randomness and subjectivity associated with the selection of training and validation samples and how they are divided. This is especially important for such diverse classes as a mosaic of plant communities comprising different species. This variety can be captured through multiple repetitions of the classification and validation process, with a randomised change in the training dataset and the validation dataset each time [16,19], in order to make the results more objective. In the literature this approach has been used for less numerous and more homogeneous classes such as tree species; no such analysis has been performed for numerous detailed vegetation communities like those studied here.
The Airborne Prism Experiment (APEX) sensor has been operational since 2010. Existing publications have described the nature of the instrument [27], the processing of acquired images [28][29][30], or environmental analyses, e.g., identification of cover type in urban areas [31], of tree species [18,19], or of non-forest vegetation [9,32,33,34]. The Giant Mountains (Polish Karkonosze and Czech Krkonoše) are an area of high environmental value and are characterised by the greatest diversity of endemic species, which are sensitive to environmental changes and should thus be permanently monitored. The vegetation is currently undergoing rapid changes and is in danger due to human-induced environmental threats (pollution, increased human activity due to tourism, deforestation, etc.).
In the Czech part of the Giant Mountains, one-metre aerial images allowed seven vegetation classes to be distinguished with an overall accuracy (OA) of 81% [4]. It was noted that vegetation with similar species compositions was confused in classification, such as vegetation along trails, and herbaceous plants and grasses. It was pointed out that the large diversity of vegetation and the low spectral resolution of the images had caused the confusion of classes. Nine and twelve vegetation classes above the tree line were classified on 30-m Landsat-8 satellite data, 2-m WorldView-2 data and 12.5-cm aerial orthophotomaps [35]. The best OAs obtained were 83.6% for the nine classes, and 72.0% for the 12 detailed classes built from the most important grassland vegetation classes and other vegetation. The authors suggest that lower-spatial-resolution data are better suited to general identification of land cover, whereas detailed analyses of vegetation call for high-resolution data.
Classification of APEX data assumed that vegetation would be distinguishable at the level of communities, and these objects were also analyzed on HyMap data (Hyperspectral Mapper, 126 spectral bands in 400-2500 nm) [7]; additionally, 55 species were classified. The comparative analysis of results showed that classification of communities yields better results (about 90%) than a more detailed classification at the species level, although those accuracies were also high (about 80%). This is confirmed by the classification of 20 species of herbaceous vegetation on AISA EAGLE II (128 spectral bands in 400-1000 nm) [8], where OA was approximately 80%.
Previous studies of vegetation communities in Karkonosze [32,34] and vegetation types in Krkonoše [9], conducted separately for each country on APEX data, covered smaller areas, and each result was obtained only once, although for different datasets. In Polish Karkonosze, analyses were based on the original 288 APEX bands [32] and on the original 252 and 30 Minimum Noise Fraction (MNF) bands [34]. In the Czech research, APEX data were used in their original form (288 bands) and were reduced spectrally with Principal Component Analysis (PCA) [9]. Here we used three dimensionality reduction methods to identify the best way to process APEX data: PCA, MNF and original bands selected with a backward elimination approach. Additionally, we analysed the sensitivity of classification accuracy to the training dataset. The SVM method, which has yielded high classification accuracies on diverse classes of non-forest vegetation [7][8][9][34], was selected as a suitable tool to identify high-mountain plant communities in a repeatable way. In light of this, the aerial flight over the Giant Mountains using an APEX hyperspectral scanner to classify as many as 22 plant communities of the whole Polish-Czech transboundary area with SVM represents a novel approach to obtaining valuable documentary and reference material for future studies. It is the first study in which multiple accuracy estimates were obtained for such detailed and complex high-mountain vegetation classes. It is also noteworthy that this study presents the first classification performed for both the Polish and Czech Giant Mountains vegetation with the same repeatable method.
The aim of the research is to present a methodology for mapping mountain plant communities in the Polish and Czech Giant Mountains. The study was designed to answer the following questions:
1. Can APEX hyperspectral image data and the SVM method be used for classification of high-mountain vegetation communities in both the Polish and Czech parts of the Giant Mountains?
2. Does the iterative assessment of accuracy make it possible to distinguish communities in terms of how difficult they are to identify?
3. How do the preparation of the dataset and the number of sample pixels affect the accuracy?
4. How consistent are the classification results with the reference data?
5. How can the evaluated algorithm and the results be useful for national parks administration?

Study Area
We conducted the research in the Giant Mountains, protected by the creation of cross-border Polish and Czech national parks belonging to the UNESCO Man and Biosphere program (Figure 1).
The study area covers the whole high-mountain ecosystem (above 1200 m a.s.l.) of both countries, characterised by a diversity of species and communities typical of alpine and subalpine zones (Figure 2); it is 34.5 km long and 7.6 km wide. The species' occurrences are mainly determined by geological, climatic, soil and water conditions [36]. The Giant Mountains were formed late in the Alpine orogeny, which was followed by a period of erosion that resulted in rocky-walled postglacial corries (cirques) and characteristic corrie lakes, such as Mały Staw and Wielki Staw. In the ridge areas of the mountains the landscape resembles that of two regions, subarctic and high-mountain, and is referred to as "arctic-alpine tundra" [37]. A local mountain climate has been created here with long, harsh winters, large temperature fluctuations and violent winds [38].
The most widespread community in the subalpine zone is Pinetum mugo sudeticum thickets, whose physiognomy is largely determined by the dominant species Pinus mugo [36]. Pado-Sorbetum thickets are dominated by the mountain sub-species of rowanberry, Sorbus aucuparia subsp. glabrata, which is accompanied by bird cherry, Padus petraea. Salicetum lapponum communities are both endemic and relict; they occur almost exclusively in the eastern Giant Mountains and are dominated by a remnant of the glaciation, Salix lapponum, a heavily protected species. In stream channels, gullies and springs, there is Athyrietum distentifolii, in which Athyrium distentifolium dominates. Since 2010, these ferns have been observed to be dying off in the Polish part of the Karkonosze Mountains, which has not yet been seen in any other mountain or northern region where this community occurs [39]. The high-mountain herbaceous communities also include Adenostyletum alliariae, in which the characteristic species is Adenostyles alliariae, and which occur mainly within postglacial corries with a northern exposure [40].
On steep avalanche-prone slopes and leeward slopes of postglacial corries with long-lasting snow cover, Calamagrostion communities occur [41]. In the most common type of grass community, Crepido-Calamagrostietum villosae, which often occurs in complexes, the species Calamagrostis villosa dominates. Another characteristic feature is the Vaccinium myrtillus community, dominated by European blueberry and accompanied by Calamagrostis villosa. The community covers vast areas on the slopes and channels of postglacial corries. Patches of Vaccinium communities are often found in complexes with, for example, Crepido-Calamagrostietum villosae or Carici (rigidae)-Nardetum. A smaller share of the area is covered by Empetro-Vaccinietum crowberry communities, which include, among others, Empetrum nigrum and Vaccinium myrtillus. They occupy exposed locations, including the ridges of corries. Carici (rigidae)-Nardetum grass communities comprise the species Nardus stricta; they are mainly anthropogenic in nature, and also occur in complexes with Vaccinium myrtillus, among others. For over 50 years, the species Molinia caerulea has been observed to encroach on areas of Carici (rigidae)-Nardetum grasslands, especially on the Czech side of the Krkonoše Mountains [42]. On account of the diversity of species, Carici (rigidae)-Festucetum airoidis mountain grasslands are divided into two altitude categories: alpine and subalpine [43]. The alpine form is richer and is distinguished by the presence of the species Juncus trifidus and lichens, while the poorer subalpine form is characterised by Festuca airoides grasslands co-occurring with alpine heaths of the alliance Loiseleurio procumbentis-Vaccinion, forming small patches dominated by Calluna vulgaris, Vaccinium myrtillus, grasses and lichens. Grasslands occur in the alpine zone above 1450 m a.s.l. and partly in the subalpine zone in areas exposed to strong winds.
Large areas of rocks and screes are occupied by vegetation, mainly lichen communities, whose diversity is determined by snow cover [41]. Large areas of the subalpine and alpine vegetation belts in the Giant Mountains are covered by "snow-loving" communities of the alliance Rhizocarpion alpicolae and slightly smaller areas by "snow-avoiding" communities of the alliance Umbilicarion cylindricae. The former are covered by snow during the winter months, which disappears to reveal a colourful mosaic on screes that is, for example, yellow-green thanks to lichens of the genus Rhizocarpon. Communities not covered by snow give the rock walls a dark grey colour. In small scattered areas there are spring-habitat plants in non-limestone communities of the alliance Cardamino-Montion, which include the species Pedicularis sudetica, which is endemic and a glacial relict, and occurs only in the Giant Mountains [43]. The subalpine zone is characterised by the presence of subalpine-subarctic mountain bogs. Raised and transitional bogs with a total area of 40 ha have been entered in the global list of areas covered by the Ramsar Convention. The non-forest vegetation of the Giant Mountains also includes meadow communities in the form of alpine tundra, which are relics of the pastoral economy that functioned from the 17th to the end of the 19th century [41]. Their species composition and structure have undergone changes, but they remain refuges of biodiversity. Tourist areas have synanthropic vegetation, including areas around mountain shelters, where nitrophilous perennials occur. The list of vegetation communities that are the object of the study is presented in Table 1.
The vegetation of the Giant Mountains is under threat due to increasing human activity in the region and environmental changes impairing its ability to grow. In the 1980s and 90s it died back due to an ecological disaster caused by, among other factors, strong winds, air pollution from industrial areas and the deposition of nitrogen compounds [44]. A relatively new danger to the fern community on the Polish side of the Giant Mountains is a weevil that carries a microbial pathogen or nematodes and eats fern leaves, which reduces growth in the following year. Another threat to high-mountain vegetation is mass tourism in summer and winter. In terms of human influence, the synanthropisation of vegetation is observed, allowing invasive species such as Rumex alpinus to encroach and displace natural vegetation. Expansion of the native grasses Calamagrostis villosa and Molinia caerulea is also observed.

Reference Data
We extracted the samples for classification from APEX data based on field surveys conducted at the end of August 2013 and 2014 with a GPS Trimble GeoXT receiver (Trimble Inc., Sunnyvale, CA, USA). The flight over the Giant Mountains with the APEX scanner was carried out in September, so it was important that the field research was done at a similar time, due to the vegetation phase. The mapping area covered two parts of the Giant Mountains, eastern (16 km²) and western (12 km²), for which we collected a total of 812 identified patches of vegetation communities.
As the reference material, we used the non-forest vegetation map provided by Wojtuń and Żołnierz from 2002 [45]. The map contained 48 vegetation communities and 13 vegetation types for the whole Polish Karkonosze National Park. Most communities are above the tree line, but the map also includes meadow ecosystems and peat bogs on the foothills, as well as communities occurring on forest floors, which were not of interest in our study. Previous studies [34] identified the possibilities for combining certain classes on account of, for example, spectral similarity. We aggregated pixels representing the physiognomic classes Artemisietea vulgaris and Urtica dioica communities into one class named Artemisietea vulgaris. Similarly, we aggregated Pinetum mugo sudeticum and Pino mugo-Sphagnetum into one Pinetum mugo sudeticum class, because differences existed only in the properties of the substrate. We also combined areas with no vegetation and those in the early stages of succession into a single class due to similar spectral reflectances, which resulted in the occurrence of erroneous pixels representing the early stages of succession around lakes or on trails. Based on previous work [34] we selected the Deschampsia caespitosa class in the clearing near Mały Staw lake because it was larger than the anthropogenic community class identified on the non-forest vegetation map from 2002. We decided to exclude from the final classification the classes occupying small areas (Peucedanum ostruthium included a total of fewer than 20 pixels, as was the case with Deschampsia flexuosa, which was significantly overestimated). We included in the final study only those classes considered representative on the basis of test classifications. Based on an assessment of the possibility of classifying units at the available APEX spatial resolution, the spectral differentiation of classes, and their occurrence in the subalpine and alpine vegetation belt, we selected 22 classes of vegetation communities, lakes and non-forest vegetation covering the study area. We extracted additional information based on photointerpretation of 12 cm orthophotomaps collected in 2012 for the Polish side of the Giant Mountains.
We used reference data collected during field surveys, the existing plant communities map and image data interpretation to create the training and validation datasets. In this step each reference point was put into either the training or the validation dataset. Next, we extracted pixel samples representing each class based on the reference data location in the APEX image. Before model training we performed two steps to benefit the training and validation process. To balance the training dataset depending on the available sample size for each class, we used up to 200 pixels per class in the training dataset (Table 1). This created a more balanced dataset, which minimized the influence of more numerous classes on model training. Most classifiers tend to favor classes with more samples, treating less numerous classes as noise. This effect is more visible in datasets where samples of several classes dominate the whole dataset (more than 50% of the total number of samples).
Similar steps were performed when creating the validation dataset. We selected up to 400 pixels per class (Table 1). Balancing the validation dataset was intended to keep the most numerous classes from dominating OA. This gave a somewhat lower OA, but at the same time provided a more accurate measure of post-classification image accuracy. All of the classes were labelled according to the vegetation communities from the non-forest vegetation map developed by Wojtuń and Żołnierz [45].
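The per-class cap described above can be illustrated with a minimal sketch. It assumes that the available samples are held as (pixel identifier, class label) pairs; the function name `cap_per_class`, the pixel identifiers and the pool sizes are hypothetical and serve only to show a random draw without replacement of at most 200 pixels per class.

```python
import random
from collections import defaultdict

def cap_per_class(samples, cap, rng):
    """Keep at most `cap` randomly chosen pixels for every class label."""
    by_class = defaultdict(list)
    for pixel_id, label in samples:
        by_class[label].append(pixel_id)
    capped = {}
    for label, ids in by_class.items():
        # rng.sample draws without replacement, so no pixel is repeated
        capped[label] = rng.sample(ids, min(cap, len(ids)))
    return capped

rng = random.Random(0)
# Hypothetical pool: one abundant class and one rare class
pool = [(i, "Pinetum mugo sudeticum") for i in range(1000)]
pool += [(i, "Salicetum lapponum") for i in range(1000, 1060)]
training = cap_per_class(pool, 200, rng)
```

In this toy pool the abundant class is cut down to 200 pixels while the rare class keeps all 60, which is the balancing effect the text describes.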

Hyperspectral Data Processing
APEX is a pushbroom dispersive imaging spectrometer designed to simulate, calibrate and validate existing and planned satellite missions, and was developed under the ESA-PRODEX (European Space Agency-PROgramme de Développement d'Expériences scientifiques) program [46]. Data are collected in 288 spectral bands within the range 380.5-2501.5 nm. Images of 3.12-m spatial resolution were acquired for the Polish and Czech areas of the Giant Mountains on 10 September 2012 from a Dornier Do 228 plane belonging to the German Aerospace Centre (DLR). The flight was part of the project Hyperspectral Remote Sensing for Mountain Ecosystem (HyMountEcos), co-financed by the European Facility for Airborne Research Transnational Access (EUFAR TA).
Concurrently with the aerial pass, we performed calibration spectrometric measurements on surfaces of relatively uniform spectral reflectance, such as water reservoirs, asphalt surfaces and homogeneous meadows. They were obtained using an ASD FieldSpec 3 spectrometer (ASD Inc., Longmont, CO, USA) registering within the range 350-2500 nm. Image preprocessing was carried out at VITO (the Flemish Institute for Technological Research), the owner of the APEX scanner. This included radiometric calibration, and geometric and atmospheric correction using spectrometric data and a Digital Terrain Model (DTM). We extracted the area above 1200 m a.s.l. from the APEX data using the DTM. It includes some very steep terrain, such as postglacial cirques, and so topographic shadows were masked to avoid false-positive identifications of objects spectrally similar to shaded areas [47]. We built the shadow mask from the spectral reflectance values in band 104 (946 nm), because near infrared separates shadows, which have very low reflectance, from other objects more clearly than other bands do. Because the mask was created by thresholding reflectance values, it also captured a small number of pixels representing spectrally similar objects, such as the Umbilicarion cylindricae class.
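The thresholding idea behind the shadow mask can be sketched as follows. The threshold value (0.05) and the reflectance tile are hypothetical, since the text does not report the value that was used; the sketch only shows that every pixel whose near-infrared reflectance falls below the threshold is flagged, whether it is a true shadow or a dark, spectrally similar surface.

```python
def shadow_mask(nir_band, threshold):
    """Return True where near-infrared reflectance falls below the threshold."""
    return [[value < threshold for value in row] for row in nir_band]

# Hypothetical 2 x 3 tile of band-104 reflectance on a 0-1 scale
tile = [[0.02, 0.35, 0.41],
        [0.01, 0.03, 0.28]]
mask = shadow_mask(tile, threshold=0.05)
```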

Dimensionality Reduction
The 288 hyperspectral bands of APEX allow for detailed analysis, but a high correlation of neighbouring bands is nonetheless observed, leading to redundancy of information and lengthening the algorithm run-time. It is therefore important to reduce the number of bands and select the most informative ones. Based on visual analysis of all APEX bands we removed 36 bands associated with the absorption of electromagnetic waves by water vapour (ranges: 1335-1422 and 1759-1954 nm), leaving 252 bands.
We used three methods of dimensionality reduction: two based on transformations, PCA [48] and MNF [49], available in ENVI software, and one based on band selection by backward elimination [50], executed in the IDL language. PCA determines a new principal axis for the coordinate system along the largest possible variance of data [48]. Using the covariance matrix, new, uncorrelated bands were calculated and the first 40 bands were selected as the most informative based on the highest eigenvalues. The MNF transformation required noise estimation based on image statistics; therefore, two separate statistics files were created. During the transformation, the average for each band of the original image was calculated, and covariance matrices and noise correlations were created. Bands with eigenvalues higher than 1 contained information, while bands with values close to zero contained noise. Thirty MNF bands, read from the information curve, were selected for further analysis. We analyzed the information in the APEX spectral bands by backward elimination, carrying out 288 classifications using the SVM method on the area where polygons with training and validation samples had previously been designated. During each classification one band was removed from the image to assess how the OA changes in the absence of that band. We selected the bands that enhanced the accuracy, and finally two sets of bands were selected, consisting of 70 and 18 bands, respectively.
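The leave-one-band-out step of the backward elimination can be sketched as below. This is an illustration under stated assumptions rather than the IDL routine that was used: `score` stands for any function that trains a classifier on a band subset and returns OA, and the toy scoring function, the eight band numbers and the informative/noisy band sets are hypothetical.

```python
def leave_one_band_out(bands, score):
    """For each band, measure how much OA drops when that band is removed."""
    baseline = score(bands)
    drops = {}
    for band in bands:
        reduced = [b for b in bands if b != band]
        # A positive drop means the band was contributing to accuracy
        drops[band] = baseline - score(reduced)
    return drops

def select_useful(drops):
    """Keep the bands whose removal lowered the accuracy."""
    return sorted(b for b, drop in drops.items() if drop > 0)

# Hypothetical stand-in for "train SVM on these bands and return OA in %"
INFORMATIVE, NOISY = {3, 7}, {5}
def toy_score(bands):
    hits = sum(1 for b in bands if b in INFORMATIVE)
    noise = sum(1 for b in bands if b in NOISY)
    return 50 + 10 * hits - 2 * noise

drops = leave_one_band_out(list(range(1, 9)), toy_score)
selected = select_useful(drops)
```

With 288 bands this loop performs 288 classifications, one per removed band, which matches the count given in the text.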

Support Vector Machine Classification
The classification used the Support Vector Machine (SVM) machine learning algorithm, a non-parametric method for supervised pattern recognition [51]. To distinguish classes in the feature space $\mathbb{R}^n$, the algorithm tries to determine the best hyperplane, which separates the classes while attaining the maximum possible margin between them. It is assumed that the data set is composed of pairs $(x_i, y_i)$, where $i = 1, 2, \dots$ indexes the training samples, $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$. For each point, $x_i$ describes the elements of the training sample and $y_i$ indicates membership in one of the classes. The hyperplane $H$ by means of which individual classes are separated is defined by the following formula [52]:

$$H: \; w \cdot x + b = 0$$

where $x$ is a point on the hyperplane, $w$ is the $n$-dimensional normal vector of the hyperplane, and $b$ is the bias term that sets the offset of the hyperplane from the origin (the distance is $|b| / \lVert w \rVert$). Figure 3 shows two linearly separable sets of points. The two hyperplanes $H_1$ and $H_2$, chosen so that no training points lie between them and the distance between them (the margin) is as large as possible, can be described by the following equations:

$$H_1: \; w \cdot x + b = 1, \qquad H_2: \; w \cdot x + b = -1$$

assuming that:

$$w \cdot x_i + b \geq 1 \;\; \text{for} \;\; y_i = 1, \qquad w \cdot x_i + b \leq -1 \;\; \text{for} \;\; y_i = -1$$

The samples closest to the hyperplane are the support vectors. The result depends on the width of the boundary between classes: the greater it is, the lower the classification error. If the feature vectors $x \in \mathbb{R}^n$ are not linearly separable, they should be mapped into a higher dimensional feature space with a non-linear vector function $\Phi : \mathbb{R}^n \rightarrow F$, where $F$ is a higher dimensional Euclidean space [52]. Subsequently, the SVM algorithm is used in the new feature space, and the resulting linearly separating hyperplane corresponds to a non-linear surface in the original space (Figure 4). The scalar product of the vectors $x_i$, $x_j$ in the transformed space is equivalent to the kernel function [52]:

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$

A central element is the selection of the kernel function, as follows:

linear: $K(x_i, x_j) = x_i^{T} x_j$

radial: $K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$

polynomial: $K(x_i, x_j) = \left(\gamma \, x_i^{T} x_j + 1\right)^{p}$

sigmoid: $K(x_i, x_j) = \tanh\left(\gamma \, x_i^{T} x_j + 1\right)$

where $\gamma$ is the gamma parameter, $p$ is the polynomial degree, and 1 is the bias parameter. SVM can achieve good results even with noisy bands, and it is not time-consuming despite using significant amounts of data [22].
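The four kernel families named above can be written out as a short sketch. It assumes the commonly used forms with the bias fixed at 1, as stated in the text; the function names and the test vectors are illustrative and are not taken from the e1071 package.

```python
import math

def dot(u, v):
    """Scalar product of two equally long feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def radial_kernel(u, v, gamma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def polynomial_kernel(u, v, gamma, degree):
    # Bias parameter fixed at 1
    return (gamma * dot(u, v) + 1) ** degree

def sigmoid_kernel(u, v, gamma):
    # Bias parameter fixed at 1
    return math.tanh(gamma * dot(u, v) + 1)

u, v = [1.0, 2.0], [3.0, 4.0]
k_linear = linear_kernel(u, v)
k_radial_same = radial_kernel(u, u, gamma=0.1)
k_poly = polynomial_kernel(u, v, gamma=1.0, degree=2)
```

The radial kernel returns 1 for identical vectors and decays towards 0 as the spectral distance grows, at a rate controlled by gamma.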
We performed the classification with the e1071 package [55] in the R software [56]. Our choice of kernel function was based on a series of experiments in which all of the kernel functions were tested and the results were compared. The aim was to select the kernel function that yielded the highest accuracy.
In machine learning it is good practice to tune hyperparameters before training a model. The tuned parameters were: penalty (C), which controls the trade-off between allowing classification errors and forcing rigid margins; and gamma, which expresses the width of the Gaussian function. The parameters were fine-tuned using a grid search, which trains and validates a model (trained classifier) for each combination of parameter values to be tested and then selects the combination that gives the highest accuracy. Values of gamma from 0.1 to 0.9 and of C from 1 to 100 were tested. The highest accuracies were obtained for gamma = 0.1 and C = 9.
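A grid search of this kind can be sketched in a few lines. The step sizes (0.1 for gamma, 1 for C) are assumptions, because the text gives only the ranges, and `toy_accuracy` is a hypothetical surrogate that merely stands in for "train the SVM with these parameters and return the validation accuracy"; it is constructed to peak at the reported optimum so that the selection logic can be followed.

```python
from itertools import product

def grid_search(gammas, costs, evaluate):
    """Evaluate every (gamma, C) pair and return the best (accuracy, gamma, C)."""
    best = None
    for gamma, cost in product(gammas, costs):
        accuracy = evaluate(gamma, cost)
        if best is None or accuracy > best[0]:
            best = (accuracy, gamma, cost)
    return best

def toy_accuracy(gamma, cost):
    # Hypothetical surrogate for training and validating an SVM
    return 1.0 - abs(gamma - 0.1) - 0.001 * abs(cost - 9)

gammas = [i / 10 for i in range(1, 10)]  # 0.1 ... 0.9, assumed step 0.1
costs = list(range(1, 101))              # 1 ... 100, assumed step 1
best_accuracy, best_gamma, best_cost = grid_search(gammas, costs, toy_accuracy)
```

With these assumed steps the grid contains 900 combinations, each of which requires one training and validation run.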
It was an assumption of the classification algorithm for APEX hyperspectral images that the analyses would be highly objectivised, so the selection of training and validation samples was repeated 100 times with a random selection of pixels from polygons collected during field tests using a high-accuracy GPS device with support of other reference materials.

Accuracy Assessment
We assessed the accuracy of the results by creating error matrices, from which the overall and class accuracies were calculated [57]. An error matrix expresses the number of pixels assigned to a particular class relative to the verified class. Overall accuracy (OA) is calculated by dividing the total number of correctly classified pixels (the sum of $x_{kk}$ over all classes) by the total number of pixels ($N$) in the error matrix:

$$OA = \frac{\sum_{k} x_{kk}}{N}$$

The accuracy for each class was determined by producer accuracy (PA), which is calculated by dividing the number of pixels correctly classified into a class ($x_{kk}$) by the total number of pixels in the reference class ($x_{k\_val}$):

$$PA_k = \frac{x_{kk}}{x_{k\_val}}$$

and user accuracy (UA), which expresses the number of pixels correctly classified into a class ($x_{kk}$), divided by the number of pixels in the class according to the classification result ($x_{k\_class}$):

$$UA_k = \frac{x_{kk}}{x_{k\_class}}$$

To measure the sensitivity of the classifier to a changing training dataset and to provide more detailed accuracy metrics for our post-classification images, we used an iterative accuracy assessment procedure. The whole procedure consists of 100 repetitions of the following steps:
1. Randomly select training and validation pixels without replacement from all available samples. These form the training and validation datasets. During this process, a specific number of pixels is selected for each class, as described in Table 1.
2. Perform model training using the training dataset.
3. Perform model accuracy assessment using the validation dataset. Calculate PA and UA for each class, and OA.
4. Remove all samples from the training and validation datasets. Repeat from step 1.
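The steps above can be sketched as a small loop. This is a simplified illustration, not the authors' R implementation: a one-dimensional nearest-mean classifier stands in for the SVM, the same number of pixels is drawn for every class, and the two toy classes "A" and "B" are hypothetical. The error matrix is stored with classified labels in rows and reference labels in columns.

```python
import random

def error_matrix(reference, predicted, classes):
    """Rows are classified labels, columns are reference labels."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[pred]][index[ref]] += 1
    return matrix

def accuracies(matrix):
    """Return OA, per-class PA and per-class UA from an error matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = [matrix[i][i] for i in range(k)]
    ref_totals = [sum(matrix[r][i] for r in range(k)) for i in range(k)]
    class_totals = [sum(row) for row in matrix]
    oa = sum(correct) / total
    pa = [c / t if t else 0.0 for c, t in zip(correct, ref_totals)]
    ua = [c / t if t else 0.0 for c, t in zip(correct, class_totals)]
    return oa, pa, ua

def iterate_assessment(samples, n_train, n_val, fit, predict, repeats, seed=0):
    rng = random.Random(seed)
    classes = sorted(samples)
    results = []
    for _ in range(repeats):
        train, val = {}, {}
        for c in classes:
            # Step 1: one draw without replacement keeps the two sets disjoint
            drawn = rng.sample(samples[c], n_train + n_val)
            train[c], val[c] = drawn[:n_train], drawn[n_train:]
        model = fit(train)                                    # step 2
        reference = [c for c in classes for _ in val[c]]
        predicted = [predict(model, x) for c in classes for x in val[c]]
        matrix = error_matrix(reference, predicted, classes)  # step 3
        results.append(accuracies(matrix))
        # Step 4: train and val are rebuilt from scratch in the next pass
    return results

# Stand-in classifier (NOT an SVM): nearest class mean on one feature
def fit_means(train):
    return {c: sum(v) / len(v) for c, v in train.items()}

def predict_nearest(model, x):
    return min(model, key=lambda c: abs(x - model[c]))

data_rng = random.Random(1)
samples = {"A": [data_rng.gauss(0.0, 1.0) for _ in range(60)],
           "B": [data_rng.gauss(5.0, 1.0) for _ in range(60)]}
results = iterate_assessment(samples, 20, 30, fit_means, predict_nearest, repeats=100)
```

Each entry of `results` holds the OA, PA and UA of one repetition, so the 100 entries form the distributions that are later summarised per class.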
The iterative accuracy assessment procedure provided information about the median class accuracy and the estimated shape and width of the distribution of the accuracy measures for each class. We assessed the variation within classes by analyzing the range (the difference between the minimum and maximum accuracy) and by focusing on the values between the first and the third quartile in each boxplot.
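The summary statistics mentioned here can be computed as in the following sketch. The five accuracy values are hypothetical, and the quartiles use the "inclusive" method of Python's statistics module, which is only one of several quartile conventions; the boxplots in the study may have used a different one.

```python
import statistics

def summarise(values):
    """Median, quartiles, range and interquartile range of accuracy values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

# Hypothetical producer accuracies (%) of one class over five iterations
summary = summarise([80, 82, 84, 86, 88])
```

A narrow interquartile range combined with a high median indicates a class that is identified reliably regardless of which pixels are drawn for training.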
Additionally, we analyzed the accuracy of the classification as a function of the number of training pixels (50, 100, 150, 200, 250 and 300) for all classes that have more than 300 pixels in the original set of training samples (13 classes, Table 1). The number of validation pixels for this approach was constant and amounted to 400 samples.
The minimal classification unit was defined as the APEX pixel size (3.12 m), and accuracy assessment was based on this unit. To present the map cartographically, we generalized the result to 1:15,000 scale using a minimal mapping area of 5 × 5 pixels, and smoothed the boundaries of plant communities using the majority analysis function in ENVI 5.3 software (Harris Geospatial Solutions, Broomfield, CO, USA), which removes individual pixels from within larger classes. In addition, we used the aggregation function to delete groups of pixels smaller than a set size (25 pixels). For the design of the legend, we consulted with phytosociologists and performed cartographic work using ArcGIS 10.3 software (Environmental Systems Research Institute, Redlands, CA, USA).
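The effect of a majority filter on isolated pixels can be illustrated with a minimal sketch. It assumes a 3 × 3 moving window and a strict-majority rule, neither of which is stated in the text, and it is not the ENVI routine itself; the class codes in the toy grid are hypothetical.

```python
from collections import Counter

def majority_filter(grid):
    """Replace each pixel by the strict-majority label of its 3 x 3 window."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Window is clipped at the image edges
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            label, count = Counter(window).most_common(1)[0]
            if count > len(window) / 2:
                out[r][c] = label
    return out

# Hypothetical class codes: a single pixel of class 2 inside class 1
classified = [[1, 1, 1],
              [1, 2, 1],
              [1, 1, 1]]
smoothed = majority_filter(classified)
```

The isolated centre pixel is absorbed by the surrounding class, which is the kind of generalisation applied before cartographic presentation.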

Results
Dimensionality reduction methods allowed us to select the best dataset. Table 2 presents the accuracies obtained with five datasets for the two best kernel functions in terms of overall accuracy. The linear function was included for comparison with previous studies of Polish Karkonosze vegetation [34]. Additionally, we report the resulting file size to show the significance of the reduction. With the radial function, the highest OA was achieved for 40 PCA bands, followed by the 252 original spectral bands and the MNF transforms. The selected spectral bands (70 and 18 bands, respectively) gave lower values than the transformed data. With the linear function, the highest OA was achieved for the original 252 bands, followed by 40 PCA bands; the lowest accuracy was again obtained for the image composed of 18 selected spectral bands. Although the image size was reduced significantly, as in the case of the PCA method (more than seven times smaller by volume), the accuracy dropped by less than two percent. This indicates that spectral space reduction methods are highly suitable, as they make the classification process more operable.
We analyzed the impact of the training set size on classification accuracy using the best dataset: 40 PCA bands with the radial function. OA increased with the number of training pixels, with the highest values obtained for the 13 community classes at 300 samples each (Supplementary Materials, Figure S1). For these sets a median accuracy of 86% was obtained and the spread of values was small, as it was for sets of 250 and 200 training pixels. With 50 samples, the median OA was less than 74%, and this dataset showed the widest spread of values.
The maps of the vegetation of the Giant Mountains and the error matrices for each of the 100 iterations were also obtained for the best dataset, 40 PCA bands with the radial function (examples of the eastern and western parts from the best iteration based on OA (85.5%) are presented in Figure 5 and Supplementary Materials, Table S1). The median OA obtained during the 100-fold accuracy assessment was 84%. In total, median values of PA and UA were high for most classes: 14/24 classes were distinguished with a PA above 80%, while 16/24 classes were identified with a UA above 80%.
Within most classes (19/24), PAs show little variation (less than 15% difference in accuracy between the lowest and the highest value within a boxplot), which shows that the training and validation patterns were well selected and can be properly classified. The largest in-class variation was for Adenostyletum alliariae, which achieved a PA of 80% in the best iteration, while the lowest observed value was only 25% and the difference within the boxplot was 41%. This class was most often confused with Cardamino-Montion, which also had a broad distribution of accuracies (23%). In addition, higher variability of PA values (over 20%) was observed for the three classes with the lowest median PA: Salicetum lapponum, Calluna vulgaris and Adenostyletum alliariae. The UA results correlate well for most classes, with small variation within individual plant communities (less than 15% difference between minimum and maximum values within boxplots). More variation was noted for the same classes identified in the PA analysis above. The widest distribution was noted for Adenostyletum alliariae (55%), followed by Salicetum lapponum (26%) and Cardamino-Montion (23%), which were most often confused with each other, and Calluna vulgaris (23%).
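The spread measures discussed here (the minimum-to-maximum range and the interquartile range of the accuracies across iterations) can be computed as in the following sketch. The function name and the five accuracy values are hypothetical and only illustrate the calculation; the quartile convention (`method="inclusive"`) is an assumption, since boxplot software may interpolate quartiles differently.

```python
from statistics import median, quantiles


def spread_summary(accuracies):
    """Median, min-max range and interquartile range of per-iteration accuracies."""
    q1, _, q3 = quantiles(accuracies, n=4, method="inclusive")
    return {
        "median": median(accuracies),
        "range": max(accuracies) - min(accuracies),
        "iqr": q3 - q1,
    }


# Hypothetical PA values (%) for one class over five iterations.
pa_values = [70, 75, 80, 85, 90]
summary = spread_summary(pa_values)
print(summary)
```

In the study itself each class would contribute 100 values per measure, one per iteration, rather than the five used here.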
In Figure 6, the numbers of training and validation samples are also marked by the colors of the boxplots. Generally, high median accuracies (above 85%) and a low level of variation (less than 15%) were noted for communities for which a representative number of training and validation pixels (200/400) could be collected, such as Carici (rigidae)-Festucetum airoidis in subalpine form, Carici (rigidae)-Nardetum, Rhizocarpion alpicolae, Calamagrostion, Oxycocco-Sphagnetea nigrae, and Molinia caerulea. However, high accuracy did not always go hand in hand with a larger training set: Carici (rigidae)-Festucetum airoidis in alpine form, represented by 100 training pixels, reached 96% PA and 100% UA, and Empetro-Vaccinietum, represented by 150 pixels, reached about 91% PA and 90% UA, while accuracies for Crepido-Calamagrostietum villosae and Vaccinium myrtillus, with 200 training pixels, oscillated around 75%. Nevertheless, a relationship between the size of the training set and the width of the distribution was noticeable. For Calluna vulgaris and Salicetum lapponum, the sets of training and validation pixels (50/100) were too small to achieve high accuracies, and the distributions of their PA values were wide. The Artemisietea vulgaris and Cardamino-Montion classes had very few training (30) and validation (60) pixels, resulting in lower PA and higher variation of this accuracy. For Adenostyletum alliariae, the smallest number of training and validation samples was collected (10 and 20), which caused the widest distribution of both accuracy values.

Discussion
The comparison reported by the team of Marcinkowska-Ochtyra [26] presents the OA as well as PA and UA for the same 22 vegetation communities of the whole Giant Mountains area above 1200 m a.s.l., classified using SVM and the Random Forest (RF) classifier. The overall accuracies were almost the same (84.62% and 84.87% OA, respectively). The difference lay in the median values obtained iteratively: for the worst classified communities, represented by smaller training samples (Salicetum lapponum, Adenostyletum alliariae and Calluna vulgaris), the median PA for RF was much lower than for SVM (11%, 25% and 23%, respectively), with a level of dispersion similar to that of SVM. This is consistent with comparative studies using SVM, RF and Maximum Likelihood to classify herbaceous vegetation [8], whose authors also found that SVM is not sensitive to the training sample size, which makes it useful when only a limited number of training pixels is available. The accuracies for vegetation community classes of the Polish Tatras, obtained from DAIS7915 data (Digital Airborne Imaging Spectrometer, Geophysical and Environmental Research, New York, US; 79 spectral bands in 400-12,600 nm) acquired in August 2002 and classified with the Fuzzy ARTMap classifier [6], were comparable to those for Giant Mountains vegetation: PA above 90% for, among others, mountain grasslands and Empetro-Vaccinietum, and UA below 70% for willow thicket, the Chamaenerion angustifolium-Salix silesiaca community and Calamagrostietum villosae tatricum in the Tatras, which are physiognomically comparable to Salicetum lapponum and Crepido-Calamagrostietum villosae in the Giant Mountains. For Vaccinium myrtillus in a complex with tall herb communities in the Tatra Mountains, a PA of 72% and a UA of 78% were achieved; in the Giant Mountains, Vaccinium myrtillus, which also co-occurs with grasslands, reached a very similar PA and UA of 74% each.
Complementary studies with hyperspectral data were provided by Kupková et al. [9], where APEX and AISA Dual data (Airborne Imaging Spectrometer, 494 spectral bands in 400-2300 nm) were used to classify 11 vegetation classes in the Eastern Tundra of the Krkonoše Mountains, which provided OAs very similar to our results (83% and 84%, respectively, based on SVM, and 69% and 73% based on ANN). For individual classes, PA above 90% was obtained for, among others, block fields and anthropogenic areas and Pinus mugo scrubs, and below 70% for Calamagrostis villosa stands, wetlands and peat bogs, and Molinia caerulea stands, which is confirmed in this work; however, a higher PA was found for alpine heathlands in the analysis of the Czech part of the Giant Mountains (90% on the APEX image and 82% on the AISA image).
There are classifications by other authors using APEX data, relating to five tree species in Forêt de Hardt, France (OA 74%) [18] and the Polish Karkonosze (OA 68% and 79%) [19,59]. Preliminary classifications of Western Karkonosze non-forest communities on raw APEX data achieved approximately 75% OA [32]. Classifications of subalpine and alpine vegetation around the Mały Staw lake in the Eastern Karkonosze Mountains achieved an accuracy of around 75% [34]. Nine types of vegetation of the Karkonosze National Park were classified at 78% OA based on simulated EnMAP (Kayser-Threde GmbH, Munich, Germany) satellite data, and the results are similar to those obtained in this work: large, homogeneous classes were classified better, e.g., dwarf pine thickets (over 90% PA and UA) and grasslands (approximately 90% PA and UA). Generally, the majority of classes were classified at PA and UA levels above 70%, while the lowest PA was for the herbaceous community class (44%), consisting of heather and Vaccinium communities. These analyses are particularly important in view of the EnMAP launch, with data to be made available free of charge from 2019. This will open up new possibilities for extracting information on the environment in its broadest sense.
In the literature, the radial kernel function is the most commonly used [23,60,61]. In our research, it gave the best results with PCA-transformed bands and the second-best accuracy with the 252 original spectral bands. In some cases this function is not the best choice: when the number of variables is very large, there is no need to transform the space into a higher dimension, and the linear function is appropriate. We presented this approach in previous studies of Polish Karkonosze vegetation classification, where the linear function with original spectral bands proved to be the best combination [34].
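For reference, the two kernels compared here are usually written as shown below. These are the standard textbook forms rather than expressions taken from this study: the symbols $\mathbf{x}_i$, $\mathbf{x}_j$ (pixel spectra) and $\gamma$ (kernel width) are notation assumed for this note and may differ from the notation used elsewhere in the paper.

```latex
K_{\mathrm{linear}}(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\top}\mathbf{x}_j ,
\qquad
K_{\mathrm{radial}}(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\gamma \,\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}\right), \quad \gamma > 0 .
```

The linear kernel works directly in the original band space, whereas the radial kernel implicitly maps the data to a higher-dimensional space, which is the distinction drawn above.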
Hyperspectral data provide greater possibilities for analyzing a mosaic of species forming a diverse community. Comparative analyses of classification results obtained from original and spectrally reduced hyperspectral data show higher accuracies for the transformed data [7,8,16,17]. Limiting the size of the resultant file optimises the method for further applications.
The sensitivity analysis of the classification to the training dataset showed that the accuracy of vegetation classification increases with the size of the training set, which has been confirmed by other authors [7,8,24,62]. The random selection of training and validation pixels made the results more objective. Thanks to the iterative accuracy assessment method, it was possible to obtain 100 accuracy metrics for each individual class, which indicated the accuracy of randomly selected pixels for communities of similar assumed values. In the literature, this approach has appeared only recently, with authors classifying tree species [16,17,19], which are more spectrally pure than other plant communities. It should be underlined that this method has not yet been applied to many classes of complex plant communities.

Difficulties in Classification of Mountain Vegetation Communities
The methodological problems of mapping mountain vegetation stem mainly from the specific nature of mountain vegetation cover, which is made up of small and spatially diverse communities [63]. We aggregated some classes due to their spectral similarity. From the aerial level it is not possible to detect the differences between Pinetum mugo sudeticum and Pino mugo-Sphagnetum. Despite their different structures, Pinetum mugo sudeticum and Calamagrostio villosae-Piceetum are also spectrally similar. Mapping communities near the upper forest border was difficult due to the smooth boundaries of classes mixing with particular trees. Because Calamagrostio villosae-Piceetum does not de facto belong to the non-forest communities, it was found on the Wojtuń and Żołnierz (2002) map only at the borders of other communities. LiDAR data could be helpful in distinguishing trees, thanks to information on altitude.
We also noted confusion between co-occurring classes, such as Vaccinium myrtillus with heathers or Crepido-Calamagrostietum villosae, because they occur in small, dispersed patches, most often forming complexes. Crepido-Calamagrostietum villosae was confused with Deschampsia caespitosa due to their spectral similarity. Vaccinium myrtillus was also confused with grasslands, and less often with Athyrietum distentifolii, encroaching on areas formerly occupied by ferns. Calluna vulgaris was most often confused with Vaccinium myrtillus, due to its co-occurrence in small patches with subalpine Carici (rigidae)-Festucetum airoidis dominated by heather, and also with Athyrietum distentifolii. Oxycocco-Sphagnetea nigrae, classified with high accuracy, is unique to the Giant Mountains and stands out spectrally from among homogeneous areas. Scheuchzerio-Caricetea nigrae were identified in the field as occurring alongside Deschampsia caespitosa, and Crepido-Calamagrostietum villosae communities encroached on them, making the class less distinguishable. In the eastern part of the Giant Mountains, these fens were replaced by bogs of the Oxycocco-Sphagnetea nigrae class.
Finding representative samples of Salicetum lapponum in the field was difficult, as they often occupy areas of no more than nine square meters, and some occur on steep, shady slopes. Therefore, they were represented by a smaller set of training (50) and validation (100) pixels, resulting in the lowest median PA and UA and high variation within this class. Cardamino-Montion was also difficult to identify because it occurs in linear strips on the steep slopes of postglacial corries near water seepages; therefore, the numbers of training and validation pixels were only 30 and 60, and the variation of PA and UA was high.
In general, PA and UA for each class were similar, but for two classes PA was lower than UA (Pinetum mugo sudeticum and areas without vegetation) and they were underestimated. Higher values of the reference communities have been correctly identified as these classes than in the case of their actual representation. Also, four classes were classified at a higher UA than PA level (Artemisietea vulgaris, Calluna vulgaris, Adenostyletum alliariae and Salicetum lapponum) and they were overestimated. Higher percent values of the areas identified as these classes were actually found. However, they were represented by few training and validation pixels (100/200 or fewer; for Adenostyletum alliariae only 10/20), and fitting to the reference data was more difficult than for a more representative dataset. The Artemisietea vulgaris class achieved a lower PA (75%) because it had undergone dynamic changes and its current coverage did not coincide with that of 2002, making it difficult to determine appropriate patterns. It was most often confused with Crepido-Calamagrostietum villosae. However, the fit for the pixels classified to this class was high (90% UA), due to its lower spectral reflectance in the red range and much higher reflectance in the near infrared, clearly distinguishing it from the surrounding Deschampsia caespitosa. Salicetum lapponum, with a lower PA, was incorrectly identified in places where Pado-Sorbetum was found.
Difficulties in the classification of some communities were also caused by the different acquisition dates of the data used, which affected the accuracy results. The non-forest vegetation reference map was ten years older than the APEX image. Ruderal vegetation, which is related to human activity, proved to be one of the most dynamic. During the field inventory in 2013, Peucedanum ostruthium occupied small patches, smaller than half the size of the APEX pixel. In 2014, it was not possible to find even a single individual. On the reference map, the Athyrietum distentifolii community occupies several major patches, mainly in the western part of the Karkonosze and to a lesser extent in the eastern part. Since 2010, dieback of the fern and encroachment of grassland or shrub communities in these areas have been observed in the Polish part of the Giant Mountains. The discrepancies between the acquisition dates of the reference and hyperspectral data strongly affect the accuracy of mapping fern communities [32], and only field validation in 2013 and 2014 made it possible to observe the disappearance of the community in places of potential occurrence. Communities that have changed in the past decade needed to be inspected in the field in a phenological period close to the date of the imaging. In September, when the APEX data were acquired, vegetation begins to change color, which allows it to be differentiated well. For example, Juncus trifidus, which belongs to the alpine grasslands, has a reddish color in late summer, which gives it a spectral signature different from that of other species. Carici (rigidae)-Nardetum communities cover extensive homogeneous areas and are dominated by one species, Nardus stricta, which stands out from the surrounding thickets of dwarf pine or Vaccinium myrtillus through its much higher reflectance in the visible bands. Using an image from spring or summer might not allow an unambiguous distinction between communities that are spectrally similar during their optimal vegetation phase.
One problem associated with mapping mountain areas is topographical shadows on the images. A shadow mask excluded such areas from the analysis; otherwise, classes would have been erroneously identified. The application of masks likely excluded some of the sites representing Umbilicarion cylindricae, which typically grows in the shadows of rocks not covered by snow.
Some phenomena are similar in different parts of the Giant Mountains. In both the Polish and Czech parts, the spread of expansive Calamagrostis villosa is observed. Some different phenomena are also observed: in the Polish part, the dying of the Athyrietum distentifolii community, and in the Czech part, the encroachment of the invasive species Rumex alpinus and expansive Molinia caerulea. Our results might help in identifying them and support the management of national park areas.

Conclusions
In summary, it can be stated that: 1.
The high spatial and spectral resolution of APEX data and the use of SVM allowed vegetation communities to be classified with high accuracy.
Iterative random selection of training and validation samples made it possible to avoid the subjectivity of a single selection of data. The most varied values were obtained for classes represented by smaller training sets and for heterogeneous classes that are difficult to identify due to their small spatial extent (less than nine square metres) and location in shaded areas (Adenostyletum alliariae, Salicetum lapponum). Higher accuracies were achieved for classes consisting of more training and validation pixels, such as mountain grasslands, Rhizocarpion alpicolae, Calamagrostion and Molinia caerulea.

2.
The PCA reduction allowed us to keep all the information and accelerate the classification process.
Increasing the size of the training set resulted in higher OA because of a more representative dataset (the highest accuracy at 300 training pixels).

3.
The extent of most plant communities of the Giant Mountains was similar to that on the reference map from 2002. However, we underline the importance of up-to-date data. Anthropogenic communities are subject to dynamic changes, and Athyrietum distentifolii suffers damage from pest outbreaks, which can be observed within a few years. Some classes were aggregated into larger ones (Artemisietea vulgaris, Pinetum mugo sudeticum, areas without vegetation) due to spectral similarities.
The map of high-mountain plant communities, developed using the same repeatable method for the Giant Mountains, may support monitoring works in both national parks. Together with the results collected in previous studies based on APEX data for the Polish and Czech parts [32,34,35], this work provides objective material for the management of the whole transboundary area. In terms of vegetation synanthropisation, the results can be used for analysis of the encroachment of invasive or expansive species. Differences between communities caused by phenology are highlighted because the date of APEX data acquisition was suitable for mapping specific communities for nature conservation purposes (such as Juncus trifidus in grassland communities). The iterative accuracy assessment may be useful in detailed analyses of each community, since the widths of the accuracy distributions give information about how reliably each community can be identified.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-4292/10/4/570/s1, Figure S1: Increased accuracy of classification caused by the increase in the number of training pixels, Table S1: Error matrix for the best iteration of 40 PCA bands, in %.

Figure 2. Mosaics of diverse vegetation communities in western (a) and eastern (b) Giant Mountains.

Figure 3. Hyperplane and parallel H1 and H2 hyperplanes that separate classes while maintaining the maximum margin [53], modified.

Table 1. The number of samples used for classification and validation.

Table 2. Overall accuracies obtained for different datasets and two kernel functions. PCA: Principal Component Analysis; MNF: Minimum Noise Fraction.