1. Introduction
Non-forest communities such as grasslands and meadows are recognized as the most species-rich plant assemblages, hosting numerous rare and endangered species. Increasing degradation of grassland and meadow communities has been reported recently by many authors [1,2,3]. The main reasons cited for this phenomenon are the abandonment of these habitats or the intensification of management. One manifestation of the unfavourable changes in grassland and meadow communities, including those important for biodiversity conservation, is the encroachment of expansive species, which can dominate the community and considerably limit species diversity.
Calamagrostis epigejos and
Molinia caerulea are listed among expansive species of global importance due to their colonization of various ecosystems in Europe and North America, causing grassland and meadow degradation [
4,
5,
6,
7].
Research on the encroachment of alien invasive species into non-forest habitats is widespread, but a large increase in native expansive species has also been observed in many patches of non-forest communities that lack proper management. Preserving species-rich non-forest communities requires monitoring their conservation status, in particular to detect any proliferation of undesirable species.
The fast and effective detection and mapping of invasive alien plants, and similarly native expansive ones, at different spatial scales is becoming increasingly important for their management [
8,
9,
10]. Also monitoring the threat caused by invasive and expansive species in natural habitats is essential for the process of the proper preservation of these habitats.
The application of hyperspectral and ALS (Airborne Laser Scanning) remote sensing data is a method complementary to traditional field surveys, which additionally allows coverage of large areas [
11]. It is probable that every plant species has a feature or a set of features which can be used for its spectral identification. To achieve the expected result, it is important to determine the appropriate time of spectral data acquisition, when this feature is most visible and at the same time distinguishes the species from others.
Commonly, methods for mapping plant species are based on the subjective assessment of an expert made in the field using the spot-map or line-transect methods. These are recognized scientific methods, but mapping conducted in the field may be fraught with human error. Their disadvantages are that the researcher is not able to explore the research area in detail and that the map of a larger area has to be interpolated from points collected in the field, which makes the results imprecise. Remote sensing methods are more objective because even if some errors in field data occur, the whole area is mapped in the same way. In some cases, the representativeness of samples used for classification can be assessed by experts after obtaining the first classification results or using dominance profile graphs [
12]. Nevertheless, remote sensing data ensure measurability and verifiability, so they are reliable and, equally important, reproducible.
Remote sensing offers many possibilities for vegetation research, from condition analysis [
13,
14,
15,
16,
17] to land use/land cover mapping including plant species or community identification [
18,
19,
20]. The electromagnetic spectrum covering the visible (VIS) and near infrared (NIR) ranges is the most commonly used one for the analysis of vegetation [
21]. Depending on the scale of the study and the available resolution of remote sensing data, it is possible to identify plant units at various levels: vegetation types, habitats, communities or species. Data from broad-band multispectral scanners have successfully been used to classify land cover [
22,
23,
24] or vegetation types [
25,
26]. For more complicated and complex units, such as habitats or plant communities, a higher spectral resolution is needed to capture larger differences between them, and depending on the size of the unit, satellite or aerial data may be used, which is related to the size of the pixel [
19,
27]. Hyperspectral data consisting of hundreds of narrow spectral bands provides detailed information about analysed objects [
28]. This is a big advantage over the more common multispectral data, in which the characteristics of these objects are generalized into several broad spectral bands. What matters is the possibility of differentiating the analysed objects from the background. For the identification of particular species, the most suitable method involves airborne imaging spectroscopy data: their hundreds of spectral bands allow the spectral signatures of particular plants to be detected against the background of surrounding vegetation, and their high spatial resolution provides the detail needed for patch identification [18]. Because of these hundreds of bands, hyperspectral data processing is more challenging and demands more storage. To make hyperspectral data processing more operational, different transformation approaches are used, most commonly Principal Component Analysis (PCA) [
29] or Minimum Noise Fraction (MNF) [
30].
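As an illustration of the dimensionality reduction mentioned above, the following minimal Python sketch applies PCA to a hyperspectral cube with NumPy. It is not the processing chain used in this study: the function name pca_reduce, the (rows, cols, bands) array layout and the number of retained components are assumptions made for the example. MNF differs in that it first whitens the data using an estimated noise covariance, a step that is not shown here.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project a (rows, cols, bands) cube onto its leading principal components.

    Hypothetical helper for illustration; returns the transformed cube and
    the variance (eigenvalue) carried by each retained component.
    """
    rows, cols, bands = cube.shape
    # Treat every pixel as one observation with `bands` variables.
    pixels = cube.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # Band-to-band covariance matrix (bands x bands).
    covariance = np.cov(centered, rowvar=False)
    # eigh is appropriate because the covariance matrix is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # Sort components by decreasing variance and keep the first n_components.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    scores = centered @ eigenvectors[:, order]
    return scores.reshape(rows, cols, n_components), eigenvalues[order]
```

The retained components can then be stacked with other raster layers before classification; how many components to keep is a choice that depends on the data.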
Different classifiers are used for plant identification, the choice of which is related to the remote sensing data type mentioned above, as well as the scope of the study. Traditional classifiers are used to classify more general units, such as land cover, which includes vegetation cover mapping [
31]. Often, remote sensing vegetation indicators, such as normalized difference vegetation index (NDVI), are included in the classification, especially in multi-temporal analyses, insofar as they are good indicators for reflecting periodically dynamic changes of vegetation groups [
32,
33]. Machine learning methods, such as Random Forest and Support Vector Machines (SVM), are successfully used to identify both communities and species [
19,
20,
34,
35]. Comparative analyses of methods used to classify particular species reported higher accuracies for the Random Forest algorithm [36,37,38], which also has an advantage over SVM in terms of processing time.
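The NDVI mentioned above is computed from red and near-infrared reflectance as (NIR - Red) / (NIR + Red). A minimal sketch, assuming the two bands are available as NumPy arrays of the same shape (the function name ndvi and the handling of zero denominators are choices made for this example, not taken from the cited studies):

```python
import numpy as np


def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for two reflectance arrays.

    Pixels where NIR + Red equals zero are set to NaN instead of raising
    a division warning (an assumption made for this sketch).
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    result = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=result, where=total != 0)
    return result
```

In a multi-temporal analysis the same function would be applied to each acquisition date, and the resulting layers used as additional classification features.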
In the literature, there are many examples of applications of hyperspectral remote sensing for species detection, a significant part being devoted to the identification of tree species [
38,
39,
40]. A large group consists of classifications of invasive plants that pose a threat to native vegetation [
18,
41,
42,
43]. There are few studies using remote sensing techniques to identify expansive species that, although native, also threaten natural habitats. The spread of C. epigejos and M. caerulea has been analysed using statistical methods [
44,
45]. Several authors used hyperspectral images to identify particular species encroaching into heathlands:
The classification of M. caerulea entering Natura 2000 habitats was addressed by Mücher et al. [46] in the areas of Ederheide and Ginkelse heide in the Netherlands and by Haest et al. [47] in Kalmthoutse Heide in Belgium.
C. epigejos was mentioned by Schmidt et al. [
48] in Oranienbaum Heath, located near Dessau in the Elbe-Mulde lowland in Saxony-Anhalt, Germany, but only as a species encroaching into heathlands, which were the main object of that study. The specific features of this species have not been studied in more depth. The use of narrowband images to separate spectrally similar classes such as grassland types is supported by Ali et al. [49], who underline the possibility of obtaining such images from the airborne level, given the lack of spaceborne hyperspectral sensors currently in orbit and the higher spatial resolution of airborne data. According to Mücher et al. [46], collecting more images over the growing season and incorporating vegetation information from LiDAR data might help to differentiate grasses. Multi-temporal analyses using satellite data were used by researchers to identify invasive species [
50,
51] or grasslands [
33,
52]. The separability of grassland classes was supported by the characteristic phenological development of individual habitat classes, such as greenness or colouring in the blooming phase, using optical RapidEye data, and by vegetation height and structure from radar backscatter using TerraSAR-X data [
33]. Seasonal effect on tree species classification was also analysed based on hyperspectral Airborne Imaging Spectrometer for Applications and LiDAR data [
53]. LiDAR data are known as being useful for mapping canopy structure, but rarely as an alternative to the imaging vegetation classification method [
54]. They are more commonly used together with other types of data from spaceborne [
55] or airborne [
56] levels. While LiDAR with hyperspectral data were most commonly used in classification of trees [
39,
57,
58] or shrubs [
59], several studies of non-forest vegetation [
60] including also only LiDAR [
54,
61] were also presented. Separating higher vegetation such as trees or shrubs from lower vegetation [62] is much easier than capturing the smaller differences among low-growing species, but in grassland mapping of the lowland hay meadows Natura 2000 area [54], the strong potential of vegetation height-dependent variables was noted. The use of only LiDAR derivatives, such as terrain height models, for predictive modelling of the locations of non-forest communities was presented by Ward et al. [
63]. However, other authors by adding passive optical data to LiDAR data obtained higher classification accuracy [
64,
65] in vegetation analysis, owing to the possibility of differentiating features such as lignin composition, senescent matter or soil presence [66]. The combination of height and spectral data could provide complementary information about the study object and make the most of the strengths of both [
65,
67]. The vast majority of classifications use a single split into training and validation data, which can induce biased results [
68]. Iterative accuracy assessment was proposed by several authors in the classification of trees [
39,
40,
69] and non-forest vegetation [
20,
27]. This approach allows for more objective conclusions, avoiding very poor or very good results obtained by chance. It is important especially when comparing different scenarios, datasets, or classifiers.
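The iterative accuracy assessment described above can be sketched as repeated random splits of the reference samples, with Cohen's Kappa computed on the held-out part each time. The following Python example is a simplified illustration, not the procedure of any cited study: the function names, the 50/50 split, the unstratified sampling and the toy one-dimensional threshold classifier (a stand-in for Random Forest or SVM) are all assumptions.

```python
import random
import statistics


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: (observed - expected agreement) / (1 - expected)."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        return float("nan")  # undefined when only one class is present
    return (observed - expected) / (1.0 - expected)


def threshold_classifier(train_x, train_y, test_x):
    """Toy stand-in classifier: split 1-D values at the midpoint of class means."""
    mean0 = statistics.mean(x for x, y in zip(train_x, train_y) if y == 0)
    mean1 = statistics.mean(x for x, y in zip(train_x, train_y) if y == 1)
    midpoint = (mean0 + mean1) / 2
    return [int((x > midpoint) == (mean1 > mean0)) for x in test_x]


def iterative_kappa(samples, labels, train_and_predict,
                    n_iter=50, train_fraction=0.5, seed=0):
    """Repeat a random train/validation split and collect Kappa per iteration."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    cut = int(len(indices) * train_fraction)
    kappas = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        train_idx, test_idx = indices[:cut], indices[cut:]
        predicted = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        kappas.append(cohen_kappa([labels[i] for i in test_idx], predicted))
    return kappas
```

The returned list can be summarized with statistics.median and plotted as a boxplot per scenario, so that scenarios are compared on the distribution of results and not on a single, possibly lucky, split.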
Objectives
Although characteristic features of the targeted species, such as phenology, flower colour, physiological traits and growth form, have been used for their detection in a number of studies, optimal methodologies still remain to be defined. The literature lacks studies combining hyperspectral and LiDAR data for grassland species identification, so this gap remains to be addressed. The objective of this study is therefore to investigate the use of HySpex data and LiDAR products to classify two expansive grass species, C. epigejos and M. caerulea, in Natura 2000 habitats. Both have been recognized as aggressive competitors with a tendency to dominate the plant community, causing negative changes in its structure and species composition. More specifically, the investigation aims to:
compare different times of airborne data acquisition, depending on the growth phase of the analysed species, to indicate the optimal time for detecting each species,
collate different datasets containing spectral data and additional vegetation and height layers to choose the optimal dataset for detecting these species.
The presented approach compares these elements with respect to the maximum classification accuracy reached and to botanical assessment, in order to select the optimal data for monitoring Natura 2000 areas.
3. Results
The Kappa accuracy obtained for the classification in each campaign is presented in boxplots. Datasets between which the differences were not statistically significant are listed in Table S2 in the Supplementary Materials. Because sc01 contains a mosaic of all spectral bands, the differences between it and the other scenarios, in which MNF channels were used, are significant. For Molinia in June, the highest Kappa accuracy (around 0.83) was obtained for the group of sets that used MNF transforms with full-waveform LiDAR products, as well as for MNF with all additional layers (sc09); a slightly lower accuracy (approx. 0.81) was obtained for the pairs MNF + topographic indices and MNF + discrete LiDAR data. For the group in which MNF alone and MNF bands with VIS and CHM were used, the median Kappa was lower (0.79), and these scenarios showed the most dispersed values over the 50 iterations. In August, the accuracy for Molinia was more stable and reached the highest level. Aside from sc01, all medians were higher than 0.8. The best set included MNF with topographic indices and full-waveform data, whereas the lowest dispersion of results was obtained for MNF with all additional layers (sc09). The scenario containing the MNF bands alone was not statistically different from the other MNF scenarios, but it had the highest dispersion. In September, the differences between datasets for Molinia were much larger than in August. The highest accuracy was obtained for MNF, MNF + CHM and MNF with all additional layers, but the lowest dispersion was observed for sc03. The pair of datasets with discrete data stands out in particular, with the lowest values and the highest dispersion. For sc01, the median Kappa was the lowest (0.69).
Accuracy for the species and the background was also calculated for the result obtained in the best iteration. It is a combination of producer and user accuracy, referred to as the F1 value. For M. caerulea (Table 5), this accuracy confirms the Kappa results: the lowest values were obtained for June. The best dataset in June was sc08, where F1 reached 0.87; the lowest value was for sc01 (0.72). In general, F1 accuracies were the best in August: most of the median values for the scenarios were 0.86, the highest value was 0.89 for sc09, and even sc01, consisting of a mosaic of spectral data, reached a high value of 0.84. September was second in terms of class accuracies; here the best was also sc09 with a value of 0.88, followed by sc04 and sc05 with 0.87. A slightly lower value was noticeable for sc08 (0.82).
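For reference, the F1 value combines producer accuracy (PA, the share of reference pixels of a class that were correctly classified) and user accuracy (UA, the share of pixels assigned to a class that truly belong to it) as their harmonic mean, F1 = 2 · PA · UA / (PA + UA). The Python sketch below assumes this standard definition and a confusion matrix laid out with reference classes in rows and predicted classes in columns; the function names and the example counts in the usage note are illustrative only and are not values from this study.

```python
def class_accuracies(confusion, class_index):
    """Return (producer accuracy, user accuracy) for one class.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j (layout assumed for this sketch).
    """
    correct = confusion[class_index][class_index]
    reference_total = sum(confusion[class_index])
    predicted_total = sum(row[class_index] for row in confusion)
    producer = correct / reference_total if reference_total else 0.0
    user = correct / predicted_total if predicted_total else 0.0
    return producer, user


def f1_from_accuracies(producer, user):
    """Harmonic mean of producer and user accuracy."""
    if producer + user == 0:
        return 0.0
    return 2 * producer * user / (producer + user)
```

For a hypothetical matrix [[80, 20], [10, 90]], class 0 has PA = 80/100 and UA = 80/90, so the F1 value equals 160/190. Because the harmonic mean is pulled towards the lower of the two values, a class cannot reach a high F1 when it is strongly over- or underestimated.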
For C. epigejos in June, the median Kappa values were between 0.4 and 0.6; the highest were for MNF transforms with full-waveform and discrete LiDAR rasters, and therefore also for all rasters in sc09. The mosaic of spectral bands and MNF with topographic indices gave the worst results. A similar situation applied for August, but the accuracies were slightly higher for each dataset; the highest median values were for sc08 and sc09, but the lowest dispersion was for MNF with discrete and full-waveform LiDAR data. For September, the values were the most stable, and the best dataset contained discrete LiDAR data. In each campaign, information on the intensity and structure of vegetation was essential for C. epigejos and improved accuracy. For June and August, the highest accuracy was obtained with full-waveform data; in September, due to the lack of full-waveform data, the highest accuracy was obtained with discrete data.
F1 values for C. epigejos also confirm the Kappa accuracies for each campaign and dataset (Table 6). The lowest values were for June, where the best dataset was sc09 (0.67) and the worst was sc01 (0.54), with sc03 similar (0.56). The same pattern was observed for August, with generally slightly higher values: the best was sc09 (0.7) and the worst were sc01 (0.6) and sc02 (0.61). C. epigejos was classified most accurately in September; however, sc05 turned out to be the best set here.
4. Discussion
According to Landis et al. [95], Kappa values between 0.61 and 0.80 indicate substantial agreement, while values of 0.81 or more indicate almost perfect agreement of the classification. On this basis, it can be concluded that M. caerulea was classified very well on each date and in almost every scenario. For both the Kappa accuracy of the entire image and the F1 accuracy of individual classes, adding information about the vegetation structure from LiDAR improved the results. The difference between the use of discrete and full-waveform data was small but noticeable, which leads to the conclusion that the newer full-waveform technology, which provided amplitude and pulse width as extra bytes, performed better. Adding this information to the September data could improve accuracy.
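The interpretation scale referred to above can be written as a small lookup. The sketch below follows the commonly cited Landis and Koch bands (below 0 poor, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect). The function name is hypothetical, and assigning values that fall between two published bands (for example 0.805) to the higher band is an assumption of this example.

```python
def interpret_kappa(kappa):
    """Map a Kappa value to the Landis and Koch strength-of-agreement label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("Kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bounds of the published bands, checked in increasing order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Applied to the medians reported in the Results, 0.83 falls into "almost perfect" and 0.69 into "substantial".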
According to the visual interpretation and botanical evaluation, Molinia is most detectable in the flowering phase, which lasts from July to September and peaks in August. It should be noted that the cover of Molinia at most sites was high at that time (70%–80%) and co-occurrence with other species was rare. However, the species was occasionally overestimated, especially in the August and September results, with some artefacts, e.g., pathways in the south of the site, being classified as the species.
The usefulness of spectral information alone for discriminating M. caerulea is supported by other studies [
20,
68]. In classification of Natura 2000 heathlands in Kalmthoutse Heide in Belgium [
69] the best mean accuracies were obtained for heathlands with
Molinia (80.7% using SVM classifier, 69.7% using RF) on CHRIS data from July. Marcinkowska-Ochtyra et al. [
20] presented a classification of the M. caerulea community in the Giant Mountains in Poland/Czech Republic, with a PA of 90.3%, on APEX data from September using SVM. These analyses did not aim to compare different growth stages of the species and were conducted on data acquired once, but in each case Molinia was well classified. In another Natura 2000 heathland area, the Dutch Ederheide and Ginkelse heide [
46],
M. caerulea encroachment abundance was estimated using AHS-160 (Airborne Hyperspectral Scanner) data and Spectral Mixture Analysis (SMA), showing a correlation with field estimates of R² = 0.48. The authors pointed out that the time of acquisition (October) was unsuitable and did not allow for distinguishing
Deschampsia flexuosa from
M. caerulea. Molinion caeruleae was one of the best classified plant associations in the classification of grassland habitats in Döberitzer Heide, west of Berlin, reaching an F1 accuracy of 96% owing to its dominant yellow colour in autumn on RapidEye data and the fact that its flowering phase is later than that of any other species [
33]. In the study presented here, it should be noted that the last acquisition of airborne data took place at the beginning of September, when M. caerulea had not yet turned yellow, so according to Schuster et al. [33] the accuracies could be considerably better if data were collected closer to the end of September. High accuracy was obtained for intra-annual time series of RapidEye and TerraSAR-X data [33], so adding vegetation structure information from radar data was useful. In a grassland classification based only on LiDAR full-waveform data in the Natura 2000 site in Sopron, Hungary [
54],
M. caerulea reached UA and PA between 60% and 80%, the authors found it overestimated but this was caused by the flight strip edges on Echo Width. Comparative tests performed in the frame of study presented here used only LiDAR data for
M. caerulea classification [
96]. In the first test, full-waveform data were used for the best classified date obtained previously (August); the median Kappa from 5 iterations was 42.7% and F1 was 47.4% (PA and UA between 35% and 74%). The second test used only discrete data for September and reached a median Kappa of 32.7% and a median F1 of 34% (PA and UA between 25% and 77%). Because the differences between the datasets with MNF transforms and those tested here are significant, this underlines the importance of spectral information in discriminating the species. This view is supported by Debes et al. [64] and Luo et al. [65], who observed improved accuracy and efficiency of LiDAR applications after incorporating passive optical remote sensing data.
The results achieved correspond to the actual state of affairs, as confirmed by traditional field research. Because only polygons with more than 40% species cover were used for training and validation, patches with M. caerulea cover lower than 30% were poorly detected, but one should bear in mind that the species is a characteristic component of native vegetation (a characteristic species of Molinion caeruleae wet meadow communities). From the point of view of protecting these habitats, what matters is an increase in the cover of this species above the 50% level, which can indicate progressing unfavourable changes.
The range for Kappa values for
C. epigejos could be interpreted as moderate in June [
95] for all scenarios and in August for all except sc08 and sc09, and as substantial in September, excluding sc01 and MNF with topographic indices. For both Kappa and F1 values for this species, an increase in accuracy was observed when discrete and full-waveform data derived from LiDAR were used, which confirms that adding detailed information on the height and structure of vegetation is crucial in distinguishing this species from its background. Lower accuracies were also noticeable for MNF and MNF with topographic indices in each campaign, which confirms the expansive nature of this species.
C. epigejos blooms from June to August and is in the fruiting phase in September. In the botanical evaluation of the results, the distribution of the species is fairly well presented due to the peak phase of fruiting. In this case, Calamagrostis coverage of over 60% is well-detectable. The main co-existing species is Solidago spp., which blooms and fruits at a similar time as the C. epigejos but differs in colour and shape. However, since these two species mingle with each other, it is often difficult to identify each of them within a pixel with a resolution of 1 m.
In general, species of the genus Calamagrostis belong to the group of species that are difficult to identify with remote sensing techniques. This is supported by an unpublished analysis of
C. epigejos classified on September collections of HySpex data on two other Natura 2000 sites in Poland [
97], where Kappa accuracies were between 50% and 60% and F1 between 60% and 65%, influenced also by local conditions. A comparative test using only discrete LiDAR data for C. epigejos classification in Jaworzno Meadows was also conducted [96], giving a median Kappa accuracy of about 18% and a median F1 value of 31% (UA and PA between 22% and 52%). As in the M. caerulea test, this analysis showed that for discrimination of
C. epigejos the best method was the combination of spectral and height-dependent variables. However, higher accuracies were obtained using only hyperspectral data [
19,
20,
98]. In [19,20], the authors found that a plant community with Calamagrostis villosa in the subalpine part of the Giant Mountains was classified on APEX data from September with a PA of about 70%. They observed this species encroaching into lower parts of the upper forest border. A similar situation was found in a Tatra vegetation classification on DAIS 7915 hyperspectral data acquired in August, where the Calamagrostietum villosae community showed one of the most diverse results and the accuracy was also about 70% [98]. These analyses were carried out at the level of plant communities, whereas in the present study it is the species that is being classified, which requires a slightly different approach. Both areas in the cited literature were located in high mountains, where local conditions vary quite differently from those in the lowlands. In the literature there is a lack of detailed studies on the classification of C. epigejos, so the work presented here is the beginning of a discussion for further research.
Comparing the classification results for both species, the accuracies obtained for C. epigejos were worse than for M. caerulea. The reason is probably that C. epigejos has a wider ecological spectrum: in the analysed areas it appears in various habitat conditions, from humid to dry, with diversified cover, and it co-occurs with a substantial number of species with which it can be confused.
When transferring the method to other areas, ground data with specific characteristics should be collected (more than 40 percent species cover, information about co-occurring species, especially those visually similar to the analysed species), and the best time of acquisition for each species within the growing season should be kept in mind. The basis of the dataset is a hyperspectral mosaic transformed using a dimensionality reduction method such as MNF. Analysing a date of acquisition on which grassland species differ considerably from the background also requires LiDAR products, especially full-waveform or discrete rasters, to improve the accuracy levels. For the assessment of the hazard posed by expansive species, the method presented in this study delivers the expected results and can be recommended for supporting traditional botanical field methods.
5. Conclusions
The encroachment of expansive species into protected non-forest habitats is a very important ecological problem and it should be monitored. This study investigated the use of HySpex and LiDAR data for mapping the distribution of M. caerulea and C. epigejos, expansive grass species in the “Jaworzno Meadows” Natura 2000 site in Poland, at different growth stages. The species were classified using a Random Forest algorithm with 7–9 scenarios of different datasets consisting of spectral data, MNF transforms and MNF with LiDAR derivatives. The Kappa accuracy assessment was performed iteratively using 50 repetitions of the procedure, giving more objective results for the different scenarios on data collected in three months. Additionally, for the best results, F1 accuracies for the species and background classes were calculated. In each case the dataset containing the original spectral bands was the worst compared to the MNF transformation; only the F1 value for M. caerulea was slightly higher. In most cases the best dataset was sc09, consisting of MNF with all calculated LiDAR derivatives, but from an operational point of view it is not the optimal solution. Adding intensity and vegetation structure data from LiDAR improved the results, especially the full-waveform data from 2017 because of the more detailed information registered. The results show that the worst accuracies were obtained for the original data consisting of spectral bands.
The spectral characteristics of M. caerulea in the data used allowed for better recognition than those of C. epigejos, which co-occurs with a greater number of other species. For C. epigejos in particular, worse accuracies were observed for datasets with topographic indices, which confirms the expansive character of the species, as it occurs in both wet and dry habitats. This was not observed for M. caerulea, confirming its preference for wetter areas. Datasets with MNF and additional CHM or vegetation indices predominantly did not show statistically significant differences. The correctness of the result was influenced by the species cover in the polygon (the highest at the time of full flowering/fruiting), the maximum biomass and the co-occurrence of other species. The best time to identify C. epigejos was September (optimum fruit formation) and for M. caerulea it was August. The flowering period was the recommended time to detect the species because of its high cover in polygons (80%–100%).
The next step of this research could be feature selection, indicating which features from the whole dataset allowed for the best classification results, and then selecting only these for further analysis. This could make the procedure of species identification even more operational. However, our results are valuable because they show the potential of each dataset used, tested on three acquisition dates, allowing for mapping of each species with relatively high accuracy. The results provide analytical potential for studies of expansive species and great support for field work.