1. Introduction
Mapping forest tree species composition is crucial for estimating ecosystem services, informing policy decisions, and supporting forest management and conservation. Forests are the predominant land cover type in terrestrial protected areas and cover more than 45% of Europe’s total area [1]. They provide critical ecosystem services, including habitat provisioning, erosion and flood control, and climate regulation through carbon sequestration [2,3,4,5]. The variety and magnitude of these services are influenced by forests’ tree species composition [6], since individual species vary in structural traits such as wood density, root depth, and water holding capacity [7,8,9,10]. Maps of tree species composition can be translated into maps of forest habitat types [11] as defined in the European Union’s (EU) Habitats Directive [12]. These are essential for monitoring the protected forest areas within the Natura 2000 network and support the implementation of the EU’s broader environmental policies such as the Biodiversity Strategy [13], the Forest Strategy [14], the Birds Directive [15] and the Nature Restoration Law [16]. Additionally, tree species maps enable detailed analyses of land cover change, habitat quality [3], and the effects of non-native tree species on ecosystem services [17], serving the needs of forest managers and conservationists.
Traditionally, tree species mapping has relied on labor- and time-intensive fieldwork [18]. This approach is expensive, spatially restricted, and often lacks standardization and external validation, limiting its usefulness in broader environmental monitoring strategies. Remote sensing (RS) images, on the other hand, provide a valuable tool for deriving spatially explicit information on forests, single trees, and tree species composition over large areas at high temporal frequencies. A wide variety of RS sensors and platforms is now used for tree species mapping at different scales [19,20], and these are often more cost- and time-efficient than lengthy field-based inventories [19].
The two main categories of satellite images used for tree species mapping are very high-resolution (VHR) and high-resolution (HR) images. VHR satellite images with pixel sizes below 1 × 1 m have been used to classify species at the scale of individual tree crowns (ITC) [21,22,23] and can be purchased at prices ranging from $3 to $48 per km² [24]. In contrast to VHR sensors, the HR sensors onboard the Sentinel-2 (S2) satellites provide freely available images with a pixel resolution between 10 m and 20 m and a very high temporal resolution (2–5 days). The Landsat satellite missions are another source of freely available HR images; however, they have lower spatial and temporal resolutions [25,26]. Overall, the accessibility of S2 data makes it more suitable than proprietary VHR data for developing standardized mapping methodologies to support European environmental policies [27].
The lower resolution of S2 data means that tree species mapping is conducted at the scale of tree groups/forest stands [11], rather than at the scale of individual tree crowns. The mapping is restricted to the pixel level, as the spectral signal per pixel is used to identify the dominant tree species in the canopy of that pixel [19,28,29]. For such an approach, the broader term “tree species mapping” is used [30,31], where the dominant tree species or, consequently, forest types (e.g., beech forest), are mapped. This contrasts with the term “tree species classification”, which refers to identification at the scale of individual trees.
Classical pixel-based machine learning (ML) classification has been the most widely used approach for tree species mapping with S2 data, owing to the sensor’s coarser resolution and the lack of spatial features such as tree crown texture. Deep learning has also been applied to S2 images to map tree species [29,32], or their proportions within a single pixel [33]. However, since training deep learning models is feasible only in studies with very large amounts of tree reference data [20,34], classical machine learning methods are still used [28,30,35,36,37,38,39,40,41,42]. While national inventories provide large amounts of forest reference data, these mainly represent the dominant tree species in the European context. In contrast, obtaining comparable amounts of data for less common species, such as those in riparian forests, remains a challenge. Consequently, most studies focus on dominant genera, such as spruce (Picea), larch (Larix), oak (Quercus), beech (Fagus), and fir (Abies), whereas the mapping of forest types defined by less frequent tree species is still limited [30].
The lower spatial and spectral resolutions of S2 data are compensated by their very high temporal resolution, with a 5-day revisit time at the Equator when both satellites are operating (Sentinel-2A was launched in 2015 and Sentinel-2B in 2017). Such dense time series enable observations of changes in the vegetation’s seasonal phenological cycles [43], which are distinct among many tree species [44,45]. However, cloud cover reduces the number of image dates available for analysis and can significantly influence the recorded phenological patterns [43,46]. Nevertheless, previous studies on tree species mapping have shown that the use of multitemporal images improves overall classification accuracy by 5–10% compared to a single-date image classification [28,35,36].
We identified three main approaches that use multitemporal data for tree species mapping, which are often grouped under the general term ‘time series classification’. These are: (i) combining a selection of multi-date images (time series) [28,30,35,36,47]; (ii) using seasonal composites calculated over two to four months [48,49]; and (iii) calculating long-term spectral-temporal variability metrics (STMs) [31,50,51]. Some studies have combined multitemporal approaches, for example by incorporating STMs with time series data [47]. Data cubes are another solution to the multitemporal classification challenge [30]; however, these are rarely used in the field of tree species mapping.
Multi-date classification, also referred to as time series classification, is one of the earliest multitemporal approaches explored for tree species mapping [26,52]. This method combines the band values from each single date in the chosen cloud-free image collection and feeds them to the classifier [28,36]. Usually, the dates are selected to represent different phenological periods [35]. Alternatively, some studies use time series interpolated at equal-day intervals over the whole year [53]. The multi-date approach does not require feature engineering, which makes it simple and intuitive to use. Nevertheless, its high dimensionality can lead to inefficiency and overfitting in machine learning models, a problem known as the curse of dimensionality [54]. Furthermore, since the feature variables depend entirely on the dates used, the model cannot be applied to other time periods. Thus, the transferability of the multi-date approach is limited. This can be counteracted with time series interpolation; however, interpolation enlarges the feature space even further [55].
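The stacking step of the multi-date approach can be illustrated with a minimal sketch. It assumes the cloud-free images are already co-registered and held in a NumPy array ordered as (dates, bands, rows, columns); the function name `stack_multidate`, the toy dimensions (20 dates with 12 features each), and the random values are illustrative assumptions, not the processing chain of any cited study.

```python
import numpy as np

def stack_multidate(cube):
    """Flatten a (dates, bands, rows, cols) cube into a (pixels, dates * bands) matrix."""
    n_dates, n_bands, n_rows, n_cols = cube.shape
    # Move the spatial axes to the front, then merge dates and bands into one feature axis.
    pixels_first = cube.transpose(2, 3, 0, 1)
    return pixels_first.reshape(n_rows * n_cols, n_dates * n_bands)

# Hypothetical toy cube: 20 dates, 12 features per date, 4 x 5 pixels.
rng = np.random.default_rng(0)
cube = rng.random((20, 12, 4, 5))
X_multidate = stack_multidate(cube)
print(X_multidate.shape)  # (20, 240): 20 pixels, one column per date-band pair
```

Each column is tied to one specific acquisition date, which illustrates why a model trained on such a matrix cannot be applied directly to a year with different image dates.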
Seasonal composite approaches use the mean or the median pixel values of the chosen season to summarize the time series data for that period [48,49,56]. Composites can focus on critical phenological periods (e.g., peak greenness in summer, senescence in autumn), which are most informative for vegetation classification or monitoring. The limitation of this approach lies in defining the seasons to be used and their exact time frames. This requires expert knowledge, as key phenological periods might differ among species. Furthermore, the start of a phenological period can differ among years due to climatic conditions, which can alter the statistical values of a fixed seasonal time frame. Another challenge is that seasonal boundaries are not universal: they vary across ecosystems, geographic regions, and climate zones. In general, seasonal composites can oversimplify the time series, and important temporal variability within the season might be lost.
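A seasonal mean composite can be sketched as follows. The season definitions (spring, summer, autumn by calendar month), the acquisition months, and the function name `seasonal_composites` are assumptions chosen for illustration; as noted above, the appropriate time frames depend on the species and region.

```python
import numpy as np

def seasonal_composites(cube, months, seasons):
    """Average a (dates, bands, rows, cols) cube within each season.

    months  : 1-D array holding the calendar month of every date
    seasons : dict mapping a season name to the months it covers
    """
    composites = []
    for name, season_months in seasons.items():
        in_season = np.isin(months, season_months)
        if not in_season.any():
            raise ValueError(f"no acquisitions available for season '{name}'")
        # nanmean skips pixels that are masked as NaN (e.g. clouds) on single dates.
        composites.append(np.nanmean(cube[in_season], axis=0))
    # Concatenate along the band axis: (seasons * bands, rows, cols).
    return np.concatenate(composites, axis=0)

# Hypothetical inputs: 20 dates, 12 features per date, 4 x 5 pixels.
rng = np.random.default_rng(0)
cube = rng.random((20, 12, 4, 5))
months = np.array([3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11])
seasons = {"spring": [3, 4, 5], "summer": [6, 7, 8], "autumn": [9, 10, 11]}
composites = seasonal_composites(cube, months, seasons)
print(composites.shape)  # (36, 4, 5)
```

Replacing `np.nanmean` with `np.nanmedian` gives the median composite. Shifting the month lists changes every feature value, which is the sensitivity to season definition discussed above.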
Instead of defining discrete seasonal periods, STMs aggregate spectral data over the whole year. This removes the need for expert knowledge on the seasonality of different species. However, variations in environmental and climatic conditions still need to be accounted for, especially if the classification is to be applied at larger spatial scales [25,55]. Furthermore, climatic conditions influence not only phenological traits but also data availability and consistency across regions. Frantz et al. [57] found that the multi-annual consistency of Landsat observations differs significantly across regions and that, for STM quality, the temporal distribution of cloud-free images is more important than the number of images.
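As a hedged illustration of how STMs summarize a full year, the sketch below computes six per-band statistics over the time axis. The particular set of metrics (mean, standard deviation, minimum, maximum, 25th and 75th percentiles) is an assumption made for the example; published studies use differing sets.

```python
import numpy as np

def spectral_temporal_metrics(cube):
    """Summarise a (dates, bands, rows, cols) cube over the whole year, per band and pixel."""
    metrics = [
        np.nanmean(cube, axis=0),
        np.nanstd(cube, axis=0),
        np.nanmin(cube, axis=0),
        np.nanmax(cube, axis=0),
        np.nanpercentile(cube, 25, axis=0),
        np.nanpercentile(cube, 75, axis=0),
    ]
    # Concatenate along the band axis: (metrics * bands, rows, cols).
    return np.concatenate(metrics, axis=0)

# Hypothetical toy cube: 20 dates, 12 features per date, 4 x 5 pixels.
rng = np.random.default_rng(0)
cube = rng.random((20, 12, 4, 5))
stm = spectral_temporal_metrics(cube)
print(stm.shape)  # (72, 4, 5)
```

The features no longer refer to individual dates, but their values still depend on how the cloud-free observations are distributed through the year, in line with the finding of Frantz et al. cited above.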
Although many studies have shown that the use of multitemporal data improves tree species mapping accuracies, there is a lack of research on how different multitemporal models perform compared to each other. This research gap limits our understanding of the advantages and disadvantages associated with different types of multitemporal data. It also obscures the reasons why researchers choose one multitemporal model over the others. A comparison of multitemporal approaches is needed to assess their transferability, support the standardization of the models, and thus advance the use of remote sensing methods for tree species mapping.
This study aims to compare three multitemporal S2 classification approaches (multi-date stacked image classification, seasonal mean statistics, and STMs) to map five native and two non-native riparian species within a Natura 2000 riparian forest area in Austria. The tree species classification is constrained to the dominant canopy species detectable at the S2 pixel resolution (10 × 10 m and 20 × 20 m). The multitemporal approaches are further compared to twenty single-date image classifications to test whether multitemporal data improve classification accuracy. Besides evaluating single-species classification accuracies, the binary classification of native versus non-native species groups is assessed. Finally, the potential application of the results for mapping the riparian forest habitats 91E0* and 91F0 under the EU Habitats Directive is discussed. The novelty of this study lies both in the comparison of the multitemporal approaches and in the mapping of less common EU tree species, whose distribution is important for riparian forest conservation.
4. Discussion
Similar to previous studies [28,30,35,36,37,38,39,40,41,42], the current research confirms that the use of multitemporal data improves random forest (RF) classification models for tree species mapping. In addition, we compared three multitemporal approaches within a small, protected riparian forest. The results showed no significant difference in accuracy among the three approaches. However, these findings are limited to the species investigated in this study and to the RF method. Future research should consider the growing potential of deep learning for tree species mapping [20]. A focus should also be placed on improving our understanding of different multitemporal approaches across multiple spatial scales and tree species.
The three multitemporal approaches compared in this study did not differ significantly in accuracy; however, they varied substantially in terms of transferability and computational requirements. For example, the multi-date classification used more than six times as many features (240 bands) as the seasonal classification (36 bands) and more than three times as many as the STMs classification (72 bands). In contrast to the multi-date classification, the seasonal and STMs classifications use band values aggregated over time. These approaches have the advantages of noise reduction, a smaller feature space, and lower data storage and processing requirements. Nevertheless, the robustness of these approaches under varying cloud-free image availability [43] needs further research to better assess their transferability. While the transferability of ML models for tree species mapping is rarely addressed, it is crucial for enabling RS applications to support monitoring for EU environmental policies.
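The feature-space comparison can be checked with simple arithmetic on the three band counts stated above; nothing beyond those counts is assumed.

```python
# Feature counts as reported for the three multitemporal approaches.
feature_counts = {"multi-date": 240, "seasonal": 36, "STMs": 72}

# Ratio of the multi-date feature space to each aggregated alternative.
ratios = {
    name: feature_counts["multi-date"] / count
    for name, count in feature_counts.items()
    if name != "multi-date"
}
for name, ratio in ratios.items():
    print(f"multi-date vs {name}: {ratio:.1f}x")
# multi-date vs seasonal: 6.7x
# multi-date vs STMs: 3.3x
```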
Studies focusing on common European tree species have reported overall accuracies above 80% [30,48,55], while this study achieved a maximum overall classification accuracy of 65%. However, the results of previous studies vary with species, and the classification of less common species remains challenging. In the current study, only spruce (Picea abies) and oak (Quercus robur) can be considered common European species, while the five other species belong to the less common genera Populus (poplar), Fraxinus (ash), Alnus (alder), and Salix (willow).
The F1 classification accuracies for Salix alba, Populus balsamifera, and Alnus incana exceeded values reported in previous work [28,30,55]. By contrast, the classification accuracy of Quercus robur, a common European species, was lower than in those studies. This is likely because those studies used oak-dominated forest stands as reference, while in this study, only data for single-planted oak trees dominant in 10 × 10 m pixels were available. Alnus glutinosa showed an accuracy of only 30–36%, potentially due to the quality of the reference data and the absence of topographic predictors commonly used elsewhere [28,30]. Fraxinus excelsior also had low accuracy (<39%), consistent with other studies, as it typically occurs as scattered individuals in mixed stands [30,70]. Overall, direct comparison with previous studies is not straightforward, as these classified a greater number of species over larger areas. Future work should extend to less common European species, especially those of ecological relevance and linked to Habitats Directive habitat types [71].
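For readers less familiar with the per-species F1 score used in this comparison, the following sketch derives precision, recall, and F1 for each class from a confusion matrix. The integer labels and the toy reference and prediction vectors are hypothetical and unrelated to the results of this study.

```python
import numpy as np

def per_class_f1(y_true, y_pred, n_classes):
    """Return precision, recall and F1 for every class from integer labels."""
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    # Rows are reference classes, columns are predicted classes.
    np.add.at(confusion, (y_true, y_pred), 1)
    true_pos = np.diag(confusion).astype(float)
    predicted = confusion.sum(axis=0)
    reference = confusion.sum(axis=1)
    # Classes that are never predicted (or never present) get a score of zero.
    precision = np.divide(true_pos, predicted, out=np.zeros(n_classes), where=predicted > 0)
    recall = np.divide(true_pos, reference, out=np.zeros(n_classes), where=reference > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros(n_classes), where=denom > 0)
    return precision, recall, f1

# Hypothetical labels for three classes.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])
precision, recall, f1 = per_class_f1(y_true, y_pred, n_classes=3)
print(np.round(f1, 2))
```

Because F1 combines omission and commission errors for a single class, a species can score poorly even when overall accuracy is moderate, which is the pattern described for the scattered species above.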
The evaluation of mixed pixels showed a high correlation between classification accuracy and the number of mixed pixels in the reference data of the 20 m bands. Tree species with a higher number of pure 20 m pixels achieved higher accuracies. These findings are important because previous studies claimed a classification map resolution of 10 m but used only pure 20 m pixels for validation. Such an approach simplifies the forest landscape and introduces a bias towards higher classification accuracies. Similar to Blickensdörfer et al. [30], this research suggests that the accuracy assessment of future studies should be more representative, including pure and mixed pixels, both in the centers and at the edges of forest polygons. Nevertheless, we acknowledge that the present study relies on a limited amount of reference data for evaluating the correlation; future research should use larger datasets to strengthen the validity of this statement.
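One possible way to quantify pixel purity is sketched below. It assumes that reference labels are available on the 10 m grid and that each 20 m pixel covers an aligned 2 × 2 block of 10 m cells; the function name `purity_20m` and the toy label raster are hypothetical, and this is not necessarily the procedure applied in this study.

```python
import numpy as np

def purity_20m(labels_10m):
    """Share of each 2 x 2 block of 10 m labels taken by the block's most frequent label."""
    rows, cols = labels_10m.shape
    if rows % 2 or cols % 2:
        raise ValueError("raster dimensions must be even")
    # Group the four 10 m cells that fall inside every 20 m pixel.
    blocks = (
        labels_10m.reshape(rows // 2, 2, cols // 2, 2)
        .transpose(0, 2, 1, 3)
        .reshape(rows // 2, cols // 2, 4)
    )
    # For each cell, count how many cells of its block carry the same label.
    same = (blocks[..., :, None] == blocks[..., None, :]).sum(axis=-1)
    return same.max(axis=-1) / 4.0

# Hypothetical 10 m reference labels (integers stand for species).
labels = np.array([
    [1, 1, 2, 3],
    [1, 1, 2, 2],
    [4, 4, 5, 5],
    [4, 5, 5, 5],
])
purity = purity_20m(labels)
print(purity)                 # 1.0 marks a pure 20 m pixel
print((purity == 1).mean())   # share of pure 20 m pixels
```

Computing the share of pure pixels per species and relating it to the per-species accuracy would reproduce the type of comparison described above, provided enough reference data are available.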
Results from the feature importance test confirmed the importance of vegetation index bands in tree species classification [36,55]. The vegetation indices from the early growing season (April, May) were most important for tree species classification, consistent with the findings of Hemmerling et al. [55] and Grabska et al. [28,48]. While previous studies underline the importance of the NDVI [30,72], our results demonstrate better separability of tree species with the Greenness index [60]. Disadvantages of the commonly used NDVI have been described by Baraldi et al. [60]; these include the influence of the background signal in cases of a low leaf area index (LAI), saturation of the NDVI values if the canopy is too dense, and a non-linear relationship between the LAI and the NDVI. More applied research is needed, however, to further strengthen these assumptions, particularly for the purpose of tree species mapping.
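The saturation behaviour of the NDVI can be illustrated with its standard definition, (NIR − red) / (NIR + red), which for S2 is usually computed from bands B8 and B4. The reflectance values in the example are hypothetical and chosen only to mimic a dense canopy.

```python
import numpy as np

def ndvi(nir, red):
    """Normalised Difference Vegetation Index, (NIR - red) / (NIR + red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Return NaN where both reflectances are zero instead of dividing by zero.
    return np.divide(nir - red, total, out=np.full(total.shape, np.nan), where=total != 0)

# Hypothetical dense-canopy reflectances: red stays low while NIR keeps increasing.
nir = np.array([0.30, 0.40, 0.50])
red = np.array([0.03, 0.03, 0.03])
values = ndvi(nir, red)
print(np.round(values, 3))
```

In this toy case, NIR reflectance rises by two thirds while the index changes by less than 0.07, which is one way to picture the compressed response over dense canopies.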
Despite lower accuracies for some single species, the seasonal model achieved 92% accuracy in classifying native versus non-native riparian species. The introduced riparian species, Populus balsamifera and Picea abies, were mostly found in pure forest stands with strong spectral signatures, which is likely the reason for their higher classification accuracies. While this is not surprising from an RS methodological point of view, the observation has practical implications for updating Natura 2000 maps. For instance, we found an 8% overlap between non-native species areas and the official Natura 2000 riparian forest map of the Salzachauen (Figure 11). This approach has high potential for improving existing Natura 2000 maps and reducing the overestimation of natural habitat areas. Maps of native versus non-native tree species can also be a valuable input to habitat quality evaluation [3,73] and ecosystem services estimation [17].
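The relation between species-level and group-level accuracy can be sketched by collapsing species labels into the two groups. The grouping follows the species named in this section; whether the binary result was obtained by collapsing multi-class predictions or by a separately trained model is not assumed here, and the reference and predicted labels are invented for illustration.

```python
# Species treated as introduced in this section; all others count as native.
NON_NATIVE = {"Populus balsamifera", "Picea abies"}

def to_group(species):
    """Map a species name to its native / non-native group."""
    return "non-native" if species in NON_NATIVE else "native"

def accuracy(reference, predicted):
    """Fraction of samples whose predicted label equals the reference label."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)

# Hypothetical reference and predicted labels for six pixels.
reference = ["Salix alba", "Alnus glutinosa", "Fraxinus excelsior",
             "Picea abies", "Populus balsamifera", "Quercus robur"]
predicted = ["Salix alba", "Alnus incana", "Quercus robur",
             "Picea abies", "Populus balsamifera", "Picea abies"]

species_accuracy = accuracy(reference, predicted)
group_accuracy = accuracy([to_group(s) for s in reference],
                          [to_group(s) for s in predicted])
print(f"species level: {species_accuracy:.2f}")  # 0.50
print(f"group level:   {group_accuracy:.2f}")    # 0.83
```

Confusion between two native species lowers the species-level score but leaves the group-level score unchanged, which is why the binary accuracy can be high even when some single species are poorly separated.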
A further application of the tree species maps produced in this study would be their translation into the corresponding Annex I habitat types of the EU Habitats Directive [11]. Our results showed that misclassification of single native species occurs mainly with species of the same habitat class (91E0* and 91F0). This demonstrates the potential of tree species mapping for habitat type classification in order to fill data gaps in Natura 2000 areas [71]. A relevant question for future research is whether tree species of the same habitat type have similar phenology, which could improve the separation and mapping of those habitats. Unfortunately, the present study lacked sufficient reference data to test this hypothesis. Future work should therefore focus on identifying the phenological spectral curves of tree species (through field spectroradiometer observations or temporal gap-filling of satellite-derived time series data) to evaluate whether incorporating phenological patterns can improve habitat classification.
5. Conclusions and Outlook
This study underlines the potential of using multitemporal Sentinel-2 imagery for the classification of tree species within riparian forests, with particular attention to distinguishing between native and non-native species. The RF models based on the multitemporal approach demonstrated a significantly higher overall classification accuracy compared to models based on single-date analyses. The seasonal multitemporal approach achieved above 73% accuracy for four individual species and 92% overall accuracy for classifying native versus non-native species. Nonetheless, these results apply specifically to machine learning classification using RF; further research is required to extend them to the rapidly growing field of deep learning for tree species mapping.
The findings highlight the importance of vegetation indices, especially those obtained during the early growing season (April and May), for effective tree species classification. However, the study also revealed challenges related to mixed pixels at 20 × 20 m resolution, which influenced classification accuracy. Species with a higher ratio of pure pixels achieved better classification results, suggesting that future studies should consider both pure and mixed pixels in accuracy assessments to avoid bias and ensure a more comprehensive representation of the forest landscape.
The successful identification of non-native species within a Natura 2000 protected area emphasizes the practical applications of this methodology for environmental monitoring and management. By providing maps of native and non-native species distribution, this approach can support spatial planning for the habitat restoration of riparian forests and inform policy decisions, particularly those related to the EU Habitats Directive, with additional information on habitat distribution and conditions. To achieve this, more research focused on less common, but ecologically significant, tree species is needed, accompanied by a higher quantity and quality of reference data. Future work should also explore, in more detail, the scope of spectral–temporal metrics for improving classification accuracy and transferability. Finally, the translation of such tree species classification models into habitat type maps should be evaluated in practice against the management and monitoring needs of data-poor Natura 2000 areas.