Automated Recognition of Tree Species Composition of Forest Communities Using Sentinel-2 Satellite Data

Polyakova, Alika; Mukharamova, Svetlana; Yermolaev, Oleg; Shaykhutdinova, Galiya

doi:10.3390/rs15020329

by Alika Polyakova, Svetlana Mukharamova, Oleg Yermolaev * and Galiya Shaykhutdinova

Institute of Environmental Sciences, Kazan Federal University, 5 Tovarisheskaya Street, 420097 Kazan, Russia

* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(2), 329; https://doi.org/10.3390/rs15020329

Submission received: 30 November 2022 / Revised: 26 December 2022 / Accepted: 1 January 2023 / Published: 5 January 2023

(This article belongs to the Special Issue Remote Sensing of Climate-Vegetation Dynamics and Their Effects on Ecosystems)

Abstract

Information about the species composition of a forest is necessary for assessing biodiversity in a particular region and for making economic decisions on the management of forest resources. Recognizing the species composition from Earth remote sensing data greatly simplifies the work and reduces time and labor costs compared with a traditional forest inventory conducted through ground-based observations. This study analyzes the possibilities of tree species discrimination in coniferous–deciduous forests from Sentinel-2 data using two automated recognition methods: random forest (RF) and generative topographic mapping (GTM). Six Sentinel-2 images of the Raifa section of the Volga-Kama State Reserve in the Tatarstan Republic, Russia, acquired during the 2020 vegetation period, were used as remote sensing data. The analysis was carried out for the main forest-forming species. The training sample was created based on the cadastral data of the forest fund. The recognition quality was assessed using the F1-score, precision, recall, and accuracy metrics. The RF method showed a higher recognition accuracy: on the training sample, accuracy = 0.987 and F1-score = 0.976; on the control sample, accuracy = 0.764 and F1-score = 0.709.

Keywords:

tree species; remote sensing; Sentinel-2; classification; random forest; generative topographic mapping; forest inventory; Raifa forest

1. Introduction

Forests are a key element of many biogeocenoses and play an important role in the functioning of the world’s socio-economic system [1]. Forest cover is constantly changing under the influence of natural and anthropogenic factors. Monitoring the state of the Earth’s surface is a key requirement for the study of global environmental changes [2]. Field monitoring methods are expensive and labor-intensive, so remote methods are actively replacing them [3,4,5]. As the quality of openly available satellite data has improved, these methods have become increasingly popular.

Currently, remote sensing data obtained from the Sentinel family of satellites have the highest spatial resolution among Earth remote sensing data distributed on a non-commercial basis [6]. Among them, the multispectral data of the Sentinel-2 satellites (launched in 2015 and 2017) have the greatest potential for the task of recognizing the species composition of forest communities [7,8,9,10,11]. They provide information in several spectral ranges in the visible, near-infrared, and shortwave infrared parts of the spectrum with a spatial resolution of 10–60 m. Higher accuracy in recognizing the composition of forest communities from Sentinel-2 images than from Landsat data was demonstrated in [12], primarily due to the higher spatial resolution of the Sentinel-2 data.

A visual interpretation of images is subjective and can be relatively inaccurate, especially with respect to tree species composition. There are many sources of potential errors, including photointerpreter skills, image quality, and complexity of the forest species composition [13,14]. For this reason, automated recognition methods are used more often. Object-based and pixel-by-pixel classification methods can be applied to recognize the species composition of forest communities. The object-oriented approach is applicable only to high-resolution and ultra-high-resolution images, from which individual tree crowns can be distinguished [15,16]. At a pixel size of more than 20 m, the recognition accuracy drops significantly, and recognition based on individual trees is no longer possible because spectral mixing within pixels increases [17]. In this case, pixel classification is applied. It is based on the analysis of information averaged over all objects that fall into the pixel, which is characteristic of a certain plant community type. Sentinel-2 data can only conditionally be regarded as high spatial resolution data, so both approaches are used for their analysis [3,9,11,16]. The two approaches (object-oriented and per-pixel) were compared in terms of the quality of species composition interpretation from Sentinel-2 images at two test sites located in Central Europe [9]. The dominant species were recognized with the same accuracy of 85% in both approaches. However, object-oriented methods require more time for data preparation than pixel-by-pixel recognition methods. To recognize the species composition of forest communities, machine learning methods are used, including the random forest (RF) method [18]. Over the past two decades, the use of the RF classifier in the analysis of remote sensing data has attracted increasing attention due to its high classification accuracy and processing speed [16,19,20,21,22]. In addition, this classifier can be successfully used to select and rank variables. This is an important advantage given that the high dimensionality of remote sensing data makes the selection of the most relevant variables a laborious [23], error-prone and subjective task [24]. One of the newer methods, convolutional neural networks (CNN), in an approach combining per-pixel and object approaches in the learning process, was applied to tree species mapping of Russian boreal forests (Leningrad oblast) using Sentinel-2 satellite images [25]. The proposed modification surpassed the widely used per-pixel semantic segmentation model in terms of prediction quality (the average F1-score increased from 0.68 to 0.74).

In all studies using Sentinel-2 images, the importance of infrared and SWIR bands is noted [7,8,9,10,11,26]. Some studies point to the special importance of the blue and green wavelengths [26,27]. In [28], the authors discuss the special importance of the blue range for the classification of coniferous trees due to their relatively lower photosynthetic activity in blue light. As input data for recognition algorithms, in addition to the values in the image bands, spectral indices NDVI (Normalized Difference Vegetation Index), EVI (Enhanced Vegetation Index), etc., were used. Indices can be used as additional input data in the analysis and interpretation (classification) of space images [29] and provide additional information about the vegetation cover [7,11].

It is important to note that the satellite revisit time is 2–5 days, which makes it possible to obtain several cloud-free images during the growing season. A number of works have shown that the quality of classification can be improved by using multi-temporal data. Certain vegetation species can be identified by their unique phenology [5]. The principle of recognizing these objects and revealing their characteristics through remote sensing data is based on differences in the spectral response of objects of different types on different phenological dates [30]. Tree species differ most in spring and autumn images, which is confirmed in [8,10,31,32]. In [32], the best results were achieved when using three images, and in [8] when using five images during the growing season. At the same time, it is recommended to avoid images taken in late autumn, when most of the leaves have fallen (the influence of the background increases and the separation of broad-leaved species becomes more problematic [33]).

Particular attention is given to the problem of creating training samples, since their quality and representativeness significantly affect the accuracy of the results. The materials for the training sample can be ground data, collected in the field during the study itself [10,31,32], as well as fund data of forest inventory [7,8,9,10,11,16,17]. Sometimes these are orthophotomaps and ultra-high resolution satellite data, which are partially visually interpreted [7,9,19,34,35]. In [36], they built models using inventory data and using field data, concluding that models trained in the field can outperform models trained from inventory data, despite the larger sample size.

The aim of the study is to evaluate the possibilities of automated recognition of the species composition of forest communities in the coniferous–deciduous forests of Russia using Sentinel-2 satellite data. The usefulness of Sentinel-2 data, in combination with ground forest inventory data, is investigated for predicting multivariate species composition using the example of the Raifa section of the Volga–Kama State Reserve (hereinafter referred to as the Raifa forest), located in the Tatarstan Republic, Russia.

2. Materials and Methods

2.1. Study Area

The study object is the forest cover of the Raifa forest, with a total area of 3450 thousand ha. The Raifa forest is located within the Western Kazan terraced valley region of East European pine and broad-leaved pine subtaiga forests on high floodplain terraces of the Volga (Figure 1) [37]. The climate is temperate continental, with warm summers, moderately cold winters, and uneven precipitation. The diverse geomorphological structure of the site contributes to the heterogeneity of the microclimate. Soils are represented by thick, sandy podzol on loamy soils, sandy, soddy, strongly podzolic soils on loamy soils, and sandy, loamy, soddy, weakly podzolic soils on sandy soils, as well as soddy, medium and slightly podzolic light loamy soils [38]. The study area combines the formations of three forest zones in the European part of Russia: the southern taiga, mixed forests, and broad-leaved forests [39]. Most of the tree species growing in pure and mixed stands in the study area fall into three main groups: small-leaved (birch, aspen), coniferous (pine, spruce), and broad-leaved (linden, oak, maple) [40]. The Raifa forest is a protected forest, part of the Volga–Kama State Reserve, which was established in 1960.

2.2. Ground Data

The geo-information database of forest inventory data, created on the basis of cadastral descriptions of the Raifa forest in 2013 (provided by the administration of the Volga–Kama Reserve), was used as the initial forest cover ground data. The database includes a vector map (layer) of forest stands: a digitized and georeferenced plan of forest stands at a scale of 1:10,000 (Figure 2). For each stand, the database contains the following taxation characteristics: composition of the forest community in the stand (percentage abundance of the main tree species), average age, average diameter, average height of trees, forest quality, and others. The study used information about the upper (first) forest tier of 1972 stands of the Raifa forest. According to the forest cadastral data, there are 13 species of trees here (Table 1, Figure 3). Pine grows on most of the territory (53.6%). Birch, linden, and spruce are quite abundantly represented. Ground data were used to create a training sample and also to test the quality of the remote sensing data interpretation.

2.3. Remote Sensing Data

For the study, we used remote sensing data obtained from the Sentinel-2 satellites at the Level-2A processing level (atmospherically corrected) [41]. Sentinel-2 Level-2A data for the study area are available only from the second half of 2019, so we used images for the 2020 growing season. The study area is located in a protected zone. The reserve regime provides full protection against active forest management, such as clear and selective cutting. There have also been no forest fires or significant weather-related disturbances (windbreaks, windblows) in the Raifa forest over the past decade. Only background dynamic processes take place in the protected forest, which results in a relatively stable tree species composition of forest communities, with no sharp year-to-year changes in the upper forest tier. We therefore consider that the seven-year gap between the ground surveys (2013) and the remote sensing observations (2020) is not critical.

Sentinel-2 Level-2A data are data on the reflectances of the earth’s surface (values of Bottom-Of-Atmosphere (BOA) reflectance); these data are organized into ten spectral bands. The spatial resolution is 10 m in four spectral bands (Band 2: Blue, 458–523 nm; Band 3: Green, 543–578 nm; Band 4: Red, 650–680 nm; Band 8: Near-infrared (NIR), 785–900 nm). Additionally, the spatial resolution is 20 m in six bands (Band 5: Red-edge I, 698–713 nm; Band 6: Red-edge II, 733–748 nm; Band 7: Red-edge III, 773–793 nm; Band 8a: Narrow Near-infrared (NNIR), 855–875 nm; Band 11: Shortwave infrared-1 (SWIR1), 1566–1651 nm; Band 12: Shortwave infrared-2 (SWIR2), 2100–2280 nm). Cloudless images, covering the study area and closest in time to the ground Raifa forest inventory, were selected from the Copernicus Open Access Hub server [42]. A total of 6 images for the 2020 growing season were used (Table 2). Dates of remote sensing data: 9 May, 21 June, 8 July, 5 August, 24 September, and 29 October 2020.

Spectral vegetation indices are often used as additional input data in the analysis and interpretation of vegetation cover using remote sensing data; these indices can provide additional information for recognition algorithms [7,11,29]. That is why, for each of the 6 images, a raster with vegetation index NDVI = (NIR − Red)/(NIR + Red) for the corresponding date was calculated as an additional spectral variable, where NIR is the near-infrared band (785–900 nm), and Red is the red band (650–680 nm).
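The NDVI formula above can be illustrated with a minimal sketch in plain Python. The function names, the sample reflectance values, and the choice to return `None` when NIR + Red is zero are assumptions made for this example; they are not taken from the authors' R modules.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for a single pixel.

    Returning None for a zero denominator is an assumption of this sketch.
    """
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


def ndvi_raster(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical BOA reflectance values for a 2 x 2 patch (Band 8 and Band 4).
nir_patch = [[0.40, 0.35], [0.30, 0.0]]
red_patch = [[0.10, 0.05], [0.10, 0.0]]
ndvi_patch = ndvi_raster(nir_patch, red_patch)
```

In practice one such raster would be computed per acquisition date, giving the six NDVI layers described in the text.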

For further processing and analysis, a composite containing 66 raster layers was prepared: 10 spectral bands for each of the 6 multi-temporal images plus the 6 corresponding NDVI layers (one per date). At the pre-processing stage, all the raster layers were masked by the study area border.
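The layer count of the composite can be checked with a short sketch. The band codes and the `date_band` naming scheme are hypothetical, and the ordering of layers inside the actual composite is not stated in the text; only the six dates and the ten-bands-plus-NDVI structure come from the description above.

```python
# Acquisition dates listed in the text (2020 growing season).
DATES = ["2020-05-09", "2020-06-21", "2020-07-08",
         "2020-08-05", "2020-09-24", "2020-10-29"]

# The ten Level-2A bands used in the study (short codes are an assumption).
BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]


def composite_layer_names(dates, bands):
    """List one name per layer: all bands of a date, then that date's NDVI."""
    names = []
    for date in dates:
        names.extend(f"{date}_{band}" for band in bands)
        names.append(f"{date}_NDVI")
    return names


layer_names = composite_layer_names(DATES, BANDS)
# 6 dates x (10 bands + 1 NDVI) = 66 layers
```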

2.4. Training Sample

The list of recognizable classes (tree species) and the reference sites of their presence were determined based on the ground study data. The list of classes includes birch, spruce, linden, alder, and pine. Rare species (<1%) are not included in the list of recognizable classes. Pine, which is widely represented in the study area, is divided into two subclasses: “natural old-growth forest” and “plantations at the felling site”. To determine the reference sites (the locations of certain tree species), stands with a homogeneous composition of the forest community were selected. For birch and the two types of pine, stands were considered homogeneous where a 100% presence of one of these species was recorded in the upper tier. For alder, spruce and linden, which occur in mixed stands in the study area, stands were taken in which one of these species predominates, namely, with at least 80% presence. In total, the data set of reference sites contains 122 stands with a total area of 230 ha (Table 3). The pixels corresponding to the reference sites define a multidimensional training sample: their class membership is known, and the values of the layers of the analyzed composite in these pixels serve as the reference values of the corresponding classes. The control sample was created only for species that are abundantly represented in the territory (birch and the two subclasses of pine), using stands with 100% presence of each of these species that were not included in the training sample (Table 3). Alder, spruce and linden grow in mixed stands in the study area. There are few stands where these species predominate, and all of them were included in the training sample so that it was representative (sufficient sample size) and preliminarily balanced.
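The selection rule for reference stands can be sketched as follows. The 100% and 80% thresholds come from the text; the class labels, the dictionary layout of a stand's composition, and the function name are hypothetical and do not reflect the structure of the actual forest inventory database.

```python
# Species requiring a 100 % share in the upper tier (labels are hypothetical).
PURE_SPECIES = {"birch", "pine_old_growth", "pine_plantation"}
# Species accepted when they make up at least 80 % of the upper tier.
PREDOMINANT_SPECIES = {"alder", "spruce", "linden"}


def reference_class(composition):
    """Return the class a stand can serve as a reference site for, or None.

    composition maps a species label to its percentage in the upper tier.
    """
    for species, percent in composition.items():
        if species in PURE_SPECIES and percent == 100:
            return species
        if species in PREDOMINANT_SPECIES and percent >= 80:
            return species
    return None


# Hypothetical stands.
stands = {
    "stand_a": {"birch": 100},
    "stand_b": {"linden": 80, "birch": 20},
    "stand_c": {"birch": 90, "pine_old_growth": 10},
}
reference_sites = {sid: reference_class(c) for sid, c in stands.items()}
```

Stands that return `None` (such as `stand_c` here) would simply be left out of the training sample.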

2.5. Spectral Properties Analysis

The spectral properties of the selected forest-forming species were analyzed by constructing spectral signature curves. Such a curve shows the relationship between the wavelength and the reflectance of the object under study. The spectral curve of vegetation cover objects has a typical shape, but it differs somewhat between types of objects (for example, different tree species). It also depends on the survey date due to the phenological variability of vegetation cover properties. An analysis of the features of the spectral curves makes it possible to identify the most informative spectral bands by determining at what wavelengths, and on what phenological dates, the greatest differences occur between the reflectances of different tree species.
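A spectral signature of the kind described above can be approximated by averaging the reflectance of the training pixels of each class band by band. The sketch below is illustrative only: the function names, the two-band toy data, and the use of the range of class means as a separability measure are assumptions, not the authors' procedure.

```python
from collections import defaultdict


def mean_signatures(samples):
    """Average reflectance per band for each class.

    samples is an iterable of (class_label, [reflectance per band]).
    """
    sums = {}
    counts = defaultdict(int)
    for label, values in samples:
        if label not in sums:
            sums[label] = [0.0] * len(values)
        for i, value in enumerate(values):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def most_separating_band(signatures):
    """Index of the band where the class means are spread the widest."""
    n_bands = len(next(iter(signatures.values())))
    spreads = []
    for band in range(n_bands):
        means = [curve[band] for curve in signatures.values()]
        spreads.append(max(means) - min(means))
    return spreads.index(max(spreads))


# Hypothetical training pixels with two bands (for example, red and NIR).
pixels = [("pine", [0.02, 0.20]), ("pine", [0.04, 0.30]), ("birch", [0.05, 0.40])]
signatures = mean_signatures(pixels)
best_band = most_separating_band(signatures)
```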

2.6. Recognition Methods

To recognize the species composition of forest communities, we used two modern modeling methods, both of which are actively used in the analysis and interpretation of remote sensing data: the random forest (RF) and the generative topographic mapping (GTM).

The RF method, proposed by Leo Breiman [18], has become widely used in solving pattern recognition problems and is currently one of the most popular methods of supervised classification and nonparametric regression. This method is an ensemble classifier, that is, a set of classifiers (decision trees), each of which generates its own solution, and the final classification is obtained by “voting” among the classifiers. The decision trees are trained on different data sets, obtained from the training sample using the bootstrap aggregation (bagging) procedure. Moreover, the decision trees use different features, randomly selected from the original set of features, to make decisions. The method is resource-intensive but quite simple to use: its implementation requires two parameters to be set, the number of classifiers (decision trees) and the number of features randomly selected when branching each decision tree. Before building classification models, the VSURF (variable selection using random forests) function is used to reduce the number of input variables by selecting the most important ones [43]. VSURF provides two subsets of variables. The first, called «interpretation», contains variables highly related to the response variable (even with high redundancy in this subset) for interpretation purposes. The second, called «prediction», is a sufficient parsimonious (redundancy is eliminated) subset of important variables, designed to provide a good prediction of the response variable. To implement the RF algorithm, as well as to preprocess data and analyze the results, software modules were developed in the R language [44] using the rgdal [45], raster [46], randomForest [47], and VSURF [43] packages.
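The three ingredients of the RF method named above (bootstrap samples, random feature subsets, and voting) can be sketched in a few lines of standard-library Python. This is not the randomForest implementation used in the study and it does not grow any trees; the helper names are hypothetical and the sketch only shows what each ingredient does.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement (the bagging step)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, mtry, rng):
    """Pick mtry distinct feature indices, as done when a tree branches."""
    return sorted(rng.sample(range(n_features), mtry))


def majority_vote(votes):
    """Return the class label predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(42)                          # fixed seed for repeatability
training_rows = list(range(10))                  # stand-ins for training pixels
bag = bootstrap_sample(training_rows, rng)       # data for one tree
features = random_feature_subset(66, 8, rng)     # 8 of the 66 composite layers
label = majority_vote(["pine", "birch", "pine"])  # votes of three toy trees
```

A full classifier would train one decision tree per bootstrap sample and pass each tree's prediction for a pixel to `majority_vote`.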

The GTM method is a development of Kohonen’s self-organizing neural networks (Self Organizing Map, SOM) [48,49,50] and allows us to obtain a predictive model for the probabilities of thematic classes. Thematic classification by this method is carried out in two stages. In the first stage, an unsupervised classification of pixels is performed, based on the similarity of their spectral characteristics, generating a given (large enough) number of spectral classes. The method makes it possible to build an ordination of these classes, that is, a mapping of classes into a two-dimensional space, which allows tracking the “mutual position” of classes in the feature space and thereby approximating a continuous change in the properties of objects. The mapping of classes to the ordination plane preserves topological properties (that is, classes that are similar in the feature space are located close to each other on the ordination plane). To convey the degree of similarity of classes, they are visualized in the form of a minimum spanning tree and its Sammon mapping [51]. In the second stage, based on the training sample, the thematic interpretation (calibration) of spectral classes is performed by constructing a nonparametric regression model of thematic classes on spectral classes. The result of the calibration is a probability distribution of thematic classes for each spectral class. The GTM method was implemented using the «Scanex Image Processor» program for processing Earth remote sensing data (ThematicPro module) [52].
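The outcome of the calibration stage, a probability distribution of thematic classes for each spectral class, can be illustrated with a simple frequency table. This sketch assumes that the probabilities are the relative frequencies of training labels within each spectral class; the text does not specify how the ThematicPro module estimates them, so the function below is a hypothetical stand-in rather than the actual algorithm.

```python
from collections import Counter, defaultdict


def calibrate(training_pixels):
    """Estimate P(thematic class | spectral class) as relative frequencies.

    training_pixels is an iterable of (spectral_class_id, thematic_label).
    """
    counts = defaultdict(Counter)
    for spectral_class, label in training_pixels:
        counts[spectral_class][label] += 1
    table = {}
    for spectral_class, label_counts in counts.items():
        total = sum(label_counts.values())
        table[spectral_class] = {
            label: n / total for label, n in label_counts.items()
        }
    return table


# Hypothetical training pixels already assigned to spectral classes 7 and 12.
pixels = [(7, "pine"), (7, "pine"), (7, "birch"), (7, "pine"), (12, "linden")]
probabilities = calibrate(pixels)
```

A pixel falling into spectral class 7 would then be assigned the most probable thematic class of that row, here pine.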

2.7. Accuracy Assessment

The classification quality was assessed in two ways. In the first, the classification results were compared pixel by pixel with the training and control samples. The comparison was based on contingency matrices and on the calculation of the following classification quality metrics: recall, precision, F1-score, and accuracy:

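\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \qquad (1)

\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \qquad (2)

\mathrm{F1} = \frac{2 \times \mathrm{TPR} \times \mathrm{PPV}}{\mathrm{TPR} + \mathrm{PPV}} \qquad (3)

\mathrm{accuracy} = \frac{\mathrm{TN} + \mathrm{TP}}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP} + \mathrm{TN}} \qquad (4)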

where TPR is the true positive rate, or recall; PPV is the positive predictive value, or precision; TP is True Positive (the number of correctly classified pixels of a given class); FN is False Negative (the number of pixels of a given class missed by the model); FP is False Positive (the number of pixels classified as a given class while being of another class); and TN is True Negative (the number of pixels correctly classified as not belonging to the given class).
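Metrics (1)–(4) can be computed directly from the four counts of a per-class contingency table. The sketch below uses hypothetical pixel counts; returning 0.0 when a denominator is zero is an assumption of this example and is not specified in the text.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (sketch assumption)."""
    return numerator / denominator if denominator else 0.0


def class_metrics(tp, fp, fn, tn):
    """Recall (1), precision (2), F1-score (3) and accuracy (4) for one class."""
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * recall * precision, recall + precision)
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts for one class among 1000 evaluated pixels.
metrics = class_metrics(tp=80, fp=20, fn=10, tn=890)
```

For these counts the recall is 80/90, the precision is 0.8, and the accuracy is 0.97.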

The second way was to compare the model’s (obtained from the results of recognition) and the real (terrestrial, presented in the forest taxation database) percentage compositions of tree species on stands. For each species, statistical characteristics of errors (differences between real and model percentages of species in the stands) were calculated: mean error (ME), mean absolute error (MAE), root mean square error (RMSE), and weighted average percentage error (WAPE):

\mathrm{ME} = \frac{\sum_{i = 1}^{N} (\mathrm{proc}_{\mathrm{real}, i} - \mathrm{proc}_{\mathrm{model}, i})}{N} \qquad (5)

\mathrm{MAE} = \frac{\sum_{i = 1}^{N} | \mathrm{proc}_{\mathrm{real}, i} - \mathrm{proc}_{\mathrm{model}, i} |}{N} \qquad (6)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} (\mathrm{proc}_{\mathrm{real}, i} - \mathrm{proc}_{\mathrm{model}, i})^{2}}{N}} \qquad (7)

\mathrm{WAPE} = \frac{\sum_{i = 1}^{N} | \mathrm{proc}_{\mathrm{real}, i} - \mathrm{proc}_{\mathrm{model}, i} |}{\sum_{i = 1}^{N} \mathrm{proc}_{\mathrm{real}, i}} \qquad (8)

where N is the number of stands. In addition, for each stand, the distance between the model «forest stand formula» and real «forest stand formula» was calculated using a metric such as the Manhattan distance:

\text{Manhattan distance} = \sum_{j = 1}^{n} | \mathrm{proc}_{\mathrm{real}, j} - \mathrm{proc}_{\mathrm{model}, j} |, \quad n = 5 \qquad (9)

where n is the number of tree species.
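The stand-level error statistics ME, MAE, RMSE and WAPE defined in this subsection can be sketched as follows for a single species. The percentages are hypothetical, and the sign convention (real minus model) follows the ME definition, so a negative ME means the model overestimates the species.

```python
import math


def error_statistics(real, model):
    """ME, MAE, RMSE and WAPE for one species over N stands.

    real and model are equally long lists of percentages per stand.
    """
    n = len(real)
    diffs = [r - m for r, m in zip(real, model)]
    abs_sum = sum(abs(d) for d in diffs)
    return {
        "ME": sum(diffs) / n,
        "MAE": abs_sum / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "WAPE": abs_sum / sum(real),
    }


# Hypothetical real and model percentages of one species in four stands.
real_percent = [100, 50, 0, 50]
model_percent = [90, 60, 10, 40]
stats = error_statistics(real_percent, model_percent)
```

In this toy case the errors cancel out in ME (0) while MAE and RMSE are both 10 percentage points, which shows why ME alone only indicates systematic bias.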

3. Results

3.1. Spectral Properties of the Studied Tree Species

The spectral curves were constructed from the pixels of the training sample in six Sentinel-2 images, obtained on different phenological dates, and generalized using the mean for each studied species (Figure 4). The greatest differences between the reflectances of tree species are observed in the red-edge (Band 6, Band 7) and near-infrared (Band 8, Band 8a) bands in all images. There are practically no differences in the visible part of the spectrum. The shortwave infrared bands (SWIR Band 11, Band 12) provide useful information at the beginning and end of the growing season. In the summer months, the spectral curves of coniferous and deciduous species are quite different from each other: the reflectances of coniferous species in all bands are lower than those of deciduous species. In autumn, the situation is reversed: for linden, this is already so in September, whereas for alder and birch, it is so in October. The curves of the deciduous species are less distinguishable from one another. The greatest difference between linden and other deciduous trees can be seen on the August and September curves, while alder and birch in these months are practically indistinguishable from each other in terms of spectral properties. The relative position of the deciduous curves (higher or lower) varies depending on the date. That is, for the separation of deciduous species, the individual seasonal dynamics of their reflectances (June, July, August, September) is important. The most informative image for separating the pine subclasses of different ages is the October image, while in the summer months their reflectances practically coincide. The spectral curve of spruce is noticeably separated from the curves of other species on the graphs of the summer images. In our case, the May image turned out to be the least informative because of its early date (9 May), before the start of the “green wave” (usually, images from the second half to the end of May are very useful for recognizing tree species). An analysis of the separability of the NDVI empirical distributions for different tree species (Figure 5) showed that the two pine subclasses are best separated from other species by NDVI calculated from the June and October images. The least significant NDVI layers for the recognition of tree species are those for May and September, and the most significant are those for July and August.

Based on the analysis of the spectral reflective properties of the tree species, the weights of each of the 66 features included in the satellite composite (10 spectral bands and NDVI for each of the 6 multi-temporal images) were set by expert judgment (Table 4). These weights were taken into account at the classification stage.

3.2. Automated Recognition of Tree Species

Pixel classification, with the allocation of six target classes (birch, linden, alder, spruce, two subclasses of pine), was implemented by two methods, RF and GTM. The prepared composite of multi-temporal Sentinel-2 data with 66 layers (variables) and the training data set were fed into the algorithms as input. In both cases, the training data set was balanced using sampling algorithms. The significance of predictors was also taken into account. In the case of the RF method, the most important variables were selected using the VSURF function [43]. In our case, these are Band 5, Band 6, and Band 7 in the July to October images, Band 12 in the May and October images, Band 3 in the July image, and NDVI in the June, August and October images. For the GTM method, the significance of variables was set by weights (Table 4). The parameters of the RF method were set as follows: the number of decision trees is 500 (which, as many studies have shown [53,54], is sufficient to stabilize errors), and the number of features randomly selected for each tree from the original set of features is 8 (the square root of the total number of variables) [55]. For the GTM method, the number of allocated spectral classes was 225 (15 × 15 grid). The results of recognition are presented in the form of 2 raster layers, obtained by the two methods (RF and GTM), whose pixels are assigned the codes of the recognized classes (birch, linden, alder, spruce, two pine subclasses). Figure 6 shows maps of the classification results.
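The parameter values quoted above follow from simple arithmetic, which the sketch below reproduces. The constant names are hypothetical, and taking the integer part of the square root is an assumption that matches the stated value of 8 for 66 variables.

```python
import math

N_VARIABLES = 66      # layers in the multi-temporal composite
N_TREES = 500         # number of decision trees reported in the text
GTM_GRID_SIDE = 15    # side of the square grid of spectral classes

# Integer part of sqrt(66) = 8.12..., i.e. 8 features per random selection.
features_per_selection = math.isqrt(N_VARIABLES)

# A 15 x 15 grid gives 225 spectral classes.
n_spectral_classes = GTM_GRID_SIDE * GTM_GRID_SIDE
```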

3.3. Recognition Quality Assessment

To assess the quality of interpretation, each of the 2 resulting rasters was compared pixel by pixel with the training and control samples. The recognition quality metrics (1)–(4), calculated from the contingency matrices, are shown in Table 5.

The second way for assessing the quality of interpretation was a comparison of data in stands: ground data (represented in the forest taxation database) and model data (obtained from recognition results). Preliminarily, both resulting classification rasters were sampled on the vector layer of the Raifa Forest stands, and the model percentages of the recognizable species present in each of the 1972 stands were calculated, i.e., percentages of species according to the interpretation results. The statistical characteristics (5)–(7) of errors (differences between real and model percentages of species in the stand) are given in Table 6. Figure 7 shows frequency histograms of these errors. Figure 8 shows frequency histograms of the Manhattan distance (9) values between real and model «forest stand formulas». Thematic maps of the spatial distribution of the real (based on ground data) and predicted (model) percent abundance values of tree species in stands in Raifa forest are given in Figure 9. These maps allow you to assess the quality of interpretation visually, as well as to see the spatial specifics of the agreement between the model and real values.
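The step that turns a classification raster into model percentages per stand can be sketched as a count of classified pixels within each stand. The input format (pairs of stand identifier and predicted class) and the class labels are hypothetical, and merging the two pine subclasses into a single pine share before comparison with the inventory data is an assumption of this example.

```python
from collections import Counter, defaultdict

# Assumption: both pine subclasses count as pine when compared with the
# species percentages in the inventory database.
MERGE = {"pine_old_growth": "pine", "pine_plantation": "pine"}


def stand_percentages(pixels):
    """Share of each recognized species among the pixels of every stand.

    pixels is an iterable of (stand_id, predicted_class).
    """
    counts = defaultdict(Counter)
    for stand_id, label in pixels:
        counts[stand_id][MERGE.get(label, label)] += 1
    result = {}
    for stand_id, label_counts in counts.items():
        total = sum(label_counts.values())
        result[stand_id] = {
            label: 100.0 * n / total for label, n in label_counts.items()
        }
    return result


# Hypothetical classified pixels falling into two stands.
classified = [(1, "pine_old_growth"), (1, "pine_plantation"), (1, "birch"),
              (1, "pine_old_growth"), (2, "linden")]
model_percentages = stand_percentages(classified)
```

The resulting dictionaries play the role of the model «forest stand formulas» that are compared with the inventory data.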

4. Discussion

4.1. Validation of Results

In general, the RF model outperformed the GTM model (Table 5). The overall accuracy of recognition of the six target classes by the RF method reaches 0.987 on the training data set, while the GTM method demonstrates an accuracy of 0.829. The F1-score, which evaluates precision and recall jointly (as their harmonic mean), is 0.976 for RF and 0.729 for GTM. On the control data set for three classes (birch and two pine subclasses), the recognition accuracy was 0.764 for RF and 0.673 for GTM, and the F1-score was 0.709 and 0.628, respectively. For species that are less frequently represented, both in the entire study area and in the training sample (alder, spruce), the GTM method performed noticeably worse than RF (Table 5). This seems to be due to insufficient balancing of the training data when implementing the GTM algorithm, and training with rebalancing may improve the quality of this model. A comparison of the accuracy metrics calculated on the training data with those calculated on the test data characterizes the degree of overfitting (overtraining) of the model, which should be avoided. A model is said to be overfitted when it performs well on the training data set (fit) but gives poor accuracy on the test data set (predict). An overfitted model tries to fit the training data exactly and ends up modeling noise and random fluctuations, which defeats the purpose of modeling. In our case, the RF model shows some overfitting (Table 5), but it could not be reduced without reducing the accuracy of the prediction. The GTM model is less overfitted.
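The degree of overfitting discussed above can be expressed as the gap between training and control accuracy. The sketch below uses only the accuracy values reported in the text; the dictionary layout is hypothetical. Because the training figures cover six classes and the control figures cover three, the gap should be read as indicative rather than as a strict like-for-like comparison.

```python
# Accuracy values reported in the text for the two samples.
REPORTED_ACCURACY = {
    "RF": {"train": 0.987, "control": 0.764},
    "GTM": {"train": 0.829, "control": 0.673},
}

# Gap between training and control accuracy, rounded to three decimals.
accuracy_gap = {
    method: round(values["train"] - values["control"], 3)
    for method, values in REPORTED_ACCURACY.items()
}

# Method with the smaller gap, i.e. the less overfitted one by this measure.
less_overfitted = min(accuracy_gap, key=accuracy_gap.get)
```

The gap is about 0.223 for RF and 0.156 for GTM, which is consistent with the statement that the GTM model is less overfitted even though RF is more accurate on both samples.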

In addition to comparing the results with the training and control samples, we compared the results of interpretation with the data of the ground inventory of the Raifa forest. This makes it possible to assess agreement throughout the study area. We compared real (according to ground inventory data) and predicted (according to models based on Sentinel-2 remote sensing data) values of the percent abundance of tree species in stands (Table 6, Figure 7, Figure 8 and Figure 9). The correlation coefficients between real and model percentages of species in the stands characterize the consistency of real and predicted values. The correlation of the model percentage of pine and linden in the stands with the real percentages is quite high for both methods (>0.8). The correlation coefficient is relatively high for birch (0.68 for RF and 0.6 for GTM) and, for the RF method, for alder (0.7). At the same time, the accuracy of predicting the percentage of a species in a stand (characterized by the MAE, RMSE, and WAPE indicators) is significantly better for the RF method than for the GTM method for all species. In general, for the entire territory, both methods somewhat overestimate the presence of pine (ME = −4.4 for RF and −3.9 for GTM) and underestimate the presence of birch (ME = 2.8 for RF and 12 for GTM). The RF method coped much better with predicting the stand-level percentages of spruce and alder, which are rarer species in the territory.

The overall quality of the predicted species percentages can be assessed using a similarity metric between the «forest stand formulas»: the real formula (according to ground inventory data on the stands) and the model formula (according to models based on Sentinel-2 remote sensing data). As such a metric, we used the Manhattan distance (Equation (9)). This indicator takes values from 0 to 200 (0 if the real and model formulas are identical; 200 at the maximum difference between reality and forecast). The closer the value is to 0, the smaller the «distance» between the formulas, and hence the interpretation error. The histograms (Figure 8) clearly demonstrate the degree of similarity of the stand formulas: the distribution has a pronounced right-sided asymmetry, and most of the values are close to zero. That is, the model and real forest stand formulas are generally close. The histograms also show that the results obtained by the RF method are more accurate than those of GTM.
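The range of 0 to 200 follows from the fact that each stand formula is a set of species percentages summing to 100. The following minimal Python sketch shows this, assuming a stand formula is represented as a mapping from species name to percent abundance. This representation and the function name `manhattan_distance` are illustrative assumptions, not the authors' implementation.

```python
def manhattan_distance(real_formula, model_formula):
    """Sum of absolute differences in percent abundance over all species.

    Each formula maps a species name to its percentage in the stand;
    a species missing from a formula counts as 0 percent.
    """
    species = set(real_formula) | set(model_formula)
    return sum(
        abs(real_formula.get(s, 0) - model_formula.get(s, 0)) for s in species
    )


# Hypothetical stand formulas (percentages sum to 100 in each).
same = manhattan_distance({"pine": 70, "birch": 30}, {"pine": 70, "birch": 30})
close = manhattan_distance(
    {"pine": 70, "birch": 30},
    {"pine": 60, "birch": 30, "linden": 10},
)
opposite = manhattan_distance({"pine": 100}, {"birch": 100})
```

Identical formulas give 0, a shift of 10 percentage points from pine to linden gives 20, and two pure stands of different species give the maximum of 200.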

For a comparison in geographic space, we mapped the forest inventory data and the abundances of pine and linden predicted by both methods (RF and GTM) in the Raifa forest (Figure 9). The maps show how the obtained results correspond geographically to the forest inventory information.

4.2. Limitations and Further Study

A comparison of our results with other similar studies [8,9,10,25,36] devoted to automated recognition of multivariate tree species composition of forest communities using Sentinel-2 satellite data showed a similar quality of recognition. These studies report overall accuracies of tree species recognition using Sentinel-2 data in the range of 0.7–0.9, depending on the study area and the study design. Usually, differences in the success of classifying certain species are explained by the quality of the training samples, the small size of the training data set, the similarity of the spectral characteristics of individual species, and their unbalanced representation in the training data [8,10,56,57,58]. Another source of errors is the diversity within each forest species: spectral characteristics differ with tree age and depend on habitat conditions. In addition, species tend to grow in mixtures, so a pixel with a spatial resolution of 10 m contains a mixture of the spectral properties of several species. It should be noted that the error of estimating the percent abundance of a tree species in a stand increases with the heterogeneity of the species composition of the community. For all stands taken together, the weighted absolute percentage error (WAPE) was: linden 0.52, birch 0.64, pine 0.26. However, if we consider only stands with a homogeneous composition of the forest community (those where a single species accounts for >90%), the WAPE becomes smaller: linden 0.10, birch 0.32, pine 0.05. That is, the more heterogeneous the species composition of a community, the less accurately its species composition, and consequently the ratio of the percent abundances of species in it, is recognized. 
The limiting factor here is, first of all, the spatial resolution of the Sentinel-2 data (10 m), which does not allow more accurate recognition of the species composition of trees in mixed communities. Additionally, as already mentioned, the spectral properties of species depend on many factors, including the age and ecological condition of trees, habitat conditions, the shading of the crowns of some species by others, the influence of undergrowth on the spectral characteristics of upper-tier species, etc. The values of the error indicators could also be affected by possible errors in the ground inventory data: incorrectly identified species and stand boundaries, inaccuracies in determining the ratios of species in stands, etc. In our case, another factor is the time gap between the ground inventory (2013) and the Sentinel-2 survey (2020). Even though sharp anthropogenic changes in the vegetation cover are excluded under the conditions of the reserve, local disturbances associated with natural causes (windfalls, animal activity, etc.) still occur.

Thus, the assessment of the quality of tree species recognition based on Sentinel-2 multi-temporal multispectral data showed that the constructed models generally demonstrate a high degree of agreement with ground data on the territory of the Raifa forest. The results obtained by the RF method showed higher accuracy. Although the GTM method demonstrates lower recognition accuracy than RF, it has a useful property: it can give its answer in the form of a probability distribution of species for a particular pixel, which provides additional opportunities for detailed analysis of forest composition. A promising way to further improve the quality of the models is to adjust the training set. Although even fairly old forest inventory information can be used to train models, a smaller time gap between the ground data used in training and the Sentinel-2 satellite imagery can be expected to improve recognition quality. Another possibility for improving the quality of recognition is the use of convolutional neural networks (CNN), which have shown good results in a number of studies. The relative rarity of ground updates of forest inventory data (once every 10 years, and less often for protected areas), together with the fact that new Sentinel-2 satellite data are continuously available, points to the potential usefulness of our approach to the automated interpretation of Sentinel-2 data for updating forest data. Information from Sentinel-2 can be used in combination with field data to characterize forests, including for checking model consistency.
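As an illustration of how per-pixel class probabilities, such as those produced by GTM, might support a more detailed analysis, the sketch below contrasts two ways of estimating the composition of a stand: counting the most probable species per pixel, and averaging the probabilities over all pixels. This is a hypothetical use of such output, not a procedure described by the authors. The function name `stand_composition` and the probabilities are invented for the example.

```python
def stand_composition(pixel_probs):
    """Estimate stand composition (in percent) from per-pixel probabilities.

    pixel_probs is a list with one dict per pixel, mapping a species name
    to its probability (the values in each dict are assumed to sum to 1).
    Returns (hard, soft): hard counts the most probable species per pixel,
    soft averages the probabilities over all pixels.
    """
    n = len(pixel_probs)
    species = sorted({s for probs in pixel_probs for s in probs})
    soft = {
        s: 100.0 * sum(probs.get(s, 0.0) for probs in pixel_probs) / n
        for s in species
    }
    hard = {s: 0.0 for s in species}
    for probs in pixel_probs:
        winner = max(probs, key=probs.get)  # most probable species in the pixel
        hard[winner] += 100.0 / n
    return hard, soft


# Hypothetical probabilities for four pixels of one stand.
pixels = [
    {"pine": 0.9, "birch": 0.1},
    {"pine": 0.6, "birch": 0.4},
    {"pine": 0.3, "birch": 0.7},
    {"pine": 0.8, "birch": 0.2},
]
hard, soft = stand_composition(pixels)
```

In this example the hard assignment gives 75% pine and 25% birch, whereas averaging the probabilities gives 65% pine and 35% birch, so the soft estimate retains the information carried by mixed pixels.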

5. Conclusions

The study used multi-temporal multispectral remote sensing data from the Sentinel-2 satellites for the automated recognition of the tree species composition of forest communities. Additionally, NDVI rasters were calculated for each survey date. The training sample was created using forest inventory data, from pixels of stands with a homogeneous composition (where only one predominant species was observed in the upper tier). Recognition was carried out by two methods of automatic classification: RF and GTM. The RF method showed a higher recognition accuracy. The accuracy of recognition of six classes (birch, spruce, linden, alder, pine/natural old-growth forest, pine/young at the felling site) by the RF method, estimated on the training set, reaches 0.987, with F1-score = 0.976. On the control sample, for three classes (birch and two subclasses of pine), accuracy = 0.764 and F1-score = 0.709. Throughout the study area, the correlation between the real and predicted percent abundance of species in the stands indicates that the real and model stand formulas are consistent. The error in estimating the percent abundance of a tree species in a stand increases with the heterogeneity of the species composition of the community. Both the optimization of the training set and the testing of new recognition methods based on remote sensing are promising means of improving the models. The considered approach can be useful for updating forest inventory data and for checking the information from the ground forest inventory.

Author Contributions

Conceptualization, S.M.; formal analysis, A.P. and S.M.; funding acquisition, O.Y.; investigation, A.P.; methodology, S.M.; project administration, O.Y.; resources, G.S.; supervision, O.Y.; validation, G.S.; visualization, A.P.; writing—original draft, A.P.; writing—review and editing, A.P. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Science Foundation (grant No. 22-17-00025, https://rscf.ru/project/22-17-00025/, accessed on 29 November 2022).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the reviewers for their valuable comments and suggestions, which helped to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Westoby, J.C. Introduction to World Forestry: People and Their Trees; B. Blackwell: Oxford, UK; New York, NY, USA, 1989; ISBN 978-0-631-16133-2.
2. Global Environmental Change: Research Pathways for the Next Decade; National Research Council (U.S.), Ed.; National Academy Press: Washington, DC, USA, 1999; ISBN 978-0-309-06420-0.
3. Fassnacht, F.E.; Latifi, H.; Stereńczak, K.; Modzelewska, A.; Lefsky, M.; Waser, L.T.; Straub, C.; Ghosh, A. Review of Studies on Tree Species Classification from Remotely Sensed Data. Remote Sens. Environ. 2016, 186, 64–87.
4. Felbermeier, B.; Hahn, A.; Schneider, T. Study on User Requirements for Remote Sensing Applications in Forestry. In Proceedings of the ISPRS TC VII Symposium—100 Years ISPRS, Vienna, Austria, 5–7 July 2010; Volume 38, pp. 210–212.
5. Xie, Y.; Sha, Z.; Yu, M. Remote Sensing Imagery in Vegetation Mapping: A Review. J. Plant Ecol. 2008, 1, 9–23.
6. European Space Agency. Sentinel-2 User Handbook: Standard Document. 2015. Available online: https://sentinel.esa.int/documents/247904/685211/sentinel-2_user_handbook (accessed on 29 November 2022).
7. Bolyn, C.; Michez, A.; Gaucher, P.; Lejeune, P.; Bonnet, S. Forest mapping and species composition using supervised per pixel classification of Sentinel-2 imagery. Biotechnol. Agron. Soc. Environ. 2018, 22, 172–187.
8. Grabska, E.; Hostert, P.; Pflugmacher, D.; Ostapowicz, K. Forest Stand Species Mapping Using the Sentinel-2 Time Series. Remote Sens. 2019, 11, 1197.
9. Immitzer, M.; Vuolo, F.; Atzberger, C. First Experience with Sentinel-2 Data for Crop and Tree Species Classifications in Central Europe. Remote Sens. 2016, 8, 166.
10. Persson, M.; Lindberg, E.; Reese, H. Tree Species Classification with Multi-Temporal Sentinel-2 Data. Remote Sens. 2018, 10, 1794.
11. Wessel, M.; Brandmeier, M.; Tiede, D. Evaluation of Different Machine Learning Algorithms for Scalable Classification of Tree Types and Tree Species Based on Sentinel-2 Data. Remote Sens. 2018, 10, 1419.
12. Addabbo, P.; Focareta, M.; Marcuccio, S.; Votto, C.; Ullo, S.L. Contribution of Sentinel-2 Data for Applications in Vegetation Monitoring. Acta Imeko 2016, 5, 44.
13. Pinto, F.; Rouillard, D.; Sobze, J.-M.; Ter-Mikaelian, M. Validating Tree Species Composition in Forest Resource Inventory for Nipissing Forest, Ontario, Canada. For. Chron. 2007, 83, 247–251.
14. Magnussen, S.; Russo, G. Uncertainty in Photo-Interpreted Forest Inventory Variables and Effects on Estimates of Error in Canada’s National Forest Inventory. For. Chron. 2012, 88, 439–447.
15. Ballanti, L.; Blesius, L.; Hines, E.; Kruse, B. Tree Species Classification Using Hyperspectral Imagery: A Comparison of Two Classifiers. Remote Sens. 2016, 8, 445.
16. Immitzer, M.; Atzberger, C.; Koukal, T. Tree Species Classification with Random Forest Using Very High Spatial Resolution 8-Band WorldView-2 Satellite Data. Remote Sens. 2012, 4, 2661–2693.
17. Krahwinkler, P.; Rossmann, J. Tree Species Classification and Input Data Evaluation. Eur. J. Remote Sens. 2013, 46, 535–549.
18. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
19. Mellor, A.; Haywood, A.; Stone, C.; Jones, S. The Performance of Random Forests in an Operational Setting for Large Area Sclerophyll Forest Classification. Remote Sens. 2013, 5, 2838–2856.
20. Du, P.; Samat, A.; Waske, B.; Liu, S.; Li, Z. Random Forest and Rotation Forest for Fully Polarized SAR Image Classification Using Polarimetric and Spatial Features. ISPRS J. Photogramm. Remote Sens. 2015, 105, 38–53.
21. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104.
22. Pal, M. Random Forest Classifier for Remote Sensing Classification. Int. J. Remote Sens. 2005, 26, 217–222.
23. Körting, T.S.; Garcia Fonseca, L.M.; Câmara, G. GeoDMA—Geographic Data Mining Analyst. Comput. Geosci. 2013, 57, 133–145.
24. Belgiu, M.; Drăguţ, L.; Strobl, J. Quantitative Evaluation of Variations in Rule-Based Classifications of Land Cover in Urban Neighbourhoods Using WorldView-2 Imagery. ISPRS J. Photogramm. Remote Sens. 2014, 87, 205–215.
25. Illarionova, S.; Trekin, A.; Ignatiev, V.; Oseledets, I. Tree Species Mapping on Sentinel-2 Satellite Imagery with Weakly Supervised Classification and Object-Wise Sampling. Forests 2021, 12, 1413.
26. Ng, W.-T.; Rima, P.; Einzmann, K.; Immitzer, M.; Atzberger, C.; Eckert, S. Assessing the Potential of Sentinel-2 and Pléiades Data for the Detection of Prosopis and Vachellia Spp. in Kenya. Remote Sens. 2017, 9, 74.
27. Peerbhay, K.Y.; Mutanga, O.; Ismail, R. Investigating the Capability of Few Strategically Placed Worldview-2 Multispectral Bands to Discriminate Forest Species in KwaZulu-Natal, South Africa. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 307–316.
28. Pu, R.; Liu, D. Segmented Canonical Discriminant Analysis of In Situ Hyperspectral Data for Identifying 13 Urban Tree Species. Int. J. Remote Sens. 2011, 32, 2207–2226.
29. Cherepanov, A.S.; Druzhinina, E.G. Spectral Properties of Vegetation and Vegetation Indices. Geomatics 2009, 3, 28–32.
30. Chandra, A.M.; Ghosh, S.K. Remote Sensing and Geographical Information System; Alpha Science International Ltd: Oxford, UK, 2007; ISBN 978-1-84265-278-7.
31. Dudley, K.L.; Dennison, P.E.; Roth, K.L.; Roberts, D.A.; Coates, A.R. A Multi-Temporal Spectral Library Approach for Mapping Vegetation Species across Spatial and Temporal Phenological Gradients. Remote Sens. Environ. 2015, 167, 121–134.
32. Hill, R.A.; Wilson, A.K.; George, M.; Hinsley, S.A. Mapping Tree Species in Temperate Deciduous Woodland Using Time-Series Multi-Spectral Data. Appl. Veg. Sci. 2010, 13, 86–99.
33. Cho, M.A.; Mathieu, R.; Asner, G.P.; Naidoo, L.; van Aardt, J.; Ramoelo, A.; Debba, P.; Wessels, K.; Main, R.; Smit, I.P.J.; et al. Mapping Tree Species Composition in South African Savannas Using an Integrated Airborne Spectral and LiDAR System. Remote Sens. Environ. 2012, 125, 214–226.
34. Yin, H.; Khamzina, A.; Pflugmacher, D.; Martius, C. Forest Cover Mapping in Post-Soviet Central Asia Using Multi-Resolution Remote Sensing Imagery. Sci. Rep. 2017, 7, 1375.
35. Zhu, X.; Liu, D. Accurate Mapping of Forest Types Using Dense Seasonal Landsat Time-Series. ISPRS J. Photogramm. Remote Sens. 2014, 96, 1–11.
36. Malcolm, J.R.; Brousseau, B.; Jones, T.; Thomas, S.C. Use of Sentinel-2 Data to Improve Multivariate Tree Species Composition in a Forest Resource Inventory. Remote Sens. 2021, 13, 4297.
37. Bakin, O.V.; Rogova, T.V.; Sitnikov, A.P. Vascular Plants of Tatarstan; Izd-vo Kazanskogo universiteta: Kazan, Russia, 2000; ISBN 978-5-7464-0475-6.
38. Grishin, P.V. Soils of the Raifa forest dacha. Uchenye Zap. Kazan. Univ. 1956, 116, 61–123.
39. Garanin, V.I.; Gil’mutdinov, K.G.; Skokova, N.N.; Hasanshin, B.D. Reserves of the USSR. Reserves of the European Part of the RSFSR; USSR: Moscow, Russia, 1989; pp. 96–108.
40. Rogova, T.V.; Mangutova, L.A.; Ljubina, O.E.; Farhutdinova, S.F. Classification of the vegetation cover of the Volga–Kama Reserve on the landscape-ecological basis. In Transactions of Volzhsko-Kamsky National Nature Biosphere Reserve; Idel-Press: Kazan, Russia, 2005; pp. 213–240.
41. Sentinel 2 MSI—Level 2A Product Definition. 2016. Available online: https://sentinel.esa.int/documents/247904/1848117/Sentinel-2-Level-2A-Product-Definition-Document.pdf (accessed on 29 November 2022).
42. Copernicus Open Access Hub. Available online: https://scihub.copernicus.eu/dhus/#/home (accessed on 7 September 2022).
43. Genuer, R.; Poggi, J.-M.; Tuleau-Malot, C. VSURF: An R Package for Variable Selection Using Random Forests. R J. 2015, 7, 19.
44. The R Project for Statistical Computing. Available online: https://www.r-project.org/ (accessed on 21 March 2021).
45. Keitt, T.H.; Bivand, R.; Pebesma, E.; Rowlingson, B. Rgdal: Bindings for the Geospatial Data Abstraction Library. Available online: http://CRAN.R-project.org/package=rgdal (accessed on 8 November 2022).
46. Hijmans, R.J. Raster: Geographic Data Analysis and Modeling. R Package Version 2.4-15. Available online: http://CRAN.R-project.org/package=raster (accessed on 8 November 2022).
47. Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News 2002, 2, 18–22.
48. Bishop, C.M.; Svensén, M.; Williams, C.K.I. GTM: The Generative Topographic Mapping. Neural Comput. 1998, 10, 215–234.
49. Kohonen, T. The Self-Organizing Map. Proc. IEEE 1990, 78, 1464–1480.
50. Savel’ev, A.A. Modeling of the Spatial Structure of the Vegetation Cover (Geoinformation Approach); Kazanskiĭ Gos. Universitet: Kazan, Russia, 2004; ISBN 978-5-98180-100-6.
51. Sammon, J.W. A Nonlinear Mapping for Data Structure Analysis. IEEE Trans. Comput. 1969, 100, 401–409.
52. SCANEX Group. User’s Manual Scanex Image Processor v.4.0 (Rukovodstvo Pol’zovatelja Scanex Image Processor v.4.0). 2013. Available online: https://www.scanex.ru/en/software/image-processing/scanex-image-processor/ (accessed on 29 November 2022).
53. Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
54. Lawrence, R.L.; Wood, S.D.; Sheley, R.L. Mapping Invasive Plants Using Hyperspectral Imagery and Breiman Cutler Classifications (RandomForest). Remote Sens. Environ. 2006, 100, 356–362.
55. Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random Forests for Land Cover Classification. Pattern Recognit. Lett. 2006, 27, 294–300.
56. Immitzer, M.; Neuwirth, M.; Böck, S.; Brenner, H.; Vuolo, F.; Atzberger, C. Optimal Input Features for Tree Species Classification in Central Europe Based on Multi-Temporal Sentinel-2 Data. Remote Sens. 2019, 11, 2599.
57. Ivanov, M.A.; Mukharamova, S.S.; Yermolaev, O.P.; Essuman-Quainoo, B. Mapping croplands with a long history of crop cultivation using time series of MODIS vegetation indices. Uchenye Zapiski Kazanskogo Universiteta. Seriya Estestv. Nauki 2021, 162, 302–313.
58. Mukharamova, S.; Saveliev, A.; Ivanov, M.; Gafurov, A.; Yermolaev, O. Estimating the soil erosion cover-management factor at the european part of Russia. ISPRS Int. J. Geo-Inf. 2021, 10, 645.

Figure 1. Study area: Raifa forest (section of Volga–Kama State Reserve).

Figure 2. Map of forestry stands of the Raifa forest: (a) study area; (b) map fragment.

Figure 3. The abundance of tree species in the Raifa forest.

Figure 4. Spectral signatures of tree species for different phenological dates: (a) 9 May 2020; (b) 21 June 2020; (c) 8 July 2020; (d) 5 August 2020; (e) 24 September 2020; (f) 29 October 2020.

Figure 5. Histograms of the frequencies of NDVI values for different phenological dates: (a) 9 May 2020; (b) 21 June 2020; (c) 8 July 2020; (d) 5 August 2020; (e) 24 September 2020; (f) 29 October 2020.

Figure 6. Sentinel-2 data classification results: (a) RF method; (b) GTM method.

Figure 7. Frequency histograms of the difference between real and model percentages of species in the stands: (a) linden, RF model; (b) linden, GTM model; (c) birch, RF model; (d) birch, GTM model; (e) pine, RF model; (f) pine, GTM model.

Figure 8. Histograms of the Manhattan distance between real and model «forest stand formulas»: (a) RF model; (b) GTM model.

Figure 9. Thematic maps of the spatial distribution of real (according to ground inventory data) and predicted (according to models based on remote sensing Sentinel-2 data) percent abundance values of tree species in stands in Raifa forest: (a) pine, real data; (b) pine, RF model; (c) pine, GTM model; (d) linden, real data; (e) linden, RF model; (f) linden, GTM model.

Table 1. The presence of tree species in the Raifa forest.

Tree Species	Number of Stands Where the Species Is Present	Number of Stands with 100% Presence of the Species
Wych elm (Ulmus glabra Huds.)	17	0
Silver birch (Betula pendula Roth., B. pubescens Ehrh)	1171	69
Siberian larch (Larix sibirica Ledeb.)	18	2
Scotch pine (Pinus sylvestris L.)	1416	375
Sallows (Salix sp.)	13	3
Poplar (Populus sp.)	5	0
Norway maple (Acer platanoides L.)	33	0
Little-leaf linden (Tilia cordata Mill.)	609	8
Finnish spruce (Picea x fennica (Redel) Kom.)	719	1
Common oak (Quercus robur L.)	74	2
Common aspen (Populus tremula L.)	22	1
Common alder (Alnus glutinosa (L.) Gaertn.)	74	4
Cedar (Cedrus sp.)	3	0

Table 2. Sentinel-2 images. Spectral bands: Band 2—Blue, 458–523 nm; Band 3—Green, 543–578 nm; Band 4—Red, 650–680 nm; Band 5—Red-edge I, 698–713 nm; Band 6—Red-edge II, 733–748 nm; Band 7—Red-edge III, 773–793 nm; Band 8—Near infrared (NIR), 785–900 nm; Band 8A—Narrow Near infrared (NNIR), 855–875 nm; Band 11—Shortwave infrared-1 (SWIR1), 1566–1651 nm; Band 12—Shortwave infrared-2 (SWIR2), 2100–2280 nm.

File Name	Date	10 m Bands	20 m Bands	Processing Level
S2A_MSIL2A_20200509T075611_N0214_R035_T39VUC_20200509T111003	2020.05.09	2,3,4,8	5,6,7,8A,11,12	L2A
S2A_MSIL2A_20200621T080611_N0214_R078_T39VUC_20200621T112622	2020.06.21	2,3,4,8	5,6,7,8A,11,12	L2A
S2A_MSIL2A_20200708T075611_N0214_R035_T39VUC_20200708T111246	2020.07.08	2,3,4,8	5,6,7,8A,11,12	L2A
S2B_MSIL2A_20200805T080609_N0214_R078_T39VUC_20200805T105558	2020.08.05	2,3,4,8	5,6,7,8A,11,12	L2A
S2B_MSIL2A_20200924T080649_N0214_R078_T39VUC_20200924T102247	2020.09.24	2,3,4,8	5,6,7,8A,11,12	L2A
S2A_MSIL2A_20201029T081051_N0214_R078_T39VUC_20201029T104651	2020.10.29	2,3,4,8	5,6,7,8A,11,12	L2A

Table 3. Tree species in the training and control samples.

Tree Species	Short Name	Number of Stands in the Training Sample	Total Area of Stands (ha) in the Training Sample	Number of Stands in the Test Sample	Total Area of Stands (ha) in the Test Sample
Common alder (Alnus glutinosa (L.) Gaertn.)	alder	16	21	-	-
Finnish spruce (Picea x fennica (Redel) Kom.)	spruce	11	7	-	-
Little-leaf linden (Tilia cordata Mill.)	linden	17	30	-	-
Silver birch (Betula pendula Roth.)	birch	27	26	42	28
Scotch pine (Pinus sylvestris L.), old-growth natural forest stands	pine o.	24	98	137	316
Scotch pine (Pinus sylvestris L.), pine plantations at the felling site	pine y.	27	48	116	146
Total:		122	230	295	490

Table 4. Weights of spectral bands.

Band	May	June	July	August	September	October
Band 2—Blue	0.5	0.5	0.5	0.5	0.5	0.5
Band 3—Green	0.6	0.6	0.6	0.6	0.6	0.6
Band 4—Red	0.5	0.5	0.5	0.5	0.6	0.8
Band 5— Red-edge I	0.8	0.6	0.6	0.6	1.0	0.8
Band 6— Red-edge II	0.9	1.0	1.0	1.0	1.0	1.0
Band 7— Red-edge III	0.9	1.0	1.0	1.0	1.0	1.0
Band 8—NIR	0.9	1.0	1.0	1.0	1.0	1.0
Band 8A—Narrow NIR	0.9	1.0	1.0	1.0	1.0	1.0
Band 11—SWIR1	0.9	0.9	0.9	0.9	0.9	1.0
Band 12—SWIR2	0.8	0.6	0.6	0.6	0.8	1.0
NDVI	0.5	0.8	1.0	1.0	0.6	0.8

Table 5. Accuracy metrics of automated tree species recognition.

	RF/Training Data:			GTM/Training Data:
Tree Species	Recall	Precision	F1-Score	Recall	Precision	F1-Score
Alder	0.977	0.988	0.983	0.665	0.563	0.610
Spruce	0.899	0.974	0.935	0.575	0.509	0.540
Linden	0.994	0.991	0.992	0.948	0.914	0.931
Birch	0.971	0.967	0.969	0.531	0.747	0.621
pine y.	0.979	0.994	0.986	0.792	0.762	0.777
pine o.	0.998	0.988	0.993	0.891	0.895	0.893
macro avg	0.970	0.983	0.976	0.734	0.732	0.729
accuracy	0.987			0.829
	RF/Test Data:			GTM/Test Data:
Tree Species	Recall	Precision	F1-score	Recall	Precision	F1-score
Birch	0.694	0.606	0.647	0.546	0.600	0.572
pine y.	0.607	0.706	0.652	0.635	0.523	0.574
pine o.	0.849	0.805	0.826	0.701	0.779	0.738
macro avg	0.717	0.706	0.709	0.628	0.634	0.628
accuracy	0.764			0.673

Table 6. Comparison of real and model percent abundance values of tree species in stands.

Tree Species	Mean of Real Percent	Mean of Model Percent	ME	MAE	RMSE	WAPE	Pearson Correlation Coefficient
RF Model
Alder	1.9	5.2	−3.4	4.1	10.3	2.23	0.70
Spruce	6.3	4.5	1.8	7.0	12.6	1.10	0.39
Linden	13.6	13.1	0.5	7.1	15.2	0.52	0.83
Birch	22.2	19.4	2.8	14.3	22.7	0.64	0.68
Pine	53.4	57.8	−4.4	13.8	22.9	0.26	0.86
GTM Model
Alder	1.9	7.1	−5.3	6.6	14.9	3.56	0.45
Spruce	6.3	11.5	−5.2	11.9	19.8	1.88	0.21
Linden	13.6	13.8	−0.2	8.0	17.2	0.59	0.81
Birch	22.2	10.2	12.0	16.8	27.0	0.76	0.60
Pine	53.4	57.3	−3.9	14.1	23.4	0.27	0.85

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

