Open Access Article

Peer-Review Record

Improving the Accuracy of Tree Species Mapping by Sentinel-2 Images Using Auxiliary Data—A Case Study of Slyudyanskoye Forestry Area near Lake Baikal

Forests 2025, 16(3), 487; https://doi.org/10.3390/f16030487

by Anastasia Popova

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Zhen Zhang

Submission received: 30 January 2025 / Revised: 5 March 2025 / Accepted: 8 March 2025 / Published: 10 March 2025

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I reviewed the manuscript "Improving the Accuracy of Tree Species Mapping by Sentinel-2 Images Using Auxiliary Data – A Case Study of Slyudyanskoye Forestry near Lake Baikal" by Popova. Although the topic of the manuscript is interesting and falls within the scope of Forests, it has several critical issues, as outlined below.

1- The English quality of the manuscript needs improvement. While there are typos and grammatical errors throughout, some sections are particularly difficult to understand. It appears that insufficient time was dedicated to writing and editing the manuscript.

2- The Introduction section requires significant revision. First, a thorough review is necessary to improve the quality of the English language used, not just typos and grammar. Additionally, the author should incorporate recent case studies on forest mapping and tree species classification, particularly within the study region and similar areas. These case studies should be discussed in detail, emphasizing their key findings and how they have influenced the design and organization of the current research. Furthermore, the author must clearly articulate the contribution of this study to the field. As it stands, the contribution appears limited, as previous research has already established that supplementing satellite data with auxiliary data can improve classification accuracy. Furthermore, I suggest using the names of the authors instead of only the reference numbers for better readability.

3- There are several critical issues and a lack of detailed explanations in Section 2. (1): the temporal mismatch between the Sentinel-2 data (2019) and the reference map (2009) is not addressed, raising concerns about data validity. (2): the NDWI equation is incorrect, as both bands 8 and 8a are NIR. (3): relevant references for each data source and index are missing in Table 1. (4): the distribution of reference samples is not provided, which is essential for understanding the dataset. (5): the spatial resolution of all data, as in Table 1, is unclear; while resampling to 30 meters is mentioned, the rationale for this choice and how spatial diversity is maintained when resampling from 30-second resolution data need clarification. (6): further explanations about the Chelsa datasets are required. (7): the model evaluation section lacks clarity on how training and test samples were collected, making the results and conclusions questionable; the reference data collection process must be clearly described to ensure robust review and validation. (8): details about the high-resolution images used for validation, including their source and acquisition dates, are missing. These issues must be addressed to meet the standards of an academic paper.

4- The methodology section also has two critical issues. (1) Why was the RF used, and how did you tune its hyperparameters? (2): The authors mentioned selecting features, though they only removed three features, which is not logical in the context of 110 features. The authors should implement a reasonable feature selection method to select the most contributing features for the task.

5- Please provide the confusion matrixes for all scenarios and discuss the performance of the model based on the omission and commission errors in detail.

6- What is the added value of Figure 4 when the values are summarized in Table 4?

Comments on the Quality of English Language

The English quality of the manuscript needs significant improvement to enhance readability, flow, and overall engagement. While addressing typos and grammatical errors is essential, the revision should also focus on restructuring and rewriting sentences for clarity, coherence, and better reader engagement.

Author Response

Response to Reviewer 1 Comments

Thank you very much for taking the time to review this manuscript in such detail. Your comments and suggestions allowed me to improve the quality of the material significantly. Please find the detailed responses below, along with the corresponding revisions highlighted in the resubmitted files.

Summary

Please forgive me for the low-level errors, such as typos and grammatical mistakes. I have made significant corrections to the text, including stylistic corrections, and I hope that nothing will hinder the understanding of the material now.

I agree with this comment. I have revised the Introduction, added recent case studies within the similar areas, articulated the contribution of this study, and added the authors' names to the references.

The review of case studies is included in lines 83-89 and 104-116:

Chiang and Valdez [20], in their classification of tree species in Mongolia within the Siberian taiga zone—a region with topographic, climatic, and species composition characteristics similar to the area of this study—found that topographic variables (elevation, slope, aspect, and curvature) were more important than multispectral data for classifying individual tree species (birch, cedar, and willow). However, topographic data alone were insufficient to achieve high classification accuracy.

Liu et al. [24] investigated the distribution of tree species in a mountain forest using Sentinel-2 imagery, vegetation indices, and topographic data. The highest classification accuracy was achieved with the Random Forest method applied to monthly datasets, with the most important features being the SWIR bands (B11, B12), the NDVI index, elevation, and slope. Topographic features were more effective in distinguishing deciduous species than coniferous ones. Wang et al. [25] combined spectral, phenological, textural, and topographic features to identify tree species in Changbai Mountain. In their study, topographic variables, particularly elevation, were the most significant, while phenological features based on NDVI time series had a limited impact on the results. Li et al. [26] utilized seasonal Landsat composites, vegetation indices, mean temperature, precipitation, and topographic data to classify tree species in a mountainous region in southwestern China. The top 10 important attributes in their study included six vegetation indices, elevation, temperature, and precipitation.

Contribution of this study:

Although previous studies have explored the use of auxiliary data to improve classification accuracy, the impact of each feature varies significantly across different regions and forest types, necessitating further investigation. Moreover, researchers have predominantly focused on combining satellite imagery with vegetation indices and topographic data, rarely incorporating soil data, and typically limiting climatic variables to air temperature and precipitation. This study aims to evaluate the influence of a wide range of ecological variables on the accuracy of tree species classification in mountain forest ecosystems, including Sentinel-2 imagery, vegetation indices, topographic, climatic, and soil data, as well as forest canopy height.

3- There are several critical issues and a lack of detailed explanations in Section 2.

I agree with this comment. Indeed, some aspects were not sufficiently detailed in the text, which could make them difficult to understand. I provide responses to each point below.

(1): the temporal mismatch between the Sentinel-2 data (2019) and the reference map (2009) is not addressed, raising concerns about data validity.

The 2009 map was used only as basic information about the species to identify key sites. Then, for each site, its spectral characteristics were calculated using the 2019 image; sites with sharply different values for the same species were discarded. Further marking was based on the averaged spectral characteristics for each species. I have expanded the description of the dataset collection in the text:

The 2009 forest map provided baseline species information, after which each selected site was assessed using Google Earth imagery from 2019 to confirm the absence of disturbances since 2009 (e.g., logging, fires, or pest damage). Additionally, crown texture was visually compared to ensure consistency. Using the QGIS Semi-Automatic Classification Plugin, spectral characteristics were calculated for each site based on Sentinel-2 bands. The obtained spectra for all sites of the same species were compared, and sites with significant variations in spectral values were discarded as unreliable for that species.

(2): the NDWI equation is incorrect, as both bands 8 and 8a are NIR.

That was an error in the text, thanks for noticing! I have corrected the NDWI formula so that it uses bands B3 and B8.
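For reference, the standard green/NIR form of NDWI maps onto Sentinel-2 bands B3 and B8, which matches the bands named in the correction. The sketch below is only an illustration under that assumption; the function name `ndwi` and the sample reflectance values are hypothetical and are not taken from the manuscript.

```python
def ndwi(green_b3: float, nir_b8: float) -> float:
    """NDWI = (B3 - B8) / (B3 + B8) for Sentinel-2 green (B3) and NIR (B8)."""
    denominator = green_b3 + nir_b8
    if denominator == 0:
        # Avoid division by zero for empty or no-data pixels.
        return 0.0
    return (green_b3 - nir_b8) / denominator


# Illustrative reflectances only: water reflects more green than NIR,
# so the index is positive; vegetation reflects more NIR, so it is negative.
water = ndwi(0.08, 0.02)
forest = ndwi(0.04, 0.30)
```

With B8 and B8A in the formula, as in the original version flagged by the reviewer, both terms would be near-infrared and the index would carry almost no water signal.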

(3): relevant references for each data source and index are missing in Table 1.

The references for auxiliary data sources were added. I decided not to add references to each vegetation index so as not to overload the text, as I only used commonly used indices.

(4): the distribution of reference samples is not provided, which is essential for understanding the dataset.

I have added Figure 4, which illustrates the distribution of the training and test datasets.

(5): the spatial resolution of all data, as in Table 1, is unclear; while resampling to 30 meters is mentioned, the rationale for this choice and how spatial diversity is maintained when resampling from 30-second resolution data need clarification.

Indeed, the auxiliary data had various resolutions. During processing, they were all resampled to 30 m resolution using the gdal utility, which ensured accurate overlay of all cells. Although the Sentinel-2 data allow 10 m resolution, in this case the entire set of auxiliary variables required too much memory for classification:

Sentinel-2 bands were resampled to a 30 m resolution to reduce memory usage and improve processing efficiency. All auxiliary data were resampled to the same 30 m resolution using the Nearest Neighbour method via the ‘gdalwarp’ utility, ensuring that raster cells were aligned for precise overlap with the satellite band cells.
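The resampling step described here could be scripted roughly as follows. This is a minimal sketch rather than the author's actual command: the file names are hypothetical, and it assumes the rasters are already in a projected coordinate system with metre units, so that `-tr 30 30` means 30 m. Data stored in geographic coordinates, such as 30-arc-second grids, would need reprojection first.

```python
import shlex


def build_gdalwarp_command(src_path, dst_path, resolution_m=30.0):
    """Return a gdalwarp argument list for nearest-neighbour resampling."""
    res = str(resolution_m)
    return [
        "gdalwarp",
        "-tr", res, res,  # target pixel size in georeferenced units
        "-r", "near",     # nearest-neighbour resampling
        "-tap",           # align the output extent to the target grid
        src_path,
        dst_path,
    ]


# Hypothetical file names, used only to show the resulting command line.
cmd = build_gdalwarp_command("soil_250m.tif", "soil_30m.tif")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where GDAL is installed. The `-tap` flag is one way to make cells from different sources fall on the same grid, which is the alignment property the text relies on.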

(6): further explanations about the Chelsa datasets are required.

I've expanded Chelsa's description:

Additionally, Chelsa bioclimatic datasets were utilized [34,35]. These datasets provide globally interpolated data derived from key climatic variables (such as air temperature and precipitation) and are designed to model species distributions. They reflect annual trends (e.g., mean annual temperature), seasonality (annual range of temperature and precipitation), growing season parameters, and extreme or limiting environmental factors. The Chelsa datasets also include climatological norms for frost frequency, days of snow cover, and growing season characteristics (beginning, end, duration). The values of the Chelsa bioclimatic variables represent averages for the period 1981-2010.

(7): the model evaluation section lacks clarity on how training and test samples were collected, making the results and conclusions questionable; the reference data collection process must be clearly described to ensure robust review and validation.

The test samples were collected similarly to the training samples, except that test points were first generated in QGIS, uniformly distributed across the entire study area. Subsequently, the AcATaMa plugin was used to assign a label to each point using Google Earth images and forest inventory maps.

(8): details about the high-resolution images used for validation, including their source and acquisition dates, are missing. These issues must be addressed to meet the standards of an academic paper.

I used Google Earth imagery as the high-resolution reference. I have added this clarification to the text of the article.

4- The methodology section also has two critical issues.

(1) Why was the RF used, and how did you tune its hyperparameters?

RF was chosen because it is the most commonly used method in satellite image classification. The algorithm gives good accuracy, is not very demanding on resources, and is robust to outliers. Most parameters were left at their default values; only the number of trees was set to 500, which gave the best accuracy in an acceptable time in a number of tests on a sample dataset.

These clarifications have been added to the text:

Classification was performed using the Random Forest machine learning method from the Python scikit-learn library, which has demonstrated strong performance in solving similar problems [10,18,19,21,22,24]. The number of trees parameter was set to 500, while other algorithm parameters were retained at their default values.
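The configuration described above can be sketched with scikit-learn as follows. The data are a synthetic stand-in generated with `make_classification`, not the study's pixel table, and the 5-fold setting and fixed `random_state` are assumptions added for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a pixel table: 300 samples, 10 features, 5 classes.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=6,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# 500 trees as stated in the response; other hyperparameters keep their
# defaults (random_state is fixed here only to make the run repeatable).
model = RandomForestClassifier(n_estimators=500, random_state=0)

# Cross-validated accuracy; the number of folds is an assumption.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

After fitting, `model.feature_importances_` gives the impurity-based importance ranking that feature-importance discussions of this kind usually rely on.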

(2): The authors mentioned selecting features, though they only removed three features, which is not logical in the context of 110 features. The authors should implement a reasonable feature selection method to select the most contributing features for the task.

I agree with this comment. Indeed, removing three features from 110 is not a significant achievement. I have removed the description of the 98-feature model from the text, leaving only a mention that removing features with low importance allowed for a slight improvement in accuracy.

5- Please provide the confusion matrixes for all scenarios and discuss the performance of the model based on the omission and commission errors in detail.

All confusion matrices have been added to the supplementary materials. Misclassifications and best-case analyses for each tree species have been added to the text. I did not describe these data in too much detail, because I think they are easier to understand in tabular form than as running text.
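For readers who want to derive the omission and commission errors requested by the reviewer from the supplementary matrices, a minimal sketch follows. It assumes rows hold the reference classes and columns hold the predicted classes; the 3-class matrix contains made-up counts and is not data from the study.

```python
def omission_commission(matrix):
    """Per-class omission and commission errors from a square confusion matrix.

    Rows are reference (true) classes, columns are predicted classes.
    """
    n = len(matrix)
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Omission: reference samples of class k that were assigned elsewhere.
    omission = [1 - matrix[k][k] / row_totals[k] for k in range(n)]
    # Commission: samples predicted as class k that belong to another class.
    commission = [1 - matrix[k][k] / col_totals[k] for k in range(n)]
    return omission, commission


# Toy matrix with invented counts, for illustration only.
toy = [
    [40, 5, 5],
    [10, 30, 10],
    [0, 5, 45],
]
omission, commission = omission_commission(toy)
overall_accuracy = sum(toy[k][k] for k in range(3)) / sum(map(sum, toy))
```

Omission error is the complement of producer's accuracy and commission error is the complement of user's accuracy, so the same function covers both ways of reporting per-class performance.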

6- What is the added value of Figure 4 when the values are summarized in Table 4?

Indeed, Figure 4 only repeats the data in Table 4 in graphical form. I find it useful, because we can more quickly see the models with the highest values in the figure, and we can already find their quantitative values in the table.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The following work closely resembles yours in almost every aspect. I would like to see in the discussion section a comparison between their results and yours.

Here is the paper from 2017:

https://www.mdpi.com/1999-4907/8/2/42

Other than that, the paper is carefully written and methodologically correct. The results are clear and concise. Good luck.

Author Response

Response to Reviewer 2 Comments

The following work closely resembles yours in almost every aspect. I would like to see in the discussion section a comparison between their results and yours.

Here is the paper from 2017:

https://www.mdpi.com/1999-4907/8/2/42

Other than that, the paper is carefully written and methodologically correct. The results are clear and concise. Good luck.

Thank you very much for taking the time to review this manuscript and for the suggested link; this paper is indeed similar to my research. I have read it and added a comparison of the results to the Discussion, especially regarding the use of topographic variables.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript "Improving the Accuracy of Tree Species Mapping by Sentinel-2 Images Using Auxiliary Data – A Case Study of Slyudyanskoye Forestry near Lake Baikal" (forests-3478853). The present article is well-structured and highlights the relevance of tree species mapping and the limitations of traditional methods. However, some aspects could be improved. Therefore, before recommending the manuscript for publication, the authors must improve several aspects of the present study. Thus, I am recommending this work for major revisions.

As major observations, which must be attended to, I highlight:

1 – I noticed some grammatical errors in writing, therefore, I suggest the revision of English by a native speaker.

2 – Authors must reformulate the abstract. Note that you are presenting an abstract with 277 words, while Forests limits an abstract to 200 words. I also highlight that authors must follow the premise of presenting the highlights of the results in the abstract, something that I did not observe in this summary.

3 – In the introduction, although the article mentions that auxiliary data improve classification, it would be interesting to add more details on what differentiates this study from previous works that also use auxiliary data for tree species classification.

4 – The introduction cites references on the use of multispectral data for forest mapping but could further explore the gap in the literature. For example, what specific challenges are faced in the study region that justify the proposed method?

5 – The Materials and Methods section is well-organized but has some areas that could be improved, particularly in subsection 2.1: Some sections are too compact. It is important to provide more details on the criteria for selecting training areas and the methodology used to determine the polygon delineation of tree species.

How was the quality of the training data ensured?
Was there any cross-validation or field verification to ensure the accuracy of the class labels?

6 – The description of the preprocessing steps could be more detailed. For example, what were the exact atmospheric correction steps applied? How were the auxiliary data normalized, and why was this choice made?

7 – The model evaluation section could provide more details on the data division methodology. What was the proportion of training and test data? Was there any specific technique to prevent overfitting?

8 – The discussion presents good analyses but could be expanded to offer more insights based on the obtained results:

Although the results show an increase in classification accuracy by adding auxiliary data, the discussion could further detail “which” species benefited most from the different types of auxiliary data.
How did topography impact classification? The discussion mentions its influence but does not detail the possible mechanisms.

9 – How do the results of this study compare quantitatively with similar studies? It would be interesting to include a comparative table with accuracy values obtained in previous works.

10 – How can these results be practically applied to forest management or conservation? Are there any limitations to implementing these models outside the study area?

11 – Some limitations are mentioned but could be explored further. For example:

How does the availability of auxiliary data impact the generalization of the model to other regions?
Are there any computational limitations that could affect the scalability of the proposed method?

12 – The conclusion summarizes the findings well but could be more emphatic about the relevance of the results and how this study advances the field. It would be interesting to suggest more concrete future studies. For example, investigating whether other approaches, such as deep learning, could further improve classification accuracy.

Comments on the Quality of English Language

I noticed some grammatical errors in writing, therefore, I suggest the revision of English by a native speaker.

Author Response

Response to Reviewer 3 Comments

As major observations, which must be attended to, I highlight:

1 – I noticed some grammatical errors in writing, therefore, I suggest the revision of English by a native speaker.

I agree with this comment. I have reduced the Abstract to 203 words and changed its emphasis to the results obtained.

I reworked the Introduction, expanded references to other similar works. The differences of my research have been reformulated:

Agree. I added to the Introduction a description of why the chosen method is needed in the study area:

This study focuses on the Slyudyanskoye Forestry in the Irkutsk Region, located along the shore of Lake Baikal. Inventory data for this area, which include information on the spatial distribution of tree species, are not always accessible to the scientific community and are outdated for part of the region due to the complex mountainous terrain, which complicates field surveys [4]. Meanwhile, the forests around Baikal are experiencing the impacts of wildfires, insect damage, and the prohibition of sanitary logging.

Indeed, the process of preparing the training data samples was not described in enough detail in the text. I have expanded it and tried to reveal all the details:

Based on an analysis of literature on the spectral reflectance of different tree species [7,40,41] and a visual comparison of the forest map with high-resolution Google Earth imagery, several key sites were selected for each of the seven tree species present in the Slyudyanskoye forestry. The 2009 forest map provided baseline species information, after which each selected site was assessed using Google Earth imagery from 2019 to confirm the absence of disturbances since 2009 (e.g., logging, fires, or pest damage). Additionally, crown texture was visually compared to ensure consistency. Using the QGIS Semi-Automatic Classification Plugin, spectral characteristics were calculated for each site based on Sentinel-2 bands. The obtained spectra for all sites of the same species were compared, and sites with significant variations in spectral values were discarded as unreliable for that species.

After analyzing the set of collected key sites, it was decided to proceed with further annotation and classification into five species: pine, cedar, larch, fir, and birch. The spectral characteristics of aspen and spruce were found to be too similar to those of birch and fir, respectively. Additionally, the areas occupied by aspen and spruce within the forestry were significantly smaller. As a result, additional data will be required in future studies to accurately delineate areas dominated by aspen and spruce.

How was the quality of the training data ensured?

The quality of the training data is supported by the use of the spectral characteristics of tree species that have been validated by other researchers and used to compile the dataset (references to the studies are available in the paper).

Was there any cross-validation or field verification to ensure the accuracy of the class labels?

Yes, cross-validation was used in evaluating the model; this is described in Section 2.2. Unfortunately, field survey data for our study area are available only in a very limited form: the total area of each species for the whole forestry. I compared these figures with the species areas obtained from the classification and obtained a correlation coefficient of 0.87, which also supports the reliability of the result.

I've expanded the description of these details in the paper:

The original images were acquired from the Copernicus Hub on 5 July 2019, and subsequently processed using the Sen2Cor algorithm to perform base atmospheric correction, resulting in Bottom-Of-Atmosphere (BOA) reflectance values for Sentinel-2 bands.

The values of all variables were normalized to the interval (0, 1). The initial datasets exhibited significant differences in absolute values: (0, 10000) for Sentinel-2 bands, ranges (-2000, 30000) and (-1, 1) for vegetation indices, (0, 1000) for soil parameters. To address this imbalance, all indices were transformed using the method proposed for the Dynamic World global classification [42]. This approach involves logarithmic transformation and rescaling each dataset to a uniform interval (0, 1), ensuring the robustness of machine learning models during classification. Additionally, this transformation helped mitigate the influence of high reflectance outliers in the spectral data distributions.
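The idea of a logarithmic transform followed by rescaling can be illustrated with a deliberately simplified sketch. It is not the Dynamic World procedure cited in the text, which is more involved. Here the range bounds are supplied by hand, the endpoints map to exactly 0 and 1, and variables with negative values would need to be shifted before the logarithm is taken.

```python
import math


def log_rescale(values, lower, upper):
    """Log-transform values and rescale them to [0, 1] over a known range."""
    log_lo = math.log1p(lower)
    log_hi = math.log1p(upper)
    scaled = []
    for v in values:
        # Clip first so that out-of-range outliers cannot leave [0, 1].
        clipped = min(max(v, lower), upper)
        scaled.append((math.log1p(clipped) - log_lo) / (log_hi - log_lo))
    return scaled


# Illustrative values on the 0 to 10000 scale quoted for the Sentinel-2 bands.
band = [0, 500, 2500, 10000, 12000]
normalized = log_rescale(band, lower=0, upper=10000)
```

The logarithm compresses the upper tail, so a bright outlier such as 12000 ends up at the same value as the upper bound instead of stretching the scale for every other pixel.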

Our method uses cross-validation to evaluate the models, so the data were not manually divided into training and test parts. The test data were labelled separately and amounted to about 10% of the training data. Overfitting was not specifically controlled, as the model was first trained on all training data and then used to classify the entire satellite image. Our study area is quite large: we were able to collect training data for 17,500 pixels, which is a small fraction of the entire satellite image (about 3 million pixels).

8 – The discussion presents good analyses but could be expanded to offer more insights based on the obtained results:

Although the results show an increase in classification accuracy by adding auxiliary data, the discussion could further detail “which” species benefited most from the different types of auxiliary data.

I agree, information on how different types of supplementary data affected different tree species is essential in this study. Section 4.1 is devoted to describing these details, and I have adjusted the text to make the details clearer.

How did topography impact classification? The discussion mentions its influence but does not detail the possible mechanisms.

The description of the influence of topography has been expanded:

Topographic features are widely recognized for their ability to enhance the accuracy of land cover classification [49]. Topography influences the natural distribution of tree species by regulating microclimate conditions and species habitats. Variations in elevation are directly linked to changes in light availability, precipitation, air pressure, and humidity levels. Our study area features mountainous terrain with elevation ranging from 580 to 2330 meters above sea level. This significant elevation variation contributed substantially to the model's performance, improving the recognition accuracy of fir, larch, and cedar. In the full feature set, elevation ranked seventh in importance, consistent with the findings from other studies [50–52]. However, other topographic variables – slope, aspect, and shading – had a lesser impact on tree species recognition accuracy. This observation aligns with the results of [50] but contrasts with those of [52,53]. The relatively homogeneous dominance of cedar (62% of the study area) may explain this discrepancy, as it reduces the relationship between slope parameters and species distribution.

9 – How do the results of this study compare quantitatively with similar studies? It would be interesting to include a comparative table with accuracy values obtained in previous works.

Such information is certainly interesting, but it is difficult to obtain reliable data for it. I know of a number of similar studies in which the dataset was divided randomly into training and test parts. In that case a very high accuracy of 98-99% is obtained, even when using cross-validation, but this figure is misleading because the training and test samples lie close to each other. When the training and test datasets are spatially separated, the accuracy is usually 75-85%. Unfortunately, I could not find a sufficient number of similar studies in which the test data were not randomly separated.
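The difference between a random split and a spatially separated one can be made concrete with a small sketch. This is one possible block-based hold-out, not the procedure used in the study, where test points were generated and labelled separately. The coordinates, block size, and labels below are invented for illustration.

```python
def spatial_block_split(points, block_size, test_blocks):
    """Split (x, y, label) samples so that whole grid blocks go to the test set."""
    train, test = [], []
    for x, y, label in points:
        # Neighbouring samples fall into the same block and stay together.
        block = (int(x // block_size), int(y // block_size))
        if block in test_blocks:
            test.append((x, y, label))
        else:
            train.append((x, y, label))
    return train, test


# Made-up coordinates in metres.
samples = [
    (10, 10, "pine"), (40, 20, "pine"),        # block (0, 0)
    (1010, 30, "cedar"), (1050, 60, "cedar"),  # block (1, 0)
    (20, 1020, "birch"),                       # block (0, 1)
]
train, test = spatial_block_split(samples, block_size=1000, test_blocks={(1, 0)})
```

Because adjacent pixels never end up on both sides of the split, the test score is not inflated by near-duplicate neighbours, which is the effect described in the response.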

10 – How can these results be practically applied to forest management or conservation? Are there any limitations to implementing these models outside the study area?

The proposed model can be used for other regions if a good training dataset is available:

The developed model can be adapted for tree species classification in other regions with similar environmental conditions, including location, climate, soil types, and species composition. However, the successful application of this model to new areas will depend on the availability of high-quality training data specific to those regions.

11 – Some limitations are mentioned but could be explored further. For example:

How does the availability of auxiliary data impact the generalization of the model to other regions?
Are there any computational limitations that could affect the scalability of the proposed method?

All auxiliary datasets used are global, so they can be used for a different region. Of the computational limitations, the most important is memory: the full set of 101 features for our study area requires 3.1 GB, and this much RAM must be available during classification. Accordingly, as the region size increases, more RAM will be needed.
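The linear growth of memory with area can be checked with back-of-the-envelope arithmetic. The sketch uses the rounded figures from the responses (about 3 million pixels, 101 features) and assumes 8-byte floating-point values. It gives the same order of magnitude as the reported 3.1 GB but is not expected to reproduce it exactly, since the true pixel count and data type are not stated.

```python
def feature_stack_bytes(n_pixels, n_features, bytes_per_value):
    """Memory needed to hold a dense pixels-by-features array."""
    return n_pixels * n_features * bytes_per_value


# Rounded figures from the response; the 8-byte value size is an assumption.
base = feature_stack_bytes(3_000_000, 101, 8)
doubled = feature_stack_bytes(6_000_000, 101, 8)
print(f"{base / 1e9:.2f} GB -> {doubled / 1e9:.2f} GB")
```

Doubling the mapped area doubles the requirement, so larger regions would generally call for tiling the image or storing features at lower precision.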

Certainly, the use of deep learning is promising in this area. I have reflected this in the text:

Additionally, the multi-temporal image series, when multiple satellite images of the same area captured at different times are used for the model training, holds promise for enhancing classification performance. Finally, exploring more complex model architectures, such as ensemble methods, stacking, or deep learning models, could better capture spatial heterogeneities and further improve accuracy when combined with the auxiliary dataset proposed in this study.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This study presents a robust integration of Sentinel-2 imagery with auxiliary data to enhance tree species classification accuracy in a mountainous region near Lake Baikal. While the paper is scientifically sound and addresses a relevant challenge, several aspects require clarification and refinement to strengthen the validity and generalizability of the findings.

Abstract: please explicitly state the challenge of spectral similarity in mountainous forests and the novelty of integrating multi-source data.

Lines 18-19: This sentence, in relation to the ones before and after it, leaves readers completely confused.

Line 20: Please give us some details about the method, just mentioning random forests is not enough.

Lines 22-23: It is impossible for the reader to quickly grasp the meanings of 101 and 98. Please detail the feature selection (101 features ——>98 features) and highlight the key variables.

Lines 24-25: the overall accuracy increased……——>the overall accuracy increased from 49.59% (Sentinel-2 bands alone) to 80.69% when combining spectral data with climate, soil, and canopy height variables.

Lines 41-52: The purpose of this paragraph is not clear. I think this paragraph should emphasize the limitations of optical remote sensing images and introduce the reasons for using auxiliary data.

Lines 53-62: This paragraph has the same problem as the previous paragraph, and the purpose is not clear, so I feel that these two paragraphs are a little confusing.

Line 99: Smooth the transition by summarizing the gaps in previous research and how this study addresses them. For example: "While previous studies have explored the use of auxiliary data to improve classification accuracy, few have comprehensively evaluated the impact of a wide range of environmental variables in a mountainous forest ecosystem. This study aims to fill this gap by..."

Line 119: Specify the exact sources and provide URLs or references where the data can be accessed.

Line 190: QGis——>QGIS

Line 201: Fusion of different data sources (e.g. SoilGrids with 250 m resolution and Sentinel-2 with 10 m resolution) may introduce spatial errors, and additional instructions are needed on how to handle resolution differences (e.g. Resampler methods).

Line 202: I may have missed something and never figured out why from 101 to 98, I didn't know which three features were removed until I read line 392. I suggest that the next three bioclimatic features of low importance should be mentioned above. In addition, three bioclimatic features of low importance were removed without specifying their specific mechanisms of influence or whether their redundancy was verified by statistical tests such as p-values. It is suggested to supplement relevant analysis to enhance persuasion.

Line 242: Consider breaking down the results into subsections for better readability.

Lines 274-277: While climatic variables (e.g., snow cover days) improved pine classification, link these findings to species-specific ecological adaptations (e.g., pine’s cold tolerance) to strengthen the discussion.

Line 296: Contrast the minimal impact of vegetation indices in this study with literature findings (e.g., MNDWI’s high importance here vs. its limited role in other works). Highlight unique regional or methodological factors driving these differences.

Line 327: The discussion does not provide a thorough analysis of the results, particularly regarding the specific impacts of different auxiliary data on the classification accuracy.

Fig. 1: Panel (a) is not clearly marked, so it is difficult to see the location of the study area.

Fig. 2: I suggest refining the flowchart to make the logic clearer.

Author Response

Response to Reviewer 4 Comments

Abstract: Please explicitly state the challenge of spectral similarity in mountainous forests and the novelty of integrating multi-source data.

Lines 18-19: This sentence, in relation to the ones before and after it, leaves readers completely confused.

Line 20: Please give some details about the method; just mentioning random forests is not enough.

Lines 22-23: It is difficult for the reader to quickly grasp the meaning of 101 and 98. Please detail the feature selection (101 features → 98 features) and highlight the key variables.

I have modified the Abstract to take into account your comments and suggestions. All questionable points have been edited or deleted. Unfortunately, according to the rules of Forests, the Abstract should be no more than 200 words, so I cannot go into more detail on a number of points, such as the choice of the Random Forest method.

New Abstract:

Timely and accurate knowledge of forest composition is crucial for ecosystem conservation and management tasks. Information regarding the distribution and extent of forested areas can be derived through the classification of satellite imagery. However, optical data alone are often insufficient to achieve the required accuracy due to the similarity in spectral characteristics among tree species, particularly in mountainous regions. One approach to improving the accuracy of forest classification is the integration of auxiliary environmental data. This paper presents the results of research conducted in the Slyudyanskoye Forestry of the Irkutsk Region. A dataset comprising 101 variables was collected, including Sentinel-2 bands, vegetation indices, climatic, soil, and topographic data, as well as forest canopy height. The classification was performed using the Random Forest machine learning method. The results demonstrated that auxiliary environmental data significantly improved the performance of the tree species classification model—the overall accuracy increased from 49.59% (using only Sentinel-2 bands) to 80.69% (combining spectral data with auxiliary variables). The most significant improvement in accuracy was achieved through the incorporation of climatic and soil features. The most important variables were the shortwave infrared band B11, forest canopy height, the length of the growing season, and the number of days with snow cover.

Lines 41-52: The purpose of this paragraph is not clear. I think this paragraph should emphasize the limitations of optical remote sensing images and introduce the reasons for using auxiliary data.

Lines 53-62: This paragraph has the same problem as the previous paragraph, and the purpose is not clear, so I feel that these two paragraphs are a little confusing.

The purpose of these two paragraphs is to introduce readers gradually to the methods and technologies used. To do this, I first describe the existing problem (lack of forest data), then the possible sources of data (satellite imagery: its types, references, and examples of studies), and then the methods of processing satellite data (machine learning methods). This sequence seems logical and shows that a lot of current research is being conducted in these areas, which makes them relevant.

Thank you for this valuable suggestion! Indeed, there was an abrupt transition at this point in the text.

The text has been changed:

Line 119: Specify the exact sources and provide URLs or references where the data can be accessed.

References have been added to Table 1.

Line 190: QGis → QGIS

Thank you for noticing this error; it has been fixed.

I agree, this point was not described in sufficient detail. I have supplemented the text:

To cover the entire study area, the three original tiles were first merged band-by-band and then cropped to the forestry boundary in QGIS. Sentinel-2 bands were resampled to a 30 m resolution to reduce memory usage and improve processing efficiency. All auxiliary data were resampled to the same 30 m resolution using the Nearest Neighbour method via the ‘gdalwarp’ utility, ensuring that raster cells were aligned for precise overlap with the satellite band cells.
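For readers unfamiliar with nearest-neighbour resampling, the idea can be sketched in a few lines of plain Python. This is only an illustration of the principle and not the workflow used in the study, which relied on the 'gdalwarp' utility. The function name `resample_nearest`, the 2 x 2 toy raster, and the assumption that both grids share the same top-left origin are all hypothetical.

```python
def resample_nearest(src, src_res, dst_res, dst_rows, dst_cols):
    """Nearest-neighbour resampling of a north-up raster.

    Assumes (hypothetically) that source and target grids share the same
    top-left corner. Each target cell takes the value of the source cell
    that contains the target cell centre; no values are interpolated.

    src      -- 2D list of cell values (row 0 is the northernmost row)
    src_res  -- source cell size in metres (e.g. 250)
    dst_res  -- target cell size in metres (e.g. 30)
    """
    src_rows, src_cols = len(src), len(src[0])
    out = []
    for r in range(dst_rows):
        # Distance of the target cell centre from the top edge, in metres.
        y = (r + 0.5) * dst_res
        sr = min(int(y // src_res), src_rows - 1)
        row = []
        for c in range(dst_cols):
            # Distance of the target cell centre from the left edge, in metres.
            x = (c + 0.5) * dst_res
            sc = min(int(x // src_res), src_cols - 1)
            row.append(src[sr][sc])
        out.append(row)
    return out


# Toy 2 x 2 raster with 250 m cells (values are made up, e.g. soil classes).
coarse = [[1, 2],
          [3, 4]]

# Resample onto a 16 x 16 grid of 30 m cells covering the same corner.
fine = resample_nearest(coarse, src_res=250, dst_res=30, dst_rows=16, dst_cols=16)
```

Because values are copied rather than averaged, this approach keeps categorical layers (such as soil classes) valid, but the 30 m cells carry no more spatial detail than the original 250 m cells.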

Line 242: Consider breaking down the results into subsections for better readability.

I considered this suggestion, but decided not to split the Results text. This section contains relatively little text, all of which presents the model results sequentially, so splitting it might make it harder to follow.

The description has been expanded:

This climatic condition underscored the high importance of variables such as the number of snow cover days and snow water equivalent. The significance of the precipitation seasonality (bio15) aligns with the findings from previous studies, which indicate that winter precipitation levels significantly influence the growing season in boreal forests [55–57]. Winter climatic conditions are particularly critical for cold-tolerant conifers – the snow is one of the primary moisture resources in mountainous regions, mitigating the effects of low temperatures, their fluctuations and needles desiccation.

The description has been expanded:

In the full model with 101 features, MNDWI ranked fifth in importance, whereas NDWI was among the least important features, ranking fourth from the bottom. In our study area, coniferous species dominate, accounting for 78% of all tree species, with cedar comprising 80% of this group. The combination of green and SWIR bands proved more informative for interpreting moisture content in these species. This supports findings from previous research, which indicate that interspecies differences in coniferous tree species are most pronounced in the SWIR range [46,47]. Consequently, MNDWI, which includes the SWIR band B11, received higher importance, while vegetation indices derived from other bands had a lesser impact.
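The difference between the two water indices discussed above can be illustrated with a short Python sketch. MNDWI is computed here from the green band (B3) and the SWIR band (B11), as described in the text. For NDWI, the sketch assumes the green/NIR formulation (B3 and B8); the manuscript's Table 1 may define it differently. The function names and reflectance values are hypothetical and are not taken from the study.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def mndwi(green_b3, swir_b11):
    """Modified NDWI: green and shortwave infrared (Sentinel-2 B3 and B11)."""
    return normalized_difference(green_b3, swir_b11)


def ndwi(green_b3, nir_b8):
    """NDWI, assuming the green/NIR formulation (Sentinel-2 B3 and B8)."""
    return normalized_difference(green_b3, nir_b8)


# Hypothetical surface reflectances for a single forest pixel.
pixel = {"B3": 0.04, "B8": 0.30, "B11": 0.12}

mndwi_value = mndwi(pixel["B3"], pixel["B11"])
ndwi_value = ndwi(pixel["B3"], pixel["B8"])
print(f"MNDWI = {mndwi_value:.3f}, NDWI = {ndwi_value:.3f}")
```

Under these assumptions, only MNDWI responds to changes in B11, which is consistent with the explanation above that SWIR information drives its higher importance.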

Line 327: The discussion does not provide a thorough analysis of the results, particularly regarding the specific impacts of different auxiliary data on the classification accuracy.

Fig.1 Figure (a) is not clearly marked, so it is difficult to see the location of the study area.

Figure 1 has been modified to make the location of the study area clearer.

Fig.2: The flowchart is suggested to be optimized to make the logic clearer.

Figure 2 has been optimized.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The author has addressed most of the previous comments and the manuscript has improved accordingly. However, there are a few critical issues that should be addressed.

1- The author mentioned the use of high-resolution Google Earth imagery for validation purposes. Please clarify the specifications (e.g., dates, sensor, spatial resolution) of the high-resolution imagery used.

2- The author mentioned that only the number of trees for the RF model was selected, and other parameters were used as default. The author should consider hyperparameter tuning, e.g., through grid search analysis, to choose the optimal parameters for the task.

3- The best result was achieved when using 101 features, implying the importance of considering a wide range of features for model development and tree species classification. However, such a rich database may not be available to other researchers. Therefore, the author should apply a suitable feature selection method to select the most contributing features, and then re-run the classification step using this more compact feature set, which can provide further insights for readers.

4- Please add the references of the vegetation indices used in Table 1.

Author Response

The author has addressed most of the previous comments and the manuscript has improved accordingly. However, there are a few critical issues that should be addressed.

Dear Reviewer,

Thank you very much for taking the time to review my article and for providing detailed and helpful comments. Thanks to your feedback, I was able to improve the description of my methodology, making it more structured and clearer for the readers.

I specified in the text that I used the Google Earth imagery from May 4, 2019. Unfortunately, Google officially does not provide more detailed information about the imagery other than the date and provider. They state in the documentation: “Google is not able to provide any more information about imagery it owns beyond what is displayed in Google Earth and Maps.” https://support.google.com/earth/answer/6327779?hl=en#zippy=%2Csatellite-aerial-images

I also checked a number of publications in Forests and found that authors typically mention using high-resolution Google Earth imagery without specifying the sensors or resolution.

Thank you for this recommendation! I agree that tuning hyperparameters can affect the model's results. I will consider the method you suggested for use in future research.

I agree that the selection of the most important features was not conducted within the scope of this study. However, the main goal of my research was to evaluate the impact of a wide range of ecological factors on the accuracy of tree species classification. Therefore, I adjusted the list of objectives presented in the last paragraph of the Introduction to maintain the logical flow of the material. I removed the objective "selection of the best combination of features", replacing it with "to investigate the influence of auxiliary data on the accuracy of classifying different forest species".

Regarding the use of the set of 101 features by other researchers: all the presented datasets are global and open-access, so researchers can freely download them for any region. For my study area, the complete dataset occupies 3 GB on disk, and running the Random Forest processing in Python requires 5.5 GB of RAM, with the entire calculation taking about 7 minutes. I believe these requirements are feasible for most researchers who process remote sensing data.

In the future, I would certainly like to use a reduced dataset. I plan to test another algorithm for this purpose, because Random Forests, owing to their random feature selection, show a decrease in accuracy with any reduction in the dataset if the discarded features have a non-zero importance score. I also want to check the model's robustness when applied to another region, as features important for one area may show low importance in another. However, these are directions for future research.

In this study, the goal was specifically to explore a wide range of features, and I believe it has been achieved.

4- Please add the references of the vegetation indices used in Table 1.

References for all vegetation indices have been added to Table 1.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Considering the corrections provided by the author, the quality has improved substantially according to the criteria I established. Therefore, I consider the present study suitable for publication.

Author Response

Dear Reviewer,

Reviewer 4 Report

Comments and Suggestions for Authors

The quality of the article has greatly improved. I suggest modifying Figure 1 again: Figure 1 (a) should show some well-known reference features, such as rivers, oceans, cities and countries, so that readers can locate the study area easily.

Author Response

Dear Reviewer,
