Risk Modeling for the Emergence of the Primary Outbreak Area of the Siberian Moth Dendrolimus sibiricus Tschetv. in Coniferous Forests of Central Siberia
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This article describes a machine-learning approach to the development of risk assessment for the Siberian silk moth impacting mid-taiga forests of central Siberia. The article is novel in that it applies a series of machine-learning methods across 2 consecutive years of an outbreak and applied to both remote sensing-based and ground-based training data. Such a comparison is important because access to ground-based data is quite challenging in this region of the world, so remote-sensing based methods for risk assessment may be the only option for nearby regions. The Siberian silk moth has had enormous impact in recent decades, with strong implications for ecosystem health, such that reliable and consistent methods for hazard rating are desperately needed. Methods applied by the authors appear well-justified. These are the article’s greatest strengths.
The article suffers quite a bit from what appears to be a hastily submitted manuscript. The title itself is rambling and not even complete. I would not consider the contents of the article to be a “spatial modelling” of outbreak emergence. Such a model implies spatial process – and other than proximity metrics and the fact that the results can be mapped, there is no spatial process investigated here. Rather it is more aligned with this line in the introduction: “Thus far, no attempts have been made to construct a risk assessment model based on these characteristics, similar to the models developed for bark beetles”. I would agree with the authors that this is a real need. Past studies to date have sought empirical predictors of silk moth activity, and honestly, that is essentially what the authors have repeated here. An actual risk assessment model requires synthesis across these different studies, including this one. So I would rethink the premise of the study and what it actually contributes to the literature, and revise the title, abstract, and introduction accordingly.
I struggled to follow the authors’ logic in part because the variables were either assigned cryptic acronyms (e.g., A for age; forest types with acronyms that did not correspond to their English language descriptions, etc.) or classes that were never fully described. I still have no idea what “bonitet” means (Figure 6f). The article desperately needs a summary table describing the variables and units, and if acronyms are applied, they should correspond in some fashion to corresponding English terms. The supplementary materials were a simple collection of graphs without any figure caption or description of terms necessary to interpret them. The end result is a frustrated reader working back and forth between methods, graphics, and interpretations in the Results and Discussion – and the reader should not have to work that hard. I actually think that the study found some interesting and useful results – but the authors should rework the presentation of their results in a more logical manner.
Specific comments
Lines 216-220 – the direction of host value with class 1 >80% dark conifer and 10 equal proportions of coniferous and deciduous is neither intuitive nor consistent with other metrics used (e.g., fir “units”).
Line 219-221 – why were some forest types ignored in the remote sensing data and not in the ground data?? Relates to lines 238-240 and lines 418-419?
Line 274 – what does a “10%” defoliation of a compartment represent? How large are these forest compartments?
Table 3 – typo for 2016 RS data
Figure 7 D – A good example where you need to make the variable easier to understand (distance_5?) and the units explicit. A figure should be interpreted easily with just its figure caption.
Line 400 How does one call 3-10 units out of 10 a “maxima”? Instead of units – are these percentage classes?
Comments on the Quality of English Language
See above recommendations
Author Response
To Reviewer 1
Dear colleague,
We are very grateful to you for your time and effort. Your comments give us a good chance to improve our manuscript. We hope we were able to follow your advice and make the text better.
Best wishes,
Authors
---
The article suffers quite a bit from what appears to be a hastily submitted manuscript. The title itself is rambling and not even complete. I would not consider the contents of the article to be a “spatial modelling” of outbreak emergence. Such a model implies spatial process – and other than proximity metrics and the fact that the results can be mapped, there is no spatial process investigated here. Rather it is more aligned with this line in the introduction: “Thus far, no attempts have been made to construct a risk assessment model based on these characteristics, similar to the models developed for bark beetles”. I would agree with the authors that this is a real need. Past studies to date have sought empirical predictors of silk moth activity, and honestly, that is essentially what the authors have repeated here. An actual risk assessment model requires synthesis across these different studies, including this one. So I would rethink the premise of the study and what it actually contributes to the literature, and revise the title, abstract, and introduction accordingly.
After another revision of our text, we have added some remarks about risk assessment model development and removed the mention of spatial modelling from the title. We hope these small insertions are enough to correct the sense of the text.
I struggled to follow the authors’ logic in part because the variables were either assigned cryptic acronyms (e.g., A for age; forest types with acronyms that did not correspond to their English language descriptions, etc.) or classes that were never fully described. I still have no idea what “bonitet” means (Figure 6f). The article desperately needs a summary table describing the variables and units, and if acronyms are applied, they should correspond in some fashion to corresponding English terms.
We have added two tables for better readability. The first describes the ground survey-based predictors, and the second additionally describes the groups of forest types. Finally, we removed the acronyms as completely as possible (the only exception is RS for remote sensing).
The supplementary materials were a simple collection of graphs without any figure caption or description of terms necessary to interpret them.
We have removed Supplementary materials (see below).
The end result is a frustrated reader working back and forth between methods, graphics, and interpretations in the Results and Discussion – and the reader should not have to work that hard. I actually think that the study found some interesting and useful results – but the authors should rework the presentation of their results in a more logical manner.
For better readability, we have rearranged the Discussion section and moved the plots from the Supplementary materials there.
Specific comments
Lines 216-220 – the direction of host value with class 1 >80% dark conifer and 10 equal proportions of coniferous and deciduous is neither intuitive nor consistent with other metrics used (e.g., fir “units”).
I guess this comment relates to our mistake. It should have been mentioned that these data were obtained by remote sensing, not by ground survey. This explains the difference between the metrics. We have added the necessary remark to l. 217–219. I also agree that the class codes (1 and 10) are not intuitive, but we cannot improve them: these codes were assigned by the authors of the vegetation map we used.
Line 219-221 – why were some forest types ignored in the remote sensing data and not in the ground data?? Relates to lines 238-240 and lines 418-419?
This remark is very reasonable, and I must give some clarification. In the modern scientific literature published in English, forest types are identified and described by dominant tree species. In the ground survey data that we used, however, forest type identification is based on the phytosociological methodology developed by J. Braun-Blanquet and modified for forestry purposes by V.N. Sukachev (l. 85–87). In this case, forest types are described not by the tree layer alone but by the whole plant community, including mosses and lichens. Such an approach allows us to evaluate the proximity of site conditions despite differences in tree species composition. Of course, remote sensing cannot give us any information about forest type in the sense of Braun-Blanquet and Sukachev. Therefore, we do not use this term with respect to remote sensing data.
Line 274 – what does a “10%” defoliation of a compartment represent? How large are these forest compartments?
We have characterized the matter more precisely (l. 278–279). The area of such a compartment can vary from 1 to 506 ha; the median area is 18 ha (l. 186–187).
Table 3 – typo for 2016 RS data
Thank you very much; this misprint has been corrected.
Figure 7 D – A good example where you need to make the variable easier to understand (distance_5?) and the units explicit. A figure should be interpreted easily with just its figure caption.
We have made the caption of this figure, and of some others, more detailed and self-explanatory.
Line 400 How does one call 3-10 units out of 10 a “maxima”? Instead of units – are these percentage classes?
We mean here the maximum of the damage probability (see the corrected sentence in l. 412–414).
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors used multiple machine learning approaches to model the outbreak areas of the Siberian Moth Dendrolimus sibiricus Tschetv. The manuscript has a clear structure and presentation. I have some recommendations.
1. Lines 282–283 mention: “The classification task was solved using three machine learning algorithms: decision tree (DT), support vector machine (SVM), and extreme gradient boosting (XGB).” There are many machine learning models available, and Random forests (RF) are frequently used. However, the manuscript chose only DT, SVM, and XGB. Please provide appropriate justification for this selection or supplement the analysis with RF.
2. Lines 282–283 also state: “The significance of individual predictors was evaluated through the use of the variable permutation method (feature_importance function from the DALEX package) [59], which was applied 50 times for each variable.”
The term “significance of individual predictors” is not appropriate. This is because feature importance measures a feature's contribution to the model's prediction rather than evaluating its statistical causal significance as in traditional statistics. Feature importance does not equate to statistical significance. It is recommended to revise the term “significance”. Additionally, how were the 50 repetitions conducted?
3. Please improve the resolution of Figures 4 (line 331) and 5 (line 345).
4. Additionally, the manuscript should explain the values of horizontal axis in Figure 5.
5. Moreover, regarding (Figure 5a, b) in line 344 and (Figure 5c, d) in lines 350–351, “abcd” are not labeled in the Figure 5.
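As an illustration of the question raised in comment 2, the following is a minimal sketch of one common reading of "applied 50 times": each predictor's column is shuffled 50 times, and the importance is the mean drop in accuracy relative to the unshuffled data. It is not the authors' code and does not use the DALEX package; the toy data, the threshold model, and all names (`permutation_importance`, `n_repeats`, the "informative" and "noise" features) are assumptions made purely for illustration.

```python
import random
from statistics import mean


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction equals the label."""
    return mean([1.0 if model(r) == y else 0.0 for r, y in zip(rows, labels)])


def permutation_importance(model, rows, labels, n_repeats=50, seed=0):
    """Mean drop in accuracy after shuffling each feature n_repeats times."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j; this breaks its link to the labels
            # while leaving every other feature untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(mean(drops))
    return importances


# Hypothetical toy data: feature 0 ("informative") determines the label,
# feature 1 ("noise") is unrelated to it.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]


def model(row):
    """Hypothetical fitted classifier: a threshold on feature 0 only."""
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(model, rows, labels, n_repeats=50)
print(dict(zip(["informative", "noise"], importances)))
```

In this sketch the ignored feature receives an importance of zero, while the informative one shows a clear drop in accuracy. The result is a measure of contribution to the model's predictions, not a test of statistical significance, which is the distinction the comment asks the authors to make explicit.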
Author Response
Dear colleague,
We are very grateful to you for your time and effort. Your comments give us a good chance to improve our manuscript. We hope we were able to follow your advice and make the text better.
Best wishes,
Authors
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have made revisions based on the recommendations. I agree to accept the manuscript.
Author Response
Dear colleague,
We are very grateful to you for your time and effort. Your comments give us an opportunity to improve our manuscript. We hope that we were able to make use of your advice and make the text better.
Best wishes,
Authors