A Two-Step Framework for Mapping, Classification, and Area Estimation of Stand- and Non-Stand-Replacing Forest Disturbances
Round 1
Reviewer 1 Report (Previous Reviewer 4)
Comments and Suggestions for Authors
General comments
The paper can be compressed for better readability, e.g. the abstract is long and some results can be moved to an appendix. The sampling method should be elaborated and ideally shown, either in the study area map or in a separate figure. The authors should also acknowledge and, if possible, propagate and estimate the uncertainties that accumulate across the sequence of steps used to derive the final area estimates.
Specific comments
Line 23-24 can be rephrased for better readability
Line 45 and 255 how are these sampled? randomly? hopefully...
If possible, compare or at least discuss with similar EO products that already mapped the same disturbances
Table 2 Change over-estimation / high commission error make sure you elaborate the discussion and implications, and vice versa since No Change-Change should be almost perfect to subsequently implement the disturbance class mapping
Given the probability layers in Fig. A1, I wonder why they aren't used to estimate how uncertainties are propagated to the area estimates, given that you used a two-step approach (3I3D and ML classification). I see that some weights are used (Table A3) to derive area-level variances and CIs, but I am not sure they are robust enough.
Table 3 can be summarized and raw tables can be placed in the supplementary materials
Figure 5 zoom-in to hotspots of wildfire and thinning as they are almost not visible in the map
Any results and discussion what covariates explain the variability of disturbances?
Author Response
Reviewer 1
General comments
The paper can be compressed for better readability, e.g. the abstract is long and some results can be moved to an appendix. The sampling method should be elaborated and ideally shown, either in the study area map or in a separate figure. The authors should also acknowledge and, if possible, propagate and estimate the uncertainties that accumulate across the sequence of steps used to derive the final area estimates.
Following the reviewer’s suggestion, we have compressed the abstract for a better readability. (L 30-58).
Regarding the appendix of results, some results were initially included in the appendix but were moved back into the main text after the first revision.
Sampling method has been further explained (L 219-233).
From a user’s/producer’s accuracy point of view, the change detection step increases the User’s Accuracy and the Producer’s Accuracy of the No-Change/non-stand replacing class because of the high efficiency of the 3I3D algorithm in detecting changes. This is illustrated in the following confusion matrix, calculated with the complete validation sample (7555 photointerpretation points) for the overall two-step approach, merging the No-Change class with the non-stand replacing class of the disturbance classification step.
As can be seen by comparison with Table 4A in the manuscript, the Producer’s Accuracy does not change for the disturbance classes, because only Change polygons from the first step were classified in the second step. However, omission errors in the first step (pixels classified as no change where a disturbance has occurred) are propagated through the two-step approach, decreasing the User’s Accuracy of the global classification for the wildfire, clear-cut and thinning classes. Depending on the final purpose of the analysis, the magnitude threshold may be set at the ROC optimum or at the value that maximizes sensitivity or specificity in the first step.
| Predicted/Observed | Non-stand replacing | Wildfire | Clear-cut | Thinning | Omission error | Producer’s accuracy |
|---|---|---|---|---|---|---|
| Non-stand replacing | 5963 | 31 | 205 | 97 | 0.05 | 0.95 |
| Wildfire | 16 | 49 | 18 | 2 | 0.42 | 0.58 |
| Clear-cut | 149 | 7 | 356 | 46 | 0.36 | 0.64 |
| Thinning | 33 | 5 | 37 | 98 | 0.43 | 0.57 |
| Commission error | 0.03 | 0.47 | 0.42 | 0.60 | | |
| User’s Accuracy | 0.97 | 0.53 | 0.58 | 0.40 | | 0.91 |
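The accuracy figures in the table above can be recomputed from the raw counts. The following is a minimal sketch, not the authors' code: it assumes only the 4 x 4 block of counts as printed, and the variable names are illustrative. Row-wise ratios reproduce the Producer’s accuracy column, column-wise ratios reproduce the User’s Accuracy row, and the diagonal share reproduces the bottom-right value.

```python
import numpy as np

classes = ["Non-stand replacing", "Wildfire", "Clear-cut", "Thinning"]

# Counts copied from the table above, one table row per array row.
cm = np.array([
    [5963, 31, 205, 97],
    [16, 49, 18, 2],
    [149, 7, 356, 46],
    [33, 5, 37, 98],
])

diag = np.diag(cm)

# Diagonal over row totals: the "Producer's accuracy" column of the table.
row_acc = diag / cm.sum(axis=1)
# Diagonal over column totals: the "User's Accuracy" row of the table.
col_acc = diag / cm.sum(axis=0)
# Share of all points on the diagonal: the bottom-right value of the table.
overall = diag.sum() / cm.sum()

# The error rates are the complements of the two accuracy vectors.
omission = 1 - row_acc
commission = 1 - col_acc

for name, pa, ua in zip(classes, row_acc, col_acc):
    print(f"{name:20s} row-wise={pa:.2f} column-wise={ua:.2f}")
print(f"overall={overall:.2f}")
```

Rounded to two decimals, these values agree with the printed table, including the thinning class, where the column-wise accuracy of 0.40 is the weakest figure.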
We have included an explanation within the manuscript on how uncertainties are propagated across the two steps (L 549-560).
Specific comments
- Line 23-24 can be rephrased for better readability
We have rephrased the bullet point for a better readability. (L 23-24)
- Line 45 and 255 how are these sampled? randomly? hopefully...
We have included in the text that the 7555 photointerpretation points for ROC analysis were (stratified) randomly sampled (L 219-226), whereas 1524 points for disturbance classification training were randomly sampled. (L 254-256)
- If possible, compare or at least discuss with similar EO products that already mapped the same disturbances
We thank the reviewer for the suggestion. It is an excellent idea. We have included a new section for comparison with different EO products: 4.5 Comparison with large-scale EO monitoring system (L 561-586)
- Table 2 Change over-estimation / high commission error make sure you elaborate the discussion and implications, and vice versa since No Change-Change should be almost perfect to subsequently implement the disturbance class mapping
We have addressed this issue in the discussion (L 556-560).
- Given the probability layers in Fig. A1, I wonder why they aren't used to estimate how uncertainties are propagated to the area estimates, given that you used a two-step approach (3I3D and ML classification). I see that some weights are used (Table A3) to derive area-level variances and CIs, but I am not sure they are robust enough.
We have now better explained how area-level variances are calculated following Olofsson et al. 2014 (L 344-358) and improved Table A3.
- Table 3 can be summarized and raw tables can be placed in the supplementary materials
We understand the reviewer’s suggestion. However, Table 3 was originally included in the supplementary materials and was moved back into the main text after the first revision because it presents main results.
- Figure 5 zoom-in to hotspots of wildfire and thinning as they are almost not visible in the map.
We have improved Figure 5 with zoom-in to hotspots
- Any results and discussion what covariates explain the variability of disturbances?
We have further explained how the covariates account for differences in disturbances. (L. 377-382).
We present information of covariates and predictive performance in the discussion section (L 534-537).
Reviewer 2 Report (New Reviewer)
Comments and Suggestions for Authors
The manuscript presents a clear and practically oriented framework that combines 3I3D-based change detection with supervised classification to distinguish disturbance types in Mediterranean forests. The study is generally well structured, and the workflow is described in sufficient detail. My comments are as follows, for the author's reference.
- The formulas for NDMI and NBR are written incorrectly as typed. They should be ratios with parentheses in the numerator and denominator, not B8-B11/B8+B11. This needs to be fixed because it affects reproducibility.
- The choice of threshold 205 is presented as “optimal”, but the operational implication is a change user’s accuracy around 43%. If the goal is an official style area estimate, you need to justify why this level of commission is acceptable, and what downstream safeguards exist.
- The reference sampling for the 7,555 points is described as random, but I cannot tell whether you ensured coverage of rare disturbances or heterogeneous forest conditions. If it is purely random, say so and discuss the consequences.
- The photointerpretation buffer is 5 m, while several Sentinel 2 inputs are at 20 m. Explain the handling of mixed resolutions (resampling method, target grid), otherwise the label support vs pixel support is muddy.
- Your disturbance training set is heavily imbalanced (for example, wildfire polygons are few and much larger than others). Class weights help, but they do not solve representativeness. At minimum, discuss how this limits generalization, especially for wildfire and thinning.
- The 10 fold cross validation is likely optimistic if nearby polygons share conditions. A spatially blocked validation, or at least a sensitivity check that avoids training and testing on neighboring polygons, would make the accuracy claims more believable.
- “Non stand replacing disturbance” is effectively a residual class, everything detected as change that you cannot confidently label as wildfire, clear cut, or thinning. Treating it as a coherent disturbance type is conceptually risky. I would either rename it as an “other or unassigned change” class, or better characterize what it contains.
- The variable selection step relies on correlation filtering plus “expert criteria”. That can work, but you should be explicit about how you chose between correlated variables and whether the selection was done inside each training fold or once on the full dataset.
- The area estimation angle is currently under explained. You cite an area estimation framework, but the manuscript mostly reports mapped areas. If you want to claim suitability for official statistics, show the estimator clearly, report uncertainty, and explain what “Map OA = 0.77” in Table A3 represents relative to the earlier cross validation metrics.
- The comparison with national wildfire statistics is interesting, but you should unpack why event counts diverge so strongly while total area is close. Minimum mapping unit, forest mask, patch aggregation, and the seasonal compositing window are all plausible drivers. Right now it reads like a quick aside.
- Figure 5 is the main product, but it is dominated by the non stand replacing class. Add one or two zoomed in insets in representative regions so readers can see clear cut, thinning, and wildfire patterns at a meaningful scale.
- Tables 2 and 4 are informative, but I would add one compact per class summary metric beyond user and producer accuracy (for example, F1), because the thinning performance is otherwise easy to misread.
Author Response
Reviewer 2
The manuscript presents a clear and practically oriented framework that combines 3I3D-based change detection with supervised classification to distinguish disturbance types in Mediterranean forests. The study is generally well structured, and the workflow is described in sufficient detail. My comments are as follows, for the author's reference.
- The formulas for NDMI and NBR are written incorrectly as typed. They should be ratios with parentheses in the numerator and denominator, not B8-B11/B8+B11. This needs to be fixed because it affects reproducibility.
The reviewer is correct. Formulas as they were written led to confusion. We have revised them to improve clarity. (L 192).
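The precedence problem raised by the reviewer can be illustrated with a short sketch. It is an illustration only: the reflectance values are made up, the helper name is hypothetical, and the use of band B12 for NBR is an assumption based on the common Sentinel-2 definition (only B8 and B11 are named in the comment above).

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with explicit parentheses around both parts."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)

# Made-up surface reflectances for a single pixel.
b8, b11, b12 = 0.30, 0.15, 0.10

ndmi = normalized_difference(b8, b11)  # (B8 - B11) / (B8 + B11)
nbr = normalized_difference(b8, b12)   # assumption: NBR = (B8 - B12) / (B8 + B12)

# What "B8-B11/B8+B11" evaluates to when typed literally:
# the division binds first, giving B8 - (B11 / B8) + B11.
unparenthesized = b8 - b11 / b8 + b11

print(ndmi, nbr, unparenthesized)
```

With these inputs the correctly parenthesized NDMI is positive, while the literal reading of the unparenthesized expression is negative, which is why the typed form matters for reproducibility.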
- The choice of threshold 205 is presented as “optimal”, but the operational implication is a change user’s accuracy around 43%. If the goal is an official style area estimate, you need to justify why this level of commission is acceptable, and what downstream safeguards exist.
As the reviewer notes, a user’s accuracy of 43% may appear low for an official disturbance area estimate. However, the main point of our study lies precisely in the two-step approach. The second step allows us to reclassify these initial false positives into a separate class, named non-stand replacing disturbances, which covers several factors that may change the reflectance value: subtle changes such as insect activity, increases in vegetation vigor, or situations where no meaningful change has occurred. Therefore, the two-step framework ensures the consistency of the final area estimates.
One of the main strengths of this framework is its flexibility: the analysis can be adapted to the characteristics of the region and to the specific objectives of the study, prioritizing either sensitivity or specificity, or both, as disturbance regimes vary depending on the ecological context.
For example, we are currently evaluating this framework in southern Europe, where a wider range of disturbance types is expected. In such cases, additional disturbance classes may emerge and could be incorporated into the classification scheme. We also applied this method in a particular region of Spain, Galicia, where the main objective was to determine the age of the affected stands. In this case, stakeholders were particularly interested in ensuring that the disturbance identified truly corresponded to a disturbance event. For this reason, minimizing false positives (commission errors) was a priority. Consequently, the classification threshold was selected using the ROC analysis in a way that prioritized high specificity (i.e., lower commission error), even if this slightly reduced sensitivity (i.e., increased omission error).
We have rephrased the sentence (L 217-218)
We have emphasized this in the discussion section (L 486-488).
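The trade-off described in this response can be sketched as a threshold search over the change magnitude. This is a hedged illustration, not the authors' procedure: the response does not state which criterion defines the ROC optimum, so Youden's J (sensitivity + specificity - 1) is assumed here, and the magnitudes, labels, candidate thresholds and function names are all hypothetical.

```python
import numpy as np

def roc_table(magnitude, is_change, thresholds):
    """Return (threshold, sensitivity, specificity) for each candidate threshold."""
    magnitude = np.asarray(magnitude)
    is_change = np.asarray(is_change, dtype=bool)
    rows = []
    for t in thresholds:
        predicted = magnitude >= t  # pixel flagged as change
        tp = np.sum(predicted & is_change)
        fn = np.sum(~predicted & is_change)
        tn = np.sum(~predicted & ~is_change)
        fp = np.sum(predicted & ~is_change)
        rows.append((t, tp / (tp + fn), tn / (tn + fp)))
    return rows

def pick_threshold(rows, min_specificity=None):
    """Pick by Youden's J, or by best sensitivity above a specificity floor."""
    if min_specificity is None:
        return max(rows, key=lambda r: r[1] + r[2] - 1)[0]
    eligible = [r for r in rows if r[2] >= min_specificity]
    if not eligible:
        raise ValueError("no threshold reaches the requested specificity")
    return max(eligible, key=lambda r: r[1])[0]

# Hypothetical reference points: change magnitude (0-255) and photointerpreted label.
magnitude = [10, 50, 120, 160, 190, 195, 210, 230, 250]
is_change = [0, 0, 0, 0, 1, 0, 1, 1, 1]

rows = roc_table(magnitude, is_change, thresholds=[100, 150, 200, 240])
balanced = pick_threshold(rows)                        # Youden's J
sensitive = pick_threshold(rows, min_specificity=0.5)  # favors detection
print(balanced, sensitive)
```

Raising the specificity floor moves the choice toward higher thresholds and fewer commission errors, at the cost of more omissions; lowering it does the opposite.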
- The reference sampling for the 7,555 points is described as random, but I cannot tell whether you ensured coverage of rare disturbances or heterogeneous forest conditions. If it is purely random, say so and discuss the consequences.
The 7 555-point sample was stratified, with a different sampling rate for the no change class (0.00001) and the change class (0.00010); within each class, the sampling point locations were selected randomly. We have clarified this point (L 219-221).
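A stratified random draw of this kind can be sketched as follows. This is an illustration under stated assumptions: the sampling rate is interpreted as the fraction of pixels drawn per stratum, the toy map and its stratum sizes are invented, and the function name is hypothetical.

```python
import numpy as np

def stratified_random_sample(strata, rates, seed=0):
    """Draw pixel indices at random within each stratum, without replacement."""
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    sample = {}
    for label, rate in rates.items():
        candidates = np.flatnonzero(strata == label)
        n = int(round(rate * candidates.size))  # sample size for this stratum
        sample[label] = rng.choice(candidates, size=n, replace=False)
    return sample

# Toy change map flattened to 1-D: 0 = no change, 1 = change (sizes are made up).
strata = np.zeros(2_000_000, dtype=np.uint8)
strata[:100_000] = 1

# Rates quoted in the response: 0.00001 for no change, 0.00010 for change.
sample = stratified_random_sample(strata, rates={0: 0.00001, 1: 0.00010})
print({label: idx.size for label, idx in sample.items()})
```

Because the change stratum is sampled ten times more densely, rare disturbed pixels are better represented than under a purely random draw, and the unequal rates then have to be accounted for as stratum weights in any area estimate.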
- The photointerpretation buffer is 5 m, while several Sentinel 2 inputs are at 20 m. Explain the handling of mixed resolutions (resampling method, target grid), otherwise the label support vs pixel support is muddy.
The reviewer is correct. A 5 m radius does not correspond to a 20 m pixel; this was a typo. The buffer radius actually used was 10 m, which corresponds to the radius of a circle inscribed in a 20 m pixel. We apologize for the error. (L 227 and L 228) and Figure 2.
- Your disturbance training set is heavily imbalanced (for example, wildfire polygons are few and much larger than others). Class weights help, but they do not solve representativeness. At minimum, discuss how this limits generalization, especially for wildfire and thinning.
As noted by Altalhan et al. (2025), there are three main ways to handle imbalanced data: data-level techniques, algorithm-level techniques, and integrated methods. To improve representativeness at the data level, the random selection of training-validation data was pixel based, so each class sample was approximately proportional to the area occupied by the class. For instance, as wildfire polygons are much larger than non-stand replacing disturbances, the wildfire sample covers more than 1/3 of the wildfire polygons, providing a representative sample. At the algorithm level, metrics such as AUCPR (area under the precision-recall, PR, curve) and AUCROC (area under the receiver operating characteristic, ROC, curve) have been proposed and shown to be effective in classification tasks with imbalanced data (Altalhan et al. 2025; Bradter et al. 2022).
We have included in section 4.2 a discussion of the imbalance in the number and size of disturbances. (L 507-514)
Also, we have now included the F1-score in Table 3, which provides a realistic accuracy measure for unbalanced data.
- The 10 fold cross validation is likely optimistic if nearby polygons share conditions. A spatially blocked validation, or at least a sensitivity check that avoids training and testing on neighboring polygons, would make the accuracy claims more believable.
To prevent neighbouring points in the training/validation data set, the 1 524 sampling points were spatially randomized, and points sharing the same change polygon were then eliminated. We also visually checked for the absence of spatial correlation between polygons of the same disturbance class.
- “Non stand replacing disturbance” is effectively a residual class, everything detected as change that you cannot confidently label as wildfire, clear cut, or thinning. Treating it as a coherent disturbance type is conceptually risky. I would either rename it as an “other or unassigned change” class, or better characterize what it contains.
We have better described what the non-stand replacing class contains in the Material and methods (L 246-250) and in the discussion section (L 549-555).
- The variable selection step relies on correlation filtering plus “expert criteria”. That can work, but you should be explicit about how you chose between correlated variables and whether the selection was done inside each training fold or once on the full dataset.
We employed an iterative process in which uncorrelated variables were progressively selected while evaluating the classification results at each iteration. When two variables exhibited multicollinearity (Pearson correlation coefficient, R > 0.70) expert criteria were used to select the one considered more relevant for the study regarding ecological and spectral characteristics. The resulting single, consistent set of variables was used throughout both the training and validation phases of the classification process. (L 262-270).
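The pairwise filtering part of this procedure can be sketched as below. It is a simplified illustration, not the authors' implementation: the expert criteria are stood in for by a hypothetical priority list, the absolute value of the Pearson coefficient is used (an assumption, since the response only states R > 0.70), the variable names and data are invented, and the evaluation of classification results at each iteration is not reproduced.

```python
import numpy as np

def filter_correlated(X, names, priority, threshold=0.70):
    """Keep variables in priority order, dropping any that correlate with a kept one."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson r| between columns
    kept = []
    for name in sorted(names, key=priority.index):
        j = names.index(name)
        if all(corr[j, names.index(k)] <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy predictors: the first two are nearly collinear, the third is not.
x1 = np.arange(10.0)
x3 = np.array([1.0, -1.0] * 5)
x2 = x1 + 0.1 * x3

names = ["ndmi_diff", "nbr_diff", "shape_index"]   # hypothetical variable names
X = np.column_stack([x1, x2, x3])

# Hypothetical expert ranking: prefer nbr_diff over ndmi_diff when they collide.
priority = ["nbr_diff", "ndmi_diff", "shape_index"]

kept = filter_correlated(X, names, priority)
print(kept)
```

Running the selection once on the full data set, as described in the response, means the validation folds are not fully independent of the selection step, which is the concern behind the reviewer's question.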
- The area estimation angle is currently under explained. You cite an area estimation framework, but the manuscript mostly reports mapped areas. If you want to claim suitability for official statistics, show the estimator clearly, report uncertainty, and explain what “Map OA = 0.77” in Table A3 represents relative to the earlier cross validation metrics.
We have included in the manuscript how estimators are calculated (L 344-358) and improved Table A3.
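For readers unfamiliar with the stratified estimator of Olofsson et al. (2014) cited by the authors, a minimal sketch follows. It assumes that map classes serve as strata, that rows of the count matrix are map classes and columns are reference classes, and that the weights are mapped area proportions. All numbers are invented and do not come from Table A3.

```python
import numpy as np

def stratified_area_estimate(counts, weights, total_area, z=1.96):
    """Area per reference class with a confidence half-width (stratified estimator)."""
    counts = np.asarray(counts, dtype=float)    # rows: map strata, cols: reference classes
    weights = np.asarray(weights, dtype=float)  # mapped area proportion of each stratum
    n_i = counts.sum(axis=1, keepdims=True)     # sample size per stratum
    q = counts / n_i                            # within-stratum class proportions
    w = weights[:, None]
    p_hat = (w * q).sum(axis=0)                 # estimated area proportion per class
    se = np.sqrt((w**2 * q * (1 - q) / (n_i - 1)).sum(axis=0))
    area = total_area * p_hat
    half_width = z * total_area * se
    return area, half_width, se

# Invented example: two map classes, 100 reference points in each.
counts = [[90, 10],
          [20, 80]]
weights = [0.8, 0.2]   # the first map class covers 80% of the mapped area
area, half_width, se = stratified_area_estimate(counts, weights, total_area=1000.0)
print(area, half_width)
```

The estimated areas differ from the mapped areas (800 and 200 in this toy case) because the reference sample corrects for classification errors, which is the distinction between mapped and estimated area raised in the reviewer's comment.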
- The comparison with national wildfire statistics is interesting, but you should unpack why event counts diverge so strongly while total area is close. Minimum mapping unit, forest mask, patch aggregation, and the seasonal compositing window are all plausible drivers. Right now it reads like a quick aside.
The divergence in event counts can be explained by several factors mentioned by the reviewer. First, the national statistics include open woodlands, which were not considered in this study and may lead to differences in the detected area. Second, patch aggregation may introduce discrepancies in the number of events. In addition, the minimum mapping unit (MMU) strongly affects event counts, as multiple small disturbances may be aggregated into a single event. However, it is important to note that these factors primarily influence the number of detected events rather than the total disturbed area.
We have further discussed this result in the revised manuscript. (L 578-586)
- Figure 5 is the main product, but it is dominated by the non stand replacing class. Add one or two zoomed in insets in representative regions so readers can see clear cut, thinning, and wildfire patterns at a meaningful scale.
We have improved Figure 5 with zoom-in insets.
- Tables 2 and 4 are informative, but I would add one compact per class summary metric beyond user and producer accuracy (for example, F1), because the thinning performance is otherwise easy to misread.
We have included the F1-score in the accuracy assessment (L 331-343) and improved Table 4.
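As a reminder of how the F1-score combines the two per-class accuracies, a short sketch is given here. The input values are taken from the confusion matrix in the response to Reviewer 1 purely for illustration; they are not the figures of Table 4, and the function name is hypothetical.

```python
def f1_score(users_accuracy, producers_accuracy):
    """Harmonic mean of user's accuracy (precision) and producer's accuracy (recall)."""
    total = users_accuracy + producers_accuracy
    if total == 0:
        return 0.0
    return 2 * users_accuracy * producers_accuracy / total

# Illustrative inputs from the two-step confusion matrix shown earlier.
f1_thinning = f1_score(0.40, 0.57)
f1_wildfire = f1_score(0.53, 0.58)
print(round(f1_thinning, 2), round(f1_wildfire, 2))
```

Because the harmonic mean is pulled toward the lower of its two inputs, a class such as thinning cannot reach a high F1 on the strength of one accuracy alone.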
Reviewer 3 Report (New Reviewer)
Comments and Suggestions for Authors
Review of “A two-step framework for mapping, classification, and area estimation of stand- and non-stand-replacing forest disturbances”
This study is timely in the sense that it covers a topic of high relevance in today’s climate change and effects on forest. It is well done with useful statistical methods, and it is well written in the context of other studies. I recommend this to be published after some revisions, mainly on text details and wording.
The objectives of the study and the conclusion should be better linked. While the objectives were to develop a scalable and operational disturbance monitoring, they don’t conclude on whether they have achieved this. Nothing is said about scalability in the conclusion. Neither have they concluded whether the solution was or could be operationalized. They do, however, say that the method shows strong potential for disturbance monitoring, but how far is that from an operational monitoring? Another detail that should be rewritten in the conclusion is that they should replace the word “improves”. The problem here is that they say that the approach improves disturbance mapping, however, firstly, they do not refer to what the existing method was, and secondly, an improvement does not say anything about how good their method is.
For the term disturbance, in general, the singular tense should be used when you talk about forest disturbance in general, while the plural tense should be used when you consider a number of specific types of disturbance.
Headings at level 3 are not given a number. If the journal supports having numbered headings at this level I recommend using it.
Details:
L31-32: If you want to keep this claim that it has increased, please provide a reference for it.
L71: Keep the terms in singular tense, i.e. wildfire and outbreak, to keep them in line with drought and wildfire.
L72: I suggest: «accelerating decline in marginal forest populations»
L74: maybe replace «increasing forest vulnerability to pests» by «increasing forest vulnerability to secondary pests and diseases»
L75: an increase in
L79: Please specify what this refers to: «shifts in intensity, frequency or spatial patterns», - I suggest to add «of disturbance»
L83: «forest disturbance processes»
L84: Delete «this effective»
L86: Use singular tense: «condition and disturbance»
L87: It is unclear what “may” and “its” refers to. You may want to reword this sentence, and my suggestion is: “In this context, traditional National Forest Inventories (NFIs) provide accurate and essential field-based information for assessing forest condition and disturbance and the NFIs may benefit from complementary approaches to better capture the spatial extent and temporal dynamics of disturbance.”
L289: Use “for each classification tree” to avoid misunderstanding since this is a forestry study.
L90: “disturbance”.
L102: specific types of disturbance
L103: What do you mean by «broader» here? Several insect species?
L103: phenology anomalies
L104: Drop the comma and «even»: [28] and urban expansion [29],
L105: Above you have used the terms «algorithms», «systems» and «approach», and now on this line you say «most of these algorithms», - the problem here is that we don’t know if «algorithms» here also cover the «methods» and «approaches» that you mentioned above.
L108: The same as the previous point on your use of «approaches»
L121: disturbance differs
L123: disturbance
L127: distinguish between
L139: Discard: «To this end»
L140: Maybe a comma «adaptive, two-step»
L141: It is unclear here whether «integrating machine …, a photointerpreted, and S2 imagery» were already parts of the 3I3D algorithm or whether these are your own and further developments to be added to 3I3D. Please clarify!
L142-153: This is a text containing technical details and I recommend moving it to the materials and methods section. It could make up an ingress before the heading 2.1.
L156: Replace “Forest ecosystems” by “Forests”
L161: Use space instead of comma here if you mean one thousand eight hundred…: 1,842
L161: In this study
L163: 1:50 000
L165: sparse canopy cover
L166: wildfire
L169: «natural regeneration or planting»
Figure 1: The inlet figure of Spain would benefit if you used two colors with higher contrast. Now, scattered pixels of green are hardly seen against the grey background. For example use light yellow.
L180: Introduce GEE
L180-181: 2 528 and 3 124 etc
L184: What about cloud shadows? Did you also remove them?
L185: image composites
L188: «.. selecting the observation with the median multi-dimensional value across all considered bands..» This is unclear. With «observation» you mean for one, given pixel you select its value from one S2 dataset? With median multi-dimensional you mean the median value for each pixel in the multi-dimensional dataset of band times x times y coordinates? And, altogether, does this mean that for each pixel, the resulting value could come from different acquisitions?
L191: such as
L194: forest change
L196: Insert «index» before (NDMI)
L200: Maybe replace «scene» by «median composite image»
L215: «… the overall change magnitude for each pixel, calculated as the sum of the two vector magnitudes, ranging from 0 to 255». This means that each vector magnitude |a| and |b| should range between 0 and 255, or that their sum should have that range?
L219: lower case t in threshold
L225: How did you apply the ROC to select the best threshold value? You took the one having the highest area under the ROC curve?
L226: Delete: «For this end,»
L226: a random sample
L226: 7 555
L232: «To match the spatial resolution of Sentinel-2 imagery», - do you here mean the uncertainty of the S2 geocoding, i.e. the random error of the location of a given pixel?
L233: I suggest starting this sentence as: «For ground truthing, we manually classified ….»
L232-239: This section appears as the description of how you obtained the ground truth, and hence, it could be singled out as a separate paragraph with its own subtitle. At least do it as you did below for disturbance classification, where you had a subtitle on L254.
L248: classified the pixel into one of the predominant disturbance types …
L255: 1 524
L257: disturbance
L257: What if a few trees died from bark beetle attacks? It is not a stand replacing disturbance, but it affects the stand density.
L258: What if an entire stand was killed by drought stress and bark beetle attacks? Which category would that be?
L266: What are the geometrical metrics? I don’t find any in Giannetti et al. 2020.
L319-338: This section seems to mainly consider binary classification accuracy. How was the accuracy of 4 classes of disturbance type assessed?
L346: replace “Low omission errors” with either “Low omission error rates” or “Few omission errors”
L346: replace «at expense for relatively low User’s accuracy» by «at the expense of User’s accuracy»
L353: 264 900 ha
L360: Be more specific here. Rewrite to e.g. «… wildfires were well characterized by having high perimeter standard deviation and high shape index..». Follow up along this way for the other disturbance types.
L361: Are perimeter standard deviation, shape index and fractal dimension three examples of the geometrical metrics mentioned in L266? Are they described anywhere in the manuscript?
L357-364: Some of, or most of, this section seems to belong to Materials and Methods section.
Fig. 4: What is module a and module b? Are they the magnitude of the vectors a and b?
L369: an MCC
L370: leaf size of 3, what is the unit here?
L373: what is the He initializer?
Fig. 5: I think maybe it would be better if the change-nochange map had the same size as the classification map, and expand both of them as much as possible within the page margins.
Fig. 5, caption: It says in mainland Spain, but some islands are shown.
Fig. 5: Which time period do these maps represent? It should be given in the figure.
Author Response
Reviewer 3
Review of “A two-step framework for mapping, classification, and area estimation of stand- and non-stand-replacing forest disturbances”
This study is timely in the sense that it covers a topic of high relevance in today’s climate change and effects on forest. It is well done with useful statistical methods, and it is well written in the context of other studies. I recommend this to be published after some revisions, mainly on text details and wording.
The objectives of the study and the conclusion should be better linked. While the objectives were to develop a scalable and operational disturbance monitoring, they don’t conclude on whether they have achieved this. Nothing is said about scalability in the conclusion. Neither have they concluded whether the solution was or could be operationalized. They do, however, say that the method shows strong potential for disturbance monitoring, but how far is that from an operational monitoring? Another detail that should be rewritten in the conclusion is that they should replace the word “improves”. The problem here is that they say that the approach improves disturbance mapping, however, firstly, they do not refer to what the existing method was, and secondly, an improvement does not say anything about how good their method is.
We have better linked the objectives with the conclusion by emphasising the scalability and adaptability of the proposed framework. (L 627-628).
In addition, we have replaced the word “improves” with “scalable and tailored” which also reinforces previous reviewer’s observation. (L 611-615)
For the term disturbance, in general, the singular tense should be used when you talk about forest disturbance in general, while the plural tense should be used when you consider a number of specific types of disturbance.
We thank the reviewer for the observation. The text has been revised accordingly.
Headings at level 3 are not given a number. If the journal supports having numbered headings at this level I recommend using it.
Following reviewer recommendation, we have added third level headings.
Details:
L31-32: If you want to keep this claim that it has increased, please provide a reference for it.
We appreciate the reviewer’s comment and understand the concern. However, we prefer to keep the abstract free of references to maintain better flow and readability. The same statement is also included in the introduction, where the appropriate references have been added [1,2] (L 65).
References:
- Patacca, M.; Lindner, M.; Esteban, M.; et al. Significant increase in natural disturbance impacts on European forests since 1950. Glob. Change Biol. 2023, 29, 1359–1376. https://doi.org/10.1111/gcb.16531
- Seidl, R.; Schelhaas, M.J.; Rammer, W.; Verkerk, P.J. Increasing forest disturbances in Europe and their impact on carbon storage. Nat. Clim. Chang. 2014, 4, 806–810. https://doi.org/10.1038/nclimate2318
L71: Keep the terms in singular tense, i.e. wildfire and outbreak, to keep them in line with drought and wildfire.
We have provided the text as suggested. (L 65)
L72: I suggest: «accelerating decline in marginal forest populations»
We have provided the text as suggested. (L 66)
L74: maybe replace «increasing forest vulnerability to pests» by «increasing forest vulnerability to secondary pests and diseases»
We have provided the text as suggested. (L 67)
L75: an increase in
We have corrected the typo. (L 69-70)
L79: Please specify what this refers to: «shifts in intensity, frequency or spatial patterns», - I suggest to add «of disturbance»
We have corrected the text as suggested. (L 73)
L83: «forest disturbance processes»
We have corrected the text (L 77)
L84: Delete «this effective»
Done (L 78)
L86: Use singular tense: «condition and disturbance»
Done (L 80)
L87: It is unclear what “may” and “its” refers to. You may want to reword this sentence, and my suggestion is: “In this context, traditional National Forest Inventories (NFIs) provide accurate and essential field-based information for assessing forest condition and disturbance and the NFIs may benefit from complementary approaches to better capture the spatial extent and temporal dynamics of disturbance.”
Thanks for the suggestion. The sentence has changed considerably due to the review process (L 80-85)
L289: Use “for each classification tree” to avoid misunderstanding since this is a forestry study.
Thanks for the suggestion. We have reworded the sentence (L 290-291)
L90: “disturbance”.
Done (L 82)
L102: specific types of disturbance
Done (L 96-97)
L103: What do you mean by «broader» here? Several insect species?
Thanks for the suggestion. We have specified the agent: pine tortoise scale outbreaks (L 97)
L103: phenology anomalies
Thanks for the suggestion. (L 97-98)
L104: Drop the comma and «even»: [28] and urban expansion [29],
Done (L 98)
L105: Above you have used the terms «algorithms», «systems» and «approach», and now on this line you say «most of these algorithms», - the problem here is that we don’t know if «algorithms» here also cover the «methods» and «approaches» that you mentioned above.
The reviewer is right. We have revised this section and unified the terms. (L 87-104)
L108: The same as the previous point on your use of «approaches»
Done (L 86-103)
L121: disturbance differs
We have removed the sentence due to redundancy (L 115)
L123: disturbance
Done (L 115-116)
L127: distinguish between
Thanks for the suggestion. We have corrected the typo. (L 119)
L139: Discard: «To this end»
We have changed “To this end” to “We propose”. (L 132-133)
L140: Maybe a comma «adaptive, two-step»
Done. (L 133)
L141: It is unclear here whether «integrating machine …, a photointerpreted, and S2 imagery» were already parts of the 3I3D algorithm or whether these are your own and further developments to be added to 3I3D. Please clarify!
We have removed this part, as it is explained more clearly later and could lead to misunderstandings (L 132-134)
L142-153: This is a text containing technical details and I recommend moving it to the materials and methods section. It could make up an ingress before the heading 2.1.
We appreciate the reviewer’s comment. However, after considering the suggestion, we have decided to keep the text in the Introduction section, as we believe it is important for a better understanding of the subsequent workflow.
L156: Replace “Forest ecosystems” by “Forests”
Done (L 147)
L161: Use space instead of comma here if you mean one thousand eight hundred…: 1,842
We have corrected the typo. (L 152)
L161: In this study
We have corrected the typo. (L 152)
L163: 1:50 000
Done. (L 154)
L165: sparse canopy cover
Done. (L 156)
L166: wildfire
Done. (L 158)
L169: «natural regeneration or planting»
Done. (L 161-162)
Figure 1: The inlet figure of Spain would benefit if you used two colors with higher contrast. Now, scattered pixels of green are hardly seen against the grey background. For example use light yellow.
We have updated Figure 1 for a better interpretation of the data.
L180: Introduce GEE
The reviewer is right, we haven’t introduced GEE previously in the text. We have corrected it (L 172-176)
L180-181: 2 528 and 3 124 etc
We have corrected the typo. (L 171-176)
L184: What about cloud shadows? Did you also remove them?
Cloud shadows were not explicitly removed. The Medoid algorithm uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset to set a threshold on the probability of cloud presence. Therefore, only pixels exceeding the defined cloud probability threshold were masked, while cloud shadows were not specifically identified or removed (Francini et al. 2023).
We have removed “After cloud masking” sentence to avoid possible misinterpretation. (L. 178)
L185: image composites
Done. (L 178)
L188: «.. selecting the observation with the median multi-dimensional value across all considered bands..» This is unclear. With «observation» you mean for one, given pixel you select its value from one S2 dataset? With median multi-dimensional you mean the median value for each pixel in the multi-dimensional dataset of band times x times y coordinates? And, altogether, does this mean that for each pixel, the resulting value could come from different acquisitions?
We have rephrased the sentence for a better readability (L 179-182):
“Originally developed for Landsat data [57] and later adapted for Sentinel-2 imagery (e.g., [35,58]), this approach assigns to each pixel the surface reflectance values closest to the multidimensional median value among the selected images, providing robustness against outliers and extreme values.”
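The per-pixel selection described in the quoted sentence can be sketched as follows. This is only an illustration, not the Google Earth Engine implementation used in the manuscript: the function name `medoid_composite`, the array layout (images, bands, rows, columns) and the use of squared Euclidean distance to the band-wise median are assumptions made for the example.

```python
import numpy as np

def medoid_composite(stack):
    """Pick, for every pixel, the observation closest to the band-wise median.

    stack: array of shape (n_images, n_bands, height, width); masked
    observations are expected to be NaN (assumed layout, for illustration).
    Returns an array of shape (n_bands, height, width).
    """
    # Band-wise median over the image axis, ignoring masked observations.
    median = np.nanmedian(stack, axis=0, keepdims=True)
    # Squared Euclidean distance of each observation to the median vector.
    dist = np.sum((stack - median) ** 2, axis=1)
    # A masked observation must never be selected.
    dist = np.where(np.isnan(dist), np.inf, dist)
    # Index of the closest image, computed independently for each pixel.
    best = np.argmin(dist, axis=0)
    # Take all bands of the selected image for each pixel.
    picked = np.take_along_axis(stack, best[None, None, :, :], axis=0)
    return picked[0]

# Three images, two bands, one pixel.
stack = np.array([
    [[[0.1]], [[0.3]]],
    [[[0.5]], [[0.2]]],
    [[[0.9]], [[0.9]]],
])
composite = medoid_composite(stack)
```

In this toy case the band-wise median vector is (0.5, 0.3), which no single image contains; the medoid is the second image, (0.5, 0.2). Because the index is chosen per pixel, neighbouring pixels in a composite built this way can come from different acquisitions, while all bands of one pixel come from the same acquisition.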
L191: such as
Done. (L 185)
L194: forest change
Done. (L 187)
L196: Insert «index» before (NDMI)
Done. (L 189)
L200: Maybe replace «scene» by «median composite image»
Thanks for the suggestion. (L 193)
L215: «… the overall change magnitude for each pixel, calculated as the sum of the two vector magnitudes, ranging from 0 to 255». This means that each vector magnitude |a| and |b| should range between 0 and 255, or that their sum should have that range?
The sum should range between 0 and 255.
We have rephrased the sentence for a better readability. (L 205-210)
L219: lower case t in threshold
We have corrected the typo. (L 211)
L225: How did you apply the ROC to select the best threshold value? You took the one having the highest area under the ROC curve?
We selected the one which optimizes both sensitivity and specificity.
We have specified this within the manuscript (L217-218) and further in the discussion (L 486-488)
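One common way to pick a threshold that balances sensitivity and specificity is to maximise Youden's J statistic (sensitivity + specificity - 1). The sketch below assumes that criterion, which the response does not name explicitly; the function `best_threshold`, the toy magnitudes and the labels are hypothetical.

```python
import numpy as np

def best_threshold(scores, labels):
    """Return the threshold maximising sensitivity + specificity - 1.

    scores: change magnitudes (e.g. 0 to 255), higher means more change.
    labels: 1 for photointerpreted change, 0 for no change; both classes
    must be present. Youden's J is an assumed criterion for this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        predicted_change = scores >= t
        sensitivity = np.mean(predicted_change[labels])
        specificity = np.mean(~predicted_change[~labels])
        j = sensitivity + specificity - 1.0
        if j > best_j:  # the lowest threshold wins in case of ties
            best_t, best_j = t, j
    return best_t, best_j

# Toy calibration sample: magnitudes and photointerpreted labels.
threshold, youden_j = best_threshold([10, 20, 30, 40, 50, 60],
                                     [0, 0, 0, 1, 1, 1])
```

With these perfectly separable toy data the selected threshold is 40 and J equals 1; real calibration samples overlap, so the maximum J is lower and the chosen threshold trades omission against commission errors.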
L226: Delete: «For this end,»
We replaced “for this end” with “We conducted”. (L 219)
L226: a random sample
Done (L 219)
L226: 7 555
Done (L 219)
L232: «To match the spatial resolution of Sentinel-2 imagery», - do you here mean the uncertainty of the S2 geocoding, i.e. the random error of the location of a given pixel?
We mean matching the spatial extent of a 20 m Sentinel-2 pixel. We have rephrased the sentence for a better readability (L 227-228).
L233: I suggest starting this sentence as: «For ground truthing, we manually classified ….»
Thank you for the suggestion. We have considered it; however, we do not feel it is appropriate to modify the beginning of the sentence, as calibration data may not represent ground truth.
L232-239: This section appears as the description of how you obtained the ground truth, and hence, it could be singled out as a separate paragraph with its own subtitle. At least do it as you did below for disturbance classification, where you had a subtitle on L254.
We thank the reviewer for this suggestion. However, we feel that the classification of buffered points is an integral part of the calibration process. Splitting this section into two separate parts or adding a subsection would create too many levels and could interrupt the logical flow, making it more difficult for readers to follow the full calibration procedure.
Nevertheless, we understand the reviewer’s point. To better reflect the scope of the section, we have revised the title so that it more clearly encompasses all the steps described (L 211)
L248: classified the pixel into one of the predominant disturbance types …
We have rephrased the sentence (L 242-243)
L255: 1 524
Done (L 254)
L257: disturbance
Done (L 256)
L257: What if a few trees died from bark beetle attacks, it is not a stand replacing disturbance, but it affects the stand density.
If only a few trees died from bark beetle attacks, the disturbance is extremely subtle and occurs at the level of individual trees. Such localized disturbances would affect only a very small fraction of a Sentinel-2 pixel and would be difficult to detect using the proposed framework. This is what we refer to in Section 5 when discussing the “limitations inherent to the spatial resolution of Sentinel-2 imagery”. Techniques such as those proposed by Karlson et al. or Ji et al., suggested in the manuscript, which rely on tree crown segmentation using finer resolution data, may be capable of detecting disturbances at the level of individual trees.
To clarify this point we have specified the explanation in the aforementioned section of the manuscript (L 549-554 and L 594).
L258: What if an entire stand was killed by drought stress and bark beetle attacks? Which category would that be?
Excellent question. In that case, it would be assigned to a separate class (i.e. windthrow). However, during the photointerpretation process we did not identify this disturbance class in our study area, except for a single polygon affected by wind (large enough to exceed the size of a Sentinel pixel), which falls under what we describe as rare events: disturbances that were insufficiently represented. In such rare cases, the detected change would be assigned to the non-stand-replacing class during photointerpretation. (L 246-249)
L266: What are the geometrical metrics? I don’t find any in Giannetti et al. 2020.
We are afraid there may have been a misunderstanding regarding the references. The citation we refer to is Hermosilla et al. (L 263):
Hermosilla, T.; Wulder, M.A.; White, J.C.; et al. Regional detection, characterization, and attribution of annual forest change from 1984 to 2012 using Landsat-derived time-series metrics. Remote Sens. Environ. 2015, 170, 121–132. https://doi.org/10.1016/j.rse.2015.09.004
Geometrical metrics are: Perimeter, Compactness, Fractal dimension and Shape index
L319-338: This section seems to mainly consider binary classification accuracy. How was the accuracy of 4 classes of disturbance type assessed?
We have explained how AUCROC, AUCPR and F1-score have been calculated for each class and the Matthews correlation coefficient is calculated in the frame of multiclass classification. (L 328-330) and (L 333-334)
L346: replace “Low omission errors” with either “Low omission error rates” or “Few omission errors»
We have replaced “Low omission errors” with “Low omission error rates”. (L 366)
L346: replace «at expense for relatively low User’s accuracy» by «at the expense of User’s accuracy»
We have replaced the term. (L 367)
L353: 264 900 ha
We have corrected the typo. (L 373)
L360: Be more specific here. Rewrite to e.g. «… wildfires were well characterized by having high perimeter standard deviation and high shape index..». Follow up along this way for the other disturbance types.
Following the reviewer’s suggestion, we have clarified this section. (L 377-382)
L361: Are perimeter standard deviation, shape index and fractal dimension three examples of the geometrical metrics mentioned in L266? Are they described anywhere in the manuscript?
The reviewer is correct. These are the geometrical metrics mentioned. More information can be found in Hermosilla et al. 2015.
L357-364: Some of, or most of, this section seems to belong to Materials and Methods section.
We have moved most of the information into the Materials and Methods section. (L 262-270).
Fig. 4: What is module a and module b? Are they the magnitude of the vectors a and b?
That is correct. The modules a and b are the lengths of vectors a and b. There was a previous typo when defining these variables: we have changed “magnitude” to “module” in the 2.3 Step 1: 3I3D Change detection section. (L 205-210)
L369: an MCC
We have corrected the typo. (L 387)
L370: leaf size of 3, what is the unit here?
Leaf size of 3 samples. We have clarified in the text (L 388)
L373: what is the He initializer?
For each layer, the weights are sampled from a normal distribution with zero mean and variance 2/I, where I is the input size for the layer (L 313-315). This initialization ensures that symmetry is broken and that the network can begin learning.
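The initialization described in this response can be sketched in a few lines. The function name `he_normal`, the weight-matrix layout (inputs by outputs) and the layer sizes are assumptions for illustration; the manuscript's own software is not specified here.

```python
import numpy as np

def he_normal(n_in, n_out, rng=None):
    """Sample a weight matrix from N(0, 2 / n_in), i.e. He initialization.

    n_in is the input size of the layer (I in the response above);
    the (n_in, n_out) layout is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / n_in)  # variance 2 / n_in
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

# Hypothetical layer with 1000 inputs and 500 neurons.
weights = he_normal(1000, 500, rng=np.random.default_rng(0))
```

For this hypothetical layer the empirical mean of `weights` is close to 0 and the empirical variance is close to 2/1000 = 0.002. Drawing the weights at random, rather than setting them all to the same value, is what breaks the symmetry between neurons.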
Fig. 5: I think maybe it would be better if the change-nochange map had the same size as the classification map, and expand both of them as much as possible within the page margins.
We have improved Figure 5 for a better interpretation.
Fig. 5, caption: It says in mainland Spain, but some islands are shown.
The reviewer is correct. The Balearic Islands are shown in the figure. We have corrected this in the caption. (L 421-425)
Fig. 5: Which time period do these maps represent? It should be given in the figure.
We have clarified the year of the analysis (2018) in the caption. (L 421-425).
Round 2
Reviewer 2 Report (New Reviewer)
Comments and Suggestions for Authors
Recommended to accept.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a comprehensive approach for mapping and classifying forest disturbances across Spain using Sentinel-2 imagery and machine learning techniques. While the study offers valuable insights, its most significant weakness lies in its validation methodology.
The study relies on photointerpretation from a single trained analyst as its reference data. As stated in the methods, “The photointerpretation was carried out by a trained analyst through visual comparison of orthophotos…” Validation data should be objective and robust. However, using the interpretation of only one analyst introduces a high degree of subjectivity and potential for random error, which may significantly compromise the reliability of the accuracy assessment.
To strengthen the study, I recommend that the authors incorporate more objective and comprehensive validation data. This could be achieved by:
- Collecting official ground-truth data from various authoritative sources, such as fire departments or forestry agencies.
- Employing a consensus-based photointerpretation approach involving multiple trained analysts (e.g., at least five) to ensure the reference data is more reliable and less prone to individual bias.
Adopting such a rigorous validation strategy would substantially increase the credibility and persuasiveness of the study’s findings.
Reviewer 2 Report
Comments and Suggestions for Authors
Please see the attachment.
Comments for author File:
Comments.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This study focuses on national-scale forest disturbance monitoring in Spain, proposing a two-step framework to identify and estimate the area of four disturbance types. It holds practical significance for supplementing national forest inventory data. However, there are multiple aspects requiring improvement regarding research completeness, methodological rigor, and depth of result interpretation, as detailed below:
1. Title
The current title fails to highlight the core research focus. It is recommended to revise it to explicitly reflect key elements such as the "two-step framework" (integrating 3I3D and machine learning), "disturbance classification (including high/low-intensity types)", and "operational application value for national forest inventories", to enhance the clarity of the study’s contribution.
2. Introduction
- Logical Coherence & Innovation Clarity: The logical flow is not sufficiently clear, and the core innovations are poorly articulated. It remains ambiguous whether the research focus lies in: (1) the combination of Sentinel-2 and photogrammetry data; (2) the classification of multiple disturbance types (especially low-intensity ones); (3) improvements to the 3I3D algorithm; or (4) the development of a scalable operational method. These priorities need to be explicitly clarified to guide readers’ understanding.
- Term Consistency: In Line 145, the study mentions "high- and low-intensity forest disturbances", but subsequent sections focus on "different types of disturbances" (e.g., wildfires, clear-cuts). Terminology should be unified—either consistently use "disturbance intensity" (high/low) or "disturbance type", and clarify the correspondence between the two (e.g., wildfires/clear-cuts as high-intensity, thinning as low-intensity) to avoid confusion.
3. Materials and Methods
The description of methods lacks clarity and key details, particularly in parameter settings and technical specifications, which undermines the reproducibility of the study:
- Algorithm Parameter Omission: Sections 2.6 (SVM), 2.7 (RF), and 2.8 (Neural Networks) only introduce the basic principles of the algorithms but provide no information on critical parameter settings. For example:
- For SVM: The type of kernel function (Gaussian/linear/polynomial), box constraint value, and kernel scale used.
- For RF: The number of decision trees, the number of random predictor variables considered at each node split, and the minimum leaf size.
- For Neural Networks: The number of hidden layers, number of neurons per layer, and regularization strength (Lambda).
- Visualization and Map Standardization:
- Line 169: For different disturbance types, it is recommended to include photos of the study area before and after disturbance to make the disturbance characteristics more intuitive, which will help readers understand the distinction between disturbance types (e.g., visual differences between clear-cuts and thinning).
- Line 182: Figure 1 (Spanish forest study area) lacks a scale bar and a north arrow, which are essential for spatial interpretation. The map should be recreated, and direct reliance on reference [52] is not recommended—key spatial elements must be included in the figure itself.
- Sentinel-2 Data Details:
- Line 185: The study uses annual Medoid compositing for Sentinel-2 data, but it does not address whether this approach overly smooths the data, potentially masking intra-annual disturbances (e.g., wildfires occurring in a specific month). This limitation and its impact on disturbance detection need to be discussed.
- Line 189: The product level of Sentinel-2 data (Level-1C or Level-2A) is not specified. This is critical because Level-1C requires atmospheric correction (which affects spectral index accuracy), while Level-2A is atmospherically corrected—omitting this information undermines the reliability of spectral analysis.
- Line 194: A cloud cover threshold of 70% for image selection is excessively high. The study must clarify whether cloud removal was performed and, if so, specify the method (e.g., median filtering, cloud masking using QA bands) to ensure the quality of composited images.
- 3I3D Algorithm Clarity:
- Line 198: The three spectral indices (NDMI, NBR, MSI) used in the 3I3D algorithm lack either their calculation formulas or the specific Sentinel-2 bands employed (e.g., NBR = (NIR - SWIR2)/(NIR + SWIR2), with bands specified). This omission prevents readers from verifying the accuracy of index calculations.
- Line 210: A schematic diagram of the 3I3D algorithm (e.g., illustrating vector calculations for angles φa/θb and magnitude |a|/|b|) should be added to visualize the algorithm’s workflow, as the current textual description is overly abstract.
- Reference Data Coverage:
- Line 225: The study uses data from the Spanish National Plan for Aerial Orthophotography and Google Earth for photointerpretation, but it does not confirm whether these datasets cover the entire 2017–2019 study period. Gaps in temporal coverage could lead to inaccurate labeling of "change/no-change" samples.
- Figure-Text Consistency:
- Line 234: In Figure 2, the yellow-marked change points are not visually prominent, and the 5-meter buffer zone (mentioned in Section 2.4 for sample expansion) is not shown. This creates a disconnect between the figure and the text, as readers cannot link the photointerpretation process to the described buffer zone method.
- Literature Support and Terminology Accuracy:
- Lines 261/271: The introductions to SVM, RF, and other algorithms lack relevant literature citations (e.g., foundational studies on SVM for remote sensing classification or RF for forest disturbance mapping). Citations are necessary to contextualize the choice of these methods.
- Line 316: The term "Mathews Correlation Coefficient" contains a typo; the correct name is "Matthews Correlation Coefficient" (MCC).
- Line 320: In the MCC formula, there is a duplicate "TN" (written as "TN, TN, FP and FN")—this typo should be corrected to "TP, TN, FP and FN" to avoid confusion.
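For reference, the binary form of the coefficient mentioned in the last two points uses all four confusion-matrix counts (TP, TN, FP and FN). The sketch below shows that standard binary formula only; the manuscript's multiclass variant generalizes it and is not reproduced here. The function name `mcc` and the convention of returning 0 when the denominator is zero are assumptions of the example.

```python
import math

def mcc(tp, tn, fp, fn):
    """Binary Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: an undefined ratio (0 / 0) is reported as 0.
    return numerator / denominator if denominator else 0.0

perfect = mcc(tp=50, tn=50, fp=0, fn=0)      # every sample correct
inverted = mcc(tp=0, tn=0, fp=50, fn=50)     # every sample wrong
chance = mcc(tp=25, tn=25, fp=25, fn=25)     # no better than random
```

The three toy cases give 1.0, -1.0 and 0.0 respectively, which are the upper bound, lower bound and chance level of the coefficient.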
4. Results
The interpretation of results is one-sided and lacks in-depth analysis, with critical errors in data calculation and logical consistency:
- Accuracy Assessment of Change Detection: In Table 2, the user’s accuracy for the "Change" class is only 43%, which is unacceptably low. The study fails to acknowledge or analyze this poor performance (e.g., whether it stems from over-detection of non-stand-replacing disturbances or spectral confusion with non-forest land), which weakens the credibility of subsequent disturbance classification.
- Area Calculation Error: Line 338 states that disturbed areas account for 14.3% of the "study area", but Section 2.1 clearly defines the study area as 18.5 Mha (18,500,000 ha) of dense forests. Calculations show 264,900 ha ÷ 18,500,000 ha ≈ 1.43%—not 14.3%. This order-of-magnitude error must be corrected, and the correct proportion of disturbed areas should be reported.
- Variable Selection Omission: Line 345 mentions that 16 variables were selected from 47 initial variables, but the method for this selection (e.g., how multicollinearity was tested, whether feature importance was used) is not explained in the Methods section. This makes it impossible to evaluate the rationality of the final variable set.
- Classification Effectiveness of Tables: Table 4 (disturbance classification results) fails to effectively distinguish between different disturbance types (e.g., low user’s/producer’s accuracy for thinning). The study should analyze the reasons for poor classification performance (e.g., spectral overlap between thinning and non-stand-replacing disturbances) rather than merely presenting data.
- Proportion Calculation Ambiguity:
- Line 374: The 69% proportion for non-stand-replacing disturbances is unclear—it is not specified whether this refers to the proportion of polygons or total area. Given that non-stand-replacing disturbances have small average polygon sizes (0.14 ha), the two metrics would lead to vastly different interpretations.
- Line 375: The proportions of thinning (19%), wildfires (26%), and clear-cuts (55%) sum to 100% (not 169% as noted, but the original text likely misstates the denominator). The study must explicitly clarify the denominator (e.g., "proportion of all detected change pixels" or "proportion of non-stand-replacing excluded change pixels") to avoid misinterpretation.
- Figure Consistency: Figure 5 (disturbance classification map) is inconsistent with Figure 3 (change/no-change map)—the spatial extent of "Change" areas in Figure 3 does not align with the classified disturbance areas in Figure 5. This inconsistency suggests potential errors in the classification workflow and must be corrected.
- Citation Format: Line 399 cites references as [42,3], which violates standard citation practices (e.g., ordering by publication year or citation sequence in the text). Citations should be reordered to comply with journal guidelines.
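The two arithmetic points raised in the Results comments (the disturbed-area proportion and the sum of class proportions) can be checked with the figures quoted in this report. The variable names are illustrative; the numbers are the ones cited above.

```python
# Figures quoted in the reviewer's comments.
disturbed_ha = 264_900
study_area_ha = 18_500_000  # 18.5 Mha of dense forest (Section 2.1)

# Share of the study area that is disturbed, in percent.
share_pct = 100 * disturbed_ha / study_area_ha

# Proportions reported for the three stand-replacing-related classes.
class_shares_pct = {"thinning": 19, "wildfires": 26, "clear-cuts": 55}
classes_total_pct = sum(class_shares_pct.values())

# Adding the 69% reported for non-stand-replacing disturbances.
with_non_stand_replacing_pct = classes_total_pct + 69
```

Here `share_pct` rounds to 1.43, `classes_total_pct` is 100 and `with_non_stand_replacing_pct` is 169, which shows that the 69% figure and the three class proportions cannot share the same denominator.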
Reviewer 4 Report
Comments and Suggestions for Authors
Title too long.. perhaps get rid of "operational two-step"
Haven't realized the MDPI journals now have highlights
Line 22 change detection of what?
Line 28 so each model yielded 72%? clarify!
Line 29 how exactly?
53-55 careful - % sounds like accuracy rather than proportion
Line 56 sounds like text is GPT-derived, pls declare if yes in the declaration
Para 1-2 of intro can be merged.. too long to give context about global to national issue
Line 39 the jump to change detection is too early, give more context about RS first
Line 109 onwards mention AVOCADO also by Decuyper et al
Can you give more context in the intro why NFIs can be complemented by RS in terms of disturbance mapping
Line 193 composites based on? mean? median?
Line 198 Moisture Stress Index?
Line 212-213 confusing to see notations in the regular text
Table 1 show how the sample points/polygons are distributed spatially, are they randomly sampled? - better if you could have sub-section of this e.g. 2.4
Can you compare Figure 3 with other EO-based maps like GFW, high-res land cover like GLC etc?
Can you justify again why Table 3 switches to AUC and not the usual confusion matrix so both omission and commission errors are accounted for?
Do they have probability layers? Deterministic outputs/maps for area estimation purposes can be risky, check literature
4.3 How about deep-learning based classification? e.g. Masolele et al. papers
Also harmonized landsat and sentinel2 why not use it.. Note also that OpenGeoHub publishes analysis-ready time series of Landsat data
Oh the confusion matrices for the disturbance classes are appended... Why? They are key results!
