A Multi-Temporal Sentinel-2 and Machine Learning Approach for Precision Burned Area Mapping: The Sardinia Case Study
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Line 384 – The authors should detail the process for obtaining the optimal values for the thresholds and show the ROC curve obtained. In lines 654 and 655, these values are mentioned, but without mathematical or empirical justification.
Figure (2) – I suggest separating the conditions and explaining them individually, especially since the third condition was not explained.
Figure (2) - It is important to explain which model was used to obtain the parameter values used in each condition, or if it was done empirically, then explain how the same technique can be applied to other regions of the world.
Figure (2) - In the third condition, explain what operation the comma represents in the inequality.
Lines 412 and 657 – The authors should provide details about the Random Forest model used and explain the model in the methodology section in a concise manner, detailing the parameters used and the results achieved by the model, as well as which metric was used to evaluate its performance.
Line 572 – Table 3 – I suggest that the authors use the ROC curve as a measure for model validation.
Author Response
Comment 1: Line 384 – The authors should detail the process for obtaining the optimal values for the thresholds and show the ROC curve obtained. In lines 654 and 655, these values are mentioned, but without mathematical or empirical justification.
Response 1: We thank the reviewer for the valuable suggestion. We have now added ROC analysis details to justify the selection of the optimal threshold values for the most relevant indices (SI > 1.5), considering single, pair, and triplet combinations. The resulting ROC curves and their corresponding AUC values are presented in the new Figure 8. These revisions have been included in the Methods (Lines 398–414) and Results sections (Lines 729–741).
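For readers interested in reproducing this type of threshold selection, the sketch below illustrates a ROC-based analysis with scikit-learn; the synthetic index samples, the burned/unburned labels, and the Youden-J cut-off rule are placeholder assumptions, not the authors' exact procedure.

```python
# Minimal ROC/AUC sketch for choosing a spectral-index threshold.
# The samples below are synthetic placeholders for real pixel values.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
index_values = np.concatenate([rng.normal(0.6, 0.15, 500),   # burned pixels
                               rng.normal(0.2, 0.15, 500)])  # unburned pixels
labels = np.concatenate([np.ones(500), np.zeros(500)])

fpr, tpr, thresholds = roc_curve(labels, index_values)
auc = roc_auc_score(labels, index_values)

# One common choice of "optimal" cut-off: maximize Youden's J = TPR - FPR.
optimal = thresholds[np.argmax(tpr - fpr)]
print(f"AUC = {auc:.3f}, optimal threshold = {optimal:.3f}")
```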
Comment 2: Figure (2) – I suggest separating the conditions and explaining them individually, especially since the third condition was not explained.
Response 2: We thank you for this comment. To make the core conditions clearer and easier to read, we have now separated the conditions in Table 3 and provided individual explanations for each in Lines 476–512, including the previously missing third condition.
Comment 3: Figure (2) - It is important to explain which model was used to obtain the parameter values used in each condition, or if it was done empirically, then explain how the same technique can be applied to other regions of the world.
Response 3: We thank the Reviewer for the suggestion to clarify how the thresholds were chosen. The thresholds were determined empirically and based on ROC analysis, and the transferability of the model is now discussed more explicitly in Lines 538–544.
Comment 4: Figure (2) - In the third condition, explain what operation the comma represents in the inequality.
Response 4: We thank the Reviewer for pointing this out and apologize for the unclear formulation of Condition 3. All conditions, including the third, are now reported more precisely in Table 3.
Comment 5: Lines 412 and 657 – The authors should provide details about the Random Forest model used and explain the model in the methodology section in a concise manner, detailing the parameters used and the results achieved by the model, as well as which metric was used to evaluate its performance.
Response 5: We thank the Reviewer for this comment. We have added details about the Random Forest model, including the parameters used (ntree = 1000, mtry = 4), variable importance, and model performance evaluated using the out-of-bag (OOB) error, in the Methods (Lines 584-589).
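As a complementary illustration, the sketch below shows an equivalent Random Forest configuration in scikit-learn terms: ntree and mtry are R randomForest names that map to n_estimators and max_features, and the feature matrix and labels are synthetic placeholders rather than the study's data.

```python
# Sketch of an RF setup matching the reported hyperparameters
# (ntree = 1000, mtry = 4, OOB validation), expressed in scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))          # placeholder spectral-index features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder burned/unburned labels

rf = RandomForestClassifier(n_estimators=1000,  # ntree = 1000
                            max_features=4,     # mtry = 4
                            oob_score=True,     # enables the OOB error estimate
                            random_state=42)
rf.fit(X, y)
print(f"OOB error = {1 - rf.oob_score_:.3f}")
print("Variable importance:", np.round(rf.feature_importances_, 3))
```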
Reviewer 2 Report
Comments and Suggestions for Authors
Although accurate burned area mapping is significant, I have major concerns about the method and other parts of the manuscript. Please consider the following comments.
- In the title, I don't think the island of Sardinia (24,000 km²) is large enough to be "regional scale". The regional scale usually refers to a larger extent. You don't have to emphasize the regional scale in the title.
- The abstract should be improved. It is not attractive to readers. Please use objectives instead of workflows. Please state the meaning of the study at the end of the abstract.
- The Materials and Methods section is extremely long. Some parts are not necessary. For example, please shorten Section 2.3.4. You use some mature indices. It is unnecessary to introduce the details.
- Lines 248-256, this part does not belong to the data section. It should be moved to somewhere in the method.
- Section 2.2.1, Lines 185-188: The decision to use Level-1C Top-of-Atmosphere (TOA) reflectance instead of Level-2A surface reflectance to avoid topographic correction artifacts is controversial. While topographic artifacts are indeed a concern, utilizing TOA data without atmospheric correction introduces significant noise due to temporal variations in aerosol optical thickness and water vapor. This is particularly critical for multi-temporal analysis where "pre-fire" and "post-fire" spectral values are compared. The authors should provide a more rigorous justification or a comparative analysis demonstrating that the errors introduced by atmospheric variability are indeed smaller than those resulting from topographic correction in this specific study area.
- Section 2.2.1, Lines 190-193: The study employs data from 2020 for training and 2024 for validation, leaving a significant temporal gap (2021–2023). Why were these intermediate years excluded? Evaluating the model over a continuous time series would provide better insight into its stability under varying climatic conditions (e.g., wet vs. dry years). Please explain this exclusion or, preferably, include results from the intermediate years to demonstrate the temporal robustness of the model.
- Section 2.3.3.1, Line 427: The text cites "Figure 2" when describing the rule-based algorithm; however, based on the context, it appears the intended citation is "Equation 2". Please verify and correct this citation.
- Section 2.3.3.1, Lines 459 & 464: The authors employ several hard-coded thresholds, such as the 60% decrease limit for the 30-day median MIRBI. What is the physical or statistical basis for setting the recovery threshold specifically at 60%? Relying on static, hard thresholds raises concerns about the model's transferability to regions with different vegetation phenology or soil backgrounds. The authors should clarify how these values were determined (e.g., empirical trial-and-error vs. statistical optimization) and discuss their robustness across different environmental contexts.
- Section 2.3.3.2: Given that the Random Forest (RF) model is a core component of the "local calibration" innovation in this study, the manuscript must detail the model's hyperparameters (e.g., number of trees, maximum depth, minimum samples per leaf). Reproducibility is key, and these parameters are essential for other researchers to replicate or adapt the study.
- Section 2.3.3.3 (Lines 530-531) & Section 3.3 (Lines 689-691): The authors applied a Minimum Mapping Unit (MMU) of 1,600 m² (approx. 4 pixels) to filter noise, which is a common issue associated with pixel-based Random Forest classifiers ("salt-and-pepper" noise). Given this limitation, why did the authors not consider deep learning approaches (such as CNNs or U-Net)? These methods inherently utilize spatial context and neighborhood information to suppress noise, potentially negating the need for such aggressive MMU post-processing. The authors need to better justify the choice of Random Forest over spatially-aware deep learning methods in this context.
- Section 2.3.4, Lines 553-556: The authors state that the reference dataset was derived from GPS measurements corrected by "expert visual interpretation of Sentinel-2 imagery." If the visual interpretation relied on the same Sentinel-2 dataset used for the classification model, this introduces a risk of circularity and overfitting. The "ground truth" may become biased towards features visible in Sentinel-2 rather than actual ground conditions. Independent high-resolution imagery (e.g., Planet, SPOT, or aerial photography) should be used for this correction process to ensure objectivity.
- Section 4, Lines 760-769; Table 4: The manuscript compares the proposed 20m resolution method against coarse-resolution global products like MODIS MCD64A1 (500 m) and CLMS (300 m). This comparison is inherently unbalanced, as results will naturally favor high-resolution inputs. While this highlights the advantage of Sentinel-2 data, it does not strictly validate the superiority of the specific algorithm proposed. A more meaningful comparison would be against other Sentinel-2-based methods or products. Additionally, the comparison with the Sentinel-2-based EFFIS product shows a significant performance gap; the discussion should delve deeper into why EFFIS underperformed: is it due to algorithm design, temporal aggregation methods, or definitional differences?
Author Response
Comment 1: In the title, I don't think the island of Sardinia (24,000 km²) is large enough to be "regional scale". The regional scale usually refers to a larger extent. You don't have to emphasize the regional scale in the title.
Response 1: We thank you and fully agree with this observation; we have revised the title to remove the emphasis on "regional scale".
Comment 2: The abstract should be improved. It is not attractive to readers. Please use objectives instead of workflows. Please state the meaning of the study at the end of the abstract.
Response 2: We thank you for the advice; accordingly, we have entirely restructured the abstract to emphasize the research objectives rather than the procedural workflow.
Comment 3: The Materials and Methods section is extremely long. Some parts are not necessary. For example, please shorten Section 2.3.4. You use some mature indices. It is unnecessary to introduce the details.
Response 3: We thank the Reviewer for this practical suggestion. We have significantly shortened Section 2.3.4 by removing the detailed descriptions of the Dice coefficient (DC), omission error (OE), and commission error (CE) metrics.
Comment 4: Lines 248-256, this part does not belong to the data section. It should be moved to somewhere in the method.
Response 4: We thank the Reviewer for this comment. We agree that the detailed description of the land cover map generation is outside the scope of this study. Consequently, this part has been removed from the manuscript, and we now simply refer to the relevant literature for any further details regarding its production.
Comment 5: Section 2.2.1, Lines 185-188: The decision to use Level-1C Top-of-Atmosphere (TOA) reflectance instead of Level-2A surface reflectance to avoid topographic correction artifacts is controversial. While topographic artifacts are indeed a concern, utilizing TOA data without atmospheric correction introduces significant noise due to temporal variations in aerosol optical thickness and water vapor. This is particularly critical for multi-temporal analysis where "pre-fire" and "post-fire" spectral values are compared. The authors should provide a more rigorous justification or a comparative analysis demonstrating that the errors introduced by atmospheric variability are indeed smaller than those resulting from topographic correction in this specific study area.
Response 5: We thank the Reviewer for this insightful comment, which allowed us to clarify a key methodological choice. As suggested, we have updated the manuscript (Lines 190-198) to include a detailed explanation. Our comparative analysis (now supported by Figure 3) demonstrates that in the mountainous regions of Sardinia, the Level-2A topographic correction introduces significant artifacts and erratic fluctuations in the MIRBI time series.
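To make the index under discussion concrete, the sketch below computes MIRBI following the standard Trigg and Flasse (2001) formulation, MIRBI = 10·ρ_longSWIR − 9.8·ρ_shortSWIR + 2; mapping Sentinel-2 B11 to the short SWIR band and B12 to the long SWIR band is common practice, and the reflectance arrays are placeholders for real L1C rasters.

```python
# Sketch of the MIRBI computation (Trigg & Flasse, 2001).
# The reflectance arrays are placeholders for real Sentinel-2 band rasters.
import numpy as np

def mirbi(swir_short: np.ndarray, swir_long: np.ndarray) -> np.ndarray:
    """MIRBI from SWIR reflectances (Sentinel-2: B11 = short, B12 = long)."""
    return 10.0 * swir_long - 9.8 * swir_short + 2.0

b11 = np.array([[0.25, 0.30], [0.28, 0.22]])  # placeholder B11 reflectance
b12 = np.array([[0.20, 0.35], [0.31, 0.18]])  # placeholder B12 reflectance
print(mirbi(b11, b12))  # higher MIRBI values indicate burned surfaces
```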
Comment 6: Section 2.2.1, Lines 190-193: The study employs data from 2020 for training and 2024 for validation, leaving a significant temporal gap (2021–2023). Why were these intermediate years excluded? Evaluating the model over a continuous time series would provide better insight into its stability under varying climatic conditions (e.g., wet vs. dry years). Please explain this exclusion or, preferably, include results from the intermediate years to demonstrate the temporal robustness of the model.
Response 6: We thank the Reviewer for this relevant question regarding the temporal gap in our dataset. The decision to validate the model in 2024, after training it in 2020, was a strategic choice aimed at evaluating the algorithm’s robustness under contrasting climatic scenarios rather than over a continuous time series. We added a clarification in Lines 313–316.
Comment 7: Section 2.3.3.1, Line 427: The text cites "Figure 2" when describing the rule-based algorithm; however, based on the context, it appears the intended citation is "Equation 2". Please verify and correct this citation.
Response 7: We thank the Reviewer for the careful check. The typo has been corrected; the citation was intended to refer to Figure 5 and has been updated accordingly in the text.
Comment 8: Section 2.3.3.1, Lines 459 & 464: The authors employ several hard-coded thresholds, such as the 60% decrease limit for the 30-day median MIRBI. What is the physical or statistical basis for setting the recovery threshold specifically at 60%? Relying on static, hard thresholds raises concerns about the model's transferability to regions with different vegetation phenology or soil backgrounds. The authors should clarify how these values were determined (e.g., empirical trial-and-error vs. statistical optimization) and discuss their robustness across different environmental contexts.
Response 8: We thank the Reviewer for the suggestion to clarify how the thresholds were chosen. The threshold of 60% was determined empirically, and the transferability of the model is now discussed more explicitly in Lines 538-544.
Comment 9: Section 2.3.3.2: Given that the Random Forest (RF) model is a core component of the "local calibration" innovation in this study, the manuscript must detail the model's hyperparameters (e.g., number of trees, maximum depth, minimum samples per leaf). Reproducibility is key, and these parameters are essential for other researchers to replicate or adapt the study.
Response 9: We thank the Reviewer for this suggestion. Following the Reviewer’s advice, we have expanded the Methods section (Lines 584-589) to include a comprehensive description of the Random Forest architecture. Specifically, we provided details on the hyperparameters (ntree = 1000, mtry = 4) and the model’s internal validation through the OOB error rate.
Comment 10: Section 2.3.3.3 (Lines 530-531) & Section 3.3 (Lines 689-691): The authors applied a Minimum Mapping Unit (MMU) of 1,600 m² (approx. 4 pixels) to filter noise, which is a common issue associated with pixel-based Random Forest classifiers ("salt-and-pepper" noise). Given this limitation, why did the authors not consider deep learning approaches (such as CNNs or U-Net)? These methods inherently utilize spatial context and neighborhood information to suppress noise, potentially negating the need for such aggressive MMU post-processing. The authors need to better justify the choice of Random Forest over spatially-aware deep learning methods in this context.
Response 10: We thank the Reviewer for this insightful suggestion. We agree that deep learning architectures are excellent for capturing spatial context and reducing 'salt-and-pepper' noise. We intend to explore these methods in future research.
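For context on the MMU post-processing discussed in Comment 10, the sketch below shows one straightforward way to apply a 4-pixel (1,600 m² at 20 m resolution) minimum mapping unit to a binary burned mask with scikit-image; the toy mask and the use of remove_small_objects are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of the MMU filtering step: dropping burned patches smaller than
# 4 pixels (1,600 m2 at 20 m resolution). The mask is a toy placeholder.
import numpy as np
from skimage.morphology import remove_small_objects

burned_mask = np.array([[1, 1, 0, 0, 0],
                        [1, 1, 0, 1, 0],
                        [0, 0, 0, 0, 0],
                        [0, 1, 0, 1, 1],
                        [0, 0, 0, 1, 1]], dtype=bool)

# min_size = 4 pixels ~ 1,600 m2 MMU; isolated single-pixel detections vanish.
filtered = remove_small_objects(burned_mask, min_size=4)
print(filtered.astype(int))
```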
Comment 11: Section 2.3.4, Lines 553-556: The authors state that the reference dataset was derived from GPS measurements corrected by "expert visual interpretation of Sentinel-2 imagery." If the visual interpretation relied on the same Sentinel-2 dataset used for the classification model, this introduces a risk of circularity and overfitting. The "ground truth" may become biased towards features visible in Sentinel-2 rather than actual ground conditions. Independent high-resolution imagery (e.g., Planet, SPOT, or aerial photography) should be used for this correction process to ensure objectivity.
Response 11: We thank the Reviewer for raising this valid and legitimate concern. To address the potential issue regarding overfitting or circularity in the construction of the reference dataset, we have added a short clarification in the Methodology section (lines 334-341). The new text clarifies that the reference polygons are based primarily on official GPS measurements, with Sentinel-2 imagery used only for geometric refinement and optimal detection of burned areas, while very-high-resolution imagery (e.g., SPOT or PlanetScope) was not used due to limited availability and high costs.
Comment 12: Section 4, Lines 760-769; Table 4: The manuscript compares the proposed 20m resolution method against coarse-resolution global products like MODIS MCD64A1 (500 m) and CLMS (300 m). This comparison is inherently unbalanced, as results will naturally favor high-resolution inputs. While this highlights the advantage of Sentinel-2 data, it does not strictly validate the superiority of the specific algorithm proposed. A more meaningful comparison would be against other Sentinel-2-based methods or products. Additionally, the comparison with the Sentinel-2-based EFFIS product shows a significant performance gap; the discussion should delve deeper into why EFFIS underperformed: is it due to algorithm design, temporal aggregation methods, or definitional differences?
Response 12: We agree with the Reviewer that a direct comparison between products with significantly different resolutions can be misleading. Consequently, we have revised the Introduction (Lines 167–173) and Discussion (Lines 884–887) sections to clarify that global datasets (MODIS and CLMS) are included only as a general thematic benchmark. We now emphasize that the primary comparative assessment is centered on the EFFIS product, which represents the regional operational standard. A deeper attribution of the observed performance gap between our product and the Sentinel-2-based EFFIS dataset is inherently limited by the lack of publicly available technical documentation describing the internal design of the EFFIS burned area algorithm. A more detailed account of where the EFFIS errors occur is now provided in Lines 888–901.
Reviewer 3 Report
Comments and Suggestions for Authors
1. The specific sampling strategy of 2020 training data and 2024 validation data is not clear, and it is not clear whether the random sampling considers the uniform distribution of terrain and vegetation types, which may lead to sample bias.
2. MIRBI, NBR and other indices were only screened by Separability Index (SI), but the effects of different combinations of indices were not compared.
3. The key parameters of the random forest model (e.g., the number of decision trees, the maximum depth, and the proportion of feature sampling) are not clear.
4. The spatial distribution characteristics of model errors were not analyzed (e.g., the error differences between mountain areas and plains, and different vegetation types); The reasons for the "high misjudgment rate of natural grassland" and "high misjudgment rate of moor and heather shrub" were only speculated qualitatively, lacking quantitative analysis.
5. Only the overall accuracy is compared, and the differences in detection effects of different fire seasons (such as summer and non-summer) and different recovery stages are not compared.
Author Response
Comment 1: The specific sampling strategy of 2020 training data and 2024 validation data is not clear, and it is not clear whether the random sampling considers the uniform distribution of terrain and vegetation types, which may lead to sample bias.
Response 1: We would like to thank the Reviewer for this insightful comment, which allowed us to better clarify our sampling strategy. The training samples were collected using a simple random sampling design across the entire regional territory. We deliberately chose this approach to ensure statistical impartiality and to avoid any subjective selection bias that might arise from manual stratification (lines 578-583).
Comment 2: MIRBI, NBR and other indices were only screened by Separability Index (SI), but the effects of different combinations of indices were not compared.
Response 2: We thank the reviewer for the valuable suggestion. We have now added a ROC analysis for the most relevant indices (SI > 1.5), considering single, pair, and triplet combinations. The resulting ROC curves and their corresponding AUC values are presented in the new Figure 8. These revisions have been included in the Methods (Lines 399–407) and Results sections (Lines 729–741).
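As background to the SI screening mentioned here, the sketch below applies a commonly used form of the Separability Index, SI = |μ_burned − μ_unburned| / (σ_burned + σ_unburned), together with the SI > 1.5 retention rule; the burned/unburned samples are synthetic placeholders, and the exact SI formulation used in the paper may differ.

```python
# Sketch of Separability Index screening (retain indices with SI > 1.5),
# using a common formulation; the sample arrays are synthetic placeholders.
import numpy as np

def separability_index(burned: np.ndarray, unburned: np.ndarray) -> float:
    return abs(burned.mean() - unburned.mean()) / (burned.std() + unburned.std())

rng = np.random.default_rng(2)
mirbi_burned = rng.normal(1.8, 0.20, 300)    # placeholder burned samples
mirbi_unburned = rng.normal(0.9, 0.25, 300)  # placeholder unburned samples

si = separability_index(mirbi_burned, mirbi_unburned)
print(f"SI = {si:.2f} -> {'retained' if si > 1.5 else 'discarded'}")
```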
Comment 3: The key parameters of the random forest model (e.g., the number of decision trees, the maximum depth, and the proportion of feature sampling) are not clear.
Response 3: We thank the Reviewer for this suggestion. We included the description of the Random Forest architecture (Lines 584-589). Specifically, we provided details on the hyperparameters (ntree = 1000, mtry = 4) and the model’s internal validation through the OOB error rate.
Comment 4: The spatial distribution characteristics of model errors were not analyzed (e.g., the error differences between mountain areas and plains, and different vegetation types); The reasons for the "high misjudgment rate of natural grassland" and "high misjudgment rate of moor and heather shrub" were only speculated qualitatively, lacking quantitative analysis.
Response 4: We thank the Reviewer for this comment. The spatial distribution of classification errors was analyzed and is presented in Figure 10, showing how CE and OE are distributed across different vegetation types. In addition, we have added information on the overall spatial distribution of errors in the manuscript (Lines 839-842), stating that errors are relatively uniformly distributed over the entire territory, with no evident clustering related to specific morphological settings.
The justification for the commission error in natural grasslands is now clarified in the manuscript (Lines 847–860) and is based on the size of misclassified polygons. Commission errors larger than 1 ha represent only 30% of the total commission error area, indicating that most false positives are small patches likely related to noise. The explanation for the high omission rate in moor and heathland is now more strongly supported by quantitative evidence: a new figure (Figure A3) has been added in Appendix A, clearly showing that most omissions in this class occur during the fire off-season, which explains the reduced detection performance. Furthermore, to address the Reviewer's request for a more detailed and quantitative analysis, we have expanded the Results section in Lines 809–825 and added a new explanatory figure (Appendix A, Figure A2).
Comment 5: Only the overall accuracy is compared, and the differences in detection effects of different fire seasons (such as summer and non-summer) and different recovery stages are not compared.
Response 5: We thank the Reviewer for this stimulating observation. Regarding the performance across different periods and land cover types, we have clarified that these details are already provided in Figure 10, which breaks down the accuracy by month and vegetation class. While analyzing post-fire recovery stages is an interesting research direction, it falls outside the current scope of this study. We believe this topic warrants a dedicated future investigation, potentially through a multi-year comparative analysis.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
All the minor corrections were done. The paper can be accepted in the present form.
Author Response
Comment 1: All the minor corrections were done. The paper can be accepted in the present form.
Response 1: We thank the reviewer for the positive evaluation; we are pleased that the manuscript can be accepted in its present form.
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for improving the manuscript. I only have a few comments.
- Section 2.3.3.1, Lines 534-544: Regarding the previous comment on the hard-coded 60% threshold, the added explanation that the value was "empirically defined" remains vague. Merely stating that the value is empirical does not explain why 60% is the optimal cut-off compared to, for instance, 50% or 70%. To ensure scientific rigor and reproducibility, the authors should provide a more concrete basis for this selection. Even if it is a trial-and-error result, it should be clearly stated in the manuscript.
- Section 5, Lines 982-992: In Response 10, the authors agreed that Deep Learning (DL) architectures (such as CNNs or U-Net) offer inherent advantages in utilizing spatial context to reduce "salt-and-pepper" noise, potentially negating the need for aggressive MMU post-processing. I suggest explicitly incorporating this point into the "Future improvements" paragraph in the Conclusions section.
Author Response
Comment 1: Section 2.3.3.1, Lines 534-544: Regarding the previous comment on the hard-coded 60% threshold, the added explanation that the value was "empirically defined" remains vague. Merely stating that the value is empirical does not explain why 60% is the optimal cut-off compared to, for instance, 50% or 70%. To ensure scientific rigor and reproducibility, the authors should provide a more concrete basis for this selection. Even if it is a trial-and-error result, it should be clearly stated in the manuscript.
Response 1: We thank the Reviewer for this comment, which allowed us to clarify the logic behind our threshold selection (now explained in Lines 534–550). The 60% value emerged as the best-performing cut-off from a systematic trial-and-error process in which we evaluated the algorithm's performance across thresholds ranging from 10% to 90% (in 10% increments).
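For illustration, the sketch below mimics this kind of trial-and-error sweep: it scores a detection run at each candidate threshold and keeps the best one. The predict_burned() function, the reference mask, and the Dice-coefficient scoring are hypothetical stand-ins for the actual detection algorithm and reference data.

```python
# Sketch of a threshold sweep from 10% to 90% in 10% steps, keeping the
# value that maximizes the Dice coefficient against a reference mask.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    inter = np.logical_and(pred, ref).sum()
    return 2 * inter / (pred.sum() + ref.sum())

def predict_burned(threshold: float) -> np.ndarray:
    # Placeholder: real code would re-run the rule-based detection with this
    # MIRBI-recovery threshold and return the resulting binary burned mask.
    rng = np.random.default_rng(int(threshold * 100))
    return rng.random((100, 100)) > 0.5

reference = np.random.default_rng(0).random((100, 100)) > 0.5  # placeholder
scores = {t: dice(predict_burned(t), reference) for t in np.arange(0.1, 1.0, 0.1)}
best = max(scores, key=scores.get)
print(f"best threshold = {best:.0%}")
```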
Comment 2: Section 5, Lines 982-992: In Response 10, the authors agreed that Deep Learning (DL) architectures (such as CNNs or U-Net) offer inherent advantages in utilizing spatial context to reduce "salt-and-pepper" noise, potentially negating the need for aggressive MMU post-processing. I suggest explicitly incorporating this point into the "Future improvements" paragraph in the Conclusions section.
Response 2: We thank the reviewer for this valuable suggestion. As recommended, we have explicitly incorporated this point into the Conclusions section (Lines 995–998).