Deep Learning-Based Detection and Assessment of Road Damage Caused by Disaster with Satellite Imagery
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents a deep learning-based framework for the automatic detection and quantitative assessment of road damage using high-resolution pre- and post-disaster satellite imagery. However, several critical issues need to be addressed:
1. The methodology section lacks detail, particularly regarding the deep learning algorithms employed. A more thorough explanation of these models would enhance the reliability and credibility of the paper.
2. The characteristics comparison of the three models in Table 1 would align more naturally with Section 2, which covers Related Works. Consider integrating the table into that section for clarity.
3. The results section requires deeper discussion; it currently describes the figures without providing sufficient analytical insight.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper is entitled “Deep Learning-based Detection and Assessment of Road Damage Caused by Disaster with Satellite Imagery”. The idea and results of the paper are interesting, but the following comments can be used to improve it in future revisions.
Abstract
- The motivation could be sharpened by briefly specifying why satellite imagery is advantageous (e.g., availability, coverage, speed).
- The three approaches are listed without adequate context or rationale. Briefly describe the logic or purpose of comparing these approaches, e.g., “to evaluate the effectiveness of spatial vs. temporal features.”
- Reporting an F1-score (0.598) is useful, but it lacks a comparison or benchmark. Indicate how significant this result is, for instance, “...surpassing baseline methods by X%.”
Experimental Design
- The use of the DeepGlobe dataset is mentioned in the abstract but not explained here. This inconsistency creates confusion. Clearly state whether DeepGlobe was used in the experimental setup or only in conceptual comparisons. If not used, remove it from the abstract or justify its exclusion here.
- The use of image augmentation is good, but no rationale is provided. Justify augmentation by briefly stating its benefit (e.g., to prevent overfitting, enhance generalization, or balance class distribution).
- While YOLOv9 and SAM are cutting-edge, the labeling accuracy and possible human verification of masks are not discussed. Add a statement on how the accuracy or reliability of these pseudo-labels was validated (e.g., small-scale manual checks or expert validation).
- The three models differ significantly in input format and output type, which may affect comparability. Add a discussion justifying that despite architectural differences, the evaluation is fair due to consistent training conditions and metrics.
- The "Siamese" model is not adequately explained. It mentions "feature fusion using difference of encoded features," but lacks clarity on what fusion method (e.g., subtraction, concatenation, attention) was used. Expand on how feature fusion is implemented in the Siamese model and whether it uses learned or fixed fusion.
- No validation strategy is described. Was early stopping or best-validation-checkpoint saving applied? Describe how model performance during training was monitored (e.g., validation loss plateau, early stopping, or best validation F1-score model saving).
- The equations are described clearly, but no information is given on class imbalance or segmentation thresholding. Briefly mention whether thresholds were optimized (e.g., 0.5 default or ROC-based tuning), and how class imbalance (damaged vs. undamaged road) was addressed.
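For context on the fusion and thresholding questions raised above, the following is a minimal sketch of a Siamese-style difference fusion followed by a default 0.5 threshold. This is not the authors' implementation: the random-projection "encoder", the fusion by absolute difference of embeddings, the toy 8x8 patches, and the 0.5 cutoff are all illustrative assumptions.

```python
import numpy as np

def encode(image):
    """Stand-in shared-weight encoder: a fixed random projection of the
    flattened image. A real Siamese branch would be a CNN backbone with
    weights shared between the pre- and post-disaster inputs."""
    w = np.random.default_rng(42).normal(size=(image.size, 16))
    return image.reshape(-1) @ w

rng = np.random.default_rng(0)

# Toy pre- and post-disaster image patches (8x8 grayscale).
pre = rng.random((8, 8))
post = pre.copy()
post[2:5, 2:5] += 0.8  # simulated damage region (9 pixels)

# Siamese fusion: encode both inputs with the *same* weights, then fuse
# by taking the absolute difference of the embeddings. Alternatives the
# review mentions include concatenation or learned attention fusion.
f_pre, f_post = encode(pre), encode(post)
fused = np.abs(f_pre - f_post)

# A decoder would map `fused` back to a damage mask; here we threshold a
# per-pixel difference map at the default 0.5 to show how a binary
# damaged/undamaged decision could be made (ROC-based tuning could
# replace this fixed cutoff).
prob_map = np.clip(np.abs(post - pre), 0.0, 1.0)
mask = (prob_map >= 0.5).astype(np.uint8)
print(mask.sum())  # → 9 pixels flagged as damaged
```

The sketch makes the reviewer's point concrete: subtraction-based fusion is fixed (parameter-free), whereas concatenation or attention would add learned fusion parameters, and the segmentation threshold is a tunable choice rather than a given.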
Results and Analysis
- Quantitative Performance Comparison
- Add standard deviation or confidence intervals for the F1-scores to help assess the model robustness.
- Structural Analysis and Comparative Insights
- This section could benefit from referencing existing literature that supports the effectiveness of difference imaging in linear structure detection to strengthen the argument.
- Suggest performing an ablation study to show the contribution of individual model components (e.g., feature fusion strategy, encoder depth).
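The robustness suggestion above (reporting confidence intervals for F1) can be made concrete with a bootstrap estimate. This is a sketch under stated assumptions, not the paper's protocol: the binary labels are synthetic, and the 1,000 resamples and 95% percentile interval are illustrative choices.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

rng = np.random.default_rng(1)

# Toy ground-truth and predicted damage labels (1 = damaged pixel).
y_true = rng.integers(0, 2, size=2000)
y_pred = np.where(rng.random(2000) < 0.8, y_true, 1 - y_true)  # ~80% agree

# Bootstrap: resample (label, prediction) pairs with replacement and
# recompute F1 to build a sampling distribution of the score.
scores = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    scores.append(f1_score(y_true[idx], y_pred[idx]))

lo, hi = np.percentile(scores, [2.5, 97.5])
print(f"F1 = {f1_score(y_true, y_pred):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval alongside the point estimate (here, an F1 around 0.598 in the manuscript's case) would let readers judge whether differences between the three models are meaningful or within sampling noise.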
Final decision: This manuscript has interesting objectives; however, it needs minor corrections.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This is an interesting article that demonstrates a modern approach to detecting and assessing road damage caused by disasters using deep learning on satellite imagery.
This version is very well written and presented; I have only a few minor comments:
- I would like to see some reference to the difficulties caused by the low resolution of some satellite images in certain areas and on specific dates;
- I would also like to understand how variations that may have occurred between the date of the pre-disaster satellite image and the immediate pre-disaster period can be addressed. For example, repaving may have taken place (in which case the actual colour of the pavement surface will be much darker than that recorded in the pre-disaster satellite image);
- How long might it take to conduct an analysis using the difference-based approach for detecting road damage (e.g. per hectare of urban area or per km of road)?
- Line 276: “Prior to the actual implementation in the real world, following issues should be addressed …” should read “Prior to the actual implementation in the real world, the following issues should be addressed …”.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have made corrections based on my comments.