Open Access Article

Peer-Review Record

MSP U-Net: Crack Segmentation for Low-Resolution Images Based on Multi-Scale Parallel Attention U-Net

Appl. Sci. 2024, 14(24), 11541; https://doi.org/10.3390/app142411541

by Joon-Hyeok Kim, Ju-Hyeon Noh, Jun-Young Jang and Hee-Deok Yang *

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 12 November 2024 / Revised: 28 November 2024 / Accepted: 30 November 2024 / Published: 11 December 2024

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a Multi-Scale Parallel Attention U-Net (MSP U-Net) aimed at enhancing crack detection performance on low-resolution images. By designing an enhanced attention module to reduce feature loss and employing high-resolution image scaling as input, MSP U-Net achieves a mean Intersection over Union (mIoU) of 0.7752 on the Crack500 dataset, significantly surpassing existing methods and demonstrating its potential and advantages in practical applications.
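For readers unfamiliar with the metric cited above, the following is a minimal sketch of how mIoU is commonly computed for a binary crack mask, assuming the mean is taken over the background and crack classes. The record does not state which averaging convention the paper uses, and the function names and toy masks are illustrative only.

```python
def iou(pred, target, cls):
    """IoU for one class over two same-sized label masks (lists of rows)."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == cls, t == cls
            inter += in_pred and in_target
            union += in_pred or in_target
    # Assumed convention: a class absent from both masks scores 1.0.
    return inter / union if union else 1.0


def mean_iou(pred, target, classes=(0, 1)):
    """Average the per-class IoU over the given classes (0 = background, 1 = crack)."""
    return sum(iou(pred, target, c) for c in classes) / len(classes)


# Toy 3x3 masks: the ground truth is a vertical one-pixel crack.
target = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
pred = [[0, 1, 0],
        [0, 1, 1],
        [0, 0, 0]]

crack_iou = iou(pred, target, 1)
background_iou = iou(pred, target, 0)
miou = mean_iou(pred, target)
```

On these toy masks the crack class shares 2 pixels out of a union of 4, so its IoU is 0.5, while the background class scores higher. Averaging the two shows how a large background region can raise mIoU even when the crack itself is only partly recovered.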

(1) Can the authors elaborate on the specific mechanism employed by the attention gate in the Broad Reception Field Block (BRFB)? How does it quantitatively reduce feature loss compared to traditional U-Net designs?

(2) What specific preprocessing techniques did the authors employ when adjusting high-resolution images to low-resolution for training, to ensure that key features related to cracks are retained? Did the authors consider the impact of this adjustment on the model's generalization ability?

(3) Besides the mean Intersection over Union (mIoU), did the authors consider other evaluation metrics such as accuracy, recall, or F1-score? These metrics could provide a deeper understanding of performance, especially in scenarios with varying impacts of false positives or false negatives.

(4) The authors mention limitations in detecting very fine or extensive cracks. Can they provide further insights into why these specific types of cracks pose challenges for the model? Are there specific features (e.g., texture, color variation) that complicate segmentation?

(5) What strategies do the authors plan to implement when training the model to address larger and more diverse datasets? How will they ensure that the model maintains performance across various environmental conditions and types of cracks?

(6) What specific architectural modifications are the authors considering to reduce the number of training parameters? Can they provide examples of potential techniques or methods (e.g., pruning, quantization) they believe may be effective?

(7) What recent advancements in loss functions are the authors considering introducing? How do they hypothesize these losses will enhance the network's performance in crack segmentation?

(8) Based on the research findings, how do the authors perceive the practical significance of MSP U-Net in road maintenance practices? Will the automation of crack detection lead to significant changes in inspection methods?

(9) Do the authors believe that there are other research pathways or techniques (such as transfer learning from other fields or integrating multimodal data) that could further improve their proposed crack segmentation method?

(10) Can the authors provide a detailed comparison of their method with other state-of-the-art crack detection models beyond mIoU? What qualitative differences did they observe in their comparative analysis?

(11) The research background of the paper is insufficient, with relatively few research papers in recent years.

(12) A qualitative assessment of the research findings is needed in the abstract and conclusion.

Author Response

[Comments] What specific preprocessing techniques did the authors employ when adjusting high-resolution images to low-resolution for training, to ensure that key features related to cracks are retained? Did the authors consider the impact of this adjustment on the model's generalization ability?

- We used nearest-neighbor interpolation. Although there are various methods for resizing images, we chose this one deliberately because it gives the poorest resizing quality, as our focus was on low-resolution images.
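The effect of nearest-neighbor downscaling on thin cracks can be illustrated with a short sketch. It assumes a simple floor-based sampling rule; image libraries differ in their exact sampling convention, and the record does not say which implementation the authors used. The helper names and the 4x4 test images are hypothetical.

```python
def downscale_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list using floor (top-left) sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def vertical_crack(height, width, col):
    """Hypothetical test image: background 0 with a one-pixel crack (1) in `col`."""
    return [[1 if c == col else 0 for c in range(width)] for _ in range(height)]


# Halving a 4x4 image samples source columns 0 and 2 only.
kept = downscale_nearest(vertical_crack(4, 4, col=2), 2, 2)
lost = downscale_nearest(vertical_crack(4, 4, col=1), 2, 2)
```

Under this sampling rule, a one-pixel crack survives only if it happens to lie on a sampled column: `kept` still contains the crack, while `lost` is entirely background. This is one way to see why the method yields a demanding low-resolution test case.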

[Comments] Besides the mean Intersection over Union (mIoU), did the authors consider other evaluation metrics such as accuracy, recall, or F1-score? These metrics could provide a deeper understanding of performance, especially in scenarios with varying impacts of false positives or false negatives.

- We presented precision, recall, and F1-score in Table 2 but did not provide an in-depth discussion of these metrics. There were several reasons for this, among them that analyzing these metrics for crack detection networks proved challenging.
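As a reference for the metrics named in this exchange, here is a minimal sketch of pixel-wise precision, recall, and F1-score for a binary crack mask. It uses the standard definitions; the function names, the zero-division convention, and the toy masks are assumptions for illustration and are not taken from the paper.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives for class 1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    return tp, fp, fn


def precision_recall_f1(pred, target):
    """Pixel-wise scores; an undefined ratio is reported as 0.0 (assumed convention)."""
    tp, fp, fn = confusion_counts(pred, target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the prediction finds only the top half of a vertical crack.
target = [[0, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

precision, recall, f1 = precision_recall_f1(pred, target)
```

Here every predicted crack pixel is correct, so precision is 1.0, but half of the true crack is missed, so recall is 0.5. This is the kind of false-negative behavior the reviewer's question is aimed at, and a single overlap score does not separate it from false positives.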

[Comments] The authors mention limitations in detecting very fine or extensive cracks. Can they provide further insights into why these specific types of cracks pose challenges for the model? Are there specific features (e.g., texture, color variation) that complicate segmentation?

- Although we could not include all images in the paper, crack images exhibit diverse textures and materials. As a result, most studies face similar challenges. Additionally, convolutional operations tend to blur very fine cracks, further complicating detection.

We initially aimed to develop a network capable of detecting cracks regardless of material. However, because a suitable dataset was difficult to acquire, this goal was deferred to future research.
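The blurring effect mentioned in this response can be illustrated with a small one-dimensional sketch. It applies a width-3 averaging kernel to a cross-section of a thin crack and of a wider crack. This is only an analogy: learned convolution kernels are not box filters, and the function name and signals below are hypothetical.

```python
def box_blur_1d(signal, width=3):
    """Average each sample with its neighbours, zero-padding at the borders."""
    half = width // 2
    padded = [0.0] * half + [float(v) for v in signal] + [0.0] * half
    return [sum(padded[i:i + width]) / width for i in range(len(signal))]


# Cross-sections through a one-pixel crack and a three-pixel crack.
thin = [0, 0, 0, 1, 0, 0, 0]
wide = [0, 0, 1, 1, 1, 0, 0]

thin_peak = max(box_blur_1d(thin))
wide_peak = max(box_blur_1d(wide))
```

After smoothing, the one-pixel crack keeps only a third of its original contrast, whereas the three-pixel crack still reaches full intensity at its center. Repeated smoothing or downsampling stages compound this loss for the finest cracks.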

[Comments] What strategies do the authors plan to implement when training the model to address larger and more diverse datasets? How will they ensure that the model maintains performance across various environmental conditions and types of cracks?

- As in our response to the fourth comment, we plan to conduct follow-up experiments with a more diverse dataset.

[Comments] What specific architectural modifications are the authors considering to reduce the number of training parameters? Can they provide examples of potential techniques or methods (e.g., pruning, quantization) they believe may be effective?

- At present, we do not have plans to modify the architecture, but we intend to apply various models in future research.

[Comments] What recent advancements in loss functions are the authors considering introducing? How do they hypothesize these losses will enhance the network's performance in crack segmentation?

[Comments] Based on the research findings, how do the authors perceive the practical significance of MSP U-Net in road maintenance practices? Will the automation of crack detection lead to significant changes in inspection methods?

- The automation of crack detection is expected to make significant contributions to what has traditionally been a manual process. In particular, it is anticipated to provide substantial benefits in terms of cost and time efficiency.

[Comments] Do the authors believe that there are other research pathways or techniques (such as transfer learning from other fields or integrating multimodal data) that could further improve their proposed crack segmentation method?

- We are currently considering applying diffusion-based methods.

[Comments] Can the authors provide a detailed comparison of their method with other state-of-the-art crack detection models beyond mIoU? What qualitative differences did they observe in their comparative analysis?

- This will be addressed in future research.

Reviewer 2 Report

Comments and Suggestions for Authors

Manuscript ID: applsci-3338149

Title: MSP U-Net: Crack segmentation for low-resolution images based on multi-scale parallel attention U-Net

Recommendation: Minor revision

Brief summary

The Authors propose the Multi-Scale Parallel Attention U-Net (MSP U-Net) as a network for low-resolution images. It considers the irregular characteristics of cracks. The test on the Crack500 dataset outperformed literature methods.

Broad comments

The topic is relevant, since safety inspections have emerged as a crucial task and crack segmentation techniques can play a relevant role in this field.

The English is generally fluent, although a thorough re-read could help the Authors improve the manuscript's readability. Please pay particular attention to punctuation.

Moreover, the article is quite well contextualized in the literature background, even if some references should be added (e.g., in relation to algorithms for crack detection or also when comparing the results with literature ones) – also considering publications from Applied Sciences journal, when relevant.

Some suggestions are provided in the next comments, which may help the authors in improving the quality of this paper.

Specific comments

Lines 49-73: each literature article is reported in a new paragraph and this does not enhance readability. Please take care of this matter.

Also, at the end of the introduction section, the Authors could briefly describe the structure of the remainder of the article, for the sake of clarity.

Lines 85-93: please consider inserting a bullet point with the main contributions of the work beyond the state of the art. Moreover, the Authors highlight that the application is suitable for low-resolution images; this should be motivated earlier by underlining the importance of this aspect.

Section 2: the title is not very appropriate, since the section does not deal only with literature analysis (which, on the other hand, was also present in the previous section). Please revise or present the first parts of the section in a different way.

Line 270 onwards: please check the font size.

Table 2: please check the labels.

Lines 325-335: please consider inserting a bullet point.

Author Response

[Comments 1] Lines 49-73: each literature article is reported in a new paragraph and this does not enhance readability. Please take care of this matter. Also, at the end of the introduction section, the Authors could briefly describe the structure of the remainder of the article, for the sake of clarity.

[Response 1]: We removed the indentation and improved the readability.

[Comments 2] Lines 85-93: please consider inserting a bullet point with the main contributions of the work beyond the state of the art. Moreover, the Authors highlight that the application is suitable for low-resolution images; this should be motivated earlier by underlining the importance of this aspect.

[Response 2]: We have added a brief description of the structure of the paper at the end of the introduction.

[Comments 3] Section 2: the title is not very appropriate, since the section does not deal only with literature analysis (which, on the other hand, was also present in the previous section). Please revise or present the first parts of the section in a different way.

[Response 3]: We have added a brief explanation at the beginning of the section.

[Comments 4] Line 270 onwards: please check the font size.

[Response 4]: We modified the font size.

[Comments 5] Table 2: please check the labels.

[Response 5]: We modified the labels.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised manuscript can be accepted.
