Round 1

Reviewer 1 Report

This research focuses on image-based concrete surface crack pattern recognition utilizing a Deep Convolutional Neural Network (DCNN) and Encoder-decoder module for semantic segmentation and classification tasks, thereby lightening the inspectors' workload. It requires major revision. Specific comments are as follows:

1. The author should pay attention to some words and grammar problems in the article. This is very important to improve the readability of the article.

2. "2.3 Machine vision methods." It is recommended to list the advantages and disadvantages of the various methods.

3. The pixel quality of Figures 1, 2, 4, 10 and 11 is poor; it is recommended to improve these pictures.

4. The data in Table 5 lacks units.

5. Data source, data classification, characteristics of training data, and characteristics of validation data need to be clearly explained.

6. The paper analyzes the concrete crack, but does not explain its characteristics, such as crack width, length and other parameters.

7. The method proposed in the article is somewhat confusing. Three deep-learning models, Xception_41, Xception_65 and Xception_71, are compared. Is this the innovation of the article?

8. The conclusion and abstract lack a summary; please simplify.

9. The literature in the introduction lacks a summary, and the content of each article is too long. When an abbreviation first appears, the full name should be given, including in the abstract of the paper, for example "DCNN", etc.

Author Response

Thank you very much for your review and constructive suggestions with regard to our manuscript "Autonomous concrete crack semantic segmentation using a deep fully convolutional encoder-decoder network in bridge inspection" (buildings-2007944). The comments are all valuable and helpful for revising and improving our paper. We have modified the paper accordingly and hope the revision will meet with approval. The corrections in the paper and the responses to all reviewers' comments are as follows.

Reviewer 1

Authors' reply:

We appreciate the reviewer's constructive and valuable comments on the manuscript. We reply to the comments one by one below and have revised the manuscript accordingly.

Point 1:

The author should pay attention to the article's words and grammar problems. This is very important to improve the readability of the article.

Authors' reply:

Thanks for the comment. After addressing the main review comments, we proofread the manuscript and updated its language and presentation.

Point 2:

"2.3 Machine vision methods." It is recommended to list the advantages and disadvantages of various methods.

Authors' reply:

Thanks for the comment. The advantages and disadvantages of the various machine vision methods have been added to Table 1 in Section 2.3. The content has also been condensed accordingly.

Point 3:

The pixel quality of Figures 1, 2, 4, 10 and 11 is poor; it is recommended to improve these pictures.

Authors' reply:

Thanks for the comment. To improve the images, the following steps were taken: 1. The quality and presentation of Figures 2, 4 and 10 were improved by adjusting their size and resolution. 2. The sizes of Figures 1 and 11 were optimized.

Point 4:

The data in Table 5 lacks units.

Authors' reply:

Thanks for the comment. Table 5 has been updated by adding the "%" unit. Thank you for pointing this out.

Point 5:

Data source, data classification, characteristics of training data, and characteristics of validation data need to be clearly explained.

Authors' reply:

Thanks for the comment. We have added more content to Section 4.1 on the data source, classification and data characteristics, as follows: "224 × 224 pixels from 6069 crack images collected from bridges by a Phantom 4 Pro's CMOS cameras by Xu et al. (2019) [36] and 2) 227 × 227 pixels from 40,000 crack images collected at a stock of campus building of Middle East Technical University by Özgenel (2019) [47] are used for training, testing and validation." and "For validation, the dataset includes 100 crack images and 50 background images. Therefore, there are two classifications: crack and background."

Point 6:

The paper analyzes the concrete crack but does not explain its characteristics, such as crack width, length and other parameters.

Authors' reply:

Thanks for this kind suggestion. At the current stage, the research goal is to achieve better crack pixel recognition, so the work concentrates on the basic crack pattern. Other parameters, such as width and length, can be extracted from the segmentation with measurement software. The present scope, however, is limited to the semantic segmentation of concrete cracks from images, and future research will address this point.

Point 7:

The method proposed in the article is somewhat confusing. Three deep-learning models, Xception_41, Xception_65 and Xception_71, are compared. Is this the innovation of the article?

Authors' reply:

Thank you for this comment. Yes, this is part of the innovation of the paper. We selected a wide range of advanced open-source modules for semantic segmentation, searched for the best combination, and identified the best-matching network. The experiments test these three Xception models to find the most compatible backbone. We consider this approach innovative, suitable for the application, and usable in practice.

Point 8:

The conclusion and abstract lack a summary; please simplify.

Authors' reply:

Thank you for the comment. More content has been added to the abstract and the conclusion section to highlight the innovation and contribution of the paper. The abstract is now rewritten as follows: "The structure health inspection is the way to ensure that structures stay in optimum condition. Traditional inspection work has many disadvantages in dealing with the large workload despite using remote image-capturing devices. This research focuses on image-based concrete crack pattern recognition utilizing a Deep Convolutional Neural Network (DCNN) and Encoder-decoder module for semantic segmentation and classification tasks, thereby lightening the inspectors' workload. To achieve this, a series of contrast experiments have been implemented. The results show that the proposed deep-learning network has competitive semantic segmentation accuracy (91.62%) and is over-performed compared with other crack detection studies. This proposed advanced DCNN is split into multiple modules, including Atrous Convolution (AS), Atrous Spatial Pyramid Pooling (ASPP), modified Encoder-decoder module and Depthwise Separable Convolution (DSC). The advancement is that those modules are well-selected for this task and modified based on their characteristics and functions, exploiting their superiority into full play to achieve robust and accurate detection globally. This application improved the overall performance of detection and can be implemented in industrial practices." In the conclusion, we added: "The proposed method exploits each module's superiority in characteristics and functionality and innovatively optimizes the results by integrating different networks. Therefore, this method shows satisfactory performance in terms of efficiency and accuracy."
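As a reading aid for the modules named in the revised abstract, the sketch below works through two standard pieces of arithmetic: the span covered by an atrous (dilated) kernel, and the weight count of a depthwise separable convolution compared with a standard one. It is not taken from the manuscript. The function names, the 3 × 3 kernel, the 256 channels and the dilation rates are illustrative assumptions, and biases are ignored.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span, along one axis, covered by a kernel dilated with the given rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def standard_conv_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 2D convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 mixing across channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical dilation rates, chosen only to illustrate the formula.
    for rate in (1, 6, 12, 18):
        print(f"3x3 kernel, rate {rate}: covers {effective_kernel_size(3, rate)} pixels per axis")

    # Hypothetical layer size: 3x3 kernel, 256 input and 256 output channels.
    standard = standard_conv_weights(3, 256, 256)
    separable = depthwise_separable_weights(3, 256, 256)
    print(f"standard: {standard} weights, depthwise separable: {separable} weights")
```

The point of the comparison is that dilation widens the receptive field without adding weights, while the separable form reduces the weight count for the same kernel size and channel widths.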

Point 9:

The literature in the introduction lacks a summary, and the content of each article is too long. When an abbreviation first appears, the full name should be given, including in the abstract of the paper, for example "DCNN", etc.

Authors' reply:

Thank you for the comment. More content summarizing the review and its conclusions has been added to the literature review section, as follows: "These detection methods indicate whether or not there is a crack in an image; it also locates the crack at a pixel-wise level." and "These studies above achieved decent results in crack classification, detection and segmentation, but currently, there are no significant developments in method optimizing and detection accuracy of classification and segmentation. In this research, a state-of-the-art FCN-based technique is proposed to classify images into crack or non-crack and to detect cracks by segmenting them from a background on a pixel-wise level. And this network could be used directly in industrial practices to reduce the inspectors' workloads and improve inspection speed dramatically." and "However, this method could not be used for detecting cracks caused by carbonation phenomena and corrosion, etc., except for stress."

Author Response File: Author Response.docx

Reviewer 2 Report

In this paper, a Deep Convolutional Neural Network (DCNN) architecture is utilized for autonomous image-based identification of concrete crack patterns for the purpose of bridge inspection. The authors claim to modify the existing modules, and the proposed framework also outperforms related studies. The framework focuses on increasing the efficiency and accuracy of conventional crack recognition algorithms by utilizing advanced convolution and pooling techniques. Overall, the paper is very nicely written and novel, the research is interesting and state-of-the-art, and the paper could be accepted for possible publication after addressing the following minor concerns.

1. Please comment a bit more on the novelty of the paper, as many similar deep learning techniques have been utilized for the crack detection of bridge decks, especially Convolutional Neural Network.

2. A bit more information on the dataset of Özgenel (2019) utilized. I mean how credible is it? I mean is it widely utilized data and utilized by other researchers also? How to manage the shadings, illuminance issues and does it have considerable effect on the prediction capabilities of the developed and/or modified algorithms?

3. Little more information on hyperparameters adjustments? Current explanation regarding available computational power is a bit confusing.

4. The accuracy, precision, sensitivity, specificity and overall F1-Score seem a bit high in general. Please provide some comments.

Author Response

Reviewer 2:

In this paper, a Deep Convolutional Neural Network (DCNN) architecture is utilized for autonomous image-based identification of concrete crack patterns for the purpose of bridge inspection. The authors claim to modify the existing modules, and the proposed framework also outperforms related studies. The framework focuses on increasing the efficiency and accuracy of conventional crack recognition algorithms by utilizing advanced convolution and pooling techniques. Overall, the paper is very nicely written and novel, the research is interesting and state-of-the-art, and the paper could be accepted for possible publication after addressing the following minor concerns.

Authors' reply:

We appreciate the reviewer's kind comments and have followed the reviewer's suggestions to improve the paper. We have addressed the following remarks by revising the manuscript or by answering from the authors' perspective. Again, we thank the reviewer for these comments.

Point 1:

Please comment a bit more on the novelty of the paper, as many similar deep learning techniques have been utilized for the crack detection of bridge decks, especially Convolutional Neural Networks.

Authors' reply:

Thank you for your kind comment. The innovation of this paper is that we selected a wide range of advanced open-source modules for semantic segmentation tasks and searched for the best-matching network. Although many papers study crack detection using CNNs, we approached crack detection in two ways: 1) crack and background classification and 2) semantic segmentation of crack pixels. The frameworks are tested on real conditions, so they can be used further in industrial practice. More content has been added to the abstract and conclusion sections to highlight the innovation.

Point 2:

A bit more information on the dataset of Özgenel (2019) utilized. I mean how credible is it? I mean is it widely utilized data and utilized by other researchers also? How to manage the shadings, illuminance issues and does it have considerable effect on the prediction capabilities of the developed and/or modified algorithms?

Authors' reply:

Thanks for the comment. We have added more content to Section 4.1 on the data source, classification and data characteristics, as follows: "224 × 224 pixels from 6069 crack images collected from bridges by a Phantom 4 Pro's CMOS cameras by Xu et al. (2019) [36] and 2) 227 × 227 pixels from 40,000 crack images collected at a stock of campus building of Middle East Technical University by Özgenel (2019) [47] are used for training, testing and validation." and "For validation, the dataset includes 100 crack images and 50 background images. Therefore, there are two classifications: crack and background." There are two data sources: 1) one is shared by Özgenel (2019) and contains 40,000 images of 227 × 227 pixels, including 20,000 crack images and 20,000 background images, collected at a stock of campus buildings of Middle East Technical University; 2) the other is shared by Xu et al. (2019) and contains 6069 images of 224 × 224 pixels, including 4058 crack images and 2011 non-crack images (background images), which were first collected from bridges by a Phantom 4 Pro's CMOS camera. Both datasets are open-source and were collected from different concrete structures. To limit the impact of concrete surface finishes, strains, shading and illumination conditions, these datasets include images with various shading, strains and illuminance. Therefore, influencing factors such as surface finish, strains and illumination conditions are considered during the training, validation and test processes.
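The image counts quoted in this reply can be tallied to check the class balance of the combined data. The sketch below only restates those counts; the `CrackDataset` class and its field names are hypothetical helpers introduced for illustration, and nothing here reflects how the authors actually split the data for training, validation and testing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrackDataset:
    """Image counts for one data source (hypothetical helper, not from the paper)."""
    name: str
    crack: int
    background: int

    @property
    def total(self) -> int:
        return self.crack + self.background


# Counts as stated in the authors' reply.
DATASETS = [
    CrackDataset("Özgenel (2019), 227 x 227 px", crack=20_000, background=20_000),
    CrackDataset("Xu et al. (2019), 224 x 224 px", crack=4_058, background=2_011),
]


def combined_crack_fraction(datasets: list[CrackDataset]) -> float:
    """Share of crack images across all listed sources."""
    crack = sum(d.crack for d in datasets)
    total = sum(d.total for d in datasets)
    return crack / total


if __name__ == "__main__":
    for d in DATASETS:
        print(f"{d.name}: {d.total} images, {d.crack / d.total:.1%} crack")
    print(f"combined crack share: {combined_crack_fraction(DATASETS):.1%}")
```

The per-source totals reproduce the 40,000 and 6069 figures given in the reply, which is a quick consistency check on the stated crack and background counts.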

Point 3:

Little more information on hyperparameter adjustments? Current explanation regarding available computational power is a bit confusing.

Authors' reply:

Thanks for this comment. Table 3 has been updated, and the following content has been added: "Batch size is related to the computation speed. Parameter 2 is used due to limited computational power. And 100 is the point where the training loss remains almost constant. Other hyper-parameters are also tested to prove the most suitable settings. The settings in Table 3 are to test the accuracy results of different layers. And that decides which type of Xception model we use as the backbone for the network."
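The reply states that training stops at the point where the loss "remains almost constant" but does not say how that point was identified, or whether the value 100 counts epochs or iterations. One common way to make such a criterion explicit is sketched below, purely as an assumption: compare the mean loss over the most recent window of steps with the mean over the window before it, and report the first step at which the relative change drops below a tolerance. The function name, window size and tolerance are all hypothetical.

```python
from typing import Optional, Sequence

BATCH_SIZE = 2  # value stated in the reply (chosen because of limited computational power)


def first_plateau_step(
    losses: Sequence[float], window: int = 5, tolerance: float = 0.01
) -> Optional[int]:
    """Return the number of steps seen when the loss first looks flat.

    The loss counts as flat when the mean of the latest `window` values differs
    from the mean of the preceding `window` values by less than `tolerance`
    in relative terms. Returns None if that never happens.
    """
    for seen in range(2 * window, len(losses) + 1):
        previous = sum(losses[seen - 2 * window : seen - window]) / window
        recent = sum(losses[seen - window : seen]) / window
        if previous > 0 and abs(previous - recent) / previous < tolerance:
            return seen
    return None


if __name__ == "__main__":
    # Synthetic loss curve for illustration only: fast decay towards a floor of 0.2.
    synthetic = [0.2 + 0.8 * (0.9 ** step) for step in range(150)]
    print("plateau detected after", first_plateau_step(synthetic), "steps")
```

A rule of this kind would let a reader reproduce the stopping point from the logged training loss rather than from visual inspection of the curve.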

Point 4:

The accuracy, precision, sensitivity, specificity and overall F1-Score seem a bit high in general. Please provide some comments.

Authors' reply:

Thanks for this comment. In this paper, we improved the accuracy of crack and background classification and reached a high accuracy of crack pixel semantic segmentation. We think there are three keys to achieving that accuracy: 1) we did a comprehensive literature review, so we knew which modules were suitable for this task; 2) we used two datasets, collected from bridges and buildings, for training, which makes the network rather robust; 3) we tested a range of modules and found the most accurate network for this task.
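For reference, the five metrics named in this point follow from the confusion-matrix counts using their standard definitions, as the sketch below shows. The counts in the example are invented for illustration and are not results from the paper; they merely match the size of the validation set described earlier (100 crack and 50 background images). The function name is likewise an assumption.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary metrics, treating 'crack' as the positive class."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Made-up counts: 100 crack images (90 found, 10 missed) and
    # 50 background images (45 correct, 5 flagged as crack).
    for name, value in classification_metrics(tp=90, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting the underlying counts alongside the percentages would let readers judge whether high scores reflect the model or the class balance of the evaluation set.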

Author Response File: Author Response.docx

Reviewer 3 Report

The reviewer appreciates the work done by the authors. The technical contents of the paper are in general interesting. Its findings are useful information for bridge field applications in the future. Anyway, I do not recommend the publication in the “Buildings, MDPI” unless the following suggestions are taken into account within the article:

1) Please, revise the title as follows: “Autonomous crack semantic segmentation using deep fully convolutional encoder-decoder network in concrete bridge inspections”.

2) Please, polish the abstract. Please, add sentences to explain the meaning, the main points, the improvement and the promising technology of the study. The logic should also be improved.

3) Introduction. It is important to describe the current progress, problems and improvements to be further studied. Please, highlight the creative contributions.

4) Introduction. In concrete bridges, cracking could be caused by many combined factors: carbonation phenomena; corrosion along steel rebars and tendons; excessive deflections due to significant prestress losses, shrinkage and creep of concrete. Please, refer to the aforementioned issues.

5) Section 2. The article could be improved by inserting tables where the literature is discussed to briefly characterize each reference (e.g. authors, year, testing, topic, and findings). This could help to give more strength and significance to the state-of-the-art. And, moreover, by introducing original figures with schemes to explain the driving ideas traced by the literature review.

6) Additional comments should be added in regard to the practical value of this work, and how the industry can profit from this paper.

7) The steps of the proposed technology should be better underlined. Please, review the corresponding parts.

8) Introduction and Conclusions: Please, provide range (mm), sensitivity, resolution (mm) and accuracy (mm) of the crack measurements that can be reached by using the technology proposed.

9) The technology requires a distance between the surfaces of the concrete bridge under cracking (to be monitored) and the device itself. Please, provide the corresponding range.

10) A general check of English grammar, punctuation, spelling, verb usage, sentence structure, conciseness, readability and writing style is suggested.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Paper has been revised.

Reviewer 3 Report

The authors have adequately addressed my comments.
