Article
Peer-Review Record

Steel-Reinforced Concrete Corrosion Crack Detection Method Based on Improved VGG16

Coatings 2025, 15(6), 641; https://doi.org/10.3390/coatings15060641
by Lingling Chen 1, Zhiyuan Wang 2,* and Huihui Liu 2
Submission received: 11 April 2025 / Revised: 19 May 2025 / Accepted: 20 May 2025 / Published: 26 May 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The abstract does a good job of summarizing the problem statement, methodology (Improved VGG16 + U-Net + YOLO), and key results (94.4% precision, 2.6% loss, 85% IoU). It effectively communicates the major contributions and final conclusions.
However, it could be slightly enhanced by briefly mentioning some of the identified limitations (e.g., challenges with very small or low-contrast cracks).

Figures and diagrams (e.g., architecture diagrams, performance comparisons) are generally clear, appropriately labeled, and well formatted according to academic publishing standards.
Minor recommendation: Figures 7 and 8 could include error bars or standard deviation to better represent the variation in precision and loss across epochs, but overall no critical formatting errors were found.

The manuscript clearly defines the existing problems in crack detection (low real-time performance, poor detection in complex environments) and justifies why an integrated model combining VGG16, U-Net, and YOLO is necessary.
The authors differentiate their model by emphasizing multi-module fusion for both fine segmentation and real-time detection, which is novel compared to most prior single-architecture methods.

(1) Validation on More Diverse Real-World Datasets
Reason: The current study is mostly validated on SDNET2018 and DeepCrack datasets. Additional validation on real field data (with different lighting, surface roughness, noise types) would improve generalizability.

(2) Robustness Study under Extreme Environmental Noise or Damage
Reason: Although some noise tests were done, a more detailed stress test under extreme conditions (e.g., dirt, rust, partial occlusions) would demonstrate the robustness of the model.

(3) Lightweight Model or Inference Efficiency Optimization
Reason: For real-world SHM (Structural Health Monitoring) applications, model efficiency on edge devices is crucial. Presenting a lightweight or pruned version of UY-VGG16 would show broader applicability.

The conclusion is logically consistent with the experimental results. It clearly summarizes the performance advantages of UY-VGG16, acknowledges remaining challenges (small cracks, complex backgrounds), and offers realistic future improvement directions.

The authors explicitly propose introducing adaptive image enhancement and refined feature extraction algorithms in future work to address low-contrast crack detection challenges.

References are recent, relevant, and formatted correctly according to journal guidelines. Most citations are from 2023–2024 IEEE Access and IEEE Transactions, matching the technological scope.
Small note: the reference formatting could be slightly polished by uniformly abbreviating journal names if the journal requires strict abbreviation standards.

Comments on the Quality of English Language
Original expression: "Through large-scale experiments, the model outperforms current advanced detection technologies in all indicators."
Recommended correction: "Large-scale experiments show that the model outperforms existing advanced detection technologies across all key metrics."

Original expression: "This detection model provides an efficient and intelligent solution for structural health monitoring..."
Recommended correction: "The proposed detection model offers an efficient and intelligent solution for structural health monitoring..."

Original expression: "Resultant displacement of the vessel is presented in the Figures..." (in method sections)
Recommended correction: "The resultant displacements are presented in the figures..."

Original expression: "Under negative SNR, the crack detection accuracy reached up to 82.3%."
Recommended correction: "Under negative SNR conditions, the crack detection accuracy reached up to 82.3%."

Original expression: "However, there are still shortcomings in capturing detailed features and real-time processing."
Recommended correction: "However, capturing detailed features and achieving real-time processing remain challenges."

Author Response

Reviewer Report 1

Quality of English Language

The English could be improved to more clearly express the research.

Reply: Thank you for pointing out the issue. The entire manuscript has been re-reviewed and revised to ensure the accuracy of the expressions.

 

Comments and Suggestions for Authors

The abstract does a good job of summarizing the problem statement, methodology (Improved VGG16 + U-Net + YOLO), and key results (94.4% precision, 2.6% loss, 85% IoU). It effectively communicates the major contributions and final conclusions.
However, it could be slightly enhanced by briefly mentioning some of the identified limitations (e.g., challenges with very small or low-contrast cracks).

Reply: Thank you for your suggestion. The identified limitations should be briefly mentioned in the abstract, and the corresponding revisions have been made as shown below.

However, the model still presents a risk of misclassification when identifying fine cracks under low-contrast or complex background conditions. Future work will incorporate adaptive image enhancement and more refined feature extraction algorithms to further improve detection robustness and real-time performance.

Figures and diagrams (e.g., architecture diagrams, performance comparisons) are generally clear, appropriately labeled, and well formatted according to academic publishing standards.
Minor recommendation: Figures 7 and 8 could include error bars or standard deviation to better represent the variation in precision and loss across epochs, but overall no critical formatting errors were found.

Reply: Thank you for your comment. Error bars have been added to Figures 7 and 8, and the revisions are presented as follows:

Figure 7. Comparison of precision and loss rate of each model during training. (a) Precision of the models (b) Loss rate of the models

Figure 8. Comparison of positioning accuracy of different models on two datasets. (a) Localization accuracy in SDNET2018 (b) Localization accuracy in DeepCrack
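
As a rough illustration of what error bars of the kind added to Figures 7 and 8 could encode, the sketch below computes a per-epoch mean and sample standard deviation over repeated training runs. The precision values and the number of runs are hypothetical placeholders, not data from the manuscript, and this record does not state how the authors actually derived their error bars.

```python
from statistics import mean, stdev

# Hypothetical precision (%) per epoch for three repeated training runs.
# These numbers are illustrative placeholders, not values from the manuscript.
runs = [
    [80.0, 88.0, 93.0],
    [82.0, 90.0, 94.0],
    [78.0, 89.0, 95.0],
]

def epoch_error_bars(runs):
    """Return (mean, sample standard deviation) for each epoch across runs."""
    per_epoch = zip(*runs)  # regroup values so each tuple holds one epoch
    return [(mean(vals), stdev(vals)) for vals in per_epoch]

bars = epoch_error_bars(runs)
for epoch, (m, s) in enumerate(bars, start=1):
    print(f"epoch {epoch}: {m:.1f} +/- {s:.1f}")
```

Each (mean, standard deviation) pair would map to one plotted point and its bar height; whether the spread comes from repeated runs, cross-validation folds, or different random seeds is a reporting choice that a caption should state.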

The manuscript clearly defines the existing problems in crack detection (low real-time performance, poor detection in complex environments) and justifies why an integrated model combining VGG16, U-Net, and YOLO is necessary.
The authors differentiate their model by emphasizing multi-module fusion for both fine segmentation and real-time detection, which is novel compared to most prior single-architecture methods.

(1) Validation on More Diverse Real-World Datasets
Reason: The current study is mostly validated on SDNET2018 and DeepCrack datasets. Additional validation on real field data (with different lighting, surface roughness, noise types) would improve generalizability.

Reply: Thank you for the suggestion. Tests on field data under varying lighting conditions, surface roughness levels, and types of noise have been added to the Results section, and the revised content is presented as follows:

Table 2 Crack width detection errors of different models under various conditions

Condition              UY-VGG16 (mm)   UY-VGG16-Tiny (mm)   UY-VGG16-Fast (mm)
Surface roughness
  Smooth               ±4.0            ±5.3                 ±4.6
  Moderate             ±4.9            ±6.7                 ±5.7
  Rough                ±5.6            ±8.3                 ±6.6
Interference
  Soil-covered         ±5.4            ±7.4                 ±6.1
  Partial occlusion    ±5.1            ±6.8                 ±5.9
  Water stain          ±4.8            ±6.6                 ±5.6

(2) Robustness Study under Extreme Environmental Noise or Damage
Reason: Although some noise tests were done, a more detailed stress test under extreme conditions (e.g., dirt, rust, partial occlusions) would demonstrate the robustness of the model.

Reply: Thank you for the suggestion. Tests under extreme conditions such as soil coverage and partial occlusion have been added to the Results section, and the revised content is presented as follows:

Under interference scenarios, UY-VGG16 maintains strong robustness, with width errors of ±5.4 mm for soil-covered cracks and ±5.1 mm for partially occluded cracks, significantly outperforming both UY-VGG16-Tiny and UY-VGG16-Fast.
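
To put the comparison above into relative terms, the following sketch computes how much larger the width error of each lightweight variant is than that of UY-VGG16, using the interference rows of Table 2. The function name and data layout are illustrative assumptions; the calculation only compares the reported error magnitudes and is not a test of statistical significance.

```python
# Width errors (+/- mm) copied from the interference rows of Table 2.
width_error_mm = {
    "Soil-covered":      {"UY-VGG16": 5.4, "UY-VGG16-Tiny": 7.4, "UY-VGG16-Fast": 6.1},
    "Partial occlusion": {"UY-VGG16": 5.1, "UY-VGG16-Tiny": 6.8, "UY-VGG16-Fast": 5.9},
    "Water stain":       {"UY-VGG16": 4.8, "UY-VGG16-Tiny": 6.6, "UY-VGG16-Fast": 5.6},
}

def relative_increase(errors, baseline="UY-VGG16"):
    """Percent increase in width error of each variant relative to the baseline."""
    base = errors[baseline]
    return {
        model: round(100.0 * (err - base) / base, 1)
        for model, err in errors.items()
        if model != baseline
    }

increase = {cond: relative_increase(errs) for cond, errs in width_error_mm.items()}
for cond, result in increase.items():
    print(cond, result)
```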

(3) Lightweight Model or Inference Efficiency Optimization
Reason: For real-world SHM (Structural Health Monitoring) applications, model efficiency on edge devices is crucial. Presenting a lightweight or pruned version of UY-VGG16 would show broader applicability.

Reply: Thank you for the suggestion. A discussion on the lightweight or pruned versions of UY-VGG16 has been added, and the revised content is presented as follows:

To further accommodate varying operational conditions, two lightweight variants—UY-VGG16-Fast and UY-VGG16-Tiny—were developed based on the original UY-VGG16 framework. A comparative analysis of the runtime efficiency across five different models was conducted, with evaluation metrics including image processing speed measured in Frames Per Second (FPS), inference latency during real-time detection, and Average Precision (AP), as summarized in Table 1.

Finally, the study tested the UY-VGG16, UY-VGG16-Tiny, and UY-VGG16-Fast models on field data under a broader range of extreme conditions, with the detailed results presented in Table 2.

The conclusion is logically consistent with the experimental results. It clearly summarizes the performance advantages of UY-VGG16, acknowledges remaining challenges (small cracks, complex backgrounds), and offers realistic future improvement directions.

The authors explicitly propose introducing adaptive image enhancement and refined feature extraction algorithms in future work to address low-contrast crack detection challenges.

References are recent, relevant, and formatted correctly according to journal guidelines. Most citations are from 2023–2024 IEEE Access and IEEE Transactions, matching the technological scope.
Small note: the reference formatting could be slightly polished by uniformly abbreviating journal names if the journal requires strict abbreviation standards.

Reply: Thank you for pointing out the issue. The journal names in the references have been standardized using proper abbreviations.

 

 

Comments on the Quality of English Language

Original Expression

Recommended Correction

"Through large-scale experiments, the model outperforms current advanced detection technologies in all indicators."

"Large-scale experiments show that the model outperforms existing advanced detection technologies across all key metrics."

"This detection model provides an efficient and intelligent solution for structural health monitoring..."

"The proposed detection model offers an efficient and intelligent solution for structural health monitoring..."

"Resultant displacement of the vessel is presented in the Figures..." (in method sections)

"The resultant displacements are presented in the figures..."

"Under negative SNR, the crack detection accuracy reached up to 82.3%."

"Under negative SNR conditions, the crack detection accuracy reached up to 82.3%."

"However, there are still shortcomings in capturing detailed features and real-time processing."

"However, capturing detailed features and achieving real-time processing remain challenges."

Reply: Thank you for highlighting the language quality issue. The corresponding sentences have been revised, and the language throughout the manuscript has also been reviewed and modified.

Reviewer 2 Report

Comments and Suggestions for Authors

This is a review of the manuscript coatings-3610346 entitled: “Steel-Reinforced Concrete Corrosion Crack Detection Method Based on Improved VGG16” submitted to Coatings.

 

  1. The manuscript describes a new model for concrete corrosion crack detection. The idea is novel and the results are promising. However, the presentation of the model and discussion of results need improvement.
  2. Add more numerical results to the abstract. For example, the inference speed, the width error and length error…
  3. In the introduction, authors should discuss the different models used in more detail and give their importance and differences/advantages and disadvantages. Then why this specific current model is proposed. Why VGG-16 and not VGG-19?
  4. Scientifically speaking, precision and accuracy are different. Which one is measured in Fig. 7?
  5. What are the standard and acceptable values of precision and loss rates in such measurements?
  6. Figure 9: (a) and (b) are reversed. Make it correct!
  7. The processing speed curves (colored ribbons) in Fig. 9 are not clear. Improve the presentation.
  8. The first five lines of the conclusions are not needed and should be removed or reduced to a minimum.
  9. More references should be added to cover the models and their applications plus their characteristics.
  10. There are no tables in the whole manuscript although it is based in many aspects on numbers. Perhaps some figures can be represented as tables for accurate output results. A point to consider.
  11. It is clearer to add the description of figure captions within the main figure caption and not separately. On the images, use (a) , (b), … only.
  12. What is the “engineering standards for detection capability in complex environments”? Reference?
  13. What is the source of the crack images used to test this model and judge the performance comparison?
  14. Discuss the possible experimental limitations of this model in comparison to currently used techniques.
  15. How is concrete corrosion crack detection done in industry and practice? Elaborate.
  16. The keywords need to be revised.
  17. The word “steel” appears only once in the whole manuscript and that was in the title. Rewrite the title to show the real research idea. Add “model” or something similar to the title.
  18. Add an “abbreviations” section.
  19. The “real-time crack detection” is not well-covered or presented. Was it one of the goals of this research?
  20. Where is the “automation” in this research? Mention as it is an important factor in this industry.
  21. A suggestion is to merge the literature review section (related work) to the introduction. Also, reorganize the other sections to make them in a better flow order.

 

  


Comments on the Quality of English Language

Proofreading is needed.

Author Response

Reviewer Report 2

Comments and Suggestions for Authors

This is a review of the manuscript coatings-3610346 entitled: “Steel-Reinforced Concrete Corrosion Crack Detection Method Based on Improved VGG16” submitted to Coatings.

 

1. The manuscript describes a new model for concrete corrosion crack detection. The idea is novel and the results are promising. However, the presentation of the model and discussion of results need improvement.

Reply: Thank you for your comment. The presentation of the model and the discussion of the results have been further refined, with the specific revisions detailed below in response to the related issues, and the revised content is presented as follows:

To improve the accuracy of crack detection, this study proposed a crack detection method that combines an improved VGG16 with a convolutional neural network architecture. The U-Net was used to achieve fine segmentation of the crack regions, and YOLO was utilized to quickly locate the crack targets, resulting in the development of a hybrid crack detection model, UY-VGG16. The inference speed of UY-VGG16 improved to 38 FPS after training. When extracting cracks, the model's width error ranged from 3.8 mm to 5.9 mm, and the length error was approximately 2.1 cm. Additionally, the real images extracted by UY-VGG16 displayed clear crack contours, with edges nearly free of fractures. Under negative SNR conditions, the crack detection accuracy reached up to 82.3%. The experimental results demonstrated that UY-VGG16 exhibited excellent performance in the detection of reinforced concrete corrosion cracks, offering an efficient and intelligent solution for practical engineering applications. However, the model still faces a risk of misclassification when identifying fine cracks under low-contrast or complex background conditions. In real-world environments involving factors such as soil occlusion or water stain interference, the accuracy of crack boundary recognition tends to degrade, resulting in more blurring and misidentification compared to results obtained under standard lighting and clear viewing angles. Future work should focus on expanding the diversity of training data and incorporating environment-aware modules or active learning mechanisms to enhance the model’s adaptability and scalability in real-world scenarios.

 

2. Add more numerical results to the abstract. For example, the inference speed, the width error and length error…

Reply: Thank you for pointing out the issue. Additional numerical results have been included in the abstract, and the revised content is presented as follows:

In high-roughness field tests, the proposed model achieved a crack width detection error of ±4.0 mm. For cracks that were soil-covered or partially occluded, the detection errors were ±5.4 mm and ±5.1 mm, respectively. Based on the original model, two additional lightweight variants were constructed, with the inference latencies of the three models recorded as 36 ms, 28 ms, and 24 ms in descending order.

 

3. In the introduction, authors should discuss the different models used in more detail and give their importance and differences/advantages and disadvantages. Then why this specific current model is proposed. Why VGG-16 and not VGG-19?

Reply: Thank you for the suggestion. The rationale for selecting VGG16 has been added to the Introduction section, and the revised content is presented as follows:

Existing research shows that VGG16 offers a good balance between representational power and scalability, whereas VGG19, with its additional convolution and fully connected layers, imposes significantly higher computational overhead and slower convergence in industrial deployment.

 

4. Scientifically speaking, precision and accuracy are different. Which one is measured in Fig. 7?

Reply: Thank you for pointing out the error. The measurement indicators in Figure 7 have been corrected, and the revised content is presented as follows:

Figure 7. Comparison of precision and loss rate of each model during training
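
The distinction the reviewer raises can be made concrete with a small sketch. The confusion-matrix counts below are hypothetical and chosen only to show that precision and accuracy can differ on the same predictions; they are not results from the manuscript.

```python
def precision(tp, fp):
    """Fraction of samples predicted as crack that are truly cracks."""
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples (crack and non-crack) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts for a crack / no-crack classifier.
tp, fp, fn, tn = 90, 10, 30, 870

p = precision(tp, fp)
a = accuracy(tp, tn, fp, fn)
print(f"precision = {p:.3f}, accuracy = {a:.3f}")
```

Because crack pixels or patches are usually far rarer than background, accuracy can remain high even when many cracks are missed, which is one reason a figure should state which of the two metrics it reports.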

 

5. What are the standard and acceptable values of precision and loss rates in such measurements?

Reply: Thank you for your comment. The standards and acceptable ranges for precision and loss rate have been clarified, and the revised content is presented as follows:

In crack recognition tasks, precision and loss rate are key indicators for evaluating model performance. According to internationally accepted evaluation criteria and mainstream research literature, a precision above 90% and a loss rate below 5% are generally considered the minimum acceptable performance thresholds for crack detection models under laboratory conditions. However, during engineering deployment, additional considerations such as convergence speed, robustness across datasets, and generalization ability in diverse scenarios become critical.

 

6. Figure 9: (a) and (b) are reversed. Make it correct!

Reply: Thank you for your correction. In response to the suggestions, Figure 9 has been converted into a table, and the revised content is presented as follows:

Table 1 Comparison of efficiency metrics across different models

Model           FPS   Processing time (ms)   AP (%)
UY-VGG16        38    36                     93.2
UY-VGG16-Tiny   47    24                     84.6
UY-VGG16-Fast   40    28                     89.7
Yolov8-seg      31    43                     90.5
CrackFormer     16    67                     82.4

 

7. The processing speed curves (colored ribbons) in Fig. 9 are not clear. Improve the presentation.

Reply: Thank you for the suggestion. In accordance with the combined feedback, Figure 9 has been converted into a table, and the revised content is presented as follows:

Table 1 Comparison of efficiency metrics across different models

Model           FPS   Processing time (ms)   AP (%)
UY-VGG16        38    36                     93.2
UY-VGG16-Tiny   47    24                     84.6
UY-VGG16-Fast   40    28                     89.7
Yolov8-seg      31    43                     90.5
CrackFormer     16    67                     82.4

As shown in Table 1, among the efficiency metrics, UY-VGG16-Tiny demonstrates the best inference speed with 47 FPS and a processing latency of 24 ms, making it particularly suitable for scenarios with stringent real-time requirements. UY-VGG16, on the other hand, leads in detection accuracy, achieving an Average Precision (AP) of 93.2%, outperforming all baseline models, while also maintaining a processing latency of 36 ms and an FPS of 38, thus balancing both accuracy and speed. Among the three proposed models, UY-VGG16-Fast achieves the optimal balance between performance and efficiency. In comparison, Yolov8-seg reaches an AP of 90.5% with a latency of 43 ms and 31 FPS, which is slightly more accurate than UY-VGG16-Fast (89.7%) but slower than all three proposed variants, whereas CrackFormer shows the weakest performance overall. In summary, the UY-VGG16 model family outperforms mainstream crack detection models in terms of accuracy, inference speed, and deployment flexibility.
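
One way to check the speed and accuracy trade-off described above is a Pareto-dominance comparison over the latency and AP columns of Table 1. The sketch below is an illustrative analysis, not a procedure taken from the manuscript; the helper names are assumptions.

```python
# (processing time in ms, AP in %) copied from Table 1.
models = {
    "UY-VGG16":      (36, 93.2),
    "UY-VGG16-Tiny": (24, 84.6),
    "UY-VGG16-Fast": (28, 89.7),
    "Yolov8-seg":    (43, 90.5),
    "CrackFormer":   (67, 82.4),
}

def dominates(a, b):
    """True if a is no worse than b on both metrics and strictly better on one."""
    (lat_a, ap_a), (lat_b, ap_b) = a, b
    return lat_a <= lat_b and ap_a >= ap_b and (lat_a < lat_b or ap_a > ap_b)

def pareto_front(models):
    """Sorted names of models that no other model dominates."""
    return sorted(
        name
        for name, metrics in models.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in models.items()
            if other_name != name
        )
    )

front = pareto_front(models)
print(front)
```

On these figures, a model stays on the front only if no other model is both at least as fast and at least as accurate, so the result depends solely on the two columns used and ignores FPS, memory footprint, and hardware.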

 

8. The first five lines of the conclusions are not needed and should be removed or reduced to a minimum.

Reply: Thank you for pointing out the issue. The first five lines of the conclusion section have been condensed accordingly, and the revised content is presented as follows:

To improve the accuracy of crack detection, this study proposed a crack detection method that combines an improved VGG16 with a convolutional neural network architecture.

 

9. More references should be added to cover the models and their applications plus their characteristics.

Reply: Thank you for the suggestion. Additional references have been incorporated, and the revised content is presented as follows:

For example, Kuchipudi et al. proposed an ultrasonic shear wave imaging-based method for corrosion damage detection and grading, using k-means clustering to classify corrosion severity based on image amplitude [2]. Crognale et al. compared various image processing-based damage identification techniques, including Otsu thresholding, Markov random field segmentation, RGB color detection, and k-means clustering algorithms [3].

2. Kuchipudi S T, Ghosh D, Ganguli A. Imaging-based detection and classification of corrosion damages in reinforced concrete using ultrasonic shear waves. J BUILD ENG, 2025, 105(23): 112490.

3. Crognale M, De Iuliis M, Rinaldi C, Gattulli V. Damage detection with image processing: a comparative study. EARTHQ ENG ENG VIB, 2023, 22(2): 333-345.

 

10. There are no tables in the whole manuscript although it is based in many aspects on numbers. Perhaps some figures can be represented as tables for accurate output results. A point to consider.

Reply: Thank you for the suggestion. The content of Figure 9 has been converted into a table named Table 1, and additional data have been included in Table 2. The revised content is presented as follows:

Table 1 Comparison of efficiency metrics across different models

Model           FPS   Processing time (ms)   AP (%)
UY-VGG16        38    36                     93.2
UY-VGG16-Tiny   47    24                     84.6
UY-VGG16-Fast   40    28                     89.7
Yolov8-seg      31    43                     90.5
CrackFormer     16    67                     82.4

 

Table 2 Crack width detection errors of different models under various conditions

Condition              UY-VGG16 (mm)   UY-VGG16-Tiny (mm)   UY-VGG16-Fast (mm)
Surface roughness
  Smooth               ±4.0            ±5.3                 ±4.6
  Moderate             ±4.9            ±6.7                 ±5.7
  Rough                ±5.6            ±8.3                 ±6.6
Interference
  Soil-covered         ±5.4            ±7.4                 ±6.1
  Partial occlusion    ±5.1            ±6.8                 ±5.9
  Water stain          ±4.8            ±6.6                 ±5.6

 

11. It is clearer to add the description of figure captions within the main figure caption and not separately. On the images, use (a) , (b), … only.

Reply: Thank you for pointing out the error. All figure captions and titles throughout the manuscript have been revised accordingly.

 

12. What is the “engineering standards for detection capability in complex environments”? Reference?

Reply: Thank you for the suggestion. The engineering standards have been explained, and relevant references have been provided. The revised content is presented as follows:

While digital image-based methods are widely used in industrial environments due to their ease of deployment and relatively low cost, they still struggle to meet the requirements of engineering standards such as ASTM C823-21, ASTM C1583, and RILEM TC187-SOC under challenging conditions involving lighting variation, surface contamination, and diverse crack scales [4–5].

4. Reyes E, Gálvez J C, Planas J. Final Report of RILEM Technical Committee TC 187-SOC: Experimental Determination of the Stress-Crack Opening Curve for Concrete in Tension. RILEM publications, 2007.

5. Brandtner-Hafner M H. Assessing the natural-healing behavior of adhesively bonded structures under dynamic loading. ENG STRUCT, 2019, 196(56): 109303.

 

13. What is the source of the crack images used to test this model and judge the performance comparison?

Reply: Thank you for your comment. The sources of the crack images used in the experiments have been indicated, and the revised content is presented as follows:

The study verified the effectiveness of the proposed UY-VGG16 in crack detection by selecting the SDNET2018 and DeepCrack concrete crack datasets for training and testing.

The study conducted crack sampling on steel-reinforced concrete structures from Project W, which had been completed over ten years ago.

 

14. Discuss the possible experimental limitations of this model in comparison to currently used techniques.

Reply: Thank you for the suggestion. The potential experimental limitations of the model have been discussed, and the revised content is presented as follows:

However, the model still faces a risk of misclassification when identifying fine cracks under low-contrast or complex background conditions. In real-world environments involving factors such as soil occlusion or water stain interference, the accuracy of crack boundary recognition tends to degrade, exhibiting more blurring and misidentification compared to results obtained under standard lighting and clear viewing angles. Future work should focus on expanding the diversity of training data and incorporating environment-aware modules or active learning mechanisms to enhance the model’s adaptability and scalability in real-world scenarios.

 

15. How is concrete corrosion crack detection done in industry and practice? Elaborate.

Reply: Thank you for the suggestion. Commonly used methods for detecting corrosion cracks in concrete have been elaborated, and the revised content is presented as follows:

In practical engineering, the detection of corrosion-induced cracks in reinforced concrete typically relies on traditional techniques such as ultrasonic pulse velocity, impact echo, digital image processing, and manual visual inspection [1].

 

16. The keywords need to be revised.

Reply: Thank you for pointing out the issue. The keywords have been revised accordingly, and the updated content is presented as follows:

Keywords: Steel-reinforced concrete; Corrosion crack detection; Target detection; Image segmentation; VGG16; Deep learning

 

17. The word “steel” appears only once in the whole manuscript and that was in the title. Rewrite the title to show the real research idea. Add “model” or something similar to the title.

Reply: Thank you for the suggestion. The title has been revised accordingly, and the updated version is presented as follows:

  1. Steel-Reinforced Concrete corrosion crack detection model based on improved VGG16

2.1. Image segmentation framework combining improved VGG16 and U-Net for steel surface analysis

2.2. Steel-Reinforced Crack detection model construction integrating image segmentation and target detection

 

18. Add an “abbreviations” section.

Reply: Thank you for the suggestion. A "Terms and Abbreviations" section has been added accordingly.

 

19. The “real-time crack detection” is not well-covered or presented. Was it one of the goals of this research?

Reply: Thank you for your comment. Further discussion on "real-time crack detection" has been added, and the revised content is presented as follows:

Industry standards likewise emphasize the importance of automation in increasing inspection frequency and data traceability, and deep learning–driven end-to-end models serve as the technological cornerstone for realizing this closed-loop automation. Against this backdrop, real-time crack detection is not only one of the core objectives of this study but also a fundamental capability for transforming structural monitoring systems from reactive response to proactive warning. The goal of real-time detection is to leverage the model’s fast inference capability to continuously track crack formation and respond instantly, thereby meeting the real-world demands for high-frequency, real-time monitoring on engineering sites.

On the one hand, the U-Net decoder performs fine-grained semantic segmentation of the crack regions; on the other hand, mid- and high-level features are passed into the YOLO object detection module for rapid position-level identification. This architecture enables parallel coordination between segmentation and detection, which significantly enhances response speed without imposing substantial computational overhead, thereby fulfilling the core requirement of real-time crack detection in structural health monitoring.

 

20. Where is the “automation” in this research? Mention as it is an important factor in this industry.

Reply: Thank you for the suggestion. An explanation of "automation" has been provided, and the revised content is presented as follows:

Eliminating reliance on manual inspection and annotation, and realizing full-process automation from crack data acquisition to segmentation and real-time alerting, has become a central goal for reducing inspection costs, avoiding high-altitude operation risks, and ensuring routine monitoring of large-scale structures [7]. Industry standards likewise emphasize the importance of automation in increasing inspection frequency and data traceability, and deep learning–driven end-to-end models serve as the technological cornerstone for realizing this closed-loop automation. Against this backdrop, real-time crack detection is not only one of the core objectives of this study but also a fundamental capability for transforming structural monitoring systems from reactive response to proactive warning. The goal of real-time detection is to leverage the model’s fast inference capability to continuously track crack formation and respond instantly, thereby meeting the real-world demands for high-frequency, real-time monitoring on engineering sites.

 

21. A suggestion is to merge the literature review section (related work) to the introduction. Also, reorganize the other sections to make them in a better flow order.

Reply: Thank you for the suggestion. The literature review has been integrated into the introduction section, and the revised content is presented as follows:

The widespread use of steel-reinforced concrete structures has increased the demands for structural health monitoring, making efficient and accurate corrosion crack detection a critical challenge. In practical engineering, the detection of corrosion-induced cracks in reinforced concrete typically relies on traditional techniques such as ultrasonic pulse velocity, impact echo, digital image processing, and manual visual inspection [1]. For example, Kuchipudi et al. proposed an ultrasonic shear wave imaging method for detecting and grading corrosion damage, applying k-means clustering to classify corrosion severity based on image amplitude [2]. Crognale et al. compared various image processing-based damage identification techniques, including Otsu thresholding, Markov random field segmentation, RGB color detection, and k-means clustering algorithms [3]. While digital image-based methods are widely used in industrial environments due to their ease of deployment and relatively low cost, they still fail to meet engineering standards such as ASTM C823-21, ASTM C1583, and RILEM TC187-SOC under challenging conditions like lighting variation, surface contamination, and diverse crack scales [4–5]. In recent years, deep learning techniques have rapidly advanced in the field of structural health monitoring, with convolutional neural networks, attention mechanisms, and generative adversarial networks achieving notable results in feature extraction, object localization, and anomaly identification [6]. Eliminating manual inspection and annotation, and achieving end-to-end automation from data acquisition to real-time alerting, have become a key goal for reducing costs, avoiding high-altitude risks, and ensuring routine monitoring of large-scale structures [7]. 
Industry standards likewise emphasize the importance of automation in increasing inspection frequency and data traceability, while deep learning–driven end-to-end models serve as the technological cornerstone for realizing this closed-loop automation. In this context, real-time crack detection is not only one of the core objectives of this study but also a key capability for transforming structural monitoring systems from reactive response to proactive warning. The goal of real-time detection is to leverage the model’s fast inference capability to continuously track crack formation and respond instantly, thereby meeting the real-world demands for high-frequency monitoring on engineering sites.

Among deep learning architectures, Visual Geometry Group Network 16 (VGG16) has been widely adopted as a backbone for feature extraction due to its strong representation capacity. For instance, Rehman et al. proposed a hybrid model combining sequential VGG16 and convolutional neural networks for diagnosing knee osteoarthritis, achieving over 93% accuracy on training, validation, and testing datasets [8]. Guo et al. addressed the challenge of low river extraction accuracy in remote sensing images by using VGG16 and ResNet-50 as feature extractors in a dual-branch fusion model comprising scale-level and semantic-level outputs [9]. However, existing models still struggle with multi-scale crack detection, real-time inference, and preserving edge details, prompting further exploration of technical enhancements. In urban infrastructure applications, Koh et al. developed an automated sidewalk crack detection framework, demonstrating robustness on a dataset of 8,000 real-world images [10]. Luo et al. proposed a method that combines adaptive Canny edge detection and semantic segmentation, achieving an improvement of more than 6.5% in mean Intersection over Union (mIoU) over single-algorithm baselines on the CRACK500 dataset [11]. To address real-time requirements, Mishra’s weakly supervised method and Li’s lightweight embedded U-Net showed promising results in monitoring concrete bridge and pavement cracks, with the latter achieving an mIoU of 79.38% [12–13]. Existing research shows that VGG16 offers a good balance between representational power and scalability, whereas VGG19, with its additional convolutional layers, incurs significantly higher computational overhead and converges more slowly in industrial deployment. 
U-Net, on the other hand, excels in enhancing pixel-level segmentation of crack regions [14], while the YOLO (You Only Look Once) framework achieves real-time object localization through a single forward pass, thus offering advantages in detection speed and robustness [15].

Therefore, this study proposes an integrated model, UY-VGG16, that combines improved VGG16, U-Net, and YOLO architectures. The model is designed to achieve precise segmentation and real-time localization of corrosion cracks, aiming to overcome the limitations of existing detection technologies in complex environments. Through multi-scale feature fusion and enhanced real-time capabilities, UY-VGG16 provides an accurate and efficient solution for structural health monitoring that draws on several complementary techniques.

 

Author Response File: Author Response.pdf
