Review Reports - Surface Damage Detection and Analysis for Reduction-Fired Cyan Square Bricks in Jiangnan Gardens via YOLOv12

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents surface damage detection and analysis for reduction-fired cyan square bricks in Jiangnan Gardens Heritage. An intelligent detection method based on advanced computer vision, utilizing the YOLOv12 object detection model, was developed to achieve non-contact, automated identification of typical tile surface damage types. The paper is well-written and structured. The reviewer recommends the following minor revisions to improve the paper’s quality.

The definition or description of model C should be added to the Abstract.
The Chinese characters may be deleted in the manuscript.
What was learned, and what is the purpose of this study? How does it differ from previous research? Please summarize these points in the final paragraph of the Introduction.
Line 353 seems unrelated to the next sentence (Line 356).

Author Response

The definition or description of model C should be added to the Abstract.

Response: We thank the reviewer for this suggestion. The description of Model C has been added to the Abstract as requested.

The Chinese characters may be deleted in the manuscript.

Response: Thank you for pointing this out. We have deleted the Chinese characters in the manuscript and have carefully checked the entire document to ensure no similar issues remain.

What was learned, and what is the purpose of this study? How does it differ from previous research? Please summarize these points in the final paragraph of the Introduction.

Response: We thank the reviewer for this valuable suggestion. We have revised the final paragraph of the Introduction section to clearly summarize the purpose, learnings, and novelty of this study. The updated paragraph now reads:

"In contrast to prior research on brick heritage, which has predominantly focused on chemical composition and microscopic properties, studies specifically addressing the surface damage of reduction-fired cyan square bricks remain scarce. The integration of intelligent detection technologies in this domain is even more limited. This study pioneers the application of an advanced object detection model, YOLOv12, for the non-contact and highly efficient identification of surface damage on these bricks. The primary purpose is to develop an automated approach that transcends the limitations of traditional manual inspection, which is often subjective and time-consuming. This research not only demonstrates the feasibility of using deep learning for heritage material diagnostics but also provides a robust and scalable framework for the condition assessment of garden heritage, thereby offering a significant departure from and contribution to the existing body of work."

Line 353 seems unrelated to the next sentence (Line 356).

Response: We appreciate the reviewer's keen observation regarding the lack of coherence between Line 353 and Line 356. We have revised the manuscript to address this issue by inserting a bridging sentence to clarify the logical progression. The text now reads (after Line 353):

"The following are the training settings for the three key components in the model structure."

This addition seamlessly connects the preceding general description to the subsequent specific training settings, significantly improving the readability and flow of the paragraph.

Reviewer 2 Report

Comments and Suggestions for Authors

The article presents original research at the intersection of machine learning and cultural heritage preservation. The authors collected a decent amount of training data and developed a machine learning model that can detect paving defects such as cracks and wear with reasonable accuracy.

The following questions and comments concern the article:

Lines 402-408: How were the model parameters selected during training? Did the authors try to vary the parameters to get a better result?
Figure 9. Metrics are shown starting from epoch 50. The training process up to epoch 50 should be shown, otherwise it is not clear that errors in the training process are actually decreasing.
To be able to reproduce the results, it is recommended to place the training dataset in a publicly accessible repository and provide a link to it in the article.
Figure 7: "Smatr device" should be corrected to "Smart device"
Line 231: Duplication "authenticity and authenticity" should be removed.

Author Response

The following questions and comments concern the article:

Lines 402-408: How were the model parameters selected during training? Did the authors try to vary the parameters to get a better result?

Response: Thank you for the suggestion. We will add the following content:

The model training parameters for this study are shown in Table 1. Training started from the pre-trained YOLOv12x weights with CUDA enabled, using a 512×512 input size and a batch size of 16 for 500 epochs. We used the SGD optimizer with a cosine annealing learning rate schedule, with momentum = 0.937 and weight_decay = 5e-4 to stabilize convergence and suppress overfitting, and warmup_epochs = 5 to mitigate initial gradient oscillations. AMP mixed precision and workers = 16 were enabled to improve data pipeline and computational efficiency. Validation was enabled throughout training to monitor mAP50–95, Precision, and Recall. For parameter exploration, we tried AdamW with a stepwise learning rate (step LR), as well as combinations of lr0 ∈ {5e-3, 2e-2}, momentum ∈ {0.90, 0.98}, weight_decay ∈ {1e-4, 1e-3}, warmup ∈ {0, 3}, imgsz ∈ {416, 640}, and batch ∈ {8, 24}. The current configuration performed most robustly on the validation set: SGD with cosine annealing exhibited less overfitting than AdamW. Although imgsz = 640 slightly helped with very small targets, it was computationally expensive and made convergence more volatile. Batch sizes greater than 16 were constrained by GPU memory, while batch = 8 resulted in unstable statistics and slightly reduced accuracy. In summary, the hyperparameters used in this paper represent a compromise between efficiency, stability, and accuracy.
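The settings described above can be written down as a small, framework-independent sketch. The dictionary keys reuse the identifiers from the response. The function names `candidate_configs` and `lr_at_epoch` are hypothetical, and so are the base learning rate (`lr0 = 0.01`) and the final learning rate fraction, because the response does not state them. Enumerating the full grid is also an assumption, since the response does not say whether every combination was run. The schedule is a generic linear warmup followed by cosine annealing, not necessarily the exact schedule of the training framework the authors used.

```python
import math
from itertools import product

# Baseline settings as reported in the response.
BASELINE = {
    "imgsz": 512,
    "batch": 16,
    "epochs": 500,
    "optimizer": "SGD",
    "lr_schedule": "cosine",
    "momentum": 0.937,
    "weight_decay": 5e-4,
    "warmup_epochs": 5,
    "amp": True,
    "workers": 16,
}

# Alternative values the response says were explored.
# (AdamW with step LR was also tried; it is left out of this grid.)
SEARCH_SPACE = {
    "lr0": [5e-3, 2e-2],
    "momentum": [0.90, 0.98],
    "weight_decay": [1e-4, 1e-3],
    "warmup_epochs": [0, 3],
    "imgsz": [416, 640],
    "batch": [8, 24],
}


def candidate_configs(baseline, space):
    """Yield one config per combination of explored values (full grid is an assumption)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield {**baseline, **dict(zip(keys, values))}


def lr_at_epoch(epoch, lr0=0.01, final_fraction=0.01, warmup_epochs=5, epochs=500):
    """Linear warmup, then cosine annealing from lr0 down to lr0 * final_fraction.

    lr0 and final_fraction are assumed values, not taken from the manuscript.
    """
    if epoch < warmup_epochs:
        # Ramp up linearly so the last warmup epoch reaches lr0.
        return lr0 * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs - 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr0 * (final_fraction + (1.0 - final_fraction) * cosine)


if __name__ == "__main__":
    configs = list(candidate_configs(BASELINE, SEARCH_SPACE))
    print(f"{len(configs)} candidate configurations")
    for epoch in (0, 4, 5, 250, 499):
        print(epoch, lr_at_epoch(epoch))
```

Six hyperparameters with two values each give 2^6 = 64 combinations, which illustrates why a compromise configuration is usually chosen from a limited search rather than an exhaustive one.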

Figure 9. Metrics are shown starting from epoch 50. The training process up to epoch 50 should be shown, otherwise it is not clear that errors in the training process are actually decreasing.

Response: Thank you for your comment. We will redraw Figure 9 to include the training process data from before epoch 50.

To be able to reproduce the results, it is recommended to place the training dataset in a publicly accessible repository and provide a link to it in the article.

Response: Thank you very much for your suggestion. We have included the data availability statement at the end of this article.

Figure 7: "Smatr device" should be corrected to "Smart device"

Response: We thank the reviewer for pointing out these typographical errors. We have corrected "Smatr device" to "Smart device" in both Figure 7 and Figure 16, as well as in their accompanying captions. We sincerely apologize for these oversights. Additionally, we have conducted a thorough spell-check of the entire manuscript, including all figures and captions, to prevent similar issues.

Line 231: Duplication "authenticity and authenticity" should be removed.

Response: We sincerely thank the reviewer for their meticulous reading and for identifying this duplication error. The repeated word "authenticity" on Line 231 was indeed a typographical oversight. We have corrected it to "completeness" as intended.

The sentence now reads: "For historical buildings, restoration efforts must preserve the original features to maximize authenticity and completeness while also ensuring the continued function of the building."

We have also performed a careful review of the manuscript to ensure no similar errors are present.

Reviewer 3 Report

Comments and Suggestions for Authors

Abstract: You don't need to write "500 epochs" in the abstract. You already wrote it in the Results section.
Line 38: The disadvantages of traditional methods are mentioned in the abstract. However, this should be highlighted in the Introduction section. Therefore, the literature on traditional methods should also be included and their shortcomings should be mentioned. I also believe the Introduction section is insufficient for this study. More studies in this field should be added. Furthermore, contributions to the literature and the article's plan should be added at the end of the Introduction section.
Figure 7: In the 6th Model Application view, it is written as "Smatr Device." It should be changed to "Smart Device."
Line 334: The title is at the bottom of the page. Please move it to the next page.
Figure 10: It's really hard to understand what Molecules A, B, C, and D are. You could provide this in a table with explanations. I also think four digits after the decimal point are sufficient for the metric values.
Figure 12: The text on the Confusion matrix is too small. Please make it more readable.
Line 529: Move the figure legend to the previous page.
Line 532: For example, the precision value is given to three decimal places. However, Figure 10 has many decimal places.
Lines 541-542: "Reduction fired cyan square bricks (RBCs)." Capitalize the first letters.
Figure 16: "Smatr" was written, please correct it.
Line 708: I believe the number of references is low. This is due to the limited literature research.

Author Response

Abstract: You don't need to write "500 epochs" in the abstract. You already wrote it in the Results section.

Response: We thank the reviewer for this constructive suggestion. We have removed the mention of "500 epochs" from the abstract to keep it concise and avoid unnecessary duplication, as this detail is thoroughly presented in the Results section.

Line 38: The disadvantages of traditional methods are mentioned in the abstract. However, this should be highlighted in the Introduction section. Therefore, the literature on traditional methods should also be included and their shortcomings should be mentioned. I also believe the Introduction section is insufficient for this study. More studies in this field should be added. Furthermore, contributions to the literature and the article's plan should be added at the end of the Introduction section.

Response: We thank the reviewer for these constructive suggestions. We have revised the Introduction section by adding relevant literature on traditional methods and further elaborated on their limitations. Additionally, we have expanded the discussion of previous studies in this field and explicitly stated the research contributions and article structure at the end of the Introduction section. These modifications have improved the completeness and clarity of our introduction.

Figure 7: In the 6th Model Application view, it is written as "Smatr Device." It should be changed to "Smart Device."

Response: We are grateful to the reviewer for catching this typo. The misspelling of "Smart device" in Figure 7 (and also in Figure 16) has been corrected in the figure and its caption. We apologize for this error and have performed a full spelling check across the entire manuscript to ensure consistency.

Line 334: The title is at the bottom of the page. Please move it to the next page.

Response: We thank the reviewer for this suggestion. The title on Line 334 has been moved to the next page to improve readability. We have also reviewed the overall formatting to ensure proper layout.

Figure 10: It's really hard to understand what Molecules A, B, C, and D are. You could provide this in a table with explanations. I also think four digits after the decimal point are sufficient for the metric values.

Response: We thank the reviewer for these insightful suggestions. In response to Comment 5, we have added a new table (Table 1) to clearly explain the definitions and applications of Molecules A, B, C, and D. Regarding the decimal precision, and in consideration of Comment 8, we have unified all metric values throughout the manuscript to three decimal places, which we believe provides sufficient precision while maintaining consistency across all figures and results.

Figure 12: The text on the Confusion matrix is too small. Please make it more readable.

Response: Thank you for the suggestion. We have enlarged the text in Figure 12 to improve readability.

Line 529: Move the figure legend to the previous page.

Response: The figure legend on Line 529 has been moved to the previous page to ensure proper placement with its corresponding figure. We have also reviewed the overall document formatting to maintain consistency in figure presentation.

Line 532: For example, the precision value is given to three decimal places. However, Figure 10 has many decimal places.

Response: We thank the reviewer for this observation. In accordance with your previous comment (Comment 5), we have now unified the numerical format throughout the document by retaining three decimal places for all metric values, including those in Figure 10. This adjustment ensures consistency in data presentation across the manuscript. We have also verified all other numerical values to maintain this standard format.
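One way to apply such a uniform format is sketched below. The helper `format_metric` is a hypothetical name, and the values are made-up placeholders for illustration, not results from the manuscript.

```python
def format_metric(value: float, decimals: int = 3) -> str:
    """Render a metric with a fixed number of decimal places."""
    return f"{value:.{decimals}f}"


# Placeholder values for illustration only; not taken from the manuscript.
raw_metrics = {"precision": 0.87654321, "recall": 0.9, "mAP50": 0.812349}

# Apply the same three-decimal format to every metric.
formatted = {name: format_metric(value) for name, value in raw_metrics.items()}

if __name__ == "__main__":
    for name, text in formatted.items():
        print(f"{name}: {text}")
```

Routing every figure label and table cell through a single helper like this keeps the precision identical everywhere, including trailing zeros.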

Lines 541-542: "Reduction fired cyan square bricks (RBCs)." Capitalize the first letters.

Response: The term "Reduction-fired cyan square bricks (RBCs)" has been revised to "Reduction-fired Cyan Square Bricks (RCSB)". We have also verified all similar terms to ensure consistent formatting.

Figure 16: "Smatr" was written, please correct it.

Response: The misspelling of "Smart device" in Figure 16 (and also in Figure 7) has been corrected in the figure and its caption.

Line 708: I believe the number of references is low. This is due to the limited literature research.

Response: We appreciate the reviewer's helpful feedback regarding the references. In response, we have expanded the literature review by adding relevant citations to better situate our study within the existing research landscape. These additions have been incorporated throughout the manuscript where appropriate.