A Multi-Deep Learning Intelligent Surface Rock Crack Identification Method for Transmission Tower Siting
Round 1
Reviewer 1 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
The revised manuscript can be published.
Reviewer 2 Report (Previous Reviewer 4)
Comments and Suggestions for Authors
The manuscript has been appropriately updated in accordance with the peer review.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The research proposes an end-to-end surface crack identification method based on Convolutional Neural Networks (CNNs), aiming to address the low accuracy and poor generalization ability of traditional crack detection methods and to provide technical support for the siting of transmission towers. However, the following problems remain in the manuscript:
1. Although Chapter 2 of the paper mentions the division and processing of the dataset, it lacks a detailed description of the data source, collection scope, and environmental diversity. It is recommended to supplement this information, such as the geographical locations of data collection and the differences in geological conditions, so that the representativeness of the data for different scenarios can be evaluated.
2. The network model in Figure 3 lacks an explanation of its internal working mechanism and decision-making process, so it is difficult to identify the innovations of the authors' method. Please use mathematical formulas and visual logical relationships to explain the network model diagram, so that readers can clearly understand the innovations of this paper.
3. Continuing from the previous point, the authors only mention setting the training parameters but do not elaborate on the hyperparameter tuning process, such as how the learning rate, number of iterations, and batch size were determined. It is recommended that the authors supplement this information.
4. When comparing with other methods, only the accuracy of the different models is listed. It is suggested to add further performance indicators, such as recall, F1 score, and mean average precision (mAP), to evaluate the model performance comprehensively from multiple dimensions.
5. The experiments are mainly based on one specific surface crack dataset, and the verification of the model's generalization ability in different scenarios (such as transmission towers in different regions and different types of geological disaster scenarios) is insufficient. It is recommended to add tests on other relevant datasets or in actual different scenarios to verify the generalization performance of the model more fully and to improve the practicality of the research results.
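The additional indicators requested in comment 4 can be derived from the same predictions that produce the accuracy figure. The following is a minimal Python sketch, assuming a binary crack / no-crack labelling; the function name `binary_metrics` and the sample labels are hypothetical and are not taken from the manuscript. mAP is left out because it requires ranked detection outputs with confidence scores, which this sketch does not assume.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = crack, 0 = no crack.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting precision and recall next to accuracy shows whether errors are mostly missed cracks or false alarms, which a single accuracy value cannot reveal.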
Reviewer 2 Report
Comments and Suggestions for Authors
The authenticity of the research reported in this paper is questionable. It can almost be concluded that this is a pieced-together paper and that the authors have not actually conducted any research, because the details of the article cannot withstand scrutiny. The reviewer lists five violations of common sense and three missing details to support this view. It should be pointed out, however, that many other parts of the text also contradict common sense and lack detail.
Contrary to common sense 1: For image detection or recognition tasks, it is best to extract features directly, but the authors preprocess the images first (lines 131-144). Image enhancement, median filtering, and similar operations are carried out for better observation by the human eye. For feature extraction, however, they are a very poor step. These operations will completely destroy the original features, resulting in a decrease in recognition rate.
Contrary to common sense 2: The author presented the images used in their experiment in Figure 5. This is the most distinctive part of this article. The image used is actually for detecting cracks on asphalt pavement! This is completely inconsistent with the task of this article. If the author has truly conducted research, then he should look for images related to the task for experimentation. Does the transmission tower shown in Figure 1 have to stand on the asphalt road in the center of the city?
Contrary to common sense 3: Figure 1b is a schematic diagram of the collapse of a transmission tower due to surface cracks. However, from a common sense perspective, the collapse of buildings should be due to the settlement of deep soil layers. Can't a solid foundation surface have cracks? Does a surface without cracks mean that buildings can be built? May I ask if there are cracks in the sand in the desert?
Please note that the literature on crack detection methods cited by the author is reasonable. For example, in references [17-19], some studies identify wall cracks, while others identify cracks in hardened road surfaces. For the task of selecting the location of transmission towers, the cracks on the ground have little to do with the location of the towers, but should be related to the deep structure beneath the soil layer.
Contrary to common sense 4: Figure 4 is the structural diagram of the neural network proposed by the authors. It shows that the fully connected layer at the end of the network has two outputs, which means that the task is treated as a binary classification problem. However, this is actually an object detection problem, and the network should output the bounding box of the crack. If the authors lack even this simple technical knowledge, did they really conduct the research?
Contrary to common sense 5: The authors use the YOLO algorithm for feature box annotation in line 146. The tool for annotating feature boxes should be labelme. As an object detection network, YOLO itself needs to be trained on annotated images. In addition, YOLO, as one of the most popular object detection networks, can already accomplish the task of this article on its own. If YOLO were used to complete the crack detection task, its detection accuracy would definitely be much higher than 93%.
Detail Missing 1: YOLO has by now gone through 10 versions. Which version did the authors use?
Detail Missing 2: In Section 2.1, the author introduces image acquisition and filtering. The author used a lot of text to point out what to pay attention to when collecting and filtering images. This tone seems like a big shot is giving a speech. But the author doesn't mention how many images they collected in total or the size of each image. The author said that YOLO and other methods were used for annotation, so what exactly is the method?
Detail Missing 3: Since this article uses deep neural networks, the relevant parameters should be explained in the experimental section. Which deep learning framework was used, and in which version? What is the CUDA version? What batch size was set? What is the GPU model? These parameters are mandatory for all deep learning papers, so that readers can reproduce the results.
In summary, why does a paper have so many violations of common sense and so many missing details? This is not a matter of writing skill; a reasonable explanation is that the authors have not conducted any research.
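The reproducibility details asked for above can be collected automatically and pasted into the experimental section. The following is a minimal Python sketch of that idea. It assumes PyTorch purely for illustration, since the manuscript's framework is not stated here, and the function name `describe_environment` and the batch size value are hypothetical.

```python
import platform


def describe_environment(batch_size=None):
    """Collect the settings a reader needs to reproduce a training run."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "batch_size": batch_size,
        "framework": None,
        "cuda": None,
        "gpu": None,
    }
    try:
        # Assumption: PyTorch is the framework; swap in the one actually used.
        import torch
    except Exception:
        return info  # framework not installed; report the rest

    info["framework"] = "PyTorch " + torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


# Hypothetical batch size, for illustration only.
for key, value in describe_environment(batch_size=32).items():
    print(f"{key}: {value}")
```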
In addition:
This paper is written too roughly.
In lines 116 to 125, the authors wrote text along these lines: what should be written in the introduction and abstract sections, and the format and text of the references.
The experimental data in this paper is unreliable.
The authors compare their method with SVM, random forest, and decision tree. But how can a paper from 2025 be compared only with technologies from 20 years ago?
This paper lacks any innovation. The third part of the paper should provide a detailed introduction to the methods proposed in this paper. But the author introduces some content that can be seen on the Internet.
Reviewer 3 Report
Comments and Suggestions for Authors
The paper proposes a CNN-based method for surface crack identification, addressing the limitations of traditional methods in terms of accuracy, efficiency, and generalization ability. The end-to-end framework integrating image acquisition, data processing, model training, and crack identification is innovative, especially the multi-scale feature fusion CNN architecture and the use of adaptive image enhancement techniques.
While the experiments are well-designed, a discussion on the dataset's diversity and how the method might perform in different real-world scenarios would be beneficial. For example, how well the method generalizes in diverse environments could be further explored.
The paper could further discuss some technical details, such as the choice of loss function, regularization techniques, and their theoretical implications, to enhance the depth of the study.
Reviewer 4 Report
Comments and Suggestions for Authors
This manuscript reports the application of a CNN to the problem of identifying surface cracks. The topic is interesting, and the results are compared with other algorithms to show the superiority of the suggested CNN approach. However, the reviewer has some questions about the identification problem and the novelty of this manuscript. Please consider the following comments.
1. Unnatural hyphenations:
There are many unnatural hyphenations. Please check them and correct them, if necessary, at lines 52, 65, 67, 70, 73, 74, 96, 98, 99, 102, 104, 105, 107, 113, 136, and 349.
2. Definitions of crack patterns:
Figure 1 is meant to show the cracks dealt with in this paper, but it shows only an example of a simple collapse caused by cracks, not images of the cracks themselves. Figures 5(c) and 5(d) show images of cracks as examples, but all types of cracks should be defined at the beginning of the manuscript. The number of crack types is important, and the manuscript should also discuss which types of cracks were not detected, because the accuracy is still only 93.23%. Moreover, a confusion matrix should be shown for this type of appearance examination.
3. Typographical errors:
The sentences from line 116 to line 124 should be deleted.
line 244: "theinitial" -> "the initial"
line 302: "layersof" -> "layers of "
line 338: "used. Learning Rate scheduler.", its meaning is not clear
line 372: "Bayes)", its meaning is not known
4. lines 253 and 258:
The descriptions "about 4900 images" and "about 2,600 images" are not precise. Why are the numbers of images given only approximately? Please explain the reason or give the exact numbers.
5. related to Figure 4:
Please describe the computing environment, such as the CPU time used, the main memory size, and whether a GPU was used. Moreover, which version of YOLO was used? YOLO's functions and performance depend strongly on its version.
6. Figure 5(c) and 5(d):
Are these images the crack detection results generated by YOLO? Usually, the output images of YOLO carry a label and a confidence value on each identification window.
7. Originality:
This is the largest problem with this manuscript and should be improved. The reviewer does not recognize its originality, although the identification problem is interesting. First, the accuracy is better than that of the other types of algorithms, but an accuracy of 93.23% is not very high. Second, the CNN architecture is not new, and the other processes, such as denoising, image normalization, data augmentation, dropout, and the activation function, are quite standard. Third, the cases in which identification fails should be analyzed and discussed. Please describe the authors' original algorithms or modifications to the CNN.
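The confusion matrix requested in comment 2 also supports the failure analysis requested here, because it shows which crack types are confused with which. The following is a minimal Python sketch, assuming per-image class labels; the class names, the sample labels, and the helper names `confusion_matrix` and `per_class_recall` are hypothetical, since the manuscript's crack types are not defined.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_recall(matrix):
    """Fraction of each true class that was identified correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical class names and labels, for illustration only.
labels = ["none", "transverse", "longitudinal"]
y_true = ["none", "none", "transverse", "transverse",
          "longitudinal", "longitudinal"]
y_pred = ["none", "transverse", "transverse", "transverse",
          "none", "longitudinal"]

matrix = confusion_matrix(y_true, y_pred, labels)
for name, row in zip(labels, matrix):
    print(f"{name:>12}: {row}")
print("recall per class:", per_class_recall(matrix))
```

Off-diagonal entries point directly at the crack types that are missed or mistaken for one another, which is the information needed to discuss why the accuracy stops at the reported value.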