Review Reports - Lithology Identification Based on Improved Faster R-CNN

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

To improve the quality of the paper, I have provided a few comments below.

1. Please include a paragraph that outlines the structure of the paper (for example, Section 2 presents the model structure of Faster R-CNN, Section 3 ..., etc.). This gives readers a clear overview and makes the paper easier to follow.

2. Include a short summary table of previous studies related to "Lithology Identification Based on Improved Faster R-CNN". The table should consist of reference, dataset type, techniques, etc.

3. Please increase the font size in Figure 1; the text is not clearly visible.

4. Include the limitations and future work of the study.

5. Add a discussion part to Section 5, and improve the "Experimental Results and Analysis" part.

6. Improve the conclusion and include some results in detail.

7. Please check the final declarations section; the funding part in particular is not correct.

Author Response

Comments 1:[Please include a paragraph that outlines the structure of the paper (for example, Section 2 presents the model structure of Faster R-CNN, Section 3 ..., etc.). This gives readers a clear overview and makes the paper easier to follow.]

Response 1: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[Changes have been made in the introduction section.]]

Comments 2:[Include a short summary table of previous studies related to "Lithology Identification Based on Improved Faster R-CNN". The table should consist of reference, dataset type, techniques, etc.]

Response 2: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[A summary table has been added in the introduction section.]]

Comments 3:[Please increase the font size in Figure 1; the text is not clearly visible.]

Response 3: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The font size of Figure 1 has been adjusted.]]

Comments 4:[Include the limitations and future work of the study.]

Response 4: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The conclusion section has been revised.]]

Comments 5:[Add a discussion part to Section 5, and improve the "Experimental Results and Analysis" part.]

Response 5: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The discussion from Chapter 4 has been moved to Chapter 5.]]

Comments 6:[Improve the conclusion and include some results in detail.]

Response 6: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[Some data has been added to the conclusion section.]]

Comments 7:[Please check the final declarations section; the funding part in particular is not correct.]

Response 7: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[Checked]]

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The article presents an innovative approach to ore identification in the mining industry by proposing an improved Faster R-CNN model. This model introduces several key enhancements to address the limitations of traditional ore identification methods and the challenges faced by existing deep learning-based approaches. The innovations include the use of Res2Net-50 as the backbone network; an improved Feature Pyramid Network (FPN); ROI Align for spatial alignment; an Efficient Channel Attention (ECA) module; and Soft-NMS for enhanced detection robustness. The article's strength lies in its clear articulation of the problem and its systematic exploration of potential solutions. By focusing on enhancing the Faster R-CNN model, the study provides a targeted and effective approach to improving ore identification accuracy and efficiency. The experimental results corroborate the model's effectiveness, showing significant improvements in detection performance, which is a testament to the quality of the research. The article is therefore close to being acceptable. However, to further improve its quality and readability, it is recommended that the authors consider the following issues and make corresponding modifications:

1. The abstract states that the proposed method is not only highly accurate but also highly efficient. However, the article does not appear to compare the computational efficiency of the various methods. It is recommended to add such a comparison.

2. Although the article includes many comparative experiments, they are all conducted in simulated mine environments, and the actual environment in mines is much worse. Could actual rock photos taken in mines be used directly for the identification experiments? This would be more convincing.

3. Why does the loss curve in Figure 13 fluctuate so dramatically? It is recommended that the authors analyze the reasons.

Author Response

Comments 1:[The abstract states that the proposed method is not only highly accurate but also highly efficient. However, the article does not appear to compare the computational efficiency of the various methods. It is recommended to add such a comparison.]

Response 1: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[Modifications have been made in Section 5.2.4.]]

Comments 2:[Although the article includes many comparative experiments, they are all conducted in simulated mine environments, and the actual environment in mines is much worse. Could actual rock photos taken in mines be used directly for the identification experiments? This would be more convincing.]

Response 2: [Thank you for your suggestion. Based on your proposal, I have provided an explanation: [Your suggestion is excellent; images from real mining environments would indeed provide clearer insights. We had considered this issue as well, but obtaining such images from mining companies is challenging due to restricted access and required permissions to enter the sites. Additionally, there are very few publicly available images of real mining environments on the internet that meet the requirements for recognition, making it difficult to provide authentic data as support.]]

Comments 3:[Why does the loss curve in Figure 13 fluctuate so dramatically? It is recommended that the authors analyze the reasons.]

Response 3: [Thank you for your suggestion. Based on your proposal, I have provided an explanation: [This is due to our limited hardware conditions; currently, we are using an NVIDIA GeForce RTX 4090 graphics card with only 24GB of VRAM and just a single card. As a result, we adopted a small batch training approach. Small batch training data tends to have high variability, which caused significant fluctuations in the loss curve. However, despite these fluctuations, the loss values remained within a reasonable range and showed an overall downward trend.]]
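The link between batch size and loss fluctuation described in this response can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' training code: the per-sample losses are synthetic Gaussian values, and the function name `batch_mean_spread` and the batch sizes 2 and 32 are chosen purely for illustration.

```python
import random
import statistics


def batch_mean_spread(batch_size, num_batches=2000, seed=0):
    """Standard deviation of mini-batch mean losses over many synthetic batches."""
    rng = random.Random(seed)
    batch_means = []
    for _ in range(num_batches):
        # Hypothetical per-sample losses: Gaussian noise around 1.0 (assumption).
        batch = [rng.gauss(1.0, 0.5) for _ in range(batch_size)]
        batch_means.append(statistics.fmean(batch))
    return statistics.stdev(batch_means)


# A smaller batch gives a noisier estimate of the mean loss per iteration.
spread_small = batch_mean_spread(batch_size=2)
spread_large = batch_mean_spread(batch_size=32)
print(f"spread with batch size 2:  {spread_small:.3f}")
print(f"spread with batch size 32: {spread_large:.3f}")
```

For independent samples, the spread of the batch mean shrinks roughly with the square root of the batch size, which is consistent with a visibly noisier curve under small-batch training.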

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Lithology identification is crucial in the mining industry to ensure the safety of operations and improve production efficiency. Traditional methods such as visual inspection, physical testing, and chemical analysis have limitations, including operational complexity and high costs. The authors suggest that modern identification methods, particularly those combining computer vision with deep learning, can significantly enhance the accuracy and efficiency of lithological identification.

The paper introduces an improved Faster R-CNN model with several modifications, including replacing the ResNet backbone with Res2Net-50, incorporating an enhanced Feature Pyramid Network (FPN), and using ROI Align instead of ROI Pooling. These changes are aimed at improving feature extraction, spatial alignment, and robustness in complex mining environments.

The use of Res2Net-50 and an improved FPN represents a significant advancement in feature extraction, particularly in handling multi-scale features. This is crucial for accurately identifying rocks in complex and harsh mining conditions.

The paper mentions that open-source datasets were insufficient for the study’s purposes, leading to the creation of a new dataset. While necessary, the size of the dataset (840 images) could still be considered small for deep learning applications, which may limit the model’s generalization capabilities. While the paper thoroughly evaluates the model in simulated conditions, it would benefit from a more detailed discussion on the deployment of this model in actual mining operations, including potential challenges and solutions for real-time processing and integration with existing systems.

The paper is well-structured, with clear explanations of the improvements made and their impact on model performance. It provides a solid foundation for future research in this area and could significantly influence the development of intelligent ore identification systems.

The main shortcomings:

Figure 2. Principle diagram of RPN anchor generation. – Why are 2k coordinates passed from the intermediate layer to the reg layer? According to the original paper: "So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate probability of object or not object for each proposal." ("For simplicity we implement the cls layer as a two-class softmax layer. Alternatively, one may use logistic regression to produce k scores.") [Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137-1149.]

Line 287 - “and L_box represents the regression loss of the detection network.” - There is no L_box in Equation (1).

The original Faster R-CNN paper is not cited. References to the original papers are also missing for most of the methods used in the work: Faster R-CNN, Res2Net-50, FPN, ECA, ROI Align, Soft-NMS.

Loss function – Equations 2-5 seem to differ slightly from the original Faster R-CNN [Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137-1149.]; the loss function is partially taken from [Liu, Xiaobo, et al. "Research on intelligent identification of rock types based on faster R-CNN method." IEEE Access 8 (2020): 21804-21812]. It is worth adding a discussion of the applicability of such a loss function.

Equation 3 – p^*_i seems to be redundant (in any case, it can be taken outside the sum sign).

Figure 14 - The marble images in the third and fourth columns of the third row, which show the detection outcomes of the improved Faster R-CNN model, do not seem to be images under simulated mining noise conditions such as low lighting and lens smudges.

Line 552 and below (Figures 12-13) - How are iterations related to epochs?

Table 4 - "FLOPs" is commonly understood as floating-point operations per second (this is usually a measure of computing performance, not of computational complexity). The column names Parm/M and FLOPs/G are ambiguous (the relation to Mega and Giga is not obvious).

There is no computational complexity comparison between Faster R-CNN and the final improved Faster R-CNN (with all improvements).

There is often no space after the period (full stop) at the end of a sentence.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

There is often no space after the period (full stop) at the end of a sentence.

Author Response

Comments 1:[The paper mentions that open-source datasets were insufficient for the study’s purposes, leading to the creation of a new dataset. While necessary, the size of the dataset (840 images) could still be considered small for deep learning applications, which may limit the model’s generalization capabilities. While the paper thoroughly evaluates the model in simulated conditions, it would benefit from a more detailed discussion on the deployment of this model in actual mining operations, including potential challenges and solutions for real-time processing and integration with existing systems.]

Response 1: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The changes are in the conclusion chapter.]]

Comments 2:[Figure 2. Principle diagram of RPN anchor generation. – Why are 2k coordinates passed from the intermediate layer to the reg layer? According to the original paper: "So the reg layer has 4k outputs encoding the coordinates of k boxes, and the cls layer outputs 2k scores that estimate probability of object or not object for each proposal." ("For simplicity we implement the cls layer as a two-class softmax layer. Alternatively, one may use logistic regression to produce k scores.") [Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137-1149.]]

Response 2: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The principle diagram of RPN anchor generation has been modified. ]]
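The output sizes quoted from Ren et al. in the comment above can be written out as a short calculation. This is only an illustrative sketch, not code from the manuscript: the function name `rpn_head_output_channels` is hypothetical, and k = 9 is the default in the original Faster R-CNN paper (3 scales × 3 aspect ratios), not necessarily the value used by the authors.

```python
def rpn_head_output_channels(num_anchors):
    """Per-location output sizes of the two sibling RPN layers for k anchors.

    cls layer: 2 scores per anchor (object / not object, two-class softmax).
    reg layer: 4 box coordinates per anchor.
    """
    return {"cls": 2 * num_anchors, "reg": 4 * num_anchors}


# k = 9 follows the original paper's default; it is an assumption here.
outputs = rpn_head_output_channels(num_anchors=9)
print(outputs)
```

With this convention, the branch carrying 2k values is the classification branch, and the regression branch carries 4k values.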

Comments 3:[Line 287 - “and L_box represents the regression loss of the detection network.” - There is no L_box in Equation (1).]

Response 3: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[It has been modified at line 264 to denote $L_{reg}$ as the regression loss of the detection network.]]

Comments 4:[The original Faster R-CNN paper is not cited. References to the original papers are also missing for most of the methods used in the work: Faster R-CNN, Res2Net-50, FPN, ECA, ROI Align, Soft-NMS.]

Response 4: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The references for Faster R-CNN, ResNet-50, Res2Net-50, FPN, ECA, ROI Align, and Soft-NMS are [34]-[40], respectively.]]

Comments 5:[Loss function – Equations 2-5 seem to differ slightly from the original Faster R-CNN [Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." IEEE transactions on pattern analysis and machine intelligence 39.6 (2016): 1137-1149.]; the loss function is partially taken from [Liu, Xiaobo, et al. "Research on intelligent identification of rock types based on faster R-CNN method." IEEE Access 8 (2020): 21804-21812]. It is worth adding a discussion of the applicability of such a loss function.]

Response 5: [Thank you for your suggestion. Based on your proposal, I have provided an explanation: [The loss functions mentioned in the two references you provided only list the classification and regression losses for the bounding boxes. In contrast, this paper details both the classification and regression losses of the RPN, as well as the classification and regression losses of the bounding boxes, effectively encompassing the loss functions from the previous two references. Since these loss functions have not been altered in form, we consider them to be consistent in terms of applicability.]]
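For reference, the RPN loss as formulated in the cited paper by Ren et al. is reproduced below. The manuscript's Equations 2-5 are not included in this record, so this block is not a statement of the authors' equations; it is only the original formulation that the reviewer's comment compares against.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*),
\qquad
L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L_1}(t_i - t_i^*)
```

Here $p_i$ is the predicted probability that anchor $i$ is an object, $p_i^*$ is its ground-truth label (1 for a positive anchor, 0 for a negative one), $t_i$ and $t_i^*$ are the predicted and ground-truth box parameterizations, and $\lambda$ balances the two terms. The factor $p_i^*$ means that the regression term is active only for positive anchors.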

Comments 6:[Equation 3 – p^*_i seems to be redundant (in any case, it can be taken outside the sum sign).]

Response 6: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[Equation 3 has been modified.]]

Comments 7:[Figure 14 - The marble images in the third and fourth columns of the third row, which show the detection outcomes of the improved Faster R-CNN model, do not seem to be images under simulated mining noise conditions such as low lighting and lens smudges.]

Response 7: [Thank you for your suggestion. Based on your proposal, I have provided an explanation: [We primarily considered the following two scenarios: First, in mining or underground mining environments, lighting conditions are very complex and often quite dim. In underground mines in particular, insufficient light sources can lead to reduced image contrast and blurred details. Second, after prolonged use, the recognition equipment's lens may accumulate stains, dust, water droplets, or oil smudges. Due to the extended operation of the equipment and the potential negligence during manual maintenance, these issues might be overlooked. Such stains can form irregular translucent spots or blurred areas on the images, obstructing some image content and consequently affecting the image quality. Therefore, we regard these two scenarios as mining noise.]]

Comments 8:[Line 552 and below (Figures 12-13) - How are iterations related to epochs?]

Response 8: [Thank you for your suggestion. Based on your proposal, I have provided an explanation: [The number of iterations per epoch equals the number of batches per epoch (i.e., the dataset size divided by the batch size), and the total number of iterations is the product of the number of epochs and the iterations per epoch. In this experiment, 100 epochs were set and the model actually underwent 18,900 iterations. The figures show 9,450 iterations because we were concerned that a loss curve with 18,900 points would be too dense and difficult to read. Therefore, we simplified the presentation by averaging the loss values of every two consecutive iterations.]]
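The arithmetic in this response can be checked with a short sketch. The totals (100 epochs, 18,900 iterations, 9,450 plotted points) come from the response itself; the training-set size of 756 and batch size of 4 are hypothetical values chosen only because they reproduce 189 iterations per epoch, and the helper names and synthetic loss values are likewise illustrative.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of mini-batches (iterations) needed to pass over the data once."""
    return math.ceil(num_samples / batch_size)


def average_every(values, group_size):
    """Average consecutive groups of `group_size` values to thin out a curve."""
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return [sum(group) / len(group) for group in groups]


EPOCHS = 100               # stated in the response
TOTAL_ITERATIONS = 18_900  # stated in the response
per_epoch = TOTAL_ITERATIONS // EPOCHS

# Hypothetical split and batch size (assumptions, not given in the response).
assumed_per_epoch = iterations_per_epoch(num_samples=756, batch_size=4)

# Synthetic, steadily decreasing loss values stand in for the real log.
losses = [1.0 / (step + 1) for step in range(TOTAL_ITERATIONS)]
plotted = average_every(losses, group_size=2)

print(per_epoch, assumed_per_epoch, len(plotted))
```

Averaging every two consecutive values halves the number of plotted points while keeping the overall trend, which matches the 18,900 to 9,450 reduction described above.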

Comments 9:[Table 4 - "FLOPs" is commonly understood as floating-point operations per second (this is usually a measure of computing performance, not of computational complexity). The column names Parm/M and FLOPs/G are ambiguous (the relation to Mega and Giga is not obvious). There is no computational complexity comparison between Faster R-CNN and the final improved Faster R-CNN (with all improvements).]

Response 9: [Thank you for highlighting this point. I concur with your observation, and as a result, we have implemented the following modifications:[The changes have been made in the corresponding locations (Table 5 and English sentences).]]
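To make the units in this exchange concrete, the sketch below counts parameters and floating-point operations for a single convolution layer and reports them in millions and billions. It is an illustration under stated assumptions, not the authors' measurement procedure: the layer shape is hypothetical, the helper names are invented for this example, and counting one multiply-accumulate as two operations is one common convention among several.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a square-kernel 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def conv2d_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Multiply-accumulate operations for one forward pass (bias ignored)."""
    per_output_pixel = kernel_size * kernel_size * in_channels * out_channels
    return per_output_pixel * out_height * out_width


# Hypothetical layer: 3x3 convolution, 64 -> 128 channels, 56x56 output map.
params = conv2d_params(64, 128, 3)
macs = conv2d_macs(64, 128, 3, 56, 56)
flops = 2 * macs  # assumption: 1 MAC counted as 2 floating-point operations

params_m = params / 1e6  # "Params (M)": millions of parameters
flops_g = flops / 1e9    # "FLOPs (G)": billions of operations, not per second

print(f"Params (M): {params_m:.4f}")
print(f"FLOPs (G):  {flops_g:.4f}")
```

Labelling the columns as "Params (M)" and "FLOPs (G)", with a note that FLOPs here counts operations per forward pass rather than operations per second, is one way to avoid the ambiguity the reviewer points out.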

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you for your detailed and thoughtful response to my comments. I have reviewed your revisions, and I am pleased to confirm that my concerns have been fully addressed. The changes you have made significantly improve the quality of the article, and I believe it is now in a much better position for publication. However, there remains one point that still requires attention. In the images for marble showing the results of the improved Faster R-CNN model, particularly under low lighting and lens smudge conditions, the outcomes appear significantly different from those of the original Faster R-CNN model for the same scenarios.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The quality of English is ok

Author Response

Comments 1:[Thank you for your detailed and thoughtful response to my comments. I have reviewed your revisions, and I am pleased to confirm that my concerns have been fully addressed. The changes you have made significantly improve the quality of the article, and I believe it is now in a much better position for publication. However, there remains one point that still requires attention. In the images for marble showing the results of the improved Faster R-CNN model, particularly under low lighting and lens smudge conditions, the outcomes appear significantly different from those of the original Faster R-CNN model for the same scenarios.]

Response 1: [Thank you for pointing this out. I agree with your observation; this issue arose due to an oversight during the previous modification. Consequently, we have made the following improvements:[Figures 14 and 15 have been revised.]]