Peer-Review Record

Automatic Asbestos Control Using Deep Learning Based Computer Vision System

Appl. Sci. 2021, 11(22), 10532; https://doi.org/10.3390/app112210532
by Vasily Zyuzin 1, Mikhail Ronkin 1,*, Sergey Porshnev 1,2 and Alexey Kalmykov 1
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 30 August 2021 / Revised: 14 October 2021 / Accepted: 20 October 2021 / Published: 9 November 2021
(This article belongs to the Special Issue Big Data: Advanced Methods, Interdisciplinary Study and Applications)

Round 1

Reviewer 1 Report

Review of the paper “Automated Asbestos Control Using Deep Learning Based Computer Vision System“

The paper presents a setup for the acquisition of images for online asbestos fiber content (productivity) estimation in veins of rock chunks in open-pit conditions and the corresponding algorithm for the processing of the images.

 

After reading the paper, several questions arise about the algorithm and the results. The first is that the description of the algorithm lacks the formality and mathematical soundness needed for it to be reproduced in other applications. Also, the number of images used for training the CNNs is, in my opinion, very low, as is the number of experiments, so I propose extending these numbers. A question that I would like answered in the article is the dependency of the system results on the distance of the camera to the veins in the rocks. As is known, the size of the real area covered by a pixel depends on the working distance of the optical system and the field of view. If you take an image at 10 meters, these parameters differ from those when the image is taken at 15 meters. That means the proposed system will produce different results depending on these parameters, so please address this concern. The graph in Figure 13, in my opinion, demonstrates that the system has poor agreement with the results of the human experts, which obligates a reanalysis of the proposed system. Please explain this point.

Finally, there are a lot of writing errors that sometimes make the paper very hard to understand. Please correct them. Here are some of them: Formalization of the algorithm; English improvement; allows to atomize; systems allows to atomize the; are most suit ; for the each of ; near infrared range; it is also could be; 30 degree; lens are allow; resolution about 4;   to avoiding the; training was randomly; are shown Figure 13; The Table 3 shows; an error about 0.4%; Training curves are shown in Figure 11.;….

Author Response

Q1. The first one is that the description of the algorithm lacks formality and mathematical soundness so that it can be reproduced in other applications.

Answer. The algorithm has been renamed with the word "scheme". We understand that in the mathematical sense an algorithm implies several formally described steps, but here we want to show the technical aspects of how the designed system works.

Q2. Also, the number of images for the training of the CNNs is very low in my opinion, and also the number of experiments, so I propose to extend these numbers.

Answer. All blocks that were applied in the CNN architectures were pretrained; thus our stage was only fine-tuning. The number of samples is comparable with that used in other similar papers. Moreover, the obtained accuracy on the test data (we mean the asbestos content value accuracy) is comparable with the geological service accuracy. As mentioned, the test data were obtained in weather conditions different from the training data, which suggests sufficient generalization ability. Thus we believe we have reached the main goal of the work.

Q3. A question that I would like answered in the article is the dependency of the system results on the distance of the camera to the veins in the rocks. As is known, the size of the real area covered by a pixel depends on the working distance of the optical system and the field of view. If you take an image at 10 meters, these parameters differ from those when the image is taken at 15 meters. That means the proposed system will produce different results depending on these parameters. So please answer this concern.

Answer. The selected camera and lens allow one to obtain a resolution of about 4 pixels per 1 mm at a distance of 5 m, which is assumed to be sufficient in comparison with the typical asbestos vein width (about 4-12 mm). This information has been added to the paper; thank you for your note.
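For illustration only (not part of the authors' response): the pixels-per-millimetre figure and its dependence on working distance can be checked with a simple pinhole-camera estimate. The focal length and pixel pitch below are placeholder values, not the actual parameters of the described system, and the helper function is hypothetical.

```python
# Illustrative sketch, assuming a simple pinhole camera model.
# Placeholder focal length and pixel pitch, not the real system parameters.

def pixels_per_mm(focal_length_mm: float, pixel_pitch_um: float, distance_m: float) -> float:
    """Approximate number of image pixels covering 1 mm on the object plane."""
    # Footprint of one pixel on the object (mm) = pixel_pitch * distance / focal_length
    footprint_mm = (pixel_pitch_um / 1000.0) * (distance_m * 1000.0) / focal_length_mm
    return 1.0 / footprint_mm

for d in (5.0, 10.0, 15.0):
    print(f"{d:>4.1f} m: {pixels_per_mm(50.0, 3.45, d):.1f} px/mm")
# The resolution at 5 m is roughly twice that at 10 m, which is why the working
# distance (and any optical zoom) must be fixed or compensated for.
```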

Q4. The graph in Figure 13, in my opinion, demonstrates that the system has poor agreement with the results of the human experts, which obligates a reanalysis of the proposed system. Please explain this point.

Answer. The Figure 13 results are explained in the discussion section. Most of the results can be approximated by a line with sufficient accuracy. However, in several specific cases, which of course need to be improved, we have outliers (only 7 outliers). The reasons for the outliers are a small number of instances (images of rock chunks) at the open-pit place or the specificity of manual stone selection in those experiments (only asbestos-rich images).

Q5. Finally, there are a lot of writing errors that sometimes make the paper very hard to understand. Please correct them. Here are some of them: Formalization of the algorithm; English improvement; allows to atomize; systems allows to atomize the; are most suit ; for the each of ; near infrared range; it is also could be; 30 degree; lens are allow; resolution about 4;   to avoiding the; training was randomly; are shown Figure 13; The Table 3 shows; an error about 0.4%; Training curves are shown in Figure 11.;….

Answer. Thank you for the note; the grammar has been improved.

Reviewer 2 Report

This paper describes a CV approach in geology. The authors implement ResNet50 to detect asbestos veins and rock chunks. The method, implementation, and results have been clearly described. The approach shortens the timescale of the above detection task.

Just one question: what is used for running the ResNet50? A phone or a tablet? Did you do any optimization of the network, or is it just a standard implementation of ResNet50? It would be nice if you mentioned exactly which device you are using, because to me, for images with high resolution, a phone or tablet probably cannot provide enough computing power. Are there any special optimizations?

Author Response

Thank you for your review. The information about the computer has been added. A Desten CyberBook T850 tablet was used. The time of measurement for one open-pit place with 50 rock chunks was about 10 minutes in the fully automatic mode. We did not perform any special optimizations; the measurement time is comparable with that of a geological specialist's estimation.

Reviewer 3 Report

The study focuses on an intelligent system for asbestos content localisation within rock chunk veins in open-pit conditions. The authors use deep learning models for object detection and classification. The paper needs the following improvements before publishing:

Abstract: Check the writing style e.g., avoid using past tense.

The author claims a real-time application; however, no evidence is provided in the outcomes. Real-time means use of the system/model for live video-streamed data and not a set of pre-recorded images.

Overall, the writing needs improvement throughout the manuscript. I would recommend this be done by a native English speaker.

Line 25: Sort the references w.r.t their occurrence

 

Line 32-43: Are there any existing studies supporting these statements?

Line 103: Summarise the limitations of existing systems/research studies

Line 107: At the end of Para, address the major contributions of your work

In relation to Object detection and segmentation, I would recommend to cite the following recent studies:

https://www.sciencedirect.com/science/article/pii/S0957417420310289; https://www.mdpi.com/1424-8220/20/13/3785; https://arc.aiaa.org/doi/abs/10.2514/1.I010570

Section 2: There are so many bullet points in the manuscript that it reads like a report instead of a scientific paper. Please avoid using these in the revised version.

At the beginning of the section, add a description of the system/framework.

Explain the terms used in Fig 2 in the caption e.g., PoE?

Section 2 and 3 can be merged together.

Algorithm needs significant improvements:

  • It does not look like a proper computing algorithm. I can see the steps, but they need the formatting used in scientific papers.
  • You need to further disclose/expand the algorithm for the use of deep learning.

 

Dataset:

 

Line 166: Check writing style “it was used 2 datasets were collected”.

Add a Table explaining the attributes of all datasets.

Add a note about ethical approval. What restrictions are there? Can you publish the data?

Be careful when saying ‘random shuffle of samples’, because this could lead to biased detection/recognition: the same sample/rock might exist in both training and testing.

 

Section 5

Line 214: Do you think 6 images are sufficient to validate the Model? Generalisation?

You also need to add a Table with metrics/statistics, and not only images, to show the accuracy.

 

Fig 11 clearly indicates poor fitting on the validation data and, of course, on the test data. What is your response?

Where are the statistical outcomes?

As the author claims it is to be used in real time, I would recommend adding the processing time, specifically the test time for predicting/localising the target object in such high-resolution images.

Table 1: What does % indicate? Error or accurate prediction?

Can you compare your work with the related literature? If not in terms of performance, you can address other aspects such as the study objectives, use of conventional machine learning vs. DL, etc.

Conclusion:

Remove bullet points. Address the limitations of your work

Author Response

Q1. Abstract: Check the writing style e.g., avoid using past tense.

Answer. Thank you for your note, the grammar is improved.

Q2. The author claims a real-time application; however, no evidence is provided in the outcomes. Real-time means use of the system/model for live video-streamed data and not a set of pre-recorded images.

Answer. The time of measurement for one open-pit place with 50 rock chunks was about 10 minutes in the fully automatic mode. The information about the computer used is also added. The measurement time is comparable with that of a geological specialist's estimation, so we considered it a system working on a real-time scale. The solved task does not imply video stream processing. We include this clarification in line 138.

Q3. Overall, the writing needs improvement throughout the manuscript. I would recommend this be done by a native English speaker.

Answer. Thank you for your note, the grammar is improved.

Q4.

  • Line 25: Sort the references w.r.t their occurrence

Answer. Thank you for your note; the reference order has been changed.

  • Line 32-43: Are there any existing studies supporting these statements?

Answer. This information was collected during the discussion of the problem with geological service specialists.

  • Line 103: Summarise the limitations of existing systems/research studies

Answer. Please clarify what kind of information you mean.

  • Line 107: At the end of Para, address the major contributions of your work

Answer. Please clarify what kind of information you mean.

Q5. In relation to Object detection and segmentation, I would recommend to cite the following recent studies:

https://www.sciencedirect.com/science/article/pii/S0957417420310289;

https://www.mdpi.com/1424-8220/20/13/3785

https://arc.aiaa.org/doi/abs/10.2514/1.I010570

Answer. Please clarify what kind of information you mean.

Q6. Section 2: There are so many bullet points in the manuscript and it seems like a report instead of scientific paper. Please avoid using these in the revised version.

Answer. The bullet-like style has been removed, except for the system description and the scheme of its work. If you allow, we would like to keep it there as part of the authors' style.

Q7. In the beginning of Section, add description of the system/framework.

Answer. Section 2 has been renamed to a description of the experimental system and the scheme of its work. The new section includes two subsections: the system description and the description of the scheme of the system's work.

Q8. Explain the terms used in Fig 2 in the caption e.g., PoE?

Answer. The abbreviation description is added to line 127. PoE is Power over Ethernet (the camera power supply system).

Q9. Sections 2 and 3 can be merged together.

            Answer. The merging is done.

Q10. Algorithm needs significant improvements: It does not look like proper computing algorithm. I can see steps but need formatting used in Scientific papers. You need to further disclose/expand the algorithm for use of deep learning

Answer. The algorithm has been renamed with the word "scheme". We understand that in the mathematical sense an algorithm implies several formally described steps, but here we want to show the technical aspects of how the designed system works.

Q11. Dataset: Line 166: Check writing style “it was used 2 datasets were collected”. Add a Table explaining the attributes of all datasets. Add a note about ethical approval. What restrictions are there? Can you publish the data?

            Answer. There are no ethical restrictions concerning data in this work.

Q11. Be careful when saying ‘random shuffle of samples’ because this could lead to biased detection/recognition. This is because the same sample/rock might exist in both training and testing.

Answer. Thank you for your valuable note.

Q12.  Section 5 Line 214: Do you think 6 images are sufficient to validate the Model? Generalisation?

Answer. This statement is given with respect to the open-pit plane data (dataset 5); the obtained results for the test data and the corresponding experiments show sufficient generalization ability for obtaining the overall asbestos content estimation accuracy. However, we understand that this number of images is too small overall. In further research, we will extend the dataset size.

Q13. You also need to add a Table with metrics/statistics, and not only images, to show the accuracy.

            Answer. Thank you for your valuable notation.

Q14. Fig 11 clearly indicates poor fitting on the validation data and, of course, on the test data. What is your response? Where are the statistical outcomes?

Answer. Thank you for your valuable note; however, the obtained accuracy is sufficient for the target result.

Q15. As the author claims it is to be used in real time, I would recommend adding the processing time, specifically the test time for predicting/localising the target object in such high-resolution images.

            Answer. The information is added.

Q16. Table 1: What does % indicate? Error or accurate prediction?

            Answer. The values in table 1 show the estimation of asbestos content results.

Q17. Can you compare your work with the related literature? If not in terms of performance, you can address other aspects such as the study objectives, use of conventional machine learning vs DL etc.,

Answer. The comparison of this work with similar ones is given in the introduction (lines 79-97).

Conclusion:

Q18. Remove bullet points. Address the limitations of your work

Answer. Thank you for your extended review. All the notes are accepted and taken into account. We hope that after these corrections the paper has become much better.

Round 2

Reviewer 1 Report

Review of the paper “Automated Asbestos Control Using Deep Learning Based Computer Vision System“

The paper is a reviewed version of a previous one already sent to the journal.

 

In general, after a careful read of the new version, the points to be solved that I raised in the first review remain unresolved.

For example, changing the word algorithm to the word scheme does not resolve the unclarity of the steps. For example, the authors do not explain how the first stage works, i.e., how it automatically selects the possible stone candidates; in my opinion this implies the recognition of objects at different scales and positions. It is also mentioned that a CNN is used, but nothing is said about its parameters, training, etc. Also, check the images in Figure 3: they do not correspond between stages 1 and 2.

The dependency on the distance between the camera and the open pit remains as it is set in Fig. 2. It is mentioned that the system will work at a distance between 5 and 10 meters, which produces a change in the size of the pixel with respect to the real world. Also, as the authors use optical zoom in the system, this increases the change in size.

The text mentions that the authors of reference [1] (Gao et al.) use a similar number of images as this paper under review; nonetheless, Gao et al. worked in a semi-controlled environment. In the paper under review, the authors claim that the system works in several weather conditions. Nothing is said about tests done on images taken in rain, foggy conditions, poor light conditions such as early morning or late afternoon, etc.

If the authors propose the opinion of some human experts as validation of the system, I propose to explain how the validation was done, that is, the number of persons involved in the validation as well as the protocol that was executed (number of measures, number of stones analyzed, repetition of the measures, etc.).

The question about graph 13 in my opinion is not answered satisfactorily.

Also, in spite of the fact that several typos were corrected, a lot of writing and redaction errors remain in the text. Here are some of them: systems allows; system need to; training was randomly; system need to solve; for the each; of plants roots; of it work; angle 30 degree; scheme in Figures 3 assumes; the train one were mixed; the images ware randomly; are shown Figure 12; it was used 2 datasets were collected; It is also could be; …

Author Response

The paper is a reviewed version of a previous one already sent to the journal.

In general, after a careful read of the new version, the points to be solved that I raised in the first review remain unresolved.

Q1. For example, the changing of the word algorithm to the word scheme does not resolve the unclarity of the steps. For example, the authors do not explain how the first stage works, how it should select automatically the possible stone candidates. It implies in my opinion the recognition of objects at different scales and positions.  

            Answer. The algorithm is added.

Q2. It is also mentioned that it used a CNN but nothing is said about its parameters, training, etc.

Answer. The information is in lines 231-243. (Note that the initial weights of the EfficientNet-B3 block were taken as the weights obtained by its pre-training on the ImageNet dataset [26]. For the training routine, the Adam optimizer was taken [27] with a learning rate of 10⁻⁵ and beta parameters (0.99, 0.99). The loss function was the binary cross-entropy [10]. The Dice coefficient was used as a quality metric in the training for the model evaluation [10].)
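For illustration, a minimal PyTorch sketch of the training configuration quoted above (Adam with learning rate 10⁻⁵ and betas (0.99, 0.99), binary cross-entropy loss, Dice coefficient as metric) is given below. This is not the authors' code: the model is a one-layer placeholder standing in for the EfficientNet-B3-based segmentation network, and the helper names are hypothetical.

```python
# Minimal sketch of the described training setup, assuming a binary
# segmentation task; `model` is a placeholder, not the paper's network.
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Dice coefficient used as the quality metric during training."""
    pred = (pred > 0.5).float()
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

model = torch.nn.Sequential(           # placeholder for the EfficientNet-B3-based segmentation net
    torch.nn.Conv2d(3, 1, kernel_size=1),
    torch.nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, betas=(0.99, 0.99))
criterion = torch.nn.BCELoss()         # binary cross-entropy loss

def train_step(images: torch.Tensor, masks: torch.Tensor) -> tuple:
    """One optimization step; returns (loss, dice) for monitoring."""
    optimizer.zero_grad()
    preds = model(images)
    loss = criterion(preds, masks)
    loss.backward()
    optimizer.step()
    return loss.item(), dice_coefficient(preds.detach(), masks).item()
```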

Q3. Also, check the images in figure 3, they do not correspond between stages 1 and 2.

            Answer. Images in figure 3 correspond to the renewed described algorithm.

Q4. The dependency on the distance between the camera and the open pit remains as it is set in Fig. 2. It is mentioned that the system will work at a distance between 5 and 10 meters, which produces a change in the size of the pixel with respect to the real world. Also, as the authors use optical zoom in the system, this increases the change in size.

            Answer. The information is in the new description of the algorithm.

Q5. The text mentions that the authors of reference [1] (Gao, et.al.)  use a similar number of images as this paper under review, nonetheless Gao, et. al. worked in a semi-controlled environment. In the paper, under review, the authors claim that the system works in several weather conditions. Nothing is said about the test done about images taken in the rain, foggy conditions, poor light conditions as in the early morning or late afternoon, etc.

Answer. The limitations of the work are stated in the conclusion section. We made our experiments under conditions comparable with those of the human specialists' work. They do not work in foggy conditions or in poor light conditions such as early morning or late afternoon.

Q6. If the authors propose as validation of the system the opinion of some human experts, I propose to explain how the validation was done, that is the number of persons that were involved in the validation as also the protocol that was executed (number of measures, number of stones analyzed, repetition of the measures, etc.)

            Answer.  The information is added in the Results Discussion Section.

Q7.  The question about graph 13 in my opinion is not answered satisfactorily.

Answer. An additional explanation is added to the paper (Figures 13 and 14).

Q8. Also, in spite that several typos were corrected, a lot of writing and redaction errors remain in the text. Here some of them: systems allows; system need to; training was randomly; system need to solve; for the each; of plants roots; of it work; angle 30 degree; scheme in Figures 3 assumes; the train one were mixed; the images ware randomly; are shown Figure 12; it was used 2 datasets were collected; It is also could be; …

Answer. Thank you for your valuable note; all mistakes that we found have been corrected.

Reviewer 3 Report

The author must consider the comments very carefully. Some of the simple corrections have been made; however, most of the points are not addressed properly.

Q1: The author responded that it is improved; however, the Abstract is still problematic. See the sentence below, e.g.:

The system work result is the overall asbestos content (productivity)’

Q2: The author mentioned it is resolved; however, the manuscript/Abstract is still using the term ‘real time’. The author needs to understand the meaning of real time regardless of the manual time comparison. Real time MEANS live, not storing on the device and then processing offline to make decisions.

A 10-minute processing time is not real time. In other words, how much data would be streamed in 10 minutes? If your model is taking 10 minutes to make a prediction/recognition, what about the data that could be streamed during those 10 minutes?

It's not even ‘near real time’ in the context of computer vision and machine learning.

 

From my previous review, these comments were not responded to at all:

  • Line 32-43: Are there any existing studies supporting these statements?
  • Answer. This information was collected during the discussion of the problem with geological service specialists.

The author did not respond in detail. These statements must be supported by a solid reference:

 

‘The results of the evaluation made by the geological service (visual analysis) could be very subjective. In most cases, these results differ from the laboratory one. Moreover, the estimates made by different experts may vary significantly. As a rule, geological service specialists can not describe formally their algorithms or criterion of how they made estimations of asbestos content in the open pit. Such estimation techniques are more an art than a science. Also, it is required to spend a lot of time and cost to study such specialists.’

 

 

 

  • Line 103: Summarise the limitations of existing systems/research studies

            Answer. Please, clarify, what kind of information do you mean.

The author should understand the structure of a scientific paper.

 

 

  • Line 107: At the end of Para, address the major contributions of your work

Answer. Please, clarify, what kind of information do you mean.

Q5. In relation to Object detection and segmentation, I would recommend to cite the following recent studies:

https://www.sciencedirect.com/science/article/pii/S0957417420310289;

https://www.mdpi.com/1424-8220/20/13/3785

https://arc.aiaa.org/doi/abs/10.2514/1.I010570

Answer. Please, clarify, what kind of information do you mean.

These are the previous works in the domain of Object detection in real time/near real time that could be referenced in the Introduction. 

 

 

Q10: Again, the author changed the heading but not the contents. In scientific writing style, you need to follow a proper algorithm format; otherwise, you may use a flow chart instead of bullet points.

Q11: The author did not respond to the comment. E.g., no table is added summarising the data and features. No ethical note is added in the paper. No response to whether the data can be shared/published.

 

Random Shuffle:

 

Not responded to. When you do a random shuffle on this kind of data/images, the possibility is biased test outcomes; hence I made the comment. Instead, it should be either one image per object/stone or a subject-wise train/test split.

 

Q12: If the validation is on 6 images only, this does not validate the model, as the data is too small. However, if the testing is performed on other similar datasets with more samples, then this should be clarified in the manuscript.

 

Q13, Q14: Statistical validation is still missing. The author did not respond to these statements.

 

Q16: Yet, % is not explained. The caption could be improved with details.

 

Q17: Not addressed again. The comparison in this case should be statistical outcomes/performance comparison.

Author Response

Q1: Author responded it is improved however Abstract is still problematic. See below sentence e.g.,

The system works result is the asbestos content (productivity)’

            Answer. The mistake is corrected.

Q2: The author mentioned it is resolved; however, the manuscript/Abstract is still using the term ‘real time’. The author needs to understand the meaning of real time regardless of the manual time comparison. Real time MEANS live, not storing on the device and then processing offline to make decisions. A 10-minute processing time is not real time. In other words, how much data would be streamed in 10 minutes? If your model is taking 10 minutes to make a prediction/recognition, what about the data that could be streamed during those 10 minutes? It's not even ‘near real time’ in the context of computer vision and machine learning.

Answer. The mistake is corrected. The term ‘real time’ is replaced with ‘comparable with the human specialist's work time’.

From my previous review, these comments were not responded to at all:

  • Line 32-43: Are there any existing studies supporting these statements? The author did not respond in detail. These statements must be supported by a solid reference: ‘The results of the evaluation made by the geological service (visual analysis) could be very subjective. In most cases, these results differ from the laboratory one. Moreover, the estimates made by different experts may vary significantly. As a rule, geological service specialists can not describe formally their algorithms or criterion of how they made estimations of asbestos content in the open pit. Such estimation techniques are more an art than a science. Also, it is required to spend a lot of time and cost to study such specialists.’

Answer. An additional reference to the corresponding work is added (in Russian):

Luzin V.P. Complex investigation of the longitudinal-fiber chrysotile-asbestos field (in Russian) [Kompleksnye issledovaniya prodol’novoloknistogo hrizotilasbesta bazhenovskogo mestorozhdeniya]

The paper is devoted to the investigation of laboratory-based methods of asbestos analysis and asbestos content estimation. In addition, in our opinion, it is a priori expected that human-based visual asbestos content estimation is subjective and more art than science.

Line 103: Summarise the limitations of existing systems/research studies. The author should understand the structure of a scientific paper.

Answer. Thank you for your valuable note; we have added this information in the conclusion.

 Line 107: At the end of Para, address the major contributions of your work

Answer. Thank you for your valuable note; the contribution is added at the end of the introduction section.

Q5. In relation to Object detection and segmentation, I would recommend to cite the following recent studies: https://www.sciencedirect.com/science/article/pii/S0957417420310289; https://www.mdpi.com/1424-8220/20/13/3785 https://arc.aiaa.org/doi/abs/10.2514/1.I010570. These are the previous works in the domain of Object detection in real time/near real time that could be referenced in the Introduction. 

Answer. No problem. We have added the recommended references.

 Q10: Again, author changed the heading but not the contents. In scientific writing style, you need to follow proper algorithm format otherwise, you may use Flow Chart if you want instead of bullet points

Answer. The algorithm is added.

Q11: The author did not respond to the comment. E.g., no table is added summarising the data and features. No ethical note is added in the paper. No response to whether the data can be shared/published.

Answer. The table summarizing the data and features is added. On the ethical note: we do not know whether we can publish our data or not, but we think that it is quite ethical to take images of stones in the open pit.

Random Shuffle: Not responded. When you do random shuffle on this kind of data/images, possibility is the biased test outcomes hence, I made the comment. Instead, it should be either 1 image per object/stone or Subjective train/test.

            Answer. Random Shuffle was applied only during the pretraining dataset gathering.
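To illustrate the reviewer's concern about the random shuffle (images of the same stone appearing in both the training and test sets), a group-wise split keeps all images of one rock chunk on the same side of the split. The sketch below uses scikit-learn's GroupShuffleSplit; it is an illustration of the reviewer's suggestion, not the authors' pipeline, and the file names and stone IDs are hypothetical.

```python
# Sketch of a leakage-free split: group by stone ID so the same rock chunk
# never appears in both training and test sets (hypothetical data).
from sklearn.model_selection import GroupShuffleSplit

image_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg", "img_004.jpg"]  # hypothetical
stone_ids   = ["stone_A",     "stone_A",     "stone_B",     "stone_C"]      # one ID per rock chunk

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(image_paths, groups=stone_ids))

train_images = [image_paths[i] for i in train_idx]
test_images  = [image_paths[i] for i in test_idx]
# All images of a given stone end up on only one side of the split.
```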

Q12: If the validation is on 6 images only, this does not validate the model as data is too small. However, if the testing is performed on other similar datasets with more samples, then it should be clarify in the manuscript.

Answer. Thank you for your note; we agree with it. However, this case concerns rock chunk segmentation. Of course, we made some tests, but unfortunately we did not count the number of samples. We obtained fine results in practice, and this was not the subject of the study.

Q13, Q14:  Statistical validation is still missing. Author did not respond to statements

            Answer. The table is added.

Q16: Yet, % is not explained. The caption could be improved with details

Answer. The values in Table 1 show the estimation of the asbestos content (% is the asbestos content relative to the rock chunk volume). This information is added to the text.

Q17: Not addressed again. The comparison, in this case, should be a statistical outcomes/performance comparison.

Answer. The comparison is added at the end of the paper.
