Article
Peer-Review Record

Deep Learning for Drug Discovery: A Study of Identifying High Efficacy Drug Compounds Using a Cascade Transfer Learning Approach

Appl. Sci. 2021, 11(17), 7772; https://doi.org/10.3390/app11177772
by Dylan Zhuang 1,2 and Ali K. Ibrahim 1,3,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 6 July 2021 / Revised: 9 August 2021 / Accepted: 16 August 2021 / Published: 24 August 2021
(This article belongs to the Topic Medical Image Analysis)

Round 1

Reviewer 1 Report

see attachment

Comments for author File: Comments.docx

Author Response

Replies to Reviewers Comments and Suggestions

 

The authors wish to thank both reviewers for their comments and suggestions. Below are our replies to these comments.

 

The First Reviewer:

 

Comments and Suggestions for Authors

In the study entitled "Deep Learning for Drug Discovery: A Case Study of Identifying High Efficacy Drug Compounds for Covid-19" the authors utilize a deep learning model to identify possible drug candidates for the treatment of SARS-CoV-2 infection.

 

Although the used DL methods and obtained results are logical and promising, the article in the current form requires correction prior to acceptance. The section regarding in vitro data used as input for the model (cell cultures) needs more detailed description.

 

Reply: A more detailed description of the data and of the validation method used in the experimental study has been provided in the revised manuscript. In several places, we added a discussion of first training the classification model on the siRNA dataset, a larger dataset with treatment conditions similar to those of the SARS-CoV-2 dataset, and then retraining the model on the viral and mock cells in the SARS-CoV-2 dataset. This approach significantly improves the performance of the classification model.

 

What exactly generated signal 1 and what generated signal 0 for training purposes?

 

Reply: A healthy cell is assigned a score of "0", and a viral cell with no treatment is assigned a score of "1". Therefore, mock cells are labeled "0," and active viral cells are labeled "1". A viral cell treated by a compound is scored between 0 and 1, indicating the degree of effectiveness of the compound in treating COVID-19. This discussion has been added to the revised paper.

 

Were the best hits (best drug candidates, remdesivir) in the training data set?

 

Reply: No, they are not in the training dataset. The training dataset contains only active viral (infected) cells and mock (control) cells; cells treated with different compounds appear only in the test dataset. This discussion has been added to the text, together with a new figure (Figure 5).

Moreover, some expressions and sentences should be definitely rewritten and the following remarks need to be addressed:

 

  1. In the Abstract, what does "the active SARS-CoV-2 viral cells" mean? Please rewrite.

 

Reply: The term has been corrected in the revised manuscript.

 

  1. I do not think it is necessary to mention any company names like "companies such as Pfizer and Moderna" in a scientific paper. Revised only to contain their scientific classification

 

Reply: The suggestion has been adopted.

 

  1. Please rewrite "Studies have revealed some drugs with potential in treating Covid-19, such as darunavir, have shown potential in inhibiting entry and thus have potential as a treatment for Covid-19"

 

Reply: The statement is not correct, therefore it has been removed.

 

  1. Please rewrite: "When the effectiveness of a drug is rated manually, they must analyze the "before and after"

 

Reply: The sentence has been corrected.

 

  1. I suggest unifying the spelling as COVID-19 instead of "covid-19" or "Covid-19", both of which appear in the text.

 

Reply: The suggestion has been adopted.

 

  1. What is the performance of the model in experiments performed by authors (AUC)? Please provide the AUC in the text.

 

Reply: The AUC of the model is 0.98, which is now reported in the paper together with an ROC plot.
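
As an illustration of how an AUC like the one quoted in this reply can be obtained from classifier scores, the following minimal Python sketch computes it as the fraction of (viral, mock) pairs in which the viral cell receives the higher score. The function name `roc_auc` and the six labels and scores are hypothetical and chosen only for illustration; they are not data from the study, and the authors' own workflow may compute the AUC differently.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 0 = mock cell, 1 = active viral cell.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.05, 0.20, 0.60, 0.55, 0.90, 0.97]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cells, which is acceptable for a sketch; a rank-based formulation would be preferable for large test sets.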

 

  1. "Another group of researchers proposed to use machine learning in order to identify possible lead compounds. However, Their study produced relatively low precision and recall scores in classifying SARS-CoV-2 cells [20]" – needs to be rewritten. I do not think that a direct assessment about someone's precision is appropriate in a scientific paper. 

 

Reply: We agree with the reviewer. The sentence has been revised.

 

  1. Please describe more clearly what is the Active-Viral Cell and a Mock SARS-CoV-2 Cell.

 

Reply: An Active-Viral Cell is a human renal cortical epithelial (HRCE) cell that has been infected with SARS-CoV-2, and a Mock Cell is a healthy HRCE cell. Both cells are fixed, stained, and left to develop for 96 hours, and then the images are recorded. This discussion has also been included in the revised paper.

 

  1. I suggest reorganization of the Section 3. Transfer learning and adding some subsections.

 

Reply: The suggestion has been taken.

 

  1. There is no need to indicate directly whose model was worse than the author's model: "Our model also outperformed significantly those machine learning algorithms reported in [20]." It would be enough just to write that the performance of the model was higher in comparison to other published papers.

 

Reply: The suggestion has been taken.

 

  1. Please correct typos and editorial errors in the text.

 

Reply: The paper has been thoroughly edited. 

 

 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

In the study entitled “Deep Learning for Drug Discovery: A Case Study of Identifying High Efficacy Drug Compounds for Covid-19” the authors utilize a deep learning model to identify possible drug candidates for the treatment of SARS-CoV-2 infection. Although the used DL methods and obtained results are logical and promising, the article in the current form requires correction prior to acceptance. The section regarding in vitro data used as input for the model (cell cultures) needs more detailed description. What exactly generated signal 1 and what generated signal 0 for training purposes? Were the best hits (best drug candidates, remdesivir) in the training data set? Moreover, some expressions and sentences should be definitely rewritten and the following remarks need to be addressed:

 

 

  1. In the Abstract, what does “the active SARS-CoV-2 viral cells” mean? Please rewrite.
  2. I do not think it is necessary to mention any company names like “companies such as Pfizer and Moderna” in a scientific paper.
  3. Please rewrite “Studies have revealed some drugs with potential in treating Covid-19, such as darunavir, have shown potential in inhibiting entry and thus have potential as a treatment for Covid-19”.
  4. Please rewrite: “When the effectiveness of a drug is rated manually, they must analyze the “before and after””.
  5. I suggest unifying the spelling as COVID-19 instead of “covid-19” or “Covid-19”, both of which appear in the text.
  6. What is the performance of the model in experiments performed by authors (AUC)? Please provide the AUC in the text.
  7. “Another group of researchers proposed to use machine learning in order to identify possible lead compounds. However, Their study produced relatively low precision and recall scores in classifying SARS-CoV-2 cells [20]” – needs to be rewritten. I do not think that a direct assessment about someone’s precision is appropriate in a scientific paper.
  8. Please describe more clearly what is the Active-Viral Cell and a Mock SARS-CoV-2 Cell.
  9. I suggest reorganization of the Section 3. Transfer Learning and adding some subsections.
  10. There is no need to indicate directly whose model was worse than the author’s model: “Our model also outperformed significantly those machine learning algorithms reported in [20].” It would be enough just to write that the performance of the model was higher in comparison to other published papers. Please rewrite.
  11. Please correct typos and editorial errors in the text.

Author Response

 

 

The Second Reviewer:

 

The authors used deep learning to identify potential new drugs for treating covid.

 

Abstract:

  1. Please clarify what Recursion means by using their full name "Recursion Pharmaceuticals" on your first mention.

 

Reply: The suggestion has been adopted in revising the paper.

 

  1. You have identified remdesivir as one of the promising compounds using deep learning, but many medical studies have debated its effectiveness for covid-19. So what does this mean for validating that the compounds found via deep learning are actually useful and not merely what is already known?

 

Reply: A significant advantage of deep learning is that once a dataset is ready, it may take only several days for a deep learning model to identify potential compounds for therapeutic use. Thus, we do not intend to claim that we have identified a compound for therapeutic treatment of COVID-19; instead, we propose a method that may be valuable in future outbreaks. We revised the paper to stress this point of view.

 

Introduction:

  1. In fact, Remdesivir is a drug that has been pitched very early during the pandemic as a possible treatment for covid, why have you not discussed them? Instead, you omitted discussion of remdesivir in the introduction and the text seems like you are claiming it to be a drug that your study has identified, which is wrong. See https://www.nejm.org/doi/full/10.1056/nejmoa2007764

 

Reply: We apologize for any impression that we were the ones who identified these compounds. We revised the statement to acknowledge this in a few places.

 

  1. Line 37: what does recursion mean? please use full name for recursion if first appeared.

 

Reply: The suggestion has been adopted.

 

  1. The issue with applying deep-learning, is of course, explainability of the AI (XAI). XAI is a critical focus for the medical industry, how does your methodology comply with the guidelines for XAI so that your work can be adopted in real applications?

 

Reply: The SoftMax output layer provides probability values for both classes, which is more interpretable than a binary result. Figure 7 is added in the Experimental Study Section to show that different compounds produce a different set of probability (or efficacy) scores, from which one can see which compound performs better consistently. For example, although both GS-441524 and Remdesivir have the same efficacy scores, GS-441524 provides more consistent performance, as shown by the histograms in Figure 7.

          Though the proposed transfer learning strategy is more explainable compared with a classification approach that outputs a hard binary decision, since it produces probability scores for candidate compounds in treating viral cells, the model as a whole is not yet transparent. For example, we cannot explain the properties of the features linked to either viral cells or mock cells. A way to improve the model's explainability is to link the features extracted by the model to biomarkers of mock and viral cells with a functional relationship which may also be realized by a data-driven model.

          The above discussion has been added to the conclusion section of the paper.
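
The consistency comparison described in this reply (two compounds with the same efficacy score but different spreads) can be illustrated with a short Python sketch that summarizes per-image scores by their mean and standard deviation. The compound names and score values below are hypothetical placeholders invented for illustration, not values from Figure 7, and the helper `summarize` is an assumed name rather than part of the authors' code.

```python
from statistics import mean, pstdev


def summarize(scores_by_compound):
    """Return {compound: (mean score, population standard deviation)}."""
    return {
        name: (mean(values), pstdev(values))
        for name, values in scores_by_compound.items()
    }


# Hypothetical per-image scores for two made-up compounds.
scores = {
    "compound_A": [0.10, 0.12, 0.11, 0.09, 0.08],  # tightly clustered
    "compound_B": [0.02, 0.25, 0.01, 0.20, 0.02],  # same mean, wider spread
}

summary = summarize(scores)
for name, (avg, spread) in summary.items():
    print(f"{name}: mean={avg:.3f}, std={spread:.3f}")
```

In this toy case both compounds have the same mean score, so only the spread separates them, which is the kind of distinction a histogram of scores makes visible.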

 

  1. Line 49: you are discussing "disease" but covid is a "virus". Although the processes are similar, it is questionable to be not discussing drug selection processes specifically for anti-viral treatments.

 

Reply: Following the suggestion, we have corrected the text in a number of places in the revised paper.

 

  1. Line 51: is druggability even a word?

 

Reply: Yes, it is. It describes the measured and/or predicted efficacy of a drug on a target.

 

  1. Line 66-67: this looks like a general description of supervised learning: please expand on the sentence beyond just "labelling", which is the "classification" aspect of supervised learning. In fact, you mentioned you focus on "regression". So the logical link is missing.

 

Reply: Indeed, this is a classification problem. We removed "regression".

 

  1. Line 62-82: If the paper only uses deep learning, the introduction should discuss techniques/details specific to deep learning. Throughout the paragraph discussing deep learning, you are also mentioning methods/techniques for machine learning, which may not be appropriate for deep learning. Please remove the ambiguity.

 

Reply: The suggestion has been adopted.

 

  1. Line 87-91: contradict with abstract. Please modify the abstract to at least say that you "also" identified Remdesivir and GS-441524 for clarification

 

Reply: The correction has been made.

 

  1. More intro on siRNA since you are using it for your transfer learning. Tell us a few sentences more about what it is, why you use it, and citations of previous work on this dataset.

 

Reply: The suggestion has been adopted.

 

Dataset:

 

  1. Line 107: it is unclear whether this reads "April 2020. Cells …" or "April. 2020 cells …". I assume the former, but the period is missing.

 

Reply: The correction has been made.

 

  1. Line 126-127: please use specific numbers and not "lower" or "greater".

 

Reply: The suggestion has been adopted.

 

 

Transfer Learning:

 

  1. Line 156: can you explain what "shorter connections" mean? What is a "standard connection" in CNN, shorter compared to what?

 

Reply: It was a mistake. The correction has been made.

 

  1. Line 187: Typo "Bacth", Line 197: "Remiaing", Line 199: "SoftwMax", Line 201: "Peretrained"… Who wrote this paragraph???

 

Reply: Sorry. The corrections have been made.

         

  1. Line 203 -204: I like this, probably the most succinct way of describing sensitivity and specificity I've seen.

 

Reply: Thanks.

 

Experimental Study and Conclusion:

 

  1. How did you do the hyper-parameter optimization?

 

Reply: In our study, the number of neurons in the added layer was mainly constrained by the input and output sizes. Furthermore, the number of added layers in a transferred model was typically limited to 3 to 4. Finally, we used the Adam optimizer in MATLAB for model training. These details have been added to the text.

 

  1. The results are unclear. You have a network that does binary classification of "virus/active" and "healthy", but somehow that ties to drugs used.

 

Reply: For training, a healthy cell is assigned a score of "0", and a viral cell with no treatment is assigned a score of "1". Therefore, mock cells are labeled "0," and active viral cells are labeled "1". For testing, a viral cell treated by a compound is scored between 0 and 1, indicating the degree of effectiveness of the compound in treating COVID-19. This is made possible by using a SoftMax layer as the last layer in the classification model.
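
A minimal Python sketch of how a SoftMax output layer turns the two class outputs into a score between 0 and 1 is given below. It assumes, for illustration only, that index 0 is the mock class and index 1 is the active viral class, and that the reported score is the viral-class probability; the function names and the example logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class outputs."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def viral_score(logits):
    """Probability assigned to the 'active viral' class.

    Assumed class order: index 0 = mock (label 0), index 1 = viral (label 1).
    """
    return softmax(logits)[1]


# Hypothetical raw outputs for three treated cells.
for logits in ([4.0, -4.0], [0.0, 0.0], [-3.0, 3.0]):
    print(logits, "->", round(viral_score(logits), 4))
```

Under this assumed convention, a treated cell whose score is near 0 resembles a mock cell and one near 1 resembles an untreated viral cell, so the same two-class network yields a graded score for each compound.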

 

  1. These two sections also need to be expanded to have more impactful results. Applying deep learning itself to something is not enough for a research paper

 

Reply: We expanded the Experimental Study and Conclusion sections to describe the novelty of the proposed approach in more detail. For example, the following text has been added to the Conclusion section:

 

One novelty of the proposed approach is the way the transfer learning strategy is implemented. We first trained DenseNet, a pre-trained deep neural network, on the siRNA dataset, which is larger than the SARS-CoV-2 dataset and has similar characteristics. The resulting model is then used to extract features from the mock cells and active viral cells provided in the SARS-CoV-2 dataset. Thus, we used transfer learning twice: first from ImageNet to the siRNA image dataset, and then from the siRNA dataset to the SARS-CoV-2 dataset. This cascade transfer learning approach produced superior results for the case study. Another novelty of the approach is the use of a SoftMax layer as the output layer of the classifier. It produces probability (equivalently, efficacy) scores for viral cells treated with different compounds, which allows users to analyze test results with a statistical method. Experimental results demonstrated that the model was able to identify highly promising compounds, which were consistent with those identified in the literature and with the drugs that are now under clinical trials.
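
The two-step hand-off of weights described above can be sketched with a deliberately tiny stand-in. The Python example below uses a one-feature logistic regression instead of DenseNet, and made-up numeric data instead of images; all names (`train`, `predict`, `source_data`, `target_data`) and all values are hypothetical. It only illustrates the cascade idea that each stage starts from the weights of the previous one; it is not the authors' implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, data, epochs, lr):
    """Batch gradient descent for logistic regression, starting from `weights`."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)


def predict(weights, x):
    w, b = weights
    return sigmoid(w * x + b)


# Stage 0: generic starting weights (stand-in for ImageNet-pretrained weights).
generic = (0.0, 0.0)

# Stage 1: larger, related dataset (stand-in for the siRNA images).
source_data = [(x / 10.0, 0) for x in range(-20, 0)] + \
              [(x / 10.0, 1) for x in range(1, 21)]
stage1 = train(generic, source_data, epochs=200, lr=0.5)

# Stage 2: small target dataset (stand-in for mock = 0 / viral = 1 cells),
# fine-tuned briefly from the stage-1 weights rather than from scratch.
target_data = [(-1.0, 0), (-0.5, 0), (0.6, 1), (1.2, 1)]
stage2 = train(stage1, target_data, epochs=20, lr=0.5)

print("stage 1 weights:", stage1)
print("stage 2 weights:", stage2)
```

In a real image pipeline, each stage would typically also replace or retrain the final layers to match the new label set; that step is omitted here because the toy model has a single output.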

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Major concerns addressed, good for publication
