Peer-Review Record

A Bi-LSTM Based Ensemble Algorithm for Prediction of Protein Secondary Structure

Appl. Sci. 2019, 9(17), 3538; https://doi.org/10.3390/app9173538
by Hailong Hu, Zhong Li *, Arne Elofsson and Shangxin Xie
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 July 2019 / Revised: 17 August 2019 / Accepted: 20 August 2019 / Published: 28 August 2019

Round 1

Reviewer 1 Report

The manuscript describes a novel deep learning architecture and server for protein secondary structure prediction. The paper is generally well-written, although its English needs slight improvement, and the results are of interest to the scientific community. I have only a few remarks about the work.

- In the last paragraph of Section 1, the authors claim that DSSP assigns three states of secondary structure, although in fact it assigns 8 states to residues. It is necessary to clarify how the authors used the DSSP classification as a benchmark (see e.g. the section "Converting Secondary Structure States to Three Classes" in the chapter "Secondary Structure Assignment" in the book "Structural Bioinformatics", 2nd ed., Gu and Bourne, 2007); a minimal mapping sketch is given after these remarks.
- On a related note, the authors might want to consider citing and discussing results from the paper by Hanson et al. (2018) Improving Prediction of Protein Secondary Structure, Backbone Angles, Solvent Accessibility, and Contact Numbers by Using Predicted Contact Maps and an Ensemble of Recurrent and Residual Convolutional Neural Networks. Bioinformatics, doi: 10.1093/bioinformatics/bty100
- I suggest that the authors provide more details on the five types of protein features used. E.g., I guess that the authors used the random-coil Cα chemical shifts for residues, but this should be explicitly stated. Besides, the difference between features (2) and (3) is anything but trivial, and feature (5) would also benefit from a more detailed explanation (maybe an example). These could go as supplementary material, as they are crucial for understanding the work and the differences between these features and those applied, e.g., in Hanson et al. above.
- It is interesting that the SOV measure is lowest for coil residues. If the authors can perform the analysis with reasonable effort, it might be interesting to see a bit more detail about this, e.g. whether the length of coil regions affects prediction accuracy, whether the uncertainty of the termini of helices and sheets is a factor here, or whether there is any dependence on protein structural class (e.g. CATH classification) or neighboring structural elements.
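
For illustration of the first remark, below is a minimal Python sketch of one commonly used convention for reducing DSSP's eight per-residue states to the three classes {H, E, C} (H/G/I to helix, E/B to strand, everything else to coil). This mapping is an assumption for illustration only and is not necessarily the one used in the manuscript.

    # Illustrative sketch only: one common convention for reducing DSSP's
    # eight per-residue states to the three classes {H, E, C}. The mapping
    # actually used in the manuscript may differ and should be stated there.
    DSSP_TO_THREE_STATE = {
        "H": "H", "G": "H", "I": "H",   # alpha-, 3-10- and pi-helix -> Helix
        "E": "E", "B": "E",             # extended strand, beta-bridge -> Strand
        "T": "C", "S": "C", "-": "C",   # turn, bend, unassigned -> Coil
    }

    def reduce_to_three_states(dssp_states: str) -> str:
        """Map a per-residue DSSP string to the three-class {H, E, C} alphabet."""
        return "".join(DSSP_TO_THREE_STATE.get(s, "C") for s in dssp_states)

    # Example: reduce_to_three_states("HHHHTT-EEEE") returns "HHHHCCCEEEE"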

Minor issues:
- The acronym BLEPSS is not defined in the paper.
- For Tables 1, 2 and 3, the measure names should be consistent with those described for the SOV and MCC measures (e.g. CE, CH and CC are never called such in the text).
- Please rephrase the sentence "[...] the sequence similarity between the training set and the test sets is less than 30%." to be more specific (I guess you intended to say that no two sequences in the two sets have a similarity over 30%). Please also clarify whether you refer to sequence identity here (if not, the exact method used to measure similarity should be referenced).


Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The paper presents a Bi-LSTM based ensemble model for the prediction of protein secondary structure. The obtained results demonstrate the advantage of the proposed approach over the single sub-models used for this purpose. However, some issues require clarification to improve the scientific soundness of this work.

- What is the meaning of {H, E, C}? Is it somehow related to α-helix, β-strand and random coil?
- Please provide a more detailed diagram of the proposed model; the one shown in Fig. 5 is not clear, and Fig. 5 is missing its number.
- What feature selection method was implemented in Section 3.1? How many features were finally fed to the input of the Bi-LSTM network? Please rewrite this section to make it more understandable and readable.
- It seems from the results that the pssm and hmm models outperform the others, such as pssm_count, pps and wordembedding. Please explain why this happened: is it because of model properties, or is it rather data dependent? If so, would the two best models (pssm and hmm) alone lead to a similar prediction accuracy when combined in an ensemble? (A minimal averaging sketch is given after this list.)
- Please provide the classification accuracy obtained on the same data by other researchers using different methods.
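
To illustrate the ensemble question above, the following is a minimal Python sketch, not the authors' implementation, of how the per-residue class probabilities of two sub-models (e.g. the PSSM- and HMM-based Bi-LSTM networks) could be combined by weighted averaging; the function name, array shapes, equal weights and class ordering are all assumptions.

    import numpy as np

    # Minimal sketch (assumed, not the authors' code): combine two sub-models'
    # per-residue probabilities over {H, E, C} by weighted averaging and take
    # the most likely class for each residue.
    def ensemble_predict(prob_pssm: np.ndarray, prob_hmm: np.ndarray,
                         weights=(0.5, 0.5)) -> np.ndarray:
        """prob_pssm, prob_hmm: arrays of shape (L, 3) for a protein of L residues."""
        combined = weights[0] * prob_pssm + weights[1] * prob_hmm
        return combined.argmax(axis=1)  # assumed class order: 0 -> H, 1 -> E, 2 -> C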

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The paper is about a Bi-LSTM based ensemble method for the prediction of protein secondary structures. The methodology and results are presented in an orderly manner, supporting the case made by the authors. There are a few minor grammatical and syntactical mistakes throughout the document (e.g. lines 156-192: verbs continually shift between present and past tense, creating confusion), but other than that it is an interesting read.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

I would like to thank the authors for addressing all the issues raised in my review; the paper is now suitable for publication.
