Discovering the Ultimate Limits of Protein Secondary Structure Prediction
Abstract
1. Introduction
2. Materials and Methods
2.1. Experimental Datasets
2.1.1. Datasets Prepared from the Protein Data Bank
2.1.2. Homolog Data Obtained from the SCOP Database
2.1.3. The Source of PSSM Reference Sequences
2.2. Assignment of Protein Secondary Structure
2.3. Experimental Procedures
2.3.1. Estimating SSP Limits by Structural Homologs Identified from the PDB Datasets
2.3.2. Estimating SSP Limits by Structural Homologs Determined by the SCOP Database
2.4. Applied Sequence Alignment Methods
2.5. Applied Structure Alignment Methods
2.6. Applied Secondary Structure Prediction Methods
2.7. Implemented Secondary Structure Prediction Models
2.8. Computation of Accuracy Measures
2.8.1. The Q Accuracy Measure and the Percent Secondary Structural Consistency between Homologous Protein Pairs
2.8.2. The SOV Measure
2.8.3. The Weighted Average of Accuracy Measures
3. Results
3.1. Experiments Based on PDB Datasets
3.2. Limit of SSP Accuracy Estimated by Protein Sequence Alignments between SCOP-Determined Structural Homologs
3.3. Limit of SSP Accuracy Estimated by Protein Structure Alignments between SCOP-Determined Structural Homologs
3.4. Limit of Accuracy for Various Protein Structure Classes and Sizes
3.5. The Lower Bound of Secondary Structure Prediction Accuracy
4. Discussion
4.1. Difference between Using Homologs Identified by Programs and Those Determined by SCOP
4.2. The Different Meanings between the Upper Limits of SSP Accuracy Estimated by Sequence and Structural Alignments
- Accuracies estimated by structure alignment should represent the ultimate theoretical upper limit of SSP, because their computation strictly follows the logic that the structural differences between structural homologs are measured by structural alignments.
- Accuracies estimated by sequence alignment represent the practical upper limit of SSP, because their computation follows the procedure of the current SSP methodology.
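The two bullets above contrast estimates that differ only in how the residues of two homologs are paired. As a minimal sketch, assuming the percent secondary structural consistency of Section 2.8.1 is the share of aligned residue pairs that carry the same state, the Python below applies identical counting whether the gapped alignment comes from a sequence aligner (e.g., PSI-BLAST) or a structure aligner (e.g., FAST). The function names, the `-` gap character, and the 8-to-3 state reduction (H, G, I to H; E, B to E; all others to C) are illustrative assumptions, not the authors' code.

```python
# Assumed 8-to-3 reduction of DSSP states (H/G/I -> H, E/B -> E, all others -> C).
# This mapping is an illustrative assumption, not necessarily the study's choice.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss8: str) -> str:
    """Reduce a DSSP 8-state string to 3 states (H, E, C)."""
    return "".join(DSSP_TO_3.get(state, "C") for state in ss8)


def ss_consistency(aln_a: str, aln_b: str, ss_a: str, ss_b: str) -> float:
    """Percent of aligned residue pairs whose secondary structure states agree.

    aln_a, aln_b: gapped, equal-length aligned sequences ('-' marks a gap).
    ss_a, ss_b:   ungapped per-residue state strings for each protein.
    Columns with a gap in either sequence are skipped.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if len(aln_a.replace("-", "")) != len(ss_a) or len(aln_b.replace("-", "")) != len(ss_b):
        raise ValueError("state strings must match the ungapped sequence lengths")

    i = j = 0  # current residue index in protein A and protein B
    aligned = agreed = 0
    for res_a, res_b in zip(aln_a, aln_b):
        if res_a != "-" and res_b != "-":
            aligned += 1
            if ss_a[i] == ss_b[j]:
                agreed += 1
        # Advance each residue index only when that sequence has a residue here.
        if res_a != "-":
            i += 1
        if res_b != "-":
            j += 1
    return 100.0 * agreed / aligned if aligned else 0.0
```

For example, `ss_consistency("MKV-LA", "MRVQL-", "HHHCC", "HHECC")` returns 75.0: four columns pair a residue of each protein, and three of those pairs share a state. Under this reading, swapping a sequence alignment for a structure alignment changes only which residues are paired, not how agreement is counted.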
4.3. Information Revealed by the Estimated Limits of SSP under Different Sequence Identity Cutoffs
4.4. On the SSP Accuracies for Proteins of Different Structural Classes or Sizes and Residues with Different Physical Properties
4.5. Light Shed from the Discovery of This Study
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- Yang, Y.; Gao, J.; Wang, J.; Heffernan, R.; Hanson, J.; Paliwal, K.; Zhou, Y. Sixty-five years of the long march in protein secondary structure prediction: The final stretch? Brief. Bioinform. 2018, 19, 482–494.
- Li, B.; Krishnan, V.G.; Mort, M.E.; Xin, F.; Kamati, K.K.; Cooper, D.N.; Mooney, S.D.; Radivojac, P. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 2009, 25, 2744–2750.
- Folkman, L.; Yang, Y.; Li, Z.; Stantic, B.; Sattar, A.; Mort, M.; Cooper, D.N.; Liu, Y.; Zhou, Y. DDIG-in: Detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels. Bioinformatics 2015, 31, 1599–1606.
- Zhao, H.; Yang, Y.; Lin, H.; Zhang, X.; Mort, M.; Cooper, D.N.; Liu, Y.; Zhou, Y. DDIG-in: Discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol. 2013, 14, R23.
- Do, C.B.; Mahabhashyam, M.S.; Brudno, M.; Batzoglou, S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005, 15, 330–340.
- Pei, J.M.; Kim, B.H.; Grishin, N.V. PROMALS3D: A tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 2008, 36, 2295–2300.
- Soding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005, 33, W244–W248.
- Cuthbertson, L.; Mainprize, I.L.; Naismith, J.H.; Whitfield, C. Pivotal roles of the outer membrane polysaccharide export and polysaccharide copolymerase protein families in export of extracellular polysaccharides in gram-negative bacteria. Microbiol. Mol. Biol. Rev. 2009, 73, 155–177.
- Ambrosi, C.; Gassmann, O.; Pranskevich, J.N.; Boassa, D.; Smock, A.; Wang, J.; Dahl, G.; Steinem, C.; Sosinsky, G.E. Pannexin1 and Pannexin2 channels show quaternary similarities to connexons and different oligomerization numbers from each other. J. Biol. Chem. 2010, 285, 24420–24431.
- Makarova, K.S.; Aravind, L.; Wolf, Y.I.; Koonin, E.V. Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biol. Direct 2011, 6, 38.
- Kifer, I.; Nussinov, R.; Wolfson, H.J. Constructing templates for protein structure prediction by simulation of protein folding pathways. Proteins 2008, 73, 380–394.
- Nalini, V.; Bax, B.; Driessen, H.; Moss, D.S.; Lindley, P.F.; Slingsby, C. Close packing of an oligomeric eye lens beta-crystallin induces loss of symmetry and ordering of sequence extensions. J. Mol. Biol. 1994, 236, 1250–1258.
- Song, J.N.; Tan, H.; Perry, A.J.; Akutsu, T.; Webb, G.I.; Whisstock, J.C.; Pike, R.N. PROSPER: An Integrated Feature-Based Tool for Predicting Protease Substrate Cleavage Sites. PLoS ONE 2012, 7, e50300.
- Song, J.N.; Tan, H.; Shen, H.B.; Mahmood, K.; Boyd, S.E.; Webb, G.I.; Akutsu, T.; Whisstock, J.C. Cascleave: Towards more accurate prediction of caspase substrate cleavage sites. Bioinformatics 2010, 26, 752–760.
- Iwakura, M.; Nakamura, T.; Yamane, C.; Maki, K. Systematic circular permutation of an entire protein reveals essential folding elements. Nat. Struct. Biol. 2000, 7, 580–585.
- Wright, G.; Basak, A.K.; Wieligmann, K.; Mayr, E.M.; Slingsby, C. Circular permutation of betaB2-crystallin changes the hierarchy of domain assembly. Protein Sci. 1998, 7, 1280–1285.
- Fiser, A. Template-based protein structure modeling. Methods Mol. Biol. 2010, 673, 73–94.
- Madhusudhan, M.S.; Marti-Renom, M.A.; Sanchez, R.; Sali, A. Variable gap penalty for protein sequence-structure alignment. Protein Eng. Des. Sel. 2006, 19, 129–133.
- Vakser, I.A. Protein-protein docking: From interaction to interactome. Biophys. J. 2014, 107, 1785–1793.
- Lee, Y.Z.; Lo, W.C.; Sue, S.C. Computational Prediction of New Intein Split Sites. Methods Mol. Biol. 2017, 1495, 259–268.
- Lo, W.C.; Wang, L.F.; Liu, Y.Y.; Dai, T.; Hwang, J.K.; Lyu, P.C. CPred: A web server for predicting viable circular permutations in proteins. Nucleic Acids Res. 2012, 40, W232–W237.
- Lo, W.C.; Dai, T.; Liu, Y.Y.; Wang, L.F.; Hwang, J.K.; Lyu, P.C. Deciphering the preference and predicting the viability of circular permutations in proteins. PLoS ONE 2012, 7, e31791.
- Lee, Y.T.; Su, T.H.; Lo, W.C.; Lyu, P.C.; Sue, S.C. Circular permutation prediction reveals a viable backbone disconnection for split proteins: An approach in identifying a new functional split intein. PLoS ONE 2012, 7, e43820.
- Pellequer, J.L.; Westhof, E.; Vanregenmortel, M.H.V. Correlation between the Location of Antigenic Sites and the Prediction of Turns in Proteins. Immunol. Lett. 1993, 36, 83–100.
- Gao, J.; Faraggi, E.; Zhou, Y.; Ruan, J.; Kurgan, L. BEST: Improved prediction of B-cell epitopes from antigen sequences. PLoS ONE 2012, 7, e40104.
- Li, Y.; Liu, X.; Zhu, Y.; Zhou, X.; Cao, C.; Hu, X.; Ma, H.; Wen, H.; Ma, X.; Ding, J.B. Bioinformatic prediction of epitopes in the Emy162 antigen of Echinococcus multilocularis. Exp. Ther. Med. 2013, 6, 335–340.
- Zhou, H.X.; Shan, Y. Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 2001, 44, 336–343.
- Mukherjee, S.; Zhang, Y. Protein-Protein Complex Structure Predictions by Multimeric Threading and Template Recombination. Structure 2011, 19, 955–966.
- Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 2004, 337, 635–645.
- Deng, X.; Eickholt, J.; Cheng, J.L. PreDisorder: Ab initio sequence-based prediction of protein disordered regions. BMC Bioinform. 2009, 10, 436.
- Xue, B.; Dunbrack, R.L.; Williams, R.W.; Dunker, A.K.; Uversky, V.N. PONDR-FIT: A meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta 2010, 1804, 996–1010.
- Zhang, T.; Faraggi, E.; Xue, B.; Dunker, A.K.; Uversky, V.N.; Zhou, Y.Q. SPINE-D: Accurate Prediction of Short and Long Disordered Regions by a Single Neural-Network Based Method. J. Biomol. Struct. Dyn. 2012, 29, 799–813.
- Tardif, M.; Atteia, A.; Specht, M.; Cogne, G.; Rolland, N.; Brugiere, S.; Hippler, M.; Ferro, M.; Bruley, C.; Peltier, G.; et al. PredAlgo: A New Subcellular Localization Prediction Tool Dedicated to Green Algae. Mol. Biol. Evol. 2012, 29, 3625–3639.
- Yu, C.S.; Chen, Y.C.; Lu, C.H.; Hwang, J.K. Prediction of protein subcellular localization. Proteins 2006, 64, 643–651.
- Li, Z.X.; Yang, Y.D.; Zhan, J.; Dai, L.; Zhou, Y.Q. Energy Functions in De Novo Protein Design: Current Challenges and Future Prospects. Annu. Rev. Biophys. 2013, 42, 315–335.
- Xiong, P.; Wang, M.; Zhou, X.; Zhang, T.; Zhang, J.; Chen, Q.; Liu, H. Protein design with a comprehensive statistical energy function and boosted by experimental selection for foldability. Nat. Commun. 2014, 5, 5330.
- Gebhard, L.G.; Risso, V.A.; Santos, J.; Ferreyra, R.G.; Noguera, M.E.; Ermacora, M.R. Mapping the distribution of conformational information throughout a protein sequence. J. Mol. Biol. 2006, 358, 280–288.
- Michalsky, E.; Goede, A.; Preissner, R. Loops In Proteins (LIP)—A comprehensive loop database for homology modelling. Protein Eng. 2003, 16, 979–985.
- Hu, X.; Wang, H.; Ke, H.; Kuhlman, B. High-resolution design of a protein loop. Proc. Natl. Acad. Sci. USA 2007, 104, 17668–17673.
- Kabsch, W.; Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637.
- Pauling, L.; Corey, R.B. Configurations of Polypeptide Chains with Favored Orientations Around Single Bonds: Two New Pleated Sheets. Proc. Natl. Acad. Sci. USA 1951, 37, 729–740.
- Pauling, L.; Corey, R.B.; Branson, H.R. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl. Acad. Sci. USA 1951, 37, 205–211.
- Chou, P.Y.; Fasman, G.D. Prediction of protein conformation. Biochemistry 1974, 13, 222–245.
- Garnier, J.; Osguthorpe, D.J.; Robson, B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 1978, 120, 97–120.
- Rost, B.; Sander, C. Improved Prediction of Protein Secondary Structure by Use of Sequence Profiles and Neural Networks. Proc. Natl. Acad. Sci. USA 1993, 90, 7558–7562.
- Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.H.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
- Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 1999, 292, 195–202.
- Rost, B. Review: Protein secondary structure prediction continues to rise. J. Struct. Biol. 2001, 134, 204–218.
- Pollastri, G.; Przybylski, D.; Rost, B.; Baldi, P. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins 2002, 47, 228–235.
- Dor, O.; Zhou, Y. Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training. Proteins 2007, 66, 838–845.
- Cole, C.; Barber, J.D.; Barton, G.J. The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 2008, 36, W197–W201.
- Mirabello, C.; Pollastri, G. Porter, PaleAle 4.0: High-accuracy prediction of protein secondary structure and relative solvent accessibility. Bioinformatics 2013, 29, 2056–2058.
- Heffernan, R.; Yang, Y.; Paliwal, K.; Zhou, Y. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 2017, 33, 2842–2849.
- Hanson, J.; Paliwal, K.; Litfin, T.; Yang, Y.; Zhou, Y. Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks. Bioinformatics 2019, 35, 2403–2410.
- Wang, S.; Peng, J.; Ma, J.Z.; Xu, J.B. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci. Rep. 2016, 6, 18962.
- Torrisi, M.; Kaleel, M.; Pollastri, G. Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction. Sci. Rep. 2019, 9, 12374.
- Heffernan, R.; Paliwal, K.; Lyons, J.; Singh, J.; Yang, Y.; Zhou, Y. Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning. J. Comput. Chem. 2018, 39, 2210–2216.
- Fang, C.; Shang, Y.; Xu, D. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction. Proteins 2018, 86, 592–598.
- Klausen, M.S.; Jespersen, M.C.; Nielsen, H.; Jensen, K.K.; Jurtz, V.I.; Sonderby, C.K.; Sommer, M.O.A.; Winther, O.; Nielsen, M.; Petersen, B.; et al. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins 2019, 87, 520–527.
- Zhou, J.; Wang, H.; Zhao, Z.; Xu, R.; Lu, Q. CNNH_PSS: Protein 8-class secondary structure prediction by convolutional neural network with highway. BMC Bioinform. 2018, 19, 60.
- Levin, J.M.; Pascarella, S.; Argos, P.; Garnier, J. Quantification of secondary structure prediction improvement using multiple alignments. Protein Eng. 1993, 6, 849–854.
- Rost, B.; Sander, C.; Schneider, R. Redefining the goals of protein secondary structure prediction. J. Mol. Biol. 1994, 235, 13–26.
- Zhang, W.; Dunker, A.K.; Zhou, Y. Assessing secondary structure assignment of protein structures by using pairwise sequence-alignment benchmarks. Proteins 2008, 71, 61–67.
- Kuziemko, A.; Honig, B.; Petrey, D. Using structure to explore the sequence alignment space of remote homologs. PLoS Comput. Biol. 2011, 7, e1002175.
- Pascarella, S.; Argos, P. A data bank merging related protein structures and sequences. Protein Eng. 1992, 5, 121–137.
- Zemla, A.; Venclovas, C.; Fidelis, K.; Rost, B. A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins 1999, 34, 220–223.
- Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247, 536–540.
- Madej, T.; Lanczycki, C.J.; Zhang, D.; Thiessen, P.A.; Geer, R.C.; Marchler-Bauer, A.; Bryant, S.H. MMDB and VAST+: Tracking structural similarities between macromolecular complexes. Nucleic Acids Res. 2014, 42, D297–D303.
- NCBI nr-PDB: Non-Redundant PDB Data Set for VAST. Available online: https://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml (accessed on 21 September 2021).
- Fu, L.; Niu, B.; Zhu, Z.; Wu, S.; Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 2012, 28, 3150–3152.
- Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26, 2460–2461.
- Steinegger, M.; Soding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 2017, 35, 1026–1028.
- Fox, N.K.; Brenner, S.E.; Chandonia, J.M. SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 2014, 42, D304–D309.
- UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2014, 42, D191–D198.
- Touw, W.G.; Baakman, C.; Black, J.; te Beek, T.A.; Krieger, E.; Joosten, R.P.; Vriend, G. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 2015, 43, D364–D368.
- Zhu, J.; Weng, Z. FAST: A novel protein structure alignment algorithm. Proteins 2005, 58, 618–627.
- Lo Conte, L.; Ailey, B.; Hubbard, T.J.; Brenner, S.E.; Murzin, A.G.; Chothia, C. SCOP: A structural classification of proteins database. Nucleic Acids Res. 2000, 28, 257–259.
- Lo, W.C.; Lee, C.Y.; Lee, C.C.; Lyu, P.C. iSARST: An integrated SARST web server for rapid protein structural similarity searches. Nucleic Acids Res. 2009, 37, W545–W551.
- Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
- Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276–277.
- Myers, E.W.; Miller, W. Optimal alignments in linear space. Comput. Appl. Biosci. 1988, 4, 11–17.
- Smith, T.F.; Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 1981, 147, 195–197.
- Needleman, S.B.; Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970, 48, 443–453.
- Zhang, Y.; Skolnick, J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 2005, 33, 2302–2309.
- Lo, W.C.; Huang, P.J.; Chang, C.H.; Lyu, P.C. Protein structural similarity search by Ramachandran codes. BMC Bioinform. 2007, 8, 307.
- Faraggi, E.; Zhang, T.; Yang, Y.; Kurgan, L.; Zhou, Y. SPINE X: Improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles. J. Comput. Chem. 2012, 33, 259–267.
- Yaseen, A.; Li, Y. Context-based features enhance protein secondary structure prediction accuracy. J. Chem. Inf. Model. 2014, 54, 992–1002.
- Heffernan, R.; Paliwal, K.; Lyons, J.; Dehzangi, A.; Sharma, A.; Wang, J.; Sattar, A.; Yang, Y.; Zhou, Y. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci. Rep. 2015, 5, 11476.
- Wang, Z.; Zhao, F.; Peng, J.; Xu, J. Protein 8-class secondary structure prediction using conditional neural fields. Proteomics 2011, 11, 3786–3792.
- Magnan, C.N.; Baldi, P. SSpro/ACCpro 5: Almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics 2014, 30, 2592–2597.
- Moult, J.; Fidelis, K.; Kryshtafovych, A.; Schwede, T.; Tramontano, A. Critical assessment of methods of protein structure prediction (CASP)-Round XII. Proteins 2018, 86 (Suppl. 1), 7–15.
- Zhou, J.; Troyanskaya, O.G. Deep supervised and convolutional generative stochastic network for protein secondary structure prediction. In Proceedings of the 31st International Conference on International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Volume 32, pp. 745–753.
- Zhang, B.; Li, J.; Lu, Q. Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinform. 2018, 19, 1–13.
- Chen, T.R.; Juan, S.H.; Huang, Y.W.; Lin, Y.C.; Lo, W.C. A secondary structure-based position-specific scoring matrix applied to the improvement in protein secondary structure prediction. PLoS ONE 2021, 16, e0255076.
- Chen, T.R.; Lo, C.H.; Juan, S.H.; Lo, W.C. The influence of dataset homology and a rigorous evaluation strategy on protein secondary structure prediction. PLoS ONE 2021, 16, e0254555.
- Wilson, C.A.; Kreychman, J.; Gerstein, M. Assessing annotation transfer for genomics: Quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J. Mol. Biol. 2000, 297, 233–249.
- Hobohm, U.; Scharf, M.; Schneider, R.; Sander, C. Selection of representative protein data sets. Protein Sci. 1992, 1, 409–417.
- Hubbard, S.J.; Thornton, J.M. NACCESS V2.1.1. Computer Program, Department of Biochemistry and Molecular Biology, University College London. 1993. Available online: http://www.bioinf.manchester.ac.uk/naccess/ (accessed on 21 September 2021).
- Vander Meersche, Y.; Cretin, G.; de Brevern, A.G.; Gelly, J.C.; Galochkina, T. MEDUSA: Prediction of Protein Flexibility from Sequence. J. Mol. Biol. 2021, 433, 166882.
- de Brevern, A.G. Impact of protein dynamics on secondary structure prediction. Biochimie 2020, 179, 14–22.
- Shih, C.H.; Chang, C.M.; Lin, Y.S.; Lo, W.C.; Hwang, J.K. Evolutionary information hidden in a single protein structure. Proteins 2012, 80, 1647–1657.
- Joseph, A.P.; Agarwal, G.; Mahajan, S.; Gelly, J.C.; Swapna, L.S.; Offmann, B.; Cadet, F.; Bornot, A.; Tyagi, M.; Valadie, H.; et al. A short survey on protein blocks. Biophys. Rev. 2010, 2, 137–147.
SSE Set / Method | TS115 vs. UniRef90-2015 | TS115 vs. UniRef90-2015 | CASP12 vs. UniRef90-2015 | CASP12 vs. UniRef90-2015
---|---|---|---|---
Three-state | Q3 (reported) | Q3 | Q3 (reported) | Q3
Scorpion ¹ | 0.817 | 0.815 | 0.805 | 0.823
Spider2 ¹ | 0.819 | 0.819 | 0.798 | 0.812
SpineX ¹ | 0.801 | 0.800 | 0.769 | 0.783
Psipred ¹ | 0.802 | 0.801 | 0.780 | 0.783
DeepCNF ¹ | 0.823 | 0.819 | 0.821 | 0.828
RaptorX ² | 0.812 | 0.807 | 0.791 | 0.793
SSpro8 ² | 0.795 | 0.788 | 0.776 | 0.779
Eight-state | Q8 (reported) | Q8 | Q8 (reported) | Q8
DeepCNF ¹ | 0.720 | 0.703 | 0.730 | 0.728
RaptorX ² | 0.697 | 0.698 | 0.651 | 0.694
SSpro8 ¹ | 0.680 | 0.671 | 0.690 | 0.675


Identity Range (%) | Yang et al. [1] | PSI-BLAST (Avg ± Std) | FAST (Avg ± Std) | Identity Threshold (%) | Yang et al. [1] | PSI-BLAST (Avg ± Std) | FAST (Avg ± Std) |
---|---|---|---|---|---|---|---|
100 | 96.8 | 97.1 ± 0.3 | 97.1 ± 0.3 | 100 | 96.8 | 97.1 ± 0.3 | 97.1 ± 0.3 |
90‒100 | 96.2 | 95.6 ± 0.4 | 95.7 ± 0.4 | ≥90 | 96.8 | 96.1 ± 0.3 | 96.2 ± 0.4 |
80‒90 | 95.1 | 93.5 ± 1.3 | 94.0 ± 1.2 | ≥80 | 96.6 | 95.6 ± 0.5 | 95.8 ± 0.5 |
70‒80 | 95.8 | 92.5 ± 1.6 | 93.2 ± 1.3 | ≥70 | 96.5 | 95.4 ± 0.6 | 95.6 ± 0.5 |
60‒70 | 94.7 | 92.9 ± 0.8 | 93.4 ± 0.7 | ≥60 | 96.3 | 95.2 ± 0.6 | 95.5 ± 0.6 |
55‒60 | 94.2 | 92.6 ± 0.7 | 93.1 ± 0.8 | ≥55 | 96.2 | 95.0 ± 0.6 | 95.3 ± 0.6 |
50‒55 | 93.1 | 92.3 ± 0.6 | 92.8 ± 0.6 | ≥50 | 96.0 | 94.8 ± 0.6 | 95.1 ± 0.6 |
45‒50 | 92.6 | 91.8 ± 0.9 | 92.6 ± 0.8 | ≥45 | 95.7 | 94.6 ± 0.6 | 94.9 ± 0.6 |
40‒45 | 91.0 | 90.1 ± 1.3 | 91.2 ± 1.3 | ≥40 | 95.1 | 94.1 ± 0.7 | 94.5 ± 0.7 |
35‒40 | 89.9 | 89.6 ± 0.8 | 90.8 ± 0.9 | ≥35 | 94.6 | 93.6 ± 0.7 | 94.1 ± 0.7 |
30‒35 | 87.7 | 87.8 ± 0.8 | 90.2 ± 0.4 | ≥30 | 93.4 | 92.4 ± 0.7 | 93.3 ± 0.6 |
25‒30 | 84.2 | 84.2 ± 0.5 | 89.0 ± 0.4 | ≥25 | 90.7 | 89.1 ± 0.6 | 91.6 ± 0.5 |
20‒25 | 81.1 | 81.4 ± 0.3 | 88.2 ± 0.2 | ≥20 | 87.0 | 86.0 ± 0.5 | 90.2 ± 0.4 |
15‒20 | 75.9 | 77.5 ± 0.4 | 87.2 ± 0.2 | ≥15 | 84.0 | 83.8 ± 0.4 | 89.4 ± 0.3 |
10‒15 | 71.4 | 72.1 ± 0.9 | 85.7 ± 0.3 | ≥10 | 83.7 | 80.8 ± 0.5 | 88.5 ± 0.3 |


Identity Range (%) | PSI-BLAST (Avg ± Std) | FAST (Avg ± Std) | Identity Threshold (%) | PSI-BLAST (Avg ± Std) | FAST (Avg ± Std) |
---|---|---|---|---|---|
100 | 94.8 ± 0.4 | 94.9 ± 0.4 | 100 | 94.8 ± 0.4 | 94.9 ± 0.4 |
90‒100 | 92.2 ± 0.7 | 92.6 ± 0.7 | ≥90 | 93.1 ± 0.5 | 93.4 ± 0.5 |
80‒90 | 88.5 ± 2.0 | 89.6 ± 1.7 | ≥80 | 92.3 ± 0.7 | 92.7 ± 0.7 |
70‒80 | 86.7 ± 2.8 | 88.1 ± 2.2 | ≥70 | 92.0 ± 0.9 | 92.4 ± 0.8 |
60‒70 | 87.2 ± 1.3 | 88.4 ± 1.1 | ≥60 | 91.6 ± 0.9 | 92.1 ± 0.9 |
55‒60 | 86.2 ± 1.8 | 87.3 ± 1.8 | ≥55 | 91.2 ± 1.0 | 91.7 ± 0.9 |
50‒55 | 85.8 ± 0.5 | 87.0 ± 0.5 | ≥50 | 90.8 ± 1.0 | 91.4 ± 0.9 |
45‒50 | 85.3 ± 1.2 | 86.8 ± 1.1 | ≥45 | 90.4 ± 1.0 | 91.1 ± 0.9 |
40‒45 | 82.6 ± 2.0 | 84.6 ± 1.9 | ≥40 | 89.6 ± 1.1 | 90.4 ± 1.0 |
35‒40 | 81.6 ± 1.2 | 83.9 ± 1.3 | ≥35 | 88.6 ± 1.1 | 89.6 ± 1.1 |
30‒35 | 78.9 ± 1.1 | 82.7 ± 0.5 | ≥30 | 86.7 ± 1.1 | 88.2 ± 0.9 |
25‒30 | 74.0 ± 0.7 | 80.9 ± 0.6 | ≥25 | 81.6 ± 0.9 | 85.3 ± 0.8 |
20‒25 | 70.7 ± 0.4 | 79.7 ± 0.3 | ≥20 | 77.2 ± 0.7 | 83.0 ± 0.6 |
15‒20 | 65.9 ± 0.4 | 78.3 ± 0.3 | ≥15 | 74.2 ± 0.6 | 81.8 ± 0.5 |
10‒15 | 59.7 ± 1.0 | 76.5 ± 0.5 | ≥10 | 70.5 ± 0.7 | 80.4 ± 0.5 |
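The tables above report each value both for individual identity ranges and cumulatively for all pairs at or above an identity threshold. Assuming that a cumulative value is a weighted average of the range values it covers (cf. Section 2.8.3), with weights such as the number of aligned residue pairs in each range, the relationship can be sketched in Python as follows. The function name `cumulative_weighted` and the example counts are hypothetical and do not come from the study.

```python
from typing import List, Tuple


def cumulative_weighted(bins: List[Tuple[float, float, int]], threshold: float) -> float:
    """Weighted mean accuracy over identity ranges whose lower bound >= threshold.

    Each bin is (lower_bound_percent, accuracy_percent, weight). The weight is
    assumed (hypothetically) to be the number of aligned residue pairs in that range.
    """
    selected = [(acc, w) for lower, acc, w in bins if lower >= threshold]
    total = sum(w for _, w in selected)
    if total == 0:
        raise ValueError("no weighted bins at or above the threshold")
    return sum(acc * w for acc, w in selected) / total


# Hypothetical ranges and counts, for illustration only (not the study's data).
example_bins = [(100, 97.0, 500), (90, 95.0, 300), (80, 90.0, 100)]
print(round(cumulative_weighted(example_bins, 90), 2))  # prints 96.25
print(round(cumulative_weighted(example_bins, 80), 2))  # prints 95.56
```

Because it is a weighted mean, a cumulative value always lies between the smallest and largest range values it includes, which gives a quick consistency check when reading the two halves of each table.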
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ho, C.-T.; Huang, Y.-W.; Chen, T.-R.; Lo, C.-H.; Lo, W.-C. Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021, 11, 1627. https://doi.org/10.3390/biom11111627