Recent Advances in the Prediction of Protein Structural Classes: Feature Descriptors and Machine Learning Algorithms
Abstract
1. Introduction
2. Datasets
3. Feature Extraction
3.1. Amino Acid Composition
Feature Types | Description | References |
---|---|---|
Amino Acid Composition | Simplest, primary, and most fundamental descriptor | [8,9,22] |
Sequence Order | Captures all possible combinations of amino acids in oligomeric proteins; yields an exceptionally large number of features | [40,41,42,43,44,45] |
Physicochemical Properties | Groups amino acids by their properties; composition, order, and position-specific information are usually extracted | [1,36,46] |
Multiprofile Bayes | Incorporates both position-specific information and the posterior probability of each amino acid type | [16,47,48] |
Secondary-Structure-Based Features | Groups amino acids according to their tendency to form a specific secondary structural element | [1,23,25,48] |
PSSM-based Probability | Incorporates evolutionary information through a position-specific scoring matrix (PSSM) | [16,18,19,49] |
Fourier-Transform-Based Features | Extracts low-frequency coefficients in the frequency domain | [15,27,50,51] |
Functional Domain Composition | Converts a protein sequence into a sequence of functional domain types | [37] |
Split Amino Acid Composition | Incorporates both position-specific information and amino acid composition | [16] |
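Amino acid composition, the first descriptor in the table, reduces a sequence to the relative frequencies of the 20 standard residues. Split amino acid composition applies the same count to separate segments of the chain. The Python sketch below illustrates both and is not taken from any of the cited studies. Two choices are assumptions made for illustration: non-standard symbols are skipped, and `split_amino_acid_composition` uses 25-residue terminal segments by default. Published split-composition variants choose their own segment boundaries.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """20-dimensional AAC vector: relative frequency of each standard residue.

    Non-standard symbols (X, B, Z, ...) are skipped, an assumption of this sketch.
    """
    residues = [aa for aa in seq.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(aa) / n for aa in STANDARD_AA]


def split_amino_acid_composition(seq: str, terminal_len: int = 25) -> list[float]:
    """60-dimensional vector: AAC of the N-terminal, middle, and C-terminal parts.

    terminal_len is a hypothetical default, not a value from the reviewed studies.
    """
    seq = seq.upper()
    if len(seq) <= 2 * terminal_len:
        raise ValueError("sequence too short for the chosen terminal length")
    parts = (seq[:terminal_len], seq[terminal_len:-terminal_len], seq[-terminal_len:])
    return [value for part in parts for value in amino_acid_composition(part)]


print(amino_acid_composition("ACDA")[:3])  # frequencies of A, C, D -> [0.5, 0.25, 0.25]
```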
3.2. Sequence Order
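Sequence-order descriptors such as dipeptide composition count ordered residue pairs. This is why the feature space grows so quickly: pairs alone already give 20 × 20 = 400 dimensions, which are the 400 dipeptide components used in [40]. The following is a minimal Python sketch. It assumes that pairs containing a non-standard symbol are simply skipped. The `gap` parameter pairs residue i with residue i + gap + 1; it is a common generalization, included here only for illustration.

```python
from itertools import product

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY"
DIPEPTIDES = ["".join(pair) for pair in product(STANDARD_AA, repeat=2)]


def dipeptide_composition(seq: str, gap: int = 0) -> list[float]:
    """Relative frequency of each ordered pair (seq[i], seq[i + gap + 1]).

    gap=0 gives ordinary adjacent dipeptide composition.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("no valid residue pair found")
    return [counts[d] / total for d in DIPEPTIDES]
```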
3.3. Physicochemical Properties
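Physicochemical descriptors in the style of Dubchak et al. [1] first map each residue to a property group and then describe the resulting reduced sequence. The sketch below computes two of the three usual descriptor families for a single property:

- composition: the fraction of residues in each group;
- transition: the fraction of adjacent positions where the group changes.

The three-group hydrophobicity partition is the one commonly used in composition/transition/distribution (CTD) implementations, but treat it as an assumption here. Other properties, such as polarity or van der Waals volume, would plug in as further group maps. The distribution descriptors are omitted for brevity.

```python
# Three-group hydrophobicity partition (assumed; other partitions exist).
HYDROPHOBICITY_GROUPS = {
    **dict.fromkeys("RKEDQN", 1),   # polar
    **dict.fromkeys("GASTPHY", 2),  # neutral
    **dict.fromkeys("CLVIMFW", 3),  # hydrophobic
}


def ctd_composition_transition(seq, groups=HYDROPHOBICITY_GROUPS):
    """Return [C1, C2, C3, T12, T13, T23] for one residue grouping."""
    encoded = [groups[aa] for aa in seq.upper() if aa in groups]
    n = len(encoded)
    if n < 2:
        raise ValueError("need at least two classifiable residues")
    # Composition: share of residues in each group
    composition = [encoded.count(g) / n for g in (1, 2, 3)]
    # Transition: share of neighbouring pairs that switch between two groups
    changes = {(1, 2): 0, (1, 3): 0, (2, 3): 0}
    for a, b in zip(encoded, encoded[1:]):
        if a != b:
            changes[(min(a, b), max(a, b))] += 1
    transition = [changes[key] / (n - 1) for key in ((1, 2), (1, 3), (2, 3))]
    return composition + transition
```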
3.4. Multiprofile Bayes
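Multiprofile Bayes extends the bi-profile Bayes encoding of Shao et al. [47] from two classes to several. For every class, position-specific residue frequencies are estimated from the training sequences. A query sequence is then encoded by looking up, class by class, the probability of its own residue at each position. The minimal sketch below makes three simplifying assumptions:

- only the first `window` residues are profiled;
- counts are Laplace-smoothed (+1);
- class labels are sortable values.

In practice the profiles must be estimated on training folds only. Otherwise the encoding leaks label information into the test set.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def build_position_profiles(sequences_by_class, window):
    """Per class, estimate P(residue | position j, class) for j < window.

    Laplace smoothing (+1 per residue type) is an assumption of this sketch.
    """
    profiles = {}
    for label, seqs in sequences_by_class.items():
        counts = [dict.fromkeys(STANDARD_AA, 1) for _ in range(window)]
        for seq in seqs:
            for j, aa in enumerate(seq.upper()[:window]):
                if aa in counts[j]:
                    counts[j][aa] += 1
        profile = []
        for column in counts:
            total = sum(column.values())
            profile.append({aa: c / total for aa, c in column.items()})
        profiles[label] = profile
    return profiles


def multiprofile_bayes_encode(seq, profiles, window):
    """Concatenate, class by class, the profile probability of each residue."""
    seq = seq.upper()
    features = []
    for label in sorted(profiles):
        for j in range(window):
            aa = seq[j] if j < len(seq) else None  # short sequences padded with 0.0
            features.append(profiles[label][j].get(aa, 0.0))
    return features
```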
3.5. Secondary-Structure-Based Features
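Secondary-structure-based descriptors are frequently derived from a predicted three-state string: H for helix, E for strand, C for coil. Such a string is produced by a predictor such as PSIPRED. The following minimal sketch assumes this string is already available. From it, it derives two simple quantities per state: the content and the length of the longest segment, both normalized by sequence length. The exact feature sets of the cited predictors differ; these two quantities are only illustrative.

```python
from itertools import groupby


def secondary_structure_features(ss: str) -> dict[str, float]:
    """Content and normalized longest-segment length for states H, E, and C."""
    ss = ss.upper()
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary-structure string")
    features = {}
    for state in "HEC":
        features[f"{state}_content"] = ss.count(state) / n
        # groupby splits the string into runs of identical states
        longest = max((len(list(run)) for s, run in groupby(ss) if s == state),
                      default=0)
        features[f"{state}_max_segment"] = longest / n
    return features
```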
3.6. Others
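Fourier-transform-based features, listed in the descriptor table, turn the residue sequence into a numeric signal and keep only the low-frequency part of its spectrum. Below is a minimal stdlib-only sketch. It assumes the Kyte-Doolittle hydropathy scale as the numeric encoding. The features are taken to be the magnitudes of the first `n_coeffs` non-constant DFT coefficients. The cited methods differ in both the property scale and the number of coefficients they retain.

```python
import cmath

# Kyte-Doolittle hydropathy scale; the choice of scale is an assumption here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def low_frequency_fourier_features(seq: str, n_coeffs: int = 5) -> list[float]:
    """Normalized magnitudes |F_k| / N for k = 1..n_coeffs of the mean-centered
    hydropathy signal (k = 0 is dropped because centering makes it zero)."""
    signal = [KYTE_DOOLITTLE[aa] for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    n = len(signal)
    if n <= n_coeffs:
        raise ValueError("sequence too short for the requested coefficients")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    features = []
    for k in range(1, n_coeffs + 1):
        # Direct evaluation of one DFT coefficient
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        features.append(abs(coeff) / n)
    return features
```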
4. Feature Selection
4.1. Minimum Redundancy-Maximum Relevance (mRMR)
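The mRMR criterion of Peng et al. [60] adds features greedily. At each step it chooses the candidate that maximizes its mutual information with the class label minus its mean mutual information with the features already chosen (the difference, or MID, form). The minimal sketch below works on features that are already discrete. Continuous descriptors such as compositions would first have to be binned; that discretization step is left out here.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """I(X; Y) in nats, estimated from co-occurrence counts of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR (MID form). `features` is a list of columns; returns indices."""
    relevance = [mutual_information(f, labels) for f in features]
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(features)):
        def score(i):
            redundancy = sum(mutual_information(features[i], features[j])
                             for j in selected)
            return relevance[i] - redundancy / len(selected)

        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

In the test case below, a duplicated column is skipped in favour of an independent one that carries new information about the label.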
4.2. Genetic Algorithm
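In a genetic-algorithm wrapper, each chromosome is a bit mask over the feature vector. Its fitness is typically the cross-validated accuracy of a classifier trained on the selected columns. Wrapper approaches such as the GA + SVM combinations cited in [23,26] couple this kind of search with an SVM.

The sketch below is a generic, minimal GA with tournament selection, one-point crossover, bit-flip mutation, and elitism. The operators and default rates are illustrative assumptions. `fitness` is a user-supplied callable.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return the best bit mask found; bit i == 1 keeps feature i.

    `fitness` maps a tuple of 0/1 bits to a score to maximize.
    """
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(population, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = [best]  # elitism: the best mask always survives
        while len(offspring) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1
            child = tuple(1 - bit if rng.random() < mutation_rate else bit
                          for bit in child)  # bit-flip mutation
            offspring.append(child)
        population = offspring
        best = max(population, key=fitness)
    return best
```

Because of elitism, the returned mask is never worse than the best mask in the initial population. A real `fitness` would usually also penalize large masks.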
4.3. Particle Swarm Optimization (PSO)
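PSO [63] moves a swarm of candidate solutions, each pulled toward its own best position and toward the swarm's best position. For feature selection, a binary variant is commonly used. In it, each velocity component is passed through a sigmoid to give the probability that the corresponding feature is selected.

The following is a minimal sketch of that binary scheme. The inertia weight `w`, the acceleration constants, the velocity clamp, and the swarm size are illustrative values, not settings taken from the reviewed studies.

```python
import math
import random


def binary_pso(n_features, fitness, n_particles=15, iterations=40,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO for feature selection; returns (best_mask, best_score).

    `fitness` maps a list of 0/1 bits to a score to maximize.
    """
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=personal_score.__getitem__)
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                velocities[i][d] = max(-v_max, min(v_max, v))  # velocity clamp
                # Sigmoid of the velocity = probability that bit d is switched on
                prob_one = 1.0 / (1.0 + math.exp(-velocities[i][d]))
                positions[i][d] = 1 if rng.random() < prob_one else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score
```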
4.4. Principal Component Analysis (PCA)
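PCA replaces correlated descriptors with a few orthogonal directions of maximal variance: the leading eigenvectors of the feature covariance matrix. The minimal, dependency-free sketch below finds only the first principal component, by power iteration. Further components would need deflation, and a real pipeline would normally call a linear-algebra library instead. The iteration count is an arbitrary illustrative choice.

```python
import math


def first_principal_component(data, iterations=200):
    """Return (unit loading vector, projected scores) for the first PC of `data`,
    a list of equal-length numeric rows (samples x features)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    vec = [1.0] * d  # start vector, assumed not orthogonal to the top eigenvector
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            raise ValueError("power iteration collapsed (zero variance?)")
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores
```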
5. Classification Models
5.1. Artificial Neural Networks (ANNs)
5.2. Support Vector Machine (SVM)
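An SVM [92] separates classes with a maximum-margin hyperplane. Protein structural class studies mostly rely on kernel SVM packages such as LIBSVM [93]. To show the optimization itself without any dependency, the sketch below trains a binary linear SVM. It uses Pegasos-style stochastic sub-gradient descent on the regularized hinge loss, which is a swapped-in solver, not the one used in the cited studies. Labels are +1/-1. The bias is handled as an extra constant feature, so it is regularized too; this is a simplification. `lam` and `epochs` are illustrative. A four-class structural class problem would combine several such binary models, for example one-vs-rest or arranged in a binary tree.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).

    X: list of feature rows; y: labels in {+1, -1}. A constant 1.0 is appended
    to every row so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink step
            if margin < 1.0:  # hinge loss active: move toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict_linear_svm(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```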
5.3. K-Nearest Neighbor (KNN)
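A k-nearest-neighbor classifier assigns a query the majority label among its k closest training vectors. The minimal sketch below uses Euclidean distance and unweighted voting. The optimized evidence-theoretic and fuzzy variants listed among the classification models replace this plain vote with weighted evidence or membership degrees, which is not shown here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training vectors nearest to `query`."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common orders equal counts by first occurrence, i.e. nearer neighbor first
    return votes.most_common(1)[0][0]
```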
5.4. Random Forest
5.5. Logistic Regression
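Multinomial (softmax) logistic regression models the probability of each structural class as a softmax over linear scores. It is fitted by minimizing the cross-entropy. The minimal sketch below uses full-batch gradient descent. The learning rate, epoch count, and the optional L2 penalty `l2` are illustrative. Class labels are assumed to be integers 0..n_classes-1.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """Class probabilities for one feature vector (bias is the last weight)."""
    xa = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, xa)) for row in W])


def train_softmax_regression(X, y, n_classes, lr=0.1, epochs=2000, l2=0.0):
    """Full-batch gradient descent on mean cross-entropy (+ optional L2)."""
    d = len(X[0]) + 1  # +1 for the bias
    W = [[0.0] * d for _ in range(n_classes)]
    n = len(X)
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            xa = list(x) + [1.0]
            p = predict_proba(W, x)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)  # dLoss/dscore_c
                for j in range(d):
                    grad[c][j] += err * xa[j] / n
        for c in range(n_classes):
            for j in range(d):
                W[c][j] -= lr * (grad[c][j] + l2 * W[c][j])
    return W
```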
5.6. Deep Learning
6. Cross Validation
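Commonly used protocols include k-fold cross-validation and the jackknife (leave-one-out) test. The jackknife is simply k-fold cross-validation with k equal to the number of samples. Below is a minimal sketch of both. `fit` and `predict` are hypothetical user-supplied callables, and the fold assignment (shuffled, then dealt round-robin) is an illustrative choice. Any data-dependent step, such as mRMR selection or multiprofile Bayes profile estimation, should be repeated inside each training fold to avoid optimistic estimates.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k folds; k == n_samples is the jackknife."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # deal shuffled indices round-robin
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])


def cross_val_accuracy(X, y, fit, predict, k):
    """Overall accuracy pooled across the k held-out folds."""
    correct = 0
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)
```

For example, a trivial majority-class "model" evaluated with the jackknife on labels `[1, 1, 1, 1, 0]` misclassifies only the held-out 0, giving an accuracy of 0.8.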
7. Concluding Discussions and Perspectives
Funding
Conflicts of Interest
References
- Dubchak, I.; Muchnik, I.; Holbrook, S.R.; Kim, S.H. Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 1995, 92, 8700–8704.
- Cheng, J.; Baldi, P. A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006, 22, 1456–1463.
- Chou, K.-C. Structural Bioinformatics and its Impact to Biomedical Science. Curr. Med. Chem. 2004, 11, 2105–2134.
- Wei, L.; Liao, M.; Gao, X.; Zou, Q. An Improved Protein Structural Classes Prediction Method by Incorporating Both Sequence and Structure Information. IEEE Trans. NanoBiosci. 2015, 14, 339–349.
- Kurgan, L.; Cios, K.; Chen, K. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences. BMC Bioinform. 2008, 9, 226.
- Cai, Y.D.; Feng, K.Y.; Lu, W.C.; Chou, K.-C. Using LogitBoost classifier to predict protein structural classes. J. Theor. Biol. 2006, 238, 172–176.
- Chou, K.-C. Progress in protein structural class prediction and its impact to bioinformatics and proteomics. Curr. Protein Pept. Sci. 2005, 6, 423–436.
- Bonetta, R.; Valentino, G. Machine learning techniques for protein function prediction. Proteins Struct. Funct. Bioinform. 2020, 88, 397–413.
- Chou, K.-C.; Zhang, C.-T. Prediction of Protein Structural Classes. Crit. Rev. Biochem. Mol. Biol. 1995, 30, 275–349.
- Wang, Z.-X.; Yuan, Z. How good is prediction of protein structural class by the component-coupled method? Proteins Struct. Funct. Bioinform. 2000, 38, 165–175.
- Liu, W.M.; Chou, K.-C. Prediction of protein structural classes by modified mahalanobis discriminant algorithm. J. Protein Chem. 1998, 17, 209–217.
- Lin, C.; Zou, Y.; Qin, J.; Liu, X.; Jiang, Y.; Ke, C.; Zou, Q. Hierarchical Classification of Protein Folds Using a Novel Ensemble Classifier. PLoS ONE 2013, 8, e56499.
- Chen, P.; Liu, C.; Burge, L.; Mahmood, M.; Southerland, W.; Gloster, C. Protein Fold Classification with Genetic Algorithms and Feature Selection. J. Bioinform. Comput. Biol. 2009, 7, 773–788.
- Chen, K.; Kurgan, L. PFRES: Protein fold classification by using evolutionary information and predicted secondary structure. Bioinformatics 2007, 23, 2843–2850.
- Sahu, S.S.; Panda, G. A novel feature representation method based on Chou’s pseudo amino acid composition for protein structural class prediction. Comput. Biol. Chem. 2010, 34, 320–327.
- Hayat, M.; Tahir, M.A.; Khan, S.A. Prediction of protein structure classes using hybrid space of multi-profile Bayes and bigram probability feature spaces. J. Theor. Biol. 2014, 346, 8–15.
- Zhang, T.L.; Ding, Y.S.; Chou, K.-C. Prediction protein structural classes with pseudo-amino acid composition: Approximate entropy and hydrophobicity pattern. J. Theor. Biol. 2008, 250, 186–193.
- Qin, Y.; Zheng, X.; Wang, J.; Chen, M.; Zhou, C. Prediction of protein structural class based on linear predictive coding of psi-blast profiles. Open Life Sci. 2015, 10, 529–536.
- Tao, P.; Liu, T.; Li, X.; Chen, L. Prediction of protein structural class using trigram probabilities of position-specific scoring matrix and recursive feature elimination. Amino Acids 2015, 47, 461–468.
- Kotsiantis, S.B. Supervised machine learning: A review of classification techniques. Informatica 2007, 31, 249–268.
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958.
- Bao, W.; Chen, Y.; Wang, D. Prediction of protein structure classes with flexible neural tree. Bio-Med. Mater. Eng. 2014, 24, 3797–3806.
- Liu, L.; Ma, M.; Zhao, T. A GASVM algorithm for predicting protein structure classes. J. Comput. Commun. 2016, 4, 46–53.
- Xiao, X.; Lin, W.Z.; Chou, K.-C. Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes. J. Comput. Chem. 2008, 29, 2018–2024.
- Nanni, L.; Brahnam, S.; Lumini, A. Prediction of protein structure classes by incorporating different protein descriptors into general Chou’s pseudo amino acid composition. J. Theor. Biol. 2014, 360, 109–116.
- Li, Z.C.; Zhou, X.B.; Lin, Y.R.; Zou, X.Y. Prediction of protein structure class by coupling improved genetic algorithm and support vector machine. Amino Acids 2008, 35, 581–590.
- Li, Z.-C.; Zhou, X.-B.; Dai, Z.; Zou, X.-Y. Prediction of protein structural classes by Chou’s pseudo amino acid composition: Approached using continuous wavelet transform and principal component analysis. Amino Acids 2008, 37, 415–425.
- Cao, Y.; Liu, S.; Zhang, L.; Qin, J.; Wang, J.; Tang, K. Prediction of protein structural class with Rough Sets. BMC Bioinform. 2006, 7, 20.
- Pearl, F.M.; Sillitoe, I.; Orengo, C.A. Protein Structure Classification; American Cancer Society: Atlanta, GA, USA, 2015.
- Chou, K.-C. WITHDRAWN: An insightful recollection for predicting protein subcellular locations in multi-label systems. Genomics 2019.
- Chou, K.-C. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 2011, 273, 236–247.
- Chou, K.-C. Retracted article: An insightful 20-year recollection since the birth of pseudo amino acid components. Amino Acids 2020, 52, 847.
- Chou, K.-C.; Shen, H.-B. Large-scale plant protein subcellular location prediction. J. Cell. Biochem. 2007, 100, 665–678.
- Chou, K.-C.; Elrod, D.W. Protein subcellular location prediction. Protein Eng. Des. Sel. 1999, 12, 107–118.
- Chou, K.-C.; Elrod, D.W. Prediction of membrane protein types and subcellular locations. Proteins 1999, 34, 137–153.
- Chou, K.-C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins Struct. Funct. Bioinform. 2001, 43, 246–255.
- Chou, K.-C.; Cai, Y.-D. Predicting protein structural class by functional domain composition. Biochem. Biophys. Res. Commun. 2004, 321, 1007–1009.
- Bernardes, J.S. A Review of Protein Function Prediction under Machine Learning Perspective. Recent Pat. Biotechnol. 2013, 7, 122–141.
- Shen, H.-B.; Chou, K.-C. PseAAC: A flexible web server for generating various kinds of protein pseudo amino acid composition. Anal. Biochem. 2008, 373, 386–388.
- Lin, H.; Li, Q.-Z. Using pseudo amino acid composition to predict protein structural class: Approached by incorporating 400 dipeptide components. J. Comput. Chem. 2007, 28, 1463–1466.
- Chou, K.-C. Using paircoupled amino acid composition to predict protein secondary structure content. Protein J. 1999, 18, 473–480.
- Liu, W.-M.; Chou, K.-C. Prediction of protein secondary structure content. Protein Eng. Des. Sel. 1999, 12, 1041–1050.
- Yu, T.; Sun, Z.B.; Sang, J.P.; Huang, S.Y.; Zou, X.W. Structural class tendency of polypeptide: A new conception in predicting protein structural class. Phys. Part A Stat. Mech. Appl. 2007, 386, 581–589.
- Rackovsky, S. On the nature of the protein folding code. Proc. Natl. Acad. Sci. USA 1993, 90, 644–648.
- Ding, H.; Lin, H.; Chen, W.; Li, Z.-Q.; Guo, F.-B.; Huang, J.; Rao, N. Prediction of protein structural classes based on feature selection technique. Interdiscip. Sci. Comput. Life Sci. 2014, 6, 235–240.
- Li, W.; Lin, K.; Feng, K.; Cai, Y. Prediction of protein structural classes using hybrid properties. Mol. Divers. 2008, 12, 171–179.
- Shao, J.; Xu, N.; Tsai, S.-N.; Wang, Y.; Ngai, S.-M. Computational Identification of Protein Methylation Sites through Bi-Profile Bayes Feature Extraction. PLoS ONE 2009, 4, e4920.
- Hayat, M.; Khan, A. Memphybrid: Hybrid features-based prediction system for classifying membrane protein types. Anal. Biochem. 2012, 424, 35–44.
- Xia, X.-Y.; Ge, M.; Wang, Z.-X.; Pan, X.-M. Accurate Prediction of Protein Structural Class. PLoS ONE 2012, 7, e37653.
- Zhang, T.-L.; Ding, Y.-S. Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes. Amino Acids 2007, 33, 623–629.
- Yu, B.; Lou, L.; Li, S.; Zhang, Y.; Qiu, W.; Wu, X.; Wang, M.; Tian, B. Prediction of protein structural class for low-similarity sequences using Chou’s pseudo amino acid composition and wavelet denoising. J. Mol. Graph. Model. 2017, 76, 260–273.
- Dubchak, I.; Muchnik, I.; Mayor, C.; Dralyuk, I.; Kim, S.H. Recognition of a protein fold in the context of the Structural Classification of Proteins (SCOP) classification. Proteins Struct. Funct. Bioinform. 1999, 35, 401–407.
- Anand, A.; Pugalenthi, G.; Suganthan, P.N. Predicting protein structural class by svm with class-wise optimized features and decision probabilities. J. Theor. Biol. 2008, 253, 375–380.
- Kawashima, S.; Kanehisa, M. AAindex: Amino acid index database. Nucleic Acids Res. 2000, 28, 374.
- Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y. Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J. Theor. Biol. 2007, 248, 546–551.
- Chou, K.-C. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2004, 21, 10–19.
- Sadique, N.; Ahmed, A.A.N.; Islam, T.; Pervage, N.; Shatabda, S. Image-based effective feature generation for protein structural class and ligand binding prediction. PeerJ Comput. Sci. 2020, 6, e253.
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Feature Selection for High-Dimensional Data; Springer International Publishing: Cham, Switzerland, 2015.
- Cao, D.S.; Xu, Q.S.; Liang, Y.Z. propy: A tool to generate various modes of chou’s Pse-AAc. Bioinformatics 2013, 29, 960–962.
- Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
- Ni, Q.; Chen, L. A feature and algorithm selection method for improving the prediction of protein structural class. Comb. Chem. High Throughput Screen. 2017, 20, 1.
- Jalali-Heravi, M.; Kyani, A. Application of genetic algorithm-kernel partial least square as a novel nonlinear feature selection method: Activity of carbonic anhydrase II inhibitors. Eur. J. Med. Chem. 2007, 42, 649–659.
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN′95—International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
- Kaminski, M. Neural Network Training Using Particle Swarm Optimization—A Case Study. In Proceedings of the 2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR), Miedzyzdroje, Poland, 26–29 August 2019; pp. 115–120.
- Meissner, M.; Schmuker, M.; Schneider, G. Optimized Particle Swarm Optimization (OPSO) and its application to artificial neural network training. BMC Bioinform. 2006, 7, 125.
- Zhang, Y.; Wang, S.; Ji, G. A comprehensive survey on particle swarm optimization algorithm and its applications. Math. Probl. Eng. 2015, 2015, 931256.
- Jolliffe, I.T. Principal component analysis. J. Mark. Res. 2002, 87, 513.
- Jolliffe, I.T. Graphical Representation of Data Using Principal Components. In Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002; pp. 78–110.
- Du, Q.-S.; Jiang, Z.-Q.; He, W.-Z.; Li, D.-P.; Chou, K.-C. Amino Acid Principal Component Analysis (AAPCA) and Its Applications in Protein Structural Class Prediction. J. Biomol. Struct. Dyn. 2006, 23, 635–640.
- Wang, T.; Hu, X.; Cao, X. Identifying Protein Structural Classes Using MVP Algorithm. Int. J. Wirel. Microw. Technol. 2012, 2, 8–12.
- Chou, K.-C. A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins Struct. Funct. Bioinform. 1995, 21, 319–344.
- Chou, K.-C.; Maggiora, G.M. Domain structural class prediction. Protein Eng. 1998, 11, 523–538.
- Nakashima, H.; Nishikawa, K.; Ooi, T. The Folding type of a protein is relevant to the amino acid composition. J. Biochem. 1986, 99, 153–162.
- Chou, K.-C. Prediction of protein folding types from amino acid composition by correlation angles. Amino Acids 1994, 6, 231–246.
- Chou, K.-C.; Zhang, C.-T. A correlation-coefficient method to predicting protein-structural classes from amino acid compositions. Eur. J. Biochem. 1992, 207, 429–433.
- Zhang, C.-T.; Chou, K.-C.; Maggiora, G.M. Predicting protein structural classes from amino acid composition: Application of fuzzy clustering. Protein Eng. 1995, 8, 425–435.
- Ding, Y.S.; Zhang, T.L.; Chou, K.C. Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Pept. Lett. 2007, 14, 811–815.
- Kurgan, L.; Chen, K. Prediction of protein structural class for the twilight zone sequences. Biochem. Biophys. Res. Commun. 2007, 357, 453–460.
- Jahandideh, S.; Abdolmaleki, P.; Jahandideh, M.; Hayatshahi, S.H.S.; Hayatshahi, H.S. Novel hybrid method for the evaluation of parameters contributing in determination of protein structural classes. J. Theor. Biol. 2007, 244, 275–281.
- Nanni, L.; Lumini, A.; Pasquali, F.; Brahnam, S. iProStruct2D: Identifying protein structural classes by deep learning via 2D representations. Expert Syst. Appl. 2020, 142, 113019.
- Jaiswal, M.; Saleem, S.; Kweon, Y.; Draizen, E.J.; Veretnik, S.; Mura, C.; Bourne, P.E. Deep learning of protein structural classes: Any evidence for an ‘Ur-fold’? In Proceedings of the 2020 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 24 April 2020; IEEE: Charlottesville, VA, USA, 2020; pp. 1–6.
- Gao, M.; Zhou, H.; Skolnick, J. DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci. Rep. 2019, 9, 1–13.
- Newaz, K.; Ghalehnovi, M.; Rahnama, A.; Antsaklis, P.J.; Milenkovic, T. Network-based protein structural classification. R. Soc. Open Sci. 2020, 7, 191461.
- Bankapur, S.; Patil, N. An Enhanced Protein Fold Recognition for Low Similarity Datasets Using Convolutional and Skip-Gram Features With Deep Neural Network. IEEE Trans. NanoBioscience 2021, 20, 42–49.
- Panda, B.; Majhi, B. A novel improved prediction of protein structural class using deep recurrent neural network. Evol. Intell. 2018, 4096, 1–8.
- Bishop, C.M. Neural networks for pattern recognition. Agric. Eng. Int. CIGR J. Sci. Res. Dev. Manuscr. PM 1995, 12, 1235–1242.
- Judith, E.D.; Deleo, J.M. Artificial neural networks. Cancer 2001, 91, 1615–1635.
- Chen, Y.; Yang, B.; Dong, J.; Abraham, A. Time-series forecasting using flexible neural tree model. Inf. Sci. 2005, 174, 219–235.
- Yang, B.; Chen, Y.; Jiang, M. Reverse engineering of gene regulatory networks using flexible neural tree models. Neurocomputing 2013, 99, 458–466.
- Park, J.; Sandberg, I.W. Approximation and Radial-Basis-Function Networks. Neural Comput. 1993, 5, 305–316.
- Samantaray, S.; Dash, P.; Panda, G. Fault classification and location using HS-transform and radial basis function neural network. Electr. Power Syst. Res. 2006, 76, 897–905.
- Cortes, C. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Chang, C.-C.; Lin, C.-J. LIBSVM. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
- Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
- Cai, Y.-D.; Liu, X.-J.; Xu, X.-B.; Zhou, G.-P. Support Vector Machines for predicting protein structural class. BMC Bioinform. 2001, 2, 3.
- Fei, B.; Liu, J. Binary tree of SVM: A new fast multiclass training and classification algorithm. IEEE Trans. Neural Networks 2006, 17, 696–704.
- Hasan, A.M.; Nasser, M.; Pal, B.; Ahmad, S. Support Vector Machine and Random Forest Modeling for Intrusion Detection System (IDS). J. Intell. Learn. Syst. Appl. 2014, 6, 45–52.
- Li, J.; Wu, J.; Chen, K. PFP-RFSM: Protein fold prediction by using random forests and sequence motifs. J. Biomed. Sci. Eng. 2013, 6, 1161–1170.
- Deng, L.; Yu, D. Deep learning: Methods and applications. Found. Trends Signal Process. 2014, 7, 197–387.
- Schmidhuber, J. Deep learning in neural networks. Neural Netw. 2015, 61, 85–117.
- Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
- Sidi, T.; Keasar, C. Redundancy-weighting the PDB for detailed secondary structure prediction using deep-learning models. Bioinformatics 2020, 36, 3733–3738.
- Sun, T.; Zhou, B.; Lai, L.; Pei, J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinform. 2017, 18, 1–8.
- Wang, Y.; Mao, H.; Yi, Z. Protein secondary structure prediction by using deep learning method. Knowl. Based Syst. 2017, 118, 115–123.
- Almagro Armenteros, J.J.; Sønderby, C.K.; Sønderby, S.K.; Nielsen, H.B.; Winther, O. DeepLoc: Prediction of protein subcellular localization using deep learning. Bioinformatics 2017, 33, 3387–3395.
- Klausen, M.S.; Jespersen, M.C.; Nielsen, H.; Jensen, K.K.; Jurtz, V.I.; Sønderby, C.K.; Sommer, M.O.A.; Winther, O.; Nielsen, M.; Petersen, B.; et al. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins Struct. Funct. Bioinform. 2019, 87, 520–527.
- Kedarisetti, K.D.; Kurgan, L.; Dick, S. Classifier ensembles for protein structural class prediction with varying homology. Biochem. Biophys. Res. Commun. 2006, 348, 981–988.
- Chen, L.; Lu, L.; Feng, K.; Li, W.; Song, J.; Zheng, L.; Yuan, Y.; Zeng, Z.; Feng, K.; Lu, W.; et al. Multiple classifier integration for the prediction of protein structural classes. J. Comput. Chem. 2009, 30, 2248–2254.
- Rahman, A.F.R.; Alam, H.; Fairhurst, M.C. Multiple classifier combination for character recognition: Revisiting the majority voting system and its variations. In International Workshop on Document Analysis Systems; Springer: Berlin/Heidelberg, Germany, 2002.
- Larrañaga, P.; Calvo, B.; Santana, R.; Bielza, C.; Galdiano, J.; Inza, I.; Lozano, J.A.; Armañanzas, R.; Santafé, G.; Pérez, A.; et al. Machine learning in bioinformatics. Brief. Bioinform. 2006, 7, 86–112.
- Kurgan, L.A.; Homaeian, L. Prediction of structural classes for protein sequences and domains—Impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy. Pattern Recognit. 2006, 39, 2323–2343.
- Zhu, X.-J.; Feng, C.-Q.; Lai, H.-Y.; Chen, W.; Hao, L. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl. Based Syst. 2019, 163, 787–793.
- Zhang, T.-H.; Zhang, S.-W. Advances in the Prediction of Protein Subcellular Locations with Machine Learning. Curr. Bioinform. 2019, 14, 406–421.
- Zhang, L.; Tan, J.; Han, D.; Zhu, H. From machine learning to deep learning: Progress in machine intelligence for rational drug discovery. Drug Discov. Today 2017, 22, 1680–1685.
- Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
- Zhang, S.W.; Fan, X.N. Computational methods for predicting ncrna-protein interactions. Med. Chem. 2017, 13, 515–525.
- Outeiral, C.; Strahm, M.; Shi, J.; Morris, G.M.; Benjamin, S.C.; Deane, C.M. The prospects of quantum computing in computational molecular biology. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2021, 11.
- Mulligan, V.K.; Melo, H.; Merritt, H.I.; Slocum, S.; Weitzner, B.D.; Watkins, A.M.; Renfrew, P.D.; Pelissier, C.; Arora, P.S.; Bonneau, R. Designing Peptides on a Quantum Computer. bioRxiv 2019.
- Li, J.; Feng, Y.; Wang, X.; Li, J.; Liu, W.; Rong, L.; Bao, J. An overview of predictors for intrinsically disordered proteins over 2010–2014. Int. J. Mol. Sci. 2015, 16, 23446–23462.
- Vullo, A.; Bortolami, O.; Pollastri, G.; Tosatto, S.C.E. Spritz: A server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res. 2006, 34, W164–W168.
- Liu, M.-L.; Su, W.; Wang, J.-S.; Yang, Y.-H.; Yang, H.; Lin, H. Predicting Preference of Transcription Factors for Methylated DNA Using Sequence Information. Mol. Ther. Nucleic Acids 2020, 22, 1043–1050.
- Bauer, T.; Eils, R.; König, R. RIP: The regulatory interaction predictor—A machine learning-based approach for predicting target genes of transcription factors. Bioinformatics 2011, 27, 2239–2247.
- Mao, M.; Hu, Y.; Yang, Y.; Qian, Y.; Wei, H.; Fan, W.; Yang, Y.; Li, X.; Wang, Z. Modeling and Predicting the Activities of Trans-Acting Splicing Factors with Machine Learning. Cell Syst. 2018, 7, 510–520.e4.
- Walia, R.R.; Caragea, C.; Lewis, B.A.; Towfic, F.; Terribilini, M.; El-Manzalawy, Y.; Dobbs, D.; Honavar, V. Protein-RNA interface residue prediction using machine learning: An assessment of the state of the art. BMC Bioinform. 2012, 13, 89.
- Walia, R.R.; Xue, L.C.; Wilkins, K.; El-Manzalawy, Y.; Dobbs, D.; Honavar, V. RNABindRPlus: A Predictor that Combines Machine Learning and Sequence Homology-Based Methods to Improve the Reliability of Predicted RNA-Binding Residues in Proteins. PLoS ONE 2014, 9, e97725.
Dataset | All-Alpha | All-Beta | Alpha/Beta | Alpha + Beta | Total |
---|---|---|---|---|---|
640 [16,22,25] | 138 | 154 | 171 | 177 | 640 |
1189 [16,22,23] | 223 | 294 | 334 | 241 | 1092 |
ASTRAL [22] | 639 | 661 | 749 | 764 | 2813 |
C204 [15,22,24,26] | 52 | 61 | 45 | 46 | 204 |
25 PDB [16,18,23,27] | 443 | 443 | 346 | 441 | 1673 |
277 domains [15,26,28] | 70 | 61 | 81 | 65 | 277 |
498 domains [15,26,28] | 107 | 126 | 136 | 129 | 498 |
FC699 [23,25] | 130 | 269 | 377 | 82 | 858 |
ML Algorithms | Recent Variants | References |
---|---|---|
Artificial Neural Network | Flexible Neural Tree | [22] |
Artificial Neural Network | Radial Basis Function Neural Network | [15] |
Support Vector Machine | Binary-Tree Support Vector Machine | [50] |
Support Vector Machine | Improved Genetic Algorithm + Support Vector Machine | [23,26] |
Support Vector Machine | Dual-Layer Fuzzy Support Vector Machine | [77] |
K-Nearest Neighbor | Optimized Evidence-Theoretic K-Nearest Neighbor | [16] |
K-Nearest Neighbor | Fuzzy K-Nearest Neighbor | [17] |
Random Forest | N/A | [4] |
Logistic Regression | Multinomial Logistic Regression + Artificial Neural Network | [79] |
Deep Learning | Deep Recurrent Neural Network | [85] |
Deep Learning | Convolutional Neural Network | [80] |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhu, L.; Davari, M.D.; Li, W. Recent Advances in the Prediction of Protein Structural Classes: Feature Descriptors and Machine Learning Algorithms. Crystals 2021, 11, 324. https://doi.org/10.3390/cryst11040324