GasPhos: Protein Phosphorylation Site Prediction Using a New Feature Selection Approach with a GA-Aided Ant Colony System
Abstract
1. Introduction
2. Results
2.1. Selection of Machine Learning Methods and Heuristic Functions
2.2. Analysis of GA-Aided Strategy
2.3. Full Pseudo-Random Proportional Rule
2.4. Comparison with Other Feature Selection Methods
2.5. Comparison with the Existing Predictors
3. Discussion
3.1. Similarity of Conserved Sequences and Features
3.2. Performance for Different Functional Proteins
3.3. Case Study Predicting the Phosphorylation Sites of Disease-Related Proteins
4. Materials and Methods
4.1. Data Preparation
4.2. Feature Encoding
4.3. Evaluating Machine Learning Methods
4.4. GA-Aided Ant Colony System
4.5. New Ant Colony System
4.5.1. Binary Transformation Strategy
4.5.2. State Transition Rule
4.5.3. Update of Pheromones
4.6. The GA Strategy
4.7. System Implementation
4.8. Evaluation of Classification Performance
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
Kinase | Classifier | All Features | ACSFS (IG) | ACSFS (F-Score) | ACSFS (PCC) | ACSFS (MDGI) |
---|---|---|---|---|---|---|
CDK_S | BFTree | 0.711 | 0.723 | 0.747 | 0.748 | 0.746 |
CDK_T | SimpleCart | 0.764 | 0.770 | 0.767 | 0.773 | 0.776 |
CK2_S | NaiveBayes | 0.630 | 0.699 | 0.692 | 0.702 | 0.700 |
CK2_T | MultiBoostAB | 0.625 | 0.630 | 0.681 | 0.682 | 0.692 |
MAPK_S | BFTree | 0.742 | 0.745 | 0.770 | 0.775 | 0.773 |
MAPK_T | BFTree | 0.843 | 0.850 | 0.859 | 0.849 | 0.854 |
PKA_S | DecisionTable | 0.747 | 0.769 | 0.772 | 0.781 | 0.781 |
PKA_T | RBFNetwork | 0.720 | 0.763 | 0.800 | 0.823 | 0.862 |
PKC_S | DecisionTable | 0.558 | 0.574 | 0.585 | 0.585 | 0.585 |
PKC_T | SimpleLogistic | 0.470 | 0.502 | 0.579 | 0.579 | 0.649 |
Src_Y | NaiveBayes | 0.320 | 0.359 | 0.394 | 0.408 | 0.421 |
Avg. | | 0.648 | 0.671 | 0.695 | 0.700 | 0.713 |
Kinase | ACSFS SN | ACSFS SP | ACSFS ACC | ACSFS MCC | ACSGAFS SN | ACSGAFS SP | ACSGAFS ACC | ACSGAFS MCC |
---|---|---|---|---|---|---|---|---|
CDK_S | 0.838 | 0.906 | 0.872 | 0.746 | 0.845 | 0.904 | 0.874 | 0.751 |
CDK_T | 0.861 | 0.913 | 0.887 | 0.776 | 0.865 | 0.913 | 0.889 | 0.779 |
CK2_S | 0.814 | 0.884 | 0.849 | 0.700 | 0.815 | 0.898 | 0.856 | 0.716 |
CK2_T | 0.813 | 0.875 | 0.844 | 0.692 | 0.825 | 0.875 | 0.850 | 0.705 |
MAPK_S | 0.877 | 0.894 | 0.885 | 0.773 | 0.877 | 0.901 | 0.889 | 0.779 |
MAPK_T | 0.904 | 0.949 | 0.927 | 0.854 | 0.914 | 0.944 | 0.929 | 0.859 |
PKA_S | 0.871 | 0.907 | 0.889 | 0.781 | 0.878 | 0.902 | 0.890 | 0.782 |
PKA_T | 0.905 | 0.951 | 0.929 | 0.862 | 0.888 | 1.000 | 0.944 | 0.896 |
PKC_S | 0.795 | 0.789 | 0.792 | 0.585 | 0.804 | 0.786 | 0.795 | 0.592 |
PKC_T | 0.815 | 0.831 | 0.823 | 0.649 | 0.792 | 0.862 | 0.827 | 0.656 |
Src_Y | 0.715 | 0.704 | 0.709 | 0.421 | 0.752 | 0.715 | 0.733 | 0.469 |
Avg. | 0.837 | 0.873 | 0.855 | 0.713 | 0.841 | 0.882 | 0.862 | 0.726 |
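The SN, SP, ACC, and MCC columns in the table above are presumably the standard sensitivity, specificity, accuracy, and Matthews correlation coefficient. As a minimal sketch of how these four values relate to confusion-matrix counts, the Python snippet below uses the textbook definitions. The function name `classification_metrics` and the counts (1000 positive and 1000 negative sites) are hypothetical, chosen only for illustration; they are not taken from the paper's data.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return (SN, SP, ACC, MCC) from confusion-matrix counts.

    Standard definitions; this helper is illustrative and is not
    the authors' implementation.
    """
    sn = tp / (tp + fn)                    # sensitivity (recall on positives)
    sp = tn / (tn + fp)                    # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# Hypothetical balanced test set: 1000 positive and 1000 negative sites.
sn, sp, acc, mcc = classification_metrics(tp=838, fn=162, tn=906, fp=94)
print(f"SN={sn:.3f} SP={sp:.3f} ACC={acc:.3f} MCC={mcc:.3f}")
# prints "SN=0.838 SP=0.906 ACC=0.872 MCC=0.746"
```

With these assumed counts, the sketch reproduces the CDK_S values in the ACSFS columns (0.838, 0.906, 0.872, 0.746). That agreement is consistent with an evaluation set containing equal numbers of positive and negative sites, although the table alone does not confirm it.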
Kinase | Binary Transformation Strategy | Pseudo-Random Proportional Rule | Full Pseudo-Random Proportional Rule |
---|---|---|---|
CDK_S | 0.751 (50) | 0.743 (104) | 0.755 (33) |
CDK_T | 0.779 (41) | 0.779 (43) | 0.792 (27) |
CK2_S | 0.716 (74) | 0.712 (112) | 0.712 (49) |
CK2_T | 0.705 (71) | 0.705 (50) | 0.758 (44) |
MAPK_S | 0.779 (45) | 0.769 (83) | 0.786 (30) |
MAPK_T | 0.859 (29) | 0.866 (27) | 0.864 (20) |
PKA_S | 0.782 (45) | 0.774 (62) | 0.795 (25) |
PKA_T | 0.896 (55) | 0.896 (40) | 0.909 (34) |
PKC_S | 0.592 (77) | 0.585 (126) | 0.615 (48) |
PKC_T | 0.656 (73) | 0.651 (64) | 0.667 (54) |
Src_Y | 0.469 (92) | 0.437 (132) | 0.477 (61) |
Avg. | 0.726 (59.27) | 0.720 (76.64) | 0.739 (38.64) |
Kinase | Simulated Annealing Algorithm | Genetic Algorithm | Gas |
---|---|---|---|
CDK_S | 0.692 | 0.736 | 0.755 |
CDK_T | 0.764 | 0.773 | 0.792 |
CK2_S | 0.602 | 0.699 | 0.712 |
CK2_T | 0.580 | 0.694 | 0.758 |
MAPK_S | 0.746 | 0.760 | 0.786 |
MAPK_T | 0.844 | 0.844 | 0.864 |
PKA_S | 0.770 | 0.780 | 0.795 |
PKA_T | 0.769 | 0.830 | 0.909 |
PKC_S | 0.571 | 0.589 | 0.615 |
PKC_T | 0.379 | 0.557 | 0.667 |
Src_Y | 0.323 | 0.391 | 0.477 |
Avg. | 0.640 | 0.696 | 0.739 |
Kinase | KinasePhos 2.0 | GPS | iGPS | Musite | PPSP | GasPhos |
---|---|---|---|---|---|---|
CDK_S | 0.150 | 0.593 | 0.503 | 0.677 | 0.689 | 0.755 |
CDK_T | 0.260 | 0.688 | 0.575 | 0.743 | 0.761 | 0.792 |
CK2_S | 0.647 | 0.619 | 0.423 | 0.661 | 0.583 | 0.712 |
CK2_T | 0.400 | 0.590 | 0.434 | 0.555 | 0.550 | 0.758 |
MAPK_S | 0.390 | 0.588 | 0.613 | 0.691 | 0.696 | 0.786 |
MAPK_T | N/A * | 0.730 | 0.708 | 0.820 | 0.824 | 0.864 |
PKA_S | 0.161 | 0.765 | 0.516 | 0.747 | 0.747 | 0.795 |
PKA_T | 0.560 | 0.813 | 0.514 | 0.719 | 0.700 | 0.909 |
PKC_S | 0.166 | 0.464 | 0.466 | 0.493 | 0.521 | 0.615 |
PKC_T | 0.231 | 0.459 | 0.410 | 0.418 | 0.436 | 0.667 |
Src_Y | 0.075 | 0.459 | 0.329 | 0.285 | 0.319 | 0.477 |
Avg. | 0.276 | 0.615 | 0.499 | 0.619 | 0.621 | 0.739 |
Function Type | KinasePhos 2.0 | GPS | iGPS | Musite | PPSP | GasPhos |
---|---|---|---|---|---|---|
Defense proteins | 0.040 | 0.499 | 0.231 | 0.386 | 0.458 | 0.396 |
Enzymes | 0.252 | 0.606 | 0.515 | 0.543 | 0.544 | 0.680 |
Contractile proteins | 0.064 | 0.247 | 0.306 | 0.414 | 0.391 | 0.447 |
Regulatory proteins | 0.212 | 0.393 | 0.273 | 0.426 | 0.469 | 0.539 |
Receptor proteins | 0.176 | 0.571 | 0.447 | 0.500 | 0.543 | 0.588 |
Other | 0.269 | 0.602 | 0.473 | 0.627 | 0.610 | 0.744 |
Avg. | 0.169 | 0.486 | 0.374 | 0.483 | 0.502 | 0.566 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chen, C.-W.; Huang, L.-Y.; Liao, C.-F.; Chang, K.-P.; Chu, Y.-W. GasPhos: Protein Phosphorylation Site Prediction Using a New Feature Selection Approach with a GA-Aided Ant Colony System. Int. J. Mol. Sci. 2020, 21, 7891. https://doi.org/10.3390/ijms21217891