Prediction of Phage Virion Proteins Using Machine Learning Methods
Abstract
1. Introduction
2. Results
2.1. Selection of Features
2.2. Selection of Methods
2.3. Comparison with Other Methods
2.4. Web Server
3. Discussion
4. Materials and Methods
4.1. Dataset
4.2. Features
4.3. Classification
4.3.1. Logistic Regression (LR)
4.3.2. K-Nearest Neighbor (KNN)
4.3.3. Decision Tree (DT)
4.3.4. Support Vector Machine (SVM)
4.3.5. Random Forest (RF)
4.3.6. AdaBoost Classifier (ABC)
4.3.7. Gradient Boosting Classifier (GBC)
4.4. 10-Fold Cross-Validation
4.5. Performance Measures
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Sample Availability
Abbreviations
References
- Summers, W.C. The strange history of phage therapy. Bacteriophage 2012, 2, 130–133.
- Weinbauer, M.G. Ecology of prokaryotic viruses. FEMS Microbiol. Rev. 2004, 28, 127–181.
- Ofir, G.; Sorek, R. Contemporary Phage Biology: From Classic Models to New Insights. Cell 2018, 172, 1260–1270.
- Lin, D.M.; Koskella, B.; Lin, H.C. Phage therapy: An alternative to antibiotics in the age of multi-drug resistance. World J. Gastrointest. Pharmacol. Ther. 2017, 8, 162–173.
- Singh, K.; Biswas, A.; Chakrabarti, A.K.; Dutta, S. Phage therapy as a protective tool against pathogenic bacteria: How far we are? Curr. Pharm. Biotechnol. 2022.
- Lyon, J. Phage Therapy’s Role in Combating Antibiotic-Resistant Pathogens. JAMA 2017, 318, 1746–1748.
- D’Accolti, M.; Soffritti, I.; Mazzacane, S.; Caselli, E. Bacteriophages as a Potential 360-Degree Pathogen Control Strategy. Microorganisms 2021, 9, 261.
- Lekunberri, I.; Subirats, J.; Borrego, C.M.; Balcazar, J.L. Exploring the contribution of bacteriophages to antibiotic resistance. Environ. Pollut. 2017, 220, 981–984.
- Jara-Acevedo, R.; Diez, P.; Gonzalez-Gonzalez, M.; Degano, R.M.; Ibarrola, N.; Gongora, R.; Orfao, A.; Fuentes, M. Screening Phage-Display Antibody Libraries Using Protein Arrays. Methods Mol. Biol. 2018, 1701, 365–380.
- Lavigne, R.; Ceyssens, P.J.; Robben, J. Phage proteomics: Applications of mass spectrometry. Methods Mol. Biol. 2009, 502, 239–251.
- Yuan, Y.; Gao, M. Proteomic Analysis of a Novel Bacillus Jumbo Phage Revealing Glycoside Hydrolase As Structural Component. Front. Microbiol. 2016, 7, 745.
- Charoenkwan, P.; Kanthawong, S.; Schaduangrat, N.; Yana, J.; Shoombuatong, W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020, 9, 353.
- Ding, H.; Feng, P.M.; Chen, W.; Lin, H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Mol. Biosyst. 2014, 10, 2229–2235.
- Feng, P.M.; Ding, H.; Chen, W.; Lin, H. Naive Bayes classifier with feature selection to identify phage virion proteins. Comput. Math. Methods Med. 2013, 2013, 530696.
- Manavalan, B.; Shin, T.H.; Lee, G. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine. Front. Microbiol. 2018, 9, 476.
- Tan, J.X.; Dao, F.Y.; Lv, H.; Feng, P.M.; Ding, H. Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods. Molecules 2018, 23, 2000.
- Loc-Carrillo, C.; Abedon, S.T. Pros and cons of phage therapy. Bacteriophage 2011, 1, 111–114.
- Principi, N.; Silvestri, E.; Esposito, S. Advantages and Limitations of Bacteriophages for the Treatment of Bacterial Infections. Front. Pharm. 2019, 10, 513.
- Froissart, R.; Brives, C. Evolutionary biology and development model of medicines: A necessary ‘pas de deux’ for future successful bacteriophage therapy. J. Evol. Biol. 2021, 34, 1855–1866.
- Rohde, C.; Wittmann, J.; Kutter, E. Bacteriophages: A Therapy Concept against Multi-Drug-Resistant Bacteria. Surg. Infect. 2018, 19, 737–744.
- Drulis-Kawa, Z.; Majkowska-Skrobek, G.; Maciejewska, B.; Delattre, A.S.; Lavigne, R. Learning from bacteriophages—Advantages and limitations of phage and phage-encoded protein applications. Curr. Protein Pept. Sci. 2012, 13, 699–722.
- Charoenkwan, P.; Nantasenamat, C.; Hasan, M.M.; Shoombuatong, W. Meta-iPVP: A sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation. J. Comput. Aided Mol. Des. 2020, 34, 1105–1116.
- The UniProt Consortium. UniProt: The universal protein knowledgebase in 2021. Nucleic Acids Res. 2021, 49, D480–D489.
- Bhadra, P.; Yan, J.; Li, J.; Fong, S.; Siu, S.W.I. AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Sci. Rep. 2018, 8, 1697.
- Meher, P.K.; Sahu, T.K.; Banchariya, A.; Rao, A.R. DIRProt: A computational approach for discriminating insecticide resistant proteins from non-resistant proteins. BMC Bioinform. 2017, 18, 190.
- Meher, P.K.; Sahu, T.K.; Mohanty, J.; Gahoi, S.; Purru, S.; Grover, M.; Rao, A.R. nifPred: Proteome-Wide Identification and Categorization of Nitrogen-Fixation Proteins of Diaztrophs Based on Composition-Transition-Distribution Features Using Support Vector Machine. Front. Microbiol. 2018, 9, 1100.
- Meng, C.; Zhang, J.; Ye, X.; Guo, F.; Zou, Q. Review and comparative analysis of machine learning-based phage virion protein identification methods. Biochim. Biophys. Acta Proteins Proteom. 2020, 1868, 140406.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Barman, R.K.; Mukhopadhyay, A.; Maulik, U.; Das, S. Identification of infectious disease-associated host genes using machine learning techniques. BMC Bioinform. 2019, 20, 736.

Sequence Features | Feature Length | Method | Accuracy (training) | Precision (training) | F1 (training) | Accuracy (independent) | Precision (independent) | F1 (independent)
---|---|---|---|---|---|---|---|---
Amino acid composition (AAC) | 20 | SVM | 0.77 | 0.76 | 0.82 | 0.71 | 0.7 | 0.768
Amino acid composition (AAC) | 20 | RF | 0.77 | 0.9 | 0.841 | 0.83 | 0.97 | 0.889
Amino acid composition (AAC) | 20 | ABC | 0.67 | 0.86 | 0.784 | 0.7 | 0.78 | 0.78
Amino acid composition (AAC) | 20 | GBC | 0.8 | 0.9 | 0.864 | 0.74 | 0.86 | 0.824
Dipeptide composition (DPC) | 400 | SVM | 0.87 | 0.86 | 0.903 | 0.72 | 0.77 | 0.794
Dipeptide composition (DPC) | 400 | RF | 0.67 | 0.95 | 0.799 | 0.73 | 0.97 | 0.833
Dipeptide composition (DPC) | 400 | ABC | 0.8 | 0.86 | 0.86 | 0.73 | 0.78 | 0.799
Dipeptide composition (DPC) | 400 | GBC | 0.73 | 0.9 | 0.824 | 0.76 | 0.94 | 0.84
Pseudo amino acid composition (PAAC) | 50 | SVM | 0.87 | 0.86 | 0.903 | 0.69 | 0.72 | 0.762
Pseudo amino acid composition (PAAC) | 50 | RF | 0.77 | 0.9 | 0.841 | 0.77 | 0.97 | 0.852
Pseudo amino acid composition (PAAC) | 50 | ABC | 0.8 | 0.95 | 0.869 | 0.77 | 0.84 | 0.83
Pseudo amino acid composition (PAAC) | 50 | GBC | 0.83 | 0.95 | 0.886 | 0.81 | 0.94 | 0.87
Composition–transition–distribution (CTD) | 343 | SVM | 1 | 1 | 1 | 0.69 | 0.84 | 0.787
Composition–transition–distribution (CTD) | 343 | RF | 0.77 | 1 | 0.857 | 0.71 | 0.95 | 0.819
Composition–transition–distribution (CTD) | 343 | ABC | 0.87 | 0.95 | 0.908 | 0.63 | 0.78 | 0.738
Composition–transition–distribution (CTD) | 343 | GBC | 0.87 | 0.95 | 0.908 | 0.66 | 0.78 | 0.759
AAC and DPC | 420 | SVM | 0.83 | 0.81 | 0.87 | 0.68 | 0.67 | 0.741
AAC and DPC | 420 | RF | 0.67 | 0.95 | 0.799 | 0.77 | 0.97 | 0.852
AAC and DPC | 420 | ABC | 0.7 | 0.81 | 0.789 | 0.71 | 0.8 | 0.79
AAC and DPC | 420 | GBC | 0.8 | 0.95 | 0.869 | 0.83 | 0.97 | 0.889
AAC, DPC, and CTD | 763 | SVM | 1 | 1 | 1 | 0.69 | 0.84 | 0.787
AAC, DPC, and CTD | 763 | RF | 0.77 | 0.95 | 0.851 | 0.78 | 1 | 0.857
AAC, DPC, and CTD | 763 | ABC | 0.87 | 0.95 | 0.908 | 0.72 | 0.81 | 0.8
AAC, DPC, and CTD | 763 | GBC | 0.8 | 0.95 | 0.869 | 0.73 | 0.92 | 0.826
AAC, DPC, PAAC, and CTD | 813 | SVM | 0.97 | 0.95 | 0.974 | 0.76 | 0.83 | 0.825
AAC, DPC, PAAC, and CTD | 813 | RF | 0.73 | 0.95 | 0.832 | 0.74 | 0.97 | 0.84
AAC, DPC, PAAC, and CTD | 813 | ABC | 0.87 | 0.95 | 0.908 | 0.72 | 0.81 | 0.8
AAC, DPC, PAAC, and CTD | 813 | GBC | 0.8 | 0.95 | 0.869 | 0.79 | 0.91 | 0.857
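
The feature lengths in the table above (20 for AAC, 400 for DPC, 420 for the two combined) match the usual definitions of these encodings: one relative frequency per standard amino acid, and one per ordered pair of adjacent residues. The following is a minimal sketch under those standard definitions, not the authors' implementation. The function names are hypothetical, and skipping non-standard residue symbols is an assumption, since the article text shown here does not say how they were handled.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
_VALID_PAIRS = set(DIPEPTIDES)


def amino_acid_composition(sequence):
    """Return 20 relative frequencies, one per standard amino acid."""
    # Assumption: non-standard symbols (for example X) are ignored.
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 relative frequencies, one per ordered residue pair."""
    seq = sequence.upper()
    # Assumption: a pair is skipped if either residue is non-standard.
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in _VALID_PAIRS)
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def aac_dpc_features(sequence):
    """Concatenate AAC and DPC into a single 420-dimensional vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
    print(len(aac_dpc_features(toy_sequence)))
```

PAAC (50 features) and CTD (343 features) depend on physicochemical property tables and parameters that are not given in this part of the article, so they are left out of the sketch.
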

Method | Accuracy (training) | Precision (training) | MCC (training) | F1 (training) | AUC (training) | Accuracy (independent) | Precision (independent) | MCC (independent) | F1 (independent) | AUC (independent)
---|---|---|---|---|---|---|---|---|---|---
LR | 0.83 | 0.86 | 0.62 | 0.88 | 0.847 | 0.72 | 0.72 | 0.43 | 0.78 | 0.789
GNB | 0.73 | 0.9 | 0.29 | 0.824 | 0.714 | 0.79 | 0.88 | 0.49 | 0.849 | 0.772
KNN | 0.87 | 0.9 | 0.68 | 0.9 | 0.848 | 0.71 | 0.78 | 0.34 | 0.785 | 0.746
DT | 0.77 | 0.86 | 0.43 | 0.84 | 0.65 | 0.72 | 0.77 | 0.39 | 0.794 | 0.654
SVM | 0.83 | 0.81 | 0.65 | 0.87 | 0.842 | 0.68 | 0.67 | 0.35 | 0.741 | 0.746
RF | 0.67 | 0.95 | −0.1 | 0.799 | 0.828 | 0.77 | 0.97 | 0.42 | 0.852 | 0.774
ABC | 0.7 | 0.81 | 0.26 | 0.789 | 0.759 | 0.71 | 0.8 | 0.33 | 0.79 | 0.762
GBC | 0.8 | 0.95 | 0.49 | 0.869 | 0.822 | 0.83 | 0.97 | 0.59 | 0.889 | 0.768
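
Accuracy, precision, sensitivity, specificity, F1, and MCC, as reported in the tables here, are all functions of the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the paper. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives. Ratios with a zero denominator are returned as 0.0,
    which is a convention chosen for this sketch.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    # Matthews correlation coefficient, ranging from -1 to 1.
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=8, fp=2, tn=7, fn=3).items():
        print(f"{name}: {value:.3f}")
```

Because MCC uses all four counts, it can be low or even negative while precision and F1 stay high, as in the RF training row above (precision 0.95, MCC −0.1).
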

Method | Accuracy, training (%) | Sensitivity, training (%) | Specificity, training (%) | MCC, training | Accuracy, independent (%) | Sensitivity, independent (%) | Specificity, independent (%) | MCC, independent
---|---|---|---|---|---|---|---|---
Feng et al., 2013 [14] | 79.15 | 75.76 | 80.77 | - | - | - | - | -
Ding et al., 2014 [13] | 85.02 | 75.76 | 89.42 | - | 71.30 | 60.00 | 76.50 | 0.357
Manavalan et al., 2018 [15] | 87.00 | 73.70 | 93.30 | 0.695 | 79.80 | 66.70 | 85.90 | 0.531
Tan et al., 2018 [16] | 87.95 | 83.83 | 89.90 | 0.761 | 75.53 | 70.00 | 78.13 | 0.464
Charoenkwan et al., 2020 [17] | 92.52 | 95.89 | 90.86 | 0.849 | 77.66 | 76.67 | 78.13 | 0.523
Proposed Method | 80.00 | 80.00 | 80.00 | 0.490 | 83.00 | 82.00 | 89.00 | 0.590

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Barman, R.K.; Chakrabarti, A.K.; Dutta, S. Prediction of Phage Virion Proteins Using Machine Learning Methods. Molecules 2023, 28, 2238. https://doi.org/10.3390/molecules28052238