Ensemble Learning for Hormone Binding Protein Prediction: A Promising Approach for Early Diagnosis of Thyroid Hormone Disorders in Serum
Abstract
1. Introduction
2. Materials and Methods
2.1. Benchmark Dataset
2.2. Feature Formulation
Position Relative Features
2.3. Statistical Moments
2.4. Classification Algorithm
2.4.1. Boosted Random Forest
2.4.2. AdaBoost Classifier
2.4.3. Hyper-Parameter Optimization
2.5. Performance Evaluation
2.5.1. Accuracy
2.5.2. Precision, Recall and F-Score
3. Results and Discussion
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sex Hormone-Binding Globulin Genetic Variation: Associations with Type 2 Diabetes Mellitus and Polycystic Ovary Syndrome. PMC. Available online: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3683392/ (accessed on 30 October 2022).
- Kraut, J.A.; Madias, N.E. Adverse Effects of the Metabolic Acidosis of Chronic Kidney Disease. Adv. Chronic Kidney Dis. 2017, 24, 289–297.
- Chou, K.-C. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol. 2011, 273, 236–247.
- Wang, T.; Yang, J.; Shen, H.-B.; Chou, K.-C. Predicting Membrane Protein Types by the LLDA Algorithm. Protein Pept. Lett. 2008, 15, 915–921.
- Cai, Y.D.; Zhou, G.P.; Chou, K.C. Support vector machines for predicting membrane protein types by using functional domain composition. Biophys. J. 2003, 84, 3257–3263.
- Hu, J.; Yan, X. BS-KNN: An effective algorithm for predicting protein subchloroplast localization. Evol. Bioinform. 2011, 2011, 79–87.
- Awais, M.; Hussain, W.; Khan, Y.D.; Rasool, N.; Khan, S.A.; Chou, K.C. iPhosH-PseAAC: Identify phosphohistidine sites in proteins by blending statistical moments and position relative features according to the Chou’s 5-step rule and general pseudo amino acid composition. IEEE/ACM Trans. Comput. Biol. Bioinform. 2019, 18, 596–610.
- Kandaswamy, K.K.; Chou, K.-C.; Martinetz, T.; Möller, S.; Suganthan, P.; Sridharan, S.; Pugalenthi, G. AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties. J. Theor. Biol. 2011, 270, 56–62.
- Han, G.S.; Yu, Z.G.; Anh, V.; Krishnajith, A.P.D.; Tian, Y.C. An Ensemble Method for Predicting Subnuclear Localizations from Primary Protein Structures. PLoS ONE 2013, 8, e57225.
- Akbar, S.; Khan, S.; Ali, F.; Hayat, M.; Qasim, M.; Gul, S. iHBP-DeepPSSM: Identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach. Chemom. Intell. Lab. Syst. 2020, 204, 104103.
- Ali, F.; Kumar, H.; Patil, S.; Ahmad, A.; Babour, A.; Daud, A. Deep-GHBP: Improving prediction of Growth Hormone-binding proteins using deep learning model. Biomed. Signal Process. Control 2022, 78, 103856.
- Yadav, A.; Sahu, R.; Nath, A. A representation transfer learning approach for enhanced prediction of growth hormone binding proteins. Comput. Biol. Chem. 2020, 87, 107274.
- Tang, H.; Zhao, Y.-W.; Zou, P.; Zhang, C.-M.; Chen, R.; Huang, P.; Lin, H. HBPred: A tool to identify growth hormone-binding proteins. Int. J. Biol. Sci. 2018, 14, 957–964.
- Libbrecht, M.W.; Noble, W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 2015, 16, 321–332.
- Larrañaga, P.; Calvo, B.; Santana, R.; Bielza, C.; Galdiano, J.; Inza, I.; Lozano, J.A.; Armañanzas, R.; Santafé, G.; Pérez, A.; et al. Machine learning in bioinformatics. Brief. Bioinform. 2006, 7, 86–112.
- Chou, P.Y.; Fasman, G.D. Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzymol. Relat. Areas Mol. Biol. 1978, 47, 45–148.
- Shah, A.A.; Khan, Y.D. Identification of 4-carboxyglutamate residue sites based on position based statistical feature and multiple classification. Sci. Rep. 2020, 10, 16913.
- Amanat, S.; Ashraf, A.; Hussain, W.; Rasool, N.; Khan, Y.D. Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC. Curr. Bioinform. 2020, 15, 396–407.
- Naseer, S.; Hussain, W.; Khan, Y.D.; Rasool, N. NPalmitoylDeep-PseAAC: A Predictor of N-Palmitoylation Sites in Proteins Using Deep Representations of Proteins and PseAAC via Modified 5-Steps Rule. Curr. Bioinform. 2021, 16, 294–305.
- Barukab, O.; Khan, Y.D.; Khan, S.A.; Chou, K.-C. iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments via Chou’s 5-steps Rule and Pseudo Components. Curr. Genom. 2019, 20, 306–320.
- Naseer, S.; Hussain, W.; Khan, Y.D.; Rasool, N. Optimization of serine phosphorylation prediction in proteins by comparing human engineered features and deep representations. Anal. Biochem. 2021, 615, 114069.
- Naseer, S.; Hussain, W.; Khan, Y.D.; Rasool, N. iPhosS(Deep)-PseAAC: Identification of Phosphoserine Sites in Proteins Using Deep Learning on General Pseudo Amino Acid Compositions. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 1703–1714.
- Butt, A.H.; Khan, Y.D. CanLect-Pred: A cancer therapeutics tool for prediction of target cancerlectins using experiential annotated proteomic sequences. IEEE Access 2020, 8, 9520–9531.
- Malebary, S.J.; Khan, Y.D. Evaluating machine learning methodologies for identification of cancer driver genes. Sci. Rep. 2021, 11, 12281.
- Khan, Y.D.; Alzahrani, E.; Alghamdi, W.; Ullah, M.Z. Sequence-based Identification of Allergen Proteins Developed by Integration of PseAAC and Statistical Moments via 5-Step Rule. Curr. Bioinform. 2020, 15, 1046–1055.
- Mahmood, M.K.; Ehsan, A.; Khan, Y.D.; Chou, K.-C. iHyd-LysSite (EPSV): Identifying Hydroxylysine Sites in Protein Using Statistical Formulation by Extracting Enhanced Position and Sequence Variant Feature Technique. Curr. Genom. 2020, 21, 536–545.
- Hussain, W.; Rasool, N.; Khan, Y.D. A Sequence-Based Predictor of Zika Virus Proteins Developed by Integration of PseAAC and Statistical Moments. Comb. Chem. High Throughput Screen. 2020, 23, 797–804.
- Awais, M.; Hussain, W.; Rasool, N.; Khan, Y.D. iTSP-PseAAC: Identifying Tumor Suppressor Proteins by Using Fully Connected Neural Network and PseAAC. Curr. Bioinform. 2021, 16, 700–709.
- Malebary, S.J.; Khan, R.; Khan, Y.D. ProtoPred: Advancing Oncological Research Through Identification of Proto-Oncogene Proteins. IEEE Access 2021, 9, 68788–68797.
- Naseer, S.; Ali, R.F.; Khan, Y.D.; Dominic, P.D.D. iGluK-Deep: Computational identification of lysine glutarylation sites using deep neural networks with general pseudo amino acid compositions. J. Biomol. Struct. Dyn. 2021, 40, 11691–11704.
- Khan, Y.D.; Khan, N.S.; Naseer, S.; Butt, A.H. iSUMOK-PseAAC: Prediction of lysine sumoylation sites using statistical moments and Chou’s PseAAC. PeerJ 2021, 9, e11581.
- Malebary, S.; Khan, Y. Identification of Antimicrobial Peptides Using Chou’s 5 Step Rule. CMC 2021, 67, 2863–2881.
- Butt, A.H.; Khan, S.A.; Jamil, H.; Rasool, N.; Khan, Y.D. A Prediction Model for Membrane Proteins Using Moments Based Features. BioMed Res. Int. 2016, 2016, 8370132.
- Butt, A.H.; Rasool, N.; Khan, Y.D. A Treatise to Computational Approaches towards Prediction of Membrane Protein and Its Subtypes. J. Membr. Biol. 2017, 250, 55–76.
- Butt, A.H.; Mahmood, M.K.; Khan, Y.D. An Exposition Analysis of Facial Expression Recognition Techniques. Pak. J. Sci. 2016, 68, 357–365.
- Yap, P.T.; Paramesran, R.; Ong, S.H. Image analysis using Hahn moments. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2057–2062.
- Butt, A.H.; Khan, Y.D. Prediction of S-Sulfenylation Sites Using Statistical Moments Based Features via CHOU’S 5-Step Rule. Int. J. Pept. Res. Ther. 2019, 26, 1291–1301.
- Butt, A.H.; Rasool, N.; Khan, Y.D. Prediction of antioxidant proteins by incorporating statistical moments based features into Chou’s PseAAC. J. Theor. Biol. 2019, 473, 1–8.
- Butt, A.H.; Rasool, N.; Khan, Y.D. Predicting membrane proteins and their types by extracting various sequence features into Chou’s general PseAAC. Mol. Biol. Rep. 2018, 45, 2295–2306.
- Goh, H.-A.; Chong, C.-W.; Besar, R.; Abas, F.S.; Sim, K.-S. Translation and Scale Invariants of Hahn Moments. Int. J. Image Graph. 2009, 9, 271–285.
- Liu, B.; Gao, X.; Zhang, H. BioSeq-Analysis2.0: An updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res. 2019, 47, e127.
- Liu, B. BioSeq-Analysis: A platform for DNA, RNA and protein sequence analysis based on machine learning approaches. Brief. Bioinform. 2019, 20, 1280–1294.
- Freund, Y.; Schapire, R.E. A desicion-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory; Vitányi, P., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1995; pp. 23–37.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. Available online: http://arxiv.org/abs/1201.0490 (accessed on 30 October 2022).
- Identification of Hormone-Binding Proteins Using a Novel Ensemble Classifier. SpringerLink. Available online: https://link.springer.com/article/10.1007/s00607-018-0682-x (accessed on 30 October 2022).
- iGHBP: Computational Identification of Growth Hormone Binding Proteins from Sequences Using Extremely Randomised Tree. ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/S2001037018301168 (accessed on 30 October 2022).
Protein Sequences | Benchmark Dataset | Independent Dataset | Overall Dataset |
---|---|---|---|
HBPs | 358 | 50 | 408 |
Non-HBPs | 358 | 50 | 408 |
Overall | 716 | 100 | 816 |

Method | SN (%) | SP (%) | ACC (%) | MCC | F-Score | AUC | AUPRC |
---|---|---|---|---|---|---|---|
Wang et al. [45] | 92.7 | 87.9 | 90.7 | -- | -- | -- | -- |
HBPred [13] | 88.6 | 81.3 | 84.9 | -- | -- | -- | -- |
BioSeq-SVM [41] | 70.7 | 63.4 | 67.1 | -- | -- | -- | -- |
BioSeq-RF [42] | 70.7 | 74.8 | 72.8 | -- | -- | -- | -- |
Proposed Method | 94.9 | 93.8 | 94.4 | 0.8875 | 0.9438 | 0.98 | 0.99 |

Method | SN (%) | SP (%) | ACC (%) | MCC | F-Score | AUC | AUPRC |
---|---|---|---|---|---|---|---|
HBPred [13] | 80.43 | 56.52 | 68.48 | -- | -- | -- | -- |
iGHBP [46] | 80.71 | 83.90 | 82.31 | 0.650 | -- | -- | -- |
HBPred_2.0 [13] | 89.18 | 80.43 | 84.78 | 0.698 | -- | -- | -- |
Proposed Method | 93.8 | 95.6 | 94.6 | 0.8929 | 0.9472 | 0.98 | 0.98 |
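
The comparison tables above report sensitivity (SN), specificity (SP), accuracy (ACC), Matthews correlation coefficient (MCC) and F-score. As a minimal sketch of how these quantities follow from confusion-matrix counts, the Python snippet below uses the standard textbook definitions. The function name `binary_metrics` and the counts passed to it are hypothetical, chosen only for illustration, and are not taken from this study.

```python
import math


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute SN, SP, ACC, F-score and MCC from confusion-matrix counts.

    tp/fn: positive samples (e.g. HBPs) predicted correctly / incorrectly.
    tn/fp: negative samples (e.g. non-HBPs) predicted correctly / incorrectly.
    """
    sn = _safe_div(tp, tp + fn)                      # sensitivity (recall)
    sp = _safe_div(tn, tn + fp)                      # specificity
    acc = _safe_div(tp + tn, tp + fn + tn + fp)      # overall accuracy
    precision = _safe_div(tp, tp + fp)
    f_score = _safe_div(2 * precision * sn, precision + sn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)
    return {"SN": sn, "SP": sp, "ACC": acc, "F_SCORE": f_score, "MCC": mcc}


# Hypothetical counts for a balanced 50/50 test set (illustration only).
m = binary_metrics(tp=45, fn=5, tn=40, fp=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

AUC and AUPRC are not covered by this sketch because they depend on the ranked prediction scores rather than on a single set of confusion-matrix counts.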
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Butt, A.H.; Alkhalifah, T.; Alturise, F.; Khan, Y.D. Ensemble Learning for Hormone Binding Protein Prediction: A Promising Approach for Early Diagnosis of Thyroid Hormone Disorders in Serum. Diagnostics 2023, 13, 1940. https://doi.org/10.3390/diagnostics13111940