EnsembleNPPred: A Robust Approach to Neuropeptide Prediction and Recognition Using Ensemble Machine Learning and Deep Learning Methods
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Preparation
2.2. Feature Extraction and Feature Engineering
- (1) AAC descriptors: These represent the relative abundance of each amino acid type in a protein sequence. The proportions of all 20 standard amino acids [A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y] are calculated (AAC1-AAC20).
- (2) Chou’s pseudo amino acid composition (PseAAC): PseAAC converts protein sequences of varying lengths into fixed-length numerical feature vectors that incorporate sequence-order information. Unlike AAC, which ignores residue order, PseAAC retains part of this information, making it suitable for a range of sequence-based prediction tasks [28,29]. In this study, PseAAC was computed with λ = 3 and weight = 0.05, giving 23 dimensions (20 composition terms plus 3 sequence-order terms; PAAC1-PAAC23). Additional PseAAC variants were also calculated: parallel correlations (PsePC1-PsePC22), series correlations (PseSC1-PseSC26), and amphiphilic pseudo AACs based on hydrophobicity (APAAC1-APAAC23) and hydrophilicity (APAAC24-APAAC46) correlation functions.
- (3) CTD descriptors: These are derived from grouped amino acid compositions [30,31] and comprise composition descriptors (CTDC1-CTDC21), transition descriptors (CTDT1-CTDT21), and distribution descriptors (CTDD1-CTDD105). They were calculated using the protr R package (version 1.7.4) [32], with amino acids classified into three groups for each of seven physicochemical properties: hydrophobicity, normalized van der Waals volume, polarity, polarizability, charge, secondary structure, and solvent accessibility.
- (4) Quasi-sequence-order descriptors: These are based on the distance matrix of the 20 amino acids [33] and include sequence-order-coupling numbers (SOCN1-SOCN6) and quasi-sequence-order descriptors (QSO1-QSO46), computed with lag = 3 and weight = 0.1.
- (5) Physicochemical and topological property-related features: These encompass the Cruciani properties covariance index (Crucian1–Crucian3) [34], Z-scales (zscales1–zscales5) [35], factor analysis scales of generalized amino acid information (fasgai1–fasgai6) [36], T-scales (tScales1–tScales5) [37], VHSE-scales (vhsescales1–vhsescales8) [38], protFPs (protFP1–protFP8) [39], ST-scales (stscales1–stscales8) [40], MS-WHIM scores (mswhimscore1–mswhimscore3) [41], the aliphatic index (aIndex) [42], Geary autocorrelations (geary1–geary12), the autocovariance index (autocov) [42], the potential protein interaction index (Boman) [43], cross-covariance indices (Crosscov1–Crosscov2), net charge (Charge), the instability index (Instaindex) [44], the hydrophobic moment for alpha helices (Hmoment1), the hydrophobic moment for beta sheets (Hmoment2), BLOSUM matrix-derived descriptors (Blosum1–Blosum8), and the isoelectric point (pI), calculated using the Peptides R package [45].
- (6) Occurrence of 2-mer and selected 3-mer motifs: First, all possible 2-mers (400 dimensions) were generated and retained. Then, all possible 3-mers (8000 dimensions) were generated, and only those that differed significantly between the positive and negative data, as determined by log-odds and MERCI [46] scores, were selected. The selected 3-mer motifs are: ALP, DFI, DTD, ENL, ETI, FLP, FYP, GLQ, GPF, HLP, HPF, IAW, IFP, IKW, IPA, IPP, IYP, KDQ, KRI, KVL, LAV, LHL, LLE, LMR, MFL, NPC, NVP, NWN, PAG, PEV, PFP, PGA, PIP, PIT, PKH, PLP, PSE, PTH, PVP, PYP, QTP, RLN, RND, STC, TKE, TLE, TLV, TST, VKE, VLP, VPP, VPQ, VRP, VYP, WLP, YNP, and YST.
- (7) Secondary structure conformation features: The propensities for aggregation, amyloid formation, turns, alpha-helices, helical aggregation, and beta-strand structures were calculated using the Tango program [47] (tango1-tango6).
- (8) Composite features for neuropeptides: To enhance prediction with more informative features, we generated composite features by combining significant attributes in a logistic regression model. Fifteen composite features (logistic1-logistic15) were developed and evaluated by 10-fold cross-validation. The detailed process for constructing these composite features is outlined in the hybrid feature section of ensemble-AMPPred [48]. A set of selected features was used to fit a logistic regression model, which is expressed by the following equation:
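Assuming the standard logistic form $p = 1 / (1 + e^{-(\beta_0 + \sum_i \beta_i x_i)})$, where the symbols are chosen here for illustration and are not taken from the original notation, a composite feature could be computed as in the minimal Python sketch below. The sketch also illustrates the AAC and 2-mer occurrence features of items (1) and (6), treating occurrence as a raw count, which is an assumption. The function names, the example peptide, and the coefficient values are all hypothetical; in practice the coefficients would come from fitting the model on training data.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, as in item (1)


def aac(seq):
    """Relative abundance of each standard amino acid (AAC1-AAC20)."""
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def kmer_counts(seq, k=2):
    """Occurrence count of every possible k-mer (400 keys for k = 2)."""
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-standard residues
            counts[kmer] += 1
    return counts


def logistic_composite(features, coefs, intercept):
    """Logistic probability from a weighted sum of the selected features."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical peptide and hypothetical coefficients, for illustration only.
peptide = "YGGFLRRIRPKLK"
comp = aac(peptide)
dimers = kmer_counts(peptide, k=2)
score = logistic_composite(comp, {"R": 2.0, "G": 1.5, "L": -1.0}, intercept=-0.5)
print(round(comp["R"], 3), dimers["GG"], round(score, 3))
```

The returned probability lies between 0 and 1, so a composite feature built this way is on the same scale as the composition features it summarizes.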
2.3. Feature Selection
2.4. Base Classifier Selection and Model Implementation
3. Results and Discussion
3.1. Amino Acid Composition and Positional Residue Analysis
3.2. 10-Fold Cross-Validation of Predictive Performance with the Training Dataset
3.3. Feature Interpretability and Importance
3.4. Performance Comparison of Various Existing Predictive Models
3.5. Performance Across Diverse Neuropeptide Families
3.6. Evaluation of False Positive Rates for EnsembleNPPred
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Schoofs, L.; De Loof, A.; Van Hiel, M.B. Neuropeptides as Regulators of Behavior in Insects. Annu. Rev. Entomol. 2017, 62, 35–52.
- Burbach, J.P. Neuropeptides from concept to online database www.neuropeptides.nl. Eur. J. Pharmacol. 2010, 626, 27–48.
- Kupcova, I.; Danisovic, L.; Grgac, I.; Harsanyi, S. Anxiety and Depression: What Do We Know of Neuropeptides? Behav. Sci. 2022, 12, 262.
- Burbach, J. What are neuropeptides? Methods Mol. Biol. 2011, 789, 1–36.
- Elphick, M.R.; Mirabeau, O.; Larhammar, D. Evolution of neuropeptide signalling systems. J. Exp. Biol. 2018, 221, jeb151092.
- Ofer, D.; Linial, M. NeuroPID: A predictor for identifying neuropeptide precursors from metazoan proteomes. Bioinformatics 2014, 30, 931–940.
- Nässel, D.R.; Zandawala, M. Recent advances in neuropeptide signaling in Drosophila, from genes to physiology and behavior. Prog. Neurobiol. 2019, 179, 101607.
- Bhat, U.S.; Shahi, N.; Surendran, S.; Babu, K. Neuropeptides and Behaviors: How Small Peptides Regulate Nervous System Function and Behavioral Outputs. Front. Mol. Neurosci. 2021, 14, 786471.
- Sharma, D.; Kumar, K.; Bisht, G.S. A Mini-Review on Potential of Neuropeptides as Future Therapeutics. Int. J. Pept. Res. Ther. 2022, 28, 39.
- An, M.Y.; Gao, J.; Zhao, X.F.; Wang, J.X. A new subfamily of penaeidin with an additional serine-rich region from kuruma shrimp (Marsupenaeus japonicus) contributes to antimicrobial and phagocytic activities. Dev. Comp. Immunol. 2016, 59, 186–198.
- Blanchet, X.; Weber, C.; von Hundelshausen, P. Chemokine Heteromers and Their Impact on Cellular Function-A Conceptual Framework. Int. J. Mol. Sci. 2023, 24, 10925.
- Wei, P.; Keller, C.; Li, L. Neuropeptides in gut-brain axis and their influence on host immunity and stress. Comput. Struct. Biotechnol. J. 2020, 18, 843–851.
- Florea, G.; Tudorache, I.F.; Fuior, E.V.; Ionita, R.; Dumitrescu, M.; Fenyo, I.M.; Bivol, V.G.; Gafencu, A.V. Apolipoprotein A-II, a Player in Multiple Processes and Diseases. Biomedicines 2022, 10, 1578.
- Zhuang, J.; Zhang, Y.D.; Sun, W.X.; Zong, J.; Li, J.; Dai, X.; Klosterman, S.J. The acyl-CoA-binding protein VdAcb1 is essential for carbon starvation response and contributes to virulence in Verticillium dahliae. aBIOTECH 2024, 5, 431–448.
- Wang, Y.; Wang, M.; Yin, S.; Jang, R.; Wang, J.; Xue, Z.; Xu, T. NeuroPep: A comprehensive resource of neuropeptides. Database 2015, 2015, bav038.
- Wang, M.; Wang, L.; Xu, W.; Chu, Z.; Wang, H.; Lu, J.; Xue, Z.; Wang, Y. NeuroPep 2.0: An Updated Database Dedicated to Neuropeptide and Its Receptor Annotations. J. Mol. Biol. 2024, 436, 168416.
- Agrawal, P.; Kumar, S.; Singh, A.; Raghava, G.; Singh, I.K. NeuroPIpred: A tool to predict, design and scan insect neuropeptides. Sci. Rep. 2019, 9, 5129.
- Bin, Y.; Zhang, W.; Tang, W.; Dai, R.; Li, M.; Zhu, Q.; Xia, J. Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features. J. Proteome Res. 2020, 19, 3732–3740.
- Jiang, M.; Zhao, B.; Luo, S.; Wang, Q.; Chu, Y.; Chen, T.; Mao, X.; Liu, Y.; Wang, Y.; Jiang, X.; et al. NeuroPpred-Fuse: An interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods. Brief Bioinform. 2021, 22, bbab310.
- Hasan, M.M.; Alam, M.A.; Shoombuatong, W.; Deng, H.W.; Manavalan, B.; Kurata, H. NeuroPred-FRL: An interpretable prediction model for identifying neuropeptide using feature representation learning. Brief Bioinform. 2021, 22, bbab167.
- Chen, S.; Li, Q.; Zhao, J.; Bin, Y.; Zheng, C. NeuroPred-CLQ: Incorporating deep temporal convolutional networks and multi-head attention mechanism to predict neuropeptides. Brief Bioinform. 2022, 23, bbac319.
- Wang, L.; Huang, C.; Wang, M.; Xue, Z.; Wang, Y. NeuroPred-PLM: An interpretable and robust model for neuropeptide prediction by protein language model. Brief Bioinform. 2023, 24, bbad077.
- Farias, J.G.; Herrera-Belén, L.; Jimenez, L.; Beltrán, J.F. PROTA: A Robust Tool for Protamine Prediction Using a Hybrid Approach of Machine Learning and Deep Learning. Int. J. Mol. Sci. 2024, 25, 10267.
- Li, C.; Wang, H.; Wen, Y.; Yin, R.; Zeng, X.; Li, K. GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model. IEEE/ACM Trans. Comput. Biol. Bioinf. 2024, 21, 6.
- Zhang, R.; Lin, Y.; Wu, Y.; Deng, L.; Zhang, H.; Liao, M. MvMRL: Multi-view molecular representation learning with cross-attention for bioactivity prediction. Brief Bioinform. 2024, 25, bbae298.
- Alarfaj, F.K.; Khan, J.A. Deep Dive into Fake News Detection: Feature-Centric Classification with Ensemble and Deep Learning Methods. Algorithms 2023, 16, 507.
- Borandag, E. Software Fault Prediction Using an RNN-Based Deep Learning Approach and Ensemble Machine Learning Techniques. Appl. Sci. 2023, 13, 1639.
- Chou, K.C. Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem. Biophys. Res. Commun. 2000, 278, 477–483.
- Chou, K.C. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005, 21, 10–19.
- Dubchak, I.; Muchnik, I.; Holbrook, S.R.; Kim, S.H. Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 1995, 92, 8700–8704.
- Dubchak, I.; Muchnik, I.; Mayor, C.; Dralyuk, I.; Kim, S. Recognition of a protein fold in the context of the scop classification. Proteins Struct. Funct. Genet. 1999, 35, 401–407.
- Xiao, N.; Cao, D.S.; Zhu, M.F.; Xu, Q.S. protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 2015, 31, 1857–1859.
- Chou, K.C. Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review). J. Theor. Biol. 2011, 273, 236–247.
- Cruciani, G.; Baroni, M.; Carosati, E.; Clementi, M.; Valigi, R.; Clementi, S. Peptide studies by means of principal properties of amino acids derived from MIF descriptors. J. Chemom. 2004, 18, 146–155.
- Sandberg, M.; Eriksson, L.; Jonsson, J.; Sjostrom, M.; Wold, S. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J. Med. Chem. 1998, 41, 2481–2491.
- Liang, G.; Li, Z. Factor analysis scale of generalized amino acid information as the source of a new set of descriptors for elucidating the structure and activity relationships of cationic antimicrobial peptides. Mol. Inform. 2007, 26, 754–763.
- Tian, F.; Zhou, P.; Li, Z. T-scale as a novel vector of topological descriptors for amino acids and its application in QSARs of peptides. J. Mol. Struct. 2007, 830, 106–115.
- Mei, H.U.; Liao, Z.H.; Zhou, Y.; Li, S.Z. A new set of amino acid descriptors and its application in peptide QSARs. Pept. Sci. 2005, 80, 775–786.
- van Westen, G.J.; Swier, R.F.; Wegner, J.K.; IJzerman, A.P.; van Vlijmen, H.W.; Bender, A. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): Comparative study of 13 amino acid descriptor sets. J. Cheminformatics 2013, 5, 41.
- Yang, L.; Shu, M.; Ma, K.; Mei, H.; Jiang, Y.; Li, Z. ST-scale as a novel amino acid descriptor and its application in QSAM of peptides and analogues. Amino Acids 2010, 38, 805–816.
- Zaliani, A.; Gancia, E. MS-WHIM scores for amino acids: A new 3D-description for peptide QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 1999, 39, 525–533.
- Ikai, A. Thermostability and aliphatic index of globular proteins. J. Biochem. 1980, 88, 1895–1898.
- Boman, H.G. Antibacterial peptides: Basic facts and emerging concepts. J. Intern. Med. 2003, 254, 197–215.
- Guruprasad, K.; Reddy, B.V.; Pandit, M.W. Correlation between stability of a protein and its dipeptide composition: A novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. 1990, 4, 155–161.
- Osorio, D.; Rondon-Villarreal, P.; Torres, R. Peptides: A package for data mining of antimicrobial peptides. R J. 2015, 7, 4–14.
- Vens, C.; Rosso, M.; Danchin, E. Identifying discriminative classification-based motifs in biological sequences. Bioinformatics 2011, 27, 1231–1238.
- Fernandez-Escamilla, A.M.; Rousseau, F.; Schymkowitz, J.; Serrano, L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat. Biotech. 2004, 22, 1302–1306.
- Lertampaiporn, S.; Vorapreeda, T.; Hongsthong, A.; Thammarongtham, C. Ensemble-AMPPred: Robust AMP Prediction and Recognition Using the Ensemble Learning Method with a New Hybrid Feature for Differentiating AMPs. Genes 2021, 12, 137.
- Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In Machine Learning: ECML-94, Proceedings of the European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; Bergadano, F., De Raedt, L., Eds.; Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence); Springer: Berlin/Heidelberg, Germany, 1994; Volume 784.
- Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. 1999, 10, 988–999.
- Weinberger, K.Q.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn Res. 2009, 10, 207–244.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
- Kumar, M.; Thakur, V.; Raghava, G.P. COPid: Composition-based protein identification. In Silico Biol. 2008, 8, 121–128.
- Wagih, O. ggseqlogo: A versatile R package for drawing sequence logos. Bioinformatics 2017, 33, 3645–3647.
- Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
- De Zutter, A.; Van Damme, J.; Struyf, S. The Role of Post-Translational Modifications of Chemokines by CD26 in Cancer. Cancers 2021, 13, 4247.
- Vanheule, V.; Metzemaekers, M.; Janssens, R.; Struyf, S.; Proost, P. How post-translational modifications influence the biological activity of chemokines. Cytokine 2018, 109, 29–51.
- Londraville, R.; Prokop, J.; Duff, R.; Liu, Q.; Tuttle, M. On the Molecular Evolution of Leptin, Leptin Receptor, and Endospanin. Front. Endocrinol. 2017, 8, 58.
- Wardman, J.H.; Berezniuk, I.; Di, S.; Tasker, J.G.; Fricker, L.D. ProSAAS-Derived Peptides are Colocalized with Neuropeptide Y and Function as Neuropeptides in the Regulation of Food Intake. PLoS ONE 2011, 6, e28152.
- Meng, X.; McGraw, C.M.; Wang, W.; Jing, J.; Yeh, S.; Wang, L.; Lopez, J.; Brown, A.M.; Lin, T.; Chen, W.; et al. Neurexophilin4 is a selectively expressed α-neurexin ligand that modulates specific cerebellar synapses and motor functions. eLife 2019, 8, e46773.
- Spence, M.A.; Mortimer, M.D.; Buckle, A.M.; Minh, B.Q.; Jackson, C.J. A Comprehensive Phylogenetic Analysis of the Serpin Superfamily. Mol. Biol. Evol. 2021, 38, 2915–2929.
- Nillni, E.A. Regulation of Prohormone Convertases in Hypothalamic Neurons: Implications for ProThyrotropin-Releasing Hormone and Proopiomelanocortin. Endocrinology 2007, 148, 4191–4200.
- Southey, B.R.; Romanova, E.V.; Rodriguez-Zas, S.L.; Sweedler, J.V. Bioinformatics for Prohormone and Neuropeptide Discovery. Methods Mol. Biol. 2018, 1719, 71–96.
- Kang, X.; Dong, F.; Shi, C.; Liu, S.; Sun, J.; Chen, J.; Li, H.; Xu, H.; Lao, X.; Zheng, H. DRAMP 2.0, an updated data repository of antimicrobial peptides. Sci. Data 2019, 6, 148.
- Nässel, D.R.; Zandawala, M.; Kawada, T.; Satake, H. Tachykinins: Neuropeptides That Are Ancient, Diverse, Widespread and Functionally Pleiotropic. Front. Neurosci. 2019, 13, 1262.
- Lai, S.H.; Chye, M.L. Plant Acyl-CoA-Binding Proteins-Their Lipid and Protein Interactors in Abiotic and Biotic Stresses. Cells 2021, 10, 1064.
- Liu, J.; Chu, S.; Zhou, X.; Zhang, D.; Chen, N. Role of chemokines in Parkinson’s disease. Brain Res. Bull. 2019, 152, 11–18.
- Holzer, P.; Reichmann, F.; Farzi, A. Neuropeptide Y, peptide YY and pancreatic polypeptide in the gut-brain axis. Neuropeptides 2012, 46, 261–274.
- Wu, S.; Bekhit, A.E.D.A.; Wu, Q.; Chen, M.; Liao, X.; Wang, J.; Ding, Y. Bioactive peptides and gut microbiota: Candidates for a novel strategy for reduction and control of neurodegenerative diseases. Trends Food Sci. Technol. 2021, 108, 164–176.
Physicochemical Property | Positive Data (NPs) | Negative Data (Non-NPs) |
---|---|---|
Average Length (amino acid residue) | 23.93 | 26.93 |
% Charged Residues (DEKHR) | 24.41 | 22.81 |
% Aliphatic Residues (ILV) | 16.04 | 21.78 |
% Aromatic Residues (FHWY) | 12.84 | 11.22 |
% Polar Residues (DERKQN) | 30.79 | 28.25 |
% Neutral Residues (AGHPSTY) | 41.27 | 35.01 |
% Hydrophobic Residues (CVLIMFW) | 27.94 | 36.74 |
% Positively Charged Residues (HKR) | 12.98 | 14.82 |
% Negatively Charged Residues (DE) | 11.43 | 7.99 |
% Tiny Residues (ACDGST) | 36.58 | 32.01 |
% Small Residues (EHILKMNPQV) | 45.85 | 53.04 |
% Large Residues (FRWY) | 17.57 | 14.95 |
Model | ACC (%) | MCC | Sn | Sp | AUC | AUC 95% CI |
---|---|---|---|---|---|---|
SVM | 93.819 | 0.876 | 0.940 | 0.936 | 0.986 | [0.973–0.995] |
KNN | 90.570 | 0.814 | 0.870 | 0.942 | 0.968 | [0.954–0.981] |
DT | 88.569 | 0.772 | 0.876 | 0.895 | 0.877 | [0.853–0.899] |
RF | 93.324 | 0.866 | 0.930 | 0.936 | 0.985 | [0.971–0.994] |
ET | 93.779 | 0.876 | 0.938 | 0.937 | 0.986 | [0.973–0.996] |
XGB | 91.660 | 0.833 | 0.915 | 0.918 | 0.972 | [0.963–0.984] |
DL | 93.918 | 0.878 | 0.939 | 0.938 | 0.986 | [0.973–0.996] |
Ensemble | 93.978 | 0.880 | 0.939 | 0.941 | 0.987 | [0.975–0.996] |
Method | ACC | MCC | Sn | Sp | AUC |
---|---|---|---|---|---|
NeuroPpred-Fuse | 0.906 | 0.813 | 0.882 | 0.930 | 0.958 |
PredNeuroP | 0.897 | 0.794 | 0.886 | 0.907 | 0.954 |
NeuroPred-FRL | 0.900 | 0.803 | 0.946 | 0.854 | 0.965 |
NeuroPIpred | 0.536 | 0.074 | 0.331 | 0.736 | 0.581 |
NeuroPred-CLQ | 0.936 | 0.875 | 0.897 | 0.975 | 0.988 |
EnsembleNPPred | 0.940 | 0.881 | 0.962 | 0.918 | 0.990 |
Method | ACC | MCC | Precision | Recall | F1 |
---|---|---|---|---|---|
PredNeuroP | 0.864 | 0.738 | 0.935 | 0.782 | 0.852 |
NeuroPred-FRL | 0.861 | 0.740 | 0.960 | 0.757 | 0.847 |
NeuroPpred-Fuse | 0.905 | 0.813 | 0.906 | 0.908 | 0.907 |
NeuroPred-PLM | 0.922 | 0.845 | 0.907 | 0.941 | 0.924 |
EnsembleNPPred | 0.929 | 0.859 | 0.930 | 0.929 | 0.929 |
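
The metrics reported in the tables above follow their usual confusion-matrix definitions. A minimal Python sketch, assuming the standard formulas (the function name and the example counts are hypothetical and are not taken from the study):

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "Sn": sn, "Sp": sp,
            "Precision": precision, "F1": f1}


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=90, fp=8, tn=92, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

As a consistency check on the reported values, the harmonic mean of the precision (0.935) and recall (0.782) listed for PredNeuroP is approximately 0.852, which matches the F1 given in the table.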
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lertampaiporn, S.; Wattanapornprom, W.; Thammarongtham, C.; Hongsthong, A. EnsembleNPPred: A Robust Approach to Neuropeptide Prediction and Recognition Using Ensemble Machine Learning and Deep Learning Methods. Life 2025, 15, 1010. https://doi.org/10.3390/life15071010