Machine Learning in Preclinical Development of Antiviral Peptide Candidates: A Review of the Current Landscape
Abstract
1. Introduction
1.1. AVP Mechanisms of Action
1.2. Properties of Successful AVPs
2. Foundations of Machine Learning Methods
2.1. Unsupervised ML
2.2. Supervised ML
3. Aspects of Early-Stage Screening
3.1. Lead Identification
3.2. In Vitro Toxicity Screening
3.3. Clinical Adverse Event Screening
4. AVP Features to Consider for Model Construction
5. Current Leading ML Models
5.1. Binding Activity Prediction
5.1.1. iAVPs-ResBi
5.1.2. Virus Entry Inhibition Peptide (VEIP) Prediction Model
5.1.3. FIRM-AVP
5.1.4. Comparison of Models
5.2. Drug Toxicity Prediction
5.2.1. ATSE
5.2.2. ToxIBTL
5.2.3. tAMPer
5.2.4. ToxinPred 3.0
5.2.5. PLPTP
5.2.6. HyPepTox-Fuse
5.2.7. Comparison of Models
5.3. Clinical Adverse Event Prediction
5.4. Generation of Novel AVP Sequences
6. Current Challenges of AVP Design
7. Conclusions
8. Machine Learning Terminology
9. Database References
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| AVP | Antiviral peptide |
| ML | Machine Learning |
| AMP | Antimicrobial peptide |
| GWAS | Genome-wide association study |
| ADE | Adverse drug effect |
| RF | Random Forest |
| SVM | Support Vector Machine |
| NN | Neural Network |
| MLP | Multi-Layer Perceptron |
| DNN | Deep Neural Network |
| QSAR | Quantitative Structure-Activity Relationship |
| CNN | Convolutional Neural Network |
| GNN | Graph Neural Network |
| RNN | Recurrent Neural Network |
| GRU | Gated Recurrent Unit |
| BiGRU | Bidirectional Gated Recurrent Unit |
| LSTM | Long Short-Term Memory |
| BERT | Bidirectional Encoder Representations from Transformers |
| GPT | Generative Pre-trained Transformer |
| HTS | High-Throughput Screening |

References
1. WHO. World Health Statistics 2024; WHO: Geneva, Switzerland, 2024.
2. Qureshi, A. A review on current status of antiviral peptides. Discov. Viruses 2025, 2, 3.
3. Hajigha, M.N.; Hajikhani, B.; Vaezjalali, M.; Kafil, H.S.; Anari, R.K.; Goudarzi, M. Antiviral and antibacterial peptides: Mechanisms of action. Heliyon 2024, 10, e40121.
4. Groupe, V.; Pugh, L.H.; Weiss, D.; Kochi, M. Observations on Antiviral Activity of Viscosin. Proc. Soc. Exp. Biol. Med. 1951, 78, 354–358.
5. Putri, W.A.; Setiawan, J.; Sofyantoro, F.; Mafiroh, W.U.; Priyono, D.S.; Septriani, N.I.; Siregar, A.R.; Purwestri, Y.A.; Wibowo, A.T.; Nuringtyas, T.R. Global Antiviral Peptide Research: A Bibliometric Analysis from 1951 to 2022. J. Fac. Sci. 2024, 29, 229–251.
6. Chowdhury, T.; Mandal, S.M.; Kumari, R.; Ghosh, A.K. Purification and characterization of a novel antimicrobial peptide (QAK) from the hemolymph of Antheraea mylitta. Biochem. Biophys. Res. Commun. 2020, 527, 411–417.
7. Jaskiewicz, M.; Orlowska, M.; Olizarowicz, G.; Migon, D.; Grzywacz, D.; Kamysz, W. Rapid Screening of Antimicrobial Synthetic Peptides. Int. J. Pep. Res. Ther. 2015, 22, 155–161.
8. Chen, W.; Hwang, Y.Y.; Gleaton, J.W.; Titus, J.K.; Hamlin, N.J. Optimization of a Peptide Extraction and LC-MS Protocol for Quantitative Analysis of Antimicrobial Peptides. Future Sci. OA 2018, 5, FSO348.
9. Ma, X.; Liang, Y.; Zhang, S. iAVPs-ResBi: Identifying antiviral peptides by using deep residual network and bidirectional gated recurrent unit. Math. Biosci. Eng. 2023, 20, 21563–21587.
10. Maleki, M.S.M.; Sardari, S.; Alavijeh, A.G.; Madanchi, H. Recent Patents and FDA-Approved Drugs Based on Antiviral Peptides and Other Peptide-Related Antivirals. Int. J. Pep. Res. Ther. 2022, 29, 5.
11. Matthews, T.; Salgo, M.; Greenberg, M.; Chung, J.; DeMasi, R.; Bolognesi, D. Enfuvirtide: The first therapy to inhibit the entry of HIV-1 into host CD4 lymphocytes. Nat. Rev. Drug Discov. 2004, 3, 215–225.
12. Llibre, J.M.; Aberg, J.A.; Walmsley, S.; Velez, J.; Zala, C.; Crabtree Ramirez, B.; Shepherd, B.; Shah, R.; Clark, A.; Tenorio, A.R.; et al. Long-term safety and impact of immune recovery in heavily treatment-experienced adults receiving fostemsavir for up to 5 years in the phase 3 BRIGHTE study. Front. Immunol. 2024, 15, 1394644.
13. Marković, V.; Szczepańska, A.; Berlicki, L. Antiviral Protein-Protein Interaction Inhibitors. J. Med. Chem. 2024, 67, 3205–3231.
14. Quagliata, M.; Stincarelli, M.A.; Papini, A.M.; Giannecchini, S.; Rovero, P. Antiviral Activity against SARS-CoV-2 of Conformationally Constrained Helical Peptides Derived from Angiotensin-Converting Enzyme 2. ACS Omega 2023, 8, 22665–22672.
15. Hu, H.; Guo, N.; Chen, S.; Guo, X.; Liu, X.; Ye, S.; Chai, Q.; Wang, Y.; Liu, B.; He, Q. Antiviral activity of Piscidin 1 against pseudorabies virus both in vitro and in vivo. Virol. J. 2019, 16, 95.
16. Wang, X.; Ni, D.; Liu, Y.; Lu, S. Rational Design of Peptide-Based Inhibitors Disrupting Protein-Protein Interactions. Front. Chem. 2021, 9, 682675.
17. Zuend, C.F.; Nomellini, J.F.; Smit, J.; Horwitz, M.S. Generation of a Dual-Target, Safe, Inexpensive Microbicide that Protects Against HIV-1 and HSV-2 Disease. Sci. Rep. 2018, 8, 95.
18. Johansen-Leete, J.; Ullrich, S.; Fry, S.E.; Frkic, R.; Bedding, M.J.; Aggarwal, A.; Ashhurst, A.S.; Ekanayake, K.B.; Mahawaththa, M.C.; Sasi, V.M.; et al. Antiviral cyclic peptides targeting the main protease of SARS-CoV-2. Chem. Sci. 2022, 13, 3826–3836.
19. Portal-Núñez, S.; González-Navarro, C.J.; García-Delgado, M.; Vizmanos, J.L.; Lasarte, J.J.; Borrás-Cuesta, F. Peptide inhibitors of hepatitis C virus NS3 protease. Antivir. Chem. Chemother. 2003, 14, 225–233.
20. Chernysh, S.; Kim, S.I.; Bekker, G.; Pleskach, V.A.; Filatova, N.A.; Anikin, V.B.; Platonov, V.G.; Bulet, P. Antiviral and antitumor peptides from insects. Proc. Natl. Acad. Sci. USA 2002, 99, 12628–12632.
21. Sirén, J.; Imaizumi, T.; Sarkar, D.; Pietilä, T.; Noah, D.L.; Lin, R.; Hiscott, J.; Krug, R.M.; Fisher, P.B.; Julkunen, I.; et al. Retinoic acid inducible gene-I and mda-5 are involved in influenza A virus-induced expression of antiviral cytokines. Microbes Infect. 2006, 8, 2013–2020.
22. Lee, E.Y.; Lee, M.W.; Wong, G.C.L. Modulation of toll-like receptor signaling by antimicrobial peptides. Semin. Cell. Dev. Biol. 2019, 88, 173–184.
23. Wang, R.-R.; Yang, L.-M.; Wang, Y.-H.; Pang, W.; Tam, S.-C.; Tien, P.; Zheng, Y.-T. Sifuvirtide, a potent HIV fusion inhibitor peptide. Biochem. Biophys. Res. Commun. 2009, 382, 540–544.
24. Crack, L.R.; Jones, L.; Malavige, G.N.; Patel, V.; Ogg, G.S. Human antimicrobial peptides LL-37 and human β-defensin-2 reduce viral replication in keratinocytes infected with varicella zoster virus. Clin. Exp. Dermatol. 2011, 37, 534–543.
25. Guo, X.; An, Y.; Tan, W.; Ma, L.; Wang, M.; Li, J.; Li, B.; Hou, W.; Wu, L. Cathelicidin-derived antiviral peptide inhibits herpes simplex virus 1 infection. Front. Microbiol. 2023, 14, 1201505, Correction in Front. Microbiol. 2023, 14, 1254775. https://doi.org/10.3389/fmicb.2023.1254775.
26. Zhao, W.; Li, X.; Yu, Z.; Wu, S.; Ding, L.; Liu, J. Identification of lactoferrin-derived peptides as potential inhibitors against the main protease of SARS-CoV-2. LWT 2022, 154, 112684.
27. Falah, N.; Violot, S.; Decimo, D.; Berri, F.; Foucault-Grunenwald, M.L.; Ohlmann, T.; Schuffenecker, I.; Morfin, F.; Lina, B.; Riteau, B.; et al. Ex vivo and in vivo inhibition of human rhinovirus replication by a new pseudosubstrate of viral 2A protease. J. Virol. 2012, 28, 691–704.
28. Vishnepolsky, B.; Grigolava, M.; Gabrielian, A.; Rosenthal, A.; Hurt, D.; Tartakovsky, M.; Pirtskhalava, M. Analysis, Modeling, and Target-Specific Predictions of Linear Peptides Inhibiting Virus Entry. ACS Omega 2023, 8, 46218–46226.
29. Shaduangrat, N.; Nantasenamat, C.; Prachayasittijut, V.; Shoombuotong, W. Meta-iAVP: A Sequence-Based Meta-Predictor for Improving the Prediction of Antiviral Peptides Using Effective Feature Representation. Int. J. Mol. Sci. 2019, 20, 5743.
30. Chang, K.Y.; Yang, J.R. Analysis and Prediction of Highly Effective Antiviral Peptides Based on Random Forests. PLoS ONE 2013, 8, e70166.
31. Qureshi, A.; Tandon, H.; Kumar, M. AVP-IC50Pred: Multiple machine learning techniques-based prediction of peptide antiviral activity in terms of half maximal inhibitory concentration (IC50). Pep. Sci. 2015, 104, 753–763.
32. Shai, Y. Mode of action of membrane active antimicrobial peptides. Pep. Sci. 2002, 66, 236–248.
33. Sadler, K.; Eom, K.D.; Yang, J.L.; Dimitrova, Y.; Tam, J.P. Translocating proline-rich peptides from the antimicrobial peptide bactenecin 7. Biochemistry 2002, 41, 14150–14157.
34. Le, C.F.; Fang, C.M.; Sekaran, S.D. Intracellular targeting mechanisms by antimicrobial peptides. Antimicrob. Agents Chemother. 2017, 61, e02340-16.
35. Pirtskhalava, M.; Vishnepolsky, B.; Grigolava, M.; Managadze, G. Physicochemical Features and Peculiarities of Interaction of AMP with the Membrane. Pharmaceuticals 2021, 14, 471.
36. Dwyer, J.J.; Wilson, K.L.; Davison, D.K.; Freel, S.A.; Seedorff, J.E.; Wring, S.A.; Tvermoes, N.A.; Matthews, T.J.; Greenberg, M.L.; Delmedico, M.K. Design of helical, oligomeric HIV-1 fusion inhibitor peptides with potent activity against enfuvirtide-resistant virus. Proc. Natl. Acad. Sci. USA 2007, 104, 12772–12777.
37. Adzhubei, A.A.; Sternberg, M.J.E.; Makarov, A.A. Polyproline-II Helix in Proteins: Structure and Function. J. Mol. Biol. 2013, 425, 2100–2132.
38. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
39. Gallego, V.; Naveiro, R.; Roca, C.; Insua, D.R.; Campillo, N.E. AI in drug development: A multidisciplinary perspective. Mol. Divers. 2021, 25, 1461–1479.
40. The Atomwise AIMS Program. AI is a viable alternative to high throughput screening: A 318-target study. Sci. Rep. 2024, 14, 7526.
41. Hughes, J.; Rees, S.; Kalindjian, S.; Philpott, K. Principles of early drug discovery. Br. J. Pharmacol. 2010, 162, 1239–1249.
42. Wang, M. High-throughput screening technologies for drug discovery. Theor. Nat. Sci. 2023, 15, 18–23.
43. Perola, E. An analysis of the binding efficiencies of drugs and their leads in successful drug discovery programs. J. Med. Chem. 2010, 53, 2986–2997.
44. Otvos, L. Peptide-Based Drug Research and Development: Relative Costs, Comparative Value; Technical report; Temple University: Philadelphia, PA, USA, 2014.
45. Mirsalis, J.C. Preclinical Development Plan: Small Molecule Anti-Infectives; SRI Project; SRI International: Menlo Park, CA, USA, 2022.
46. Mirsalis, J.C. Generic Preclinical Development Plan for Human Monoclonal Antibodies; SRI Project; SRI International: Menlo Park, CA, USA, 2024.
47. Makurvet, F.D. Biologics vs. small molecules: Drug costs and patient access. Med. Drug Discov. 2021, 9, 100075.
48. Peptide Therapeutics Market Size and Share Analysis-Growth Trends and Forecast (2026–2031); Technical report; Mordor Intelligence: Hyderabad, India, 2025.
49. Cao, A.; Zhang, L.; Bu, Y.; Sun, D. Machine Learning Prediction of On/Off Target-driven Clinical Adverse Events. Pharm. Res. 2024, 41, 1649–1658.
50. Smietana, K.; Siatkowski, M.; Moller, M. Trends in clinical success rates. Nat. Rev. Drug Discov. 2016, 15, 379–380.
51. Zhang, L.; Rube, T.H.; Vakulskas, C.A.; Behlke, M.A.; Bussemaker, H.J.; Pufall, M.A. Systematic in vitro profiling of off-target affinity, cleavage and efficiency for CRISPR enzymes. Nucleic Acids Res. 2020, 48, 5037–5053.
52. Braun, C.J.; Adames, A.C.; Saur, D.; Rad, R. Tutorial: Design and execution of CRISPR in vivo screens. Nat. Protoc. 2022, 17, 1903–1925.
53. Zhang, L.; Zhang, H.; Ai, H.; Hu, H.; Li, S.; Zhao, J.; Liu, H. Applications of Machine Learning Methods in Drug Toxicity Prediction. Curr. Top. Med. Chem. 2018, 18, 987–997.
54. Chowdhury, A.S.; Reehl, S.M.; Kehn-Hall, K.; Bishop, B.; Webb-Robertson, B.J.M. Better understanding and prediction of antiviral peptides through primary and secondary structure feature importance. Sci. Rep. 2020, 10, 19260.
55. Pirtskhalava, M.; Amstrong, A.A.; Grigolava, M.P.; Chubinidze, M.; Alimbarashvili, E.; Vishnepolsky, B.; Gabrielian, A.; Rosenthal, A.; Hurt, D.E.; Tartakovsky, M. DBAASP v3: Database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 2021, 49, D288–D297.
56. Thakur, N.; Qureshi, A.; Kumar, M. AVPpred: Collection and prediction of highly effective antiviral peptides. Nucleic Acids Res. 2012, 40, W199–W204.
57. Wei, L.; Ye, X.; Xue, Y.; Sakurai, T.; Wei, L. ATSE: A peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism. Brief. Bioinform. 2021, 22, bbab041.
58. Wei, L.; Ye, X.; Sakurai, T.; Mu, Z.; Wei, L. ToxIBTL: Prediction of peptide toxicity based on information bottleneck and transfer learning. Bioinformatics 2022, 38, 1514–1524.
59. Vens, C.; Rosso, M.N.; Danchin, E.G.J. Identifying discriminative classification-based motifs in biological sequences. Bioinformatics 2011, 27, 1231–1238.
60. Gao, S.; Jia, Y.; Cui, F.; Xu, J.; Meng, Y.; Wei, L.; Zhang, Q.; Zou, Q.; Zhang, Z. PLPTP: A Motif-based Interpretable Deep Learning Framework Based on Protein Language Models for Peptide Toxicity Prediction. J. Mol. Biol. 2025, 437, 169115.
61. Pan, X.; Zuallaert, J.; Wang, X.; Shen, H.B.; Campos, E.P.; Marushchak, D.O.; De Neve, W. ToxDL: Deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics 2021, 36, 5159–5168.
62. Tatonetti, N.P.; Ye, P.P.; Daneshjou, R.; Altman, R.B. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 2012, 4, 125ra31.
63. Yazdani, A.; Bornet, A.; Khlebnikov, P.; Zhang, B.; Rouhizadeh, H.; Amini, P.; Teodoro, D. An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results. Sci. Data 2025, 12, 424.
64. Kuhn, M.; Letunic, I.; Jensen, L.J.; Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016, 44, D1075–D1079.
65. Zhao, H.; Song, G. Antiviral Peptide-Generative Pre-Trained Transformer (AVP-GPT): A Deep Learning-Powered Model for Antiviral Peptide Design with High-Throughput Discovery and Exceptional Potency. Viruses 2024, 16, 1673.
66. Qureshi, A.; Thakur, N.; Tandon, H.; Kumar, M. AVPdb: A database of experimentally validated antiviral peptides targeting medically important viruses. Nucleic Acids Res. 2014, 42, D1147–D1153.
67. The UniProt Consortium. UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 2025, 53, D609–D617.
68. Ebrahimikondori, H.; Sutherland, D.; Yanai, A.; Richter, A.; Salehi, A.; Li, C.; Coombe, L.; Kotkoff, M.; Warren, R.L.; Birol, I. Structure-aware deep learning model for peptide toxicity prediction. Protein Sci. 2024, 33, e5076.
69. Rathore, A.S.; Choudhury, S.; Arora, A.; Tijare, P.; Raghava, G.P.S. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput. Biol. Med. 2024, 179, 108926.
70. Tran, D.T.; Pham, N.T.; Nguyen, N.D.H.; Wei, L.; Manavalan, B. HyPepTox-Fuse: An interpretable hybrid framework for accurate peptide toxicity prediction fusing protein language model-based embeddings with conventional descriptors. J. Pharm. Anal. 2025, 15, 101410.
71. Wang, G.; Li, X.; Wang, Z. APD3: The antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 2016, 44, D1087–D1093.
72. Ma, T.; Liu, Y.; Yu, B.; Sun, X.; Yao, H.; Hao, C.; Li, J.; Nawaz, M.; Jiang, X.; Lao, X.; et al. DRAMP 4.0: An open-access data repository dedicated to the clinical translation of antimicrobial peptides. Nucleic Acids Res. 2025, 53, D403–D410.
73. Waghu, F.H.; Barai, R.S.; Gurung, P.; Idicula-Thomas, S. CAMPR3: A database on sequences, structures and signatures of antimicrobial peptides. Nucleic Acids Res. 2016, 44, D1094–D1097.
74. Gu, Y.; Yu, Z.; Wang, Y.; Chen, L.; Lou, C.; Yang, C.; Li, W.; Liu, G.; Tang, Y. admetSAR3.0: A comprehensive platform for exploration, prediction and optimization of chemical ADMET properties. Nucleic Acids Res. 2024, 52, W432–W438.

| Variable | Abbreviation | Definition |
|---|---|---|
| Single Amino Acid Composition | AAC | Relative frequency of each of the 20 standard amino acids in the peptide. |
| Dipeptide Composition | DPC | Relative frequency of each of the 400 possible pairs of adjacent amino acids in the peptide. |
| Pseudo-Amino Acid Composition | PseAAC | An extension of AAC that adds sequence-order correlation terms, so that the relative positions of amino acids are taken into account. |
| K-spaced Amino Acid Pairs | KSAAP | Composition of the peptide in terms of amino acid pairs separated by k intervening residues. |
| Encoding Based on Group Weight | EBGW | Sequence-derived features in which amino acids are grouped by physicochemical class and the resulting categorical sequence is converted into a numerical representation suitable for ML. |
| N5C5 Analysis | N5C5 | AAC of the five residues at the N-terminus and the five residues at the C-terminus of the peptide sequence. |
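
Because AAC, DPC, KSAAP, and N5C5 depend only on the primary sequence, they can be computed with a few lines of code. The sketch below is a minimal illustration, assuming only the 20 standard amino acids are encoded and that counts are normalized to frequencies; published tools differ in normalization and in how they treat non-standard residues, so this is not the encoding used by any specific model reviewed here. The function names (`aac`, `ksaap`, `dpc`, `n5c5`) and the toy sequence are hypothetical.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are encoded; other letters are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq: str) -> dict:
    """Single amino acid composition (AAC): fraction of the sequence made up of each residue."""
    n = len(seq) or 1  # guard against division by zero for an empty sequence
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def ksaap(seq: str, k: int) -> dict:
    """K-spaced amino acid pairs (KSAAP): frequency of residue pairs with k residues between them."""
    pairs = [seq[i] + seq[i + k + 1] for i in range(len(seq) - k - 1)]
    counts = dict.fromkeys(PAIRS, 0)
    for pair in pairs:
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = len(pairs) or 1
    return {pair: c / total for pair, c in counts.items()}


def dpc(seq: str) -> dict:
    """Dipeptide composition (DPC): the k = 0 case of KSAAP, i.e., adjacent pairs."""
    return ksaap(seq, k=0)


def n5c5(seq: str) -> dict:
    """N5C5: AAC of the first five and last five residues (windows overlap if len < 10)."""
    features = {f"N_{aa}": v for aa, v in aac(seq[:5]).items()}
    features.update({f"C_{aa}": v for aa, v in aac(seq[-5:]).items()})
    return features


if __name__ == "__main__":
    peptide = "KKLLKKLLKK"  # hypothetical toy sequence, not a known AVP
    composition = aac(peptide)
    print(composition["K"], composition["L"])  # 0.6 0.4
    print(len(dpc(peptide)), len(n5c5(peptide)))  # 400 40
```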

| Model | Primary Task | Input Type | Dataset | Validation | Limitations |
|---|---|---|---|---|---|
| iAVPs-ResBi [9] | Binary binding affinity prediction | Sequence-derived peptide features | AVPdb [66] | 4-fold testing: 95% accuracy on all four subsets | Performance saturates with network depth; relies on existing curated data (limited diversity). |
| VEIP Predictor [28] | Predict binary VEIP activity | Sequence-derived features from peptides and target viral envelopes | 246 VEIPs vs. 246 non-VEIPs from DBAASP [55], UniProt [67] | 73/27% train/test split; RealAdaBoost yielded 0.90 accuracy on test set | Small dataset limits generalizability. |
| FIRM-AVP [54] | Binary binding affinity prediction | Sequence-derived features, reduced by feature selection | AVPpred [56] | 10-fold cross-validation (SVM achieved 92.4% accuracy) | Small, outdated dataset. |
| ATSE [57] | Binary toxicity prediction | Molecular graph of predicted 3D peptide structure and evolutionary profile (PSSM), both sequence-derived | Collected peptide toxicity dataset [57] | 5-fold CV on training data; independent test (reported Sn/Sp/Acc 96/94/95%) | No transfer learning; may struggle to generalize (addressed by successor model ToxIBTL). |
| ToxIBTL [58] | Binary peptide toxicity prediction (with transfer learning) | Same multimodal inputs as ATSE [57]; a model pre-trained on large-scale protein toxicity data is then fine-tuned on peptide data | Large protein toxicity dataset for pre-training (many thousands of proteins) + peptide toxicity dataset for fine-tuning (approx. 2–3 K peptides) [58] | Trained in two stages; on the peptide test set, improved over its predecessor with Acc 0.96, Sn 0.963, Sp 0.954 | Requires extensive protein data for pre-training; more complex than its predecessor. |
| tAMPer [68] | Binary peptide toxicity prediction and secondary structure prediction | Multimodal: amino acid sequence + predicted 3D structure graph (ColabFold-derived), fused via BiGRU and GNN; outputs both toxicity classification and secondary structure traits | Same peptide toxicity set as ToxIBTL [58], plus additional in-house hemolysis data | 5-fold CV; external data integration (showed +0.03 F1 improvement over ToxIBTL) | Slight F1 gain, but direct Acc/Sn/Sp values were not reported for comparison. 3D structure prediction adds computational cost. |
| ToxinPred 3.0 [69] | Binary peptide toxicity prediction | Sequence-based features and motif patterns: ExtraTrees ensemble classifier combined with MERCI motif-based weighting | ToxinPred 3.0 dataset: 5518 unique toxic peptides/proteins (a larger, augmented benchmark) | External validation on held-out set; achieved 92% Sn, 93% Sp, 93% Acc on its curated dataset | Performance is tied to identifiable motifs; may miss toxic peptides lacking known motif patterns. |
| PLPTP [60] | Binary peptide toxicity prediction | Hybrid: Transformer-based protein language embeddings (ESM-2, etc.) + traditional descriptors, fused into an MLP classifier | 7513 peptides (2138 toxic + 5375 non-toxic) compiled from multiple sources | Model attained 97.5% Sn, 97.8% Sp, 99.7% Acc on 5-fold CV, plus comparison on independent test sets | Extremely high apparent accuracy may indicate overfitting. Motif analysis improves interpretability, but the large transformer backbone raises computational requirements. |
| HyPepTox-Fuse [70] | Binary peptide toxicity prediction | Multi-head fusion of sequence embeddings and features: uses multiple protein language models (ESM-1b, ESM-2, ProtT5) alongside 40 classic sequence descriptors, combined via attention and transformer layers | ToxinPred 3.0 dataset for training (5.5 K peptides); tested on ToxTeller external set | 5-fold CV on training data; independent test on ToxTeller dataset to assess generalization. ToxTeller test resulted in 88.3% Sn, 93.0% Sp, 90.5% Acc; only a 4% accuracy drop on the external test vs. an 8% drop for a prior model (ToxIBTL) | More complex architecture increases runtime but yields better robustness. Constrained by available toxicity labels; novel data or features are required to significantly improve performance. |
| Two-Step ADE Model [49] | ADE prediction for peptide drugs | Multimodal: peptide’s predicted multi-target potencies + tissue-specific target expression levels | Trained on CT-ADE database (168,984 drug–ADE pairs from 2497 drugs) | Evaluated on held-out clinical trial data; achieved >0.90 Sn/Sp/Acc for multiple adverse events, validated in a monotherapy setting | Clinical ADE data suffer from under-reporting and confounders. Model accuracy depends on comprehensive target profiling and may miss idiosyncratic or immune-mediated ADEs. Requires large, curated datasets (CT-ADE); still limited for peptide therapeutics. |
| AVP-GPT [65] | Generative AVP design (de novo antiviral peptide generation) | Transformer-based generative model: a GPT fine-tuned on known AVPs, conditioned on virus target context. The model generates candidate peptide sequences, then iteratively self-filters them via an internal classifier for antiviral probability | 10 K known peptide sequences labeled for antiviral activity against various viruses (combined public AVP databases + proprietary YouCare data) | Internal validation via perplexity (a text-generation metric) and an integrated classifier. After training (pre-trained on RSV-targeting AVPs, fine-tuned on other viruses), the model produced 10,000 novel peptides; the 25 top candidates were synthesized, 19 of which showed sub-10 µM EC50 in vitro against RSV. The fine-tuned model achieved 91.5% in silico accuracy for RSV-targeted AVPs. Performance remained high after fine-tuning for other viruses, though perplexity rose. | Lower confidence when targeting under-represented viruses: perplexity jumped from 2 to 7–10 when generating for non-RSV targets, indicating potential generalization limits. Generation focuses on antiviral activity; stability and toxicity are not explicitly optimized, so additional filtering is needed before clinical consideration. |
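
Most models in the table above report k-fold cross-validation (4-, 5-, or 10-fold), sometimes alongside an independent or external test set. For imbalanced data such as PLPTP's 2138 toxic vs. 5375 non-toxic peptides, folds are usually stratified so each one preserves the class ratio. The following is a minimal standard-library sketch of stratified k-fold splitting, assuming binary labels; the generator name `stratified_k_fold` is hypothetical and this is not the splitting procedure of any particular tool. Cross-validation estimates can still be optimistic when near-identical sequences land in both training and test folds, so clustering sequences by identity before splitting is a common precaution.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, test_indices) for stratified k-fold cross-validation.

    Indices of each class are shuffled and dealt round-robin into k folds, so each
    test fold keeps roughly the class balance of the full dataset.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


if __name__ == "__main__":
    # Hypothetical imbalanced set: 20 positives (e.g., toxic) and 80 negatives.
    labels = [1] * 20 + [0] * 80
    for train, test in stratified_k_fold(labels, k=5):
        positives = sum(labels[i] for i in test)
        print(len(train), len(test), positives)  # 80 20 4 on every fold
```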

| Term | Definition | Example |
|---|---|---|
| Instance | All characteristic information about a single candidate peptide. | Sequence or structural information, target binding affinity, toxicity, hydrophobicity, charge, etc. A single data point in a dataset. |
| Dataset | A collection of many instances. | Publicly available datasets, such as those in DRAVP. |
| Variable | A quality of the peptide that can be measured or observed. | Hydrophobicity, charge, structural orientation, peptide sequence, etc. |
| Target | The quality or qualities of the peptide being predicted. | Binding affinity, toxicity, etc. |
| Positive Instance | An instance where the binary positive outcome is observed. | If a dataset reports binding to a receptor, the positive instances are all instances showing strong binding to the receptor. |
| Negative Instance | An instance where the binary positive outcome is not observed. | Negative instances would be every instance in the dataset not classified as a positive instance. |
| Accuracy | The ratio of correct classifications to the total number of classifications. | A model may correctly predict 8 out of 10 positive instances and 7 out of 10 negative instances. This model has an accuracy of 15/20, or 0.75. |
| Sensitivity | The ratio of correctly predicted positive instances (true positives) to the total number of positive instances present. | A model may correctly predict 8 out of 10 positive instances, which would make the model sensitivity 0.80. |
| Specificity | The ratio of correctly predicted negative instances (true negatives) to the total number of negative instances present. | A model may correctly predict 9 out of 10 negative instances, which would make the model specificity 0.90. |
| Perplexity | The exponential of the average per-token loss (negative log-likelihood) of a language model such as a GPT; used to gauge how confident the model is in the sequences it generates. | A high loss in the fully trained model indicates poor performance and a correspondingly high perplexity; a perplexity of 1 would mean every token is predicted with certainty. |
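
Written in terms of true/false positives and negatives (TP, FP, TN, FN), the definitions above are Sn = TP / (TP + FN), Sp = TN / (TN + FP), and Acc = (TP + TN) / (TP + TN + FP + FN). Since TP = P × Sn and TN = N × Sp, where P and N are the numbers of positive and negative instances, Acc = (P × Sn + N × Sp) / (P + N): accuracy is a class-weighted average of sensitivity and specificity, so a reported accuracy should always lie between the reported Sn and Sp, which is a quick consistency check on published figures. Perplexity is exp(-(1/T) × Σ log p_t) over T observed tokens. The sketch below computes these quantities with the standard library only; the function names are hypothetical, and the perplexity example assumes one token per amino acid, which may not match the tokenization of any particular model such as AVP-GPT.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives predicted positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives predicted negative
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives predicted negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives predicted positive
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def perplexity(token_probs):
    """exp of the mean negative log-probability the model gave each observed token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # The accuracy example from the table: 8/10 positives and 7/10 negatives correct.
    y_true = [1] * 10 + [0] * 10
    y_pred = [1] * 8 + [0] * 2 + [0] * 7 + [1] * 3
    print(classification_metrics(y_true, y_pred))
    # {'accuracy': 0.75, 'sensitivity': 0.8, 'specificity': 0.7}

    # A model guessing uniformly among 20 amino acids has perplexity 20;
    # a model that is always certain and correct has perplexity 1.
    print(round(perplexity([1 / 20] * 12), 6), perplexity([1.0] * 5))  # 20.0 1.0
```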

| Database | Number of AVPs | Source |
|---|---|---|
| APD3 | 188 | [71] |
| DRAMP 4.0 | 3681 | [72] |
| AVPdb | 2683 | [66] |
| DBAASP | 1535 | [55] |
| CAMPR3 | 55 | [73] |

| Database | Number of Peptides | Source |
|---|---|---|
| admetSAR 3.0 | 172,116 | [74] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hargrove, H.; Tong, B.; Elkabanny, A.H.; Zhang, X.F. Machine Learning in Preclinical Development of Antiviral Peptide Candidates: A Review of the Current Landscape. Viruses 2026, 18, 260. https://doi.org/10.3390/v18020260
Hargrove H, Tong B, Elkabanny AH, Zhang XF. Machine Learning in Preclinical Development of Antiviral Peptide Candidates: A Review of the Current Landscape. Viruses. 2026; 18(2):260. https://doi.org/10.3390/v18020260
Chicago/Turabian Style: Hargrove, Hannah, Bei Tong, Amr Hussein Elkabanny, and Xiaohui Frank Zhang. 2026. "Machine Learning in Preclinical Development of Antiviral Peptide Candidates: A Review of the Current Landscape" Viruses 18, no. 2: 260. https://doi.org/10.3390/v18020260
APA Style: Hargrove, H., Tong, B., Elkabanny, A. H., & Zhang, X. F. (2026). Machine Learning in Preclinical Development of Antiviral Peptide Candidates: A Review of the Current Landscape. Viruses, 18(2), 260. https://doi.org/10.3390/v18020260

