Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules Versus Therapeutic Peptides
Simple Summary
Abstract
1. Introduction
1.1. The Bottleneck of Drug Discovery and the Rise of Generative AI
1.2. The Emergence of Diffusion Models
1.3. Scope and Structure of This Review
2. The Core Engine: Diffusion Models for Molecular Generation
2.1. Representing Molecules for Diffusion
2.2. The Mathematics of Diffusion: Forward and Reverse Processes
2.3. Conditional Generation: From Noise to Purpose
2.4. Comparison with Other Generative Approaches
3. Application I: De Novo Design of Small Molecules
3.1. Datasets and Benchmarks for Small Molecule Generation
3.2. Structure-Based Drug Design (SBDD)
3.3. Property-Based Ligand Design and Optimization
4. Application II: Innovative Design of Therapeutic Peptides
4.1. Datasets and Benchmarks for Peptide Design
4.2. Generation of Functional Peptide Sequences
4.3. Structure-Guided De Novo Peptide Design
5. Comparison, Challenges, and Future Perspectives
5.1. A Head-to-Head Comparison: Small Molecules vs. Peptides
5.2. Shared Hurdles and Common Challenges
5.3. Future Outlook and Opportunities
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Singh, N.; Vayer, P.; Tanwar, S.; Poyet, J.L.; Tsaioun, K.; Villoutreix, B.O. Drug discovery and development: Introduction to the general public and patient groups. Front. Drug Discov. 2023, 3, 1201419. [Google Scholar] [CrossRef]
- Brown, D.G.; Wobst, H.J.; Kapoor, A.; Kenna, L.A.; Southall, N. Clinical development times for innovative drugs. Nat. Rev. Drug Discov. 2022, 21, 793–794. [Google Scholar] [CrossRef]
- Kim, E.; Yang, J.; Park, S.; Shin, K. Factors affecting success of new drug clinical trials. Ther. Innov. Regul. Sci. 2023, 57, 737–750. [Google Scholar] [CrossRef]
- Zhou, Y.; Zhang, Y.; Chen, Z.; Huang, S.; Li, Y.; Fu, J.; Zhu, F. Dynamic clinical success rates for drugs in the 21st century. Nat. Commun. 2025, 16, 9537. [Google Scholar] [CrossRef] [PubMed]
- Smietana, K.; Siatkowski, M.; Møller, M. Trends in clinical success rates. Nat. Rev. Drug Discov. 2016, 15, 379–380. [Google Scholar] [CrossRef] [PubMed]
- Mullard, A. Parsing clinical success rates. Nat. Rev. Drug Discov. 2016, 15, 447–448. [Google Scholar] [CrossRef]
- Phares, S.; Phillip, K.; Trusheim, M. Clinical development success rates for durable cell and gene therapies. Nat. Rev. Drug Discov. 2025, 24, 329–330. [Google Scholar] [CrossRef]
- Mullard, A. New drugs cost US $2.6 billion to develop. Nat. Rev. Drug Discov. 2014, 13, 877. [Google Scholar] [CrossRef]
- Sertkaya, A.; Beleche, T.; Jessup, A.; Sommers, B.D. Costs of drug development and research and development intensity in the US, 2000–2018. JAMA Netw. Open 2024, 7, e2415445. [Google Scholar] [CrossRef] [PubMed]
- Senior, M. Fresh from the biotech pipeline: Record-breaking FDA approvals. Nat. Biotechnol. 2024, 42, 355–361. [Google Scholar] [CrossRef]
- Bohacek, R.S.; McMartin, C.; Guida, W.C. The art and practice of structure-based drug design: A molecular modeling perspective. Med. Res. Rev. 1996, 16, 3–50. [Google Scholar] [CrossRef]
- Orsi, M.; Reymond, J.L. Navigating a 1E+60 chemical space of peptide/peptoid oligomers. Mol. Inform. 2025, 44, e202400186. [Google Scholar] [CrossRef]
- Ruddigkeit, L.; Van Deursen, R.; Blum, L.C.; Reymond, J.L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 2012, 52, 2864–2875. [Google Scholar] [CrossRef]
- Reymond, J.L.; Awale, M. Exploring chemical space for drug discovery using the chemical universe database. ACS Chem. Neurosci. 2012, 3, 649–657. [Google Scholar] [CrossRef] [PubMed]
- Polishchuk, P.G.; Madzhidov, T.I.; Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput.-Aided Mol. Des. 2013, 27, 675–679. [Google Scholar] [CrossRef] [PubMed]
- Jayatunga, M.K.; Ayers, M.; Bruens, L.; Jayanth, D.; Meier, C. How successful are AI-discovered drugs in clinical trials? A first analysis and emerging lessons. Drug Discov. Today 2024, 29, 104009. [Google Scholar] [CrossRef]
- Arnold, C. Inside the nascent industry of AI-designed drugs. Nat. Med. 2023, 29, 1292–1295. [Google Scholar] [CrossRef]
- Kanakia, A.; Sale, M.; Zhao, L.; Zhou, Z. AI in action: Redefining drug discovery and development. Clin. Transl. Sci. 2025, 18, e70149. [Google Scholar] [CrossRef] [PubMed]
- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Dinh, L.; Sohl-Dickstein, J.; Bengio, S. Density estimation using real nvp. arXiv 2016, arXiv:1605.08803. [Google Scholar]
- Kingma, D.P.; Salimans, T.; Jozefowicz, R.; Chen, X.; Sutskever, I.; Welling, M. Improved variational inference with inverse autoregressive flow. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
- Papamakarios, G.; Pavlakou, T.; Murray, I. Masked autoregressive flow for density estimation. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Sharma, P.; Kumar, M.; Sharma, H.K.; Biju, S.M. Generative adversarial networks (GANs): Introduction, taxonomy, variants, limitations, and applications. Multimed. Tools Appl. 2024, 83, 88811–88858. [Google Scholar] [CrossRef]
- Bond-Taylor, S.; Leach, A.; Long, Y.; Willcocks, C.G. Deep generative modelling: A comparative review of vaes, gans, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7327–7347. [Google Scholar] [CrossRef]
- Vivekananthan, S. Comparative analysis of generative models: Enhancing image synthesis with vaes, gans, and stable diffusion. arXiv 2024, arXiv:2408.08751. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 2256–2265. [Google Scholar]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- Kong, Z.; Ping, W.; Huang, J.; Zhao, K.; Catanzaro, B. Diffwave: A versatile diffusion model for audio synthesis. arXiv 2020, arXiv:2009.09761. [Google Scholar]
- Li, X.; Thickstun, J.; Gulrajani, I.; Liang, P.S.; Hashimoto, T.B. Diffusion-lm improves controllable text generation. Adv. Neural Inf. Process. Syst. 2022, 35, 4328–4343. [Google Scholar]
- Ho, J.; Salimans, T. Classifier-free diffusion guidance. arXiv 2022, arXiv:2207.12598. [Google Scholar] [CrossRef]
- Weiss, T.; Mayo Yanes, E.; Chakraborty, S.; Cosmo, L.; Bronstein, A.M.; Gershoni-Poranne, R. Guided diffusion for inverse molecular design. Nat. Comput. Sci. 2023, 3, 873–882. [Google Scholar] [CrossRef]
- Alakhdar, A.; Poczos, B.; Washburn, N. Diffusion models in de novo drug design. J. Chem. Inf. Model. 2024, 64, 7238–7256. [Google Scholar] [CrossRef]
- Bai, Y.R.; Seng, D.J.; Xu, Y.; Zhang, Y.D.; Zhou, W.J.; Jia, Y.Y.; Song, J.; He, Z.X.; Liu, H.M.; Yuan, S. A comprehensive review of small molecule drugs approved by the FDA in 2023: Advances and prospects. Eur. J. Med. Chem. 2024, 276, 116706. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Sun, X.; Sun, M.; Wang, C.; Yang, L. Game Changers: Blockbuster Small-Molecule Drugs Approved by the FDA in 2024. Pharmaceuticals 2025, 18, 729. [Google Scholar] [CrossRef]
- Mullard, A. 2023 FDA approvals. Nat. Reviews. Drug Discov. 2024, 23, 88–95. [Google Scholar]
- Martins, A.C.; Oshiro, M.Y.; Albericio, F.; de la Torre, B.G. Food and Drug Administration (FDA) approvals of biological drugs in 2023. Biomedicines 2024, 12, 1992. [Google Scholar] [CrossRef]
- Xie, X.; Yu, T.; Li, X.; Zhang, N.; Foster, L.J.; Peng, C.; Huang, W.; He, G. Recent advances in targeting the “undruggable” proteins: From drug discovery to clinical trials. Signal Transduct. Target. Ther. 2023, 8, 335. [Google Scholar] [CrossRef]
- Nada, H.; Choi, Y.; Kim, S.; Jeong, K.S.; Meanwell, N.A.; Lee, K. New insights into protein–protein interaction modulators in drug discovery and therapeutic advance. Signal Transduct. Target. Ther. 2024, 9, 341. [Google Scholar] [CrossRef] [PubMed]
- Xu, W.; Kang, C. Fragment-based drug design: From then until now, and toward the future. J. Med. Chem. 2025, 68, 5000–5004. [Google Scholar] [CrossRef]
- Xiao, W.; Jiang, W.; Chen, Z.; Huang, Y.; Mao, J.; Zheng, W.; Hu, Y.; Shi, J. Advance in peptide-based drug development: Delivery platforms, therapeutics and vaccines. Signal Transduct. Target. Ther. 2025, 10, 74. [Google Scholar] [CrossRef]
- Baral, K.C.; Choi, K.Y. Barriers and strategies for oral peptide and protein therapeutics delivery: Update on clinical advances. Pharmaceutics 2025, 17, 397. [Google Scholar] [CrossRef] [PubMed]
- Mehrdadi, S. Lipid-based nanoparticles as oral drug delivery systems: Overcoming poor gastrointestinal absorption and enhancing bioavailability of peptide and protein therapeutics. Adv. Pharm. Bull. 2023, 14, 48. [Google Scholar] [CrossRef]
- Lamers, C. Overcoming the shortcomings of peptide-based therapeutics. Future Drug Discov. 2022, 4, FDD75. [Google Scholar] [CrossRef]
- Verma, S.; Goand, U.K.; Husain, A.; Katekar, R.A.; Garg, R.; Gayen, J.R. Challenges of peptide and protein drug delivery by oral route: Current strategies to improve the bioavailability. Drug Dev. Res. 2021, 82, 927–944. [Google Scholar] [CrossRef]
- Hu, Q.; Sun, C.; He, H.; Xu, J.; Liu, D.; Zhang, W.; Li, H. Target-aware 3D molecular generation based on guided equivariant diffusion. Nat. Commun. 2025, 16, 7928. [Google Scholar] [CrossRef]
- Chen, L.; Li, Y.; Ma, Y.; Gao, L.; Yu, L. Multiscale graph equivariant diffusion model for 3D molecule design. Sci. Adv. 2025, 11, eadv0778. [Google Scholar] [CrossRef]
- Vignac, C.; Krawczuk, I.; Siraudin, A.; Wang, B.; Cevher, V.; Frossard, P. DiGress: Discrete Denoising diffusion for graph generation. In Proceedings of the The Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Bian, T.; Niu, Y.; Chang, H.; Yan, D.; Huang, J.; Rong, Y.; Cheng, H. Hierarchical graph latent diffusion model for conditional molecule generation. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 130–140. [Google Scholar]
- Liu, G.; Chen, J.; Zhu, Y.; Sun, M.; Luo, T.; Chawla, N.V.; Jiang, M. Graph Diffusion Transformers are In-Context Molecular Designers. arXiv 2025, arXiv:2510.08744. [Google Scholar] [CrossRef]
- Morehead, A.; Cheng, J. Geometry-complete diffusion for 3D molecule generation and optimization. Commun. Chem. 2024, 7, 150. [Google Scholar] [CrossRef] [PubMed]
- Zhang, H.; Liu, Y.; Liu, X.; Wang, C.; Guo, M. Equivariant score-based generative diffusion framework for 3D molecules. BMC Bioinform. 2024, 25, 203. [Google Scholar] [CrossRef]
- Liu, C.; Vadgama, S.; Ruhe, D.; Bekkers, E.; Forré, P. Clifford Group Equivariant Diffusion Models for 3D Molecular Generation. arXiv 2025, arXiv:2504.15773. [Google Scholar] [CrossRef]
- Satorras, V.G.; Hoogeboom, E.; Welling, M. E (n) equivariant graph neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 9323–9332. [Google Scholar]
- Wang, Y.; Wang, T.; Li, S.; He, X.; Li, M.; Wang, Z.; Liu, T.Y. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nat. Commun. 2024, 15, 313. [Google Scholar] [CrossRef] [PubMed]
- Soleymani, F.; Paquet, E.; Viktor, H.L.; Michalowski, W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput. Struct. Biotechnol. J. 2024, 23, 2779–2797. [Google Scholar] [CrossRef]
- Guan, J.; Qian, W.W.; Peng, X.; Su, Y.; Peng, J.; Ma, J. 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Guo, M.; Liu, C.; Forré, P. Frame-based Equivariant Diffusion Models for 3D Molecular Generation. arXiv 2025, arXiv:2509.19506. [Google Scholar] [CrossRef]
- Alamdari, S.; Thakkar, N.; Van Den Berg, R.; Tenenholtz, N.; Strome, R.; Moses, A.M.; Yang, K.K. Protein generation with evolutionary diffusion: Sequence is all you need. BioRxiv 2023, 2023-09. [Google Scholar] [CrossRef]
- Lisanza, S.L.; Gershon, J.M.; Tipps, S.W.; Sims, J.N.; Arnoldt, L.; Hendel, S.J.; Baker, D. Multistate and functional protein design using RoseTTAFold sequence space diffusion. Nat. Biotechnol. 2025, 43, 1288–1298. [Google Scholar] [CrossRef]
- Bai, P.; Miljković, F.; Liu, X.; De Maria, L.; Croasdale-Wood, R.; Rackham, O.; Lu, H. Mask-prior-guided denoising diffusion improves inverse protein folding. Nat. Mach. Intell. 2025, 7, 876–888. [Google Scholar] [CrossRef]
- Austin, J.; Johnson, D.D.; Ho, J.; Tarlow, D.; Van Den Berg, R. Structured denoising diffusion models in discrete state-spaces. Adv. Neural Inf. Process. Syst. 2021, 34, 17981–17993. [Google Scholar]
- Watson, J.L.; Juergens, D.; Bennett, N.R.; Trippe, B.L.; Yim, J.; Eisenach, H.E.; Baker, D. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089–1100. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.T.; Chatterjee, P. Peptide binders designed directly from protein sequences. Nat. Biotechnol. 2025. [Google Scholar] [CrossRef]
- Wu, K.E.; Yang, K.K.; van den Berg, R.; Alamdari, S.; Zou, J.Y.; Lu, A.X.; Amini, A.P. Protein structure generation via folding diffusion. Nat. Commun. 2024, 15, 1059. [Google Scholar] [CrossRef]
- Li, W.R.; Cadet, X.F.; Medina-Ortiz, D.; Davari, M.D.; Sowdhamini, R.; Damour, C.; Cadet, F. From thermodynamics to protein design: Diffusion models for biomolecule generation towards autonomous protein engineering. arXiv 2025, arXiv:2501.02680. [Google Scholar] [CrossRef]
- Cremer, J.; Le, T.; Clevert, D.A.; Schütt, K.T. Latent-Conditioned Equivariant Diffusion for Structure-Based De Novo Ligand Generation. In Proceedings of the International Workshop on AI in Drug Discovery, Lugano, Switzerland, 19 September 2024; pp. 36–46. [Google Scholar]
- Hoogeboom, E.; Satorras, V.G.; Vignac, C.; Welling, M. Equivariant diffusion for molecule generation in 3d. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 8867–8887. [Google Scholar]
- Xu, M.; Yu, L.; Song, Y.; Shi, C.; Ermon, S.; Tang, J. GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation. In Proceedings of the International Conference on Learning Representations, Virtual Event, 25–29 April 2022. [Google Scholar]
- Dhariwal, P.; Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
- Huang, L.; Xu, T.; Yu, Y.; Zhao, P.; Chen, X.; Han, J.; Zhang, H. A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets. Nat. Commun. 2024, 15, 2657. [Google Scholar] [CrossRef]
- Jin, W.; Barzilay, R.; Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2323–2332. [Google Scholar]
- Ochiai, T.; Inukai, T.; Akiyama, M.; Furui, K.; Ohue, M.; Matsumori, N.; Inuki, S.; Uesugi, M.; Sunazuka, T.; Kikuchi, K.; et al. Variational autoencoder-based chemical latent space for large molecular structures with 3D complexity. Commun. Chem. 2023, 6, 249. [Google Scholar] [CrossRef] [PubMed]
- Tevosyan, A.; Khondkaryan, L.; Khachatrian, H.; Tadevosyan, G.; Apresyan, L.; Babayan, N.; Stopper, H.; Navoyan, Z. Improving VAE based molecular representations for compound property prediction. J. Cheminform. 2022, 14, 69. [Google Scholar] [CrossRef] [PubMed]
- Praljak, N.; Lian, X.; Ranganathan, R.; Ferguson, A.L. Protwave-vae: Integrating autoregressive sampling with latent-based inference for data-driven protein design. ACS Synth. Biol. 2023, 12, 3544–3561. [Google Scholar] [CrossRef] [PubMed]
- De Cao, N.; Kipf, T. MolGAN: An implicit generative model for small molecular graphs. arXiv 2018, arXiv:1805.11973. [Google Scholar]
- Saad, M.M.; O’Reilly, R.; Rehmani, M.H. A survey on training challenges in generative adversarial networks for biomedical image analysis. Artif. Intell. Rev. 2024, 57, 19. [Google Scholar] [CrossRef]
- Barsha, F.L.; Eberle, W. An in-depth review and analysis of mode collapse in generative adversarial networks. Mach. Learn. 2025, 114, 141. [Google Scholar] [CrossRef]
- Wang, H.; Wang, J.; Wang, J.; Zhao, M.; Zhang, W.; Zhang, F.; Guo, M. Graphgan: Graph representation learning with generative adversarial nets. Proc. AAAI Conf. Artif. Intell. 2018, 32. [Google Scholar] [CrossRef]
- Zang, C.; Wang, F. Moflow: An invertible flow model for generating molecular graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 23–27 August 2020; pp. 617–626. [Google Scholar]
- Madhawa, K.; Ishiguro, K.; Nakago, K.; Abe, M. Graphnvp: An invertible flow model for generating molecular graphs. arXiv 2019, arXiv:1905.11600. [Google Scholar] [CrossRef]
- Mercado, R.; Rastemo, T.; Lindelöf, E.; Klambauer, G.; Engkvist, O.; Chen, H.; Bjerrum, E.J. Practical notes on building molecular graph generative models. Appl. AI Lett. 2020, 1. [Google Scholar] [CrossRef]
- Shi, C.; Xu, M.; Zhu, Z.; Zhang, W.; Zhang, M.; Tang, J. Graphaf: A flow-based autoregressive model for molecular graph generation. arXiv 2020, arXiv:2001.09382. [Google Scholar]
- Segler, M.H.; Kogej, T.; Tyrchan, C.; Waller, M.P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 2018, 4, 120–131. [Google Scholar] [CrossRef]
- Gupta, A.; Müller, A.T.; Huisman, B.J.; Fuchs, J.A.; Schneider, P.; Schneider, G. Generative recurrent networks for de novo drug design. Mol. Inform. 2018, 37, 1700111. [Google Scholar] [CrossRef]
- Wang, Z.; Shi, J.; Heess, N.; Gretton, A.; Titsias, M.K. Learning-Order Autoregressive Models with Application to Molecular Graph Generation. arXiv 2025, arXiv:2503.05979. [Google Scholar] [CrossRef]
- He, T.; Zhang, J.; Zhou, Z.; Glass, J. Exposure bias versus self-recovery: Are distortions really incremental for autoregressive text generation? arXiv 2019, arXiv:1905.10617. [Google Scholar]
- Wang, Y.; Che, T.; Li, B.; Song, K.; Pei, H.; Bengio, Y.; Li, D. Your autoregressive generative model can be better if you treat it as an energy-based one. arXiv 2022, arXiv:2206.12840. [Google Scholar] [CrossRef]
- Zhang, P.; Baker, D.; Song, M.; Bi, J. Unraveling the potential of diffusion models in small-molecule generation. Drug Discov. Today 2025, 30, 104413. [Google Scholar] [CrossRef]
- Song, Y.; Ermon, S. Generative modeling by estimating gradients of the data distribution. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
- Müller-Franzes, G.; Niehues, J.M.; Khader, F.; Arasteh, S.T.; Haarburger, C.; Kuhl, C.; Truhn, D. A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis. Sci. Rep. 2023, 13, 12098. [Google Scholar] [CrossRef]
- Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Yang, M.H. Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 2023, 56, 1–39. [Google Scholar] [CrossRef]
- Wang, C.; Ong, H.H.; Chiba, S.; Rajapakse, J.C. GLDM: Hit molecule generation with constrained graph latent diffusion model. Briefings Bioinform. 2024, 25, bbae142. [Google Scholar] [CrossRef]
- Brown, N.; Fiscato, M.; Segler, M.H.; Vaucher, A.C. GuacaMol: Benchmarking models for de novo molecular design. J. Chem. Inf. Model. 2019, 59, 1096–1108. [Google Scholar] [CrossRef]
- Dunn, I.; Koes, D.R. FlowMol3: Flow Matching for 3D De Novo Small-Molecule Generation. arXiv 2025, arXiv:2508.12629. [Google Scholar]
- Liu, H.; Zhang, W.; Xie, J.; Faccio, F.; Xu, M.; Xiang, T.; Shou, M.Z.; Schmidhuber, J. Faster diffusion through temporal attention decomposition. Trans. Mach. Learn. Res. 2025. [Google Scholar]
- Yang, Y.; Gu, S.; Liu, B.; Gong, X.; Lu, R.; Qiu, J.; Liu, H. DiffMC-Gen: A Dual Denoising Diffusion Model for Multi-Conditional Molecular Generation. Adv. Sci. 2025, 12, 2417726. [Google Scholar] [CrossRef]
- Francoeur, P.G.; Masuda, T.; Sunseri, J.; Jia, A.; Iovanisci, R.B.; Snyder, I.; Koes, D.R. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inf. Model. 2020, 60, 4200–4215. [Google Scholar] [CrossRef]
- Berman, H.M.; Westbrook, J.; Feng, Z.; Gillil, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N. The protein data bank. Nucleic Acids Res. 2000, 28, 235–242. [Google Scholar] [CrossRef] [PubMed]
- Corso, G.; StÃ, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. In Proceedings of the International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Schneuing, A.; Harris, C.; Du, Y.; Didi, K.; Jamasb, A.; Igashov, I.; Du, W.; Gomes, C.; Blundell, T.L.; Lio, P.; et al. Structure-based drug design with equivariant diffusion models. Nat. Comput. Sci. 2024, 4, 899–909. [Google Scholar] [CrossRef] [PubMed]
- Das, U. Generative AI for drug discovery and protein design: The next frontier in AI-driven molecular science. Med. Drug Discov. 2025, 27, 100213. [Google Scholar] [CrossRef]
- Axelrod, S.; Gomez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 2022, 9, 185. [Google Scholar] [CrossRef] [PubMed]
- Landrum, G. Rdkit documentation. Release 2013, 1, 4. [Google Scholar]
- Zhu, J.; Xia, Y.; Liu, C.; Wu, L.; Xie, S.; Wang, Y.; Wang, T.; Qin, T.; Zhou, W.; Li, H.; et al. Direct molecular conformation generation. arXiv 2022, arXiv:2202.01356. [Google Scholar] [CrossRef]
- McNutt, A.T.; Bisiriyu, F.; Song, S.; Vyas, A.; Hutchison, G.R.; Koes, D.R. Conformer generation for structure-based drug design: How many and how good? J. Chem. Inf. Model. 2023, 63, 6598–6607. [Google Scholar] [CrossRef]
- Irwin, J.J.; Tang, K.G.; Young, J.; Dandarchuluun, C.; Wong, B.R.; Khurelbaatar, M.; Moroz, Y.S.; Mayfield, J.; Sayle, R.A. ZINC20—A free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 2020, 60, 6065–6073. [Google Scholar] [CrossRef]
- Ramakrishnan, R.; Dral, P.O.; Rupp, M.; Von Lilienfeld, O.A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022. [Google Scholar] [CrossRef]
- Wei, J.; Zhang, Y.; Ramdhan, P.A.; Huang, Z.; Seabra, G.; Jiang, Z.; Li, Y. GatorAffinity: Boosting Protein-Ligand Binding Affinity Prediction with Large-Scale Synthetic Structural Data. bioRxiv 2025, 2025-09. [Google Scholar] [CrossRef]
- Liu, H.; Chen, P.; Zhai, X.; Huo, K.G.; Zhou, S.; Han, L.; Fan, G. PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery. Sci. Data 2024, 11, 1316. [Google Scholar] [PubMed]
- Wang, H. Prediction of protein–ligand binding affinity via deep learning models. Briefings Bioinform. 2024, 25, bbae081. [Google Scholar]
- Liu, T.; Hwang, L.; Burley, S.K.; Nitsche, C.I.; Southan, C.; Walters, W.P.; Gilson, M.K. BindingDB in 2024: A FAIR knowledgebase of protein-small molecule binding data. Nucleic Acids Res. 2025, 53, D1633–D1644. [Google Scholar] [CrossRef] [PubMed]
- Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Overington, J.P. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107. [Google Scholar]
- Krishnan, S.R.; Bung, N.; Bulusu, G.; Roy, A. Accelerating de novo drug design against novel proteins using deep learning. J. Chem. Inf. Model. 2021, 61, 621–630. [Google Scholar] [CrossRef]
- Dalkıran, A.; Atakan, A.; Rifaioğlu, A.S.; Martin, M.J.; Atalay, R.; Acar, A.C.; Atalay, V. Transfer learning for drug–target interaction prediction. Bioinformatics 2023, 39, i103–i110. [Google Scholar] [CrossRef]
- Buterez, D.; Janet, J.P.; Kiddle, S.J.; Oglic, D.; Lió, P. Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting. Nat. Commun. 2024, 15, 1517. [Google Scholar] [CrossRef]
- Atz, K.; Cotos, L.; Isert, C.; Håkansson, M.; Focht, D.; Hilleke, M.; Schneider, G. Prospective de novo drug design with deep interactome learning. Nat. Commun. 2024, 15, 3408. [Google Scholar] [CrossRef]
- Wang, J.; Dokholyan, N.V. Leveraging Transfer Learning for Predicting Protein–Small-Molecule Interaction Predictions. J. Chem. Inf. Model. 2025, 65, 3262–3269. [Google Scholar] [CrossRef] [PubMed]
- Peng, X.; Luo, S.; Guan, J.; Xie, Q.; Peng, J.; Ma, J. Pocket2mol: Efficient molecular sampling based on 3d protein pockets. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, ML, USA, 17–23 July 2022; pp. 17644–17655. [Google Scholar]
- Peng, J.; Yu, J.L.; Yang, Z.B.; Chen, Y.T.; Wei, S.Q.; Meng, F.B.; Li, G.B. Pharmacophore-oriented 3D molecular generation toward efficient feature-customized drug discovery. Nat. Comput. Sci. 2025, 5, 898–914. [Google Scholar] [CrossRef]
- Zhung, W.; Kim, H.; Kim, W.Y. 3D molecular generative framework for interaction-guided drug design. Nat. Commun. 2024, 15, 2688. [Google Scholar] [CrossRef] [PubMed]
- Qin, Y.; Wei, X.; Xu, M.; Wu, J.; Tang, M.; Ran, T.; Chen, H. Comprehensive Benchmark Study of Diffusion-Based 3D Molecular Generation Models. ACS omega 2025. [Google Scholar] [CrossRef] [PubMed]
- Buttenschoen, M.; Morris, G.M.; Deane, C.M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 2024, 15, 3130–3139. [Google Scholar] [CrossRef]
- Bickerton, G.R.; Paolini, G.V.; Besnard, J.; Muresan, S.; Hopkins, A.L. Quantifying the chemical beauty of drugs. Nat. Chem. 2012, 4, 90–98. [Google Scholar] [CrossRef]
- Oestreich, M.; Merdivan, E.; Lee, M.; Schultze, J.L.; Piraud, M.; Becker, M. DrugDiff: Small molecule diffusion model with flexible guidance towards molecular properties. J. Cheminform. 2025, 17, 23. [Google Scholar] [CrossRef] [PubMed]
- Han, X.; Shan, C.; Shen, Y.; Xu, C.; Yang, H.; Li, X.; Li, D. Training-free multi-objective diffusion model for 3d molecule generation. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Khodabandeh Yalabadi, A.; Yazdani-Jahromi, M.; Garibay, O.O. BoKDiff: Best-of-K diffusion alignment for target-specific 3D molecule generation. Bioinform. Adv. 2025, 5, vbaf137. [Google Scholar] [CrossRef]
- Chen, L.; Kim, D.; Domaratzki, M.; Hu, P. Uncertainty-aware multi-objective reinforcement learning-guided diffusion models for 3D de novo molecular design. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2025), San Diego, CA, USA, 2–7 December 2025. [Google Scholar]
- Yuan, Y.; Pan, X.; Li, X.; Zhang, R.; Su, W. A 3D generation framework using diffusion model and reinforcement learning to generate multi-target compounds with desired properties. J. Cheminform. 2025, 17, 93. [Google Scholar] [CrossRef]
- Ertl, P.; Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 2009, 1, 8. [Google Scholar] [CrossRef]
- Guo, J.; Schwaller, P. Directly optimizing for synthesizability in generative molecular design using retrosynthesis models. Chem. Sci. 2025, 16, 6943–6956. [Google Scholar] [CrossRef]
- Seo, S.; Lim, J.; Kim, W.Y. Molecular generative model via retrosynthetically prepared chemical building block assembly. Adv. Sci. 2023, 10, 2206674. [Google Scholar] [CrossRef]
- Gaiński, P.; Boussif, O.; Rekesh, A.; Shevchuk, D.; Parviz, A.; Tyers, M.; Batey, R.A.; Koziarski, M. Scalable and cost-efficient de novo template-based molecular generation. arXiv 2025, arXiv:2506.19865. [Google Scholar]
- Liu, S.; Zhang, D.; Tu, Z.; Dai, H.; Liu, P. Evaluating Molecule Synthesizability via Retrosynthetic Planning and Reaction Prediction. arXiv 2024, arXiv:2411.08306. [Google Scholar]
- Zeng, X.; Wang, F.; Luo, Y.; Kang, S.G.; Tang, J.; Lightstone, F.C.; Cheng, F. Deep generative molecular design reshapes drug discovery. Cell Rep. Med. 2022, 3, 100794. [Google Scholar] [CrossRef] [PubMed]
- Fu, C.; Chen, Q. The future of pharmaceuticals: Artificial intelligence in drug discovery and development. J. Pharm. Anal. 2025, 15, 101248. [Google Scholar] [CrossRef] [PubMed]
- Ramos, M.C.; Collison, C.J.; White, A.D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 2025, 16, 2514–2572. [Google Scholar] [CrossRef]
- Orengo, C.A.; Michie, A.D.; Jones, S.; Jones, D.T.; Swindells, M.B.; Thornton, J.M. CATH–a hierarchic classification of protein domain structures. Structure 1997, 5, 1093–1109. [Google Scholar] [CrossRef] [PubMed]
- Sillitoe, I.; Lewis, T.E.; Cuff, A.; Das, S.; Ashford, P.; Dawson, N.L.; Furnham, N.; Laskowski, R.A.; Lee, D.; Lees, J.G.; et al. CATH: Comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 2015, 43, D376–D381. [Google Scholar] [CrossRef] [PubMed]
- Fox, N.K.; Brenner, S.E.; Chandonia, J.M. SCOPe: Structural Classification of Proteins—Extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 2014, 42, D304–D309. [Google Scholar] [CrossRef]
- Chandonia, J.M.; Fox, N.K.; Brenner, S.E. SCOPe: Classification of large macromolecular structures in the structural classification of proteins—Extended database. Nucleic Acids Res. 2019, 47, D475–D481. [Google Scholar] [CrossRef] [PubMed]
- Apweiler, R.; Bairoch, A.; Wu, C.H.; Barker, W.C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; et al. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 2004, 32, D115–D119. [Google Scholar] [CrossRef]
- UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Res. 2025, 53, D609–D617. [CrossRef]
- Suzek, B.E.; Huang, H.; McGarvey, P.; Mazumder, R.; Wu, C.H. UniRef: Comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007, 23, 1282–1288. [Google Scholar] [CrossRef]
- Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Oregi, O.; Kleywegt, G.; Kleywegt, G.J.; et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444. [Google Scholar] [CrossRef]
- Varadi, M.; Bertoni, D.; Magana, P.; Paramval, U.; Pidruchna, I.; Radhakrishnan, M.; Tucholska, A.; Yahiya, M.; Kleywegt, G.J.; Velankar, S. AlphaFold Protein Structure Database in 2024: Providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 2024, 52, D368–D375. [Google Scholar] [CrossRef]
- Wang, G.; Li, X.; Wang, Z. APD3: The antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 2016, 44, D1087–D1093. [Google Scholar] [CrossRef]
- Pirtskhalava, M.; Amstrong, A.A.; Grigolava, M.; Chubinidze, M.; Alimbarashvili, E.; Vishnepolsky, B.; Tartakovsky, M. DBAASP v3: Database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 2021, 49, D288–D297. [Google Scholar] [CrossRef]
- Gautam, A.; Singh, H.; Tyagi, A.; Chaudhary, K.; Kumar, R.; Kapoor, P.; Raghava, G.P.S. CPPsite: A curated database of cell penetrating peptides. Database 2012, 2012, bas015. [Google Scholar] [CrossRef]
- Agrawal, P.; Bhalla, S.; Usmani, S.S.; Singh, S.; Chaudhary, K.; Raghava, G.P.; Gautam, A. CPPsite 2.0: A repository of experimentally validated cell-penetrating peptides. Nucleic Acids Res. 2016, 44, D1098–D1103. [Google Scholar] [CrossRef] [PubMed]
- Liu, Z.; Li, Y.; Han, L.; Li, J.; Liu, J.; Zhao, Z.; Nie, W.; Liu, Y.; Wang, R. PDB-wide collection of binding data: Current status of the PDBbind database. Bioinformatics 2015, 31, 405–412. [Google Scholar] [CrossRef] [PubMed]
- Wang, R.; Fang, X.; Lu, Y.; Yang, C.Y.; Wang, S. The PDBbind database: Methodologies and updates. J. Med. Chem. 2005, 48, 4111–4119. [Google Scholar] [CrossRef]
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
- Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876. [Google Scholar] [CrossRef] [PubMed]
- Tang, S.; Zhang, Y.; Chatterjee, P. PepTune: De novo generation of therapeutic peptides with multi-objective-guided discrete diffusion. arXiv 2025, arXiv:2412.17780. [Google Scholar]
- Meshchaninov, V.; Strashnov, P.; Shevtsov, A.; Nikolaev, F.; Ivanisenko, N.; Kardymon, O.; Vetrov, D. Diffusion on language model encodings for protein sequence generation. arXiv 2024, arXiv:2403.03726. [Google Scholar]
- Luo, Z.; Geng, A.; Wei, L.; Zou, Q.; Cui, F.; Zhang, Z. CPL-Diff: A Diffusion Model for De Novo Design of Functional Peptide Sequences with Fixed Length. Adv. Sci. 2025, 12, 2412926. [Google Scholar] [CrossRef]
- Szymczak, P.; Możejko, M.; Grzegorzek, T.; Jurczak, R.; Bauer, M.; Neubauer, D.; Sikora, K.; Michalski, M.; Sroka, J.; Setny, P.; et al. Discovering highly potent antimicrobial peptides with deep generative model HydrAMP. Nat. Commun. 2023, 14, 1453. [Google Scholar] [CrossRef]
- Li, T.; Ren, X.; Luo, X.; Wang, Z.; Li, Z.; Luo, X.; Shen, J.; Li, Y.; Yuan, D.; Nussinov, R.; et al. A foundation model identifies broad-spectrum antimicrobial peptides against drug-resistant bacterial infection. Nat. Commun. 2024, 15, 7538. [Google Scholar] [CrossRef]
- Dong, R.; Liu, R.; Liu, Z.; Liu, Y.; Zhao, G.; Li, H.; Hou, S.; Ma, X.; Kang, H.; Liu, J.; et al. Exploring the repository of de novo-designed bifunctional antimicrobial peptides through deep learning. eLife 2025, 13, RP97330. [Google Scholar] [CrossRef]
- Wang, J.; Feng, J.; Kang, Y.; Pan, P.; Ge, J.; Wang, Y.; Wang, M.; Wu, Z.; Zhang, X.; Yu, J.; et al. Discovery of antimicrobial peptides with notable antibacterial potency by an LLM-based foundation model. Sci. Adv. 2025, 11, eads8932. [Google Scholar] [CrossRef]
- Brizuela, C.A.; Liu, G.; Stokes, J.M.; de la Fuente-Nunez, C. AI methods for antimicrobial peptides: Progress and challenges. Microb. Biotechnol. 2025, 18, e70072. [Google Scholar] [CrossRef] [PubMed]
- Cao, J.; Zhang, J.; Yu, Q.; Ji, J.; Li, J.; He, S.; Zhu, Z. TG-CDDPM: Text-guided antimicrobial peptides generation based on conditional denoising diffusion probabilistic model. Briefings Bioinform. 2025, 26, bbae644. [Google Scholar] [CrossRef] [PubMed]
- Jin, S.; Zeng, Z.; Xiong, X.; Huang, B.; Tang, L.; Wang, H.; Lin, F. AMPGen: An evolutionary information-reserved and diffusion-driven generative model for de novo design of antimicrobial peptides. Commun. Biol. 2025, 8, 839. [Google Scholar] [CrossRef]
- Seixas Feio, J.A.; de Oliveira, E.C.L.; de Sales, C.D.S.; da Costa, K.S.; e Lima, A.H.L. Investigating molecular descriptors in cell-penetrating peptides prediction with deep learning: Employing N, O, and hydrophobicity according to the Eisenberg scale. PLoS ONE 2024, 19, e0305253. [Google Scholar] [CrossRef]
- Tran, D.P.; Tada, S.; Yumoto, A.; Kitao, A.; Ito, Y.; Uzawa, T.; Tsuda, K. Using molecular dynamics simulations to prioritize and understand AI-generated cell penetrating peptides. Sci. Rep. 2021, 11, 10630. [Google Scholar] [CrossRef] [PubMed]
- González, R.D.; Simões, S.; Ferreira, L.; Carvalho, A.T. Designing Cell Delivery Peptides and SARS-CoV-2-Targeting Small Interfering RNAs: A Comprehensive Bioinformatics Study with Generative Adversarial Network-Based Peptide Design and In Vitro Assays. Mol. Pharm. 2023, 20, 6079–6089. [Google Scholar] [CrossRef]
- Ramelot, T.A.; Palmer, J.; Montelione, G.T.; Bhardwaj, G. Cell-permeable chameleonic peptides: Exploiting conformational dynamics in de novo cyclic peptide design. Curr. Opin. Struct. Biol. 2023, 80, 102603. [Google Scholar] [CrossRef]
- Lai, L.; Liu, Y.; Song, B.; Li, K.; Zeng, X. Deep generative models for therapeutic peptide discovery: A comprehensive review. ACM Comput. Surv. 2025, 57, 1–29. [Google Scholar] [CrossRef]
- Sutcliffe, R.; Doherty, C.P.; Morgan, H.P.; Dunne, N.J.; McCarthy, H.O. Strategies for the design of biomimetic cell-penetrating peptides using AI-driven in silico tools for drug delivery. Biomater. Adv. 2024, 169, 214153. [Google Scholar] [CrossRef]
- Zhang, S.; Jiang, Z.; Huang, R.; Mo, S.; Zhu, L.; Li, P.; Zhang, Z.; Pan, E.; Chen, X.; Long, Y.; et al. PRO-LDM: Protein sequence generation with a conditional latent diffusion model. bioRxiv 2023. [Google Scholar] [CrossRef]
- Chen, T.; Vure, P.; Pulugurta, R.; Chatterjee, P. AMP-Diffusion: Integrating latent diffusion with protein language models for antimicrobial peptide generation. bioRxiv 2024. [Google Scholar] [CrossRef]
- Wang, Y.; Song, M.; Liu, F.; Liang, Z.; Hong, R.; Dong, Y.; Luan, H.; Fu, X.; Yuan, W.; Fang, W.; et al. Artificial intelligence using a latent diffusion model enables the generation of diverse and potent antimicrobial peptides. Sci. Adv. 2025, 11, eadp7171. [Google Scholar] [CrossRef]
- Rezaee, K.; Eslami, H. Bridging machine learning and peptide design for cancer treatment: A comprehensive review. Artif. Intell. Rev. 2025, 58, 1–59. [Google Scholar] [CrossRef]
- Wan, F.; Wong, F.; Collins, J.J.; de la Fuente-Nunez, C. Machine learning for antimicrobial peptide identification and design. Nat. Rev. Bioeng. 2024, 2, 392–407. [Google Scholar] [CrossRef] [PubMed]
- Rettie, S.A.; Bhardwaj, G. Deep learning-enabled design of macrocyclic peptide binders. Nat. Chem. Biol. 2025. [Google Scholar] [CrossRef] [PubMed]
- Rettie, S.A.; Juergens, D.; Adebomi, V.; Bueso, Y.F.; Zhao, Q.; Leveille, A.N.; Bhardwaj, G. Accurate de novo design of high-affinity protein-binding macrocycles using deep learning. Nat. Chem. Biol. 2025, 1–9. [Google Scholar] [CrossRef]
- Dauparas, J.; Anishchenko, I.; Bennett, N.; Bai, H.; Ragotte, R.J.; Milles, L.F.; Baker, D. Robust deep learning–based protein sequence design using ProteinMPNN. Science 2022, 378, 49–56. [Google Scholar] [CrossRef]
- Cao, L.; Coventry, B.; Goreshnik, I.; Huang, B.; Sheffler, W.; Park, J.S.; Jude, K.M.; Marković, I.; Kadam, R.U.; Verschueren, K.H.G.; et al. Design of protein-binding proteins from the target structure alone. Nature 2022, 605, 551–560. [Google Scholar] [CrossRef] [PubMed]
- Bennett, N.R.; Coventry, B.; Goreshnik, I.; Huang, B.; Allen, A.; Vafeados, D.; Baker, D. Improving de novo protein binder design with deep learning. Nat. Commun. 2023, 14, 2625. [Google Scholar] [CrossRef]
- Vázquez Torres, S.; Leung, P.J.; Venkatesh, P.; Lutz, I.D.; Hink, F.; Huynh, H.H.; Becker, J.; Yeh, A.H.; Juergens, D.; Bennett, N.R.; et al. De novo design of high-affinity binders of bioactive helical peptides. Nature 2024, 626, 435–442. [Google Scholar] [CrossRef]
- Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef]
- Fetse, J.; Kandel, S.; Mamani, U.F.; Cheng, K. Recent advances in the development of therapeutic peptides. Trends Pharmacol. Sci. 2023, 44, 425–441. [Google Scholar] [CrossRef] [PubMed]
- Achilleos, K.; Petrou, C.; Nicolaidou, V.; Sarigiannis, Y. Beyond Efficacy: Ensuring Safety in Peptide Therapeutics through Immunogenicity Assessment. J. Pept. Sci. 2025, 31, e70016. [Google Scholar] [CrossRef] [PubMed]
- Rettie, S.A.; Campbell, K.V.; Bera, A.K.; Kang, A.; Kozlov, S.; Bueso, Y.F.; Bhardwaj, G. Cyclic peptide structure prediction and design using AlphaFold2. Nat. Commun. 2025, 16, 4730. [Google Scholar] [CrossRef]
- Reymond, J.L. The chemical space project. Accounts Chem. Res. 2015, 48, 722–730. [Google Scholar] [CrossRef]
- Doak, B.C.; Over, B.; Giordanetto, F.; Kihlberg, J. Oral druggable space beyond the rule of 5: Insights from drugs and clinical candidates. Chem. Biol. 2014, 21, 1115–1142. [Google Scholar] [CrossRef] [PubMed]
- Fosgerau, K.; Hoffmann, T. Peptide therapeutics: Current status and future directions. Drug Discov. Today 2015, 20, 122–128. [Google Scholar] [CrossRef] [PubMed]
- Kitchen, D.B.; Decornez, H.; Furr, J.R.; Bajorath, J. Docking and scoring in virtual screening for drug discovery: Methods and applications. Nat. Rev. Drug Discov. 2004, 3, 935–949. [Google Scholar] [CrossRef]
- Daina, A.; Michielin, O.; Zoete, V. SwissADME: A free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017, 7, 42717. [Google Scholar] [CrossRef]
- Rich, R.L.; Myszka, D.G. Advances in surface plasmon resonance biosensor analysis. Curr. Opin. Biotechnol. 2000, 11, 54–61. [Google Scholar] [CrossRef]
- Myszka, D.G.; Rich, R.L. Implementing surface plasmon resonance biosensors in drug discovery. Pharm. Sci. Technol. Today 2000, 3, 310–317. [Google Scholar] [CrossRef]
- Myszka, D.G. Kinetic analysis of macromolecular interactions using surface plasmon resonance biosensors. Curr. Opin. Biotechnol. 1997, 8, 50–57. [Google Scholar] [CrossRef]
- Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; et al. Molecular sets (MOSES): A benchmarking platform for molecular generation models. Front. Pharmacol. 2020, 11, 565644. [Google Scholar] [CrossRef] [PubMed]
- Igashov, I.; Stärk, H.; Vignac, C.; Schneuing, A.; Satorras, V.G.; Frossard, P.; Welling, M.; Bronstein, M.; Correia, B. Equivariant 3D-conditional diffusion model for molecular linker design. Nat. Mach. Intell. 2024, 6, 417–427. [Google Scholar] [CrossRef]
- Ingraham, J.B.; Baranov, M.; Costello, Z.; Barber, K.W.; Wang, W.; Ismail, A.; Frappier, V.; Lord, D.M.; Ng-Thow-Hing, C.; Van Vlack, E.R.; et al. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070–1078. [Google Scholar] [CrossRef] [PubMed]
- Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461. [Google Scholar] [CrossRef]
- Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New docking methods, expanded force field, and python bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898. [Google Scholar] [CrossRef] [PubMed]
- Friesner, R.A.; Banks, J.L.; Murphy, R.B.; Halgren, T.A.; Klicic, J.J.; Mainz, D.T.; Shenkin, P.S. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749. [Google Scholar] [CrossRef]
- Hollingsworth, S.A.; Dror, R.O. Molecular dynamics simulation for all. Neuron 2018, 99, 1129–1143. [Google Scholar] [CrossRef] [PubMed]
- Hospital, A.; Goñi, J.R.; Orozco, M.; Gelpí, J.L. Molecular dynamics simulations: Advances and applications. Adv. Appl. Bioinform. Chem. 2015, 8, 37–47. [Google Scholar] [CrossRef]
- Jiménez, J.; Skalic, M.; Martinez-Rosell, G.; De Fabritiis, G. KDEEP: Protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model. 2018, 58, 287–296. [Google Scholar] [CrossRef]
- Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Abel, R. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 2015, 137, 2695–2703. [Google Scholar] [CrossRef]
- Kwon, Y.; Shin, W.H.; Ko, J.; Lee, J. AK-score: Accurate protein-ligand binding affinity prediction using an ensemble of 3D-convolutional neural networks. Int. J. Mol. Sci. 2020, 21, 8424. [Google Scholar] [CrossRef] [PubMed]
- Lee, H.J.; Emani, P.S.; Gerstein, M.B. Improved Prediction of Ligand–Protein Binding Affinities by Meta-modeling. J. Chem. Inf. Model. 2024, 64, 8684–8704. [Google Scholar] [CrossRef]
- Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380. [Google Scholar] [CrossRef]
- Kim, S.; Thiessen, P.A.; Bolton, E.E.; Chen, J.; Fu, G.; Gindulyte, A.; Han, L.; He, J.; He, S.; Shoemaker, B.A.; et al. PubChem substance and compound databases. Nucleic Acids Res. 2016, 44, D1202–D1213. [Google Scholar] [CrossRef]
- Wu, X.; Lin, H.; Bai, R.; Duan, H. Deep learning for advancing peptide drug development: Tools and methods in structure prediction and design. Eur. J. Med. Chem. 2024, 268, 116262. [Google Scholar] [CrossRef]
- Bhati, A.P.; Wan, S.; Alfè, D.; Clyde, A.R.; Bode, M.; Tan, L.; Titov, M.; Merzky, A.; Turilli, M.; Jha, S.; et al. Pandemic drugs at pandemic speed: Infrastructure for accelerating COVID-19 drug discovery with hybrid machine learning-and physics-based simulations on high-performance computers. Interface Focus 2021, 11, 20210018. [Google Scholar] [CrossRef]
- Filella-Merce, I.; Molina, A.; Díaz, L.; Orzechowski, M.; Berchiche, Y.A.; Zhu, Y.M.; Vilalta-Mor, J.; Malo, L.; Yekkirala, A.S.; Ray, S.; et al. Optimizing drug design by merging generative AI with a physics-based active learning framework. Commun. Chem. 2025, 8, 238. [Google Scholar] [CrossRef]
- Gorantla, R.; Kubincova, A.; Suutari, B.; Cossins, B.P.; Mey, A.S. Benchmarking active learning protocols for ligand-binding affinity prediction. J. Chem. Inf. Model. 2024, 64, 1955–1965. [Google Scholar] [CrossRef] [PubMed]
- Bailey, M.; Moayedpour, S.; Li, R.; Corrochano-Navarro, A.; Kötter, A.; Kogler-Anele, L.; Riahi, S.; Grebner, C.; Hessler, G.; Matter, H.; et al. Deep Batch Active Learning for Drug Discovery. eLife 2024, 12. [Google Scholar] [CrossRef]
- Loeffler, H.H.; Wan, S.; Klähn, M.; Bhati, A.P.; Coveney, P.V. Optimal molecular design: Generative active learning combining REINVENT with precise binding free energy ranking simulations. J. Chem. Theory Comput. 2024, 20, 8308–8328. [Google Scholar] [CrossRef]
- Goles, M.; Daza, A.; Cabas-Mora, G.; Sarmiento-Varón, L.; Sepúlveda-Yañez, J.; Anvari-Kazemabad, H.; Davari, M.D.; Uribe-Paredes, R.; Olivera-Nappa, Á.; Navarrete, M.A.; et al. Peptide-based drug discovery through artificial intelligence: Towards an autonomous design of therapeutic peptides. Briefings Bioinform. 2024, 25. [Google Scholar] [CrossRef]
- Al-Omari, A.M.; Akkam, Y.H.; Zyout, A.A.; Younis, S.A.; Tawalbeh, S.M.; Al-Sawalmeh, K.; Al Fahoum, A.; Arnold, J. Accelerating antimicrobial peptide design: Leveraging deep learning for rapid discovery. PLoS ONE 2024, 19, e0315477. [Google Scholar] [CrossRef] [PubMed]
- Matzko, R.; Konur, S. Technologies for design-build-test-learn automation and computational modelling across the synthetic biology workflow: A review. Netw. Model. Anal. Health Inform. Bioinform. 2024, 13, 22. [Google Scholar] [CrossRef]
- National Academies of Sciences, Engineering, and Medicine. The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations; The National Academies Press: Washington, DC, USA, 2025. [Google Scholar]
- Liao, X.; Ma, H.; Tang, Y.J. Artificial intelligence: A solution to involution of design–build–test–learn cycle. Curr. Opin. Biotechnol. 2022, 75, 102712. [Google Scholar] [CrossRef]
- Abolhasani, M.; Kumacheva, E. The rise of self-driving labs in chemical and materials sciences. Nat. Synth. 2023, 2, 483–492. [Google Scholar] [CrossRef]
- Dai, T.; Vijayakrishnan, S.; Szczypiński, F.T.; Ayme, J.F.; Simaei, E.; Fellowes, T.; Clowes, R.; Kotopanov, L.; Shields, C.E.; Zhou, Z.; et al. Autonomous mobile robots for exploratory synthetic chemistry. Nature 2024, 635, 890–897. [Google Scholar] [CrossRef]
- Tom, G.; Schmid, S.P.; Baird, S.G.; Cao, Y.; Darvish, K.; Hao, H.; Lo, S.; Pablo-García, S.; Rajaonson, E.M.; Skreta, M.; et al. Self-driving laboratories for chemistry and materials science. Chem. Rev. 2024, 124, 9633–9732. [Google Scholar] [CrossRef]
- Ha, T.; Lee, D.; Kwon, Y.; Park, M.S.; Lee, S.; Jang, J.; Choi, B.; Jeon, H.; Kim, J.; Choi, H.; et al. AI-driven robotic chemist for autonomous synthesis of organic molecules. Sci. Adv. 2023, 9, eadj0461. [Google Scholar] [CrossRef] [PubMed]
- Kusne, A.G.; Yu, H.; Wu, C.; Zhang, H.; Hattrick-Simpers, J.; DeCost, B.; Sarker, S.; Oses, C.; Toher, C.; Curtarolo, S.; et al. On-the-fly closed-loop materials discovery via Bayesian active learning. Nat. Commun. 2020, 11, 5966. [Google Scholar] [CrossRef] [PubMed]
- Ramos, M.C.; Michtavy, S.S.; Porosoff, M.D.; White, A.D. Bayesian optimization of catalysts with in-context learning. arXiv 2023, arXiv:2304.05341. [Google Scholar] [CrossRef]
- Xian, Y.; Ding, X.; Jiang, X.; Zhou, Y.; Sun, J.; Xue, D.; Lookman, T. Unlocking the black box beyond Bayesian global optimization for materials design using reinforcement learning. Npj Comput. Mater. 2025, 11, 143. [Google Scholar] [CrossRef]
- Wu, Y.; Walsh, A.; Ganose, A.M. Race to the bottom: Bayesian optimisation for chemical problems. Digit. Discov. 2024, 3, 1086–1100. [Google Scholar] [CrossRef]
- Klarner, L.; Rudner, T.G.; Morris, G.M.; Deane, C.M.; Teh, Y.W. Context-guided diffusion for out-of-distribution molecular and protein design. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; pp. 24770–24807. [Google Scholar]
- Wacker, D.; Stevens, R.C.; Roth, B.L. How ligands illuminate GPCR molecular pharmacology. Cell 2017, 170, 414–427. [Google Scholar] [CrossRef] [PubMed]
- Chen, S.; Lin, T.; Basu, R.; Ritchey, J.; Wang, S.; Luo, Y.; Cheng, X. Design of target specific peptide inhibitors using generative deep learning and molecular dynamics simulations. Nat. Commun. 2024, 15, 1611. [Google Scholar] [CrossRef] [PubMed]
- Khoee, A.G.; Yu, Y.; Feldt, R. Domain generalization through meta-learning: A survey. Artif. Intell. Rev. 2024, 57, 285. [Google Scholar] [CrossRef]
- Xie, W.; Zhang, J.; Xie, Q.; Gong, C.; Ren, Y.; Xie, J.; Pei, J. Accelerating discovery of bioactive ligands with pharmacophore-informed generative models. Nat. Commun. 2025, 16, 2391. [Google Scholar] [CrossRef] [PubMed]
- Dharmasivam, M.; Kaya, B.; Akinware, A.; Azad, M.G.; Richardson, D.R. Leading AI-Driven Drug Discovery Platforms: 2025 Landscape and Global Outlook. Pharmacol. Rev. 2025, 100102. [Google Scholar] [CrossRef]
- Dermawan, D.; Alotaiq, N. From Lab to Clinic: How Artificial Intelligence (AI) Is Reshaping Drug Discovery Timelines and Industry Outcomes. Pharmaceuticals 2025, 18, 981. [Google Scholar] [CrossRef]



| Feature | Small Molecules | Therapeutic Peptides |
|---|---|---|
| Representation | Graphs: Atoms & bonds; 3D Point Clouds: Coordinates; Requires equivariance [53,70,71] | Sequences: Discrete amino acids; 3D Backbones: Continuous coordinates; Often requires distinct models for sequence (discrete) and structure (continuous) generation |
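| Chemical Space | Vast & Discontinuous (∼10^60 drug-like molecules by a commonly cited estimate) [11,12,15,188]; Learns implicit chemical rules (e.g., valence) | Combinatorial & Structured (20^N sequences of length N over the 20 canonical amino acids) [12]; Governed by protein folding principles |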
| Typical Size | MW: 150–900 Da (oral drugs often 300–500 Da) [189]; Heavy Atoms: 10–50; Mostly rigid structures | MW: 500–5000 Da; Length: 5–50 amino acids [190]; Highly flexible, multiple conformations |
| Key Challenge | Synthesizability: Can it be made? [132]; Stereochemistry control | Biological Stability: Folding, proteolysis; Immunogenicity avoidance [190] |
| Validation | Computational: Docking, ADMET [191,192]; Experimental: Synthesis, binding assays (SPR, ITC) [193,194,195] | Computational: Structure prediction (AF2) [155]; Experimental: Expression, binding & stability assays |
| Conditioning | Protein pocket geometry [59,103,121]; Pharmacophores, desired properties (QED, logP) [126] | Target protein surface [65]; Structural motifs (helix), sequence patterns |
| Data & Cost | Data: PDBbind (∼20k complexes), CrossDocked (∼100k pairs); Cost: Varies widely by model and scale | Data: PDB (∼220k entries), AlphaFold DB (>200 M structures); Cost: Varies widely by model and scale |
| Success Metrics | Chemical: Validity, Uniqueness, Novelty [96,103,196]; Predicted Affinity: High-affinity rate | Structural: Designability (folds to target) [155]; Experimental Success: Varies, often a few to tens of percent [65] |
| Example Works | Pocket2Mol [121], DiffSBDD [103], TargetDiff [59], GeoDiff [71], DiffLinker [197] | RFdiffusion [65], ProteinMPNN [180] (seq. design), Chroma [198], EvoDiff [61], FoldingDiff [67] |
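
The chemical success metrics named in the table (validity, uniqueness, novelty) are usually reported as simple ratios. The sketch below follows one common convention and is not necessarily the exact definition used by each cited work. The function name `generation_metrics`, the toy validity check, and the example strings are hypothetical; a real pipeline would parse and canonicalize structures with a cheminformatics toolkit before comparing them.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    validity   = valid samples / all generated samples
    uniqueness = distinct valid samples / valid samples
    novelty    = distinct valid samples absent from training / distinct valid samples
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    # Hypothetical stand-in for a real parser: only checks that
    # parentheses are balanced by count. This is NOT a chemistry check.
    return smiles.count("(") == smiles.count(")")


# Illustrative strings only; they are assumed to be already canonical.
generated = ["CCO", "CCO", "C(C", "CCN", "c1ccccc1"]
training = {"CCO"}
metrics = generation_metrics(generated, training, toy_is_valid)
```

Because each ratio uses the previous stage as its denominator, a model can score high on novelty while producing few valid molecules, so the three numbers are best read together.
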

| Model | Modality/Role | Key Performance Metrics & Highlights |
|---|---|---|
| **Small Molecule Generation (Diffusion Models)** | | |
| Pocket2Mol [121] | Structure-based generation | Avg. Vina score: −7.29 kcal/mol; High-affinity rate: 54.2%; Good drug-likeness (QED: 0.56). |
| DiffSBDD [103] | Structure-based generation | High chemical validity (97.8%) and novelty (85.7%); Median Vina score: −7.50 kcal/mol. |
| TargetDiff [59] | Guided generation | State-of-the-art binding affinity (Avg. Vina: −7.80 kcal/mol); High-affinity rate: 58.1%. |
| GeoDiff [71] | Conformer generation | High-quality 3D conformer generation with low geometric error (MAT-R: 0.86 Å on Drugs dataset). |
| **Peptide and Protein Design (Diffusion-Centric Workflows)** | | |
| RFdiffusion [65] | Backbone generation (Diffusion) | High experimental success rate for binders (14–19%); Generated structures match cryo-EM to 0.63 Å RMSD. |
| ProteinMPNN [180] | Sequence design (GNN, non-diffusion) | High native sequence recovery (52.4%); Essential downstream tool for designing sequences for generated backbones. |
| Chroma [198] | Protein/Complex generation (Diffusion) | Experimentally confirmed designs with crystal structures matching to ~1.0 Å RMSD; Generates diverse topologies. |
| EvoDiff [61] | Sequence generation (Discrete Diffusion) | High experimental success for functional proteins (65–75%); Generates evolutionarily plausible sequences. |
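
Two of the metrics quoted in the table, native sequence recovery and RMSD, are straightforward to compute once the inputs are aligned. The following is a minimal sketch under stated assumptions: the sequences are already aligned position by position, and the two coordinate sets are already superimposed (for example by a Kabsch fit, which is not shown here). The function names and the toy inputs are hypothetical and are not taken from any of the cited tools.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be pre-aligned and of equal length")
    if not native:
        return 0.0
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superimposed atoms.

    Units follow the input coordinates (e.g., angstroms).
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


# Toy inputs for illustration only.
recovery = sequence_recovery("ACDE", "ACDF")   # 3 of 4 positions match
design = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom shifted by 1 along z
deviation = rmsd(design, reference)
```

Reported RMSD values depend on which atoms are paired (for example Cα only versus all backbone atoms) and on how the superposition was done, so figures from different papers are not always directly comparable.
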
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Ma, Y.; Chang, Y.; Yan, J.; Zhang, J.; Cai, M.; Wei, K. Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules Versus Therapeutic Peptides. Biology 2025, 14, 1665. https://doi.org/10.3390/biology14121665

