AI-Integrated QSAR Modeling for Enhanced Drug Discovery: From Classical Approaches to Deep Learning and Structural Insight
Abstract
1. Introduction
2. Foundations of QSAR and Molecular Descriptors
3. Classical QSAR: Statistical Modeling Techniques
4. The Rise of Machine Learning in QSAR
5. Deep Learning and Neural Models in Drug Discovery
6. Molecular Docking and Dynamics
7. PROTACs and Targeted Protein Degradation
8. Predicting ADMET and Toxicity Profiles
9. Assessing Model Validity and Reliability
10. Software, Databases and Computational Platforms
11. Challenges, Ethical Considerations and Regulatory Aspects
12. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
References
- Cherkasov, A.; Muratov, E.N.; Fourches, D.; Varnek, A.; Baskin, I.I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y.C.; Todeschini, R. QSAR modeling: Where have you been? Where are you going to? J. Med. Chem. 2014, 57, 4977–5010. [Google Scholar] [CrossRef]
- Wu, Z.; Ramsundar, B.; Feinberg, E.N.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530. [Google Scholar] [CrossRef] [PubMed]
- Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 2010, 29, 476–488. [Google Scholar] [CrossRef]
- Roy, K.; Kar, S.; Das, R.N. A Primer on QSAR/QSPR Modeling: Fundamental Concepts; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
- Hansch, C.; Fujita, T. p-σ-π Analysis. A Method for the Correlation of Biological Activity and Chemical Structure. J. Am. Chem. Soc. 1964, 86, 1616–1626. [Google Scholar] [CrossRef]
- Kubinyi, H. QSAR and 3D QSAR in drug design Part 1: Methodology. Drug Discov. Today 1997, 2, 457–467. [Google Scholar] [CrossRef]
- De, P.; Kar, S.; Ambure, P.; Roy, K. Prediction reliability of QSAR models: An overview of various validation tools. Arch. Toxicol. 2022, 96, 1279–1295. [Google Scholar] [CrossRef]
- Ren, B. Novel atomic-level-based AI topological descriptors: Application to QSPR/QSAR modeling. J. Chem. Inf. Comput. Sci. 2002, 42, 858–868. [Google Scholar] [CrossRef]
- Tropsha, A.; Isayev, O.; Varnek, A.; Schneider, G.; Cherkasov, A. Integrating QSAR modelling and deep learning in drug discovery: The emergence of deep QSAR. Nat. Rev. Drug Discov. 2024, 23, 141–155. [Google Scholar] [CrossRef]
- Li, F.; Hu, Q.; Zhang, X.; Sun, R.; Liu, Z.; Wu, S.; Tian, S.; Ma, X.; Dai, Z.; Yang, X. DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs. Nat. Commun. 2022, 13, 7133. [Google Scholar] [CrossRef]
- Sheridan, R.P.; Baskin, I.I.; Curtarolo, S.; Isayev, O.; Tropsha, A.; Filimonov, D.; Poroikov, V.; Tetko, I.V.; Varnek, A.; Roitberg, A.E. Correction: QSAR without borders. Chem. Soc. Rev. 2020, 49, 3716, Correction in Chem. Soc. Rev. 2020, 49, 3525–3564. [Google Scholar] [CrossRef]
- Talukder, M.E.K.; Atif, M.F.; Siddiquee, N.H.; Rahman, S.; Rafi, N.I.; Israt, S.; Shahir, N.F.; Islam, M.T.; Samad, A.; Wani, T.A. Molecular docking, QSAR, and simulation analyses of EGFR-targeting phytochemicals in non-small cell lung cancer. J. Mol. Struct. 2025, 1321, 139924. [Google Scholar] [CrossRef]
- Kaur, N.; Gupta, S.; Pal, J.; Bansal, Y.; Bansal, G. Design of BBB permeable BACE-1 inhibitor as potential drug candidate for Alzheimer disease: 2D-QSAR, molecular docking, ADMET, molecular dynamics, MMGBSA. Comput. Biol. Chem. 2025, 116, 108371. [Google Scholar] [CrossRef]
- Souza, A.S.d.; Amorim, V.M.d.F.; Soares, E.P.; de Souza, R.F.; Guzzo, C.R. Antagonistic trends between binding affinity and drug-likeness in SARS-CoV-2 MPRO inhibitors revealed by machine learning. Viruses 2025, 17, 935. [Google Scholar] [CrossRef]
- Maliyakkal, N.; Kumar, S.; Bhowmik, R.; Vishwakarma, H.C.; Yadav, P.; Mathew, B. Two-dimensional QSAR-driven virtual screening for potential therapeutics against Trypanosoma cruzi. Front. Chem. 2025, 13, 1600945. [Google Scholar] [CrossRef]
- Ou-Yang, S.-S.; Lu, J.-Y.; Kong, X.-Q.; Liang, Z.-J.; Luo, C.; Jiang, H. Computational drug discovery. Acta Pharmacol. Sin. 2012, 33, 1131–1140. [Google Scholar] [CrossRef]
- Ouma, R.B.; Ngari, S.M.; Kibet, J.K. A review of the current trends in computational approaches in drug design and metabolism. Discov. Public Health 2024, 21, 108. [Google Scholar] [CrossRef]
- Lavecchia, A. Machine-learning approaches in drug discovery: Methods and applications. Drug Discov. Today 2015, 20, 318–331. [Google Scholar] [CrossRef]
- Paul, D.; Sanap, G.; Shenoy, S.; Kalyane, D.; Kalia, K.; Tekade, R.K. Artificial intelligence in drug discovery and development. Drug Discovery Today 2021, 26, 80–93. [Google Scholar] [CrossRef] [PubMed]
- Roy, K.; Kar, S.; Das, R.N. Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment; Academic Press: Cambridge, MA, USA, 2015. [Google Scholar]
- Le, T.T.; Fu, W.; Moore, J.H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 2020, 36, 250–256. [Google Scholar] [CrossRef] [PubMed]
- Romano, J.D.; Le, T.T.; Fu, W.; Moore, J.H. TPOT-NN: Augmenting tree-based automated machine learning with neural network estimators. Genet. Program. Evolvable Mach. 2021, 22, 207–227. [Google Scholar] [CrossRef]
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
- Das, J.; Chen, P.; Norris, D.; Padmanabha, R.; Lin, J.; Moquin, R.V.; Shen, Z.; Cook, L.S.; Doweyko, A.M.; Pitt, S. 2-Aminothiazole as a Novel Kinase Inhibitor Template. Structure—Activity Relationship Studies toward the Discovery of N-(2-Chloro-6-methylphenyl)-2-[[6-[4-(2-hydroxyethyl)-1-piperazinyl)]-2-methyl-4-pyrimidinyl] amino)]-1, 3-thiazole-5-carboxamide (Dasatinib, BMS-354825) as a Potent pan-Src Kinase Inhibitor. J. Med. Chem. 2006, 49, 6819–6832. [Google Scholar]
- Vedani, A.; Dobler, M. 5D-QSAR: The key for simulating induced fit? J. Med. Chem. 2002, 45, 2139–2149. [Google Scholar] [CrossRef]
- Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics: Volume I: Alphabetical Listing/Volume II: Appendices, References; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
- Yap, C.W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 2011, 32, 1466–1474. [Google Scholar] [CrossRef]
- Landrum, G. Rdkit: Open-Source Cheminformatics Software. 2016. Available online: https://github.com/rdkit/rdkit (accessed on 2 August 2025).
- Duvenaud, D.K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 2015, 2, 2224–2232. [Google Scholar]
- Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388, Correction in J. Chem. Inf. Model. 2019, 12, 5304–5305. [Google Scholar] [CrossRef] [PubMed]
- Hung, C.; Gini, G. QSAR modeling without descriptors using graph convolutional neural networks: The case of mutagenicity prediction. Mol. Divers. 2021, 25, 1283–1299. [Google Scholar] [CrossRef]
- Varmuza, K.; Dehmer, M.; Bonchev, D. Statistical Modelling of Molecular Descriptors in QSAR/QSPR; Wiley Online Library: Hoboken, NJ, USA, 2012. [Google Scholar]
- Gini, G. QSAR methods. In In Silico Methods for Predicting Drug Toxicity; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–26. [Google Scholar]
- Tetko, I.V.; Sushko, I.; Pandey, A.K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection. J. Chem. Inf. Model. 2008, 48, 1733–1746. [Google Scholar] [CrossRef]
- Riley, R.D.; Collins, G.S. Stability of clinical prediction models developed using statistical or machine learning methods. Biom. J. 2023, 65, 2200302. [Google Scholar] [CrossRef] [PubMed]
- Cai, Z.; Zafferani, M.; Akande, O.M.; Hargrove, A.E. Quantitative Structure–Activity Relationship (QSAR) Study Predicts Small-Molecule Binding to RNA Structure. J. Med. Chem. 2022, 65, 7262–7277. [Google Scholar] [CrossRef]
- Bueso-Bordils, J.I.; Antón-Fos, G.M.; Martín-Algarra, R.; Alemán-López, P.A. Overview of computational toxicology methods applied in drug and green chemical discovery. J. Xenobiot. 2024, 14, 1901–1918. [Google Scholar] [CrossRef]
- Mora, J.R.; Marquez, E.A.; Pérez-Pérez, N.; Contreras-Torres, E.; Perez-Castillo, Y.; Agüero-Chapin, G.; Martinez-Rios, F.; Marrero-Ponce, Y.; Barigye, S.J. Rethinking the applicability domain analysis in QSAR models. J. Comput.-Aided Mol. Des. 2024, 38, 9. [Google Scholar] [CrossRef]
- Olenginski, L.T.; Wierzba, A.J.; Laursen, S.P.; Batey, R.T. Designing small molecules targeting a cryptic RNA binding site through base displacement. Nat. Chem. Biol. 2025, 1–10. [Google Scholar] [CrossRef] [PubMed]
- Wu, Z.; Zhu, M.; Kang, Y.; Leung, E.L.-H.; Lei, T.; Shen, C.; Jiang, D.; Wang, Z.; Cao, D.; Hou, T. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief. Bioinform. 2021, 22, bbaa321. [Google Scholar] [CrossRef] [PubMed]
- Zhang, F.; Wang, Z.; Peijnenburg, W.J.; Vijver, M.G. Machine learning-driven QSAR models for predicting the mixture toxicity of nanoparticles. Environ. Int. 2023, 177, 108025. [Google Scholar] [CrossRef]
- Singh, K.; Ghosh, I.; Jayaprakash, V.; Jayapalan, S. Building a ML-based QSAR model for predicting the bioactivity of therapeutically active drug class with imidazole scaffold. Eur. J. Med. Chem. Rep. 2024, 11, 100148. [Google Scholar] [CrossRef]
- Lenselink, E.B.; Ten Dijke, N.; Bongers, B.; Papadatos, G.; Van Vlijmen, H.W.; Kowalczyk, W.; IJzerman, A.P.; Van Westen, G.J. Beyond the hype: Deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J. Cheminform. 2017, 9, 45. [Google Scholar] [CrossRef]
- Nayarisseri, A.; Khandelwal, R.; Tanwar, P.; Madhavi, M.; Sharma, D.; Thakur, G.; Speck-Planche, A.; Singh, S.K. Artificial intelligence, big data and machine learning approaches in precision medicine & drug discovery. Curr. Drug Targets 2021, 22, 631–655. [Google Scholar]
- Matboli, M.; Al-Amodi, H.S.; Khaled, A.; Khaled, R.; Roushdy, M.M.; Ali, M.; Diab, G.I.; Elnagar, M.F.; Elmansy, R.A.; TAhmed, H.H. Comprehensive machine learning models for predicting therapeutic targets in type 2 diabetes utilizing molecular and biochemical features in rats. Front. Endocrinol. 2024, 15, 1384984. [Google Scholar] [CrossRef]
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958. [Google Scholar] [CrossRef]
- Koutsoukas, A.; Monaghan, K.J.; Li, X.; Huan, J. Deep-learning: Investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J. Cheminform. 2017, 9, 42. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference of Neutral Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; p. 30. [Google Scholar]
- Mazanetz, M.P.; Marmon, R.J.; Reisser, C.B.T.; Morao, I. Drug discovery applications for KNIME: An open source data mining platform. Curr. Top. Med. Chem. 2012, 12, 1965–1979. [Google Scholar] [CrossRef]
- Niazi, S.K.; Mariam, Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int. J. Mol. Sci. 2023, 24, 11488. [Google Scholar] [CrossRef]
- Van Tilborg, D.; Alenicheva, A.; Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. J. Chem. Inf. Model. 2022, 62, 5938–5951. [Google Scholar] [CrossRef] [PubMed]
- Scholz, G.E.; Linard, B.; Romashchenko, N.; Rivals, E.; Pardi, F. Rapid screening and detection of inter-type viral recombinants using phylo-k-mers. Bioinformatics 2020, 36, 5351–5360. [Google Scholar] [CrossRef] [PubMed]
- Kalian, A.D.; Benfenati, E.; Osborne, O.J.; Gott, D.; Potter, C.; Dorne, J.-L.C.; Guo, M.; Hogstrand, C. Exploring dimensionality reduction techniques for deep learning driven QSAR models of mutagenicity. Toxics 2023, 11, 572. [Google Scholar] [CrossRef]
- Noviandy, T.R.; Idroes, G.M.; Maulana, A.; Afidh, R.P.F.; Idroes, R. Optimizing hepatitis C virus inhibitor identification with LightGBM and tree-structured parzen estimator sampling. Eng. Technol. Appl. Sci. Res. 2024, 14, 18810–18817. [Google Scholar] [CrossRef]
- Goh, G.B.; Hodas, N.O.; Vishnu, A. Deep learning for computational chemistry. J. Comput. Chem. 2017, 38, 1291–1307. [Google Scholar] [CrossRef]
- Zhong, S.; Hu, J.; Yu, X.; Zhang, H. Molecular image-convolutional neural network (CNN) assisted QSAR models for predicting contaminant reactivity toward OH radicals: Transfer learning, data augmentation and model interpretation. Chem. Eng. J. 2021, 408, 127998. [Google Scholar] [CrossRef]
- Bisoi, A.V.; Shreyas, V.; Siguenza, J.; Ramsundar, B. DeepChem-Variant: A Modular Open Source Framework for Genomic Variant Calling. In Proceedings of the Championing Open-Source Development in ML Workshop@ ICML25, Vancouver, BC, Canada, 13–19 July 2025. [Google Scholar]
- Heid, E.; Greenman, K.P.; Chung, Y.; Li, S.-C.; Graff, D.E.; Vermeire, F.H.; Wu, H.; Green, W.H.; McGill, C.J. Chemprop: A machine learning package for chemical property prediction. J. Chem. Inf. Model. 2023, 64, 9–17. [Google Scholar] [CrossRef]
- Chithrananda, S.; Grand, G.; Ramsundar, B. ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction. arXiv 2020, arXiv:2010.09885. [Google Scholar]
- Li, J.; Jiang, X. Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction. Wirel. Commun. Mob. Comput. 2021, 2021, 7181815. [Google Scholar] [CrossRef]
- Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017, 9, 48. [Google Scholar] [CrossRef]
- Hajim, W.I.; Zainudin, S.; Daud, K.M.; Alheeti, K. Optimized models and deep learning methods for drug response prediction in cancer treatments: A review. PeerJ Comput. Sci. 2024, 10, e1903. [Google Scholar] [CrossRef]
- Ugurlu, S. Machine Learning Applications in Drug Discovery. ChemRxiv. 2024. [Google Scholar] [CrossRef]
- Gao, K.; Wang, R.; Chen, J.; Cheng, L.; Frishcosy, J.; Huzumi, Y.; Qiu, Y.; Schluckbier, T.; Wei, X.; Wei, G.-W. Methodology-centered review of molecular modeling, simulation, and prediction of SARS-CoV-2. Chem. Rev. 2022, 122, 11287–11368. [Google Scholar] [CrossRef] [PubMed]
- Peng, L.; Wang, F.; Wang, Z.; Tan, J.; Huang, L.; Tian, X.; Liu, G.; Zhou, L. Cell–cell communication inference and analysis in the tumour microenvironments from single-cell transcriptomics: Data resources and computational strategies. Brief. Bioinform. 2022, 23, bbac234. [Google Scholar] [CrossRef]
- Ahmad, A.; Fröhlich, H. Towards clinically more relevant dissection of patient heterogeneity via survival-based Bayesian clustering. Bioinformatics 2017, 33, 3558–3566. [Google Scholar] [CrossRef] [PubMed]
- Kim, H.; Lee, J.; Ahn, S.; Lee, J.R. A merged molecular representation learning for molecular properties prediction with a web-based service. Sci. Rep. 2021, 11, 11028. [Google Scholar] [CrossRef]
- Altae-Tran, H.; Ramsundar, B.; Pappu, A.S.; Pande, V. Low data drug discovery with one-shot learning. ACS Cent. Sci. 2017, 3, 283–293. [Google Scholar] [CrossRef]
- Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461. [Google Scholar]
- Halgren, T.A.; Murphy, R.B.; Friesner, R.A.; Beard, H.S.; Frye, L.L.; Pollard, W.T.; Banks, J.L. Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 2004, 47, 1750–1759. [Google Scholar] [CrossRef]
- Verdonk, M.L.; Cole, J.C.; Hartshorn, M.J.; Murray, C.W.; Taylor, R.D. Improved protein–ligand docking using GOLD. Proteins Struct. Funct. Bioinform. 2003, 52, 609–623. [Google Scholar]
- Liu, N.; Xu, Z. In Using LeDock as a docking tool for computational drug design. IOP Conf. Ser. Earth Environ. Sci. 2019, 218, 012143. [Google Scholar] [CrossRef]
- Pagadala, N.S.; Syed, K.; Tuszynski, J. Software for molecular docking: A review. Biophys. Rev. 2017, 9, 91–102. [Google Scholar] [CrossRef] [PubMed]
- Yuriev, E.; Ramsland, P.A. Latest developments in molecular docking: 2010–2011 in review. J. Mol. Recognit. 2013, 26, 215–239. [Google Scholar] [CrossRef]
- Hollingsworth, S.A.; Dror, R.O. Molecular dynamics simulation for all. Neuron 2018, 99, 1129–1143. [Google Scholar] [CrossRef]
- Abraham, M.J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J.C.; Hess, B.; Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1, 19–25. [Google Scholar] [CrossRef]
- Huang, J.; MacKerell, A.D., Jr. CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. J. Comput. Chem. 2013, 34, 2135–2145. [Google Scholar] [CrossRef] [PubMed]
- Phillips, J.C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R.D.; Kale, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26, 1781–1802. [Google Scholar] [CrossRef] [PubMed]
- Salomon-Ferrer, R.; Case, D.A.; Walker, R.C. An overview of the Amber biomolecular simulation package. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2013, 3, 198–210. [Google Scholar] [CrossRef]
- Kumari, R.; Kumar, R.; Consortium, O.S.D.D.; Lynn, A. g_mmpbsa—A GROMACS tool for high-throughput MM-PBSA calculations. J. Chem. Inf. Model. 2014, 54, 1951–1962. [Google Scholar] [CrossRef] [PubMed]
- Koirala, M.; Fagerquist, C.K. Binding Free Energy Analysis of Colicin D, E3 and E8 to Their Respective Cognate Immunity Proteins Using Computational Simulations. Molecules 2025, 30, 1277. [Google Scholar] [CrossRef]
- Koirala, M.; DiPaola, M. Targeting CDK9 in Cancer: An Integrated Approach of Combining In Silico Screening with Experimental Validation for Novel Degraders. Curr. Issues Mol. Biol. 2024, 46, 1713–1730. [Google Scholar] [CrossRef]
- Koirala, M.; Alexov, E. Ab-initio binding of barnase–barstar with DelPhiForce steered Molecular Dynamics (DFMD) approach. J. Theor. Comput. Chem. 2020, 19, 2050016. [Google Scholar] [CrossRef]
- Shi, W.; Yang, H.; Xie, L.; Yin, X.-X.; Zhang, Y. A review of machine learning-based methods for predicting drug–target interactions. Health Inf. Sci. Syst. 2024, 12, 30. [Google Scholar] [CrossRef] [PubMed]
- Liu, H.; Hu, B.; Chen, P.; Wang, X.; Wang, H.; Wang, S.; Wang, J.; Lin, B.; Cheng, M. Docking score ML: Target-specific machine learning models improving docking-based virtual screening in 155 targets. J. Chem. Inf. Model. 2024, 64, 5413–5426. [Google Scholar] [CrossRef] [PubMed]
- Lu, S.; He, X.; Yang, Z.; Chai, Z.; Zhou, S.; Wang, J.; Rehman, A.U.; Ni, D.; Pu, J.; Sun, J. Activation pathway of a G protein-coupled receptor uncovers conformational intermediates as targets for allosteric drug design. Nat. Commun. 2021, 12, 4721. [Google Scholar] [CrossRef]
- Zou, Y.; Ma, D.; Wang, Y. The PROTAC technology in drug development. Cell Biochem. Funct. 2019, 37, 21–30. [Google Scholar] [CrossRef]
- Troup, R.I.; Fallan, C.; Baud, M.G. Current strategies for the design of PROTAC linkers: A critical review. Explor. Target. Anti-Tumor Ther. 2020, 1, 273. [Google Scholar] [CrossRef]
- Koirala, M.; DiPaola, M. Overcoming cancer resistance: Strategies and modalities for effective treatment. Biomedicines 2024, 12, 1801. [Google Scholar] [CrossRef] [PubMed]
- Ribes, S.; Nittinger, E.; Tyrchan, C.; Mercado, R. Modeling PROTAC degradation activity with machine learning. Artif. Intell. Life Sci. 2024, 6, 100104, Erratum in Artif. Intell. Life Sci. 2024, 6, 100114. [Google Scholar] [CrossRef]
- Speck-Planche, A.; Scotti, M.T. BET bromodomain inhibitors: Fragment-based in silico design using multi-target QSAR models. Mol. Divers. 2019, 23, 555–572. [Google Scholar] [CrossRef] [PubMed]
- Poongavanam, V.; Kolling, F.; Giese, A.; Goller, A.H.; Lehmann, L.; Meibom, D.; Kihlberg, J. Predictive modeling of PROTAC cell permeability with machine learning. ACS Omega 2023, 8, 5901–5916. [Google Scholar] [CrossRef]
- Jarusiewicz, J.A.; Yoshimura, S.; Mayasundari, A.; Actis, M.; Aggarwal, A.; McGowan, K.; Yang, L.; Li, Y.; Fu, X.; Mishra, V. Phenyl dihydrouracil: An alternative cereblon binder for PROTAC design. ACS Med. Chem. Lett. 2023, 14, 141–145. [Google Scholar] [CrossRef]
- Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
- Tunjic, T.M.; Weber, N.; Brunsteiner, M. Computer aided drug design in the development of proteolysis targeting chimeras. Comput. Struct. Biotechnol. 2023, 21, 2058–2067. [Google Scholar] [CrossRef]
- Wu, L.; Chen, Y.; Shen, K.; Guo, X.; Gao, H.; Li, S.; Pei, J.; Long, B. Graph neural networks for natural language processing: A survey. Found. Trends® Mach. Learn.g 2023, 16, 119–328, Erratum in AI Open 2024, 5, 100001. [Google Scholar] [CrossRef]
- Liu, J.; Roy, M.J.; Isbel, L.; Li, F. Accurate PROTAC-targeted degradation prediction with DegradeMaster. Bioinformatics 2025, 41 (Suppl. S1), i342–i351. [Google Scholar] [CrossRef]
- Abouzied, A.S.; Alshammari, B.; Kari, H.; Huwaimel, B.; Alqarni, S.; Kassab, S.E. AI-DPAPT: A Machine Learning Framework for Predicting PROTAC Activity. Mol. Divers. 2025, 29, 2995–3007. [Google Scholar] [CrossRef]
- Imrie, F.; Bradley, A.R.; van der Schaar, M.; Deane, C.M. Deep generative models for 3D linker design. J. Chem. Inf. Model. 2020, 60, 1983–1995. [Google Scholar] [CrossRef] [PubMed]
- Igashov, I.; Stärk, H.; Vignac, C.; Schneuing, A.; Satorras, V.G.; Frossard, P.; Welling, M.; Bronstein, M.; Correia, B. Equivariant 3D-conditional diffusion model for molecular linker design. Nat. Mach. Intell. 2024, 6, 417–427. [Google Scholar] [CrossRef]
- Li, F.; Hu, Q.; Zhou, Y.; Yang, H.; Bai, F. DiffPROTACs is a deep learning-based generator for proteolysis targeting chimeras. Brief. Bioinform. 2024, 25. [Google Scholar] [CrossRef]
- Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927. [Google Scholar] [CrossRef]
- Ekanayake, I.; Meddage, D.; Rathnayake, U. A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP). Case Stud. Constr. Mater. 2022, 16, e01059. [Google Scholar] [CrossRef]
- Xie, L.; Xie, L. Elucidation of genome-wide understudied proteins targeted by PROTAC-induced degradation using interpretable machine learning. PLoS Comput. Biol. 2023, 19, e1010974. [Google Scholar] [CrossRef]
- Yi, J.; Shi, S.; Fu, L.; Yang, Z.; Nie, P.; Lu, A.; Wu, C.; Deng, Y.; Hsieh, C.; Zeng, X. OptADMET: A web-based tool for substructure modifications to improve ADMET properties of lead compounds. Nat. Protoc. 2024, 19, 1105–1121. [Google Scholar] [CrossRef]
- Swanson, K.; Walther, P.; Leitz, J.; Mukherjee, S.; Wu, J.C.; Shivnaraine, R.V.; Zou, J. ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries. Bioinformatics 2024, 40, btae416. [Google Scholar] [CrossRef]
- Daoud, N.E.-H.; Borah, P.; Deb, P.K.; Venugopala, K.N.; Hourani, W.; Alzweiri, M.; Bardaweel, S.K.; Tiwari, V. ADMET profiling in drug discovery and development: Perspectives of in silico, in vitro and integrated approaches. Curr. Drug Metab. 2021, 22, 503–522. [Google Scholar] [CrossRef]
- Raju, B.; Verma, H.; Narendra, G.; Sapra, B.; Silakari, O. Multiple machine learning, molecular docking, and ADMET screening approach for identification of selective inhibitors of CYP1B1. J. Biomol. Struct. Dyn. 2022, 40, 7975–7990. [Google Scholar] [CrossRef]
- Abdelwahab, A.A.; Elattar, M.A.; Fawzi, S.A. Advancing ADMET prediction for major CYP450 isoforms: Graph-based models, limitations, and future directions. Biomed. Eng. OnLine 2025, 24, 93. [Google Scholar] [CrossRef]
- Göller, A.H.; Kuhnke, L.; Ter Laak, A.; Meier, K.; Hillisch, A. Machine learning applied to the modeling of pharmacological and ADMET endpoints. Artif. Intell. Drug Des. 2021, 2390, 61–101. [Google Scholar]
- Zonghuang, X. Machine learning-based quantitative structure-activity relationship and ADMET prediction models for erα activity of anti-breast cancer drug candidates. Wuhan Univ. J. Nat. Sci. 2023, 28, 257–270. [Google Scholar]
- Dong, J.; Wang, N.-N.; Yao, Z.-J.; Zhang, L.; Cheng, Y.; Ouyang, D.; Lu, A.-P.; Cao, D.-S. ADMETlab: A platform for systematic ADMET evaluation based on a comprehensively collected ADMET database. J. Cheminform. 2018, 10, 29. [Google Scholar] [CrossRef] [PubMed]
- Pires, D.E.; Blundell, T.L.; Ascher, D.B. pkCSM: Predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J. Med. Chem. 2015, 58, 4066–4072. [Google Scholar] [CrossRef] [PubMed]
- Daina, A.; Michielin, O.; Zoete, V. SwissADME: A free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017, 7, 42717. [Google Scholar] [CrossRef] [PubMed]
- Banerjee, P.; Eckert, A.O.; Schrey, A.K.; Preissner, R. ProTox-II: A webserver for the prediction of toxicity of chemicals. Nucleic Acids Res. 2018, 46, W257–W263. [Google Scholar] [CrossRef] [PubMed]
- Martin, T.; Harten, P.; Young, D. TEST (Toxicity Estimation Software Tool); Version 4.1; US Environmental Protection Agency: Washington DC, USA, 2012.
- Benfenati, E.; Manganaro, A.; Gini, G.C. VEGA-QSAR: AI inside a platform for predictive toxicology. CEUR Workshop Proc. 2013, 1107, 21–28. [Google Scholar]
- Cheng, F.; Li, W.; Zhou, Y.; Shen, J.; Wu, Z.; Liu, G.; Lee, P.W.; Tang, Y. admetSAR: A comprehensive source and free tool for assessment of chemical ADMET properties. J. Chem. Inf. Model. 2012, 52, 3099–3105, Correction in J. Chem. Inf. Model. 2019, 59, 4959. [Google Scholar] [CrossRef]
- Ioakimidis, L.; Thoukydidis, L.; Mirza, A.; Naeem, S.; Reynisson, J. Benchmarking the reliability of QikProp. Correlation between experimental and predicted values. QSAR Comb. Sci. 2008, 27, 445–456. [Google Scholar] [CrossRef]
- Advanced Chemistry Development, Inc. Available online: https://www.acdlabs.com (accessed on 11 August 2025).
- Lhasa Limited. DEREK Nexus; Lhasa Limited: Leeds, UK. Available online: https://www.lhasalimited.org (accessed on 11 August 2025).
- BIOVIA Discovery Studio Solutions, Version 2.1; Dassault Systèmes: San Diego, CA, USA. Available online: https://www.3ds.com/products/biovia/discovery-studio (accessed on 11 August 2025).
- ADMET Predictor, Version 12; Simulations Plus, Inc.: Lancaster, CA, USA, 2025. Available online: https://www.businesswire.com (accessed on 11 August 2025).
- StarDrop; Optibrium Ltd.: Cambdrige, UK, 2025. Available online: https://optibrium.com (accessed on 11 August 2025).
- Chemaxon. Available online: https://www.chemaxon.com (accessed on 12 August 2025).
- Patlewicz, G.; Jeliazkova, N.; Safford, R.; Worth, A.; Aleksiev, B. An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR QSAR Environ. Res. 2008, 19, 495–524. [Google Scholar] [CrossRef]
- U.S. Environmental Protection Agency. Toxicity Forecasting (ToxCast). Available online: https://www.epa.gov/comptox-tools/toxicity-forecasting-toxcast (accessed on 11 August 2025).
- Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380. [Google Scholar] [CrossRef]
- Mendez, D.; Gaulton, A.; Bento, A.P.; Chambers, J.; De Veij, M.; Félix, E.; Magariños, M.P.; Mosquera, J.F.; Mutowo, P.; Nowotka, M. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res. 2019, 47, D930–D940. [Google Scholar]
- Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; Grant, J.R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082. [Google Scholar] [CrossRef]
- Lumumba, V.W.; Kiprotich, D.; Lemasulani Mpaine, M.; Grace Makena, N.; Daniel Kavita, M. Comparative analysis of Cross-Validation techniques: LOOCV, K-folds Cross-Validation, and repeated K-folds Cross-Validation in machine learning models. Am. J. Theor. Appl. Stat. 2024, 13, 127–137. [Google Scholar] [CrossRef]
- Gramatica, P. Principles of QSAR modeling: Comments and suggestions from personal experience. Int. J. Quant. Struct.-Prop. Relatsh. (IJQSPR) 2020, 5, 61–97. [Google Scholar] [CrossRef]
- Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Consonni, V.; Todeschini, R. Comparison of different approaches to define the applicability domain of QSAR models. Molecules 2012, 17, 4791–4810. [Google Scholar] [CrossRef] [PubMed]
- Cassotti, M.; Ballabio, D.; Todeschini, R.; Consonni, V. A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas). SAR QSAR Environ. Res. 2015, 26, 217–243. [Google Scholar] [CrossRef]
- Chirico, N.; Gramatica, P. Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Inf. Model. 2012, 52, 2044–2058. [Google Scholar] [CrossRef]
- Golbraikh, A.; Tropsha, A. Beware of q2! Mol. Graph. Model. 2002, 20, 269–276. [Google Scholar] [CrossRef]
- Organisation for Economic Co-Operation and Development. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q) SAR] Models; Organisation for Economic Co-Operation and Development: Paris, France, 2014.
- Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
- Bender, A.; Glen, R.C. Molecular similarity: A key technique in molecular informatics. Org. Biomol. Chem. 2004, 2, 3204–3218.
- Fu, X.; Liu, L.; Guan, W.W.; Kalra, Y.; Bao, S.; Kötter, T.; Sturm, K. Advancing replicable and reproducible GIScience: An approach with KNIME. Cartogr. Geogr. Inf. Sci. 2025, 1–21.
- Neves, B.J.; Moreira-Filho, J.T.; Silva, A.C.; Borba, J.V.; Mottin, M.; Alves, V.M.; Braga, R.C.; Muratov, E.N.; Andrade, C.H. Automated framework for developing predictive machine learning models for data-driven drug discovery. J. Braz. Chem. Soc. 2021, 32, 110–122.
- Zdrazil, B.; Felix, E.; Hunter, F.; Manners, E.J.; Blackshaw, J.; Corbett, S.; De Veij, M.; Ioannidis, H.; Lopez, D.M.; Mosquera, J.F. The ChEMBL Database in 2023: A drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024, 52, D1180–D1192.
- Irwin, J.J.; Shoichet, B.K. ZINC – A free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005, 45, 177–182.
- Uzundurukan, A.; Nelson, M.; Teske, C.; Islam, M.S.; Mohamed, E.; Christy, J.V.; Martin, H.-J.; Muratov, E.; Glover, S.; Fuoco, D. Meta-analysis and review of in silico methods in drug discovery – Part 1: Technological evolution and trends from big data to chemical space. Pharmacogenom. J. 2025, 25, 8.
- Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
- Vinogradov, V.; Izmailov, I.; Steshin, S.; Nguyen, K.T. Bioptic – A Target-Agnostic Potency-Based Small Molecules Search Engine. arXiv 2024, arXiv:2406.14572.
- Ramsundar, B.; Eastman, P.; Walters, P.; Pande, V. Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More; O’Reilly Media: Sebastopol, CA, USA, 2019.
- Nene, L.; Flepisi, B.T.; Brand, S.J.; Basson, C.; Balmith, M. Evolution of drug development and regulatory affairs: The demonstrated power of artificial intelligence. Clin. Ther. 2024, 46, e6–e14.
- Blanco-Gonzalez, A.; Cabezon, A.; Seco-Gonzalez, A.; Conde-Torres, D.; Antelo-Riveiro, P.; Pineiro, A.; Garcia-Fandino, R. The role of AI in drug discovery: Challenges, opportunities, and strategies. Pharmaceuticals 2023, 16, 891.
- Mirakhori, F.; Niazi, S.K. Harnessing the AI/ML in drug and biological products discovery and development: The regulatory perspective. Pharmaceuticals 2025, 18, 47.
- ICH Guideline. Assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk (M7). In Proceedings of the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), Geneva, Switzerland, 8–13 November 2014.
- Okumoto, A.; Nomura, Y.; Maki, K.; Ogawa, T.; Onodera, H.; Shikano, M.; Okabe, N. Addressing practical issues in the smooth implementation of revised guidelines for non-clinical studies of vaccines for infectious disease prevention. Regul. Toxicol. Pharmacol. 2023, 142, 105413.
- Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.-W.; Newman, S.-F.; Kim, J. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2018, 2, 749–760.
- Rodríguez-Pérez, R.; Bajorath, J. Interpretation of machine learning models using Shapley values: Application to compound potency and multi-target activity predictions. J. Comput.-Aided Mol. Des. 2020, 34, 1013–1026.
- Wilczok, D.; Zhavoronkov, A. Progress, pitfalls, and impact of AI-driven clinical trials. Clin. Pharmacol. Ther. 2025, 117, 887–890.

Name | Type | Key Features |
---|---|---|
ADMETlab (v3.0) | Open-Source | Online multi-endpoint ADMET and toxicity prediction [112]. |
pkCSM (2015 release) | Open-Source | Graph-based signatures for ADMET property prediction [113]. |
SwissADME (2017 release) | Open-Source | Online tool for ADME, physicochemical, and drug-likeness evaluation [114]. |
ProTox-II (v2, 2018) | Open-Source | Toxicity endpoints including LD50, hepatotoxicity [115]. |
T.E.S.T. (v5.1.1) | Open-Source | EPA tool for QSAR-based toxicity estimates [116]. |
DeepChem (v2.x) | Open-Source | Python ML/AI library for molecular modeling [2]. |
VEGA QSAR (v1.2.3) | Open-Source | Rule-based QSAR toxicity predictor [117]. |
admetSAR (v2.0, 2019) | Open-Source | Predictive models for ADMET endpoints [118]. |
ADMET-AI (2023 release) | Open-Source | ML-based tool for fast and accurate ADMET predictions [106]. |
CtoxPred3 (v3) | Open-Source | In silico prediction of peptide toxicity. |
Schrödinger QikProp (v6.2) | Commercial | 50+ ADME properties, integrated in Schrödinger [119]. |
ACD/Percepta (v2023.1) | Commercial | Physicochemical, ADME, and toxicity predictions [120]. |
DEREK Nexus (v6.x) | Commercial | Rule-based toxicology and safety predictions [121]. |
TOPKAT (BIOVIA) (v6.2) | Commercial | QSAR-based toxicity (mutagenicity, carcinogenicity) [122]. |
ADMET Predictor (v11.5) | Commercial | 175+ ADMET properties and metabolism simulation [123]. |
StarDrop (v7.3) | Commercial | ADMET modeling with compound prioritization [124]. |
ChemAxon’s cxcalc (v23.15) | Commercial | Command-line ADME prediction tool [125]. |
ToxTree (v2.6.13) | Open-Source | Decision tree-based toxicity analysis [126]. |
Tox21/ToxCast | Database | Large public databases with toxicity screening data for thousands of compounds [127]. |
PubChem BioAssay | Database | ADMET-related assay data and curated datasets for ML model development [128]. |
ChEMBL | Database | Contains bioactivity data including absorption, CYP inhibition, and off-target toxicity [129]. |
DrugBank | Database | Pharmacokinetic and toxicity data for approved drugs, useful for validation and model tuning [130]. |

Tool/Database | Application | Reference |
---|---|---|
RDKit (2019 release) | Molecular descriptor generation, structure processing | [28] |
KNIME (2015 release) | Workflow automation and model building | [49] |
ChEMBL (2017 release) | Bioactivity database for QSAR modeling | [129,142] |
DeepChem (2018 release) | Deep learning platform for cheminformatics | [57] |
ZINC Database (2012 release) | Commercially available compound repository | [143] |
Chemprop (2019 release) | SMILES-based deep learning QSAR modeling | [58] |
Google Colab (2018) | Cloud-based Python notebook environment | [57] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Koirala, M.; Yan, L.; Mohamed, Z.; DiPaola, M. AI-Integrated QSAR Modeling for Enhanced Drug Discovery: From Classical Approaches to Deep Learning and Structural Insight. Int. J. Mol. Sci. 2025, 26, 9384. https://doi.org/10.3390/ijms26199384