Benchmarking Molecular Mutation Operators for Evolutionary Drug Design
Abstract
1. Introduction
- We provide a reproducible framework comparing five mutation operators (GB-GA, GB-GM, SCC, SF-T, SM-T) on a common FDA-approved seed set from ZINC20, under a standardized mutation budget (1/3/5) and metrics (validity, runtime, bioactivity conservation, complexity, and diversity).
- We provide an actionable selection guide that translates the results into practical recommendations: use GB-GA for fast, high-validity local optimization; SCC for reaction-based growth with property shifts; SF-T for rapid, high-diversity exploration; GB-GM for data-consistent (often heavier) chemistries; and reserve SM-T for quick prototypes because of its low validity.
1.1. Background and Related Work
1.2. A Quick Review of Similar Works on Comparing Algorithms in Molecular Mutations
2. Results
2.1. Computational Requirement and Validity
2.2. Conservation of Relevant Features
2.3. Molecular Complexity
2.4. Molecular Diversity
3. Discussion
4. Materials and Methods
4.1. Mutation of Molecules
- Graph-based genetic algorithm (GB-GA) [62]. GB-GA operates directly on the molecular graph. Primitive edits include adding or removing atoms or bonds, changing atom types or bond orders, and forming/cleaving rings. Each proposed edit is checked against valence and sanitization rules (e.g., with RDKit); invalid offspring are rejected and resampled. This yields high validity and fine control over local changes, at the cost of exploring chemical space more locally around the parent structures.
- Graph-based generative model (GB-GM) [62]. GB-GM learns data-driven probabilities for graph edits (e.g., which atoms or bonds to add next, given the current context). Molecules are expanded or modified by sampling edits from these learned distributions; some implementations couple this with search (e.g., MCTS) and annealing to bias toward better-scoring candidates. We enforce validity via graph chemistry checks. This approach explores realistic, training-distribution-consistent chemistries, but it can drift toward heavier, more complex products.
- SMILESClickChem (SCC) [61]. SCC is a reaction-based operator: it applies curated SMARTS templates (e.g., “click-like” transformations) to a parent and to reagents from a building-block library to produce products. Reaction application, atom mapping, and sanitization ensure chemically sensible outputs; rule filters can enforce property bounds (e.g., Lipinski). SCC often increases molecular weight and heteroatom count while enabling chemically interpretable moves.
- SELFIES Token mutation (SF-T) [56]. SF-T applies string-level edits (insert/replace/delete) to SELFIES tokens. Because SELFIES encodes valence constraints in the alphabet and grammar, every token sequence decodes to a valid molecule, giving near-100% validity without resampling. Edits can induce larger structural jumps and may bias toward saturated, sp3-rich chemotypes unless constrained.
- SMILES Token mutation (SM-T) [56,72]. SM-T applies string-level edits to SMILES tokens (insert/replace/delete of branches, rings, and atom symbols). It is fast and simple, but many edited strings are invalid because of unmatched ring indices or parentheses; practical implementations resample up to a cap or validate with RDKit. Variants such as DeepSMILES reduce some syntax errors, but validity remains lower than that of SELFIES.
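The resample-up-to-a-cap behaviour described for SM-T can be illustrated with a minimal, standard-library-only sketch. It is not the benchmarked implementation: the token alphabet `TOKENS`, the cap `max_tries`, and the function names are hypothetical, and `looks_valid` is only a cheap syntax check (balanced parentheses, paired single-digit ring closures) standing in for toolkit sanitization such as RDKit's. It does not check valence and ignores two-digit `%nn` ring closures.

```python
import random
import re

# Toy token alphabet for illustration only (an assumption, not the paper's vocabulary).
TOKENS = ["C", "N", "O", "(", ")", "1", "=", "c"]


def tokenize(smiles):
    """Split a SMILES string into bracket atoms, Br/Cl, or single characters."""
    return re.findall(r"\[[^\]]+\]|Br|Cl|.", smiles)


def looks_valid(tokens):
    """Cheap syntax check: balanced parentheses and paired ring-closure digits.

    Stand-in for real sanitization; valence is not checked here.
    """
    depth = 0
    open_rings = set()
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
        elif tok.isdigit():
            # A digit opens a ring on first sight and closes it on the second.
            open_rings ^= {tok}
    return depth == 0 and not open_rings


def mutate_tokens(tokens, rng):
    """Apply one random insert, replace, or delete edit to a copy of the tokens."""
    tokens = list(tokens)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not tokens:
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(TOKENS))
    elif op == "replace":
        tokens[rng.randrange(len(tokens))] = rng.choice(TOKENS)
    else:
        del tokens[rng.randrange(len(tokens))]
    return tokens


def mutate_with_resampling(smiles, rng, max_tries=50):
    """Resample single-edit mutants until one passes the check or the cap is hit."""
    parent = tokenize(smiles)
    for attempt in range(1, max_tries + 1):
        child = mutate_tokens(parent, rng)
        if child and looks_valid(child):
            return "".join(child), attempt
    return None, max_tries
```

Calling `mutate_with_resampling("c1ccccc1O", random.Random(0))` returns a mutated string together with the number of attempts it took. Edits that break a ring-closure pair or a parenthesis are the ones this check rejects, which mirrors the SM-T failure mode noted above.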
4.2. Failure Modes and Operator-Specific Error Sources
4.3. Computational Requirement and Validity
4.4. Computational Environment
4.5. Conservation of Relevant Features
4.6. Molecular Complexity
- Hann index (HI). HI is a topological descriptor that quantifies molecular structure from atom connectivity; it serves as an indicator of how structural features relate to the chemical properties and behavior of the molecule [89].
- Physicochemical Complexity (PCI). The PCI score is a simple composite physicochemical descriptor combining lipophilicity (logP) with hydrogen-bonding features (HBA–HBD). It is not intended as a rigorous measure of synthetic complexity but may provide a crude proxy for functional-group richness and polarity [91].
- Number of heteroatoms (NH). NH is the count of atoms in a molecule that are not carbon or hydrogen, such as nitrogen, oxygen, or sulfur. Heteroatoms can be crucial in molecular interactions and properties [94].
- Wiener index (WI). WI is a graph-theoretical index that reflects molecular connectivity by summing the shortest-path distances between all pairs of atoms in a molecule. It provides insight into molecular shape and size [100,101,102]. The WI is calculated as $\mathrm{WI} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}$, where $n$ is the number of atoms in the molecule and $d_{ij}$ is the shortest path distance between atoms $i$ and $j$.
- Bertz index (BI). BI is a topological index used to describe the complexity of a molecule’s structure, accounting for its connectivity and the arrangement of its atoms [103,104,105,106]. The formula for the BI is given by $\mathrm{BI} = \sum_{i=1}^{n} \log(d_i!)$, where $n$ is the number of atoms in the molecule, $d_i$ is the degree of atom $i$ (the number of bonds formed by atom $i$), and $\log(d_i!)$ is the logarithm of the factorial of the degree of atom $i$. The summation runs over all atoms in the molecule, and the logarithm is typically taken as the natural logarithm.
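The Wiener index definition above (a sum of shortest-path distances over all atom pairs) can be sketched with a breadth-first search on a hydrogen-suppressed molecular graph. This is a minimal illustration rather than the descriptor code used in the study: the adjacency-dictionary representation and the function names are assumptions, and all bonds are treated as unit-length edges.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in bonds) from one atom to all reachable atoms."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = 0
    for atom in adjacency:
        total += sum(shortest_path_lengths(adjacency, atom).values())
    return total // 2  # every pair (i, j) was counted from both ends


# Carbon skeletons only (hydrogens suppressed); atom indices are arbitrary labels.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
```

For these two skeletons, `wiener_index(n_butane)` is 10 and `wiener_index(isobutane)` is 9, so the more branched, more compact isomer has the lower index.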
4.7. Molecular Diversity
- Number of Valence Electrons (NVE). The NVE is the total number of valence (outermost-shell) electrons summed over the atoms of a molecule; valence electrons determine how atoms interact chemically with one another [109].
- Number of Radical Electrons (NRE). The NRE is the number of unpaired electrons in a molecule; unpaired electrons can make a molecule highly reactive [110].
- Number of Hydrogen Bond Donors (NumHDonors). This counts the donor sites in a molecule (typically N–H and O–H groups) that can donate a hydrogen to a hydrogen bond, which is crucial for molecular interactions in biological systems [111].
- Number of Hydrogen Bond Acceptors (NumHAcceptors). This counts the number of atoms (such as oxygen or nitrogen) in a molecule that can accept hydrogen bonds, influencing solubility and binding properties [111].
- Number of Rotatable Bonds (NumRotatableBonds). This is the count of single non-ring bonds, excluding terminal bonds, that can freely rotate, affecting the flexibility and conformational diversity of a molecule [112].
- Topological Polar Surface Area (TPSA). The TPSA estimates the surface area of a molecule contributed by polar atoms, typically oxygen and nitrogen, and indicates a molecule’s ability to interact with water (hydrophilicity) [113]. The TPSA is calculated as $\mathrm{TPSA} = \sum_{i=1}^{n} \mathrm{PSA}_i$, where $n$ is the number of polar atoms in the molecule (typically oxygen and nitrogen) and $\mathrm{PSA}_i$ is the polar surface area of atom $i$.
- Octanol-Water Partition Coefficient (LogP). The LogP is the logarithm of the partition coefficient between n-octanol and water, providing an estimate of a molecule’s hydrophobicity or lipophilicity, which affects its ability to cross cell membranes [114,115,116]. The LogP is calculated as $\log P = \log_{10}\left( C_{\text{octanol}} / C_{\text{water}} \right)$, where $C_{\text{octanol}}$ is the concentration of the compound in octanol and $C_{\text{water}}$ is the concentration of the compound in water.
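The two formula-based descriptors above, LogP as the logarithm of an octanol/water concentration ratio and TPSA as a sum of per-atom polar contributions, reduce to a few lines of arithmetic. The sketch below is illustrative only: in practice both values are normally computed by a cheminformatics toolkit from the structure, the function names are hypothetical, and the contribution values in `CONTRIB` are assumed example numbers in the spirit of tabulated fragment contributions, not data from this study.

```python
import math


def log_p(conc_octanol, conc_water):
    """Base-10 log of the octanol/water concentration ratio (same units for both)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def tpsa(fragment_counts, contributions):
    """Sum per-fragment polar surface contributions (in square angstroms)."""
    return sum(contributions[name] * count for name, count in fragment_counts.items())


# Assumed illustrative contributions in square angstroms (hypothetical table).
CONTRIB = {"hydroxyl_O": 20.23, "carbonyl_O": 17.07, "primary_amine_N": 26.02}

# Acetic acid has one hydroxyl oxygen and one carbonyl oxygen.
acetic_acid = {"hydroxyl_O": 1, "carbonyl_O": 1}
```

A compound that is 100 times more concentrated in octanol than in water has `log_p(100.0, 1.0)` equal to 2, while `tpsa(acetic_acid, CONTRIB)` simply adds the two oxygen contributions from the assumed table.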
4.8. Distribution Tests
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ACO | Ant Colony Optimization |
| APC | Article Processing Charge |
| ANN | Artificial Neural Network |
| BI | Bertz Index |
| CADD | Computer-Aided Drug Design |
| ChEMBL | ChEMBL bioactivity database |
| CSP3 | Fraction of sp3-hybridized carbons (RDKit descriptor) |
| DE | Differential Evolution |
| DL | Deep Learning |
| DQN | Deep Q-Network |
| DT | Decision Tree |
| ECFP4 | Extended Connectivity Fingerprint (diameter 4) |
| FDA | U.S. Food and Drug Administration |
| GA | Genetic Algorithm |
| GB-GA | Graph-Based Genetic Algorithm |
| GB-GM | Graph-Based Generative Model |
| GB-GM-MCTS | Graph-Based Generative Model with Monte Carlo Tree Search |
| GNN | Graph Neural Network |
| HI | Hann Index |
| HCF | Fraction of sp3-hybridized carbons (alt. label used in tables) |
| HCR | Fraction of sp3-hybridized carbons |
| IGF1R | Insulin-like Growth Factor 1 Receptor |
| JTVAE | Junction Tree Variational Autoencoder |
| KL | Kullback–Leibler (divergence) |
| LBVS | Ligand-Based Virtual Screening |
| LBDD | Ligand-Based Drug Design |
| LogP | Octanol–Water Partition Coefficient (logarithm) |
| MACCS | Molecular ACCess System (structural keys) |
| MAE | Mean Absolute Error |
| MARS | Markov Molecular Sampling |
| MCMC | Markov Chain Monte Carlo |
| MDP | Markov Decision Process |
| MolLogP | Calculated logP descriptor (RDKit) |
| mTOR | mechanistic Target of Rapamycin |
| MW | Molecular Weight |
| NH | Number of Heteroatoms |
| NHA | Number of Hydrogen-Bond Acceptors |
| NHD | Number of Hydrogen-Bond Donors |
| NMR | Nuclear Magnetic Resonance |
| NRE | Number of Radical Electrons |
| NR | Number of Rings |
| NRB | Number of Rotatable Bonds |
| NVE | Number of Valence Electrons |
| PCA | Principal Component Analysis |
| PCI | Physicochemical Complexity |
| PSO | Particle Swarm Optimization |
| QCF | Fraction of Chiral Carbons |
| QED | Quantitative Estimate of Drug-likeness |
| QSAR | Quantitative Structure–Activity Relationship |
| R2 | Coefficient of Determination |
| RDKit | RDKit cheminformatics toolkit |
| REINVENT | Framework for de novo molecular design (RNN/Transformer + RL) |
| RF | Random Forest |
| RGA | Reinforced Genetic Algorithm |
| RMSE | Root Mean Square Error |
| RNN | Recurrent Neural Network |
| SA | Simulated Annealing |
| SCC | SMILESClickChem |
| SELFIES | Self-Referencing Embedded Strings |
| SF-T | SELFIES Token mutation |
| SM-T | SMILES Token mutation |
| SMILES | Simplified Molecular Input Line Entry System |
| SMARTS | SMILES Arbitrary Target Specification |
| SBDD | Structure-Based Drug Design |
| SRC | Proto-oncogene tyrosine-protein kinase Src |
| SVM | Support Vector Machine |
| TPSA | Topological Polar Surface Area |
| TS | Tabu Search |
| VAE | Variational Autoencoder |
| VS | Virtual Screening |
| WI | Wiener Index |
| ZINC20 | ZINC database (version 20) of purchasable compounds |
References
- Gurung, A.B.; Ali, M.A.; Lee, J.; Farah, M.A.; Al-Anazi, K.M. An Updated Review of Computer-Aided Drug Design and Its Application to COVID-19. BioMed Res. Int. 2021, 2021, 8853056.
- Sun, D.; Gao, W.; Hu, H.; Zhou, S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm. Sin. B 2022, 12, 3049–3062.
- Vemula, D.; Jayasurya, P.; Sushmitha, V.; Kumar, Y.N.; Bhandari, V. CADD, AI and ML in drug discovery: A comprehensive review. Eur. J. Pharm. Sci. 2023, 181, 106324.
- Anderson, A.C. The Process of Structure-Based Drug Design. Chem. Biol. 2003, 10, 787–797.
- Sharma, V.; Wakode, S.; Kumar, H. Chapter 2 - Structure- and ligand-based drug design: Concepts, approaches, and challenges. In Chemoinformatics and Bioinformatics in the Pharmaceutical Sciences; Sharma, N., Ojha, H., Raghav, P.K., Goyal, R.K., Eds.; Academic Press: Cambridge, MA, USA, 2021; pp. 27–53.
- Oliveira, T.; Silva, M.; Maia, E.; Silva, A.; Taranto, A. Virtual Screening Algorithms in Drug Discovery: A Review Focused on Machine and Deep Learning Methods. Drugs Drug Candidates 2023, 2, 311–334.
- Willett, P. Similarity-based virtual screening using 2D fingerprints. Drug Discov. Today 2006, 11, 1046–1053.
- Huang, L.; Luo, H.; Li, S.; Wu, F.X.; Wang, J. Drug–drug similarity measure and its applications. Briefings Bioinform. 2020, 22, bbaa265.
- Eckert, H.; Bajorath, J. Molecular similarity analysis in virtual screening: Foundations, limitations and novel approaches. Drug Discov. Today 2007, 12, 225–233.
- Muratov, E.N.; Bajorath, J.; Sheridan, R.P.; Tetko, I.V.; Filimonov, D.; Poroikov, V.; Oprea, T.I.; Baskin, I.I.; Varnek, A.; Roitberg, A.; et al. QSAR Without Borders. Chem. Soc. Rev. 2020, 49, 3525–3564; Erratum in Chem. Soc. Rev. 2020, 49, 3716.
- Danishuddin, N.; Khan, A.U. Descriptors and their selection methods in QSAR analysis: Paradigm for drug design. Drug Discov. Today 2016, 21, 1291–1302.
- Verma, J.; Khedkar, V.; Coutinho, E. 3D-QSAR in Drug Design—A Review. Curr. Top. Med. Chem. 2010, 10, 95–115.
- Carpenter, K.A.; Huang, X. Machine Learning-based Virtual Screening and Its Applications to Alzheimer’s Drug Discovery: A Review. Curr. Pharm. Des. 2018, 24, 3347–3358.
- Klon, A. Bayesian Modeling in Virtual High Throughput Screening. Comb. Chem. High Throughput Screen. 2009, 12, 469–483.
- Melville, J.; Burke, E.; Hirst, J. Machine Learning in Virtual Screening. Comb. Chem. High Throughput Screen. 2009, 12, 332–343.
- Vyas, R.; Bapat, S.; Jain, E.; Tambe, S.S.; Karthikeyan, M.; Kulkarni, B.D. A Study of Applications of Machine Learning Based Classification Methods for Virtual Screening of Lead Molecules. Comb. Chem. High Throughput Screen. 2015, 18, 658–672.
- Kimber, T.B.; Chen, Y.; Volkamer, A. Deep Learning in Virtual Screening: Recent Applications and Developments. Int. J. Mol. Sci. 2021, 22, 4435.
- Lin, B.; Chavali, S.; Camarda, K.; Miller, D. Computer-aided molecular design using Tabu search. Comput. Chem. Eng. 2005, 29, 337–347.
- Xue, Z.; Sun, C.; Zheng, W.; Lv, J.; Liu, X. TargetSA: Adaptive simulated annealing for target-specific drug design. Bioinformatics 2024, 41, btae730.
- Chen, C.C.; Wang, L.H.; Kao, C.Y.; Ouhyoung, M.; Chen, W.C. Molecular binding in structure-based drug design: A case study of the population-based annealing genetic algorithms. In Proceedings of the Tenth IEEE International Conference on Tools with Artificial Intelligence (Cat. No. 98CH36294), Taipei, Taiwan, 10–12 November 1998; pp. 328–335.
- Maddalena, D.; Snowdon, G. Applications of genetic algorithms to drug design. Expert Opin. Ther. Patents 1997, 7, 247–254.
- Koohi-Moghadam, M.; Rahmani, A.T. Molecular docking with opposition-based differential evolution. In Proceedings of the ACM Symposium on Applied Computing, Riva del Garda, Italy, 25–29 March 2012; pp. 1387–1392.
- Korb, O.; Stützle, T.; Exner, T.E. PLANTS: Application of Ant Colony Optimization to Structure-Based Drug Design. In Proceedings of the Ant Colony Optimization and Swarm Intelligence, Brussels, Belgium, 4–7 September 2006; Dorigo, M., Gambardella, L.M., Birattari, M., Martinoli, A., Poli, R., Stützle, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 247–258.
- Cedeño, W.; Agrafiotis, D. Particle swarms for drug design. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Scotland, UK, 2–5 September 2005; Volume 2, pp. 1218–1225.
- Spiegel, J.O.; Durrant, J.D. AutoGrow4: An open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminform. 2020, 12, 25.
- Morris, G.M.; Goodsell, D.S.; Halliday, R.S.; Huey, R.; Hart, W.E.; Belew, R.K.; Olson, A.J. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. 1998, 19, 1639–1662.
- Sukumar, N.; Prabhu, G.; Saha, P. Applications of Genetic Algorithms in QSAR/QSPR Modeling. In Applications of Metaheuristics in Process Engineering; Valadi, J., Siarry, P., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 315–324.
- Fromer, J.C.; Graff, D.E.; Coley, C.W. Pareto optimization to accelerate multi-objective virtual screening. Digit. Discov. 2024, 3, 467–481.
- Zhou, Z.; Kearnes, S.; Li, L.; Zare, R.N.; Riley, P. Optimization of Molecules via Deep Reinforcement Learning. Sci. Rep. 2019, 9, 10752; Erratum in Sci. Rep. 2020, 10, 10478.
- Gómez-Bombarelli, R.; Wei, J.N.; Duvenaud, D.; Hernández-Lobato, J.M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T.D.; Adams, R.P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276.
- Kerstjens, A.; De Winter, H. LEADD: Lamarckian evolutionary algorithm for de novo drug design. J. Cheminform. 2022, 14, 3.
- Xie, Y.; Shi, C.; Zhou, H.; Yang, Y.; Zhang, W.; Yu, Y.; Li, L. MARS: Markov Molecular Sampling for Multi-Objective Drug Discovery. arXiv 2021, arXiv:2103.10432.
- Jin, W.; Barzilay, R.; Jaakkola, T. Junction Tree Variational Autoencoder for Molecular Graph Generation. arXiv 2018, arXiv:1802.04364.
- Jin, W.; Barzilay, R.; Jaakkola, T. Multi-Objective Molecule Generation using Interpretable Substructures. arXiv 2020, arXiv:2002.03244.
- Loeffler, H.H.; He, J.; Tibo, A.; Janet, J.P.; Voronov, A.; Mervin, L.H.; Engkvist, O. Reinvent 4: Modern AI–driven generative molecule design. J. Cheminform. 2024, 16, 20.
- Fu, T.; Gao, W.; Coley, C.W.; Sun, J. Reinforced Genetic Algorithm for Structure-based Drug Design. arXiv 2022, arXiv:2211.16508.
- Lambora, A.; Gupta, K.; Chopra, K. Genetic Algorithm—A Literature Review. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019.
- Ahmad, K.; Almohammadi, K.; Alkafaween, E.; Abunawas, E.; Hammouri, A.M.; Surya, B. Choosing Mutation and Crossover Ratios for Genetic Algorithms—A Review with a New Dynamic Approach. Information 2019, 10, 390.
- Kramer, O. Genetic Algorithms. In Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2017; pp. 11–19.
- Supady, A.; Blum, V.; Baldauf, C. First-Principles Molecular Structure Search with a Genetic Algorithm. J. Chem. Inf. Model. 2015, 55, 2338–2348.
- Sen, S.; Bhattacharya, S. Genetic Algorithms in Drug Design: A Not-So-Old Story in a Newer Bottle. In Applications of Metaheuristics in Process Engineering; Valadi, J., Siarry, P., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 325–342.
- Zhao, J.; Shi, R.; Sai, L.; Huang, X.; Su, Y. Comprehensive genetic algorithm for ab initio global optimisation of clusters. Mol. Simul. 2016, 42, 809–819.
- Ozdemir, M.; Embrechts, M.; Arciniegas, F.; Breneman, C.; Lockwood, L.; Bennett, K. Feature selection for in-silico drug design using genetic algorithms and neural networks. In SMCia/01, Proceedings of the 2001 IEEE Mountain Workshop on Soft Computing in Industrial Applications (Cat. No. 01EX504), Blacksburg, VA, USA, 27 June 2001; IEEE: New York, NY, USA; pp. 53–57.
- Embrechts, M.J.; Ozdemir, M.; Lockwood, L.; Breneman, C.; Bennett, K.; Devogelaere, D.; Rijckaert, M. Chapter 15—Feature Selection Methods Based on Genetic Algorithms for in Silico Drug Design. In Evolutionary Computation in Bioinformatics; Fogel, G.B., Corne, D.W., Eds.; The Morgan Kaufmann Series in Artificial Intelligence; Morgan Kaufmann: San Francisco, CA, USA, 2003; pp. 317–339.
- Hemmateenejad, B.; Miri, R.; Akhond, M.; Shamsipur, M. QSAR study of the calcium channel antagonist activity of some recently synthesized dihydropyridine derivatives. An application of genetic algorithm for variable selection in MLR and PLS methods. Chemom. Intell. Lab. Syst. 2002, 64, 91–99.
- Hoffman, B.T.; Kopajtic, T.; Katz, J.L.; Newman, A.H. 2D QSAR Modeling and Preliminary Database Searching for Dopamine Transporter Inhibitors Using Genetic Algorithm Variable Selection of Molconn Z Descriptors. J. Med. Chem. 2000, 43, 4151–4159.
- Bilodeau, C.; Jin, W.; Jaakkola, T.; Barzilay, R.; Jensen, K.F. Generative Models for Molecular Discovery: Recent Advances and Challenges. WIREs Comput. Mol. Sci. 2022, 12, e1608.
- Daina, A.; Michielin, O.; Zoete, V. SwissADME: A free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017, 7, 42717.
- Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; et al. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Front. Pharmacol. 2020, 11, 565644.
- Veber, D.F.; Johnson, S.R.; Cheng, H.Y.; Smith, B.R.; Ward, K.W.; Kopple, K.D. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 2002, 45, 2615–2623.
- Alhijawi, B.; Awajan, A. Genetic Algorithms: Theory, Genetic Operators, Solutions, and Applications. Evol. Intell. 2024, 17, 1245–1256.
- Abdoun, O.; Abouchabaka, J.; Tajani, C. Analyzing the Performance of Mutation Operators to Solve the Travelling Salesman Problem. Int. J. Emerg. Sci. (IJES) 2012, 2, 61–77.
- Brown, N.; Fiscato, M.; Segler, M.H.S.; Vaucher, A.C. GuacaMol: Benchmarking Models for de Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1003–1012.
- Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
- Li, Z.; Jiang, M.; Wang, S.; Zhang, S. Deep learning methods for molecular representation and property prediction. Drug Discov. Today 2022, 27, 103373.
- Krenn, M.; Häse, F.; Nigam, A.K.; Friederich, P. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Mach. Learn. Sci. Technol. 2020, 1, 045024.
- Kuwahara, H.; Gao, X. Analysis of the effects of related fingerprints on molecular similarity using an eigenvalue entropy approach. J. Cheminform. 2021, 13, 27.
- Pattanaik, L.; Coley, C.W. Molecular Representation: Going Long on Fingerprints. Chem 2020, 6, 1204–1207.
- Kwon, Y.; Kang, S.; Choi, Y.S.; Kim, I. Evolutionary design of molecules based on deep learning and a genetic algorithm. Sci. Rep. 2021, 11, 17304.
- Spiegel, J.O.; Durrant, J.D. SMILESMerge: An open-source program for automated de novo ligand design using a crossover method. Zenodo 2020.
- Spiegel, J.O.; Durrant, J.D. SMILESClickChem: An open-source program for automated de novo ligand design using in silico reactions. Zenodo 2020.
- Jensen, J.H. A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 2019, 10, 3567–3572.
- Steinmann, C.; Rarey, M.; Wetzel, H. Flux (2): Comparison of Molecular Mutation and Crossover Operators for Ligand-Based de Novo Design. J. Chem. Inf. Model. 2007, 47, 426–435.
- Zhu, J.F.; Zhu, J.Y.; Yi, J.F.; Yin, J.P.; Liu, Q. Towards Exploring Large Molecular Space: An Efficient Chemical Genetic Algorithm. J. Comput. Sci. Technol. 2022, 37, 1186–1205.
- Prentis, L.E.; Singleton, C.D.; Bickel, J.D.; Allen, W.J.; Rizzo, R.C. A molecular evolution algorithm for ligand design in DOCK. J. Comput. Chem. 2022, 43, 1942–1963.
- Nigam, A.; Pollice, R.; Krenn, M.; dos Passos Gomes, G.; Aspuru-Guzik, A. Beyond generative models: Superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES. Chem. Sci. 2021, 12, 7079–7090.
- Sushko, Y.; Yamashita, M.; Takagi, K.; Okuno, Y.; Schneider, G. De novo design of drug-like molecules by a fragment-based molecular evolutionary approach. J. Mol. Graph. Model. 2014, 47, 10–18.
- Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
- Yi, J.; Shi, S.; Fu, L.; Yang, Z.; Nie, P.; Lu, A.; Wu, C.; Deng, Y.; Hsieh, C.; Zeng, X.; et al. OptADMET: A web-based tool for substructure modifications to improve ADMET properties of lead compounds. Nat. Protoc. 2024, 19, 1105–1121.
- Mouchlis, V.D.; Afantitis, A.; Serra, A.; Fratello, M.; Papadiamantis, A.G.; Aidinis, V.; Lynch, I.; Greco, D.; Melagraki, G. Advances in De Novo Drug Design: From Conventional to Machine Learning Methods. Int. J. Mol. Sci. 2021, 22, 1676.
- Irwin, J.J.; Tang, K.G.; Young, J.; Dandarchuluun, C.; Wong, B.R.; Khurelbaatar, M.; Moroz, Y.S.; Mayfield, J.; Sayle, R.A. ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 2020, 60, 6065–6073.
- O’Boyle, N.; Dalke, A. DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures. ChemRxiv 2018.
- Landrum, G.; Tosco, P.; Kelley, B.; Ric, G.; Cosgrove, D.; Sriniker, S. rdkit/rdkit: 2023_09_4 (Q3 2023) Release (Release_2023_09_4). Zenodo 2024.
- Mazuz, E.; Shtar, G.; Shapira, B.; Rokach, L. Molecule generation using transformers and policy gradient reinforcement learning. Sci. Rep. 2023, 13, 8799.
- Chandrabose, S.; Kumar, T.S.; Konda, R.K.; Singh, S.K. Tool Development for Prediction of pIC50 Values from the IC50 Values—A pIC50 Value Calculator. Curr. Trends Biotechnol. Pharm. 2011, 5, 1104–1109.
- Zdrazil, B.; Felix, E.; Hunter, F.; Manners, E.J.; Blackshaw, J.; Corbett, S.; de Veij, M.; Ioannidis, H.; Lopez, D.M.; Mosquera, J.F.; et al. The ChEMBL Database in 2023: A drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024, 52, D1180–D1192.
- Martellucci, S.; Clementi, L.; Sabetta, S.; Mattei, V.; Botta, L.; Angelucci, A. Src Family Kinases as Therapeutic Targets in Advanced Solid Tumors: What We Have Learned So Far. Cancers 2020, 12, 1448.
- Soni, U.K.; Jenny, L.; Hegde, R.S. IGF-1R Targeting in Cancer—Does Sub-Cellular Localization Matter? J. Exp. Clin. Cancer Res. 2023, 42, 273.
- Wang, P.; Mak, V.C.; Cheung, L.W. Drugging IGF-1R in Cancer: New Insights and Emerging Opportunities. Genes Dis. 2023, 10, 199–211.
- Tian, T.; Li, X.; Zhang, J. mTOR Signaling in Cancer and mTOR Inhibitors in Solid Tumor Targeting Therapy. Int. J. Mol. Sci. 2019, 20, 755.
- Hua, H.; Kong, Q.; Zhang, H.; Wang, J.; Luo, T.; Jiang, Y. Targeting mTOR for Cancer Therapy. J. Hematol. Oncol. 2019, 12, 71.
- Slocombe, L.; Walker, S.I. Measuring Molecular Complexity. ACS Cent. Sci. 2024, 10, 949–952.
- Saldívar-González, F.I.; Medina-Franco, J.L. Chemoinformatics approaches to assess chemical diversity and complexity of small molecules. In Small Molecule Drug Discovery; Elsevier: Amsterdam, The Netherlands, 2020; pp. 83–102.
- Bickerton, G.R.; Paolini, G.V.; Besnard, J.; Muresan, S.; Hopkins, A.L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012, 4, 90–98.
- Skoraczyński, G.; Kitlas, M.; Miasojedow, B.; Gambin, A. Critical assessment of synthetic accessibility scores in computer-assisted synthesis planning. J. Cheminform. 2023, 15, 6.
- Shearer, J.; Castro, J.L.; Lawson, A.D.G.; MacCoss, M.; Taylor, R.D. Rings in Clinical Trials and Drugs: Present and Future. J. Med. Chem. 2022, 65, 8407–8421.
- Ward, S.E.; Beswick, P. What does the aromatic ring number mean for drug design? Expert Opin. Drug Discov. 2014, 9, 995–1003.
- Nilakantan, R.; Bauman, N.; Haraki, K.S.; Venkataraghavan, R. A ring-based chemical structural query system: Use of a novel ring-complexity heuristic. J. Chem. Inf. Comput. Sci. 1990, 30, 50–57.
- Hann, M.M.; Leach, A.R.; Harper, G. Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci. 2001, 41, 856–864.
- Ursu, O.; Rayan, A.; Goldblum, A.; Oprea, T.I. Understanding Drug-Likeness. WIREs Comput. Mol. Sci. 2011, 1, 760–781.
- Ertl, P.; Schuffenhauer, A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminform. 2009, 1, 8.
- Britannica Editors. Molecular Weight. Encyclopedia Britannica. 2024. Available online: https://www.britannica.com/science/molecular-weight (accessed on 29 November 2025).
- Mullard, A. Re-assessing the rule of 5, two decades on. Nat. Rev. Drug Discov. 2018, 17, 777.
- Böttcher, T. An Additive Definition of Molecular Complexity. J. Chem. Inf. Model. 2016, 56, 462–470.
- Jacobs, A. The effect of heteroatoms. In Understanding Organic Reaction Mechanisms; Cambridge University Press: Cambridge, UK, 1997; pp. 115–157.
- Méndez-Lucio, O.; Medina-Franco, J.L. The many roles of molecular complexity in drug discovery. Drug Discov. Today 2017, 22, 120–126.
- von Korff, M.; Sander, T. Molecular Complexity Calculated by Fractal Dimension. Sci. Rep. 2019, 9, 967.
- Wei, W.; Cherukupalli, S.; Jing, L.; Liu, X.; Zhan, P. Fsp3: A new parameter for drug-likeness. Drug Discov. Today 2020, 25, 1839–1845.
- Meyers, J.; Carter, M.; Mok, N.Y.; Brown, N. On the Origins of Three-Dimensionality in Drug-Like Molecules. Future Med. Chem. 2016, 8, 1753–1767.
- Bi, B.; Jamil, M.K.; Fahd, K.M.; Sun, T.L.; Ahmad, I.; Ding, L. Algorithms for Computing Wiener Indices of Acyclic and Unicyclic Graphs. Complexity 2021, 2021, 6663306.
- García, G.C.; Ruiz, I.L.; Ángel Gómez-Nieto, M.; Doncel, J.A.C.; Plaza, A.G. From Wiener Index to Molecules. J. Chem. Inf. Model. 2005, 45, 555–563.
- Bonchev, D. The Overall Wiener Index: A New Tool for Characterization of Molecular Topology. J. Chem. Inf. Comput. Sci. 2001, 41, 527–535.
- Bertz, S.H. On the Complexity of Graphs and Molecules. Bull. Math. Biol. 1983, 45, 849–855.
- Nikolić, S.; Trinajstić, N.; Tolić, I.M. Complexity of Molecules. J. Chem. Inf. Comput. Sci. 2000, 40, 1234–1245.
- Bertz, S.H. Branching in Graphs and Molecules. J. Mol. Graph. 1988, 6, 2–12.
- Bertz, S.H. The First General Index of Molecular Complexity. J. Am. Chem. Soc. 1981, 103, 3599–3601.
- Moriwaki, H.; Tian, Y.S.; Kawashita, N.; Takagi, T. Mordred: A molecular descriptor calculator. J. Cheminform. 2018, 10, 4.
- Jolliffe, I. Principal Component Analysis. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; Chapter 455.
- Britannica Editors. Valence. Encyclopedia Britannica. 2024. Available online: https://www.britannica.com/science/valence-chemistry (accessed on 14 August 2024).
- Walling, C.T. Radical. Encyclopedia Britannica. 2024. Available online: https://www.britannica.com/science/radical-chemistry (accessed on 14 August 2024).
- Kenny, P.W. Hydrogen-Bond Donors in Drug Design. J. Med. Chem. 2022, 65, 14261–14275.
- Benet, L.Z.; Hosey, C.M.; Ursu, O.; Oprea, T.I. BDDCS, the Rule of 5 and Drugability. Adv. Drug Deliv. Rev. 2016, 101, 89–98.
- Prasanna, S.; Doerksen, R.J. Topological Polar Surface Area: A Useful Descriptor in 2D-QSAR. Curr. Med. Chem. 2009, 16, 21–41.
- Martin, Y.C. How medicinal chemists learned about log P. J. Comput.-Aided Mol. Des. 2018, 32, 809–819.
- Ulrich, N.; Goss, K.U.; Ebert, A. Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation. Commun. Chem. 2021, 4, 90.
- Daina, A.; Michielin, O.; Zoete, V. iLOGP: A Simple, Robust, and Efficient Description of n-Octanol/Water Partition Coefficient for Drug Design Using the GB/SA Approach. J. Chem. Inf. Model. 2014, 54, 3284–3301.

| Algorithm | Core Idea | Strengths | Shortcomings |
|---|---|---|---|
| MolDQN | MDP formulation with Deep Q-Learning and chemically valid edit actions. | 100% validity, no pretraining, strong multi-objective control. | High computational cost, requires multiple Q-functions, reward-design sensitive. |
| MARS | Fragment-level edits guided by an adaptive proposal network and annealed MCMC. | High novelty/diversity, no labeled data required. | Sampling can drift, dependent on proposal quality, slow convergence. |
| JTVAE | Two-stage generation: scaffold via junction tree, then graph assembly. | Valid substructure assembly, strong latent-space optimization. | Complex architecture, requires predefined vocabularies, heavy training. |
| RationaleRL | RL + MCTS to assemble molecules from interpretable rationales. | Interpretable, good multi-objective balance. | MCTS is expensive, relies on rationale extraction, limited chemical space. |
| REINVENT | SMILES-based generative RL (RNN/Transformer) with curriculum learning. | Flexible, supports scaffold hopping and optimization. | SMILES syntax errors, diversity collapse risk, chemistry not explicit. |
| RGA | Neural-network-guided GA operations using protein–ligand structure data. | Improved docking scores, robust exploration. | Requires large structure datasets, expensive, may produce non-synthesizable offspring. |
| Classical GA | Crossover/mutation on graphs, strings, or fingerprints. | General-purpose, good global search ability. | Validity not guaranteed, heavily representation-dependent. |
| ECFP-GA | GA on fingerprints, decoding via RNNs. | Useful for similarity/QSAR tasks. | Fingerprint decoding unreliable, structural info lost, validity issues. |
| SCC | Reaction-based mutations using SMARTS templates. | Chemically interpretable, rule-based transformations. | Tends to increase size/complexity, limited by template library. |
| GB-GA | Graph edits with chemical sanitization (atom/bond changes, ring edits). | High validity, precise local modifications. | Local bias, slower due to sanitization, limited large-scale edits. |
| GB-GM | Data-driven probabilities for graph edits learned from chemical datasets. | Realistic, distribution-consistent molecules. | Tends toward heavier molecules, training-data-biased, slow. |
| SF-T | Token edits on SELFIES with guaranteed validity. | Near-100% validity, simple, large exploratory jumps. | Overly saturated/sp3-rich outputs, unstable property changes. |
| SM-T | Token edits on SMILES (insert/replace/delete). | Very fast, easy to implement. | Low validity, many syntax errors, high resampling needed. |
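
The SM-T row above (token insert/replace/delete with heavy resampling) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the single-character tokenizer, the `ALPHABET` list, and the function names are hypothetical, and `crude_syntax_ok` is only a rough stand-in for real chemical sanitization (for example with RDKit), since it checks branch and ring-closure bookkeeping but not valence.

```python
import random

# Hypothetical single-character token alphabet; real SMILES tokenizers also
# handle multi-character tokens such as "Cl", "Br", and bracket atoms.
ALPHABET = ["C", "N", "O", "S", "F", "=", "#", "(", ")", "1", "2"]


def mutate_tokens(smiles, rng):
    """Apply one random insert, replace, or delete to the token list."""
    tokens = list(smiles)  # assumption: one character per token
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not tokens:
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(ALPHABET))
    elif op == "replace":
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    else:
        del tokens[rng.randrange(len(tokens))]
    return "".join(tokens)


def crude_syntax_ok(smiles):
    """Crude stand-in for sanitization: branches must balance and
    ring-closure digits must pair up. Valence is NOT checked."""
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # toggle: opened on first sight, closed on second
    return bool(smiles) and depth == 0 and not open_rings


def mutate_with_resampling(smiles, rng, max_tries=50):
    """Resample until the crude check passes.

    Returns (mutant, tries), or (None, max_tries) if every attempt failed.
    """
    for attempt in range(1, max_tries + 1):
        candidate = mutate_tokens(smiles, rng)
        if crude_syntax_ok(candidate):
            return candidate, attempt
    return None, max_tries


if __name__ == "__main__":
    rng = random.Random(0)
    print(mutate_with_resampling("CC(=O)O", rng))
```

Because the check is so permissive, this sketch accepts far more strings than a real sanitizer would; it only shows where the resampling cost of a string-level operator comes from.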

| Method | Mutants | Validity (%) | Molecules | Time (s) | mol/s |
|---|---|---|---|---|---|
| GB-GA | 1 | 97.2 | 486 | 145.2 | 3.3 |
| GB-GA | 3 | 96.4 | 1446 | 142.4 | 10.2 |
| GB-GA | 5 | 95.8 | 2396 | 147.0 | 16.3 |
| GB-GA | Average | 96.5 | 1442.7 | 144.9 | 9.9 |
| GB-GM | 1 | 87.2 | 436 | 1978.2 | 0.2 |
| GB-GM | 3 | 83.0 | 1245 | 3219.8 | 0.4 |
| GB-GM | 5 | 79.8 | 1995 | 3939.1 | 0.5 |
| GB-GM | Average | 83.3 | 1225.3 | 3045.7 | 0.4 |
| SCC | 1 | 82.0 | 410 | 517.6 | 0.8 |
| SCC | 3 | 81.4 | 1221 | 565.0 | 2.2 |
| SCC | 5 | 81.4 | 2035 | 591.4 | 3.4 |
| SCC | Average | 81.6 | 1222.0 | 558.0 | 2.1 |
| SF-T | 1 | 88.6 | 443 | 87.7 | 5.1 |
| SF-T | 3 | 79.7 | 1195 | 93.3 | 12.8 |
| SF-T | 5 | 72.9 | 1822 | 96.2 | 18.9 |
| SF-T | Average | 80.4 | 1153.3 | 92.4 | 12.3 |
| SM-T | 1 | 40.0 | 200 | 84,724.9 | |
| SM-T | 3 | 29.9 | 448 | 22,562.7 | |
| SM-T | 5 | 21.9 | 547 | 49,837.8 | |
| SM-T | Average | 30.6 | 398.3 | 52,375.1 | |
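
The derived columns of the table above appear to follow from the raw counts and times. The short sketch below recomputes them for a few rows as a consistency check. It rests on an assumption that is not stated in this section: that each run starts from 500 seed molecules, so the number of attempted mutants is 500 times the mutation budget. That figure is inferred from the table itself (for example, 486 / 500 = 97.2%), and the function names are hypothetical.

```python
# Assumption (inferred from the table, not stated in this section):
# 500 seed molecules per run, so attempts = SEEDS * mutants_per_seed.
SEEDS = 500

# (method, mutants per seed, valid molecules, time in seconds), copied from the table
ROWS = [
    ("GB-GA", 1, 486, 145.2),
    ("GB-GA", 3, 1446, 142.4),
    ("GB-GA", 5, 2396, 147.0),
    ("GB-GM", 1, 436, 1978.2),
    ("SF-T", 5, 1822, 96.2),
]


def validity_pct(valid, mutants_per_seed, seeds=SEEDS):
    """Share of attempted mutants that were valid, in percent."""
    return 100.0 * valid / (seeds * mutants_per_seed)


def throughput(valid, seconds):
    """Valid molecules produced per second of wall-clock time."""
    return valid / seconds


if __name__ == "__main__":
    for method, budget, valid, seconds in ROWS:
        print(
            f"{method:6s} budget={budget} "
            f"validity={validity_pct(valid, budget):.1f}% "
            f"mol/s={throughput(valid, seconds):.1f}"
        )
```

Under this assumption the recomputed values agree with the tabulated Validity (%) and mol/s entries to one decimal place for the rows shown.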

| Method | Mutants | SRC | IGF1R | mTOR | Average |
|---|---|---|---|---|---|
| GB-GA | 1 | 0.298 | 0.168 | 0.157 | 0.208 |
| GB-GA | 3 | 0.304 | 0.171 | 0.152 | 0.209 |
| GB-GA | 5 | 0.298 | 0.170 | 0.166 | 0.212 |
| GB-GA | Average | 0.300 | 0.170 | 0.158 | 0.209 |
| GB-GM | 1 | 0.448 | 0.210 | 0.217 | 0.292 |
| GB-GM | 3 | 0.377 | 0.227 | 0.212 | 0.272 |
| GB-GM | 5 | 0.428 | 0.220 | 0.206 | 0.285 |
| GB-GM | Average | 0.418 | 0.219 | 0.212 | 0.283 |
| SCC | 1 | 0.466 | 0.228 | 0.249 | 0.314 |
| SCC | 3 | 0.476 | 0.237 | 0.247 | 0.320 |
| SCC | 5 | 0.452 | 0.227 | 0.241 | 0.307 |
| SCC | Average | 0.465 | 0.231 | 0.246 | 0.314 |
| SF-T | 1 | 0.539 | 0.151 | 0.135 | 0.275 |
| SF-T | 3 | 0.613 | 0.210 | 0.189 | 0.337 |
| SF-T | 5 | 0.689 | 0.230 | 0.216 | 0.378 |
| SF-T | Average | 0.613 | 0.197 | 0.180 | 0.330 |
| SM-T | 1 | 0.027 | 0.015 | 0.007 | 0.016 |
| SM-T | 3 | 0.014 | 0.013 | 0.011 | 0.012 |
| SM-T | 5 | 0.024 | 0.017 | 0.016 | 0.019 |
| SM-T | Average | 0.022 | 0.015 | 0.011 | 0.016 |
| All methods | Average | 0.363 | 0.166 | 0.162 | 0.230 |

| Method | SRC | IGF1R | mTOR | Average |
|---|---|---|---|---|
| GB-GA | 0.014 | 0.002 | 0.002 | 0.006 |
| GB-GM | 0.021 | 0.004 | 0.003 | 0.009 |
| SCC | 0.025 | 0.003 | 0.003 | 0.010 |
| SF-T | 0.043 | 0.003 | 0.003 | 0.016 |
| SM-T | 0.001 | 0.000 | 0.000 | 0.000 |
| Average | 0.021 | 0.002 | 0.002 | 0.008 |

| Method | Mutants | MW | NR | NH | QCF | HCF |
|---|---|---|---|---|---|---|
| GB-GA | 1 | 11.023 | 0.286 | 0.319 | 0.048 | 0.048 |
| GB-GA | 3 | 11.412 | 0.273 | 0.349 | 0.043 | 0.043 |
| GB-GA | 5 | 11.693 | 0.280 | 0.341 | 0.043 | 0.043 |
| GB-GA | Average | 11.376 | 0.280 | 0.337 | 0.045 | 0.045 |
| GB-GM | 1 | 49.089 | 0.516 | 0.828 | 0.063 | 0.063 |
| GB-GM | 3 | 45.883 | 0.500 | 0.841 | 0.061 | 0.061 |
| GB-GM | 5 | 47.850 | 0.516 | 0.845 | 0.062 | 0.062 |
| GB-GM | Average | 47.607 | 0.510 | 0.838 | 0.062 | 0.062 |
| SCC | 1 | 111.204 | 0.824 | 2.346 | 0.105 | 0.105 |
| SCC | 3 | 114.642 | 0.880 | 2.403 | 0.106 | 0.106 |
| SCC | 5 | 114.461 | 0.843 | 2.410 | 0.110 | 0.110 |
| SCC | Average | 113.436 | 0.849 | 2.387 | 0.107 | 0.107 |
| SF-T | 1 | 64.978 | 0.648 | 1.364 | 0.103 | 0.103 |
| SF-T | 3 | 58.952 | 0.770 | 1.165 | 0.095 | 0.095 |
| SF-T | 5 | 68.837 | 0.908 | 1.397 | 0.108 | 0.108 |
| SF-T | Average | 64.256 | 0.775 | 1.309 | 0.102 | 0.102 |
| SM-T | 1 | 0.509 | 0.000 | 0.060 | 0.004 | 0.004 |
| SM-T | 3 | 0.726 | 0.000 | 0.056 | 0.004 | 0.004 |
| SM-T | 5 | 1.435 | 0.002 | 0.115 | 0.008 | 0.008 |
| SM-T | Average | 0.890 | 0.001 | 0.077 | 0.005 | 0.005 |

| Method | Mutants | BI | HI | WI | QED | PCI |
|---|---|---|---|---|---|---|
| GB-GA | 1 | 46.889 | 0.187 | 183.570 | 0.079 | 0.681 |
| GB-GA | 3 | 46.659 | 0.187 | 169.972 | 0.080 | 0.648 |
| GB-GA | 5 | 47.128 | 0.185 | 178.343 | 0.079 | 0.666 |
| GB-GA | Average | 46.892 | 0.186 | 177.295 | 0.079 | 0.665 |
| GB-GM | 1 | 140.972 | 0.301 | 923.521 | 0.081 | 0.972 |
| GB-GM | 3 | 139.074 | 0.320 | 768.611 | 0.081 | 0.908 |
| GB-GM | 5 | 149.420 | 0.332 | 808.420 | 0.081 | 0.945 |
| GB-GM | Average | 143.155 | 0.318 | 833.517 | 0.081 | 0.942 |
| SCC | 1 | 307.516 | 0.764 | 1487.393 | 0.180 | 2.841 |
| SCC | 3 | 323.713 | 0.833 | 1585.059 | 0.187 | 2.789 |
| SCC | 5 | 319.383 | 0.813 | 1566.556 | 0.186 | 2.792 |
| SCC | Average | 316.871 | 0.803 | 1546.336 | 0.184 | 2.807 |
| SF-T | 1 | 90.544 | 0.202 | 332.273 | 0.073 | 0.768 |
| SF-T | 3 | 153.008 | 0.335 | 568.392 | 0.116 | 1.212 |
| SF-T | 5 | 197.828 | 0.434 | 722.413 | 0.128 | 1.480 |
| SF-T | Average | 147.127 | 0.324 | 541.026 | 0.106 | 1.153 |
| SM-T | 1 | 0.603 | 0.014 | 3.995 | 0.004 | 0.083 |
| SM-T | 3 | 1.340 | 0.012 | 7.634 | 0.006 | 0.078 |
| SM-T | 5 | 2.405 | 0.028 | 14.333 | 0.011 | 0.153 |
| SM-T | Average | 1.449 | 0.018 | 8.654 | 0.007 | 0.104 |

| Methods | MW | NR | NH | QCF | HCF | BI | WI | QED |
|---|---|---|---|---|---|---|---|---|
| GB-GA | 0.002 | 0.143 | 0.007 | 0.134 | 0.134 | 0.006 | 0.006 | 0.031 |
| GB-GM | 0.013 | 0.363 | 0.026 | 0.235 | 0.235 | 0.036 | 0.084 | 0.040 |
| SCC | 0.044 | 0.538 | 0.070 | 0.374 | 0.374 | 0.108 | 0.228 | 0.098 |
| SF-T | 0.124 | 0.372 | 0.132 | 0.314 | 0.314 | 0.173 | 0.225 | 0.091 |
| SM-T | 0.000 | 0.000 | 0.002 | 0.002 | 0.002 | 0.000 | 0.000 | 0.003 |

| Descriptor | GB-GA | GB-GM | SCC | SF-T | SM-T |
|---|---|---|---|---|---|
| MolWt | 0.002 | 0.012 | 0.044 | 0.094 | 0.000 |
| NumValenceElectrons | 0.001 | 0.012 | 0.043 | 0.089 | 0.000 |
| TPSA | 0.018 | 0.058 | 0.169 | 0.132 | 0.004 |
| NumHDonors | 0.347 | 0.643 | 1.209 | 0.697 | 0.123 |
| NumHAcceptors | 0.023 | 0.046 | 0.178 | 0.134 | 0.007 |
| NumRotatableBonds | 0.302 | 0.142 | 0.649 | 0.530 | 0.012 |
| FractionCSP3 | 0.134 | 0.234 | 0.374 | 0.282 | 0.002 |

| Method | Speed ↑ | Validity ↓ | pIC50 ↓ | Com. Dec. ↓ | Com. Ind. ↓ | Diversity ↓ |
|---|---|---|---|---|---|---|
| GB-GA | 3 | 1 | 4 ↑ | 4 ↑ | 4 ↓ | 4 |
| GB-GM | 5 | 2 | 3 ↑ | 3 ↑ | 3 ↑ | 3 |
| SCC | 4 | 3 | 1 ↑ | 1 ↑ | 1 ↑ | 2 |
| SF-T | 2 | 4 | 3 ↓ | 2 ↓ | 2 ↓ | 1 |
| SM-T | 1 | 5 | 5 ↑ | 5 ↑ | 5 ↑ | 5 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Acosta Murillo, R.; Zapata-Morin, P.A.; Ortiz-Bayliss, J.C. Benchmarking Molecular Mutation Operators for Evolutionary Drug Design. Int. J. Mol. Sci. 2025, 26, 11685. https://doi.org/10.3390/ijms262311685

