Design and Research of a Dual-Target Drug Molecular Generation Model Based on Reinforcement Learning
Abstract
1. Introduction
1.1. Research Background and Current State of the Field
1.2. Research Gaps and Limitations
1.3. Research Objectives and Paper Organization
2. Materials and Methods
2.1. SFG-Drug Model Architecture Framework
2.2. Construction of Fragment Molecular Library
2.2.1. Dataset Introduction
2.2.2. Molecular Fragment Segmentation
2.2.3. Molecular Fragment Library Screening
2.3. Monte Carlo Tree Search Molecular Generation
- (1) Selection: The SFG-Drug search tree originated from an initial state containing only root nodes, where each root node corresponded to a starting token $ of a SMILES string, representing the initial stage of molecular generation. Each node in the search tree represented a molecular fragment encoded as a partial SMILES string. Using the upper confidence bound applied to trees (UCT) within the tree policy, leaf nodes were iteratively selected from the root until terminal nodes were reached.
- (2) Expansion: New child nodes were generated by extending selected leaf nodes, which corresponded to appending molecular fragments to the end of the current partial SMILES strings. This step progressively built longer SMILES sequences, thereby advancing the construction of candidate molecules.
- (3) Simulation or Evaluation: Pre-trained gated recurrent unit (GRU) models were employed to complete the partial molecular fragments into full SMILES strings, generating complete molecular structures. The validity of the resulting SMILES strings was verified, and reward values were computed based on the molecular docking scores of valid compounds with target proteins. These rewards were then compared to the current best SMILES trajectory in SFG-Drug to assess whether the generated molecules represented optimal solutions.
- (4) Back-propagation: Reward values were propagated backward along search paths to parent nodes in the search tree, using feedback to update node statistics and refine subsequent node selection and molecular generation.
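The four steps above can be illustrated with a minimal Python sketch. It is not the SFG-Drug implementation: the fragment list `FRAGMENTS`, the length cap `MAX_FRAGMENTS`, the exploration constant `c=1.4`, the random `rollout` (standing in for the pre-trained GRU completion), and the toy `reward` (standing in for the docking-based reward) are all hypothetical placeholders chosen so that the example runs without external tools. The SMILES validity check is omitted for the same reason.

```python
import math
import random

FRAGMENTS = ["C", "N", "O", "c1ccccc1"]  # hypothetical toy fragment library
MAX_FRAGMENTS = 4                          # hypothetical cap on fragments per molecule


class Node:
    def __init__(self, fragments, parent=None):
        self.fragments = fragments   # fragments chosen so far (a partial SMILES)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.fragments) >= MAX_FRAGMENTS

    def is_fully_expanded(self):
        return len(self.children) == len(FRAGMENTS)


def uct(child, parent_visits, c=1.4):
    """UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(node):
    """(1) Selection: descend by UCT while the node is fully expanded."""
    while not node.is_terminal() and node.is_fully_expanded():
        node = max(node.children, key=lambda ch: uct(ch, node.visits))
    return node


def expand(node):
    """(2) Expansion: append one untried fragment to the partial string."""
    tried = {child.fragments[-1] for child in node.children}
    fragment = next(f for f in FRAGMENTS if f not in tried)
    child = Node(node.fragments + (fragment,), parent=node)
    node.children.append(child)
    return child


def rollout(node, rng):
    """(3) Simulation: random completion, a stand-in for the GRU model."""
    fragments = list(node.fragments)
    while len(fragments) < MAX_FRAGMENTS:
        fragments.append(rng.choice(FRAGMENTS))
    return "".join(fragments)


def reward(smiles):
    """Toy reward in [0, 1], a stand-in for a docking-based score."""
    return smiles.count("N") / MAX_FRAGMENTS


def backpropagate(node, value):
    """(4) Back-propagation: update statistics from the leaf up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += value
        node = node.parent


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(fragments=())  # the root plays the role of the start token
    best_smiles, best_reward = None, float("-inf")
    for _ in range(iterations):
        leaf = select(root)
        if not leaf.is_terminal():
            leaf = expand(leaf)
        smiles = rollout(leaf, rng)
        value = reward(smiles)
        if value > best_reward:  # keep the best trajectory seen so far
            best_smiles, best_reward = smiles, value
        backpropagate(leaf, value)
    return best_smiles, best_reward, root
```

Calling `search()` returns the best string found, its reward, and the root node. In a realistic setting the `reward` function would dominate the run time, since each evaluation would involve docking against the target proteins.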
2.4. Molecular Fragment Prediction Model
- (1) Input Data
- (2) Decoder
- (3) Model Loss
2.5. Molecular Docking
3. Results
3.1. Molecular Screening Rules and Drug-likeness Criteria
3.2. Evaluation Metrics for Drug Molecular Generation
3.3. Performance Evaluation and Analysis of SFG-Drug-Generated Drug Molecules
3.3.1. Visualization Analysis of Lead Drug Molecules Generated by SFG-Drug Model and ZINC-250k Dataset
3.3.2. Generation Performance of SFG-Drug Model and Quality Assessment of Generated Lead Drug Molecules
3.3.3. Molecular Docking Score Analysis
3.3.4. D Molecular Properties of SFG-Drug-Generated Lead Drug Molecules
3.3.5. Molecular Docking Pose and Interaction Analysis of SFG-Drug-Generated Lead Molecules with Protein Targets
4. Discussion
Research Achievements and Comparative Analysis
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The Rise of Deep Learning in Drug Discovery. Drug Discov. Today 2018, 23, 1241–1250.
- Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of Machine Learning in Drug Discovery and Development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
- DiMasi, J.A.; Grabowski, H.G.; Hansen, R.W. Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs. J. Health Econ. 2016, 47, 20–33.
- Wouters, O.J.; McKee, M.; Luyten, J. Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009–2018. JAMA 2020, 323, 844–853.
- Lima Guimarães, G.; Sanchez-Lengeling, B.; Outeiral, C.; Farias, P.L.C.; Aspuru-Guzik, A. Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models. arXiv 2017, arXiv:1705.10843.
- Zhavoronkov, A.; Ivanenkov, Y.A.; Aliper, A.; Veselov, M.S.; Aladinskiy, V.A.; Aladinskaya, A.V.; Terentiev, V.A.; Polykovskiy, D.A.; Kuznetsov, M.D.; Asadulaev, A.; et al. Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors. Nat. Biotechnol. 2019, 37, 1038–1040.
- Polishchuk, P.G.; Madzhidov, T.I.; Varnek, A. Estimation of the Size of Drug-like Chemical Space Based on GDB-17 Data. J. Comput.-Aided Mol. Des. 2013, 27, 675–679.
- Reymond, J.L.; Awale, M. Exploring Chemical Space for Drug Discovery Using the Chemical Universe Database. ACS Chem. Neurosci. 2012, 3, 649–657.
- Butler, K.T.; Davies, D.W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine Learning for Molecular and Materials Science. Nature 2018, 559, 547–555.
- Popova, M.; Isayev, O.; Tropsha, A. Deep Reinforcement Learning for De Novo Drug Design. Sci. Adv. 2018, 4, eaap7885.
- Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 2017, 9, 48.
- Ramsay, R.R.; Popovic-Nikolic, M.R.; Nikolic, K.; Uliassi, E.; Bolognesi, M.L. A Perspective on Multi-target Drug Discovery and Design for Complex Diseases. Clin. Transl. Med. 2018, 7, 3.
- Reddy, A.S.; Zhang, S. Polypharmacology: Drug Discovery for the Future. Expert Rev. Clin. Pharmacol. 2013, 6, 41–47.
- Overington, J.P.; Al-Lazikani, B.; Hopkins, A.L. How Many Drug Targets Are There? Nat. Rev. Drug Discov. 2006, 5, 993–996.
- Anighoro, A.; Bajorath, J.; Rastelli, G. Polypharmacology: Challenges and Opportunities in Drug Discovery. J. Med. Chem. 2014, 57, 7874–7887.
- Proschak, E.; Stark, H.; Merk, D. Polypharmacology by Design: A Medicinal Chemist’s Perspective on Multitargeting Compounds. J. Med. Chem. 2019, 62, 420–444.
- Hopkins, A.L. Network Pharmacology: The Next Paradigm in Drug Discovery. Nat. Chem. Biol. 2008, 4, 682–690.
- Gómez-Bombarelli, R.; Wei, J.N.; Duvenaud, D.; Hernández-Lobato, J.M.; Sánchez-Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T.D.; Adams, R.P.; Aspuru-Guzik, A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4, 268–276.
- Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
- Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering. Science 2018, 361, 360–365.
- Elton, D.C.; Boukouvalas, Z.; Fuge, M.D.; Chung, P.W. Deep Learning for Molecular Design—A Review of the State of the Art. Mol. Syst. Des. Eng. 2019, 4, 828–849.
- Kusner, M.J.; Paige, B.; Hernández-Lobato, J.M. Grammar Variational Autoencoder. Proc. Mach. Learn. Res. 2017, 70, 1945–1954.
- Jin, W.; Barzilay, R.; Jaakkola, T. Junction Tree Variational Autoencoder for Molecular Graph Generation. Proc. Mach. Learn. Res. 2018, 80, 2323–2332.
- Erlanson, D.A.; Fesik, S.W.; Hubbard, R.E.; Jahnke, W.; Jhoti, H. Twenty Years On: The Impact of Fragments on Drug Discovery. Nat. Rev. Drug Discov. 2016, 15, 605–619.
- Congreve, M.; Carr, R.; Murray, C.; Jhoti, H. A ‘Rule of Three’ for Fragment-Based Lead Discovery? Drug Discov. Today 2003, 8, 876–877.
- Hajduk, P.J.; Greer, J. A Decade of Fragment-Based Drug Design: Strategic Advances and Lessons Learned. Nat. Rev. Drug Discov. 2007, 6, 211–219.
- Lamoree, B.; Hubbard, R.E. Current Perspectives in Fragment-Based Lead Discovery (FBLD). Essays Biochem. 2017, 61, 453–464.
- You, J.; Liu, B.; Ying, Z.; Pande, V.; Leskovec, J. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. Adv. Neural Inf. Process. Syst. 2018, 31, 6410–6421.
- Yang, X.; Zhang, J.; Yoshizoe, K.; Terayama, K.; Tsuda, K. ChemTS: An Efficient Python Library for De Novo Molecular Generation. Sci. Technol. Adv. Mater. 2017, 18, 972–976.
- Brown, N.; Fiscato, M.; Segler, M.H.; Vaucher, A.C. GuacaMol: Benchmarking Models for De Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108.
- Merk, D.; Friedrich, L.; Grisoni, F.; Schneider, G. De Novo Design of Bioactive Small Molecules by Artificial Intelligence. Mol. Inform. 2018, 37, 1700153.
- Coulom, R. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. Comput. Games 2006, 4630, 72–83.
- Browne, C.B.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A Survey of Monte Carlo Tree Search Methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43.
- Segler, M.H.; Preuss, M.; Waller, M.P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555, 604–610.
- Xue, D.; Gong, Y.; Yang, Z.; Chuai, G.; Qu, S.; Shen, A.; Yu, J.; Liu, Q. Advances and challenges in deep generative models for de novo molecule generation. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2019, 9, e1395.
- Chen, X.; Ma, W.; Li, D.; Zhu, F.; Routray, S.; Guduri, M.; Margala, M. Kalman-based Adaptive Moment Estimation Optimisation Algorithm to Enhance GPT in LLMs for Medical Sentiment Analysis of Patient Health-related Feedback. IEEE J. Biomed. Health Inform. 2025, 1–11.
- Putin, E.; Asadulaev, A.; Ivanenkov, Y.; Aladinskiy, V.; Sanchez-Lengeling, B.; Aspuru-Guzik, A.; Zhavoronkov, A. Reinforced Adversarial Neural Computer for De Novo Molecular Design. J. Chem. Inf. Model. 2018, 58, 1194–1204.
- Sumita, M.; Yang, X.; Ishihara, S.; Tamura, R.; Tsuda, K. Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies. ACS Cent. Sci. 2018, 4, 1126–1133.
- Zheng, S.; Yan, X.; Yang, Y.; Xu, J. Identifying Structure–Property Relationships through SMILES Syntax Analysis with Self-Attention Mechanism. J. Chem. Inf. Model. 2019, 59, 914–923.
- Schneider, G.; Clark, D.E. Automated De Novo Drug Design: Are We Nearly There Yet? Angew. Chem. Int. Ed. 2019, 58, 10792–10803.
- Walters, W.P.; Murcko, M. Assessing the Impact of Generative AI on Medicinal Chemistry. Nat. Biotechnol. 2020, 38, 143–145.
- Lewell, X.Q.; Judd, D.B.; Watson, S.P.; Hann, M.M. RECAP—Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Lead Optimization. J. Chem. Inf. Comput. Sci. 1998, 38, 511–522.
- Degen, J.; Wegscheid-Gerlach, C.; Zaliani, A.; Rarey, M. On the Art of Compiling and Using “Drug-Like” Chemical Fragment Spaces. ChemMedChem 2008, 3, 1503–1507.
- Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; et al. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
- Bjerrum, E.J.; Threlfall, R. Molecular Generation with Recurrent Neural Networks (RNNs). arXiv 2017, arXiv:1705.04612.
- Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575.
- Krishnan, S.R.; Bung, N.; Bulusu, G.; Roy, A. Accelerating De Novo Drug Design against Novel Proteins with Deep Learning. J. Chem. Inf. Model. 2021, 61, 621–630.
- Li, Z.; Zhao, Y.; Lv, X.; Deng, Y. Integrated Brain on a Chip and Automated Organ-on-Chips Systems. Interdiscip. Med. 2023, 1, e20220002.
- Li, D.; Chen, X.; Li, Q.; Zhu, F.; Lu, X.; Routray, S.; Ghosh, U.; Al-Numay, M. Intelligent Biomedical Photoplethysmography Signal Cycle Division with Digital Twin in Metaverse for Consumer Health. IEEE Trans. Consum. Electron. 2024, 70, 2116–2128.
- Wang, T.; Tang, Y.; Tao, Y.; Zhou, H.; Ding, D. Nucleic Acid Drug and Delivery Techniques for Disease Therapy: Present Situation and Future Prospect. Interdiscip. Med. 2024, 2, e20230041.
- Wang, Y.; Zhan, J.; Huang, J.; Wang, X.; Chen, Z.; Yang, Z.; Li, J. Dynamic Responsiveness of Self-Assembling Peptide-Based Nano-Drug Systems. Interdiscip. Med. 2023, 1, e20220005.
- Liu, Q.; Sabnis, Y.; Zhao, Z.; Zhang, T.; Buhrlage, S.J.; Jones, L.H.; Gray, N.S. Developing Irreversible Inhibitors of the Protein Kinase cysteinome. Chem. Biol. 2013, 20, 146–159.
- Sabatini, D.M. Twenty-Five Years of mTOR: Uncovering the Link from Nutrients to Growth. Proc. Natl. Acad. Sci. USA 2017, 114, 11818–11825.
- Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.; Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; et al. Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Front. Pharmacol. 2020, 11, 565644.
- Prykhodko, O.; Johansson, S.V.; Kotsias, P.C.; Arús-Pous, J.; Bjerrum, E.J.; Engkvist, O.; Chen, H. A De Novo Molecular Generation Method Using Latent Vector Based Generative Adversarial Network. J. Cheminform. 2019, 11, 74.
- Irwin, J.J.; Sterling, T.; Mysinger, M.M.; Bolstad, E.S.; Coleman, R.G. ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model. 2012, 52, 1757–1768.
- Villar, H.O. Library design, chemical space, and drug likeness. In In Silico Drug Discovery and Design: Theory, Methods, Challenges, and Applications; CRC Press: Boca Raton, FL, USA, 2015.
- Liu, T.; Naderi, M.; Alvin, C.; Mukhopadhyay, S.; Brylinski, M. Break Down in Order To Build Up: Decomposing Small Molecules for Fragment-Based Drug Design with eMolFrag. J. Chem. Inf. Model. 2017, 57, 627–631.
- Landrum, G. RDKit: Open-Source Cheminformatics. Available online: http://www.rdkit.org (accessed on 15 January 2024).
- Yang, R.; Zhou, H.; Wang, F.; Yang, G. DigFrag as a Digital Fragmentation Method Used for Artificial Intelligence-Based Drug Design. Commun. Chem. 2024, 7, 258.
- Brenk, R.; Schipani, A.; James, D.; Krasowski, A.; Gilbert, I.H.; Frearson, J.; Wyatt, P.G. Lessons Learnt from Assembling Screening Libraries for Drug Discovery for Neglected Diseases. ChemMedChem 2008, 3, 435–444.
- Williams, R.J.; Zipser, D. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Comput. 1989, 1, 270–280.
- Xie, Y.; Shi, C.; Zhou, H.; Yang, Y.; Zhang, W.; Yu, Y.; Li, L. MARS: Markov Molecular Sampling for Multi-objective Drug Discovery. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021.
- Jin, W.; Yang, K.; Barzilay, R.; Jaakkola, T. Learning Multimodal Graph-to-Graph Translation for Molecular Optimization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
- Blaschke, T.; Arús-Pous, J.; Chen, H.; Margreitter, C.; Tyrchan, C.; Engkvist, O.; Papadopoulos, K.; Patronov, A. REINVENT 2.0: An AI Tool for De Novo Drug Design. J. Chem. Inf. Model. 2020, 60, 5918–5922.
- Horwood, J.; Noutahi, E. Molecular design in synthetically accessible chemical space via deep reinforcement learning. ACS Omega 2020, 5, 32984–32994.
- Lin, Z.; Zhang, Y.; Duan, L.; Ou-Yang, L.; Zhao, P. MoVAE: A Variational AutoEncoder for Molecular Graph Generation. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), Minneapolis, MN, USA, 27–29 April 2023; SIAM: Philadelphia, PA, USA, 2023; pp. 514–522.
- Ma, X.H.; Shi, Z.; Tan, C.; Jiang, Y.; Go, M.L.; Low, B.C.; Chen, Y.Z. In-silico approaches to multi-target drug discovery: Computer aided multi-target drug design, multi-target virtual screening. Pharm. Res. 2010, 27, 739–749.
- Bickerton, G.R.; Paolini, G.V.; Besnard, J.; Muresan, S.; Hopkins, A.L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012, 4, 90–98.

| Model | Valid (↑) | Unique (↑) | IntDiv1 (↑) | IntDiv2 (↑) | Novelty (↑) |
|---|---|---|---|---|---|
| JT-VAE [63] | 1.000 | 1.000 | 0.855 | 0.849 | 0.914 |
| MARS [64] | 0.950 | 1.000 | 0.856 | 0.850 | 0.822 |
| RationaleRL [65] | 0.898 | 1.000 | 0.851 | 0.844 | 0.949 |
| REINVENT2.0 [66] | 0.982 | 0.980 | 0.820 | 0.804 | 1.000 |
| SFG-Drug | 1.000 | 1.000 | 0.878 | 0.860 | 1.000 |
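The table does not spell out how its metrics are defined. As a hedged illustration, the sketch below assumes MOSES-style definitions: validity is the fraction of generated strings that parse, uniqueness is the fraction of distinct molecules among the valid ones, novelty is the fraction of distinct valid molecules absent from the training set, and IntDiv_p is one minus the p-th root of the mean p-th power of pairwise Tanimoto similarity. The function names, the `is_valid` predicate, and the toy fingerprints (sets of on-bit indices) are hypothetical; in practice validity and fingerprints would come from a cheminformatics toolkit.

```python
from itertools import product


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def generation_metrics(generated, training, fingerprints, is_valid, p=1):
    """Compute validity, uniqueness, novelty and IntDiv_p for a generated set.

    Assumes at least one generated string is valid.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    fps = [fingerprints[s] for s in valid]
    # Mean of T(m1, m2)^p over all ordered pairs, self-pairs included.
    mean_sim_p = sum(tanimoto(a, b) ** p for a, b in product(fps, repeat=2)) / len(fps) ** 2
    return {
        "valid": len(valid) / len(generated),
        "unique": len(unique) / len(valid),
        "novelty": len(novel) / len(unique),
        "intdiv": 1.0 - mean_sim_p ** (1.0 / p),
    }


# Toy data: hypothetical fingerprints, one duplicate and one unparsable string.
toy_fingerprints = {"CCO": {1, 2, 3}, "CCN": {1, 2, 4}}
toy_metrics = generation_metrics(
    generated=["CCO", "CCO", "CCN", "bad"],
    training=["CCO"],
    fingerprints=toy_fingerprints,
    is_valid=lambda s: s in toy_fingerprints,
)
```

Setting `p=2` gives the counterpart of the IntDiv2 column under the same assumptions.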
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, P.; Yan, Z.; Zhou, Y.; Li, H.; Gao, W.; Li, D. Design and Research of a Dual-Target Drug Molecular Generation Model Based on Reinforcement Learning. Inventions 2026, 11, 12. https://doi.org/10.3390/inventions11010012
