A Chemoinformatics Investigation of Spectral and Quantum Chemistry Patterns for Discovering New Drug Leads from Natural Products Targeting the PD-1/PD-L1 Immune Checkpoint, with a Particular Focus on Naturally Occurring Marine Products
Abstract
1. Introduction
2. Results and Discussion
2.1. Docking Score for the Estimation of IC50 Values
2.2. QSDAR Regression Modeling
2.2.1. Dataset
2.2.2. Spectral Data
2.2.3. QSDAR Model Development
2.3. Benchmarking with QSAR Using Molecular, Fingerprint, and Quantum Descriptors
2.4. QSDAR Classification Model
Analysis of Outliers
2.5. Applicability Domain of PD-L1 QSDAR Classification Model
2.6. Virtual Screening
3. Materials and Methods
3.1. Datasets: Training and Test Sets
3.2. Molecular Docking
3.3. ML Techniques
3.3.1. Random Forest (RF)
3.3.2. Support Vector Machines (SVM)
3.3.3. Deep Learning Multilayer Perceptron Networks (dMLP)
3.3.4. Light Gradient-Boosting Machine (LightGBM)
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- de la Torre, B.G.; Albericio, F. The Pharmaceutical Industry in 2020. An Analysis of FDA Drug Approvals from the Perspective of Molecules. Molecules 2021, 26, 627. [Google Scholar] [CrossRef] [PubMed]
- Pereira, F.; Aires-de-Sousa, J. Computational Methodologies in the Exploration of Marine Natural Product Leads. Mar. Drugs 2018, 16, 236. [Google Scholar] [CrossRef]
- Salman, M.M.; Al-Obaidi, Z.; Kitchen, P.; Loreto, A.; Bill, R.M.; Wade-Martins, R. Advances in Applying Computer-Aided Drug Design for Neurodegenerative Diseases. Int. J. Mol. Sci. 2021, 22, 4688. [Google Scholar] [CrossRef]
- Cui, W.; Aouidate, A.; Wang, S.; Yu, Q.; Li, Y.; Yuan, S. Discovering Anti-Cancer Drugs via Computational Methods. Front. Pharmacol. 2020, 11, 733. [Google Scholar] [CrossRef] [PubMed]
- Jimenez, P.C.; Wilke, D.V.; Branco, P.C.; Bauermeister, A.; Rezende-Teixeira, P.; Gaudencio, S.P.; Costa-Lotufo, L.V. Enriching cancer pharmacology with drugs of marine origin. Br. J. Pharmacol. 2020, 177, 3–27. [Google Scholar] [CrossRef] [PubMed]
- Pereira, F. Have marine natural product drug discovery efforts been productive and how can we improve their efficiency? Expert Opin. Drug Discov. 2019, 14, 717–722. [Google Scholar] [CrossRef]
- Sorokina, M.; Merseburger, P.; Rajan, K.; Yirik, M.A.; Steinbeck, C. COCONUT online: Collection of Open Natural Products database. J. Cheminform. 2021, 13, 2. [Google Scholar] [CrossRef]
- Chandrasekhar, V.; Rajan, K.; Kanakam, S.R.S.; Sharma, N.; Weißenborn, V.; Schaub, J.; Steinbeck, C. COCONUT 2.0: A comprehensive overhaul and curation of the collection of open natural products database. Nucleic Acids Res. 2025, 53, D634–D643. [Google Scholar] [CrossRef]
- Newman, D.J.; Cragg, G.M. Natural Products as Sources of New Drugs over the Nearly Four Decades from 01/1981 to 09/2019. J. Nat. Prod. 2020, 83, 770–803. [Google Scholar] [CrossRef]
- Wetzel, S.; Bon, R.S.; Kumar, K.; Waldmann, H. Biology-Oriented Synthesis. Angew. Chem. Int. Ed. 2011, 50, 10800–10826. [Google Scholar] [CrossRef]
- Pereira, F. Machine Learning Methods to Predict the Terrestrial and Marine Origin of Natural Products. Mol. Inform. 2021, 40, 2060034. [Google Scholar] [CrossRef] [PubMed]
- Ertl, P.; Roggo, S.; Schuffenhauer, A. Natural product-likeness score and its application for prioritization of compound libraries. J. Chem. Inf. Model. 2008, 48, 68–74. [Google Scholar] [CrossRef]
- Shang, J.; Hu, B.; Wang, J.; Zhu, F.; Kang, Y.; Li, D.; Sun, H.; Kong, D.-X.; Hou, T. A cheminformatic insight into the differences between terrestrial and marine originated natural products. J. Chem. Inf. Model. 2018, 58, 1182–1193. [Google Scholar] [CrossRef] [PubMed]
- Zhang, W.J.; Song, Y.X.; Zhang, X.C. Relationship of Programmed Death-1 (PD-1) and Programmed Death Ligand-1 (PD-L1) Polymorphisms with Overall Cancer Susceptibility: An Updated Meta-Analysis of 28 Studies with 60 612 Subjects. Med. Sci. Monit. 2021, 27, 17. [Google Scholar] [CrossRef]
- Sobral, P.; Luz, V.; Almeida, J.; Videira, P.; Pereira, F. Computational Approaches Drive Developments in Immune-Oncology Therapies for PD-1/PD-L1 Immune Checkpoint Inhibitors. Int. J. Mol. Sci. 2023, 24, 5908. [Google Scholar] [CrossRef]
- Butera, R.; Wazynska, M.; Magiera-Mularz, K.; Plewka, J.; Musielak, B.; Surmiak, E.; Sala, D.; Kitel, R.; de Bruyn, M.; Nijman, H.; et al. Design, Synthesis, and Biological Evaluation of Imidazopyridines as PD-1/PD-L1 Antagonists. ACS Med. Chem. Lett. 2021, 12, 768–773. [Google Scholar] [CrossRef]
- Konieczny, M.; Musielak, B.; Kocik, J.; Skalniak, L.; Sala, D.; Czub, M.; Magiera-Mularz, K.; Rodriguez, I.; Myrcha, M.; Stec, M.; et al. Di-bromo-Based Small-Molecule Inhibitors of the PD-1/PD-L1 Immune Checkpoint. J. Med. Chem. 2020, 63, 11271–11285. [Google Scholar] [CrossRef]
- Lu, L.; Qi, Z.; Wang, T.; Zhang, X.; Zhang, K.; Wang, K.; Cheng, Y.; Xiao, Y.; Li, Z.; Jiang, S. Design, Synthesis, and Evaluation of PD-1/PD-L1 Antagonists Bearing a Benzamide Scaffold. ACS Med. Chem. Lett. 2022, 13, 586–592. [Google Scholar] [CrossRef] [PubMed]
- Muszak, D.; Surmiak, E.; Plewka, J.; Magiera-Mularz, K.; Kocik-Krol, J.; Musielak, B.; Sala, D.; Kitel, R.; Stec, M.; Weglarczyk, K.; et al. Terphenyl-Based Small-Molecule Inhibitors of Programmed Cell Death-1/Programmed Death-Ligand 1 Protein-Protein Interaction. J. Med. Chem. 2021, 64, 11614–11636. [Google Scholar] [CrossRef]
- Song, Z.; Liu, B.; Peng, X.; Gu, W.; Sun, Y.; Xing, L.; Xu, Y.; Geng, M.; Ai, J.; Zhang, A. Design, Synthesis, and Pharmacological Evaluation of Biaryl-Containing PD-1/PD-L1 Interaction Inhibitors Bearing a Unique Difluoromethyleneoxy Linkage. J. Med. Chem. 2021, 64, 16687–16702. [Google Scholar] [CrossRef]
- Gao, Y.; Wang, H.; Shen, L.; Xu, H.; Deng, M.; Cheng, M.; Wang, J. Discovery of benzo[d]isothiazole derivatives as novel scaffold inhibitors targeting the programmed cell death-1/programmed cell death-ligand 1 (PD-1/PD-L1) interaction through “ring fusion” strategy. Bioorg. Chem. 2022, 123, 105769. [Google Scholar] [CrossRef] [PubMed]
- Sobral, P.; Carvalho, T.; Izadi, S.; Castilho, A.; Silva, Z.; Videira, P.; Pereira, F. Advancements in drug discovery: Integrating CADD tools and drug repurposing for PD-1/PD-L1 axis inhibition. RSC Adv. 2025, 15, 2298–2316. [Google Scholar] [CrossRef] [PubMed]
- Rufino, A.; Brant, A.; Santos, J.; Ferreira, M.; Emerenciano, V. Simple Method for Identification of Skeletons of Aporphine Alkaloids from 13C NMR Data Using Artificial Neural Networks. J. Chem. Inf. Model. 2005, 45, 645–651. [Google Scholar] [CrossRef]
- An, N.; Van der Mei, F.; Voutchkova-Kostal, A. Global Model for Octanol-Water Partition Coefficients from Proton Nuclear Magnetic Resonance Spectra. Mol. Inform. 2014, 33, 286–292. [Google Scholar] [CrossRef]
- Scotti, L.; Fernandes, M.; Muramatsu, E.; Emereciano, V.; Tavares, J.; da Silva, M.; Scotti, M.T. 13C NMR spectral data and molecular descriptors to predict the antioxidant activity of flavonoids. Braz. J. Pharm. 2011, 47, 241–249. [Google Scholar] [CrossRef]
- Latino, D.; Alres-De-Sousa, J. Linking Databases of Chemical Reactions to NMR Data: An Exploration of 1H NMR-Based Reaction Classification. Anal. Chem. 2007, 79, 854–862. [Google Scholar] [CrossRef]
- Latino, D.; Aires-de-Sousa, J. Automatic NMR-Based Identification of Chemical Reaction Types in Mixtures of Co-Occurring Reactions. PLoS ONE 2014, 9, e88499. [Google Scholar] [CrossRef] [PubMed]
- Cruz, S.; Gomes, S.E.; Borralho, P.M.; Rodrigues, C.M.P.; Gaudencio, S.P.; Pereira, F. In Silico HCT116 Human Colon Cancer Cell-Based Models En Route to the Discovery of Lead-Like Anticancer Drugs. Biomolecules 2018, 8, 56. [Google Scholar] [CrossRef]
- Dias, T.; Gaudencio, S.P.; Pereira, F. A Computer-Driven Approach to Discover Natural Product Leads for Methicillin-Resistant Staphylococcus aureus Infection Therapy. Mar. Drugs 2019, 17, 16. [Google Scholar] [CrossRef]
- Guzik, K.; Zak, K.; Grudnik, P.; Magiera, K.; Musielak, B.; Törner, R.; Skalniak, L.; Dömling, A.; Dubin, G.; Holak, T. Small-Molecule Inhibitors of the Programmed Cell Death-1/Programmed Death-Ligand 1 (PD-1/PD-L1) Interaction via Transiently Induced Protein States and Dimerization of PD-L1. J. Med. Chem. 2017, 60, 5857–5867. [Google Scholar] [CrossRef]
- Ghose, A.; Viswanadhan, V.; Wendoloski, J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1999, 1, 55–68. [Google Scholar] [CrossRef] [PubMed]
- Yang, Z.; Chakraborty, M.; White, A. Predicting chemical shifts with graph neural networks. Chem. Sci. 2021, 12, 10802–10809. [Google Scholar] [CrossRef] [PubMed]
- Binev, Y.; Marques, M.M.B.; Aires-de-Sousa, J. Prediction of H-1 NMR coupling constants with associative neural networks trained for chemical shifts. J. Chem. Inf. Model. 2007, 47, 2089–2097. [Google Scholar] [CrossRef]
- Jonas, E.; Kuhn, S.; Schlörer, N. Prediction of chemical shift in NMR: A review. Magn. Reson. Chem. 2022, 60, 1021–1031. [Google Scholar] [CrossRef]
- RDKit: Open-Source. Cheminformatics Software. 2016. Available online: https://www.rdkit.org/ (accessed on 25 March 2025).
- Pereira, F.; Xiao, K.; Latino, D.A.R.S.; Wu, C.; Zhang, Q.; Aires-de-Sousa, J. Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals. J. Chem. Inf. Model. 2017, 57, 11–21. [Google Scholar] [CrossRef]
- Pereira, F.; Aires-de-Sousa, J. Machine learning for the prediction of molecular dipole moments obtained by density functional theory. J. Cheminform. 2018, 10, 43. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Q.; Zheng, F.; Fartaria, R.; Latino, D.A.R.S.; Qu, X.; Campos, T.; Zhao, T.; Aires-de-Sousa, J. A QSPR approach for the fast estimation of DFT/NBO partial atomic charges. Chemom. Intell. Lab. Syst. 2014, 134, 158–163. [Google Scholar] [CrossRef]
- Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461. [Google Scholar] [CrossRef]
- Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61, 3891–3898. [Google Scholar] [CrossRef]
- Pettersen, E.F.; Goddard, T.D.; Huang, C.C.; Couch, G.S.; Greenblatt, D.M.; Meng, E.C.; Ferrin, T.E. UCSF Chimera--a visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25, 1605–1612. [Google Scholar] [CrossRef]
- Adasme, M.F.; Linnemann, K.L.; Bolz, S.N.; Kaiser, F.; Salentin, S.; Haupt, V.J.; Schroeder, M. PLIP 2021: Expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucleic Acids Res. 2021, 49, W530–W534. [Google Scholar] [CrossRef] [PubMed]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958. [Google Scholar] [CrossRef] [PubMed]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- RandomForestRegressor—Scikit-Learn 1.6.0 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 25 March 2025).
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Chang, C.-C.; Lin, C.-J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 27. [Google Scholar] [CrossRef]
- Chollet, F. Keras: Deep Learning for Humans. Available online: https://keras.io/ (accessed on 25 March 2025).
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv 2016, arXiv:1603.04467. [Google Scholar] [CrossRef]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Advances in Neural Information Processing Systems 30 (NIPS 2017). Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
Sets | # | PD-L1 Activity | Drug-Likeness | NP-Likeness Score 5 | ||
---|---|---|---|---|---|---|
Active 1 | Inactive 2 | MWheatom 3 | MolLogP 4 | |||
Training | 119,733 | 46,950 (39%) | 72,783 (61%) | 112,555 (94%) | 100,534 (84%) | 52,640 (44%) |
Test | 1202 | 491 (41%) | 711 (59%) | 1141 (95%) | 1004 (84%) | 519 (43%) |
Spectral Descriptors | # 1 | R2 | MAE 3 | RMSE 4 |
---|---|---|---|---|
SPINUS | 570 | 0.907 2 | 0.229 | 0.311 |
GNN | 250 | 0.784 2 | 0.346 | 0.444 |
SPINUS + GNN | 820 | 0.907 2 | 0.228 | 0.304 |
Model | # 1 | R2 | MAE 3 | RMSE 4 |
---|---|---|---|---|
SPINUS | 50 | 0.90086 2 | 0.23427 | 0.31782 |
100 | 0.90644 2 | 0.22927 | 0.31147 | |
150 | 0.90685 2 | 0.22903 | 0.31102 | |
200 | 0.90777 2 | 0.22856 | 0.31030 | |
250 | 0.90685 | 0.22902 | 0.31101 |
Model | R2 | MAE 2 | RMSE 3 |
---|---|---|---|
Training set | |||
RF 4 | 0.908 1 | 0.229 | 0.310 |
LightGBM 5 | 0.941 1 | 0.156 | 0.223 |
SVM 5 | 0.531 1 | 0.423 | 0.595 |
CNN 5 | 0.502 1 | 0.446 | 0.612 |
Test set | |||
RF 4 | 0.489 1 | 0.464 | 0.626 |
LightGBM 5 | 0.525 1 | 0.441 | 0.601 |
SVM 5 | 0.396 1 | 0.516 | 0.680 |
CNN 5 | 0.377 1 | 0.528 | 0.695 |
Descriptors | # 1 | R2 | MAE 3 | RMSE 4 | |
---|---|---|---|---|---|
FPs | MACCS | 166 | 0.8213 2 | 0.1879 | 0.2618 |
Morgan | 1024 | 0.91 2 | 0.0987 | 0.1365 | |
RDKit | 2048 | 0.92 2 | 0.077 | 0.1248 | |
1D&2D | 425 | 0.9384 2 | 0.017 | 0.0255 | |
3D | Autocorr3D | 80 | 0.9429 2 | 0.0373 | 0.0519 |
Getaway | 271 | 0.9384 2 | 0.0132 | 0.019 | |
RDF | 210 | 0.9384 2 | 0.0202 | 0.0292 | |
3D | 561 | 0.9384 2 | 0.0091 | 0.0132 | |
Quantum | 5 | 0.6246 2 | 0.333 | 0.4250 |
Model | # 1 | R2 | MAE 3 | RMSE 4 |
---|---|---|---|---|
Training set 5 | ||||
1D&2D&3D | 986 | 0.9384 2 | 0.0071 | 0.0105 |
Test set | ||||
1D&2D&3D | 986 | 0.7707 2 | 0.2363 | 0.3112 |
Sets | ||
---|---|---|
Training 1 | Test | |
TP 2 | 33,333 | 365 |
TN 3 | 57,268 | 574 |
FP 4 | 15,515 | 137 |
FN 5 | 13,617 | 126 |
SE 6 | 0.710 | 0.743 |
SP 7 | 0.787 | 0.807 |
Q 8 | 0.757 | 0.781 |
MCC 9 | 0.494 | 0.549 |
Hyperparameter | Setting |
---|---|
Initializer | Glorot uniform |
Number of hidden layers | 4 |
Number of neurons in the 1st, 2nd, 3rd, and 4th layers | 150 |
Activation 1st–3rd layers | Relu |
Activation 4th layer | Relu |
Batch size | 128 |
Optimizer | Adam |
Epochs | 500 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Rabelo, H.; Tsimiante, A.; Binev, Y.; Pereira, F. A Chemoinformatics Investigation of Spectral and Quantum Chemistry Patterns for Discovering New Drug Leads from Natural Products Targeting the PD-1/PD-L1 Immune Checkpoint, with a Particular Focus on Naturally Occurring Marine Products. Mar. Drugs 2025, 23, 247. https://doi.org/10.3390/md23060247
Rabelo H, Tsimiante A, Binev Y, Pereira F. A Chemoinformatics Investigation of Spectral and Quantum Chemistry Patterns for Discovering New Drug Leads from Natural Products Targeting the PD-1/PD-L1 Immune Checkpoint, with a Particular Focus on Naturally Occurring Marine Products. Marine Drugs. 2025; 23(6):247. https://doi.org/10.3390/md23060247
Chicago/Turabian StyleRabelo, Henrique, Ayana Tsimiante, Yuri Binev, and Florbela Pereira. 2025. "A Chemoinformatics Investigation of Spectral and Quantum Chemistry Patterns for Discovering New Drug Leads from Natural Products Targeting the PD-1/PD-L1 Immune Checkpoint, with a Particular Focus on Naturally Occurring Marine Products" Marine Drugs 23, no. 6: 247. https://doi.org/10.3390/md23060247
APA StyleRabelo, H., Tsimiante, A., Binev, Y., & Pereira, F. (2025). A Chemoinformatics Investigation of Spectral and Quantum Chemistry Patterns for Discovering New Drug Leads from Natural Products Targeting the PD-1/PD-L1 Immune Checkpoint, with a Particular Focus on Naturally Occurring Marine Products. Marine Drugs, 23(6), 247. https://doi.org/10.3390/md23060247