Exploring Deep Learning for Metalloporphyrins: Databases, Molecular Representations, and Model Architectures
Abstract
1. Introduction
2. Results
2.1. Establishing Canonical SMILES for Porphyrins and Metalloporphyrins
2.2. Comparing the Performance of Deep Learning Models
2.3. Molecular Graph-Based Model Results
2.4. String-Based Model Results
2.5. Transfer Learning Results
2.6. Comparing the Computational Costs of Different Models
2.7. Mapping the Chemical Space of the Porphyrin Database under the D-MPNN Model and BERT Model
3. Discussion
4. Materials and Methods
4.1. Database
4.1.1. Porphyrin-Based Dyes Database
4.1.2. Databases for Transfer Learning
4.2. Structural Representation
4.3. Model Names and Architecture
4.3.1. Graph Convolutional Neural Network (GCN)
4.3.2. Message Passing Neural Network (MPNN)
4.3.3. Directed Message Passing Neural Network (D-MPNN)
4.3.4. Transformer
4.3.5. Bidirectional Encoder Representation from Transformers (BERT)
4.3.6. Tree MAP (TMAP)
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Model | Time (s) | Epochs |
---|---|---|
Transformer | 825.30 | 5 |
Transformer_transfer | 195.60 | 2 |
BERT | 621.84 | 10 |
BERT_transfer | 204.45 | 3 |
GCN | 673.96 | 20 |
MPNN | 538.59 | 50 |
D-MPNN | 935.43 | 30 |
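
The table reports a time and an epoch count for each model, so a per-epoch cost and the saving from transfer learning can be derived with simple arithmetic. The sketch below does this in Python using only the figures in the table. It assumes that each time is the total training time over all listed epochs, which the table itself does not state; the function names `seconds_per_epoch` and `transfer_speedup` are illustrative and not taken from the article.

```python
# Figures copied from the table above: model -> (time in seconds, epochs).
costs = {
    "Transformer": (825.30, 5),
    "Transformer_transfer": (195.60, 2),
    "BERT": (621.84, 10),
    "BERT_transfer": (204.45, 3),
    "GCN": (673.96, 20),
    "MPNN": (538.59, 50),
    "D-MPNN": (935.43, 30),
}


def seconds_per_epoch(table):
    """Divide each time by its epoch count.

    Assumption: the reported time is the total over all epochs.
    """
    return {model: total / epochs for model, (total, epochs) in table.items()}


def transfer_speedup(table, base, transfer):
    """Ratio of the reported time without transfer learning to the time with it."""
    return table[base][0] / table[transfer][0]


per_epoch = seconds_per_epoch(costs)

# Print models from cheapest to most expensive per epoch.
for model, secs in sorted(per_epoch.items(), key=lambda kv: kv[1]):
    print(f"{model:22s} {secs:8.2f} s/epoch")

for base in ("Transformer", "BERT"):
    ratio = transfer_speedup(costs, base, f"{base}_transfer")
    print(f"{base}: reported time is {ratio:.2f}x the transfer-learning time")
```

Under that assumption, the comparison separates two effects that the raw times mix together: how expensive one pass over the data is for each architecture, and how many passes each model was run for.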
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Su, A.; Zhang, C.; She, Y.-B.; Yang, Y.-F. Exploring Deep Learning for Metalloporphyrins: Databases, Molecular Representations, and Model Architectures. Catalysts 2022, 12, 1485. https://doi.org/10.3390/catal12111485