Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Construction
2.1.1. Positive Data
2.1.2. Negative Data
2.2. Model Construction and Evaluation
2.3. Case Study
2.4. Clustering and Integrated Gradients Analyses
3. Results
3.1. Model Training and Evaluation
3.2. Model Interpretation
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Dataset Category | Training | Validation | Test |
---|---|---|---|
Positive data | 1593 | 646 | 645 |
Negative data | 62,476 | 8168 | 8167 |
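
The split sizes in the table imply a strong class imbalance, which is easier to see as a positive fraction per split. The following is a minimal Python sketch that uses only the counts listed above; the names `splits`, `positive_fraction`, and `summarize` are illustrative assumptions and do not come from the article or its code.

```python
# Split sizes copied from the table above (positive = UbiD sequences,
# negative = non-UbiD sequences). All names here are illustrative.
splits = {
    "Training": {"positive": 1593, "negative": 62476},
    "Validation": {"positive": 646, "negative": 8168},
    "Test": {"positive": 645, "negative": 8167},
}


def positive_fraction(counts):
    """Return the share of positive sequences in one split."""
    return counts["positive"] / (counts["positive"] + counts["negative"])


def summarize(all_splits):
    """Map each split name to (total number of sequences, positive fraction)."""
    return {
        name: (counts["positive"] + counts["negative"], positive_fraction(counts))
        for name, counts in all_splits.items()
    }


if __name__ == "__main__":
    for name, (total, fraction) in summarize(splits).items():
        print(f"{name}: {total} sequences, {fraction:.1%} positive")
```

Positives are a small minority in every split, and the training split has a lower positive fraction than the validation and test splits, so metrics that account for imbalance (for example precision and recall) are likely more informative than raw accuracy when reading the evaluation.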
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Watanabe, N.; Kuriya, Y.; Murata, M.; Yamamoto, M.; Shimizu, M.; Araki, M. Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD. Biology 2023, 12, 795. https://doi.org/10.3390/biology12060795