Deep Learning Approaches for the Prediction of Protein Functional Sites
Abstract
1. Introduction
2. A Primer on Deep Learning
3. Deep Learning Approaches for Predicting Protein Functional Sites
Name | Prediction Goal | URL(s) and Reference | Deep Learning Architecture/Approach |
---|---|---|---|
AlphaFold | Protein 3D structure | https://alphafoldserver.com/ [7] | Deep neural network with a diffusion generative model |
DeepTMpred | Transmembrane helices | https://github.com/ISYSLAB-HUST/DeepTMpred [18] | Combination of different deep learning models |
BetAware-Deep | Transmembrane beta barrels | https://busca.biocomp.unibo.it/betaware2 [19] | Deep recurrent neural network |
TMbed | Transmembrane helices and barrels | https://github.com/BernhoferM/TMbed [20] | Embeddings of protein language models |
PhiGnet | Protein function and associated functional sites | https://doi.org/10.5281/zenodo.12496869 [23] | Graph neural networks |
DeepFRI | Protein function and associated functional sites | https://beta.deepfri.flatironinstitute.org/ [24] | Convolutional neural network and pre-trained embedding with protein language models |
PARSE | Enzyme functions and associated catalytic sites | https://github.com/awfderry/PARSE [25] | COLLAPSE embeddings [21] and traditional statistics |
ScanNet | Protein binding sites | http://bioinfo3d.cs.tau.ac.il/ScanNet/ [26] | Geometric deep learning |
-- | PROSITE motifs [22] and catalytic sites | https://simtk.org/projects/fscnn [27] | 3D convolutional neural network |
cDL-PAU, cDL-FuncPhos | Different post-translational modifications | https://github.com/ComputeSuda/PTM_ML [28] | Combination of different neural networks |
DeepNphos | N-phosphorylation sites | https://github.com/ChangXulinmessi/DeepNPhos [29] | Convolutional neural network |
NetGPI | Glycosylphosphatidylinositol anchoring sites | https://services.healthtech.dtu.dk/service.php?NetGPI; https://github.com/mhgislason/netgpi-1.1 [30] | Recurrent neural network |
DeepConv-DTI | Drug targets and drug-binding residues | https://github.com/GIST-CSBL/DeepConv-DTI [31] | Convolutional neural network |
DeepDrug3D | Ligand binding sites | https://github.com/pulimeng/DeepDrug3D [32] | Convolutional neural network |
SignalP | All five types of signal peptides | https://services.healthtech.dtu.dk/service.php?SignalP-6.0; https://github.com/fteufel/signalp-6.0 [34] | Protein language models |
TargetP | Transit peptides | http://www.cbs.dtu.dk/services/TargetP-2.0/; https://github.com/JJAlmagro/TargetP-2.0/ [35] | Recurrent neural network |
SCLpred-EMS | Sorting signal to endomembrane/secretory pathway | http://distilldeep.ucd.ie/SCLpred2/ [36] | Convolutional neural networks |
DeepLoc | Sorting signals for different subcellular compartments | https://services.healthtech.dtu.dk/service.php?DeepLoc-2.0 [37] | Protein language models |
DDMut-PPI | Effect of mutations on protein interactions | https://biosig.lab.uq.edu.au/ddmut_ppi [39] | Graph convolutional neural network |
ProMEP | Effect of mutations | https://github.com/wenjiegroup/ProMEP [40] | Protein language models |
ESM1b | Disease variant effect | https://github.com/ntranoslab/esm-variants [41] | Protein language model |
4. Example
5. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Hu, T.; Chitnis, N.; Monos, D.; Dinh, A. Next-Generation Sequencing Technologies: An Overview. Hum. Immunol. 2021, 82, 801–811.
2. Shokralla, S.; Spall, J.L.; Gibson, J.F.; Hajibabaei, M. Next-Generation Sequencing Technologies for Environmental DNA Research. Mol. Ecol. 2012, 21, 1794–1805.
3. The UniProt Consortium. UniProt: The Universal Protein Knowledgebase in 2021. Nucleic Acids Res. 2021, 49, D480–D489.
4. Rauer, C.; Sen, N.; Waman, V.P.; Abbasian, M.; Orengo, C.A. Computational Approaches to Predict Protein Functional Families and Functional Sites. Curr. Opin. Struct. Biol. 2021, 70, 108–122.
5. Chagoyen, M.; Garcia-Martin, J.A.; Pazos, F. Practical Analysis of Specificity-Determining Residues in Protein Families. Brief. Bioinform. 2016, 17, 255–261.
6. Pazos, F. Computational Prediction of Protein Functional Sites-Applications in Biotechnology and Biomedicine. Adv. Protein Chem. Struct. Biol. 2022, 130, 39–57.
7. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589.
8. Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373, 871–876.
9. Baldi, P. Deep Learning in Science; Cambridge University Press: Cambridge, UK, 2021; ISBN 978-1-108-84535-9.
10. Zou, J.; Huss, M.; Abid, A.; Mohammadi, P.; Torkamani, A.; Telenti, A. A Primer on Deep Learning in Genomics. Nat. Genet. 2019, 51, 12–18.
11. Boadu, F.; Lee, A.; Cheng, J. Deep Learning Methods for Protein Function Prediction. Proteomics 2024, 12, 2300471.
12. Simon, E.; Swanson, K.; Zou, J. Language Models for Biological Research: A Primer. Nat. Methods 2024, 21, 1422–1429.
13. Pazos, F.; Sanchez-Pulido, L. Protein Superfamilies. In eLS; John Wiley & Sons, Ltd: Chichester, UK, 2014.
14. Bordin, N.; Lau, A.M.; Orengo, C. Large-Scale Clustering of AlphaFold2 3D Models Shines Light on the Structure and Function of Proteins. Mol. Cell 2023, 83, 3950–3952.
15. Porta-Pardo, E.; Ruiz-Serra, V.; Valentini, S.; Valencia, A. The Structural Coverage of the Human Proteome before and after AlphaFold. PLOS Comput. Biol. 2022, 18, e1009818.
16. Yang, Z.; Zeng, X.; Zhao, Y.; Chen, R. AlphaFold2 and Its Applications in the Fields of Biology and Medicine. Signal Transduct. Target. Ther. 2023, 8, 115.
17. Wilson, C.J.; Choy, W.-Y.; Karttunen, M. AlphaFold2: A Role for Disordered Protein/Region Prediction? Int. J. Mol. Sci. 2022, 23, 4591.
18. Wang, L.; Zhong, H.; Xue, Z.; Wang, Y. Improving the Topology Prediction of α-Helical Transmembrane Proteins with Deep Transfer Learning. Comput. Struct. Biotechnol. J. 2022, 20, 1993–2000.
19. Madeo, G.; Savojardo, C.; Martelli, P.L.; Casadio, R. BetAware-Deep: An Accurate Web Server for Discrimination and Topology Prediction of Prokaryotic Transmembrane β-Barrel Proteins. J. Mol. Biol. 2021, 433, 166729.
20. Bernhofer, M.; Rost, B. TMbed: Transmembrane Proteins Predicted through Language Model Embeddings. BMC Bioinform. 2022, 23, 326.
21. Derry, A.; Altman, R.B. COLLAPSE: A Representation Learning Framework for Identification and Characterization of Protein Structural Sites. Protein Sci. 2023, 32, e4541.
22. Sigrist, C.J.A.; de Castro, E.; Cerutti, L.; Cuche, B.A.; Hulo, N.; Bridge, A.; Bougueleret, L.; Xenarios, I. New and Continuing Developments at PROSITE. Nucleic Acids Res. 2013, 41, D344–D347.
23. Jang, Y.J.; Qin, Q.-Q.; Huang, S.-Y.; Peter, A.T.J.; Ding, X.-M.; Kornmann, B. Accurate Prediction of Protein Function Using Statistics-Informed Graph Networks. Nat. Commun. 2024, 15, 6601.
24. Gligorijević, V.; Renfrew, P.D.; Kosciolek, T.; Leman, J.K.; Berenberg, D.; Vatanen, T.; Chandler, C.; Taylor, B.C.; Fisk, I.M.; Vlamakis, H.; et al. Structure-Based Protein Function Prediction Using Graph Convolutional Networks. Nat. Commun. 2021, 12, 3168.
25. Derry, A.; Altman, R.B. Explainable Protein Function Annotation Using Local Structure Embeddings. bioRxiv 2023.
26. Tubiana, J.; Schneidman-Duhovny, D.; Wolfson, H.J. ScanNet: An Interpretable Geometric Deep Learning Model for Structure-Based Protein Binding Site Prediction. Nat. Methods 2022, 19, 730–739.
27. Torng, W.; Altman, R.B. High Precision Protein Functional Site Detection Using 3D Convolutional Neural Networks. Bioinformatics 2019, 35, 1503–1512.
28. Zhu, F.; Yang, S.; Meng, F.; Zheng, Y.; Ku, X.; Luo, C.; Hu, G.; Liang, Z. Leveraging Protein Dynamics to Identify Functional Phosphorylation Sites Using Deep Learning Models. J. Chem. Inf. Model. 2022, 62, 3331–3345.
29. Chang, X.; Zhu, Y.; Chen, Y.; Li, L. DeepNphos: A Deep-Learning Architecture for Prediction of N-Phosphorylation Sites. Comput. Biol. Med. 2024, 170, 108079.
30. Gíslason, M.H.; Nielsen, H.; Almagro Armenteros, J.J.; Johansen, A.R. Prediction of GPI-Anchored Proteins with Pointer Neural Networks. Curr. Res. Biotechnol. 2021, 3, 6–13.
31. Lee, I.; Keum, J.; Nam, H. DeepConv-DTI: Prediction of Drug-Target Interactions via Deep Learning with Convolution on Protein Sequences. PLOS Comput. Biol. 2019, 15, e1007129.
32. Pu, L.; Govindaraj, R.G.; Lemoine, J.M.; Wu, H.-C.; Brylinski, M. DeepDrug3D: Classification of Ligand-Binding Pockets in Proteins with a Convolutional Neural Network. PLOS Comput. Biol. 2019, 15, e1006718.
33. Savojardo, C.; Martelli, P.L.; Casadio, R. Finding Functional Motifs in Protein Sequences with Deep Learning and Natural Language Models. Curr. Opin. Struct. Biol. 2023, 81, 102641.
34. Teufel, F.; Almagro Armenteros, J.J.; Johansen, A.R.; Gíslason, M.H.; Pihl, S.I.; Tsirigos, K.D.; Winther, O.; Brunak, S.; von Heijne, G.; Nielsen, H. SignalP 6.0 Predicts All Five Types of Signal Peptides Using Protein Language Models. Nat. Biotechnol. 2022, 40, 1023–1025.
35. Armenteros, J.J.A.; Salvatore, M.; Emanuelsson, O.; Winther, O.; von Heijne, G.; Elofsson, A.; Nielsen, H. Detecting Sequence Signals in Targeting Peptides Using Deep Learning. Life Sci. Alliance 2019, 2, e201900429.
36. Kaleel, M.; Zheng, Y.; Chen, J.; Feng, X.; Simpson, J.C.; Pollastri, G.; Mooney, C. SCLpred-EMS: Subcellular Localization Prediction of Endomembrane System and Secretory Pathway Proteins by Deep N-to-1 Convolutional Neural Networks. Bioinformatics 2020, 36, 3343–3349.
37. Thumuluri, V.; Almagro Armenteros, J.J.; Johansen, A.R.; Nielsen, H.; Winther, O. DeepLoc 2.0: Multi-Label Subcellular Localization Prediction Using Protein Language Models. Nucleic Acids Res. 2022, 50, W228–W234.
38. Diaz, D.J.; Kulikova, A.V.; Ellington, A.D.; Wilke, C.O. Using Machine Learning to Predict the Effects and Consequences of Mutations in Proteins. Curr. Opin. Struct. Biol. 2023, 78, 102518.
39. Zhou, Y.; Myung, Y.; Rodrigues, C.H.M.; Ascher, D.B. DDMut-PPI: Predicting Effects of Mutations on Protein-Protein Interactions Using Graph-Based Deep Learning. Nucleic Acids Res. 2024, 52, W207–W214.
40. Cheng, P.; Mao, C.; Tang, J.; Yang, S.; Cheng, Y.; Wang, W.; Gu, Q.; Han, W.; Chen, H.; Li, S.; et al. Zero-Shot Prediction of Mutation Effects with Multimodal Deep Representation Learning Guides Protein Engineering. Cell Res. 2024, 34, 630–647.
41. Brandes, N.; Goldman, G.; Wang, C.H.; Ye, C.J.; Ntranos, V. Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model. Nat. Genet. 2023, 55, 1512–1522.
42. Cagiada, M.; Bottaro, S.; Lindemose, S.; Schenstrøm, S.M.; Stein, A.; Hartmann-Petersen, R.; Lindorff-Larsen, K. Discovering Functionally Important Sites in Proteins. Nat. Commun. 2023, 14, 4175.
43. Li, G.; Zhang, N.; Dai, X.; Fan, L. EnzyACT: A Novel Deep Learning Method to Predict the Impacts of Single and Multiple Mutations on Enzyme Activity. J. Chem. Inf. Model. 2024, 64, 5912–5921.
44. Harrison, R.A.; Lu, J.; Carrasco, M.; Hunter, J.; Manandhar, A.; Gondi, S.; Westover, K.D.; Engen, J.R. Structural Dynamics in Ras and Related Proteins upon Nucleotide Switching. J. Mol. Biol. 2016, 428, 4723–4735.
45. Baldi, P.; Brunak, S. Bioinformatics, Second Edition: The Machine Learning Approach; MIT Press: Cambridge, MA, USA, 2001.
46. Alwosheel, A.; van Cranenburgh, S.; Chorus, C.G. Is Your Dataset Big Enough? Sample Size Requirements When Using Artificial Neural Networks for Discrete Choice Analysis. J. Choice Model. 2018, 28, 167–182.
47. Zhou, N.; Jiang, Y.; Bergquist, T.R.; Lee, A.J.; Kacsoh, B.Z.; Crocker, A.W.; Lewis, K.A.; Georghiou, G.; Nguyen, H.N.; Hamid, M.N.; et al. The CAFA Challenge Reports Improved Protein Function Prediction and New Functional Annotations for Hundreds of Genes through Experimental Screens. Genome Biol. 2019, 20, 244.
48. Jones, D.T. Setting the Standards for Machine Learning in Biology. Nat. Rev. Mol. Cell Biol. 2019, 20, 659–660.
49. Walsh, I.; Fishman, D.; Garcia-Gasulla, D.; Titma, T.; Pollastri, G.; Harrow, J.; Psomopoulos, F.E.; Tosatto, S.C.E. DOME: Recommendations for Supervised Machine Learning Validation in Biology. Nat. Methods 2021, 18, 1122–1127.
50. Khakzad, H.; Igashov, I.; Schneuing, A.; Goverde, C.; Bronstein, M.; Correia, B. A New Age in Protein Design Empowered by Deep Learning. Cell Syst. 2023, 14, 925–939.
51. Wang, J.; Lisanza, S.; Juergens, D.; Tischer, D.; Watson, J.L.; Castro, K.M.; Ragotte, R.; Saragovi, A.; Milles, L.F.; Baek, M.; et al. Scaffolding Protein Functional Sites Using Deep Learning. Science 2022, 377, 387.
Type | Description |
---|---|
Transformer | Architecture designed for processing sequential data, mainly text. It dynamically weighs the significance of different components of the input via a mechanism called “attention”. |
Language model | Deep learning approach that, using transformers or other architectures, is trained to predict the next item in a sequence. |
Protein language model | Language model that handles protein amino-acid sequences. |
Generative model | Approach designed to generate new data “similar” to those it was trained on, for example “feasible” new protein sequences. |
Recurrent neural network | Neural network architecture in which the inputs and outputs are of the same nature and encoded in the same way. Consequently, the output of a given iteration can be used as input for the next. These networks are used, for example, for processing sequential or temporal data. |
Graph neural network | Neural network whose architecture (neurons and connections) is not generic but follows that of a graph representing the phenomenon the network is intended to model. |
Convolutional neural network | Neural network architecture in which matrix (convolution) operations are applied in some of its layers in addition to the typical forward propagation operations. Especially suited for handling spatial data, either 2D (e.g., images) or 3D (3D convolutional neural networks, e.g., three-dimensional structures). |
Geometric deep neural network | Deep neural network architecture especially suited for handling generic geometrical data. |
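
The “attention” mechanism mentioned in the Transformer entry can be illustrated with a minimal sketch. The code below is not taken from any of the tools listed in this review; it is a simplified, assumption-laden example of scaled dot-product self-attention applied to a one-hot-encoded peptide. The function names (`one_hot`, `softmax`, `self_attention`) and the toy sequence are hypothetical, and the learned query, key, and value projections of a real transformer are replaced here by the identity.

```python
import math

# The 20 standard amino acids, used as the alphabet for one-hot encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention.

    Assumption: queries, keys and values are the input vectors themselves
    (identity projections); a trained transformer learns these projections.
    """
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights.append(softmax(scores))
    # Each output is a weighted average of all input vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, vectors)) for j in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Toy peptide: positions 0 and 2 hold the same residue (G).
weights, outputs = self_attention(one_hot("GAGS"))
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy case the first glycine assigns equal weight to itself and to the other glycine, and a lower weight to the alanine and serine, which is the sense in which attention “weighs the significance of different components of the input”. Protein language models use learned projections and many such attention heads rather than raw one-hot vectors.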
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pitarch, B.; Pazos, F. Deep Learning Approaches for the Prediction of Protein Functional Sites. Molecules 2025, 30, 214. https://doi.org/10.3390/molecules30020214