The Difference in Structural States between Canonical Proteins and Their Isoforms Established by Proteome-Wide Bioinformatics Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. Construction of Datasets of Canonical Proteins and Their Isoforms
2.1.1. Main Dataset
2.1.2. Dataset of Proteins from Cancer-Related Genes with Well-Documented Expression Levels
2.1.3. Datasets for Estimation of the Structural Difference in Isoforms by Using AlphaFold Modeling
2.2. Bioinformatics Tools Used to Annotate Structural States of Proteins
2.3. Detection of Structural Changes in and around the Difference Regions
2.4. Analysis of Tandem Repeats in Canonical Proteins and Isoforms
3. Results and Discussion
3.1. Identification, Classification, and Distribution of Difference Regions
3.2. Distribution of Structured and Unstructured Regions
3.3. Changes in Subcellular Localization
3.4. Proportion of Aggregation-Prone Regions
3.5. Canonical Proteins Have More Degradation Motifs Than Their Isoforms
3.6. Occurrence of Tandem Repeats in Canonical Proteins and Isoforms
3.7. Differences within the 3D Structures of Canonical Proteins and Isoforms Predicted by AlphaFold
3.7.1. Exon Deletions with the Preservation of the Overall Structure
- Proteins with tandem repeats
- Globular proteins
3.7.2. Exon Substitutions That Preserve the 3D Structure
3.7.3. Deletion That Is Substituted in the Structure by Another Part of the Molecule
3.7.4. Deletions That Destabilize Structured Domains
3.7.5. Limitations of AlphaFold in the Interpretation of the Conformational Changes
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Osmanli, Z.; Falgarone, T.; Samadova, T.; Aldrian, G.; Leclercq, J.; Shahmuradov, I.; Kajava, A.V. The Difference in Structural States between Canonical Proteins and Their Isoforms Established by Proteome-Wide Bioinformatics Analysis. Biomolecules 2022, 12, 1610. https://doi.org/10.3390/biom12111610