Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions
Abstract
1. Introduction
2. Materials and Methods
2.1. Data
2.2. Categorization of IDRs
2.3. Computational Analysis
3. Results and Discussion
3.1. Compositional Biases from the TOP-IDP Scale and the CAID Data Are Consistent
3.2. Compositional Biases Differ between Different Categories of IDRs
3.3. Compositional Biases for the Putative and Native Disorder Are Highly Correlated and These Correlations Influence Predictive Performance
3.4. Predictive Performance of Disorder Predictors Differs across Different Classes of IDPs
3.5. Matching Disorder Predictors to Specific Classes of IDPs Substantially Improves Predictive Performance
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest



| Protein Set | No. Proteins | No. IDRs | No. Disordered Residues | Median IDR Length | Average IDR Length | 
|---|---|---|---|---|---|
| Complete dataset | 652 | 838 | 54,820 | 34 | 65.5 | 
| Fully disordered proteins | 56 | 57 | 9208 | 132 | 157.6 | 
| Short IDRs | 124 | 148 | 1810 | 12 | 12.2 | 
| Long IDRs | 71 | 77 | 14,935 | 139 | 193.9 | 
| Disordered binding regions | 232 | 256 | 21,389 | 54 | 83.6 | 

| Dataset (AUC per predictor) | AUCpreD | AUCpreD-np | DisoMine | flDPlr | flDPnn | Predisorder | RawMSA | SPOT-Disorder1 | SPOT-Disorder2 | SPOT-Disorder-Single |
|---|---|---|---|---|---|---|---|---|---|---|
| CAID dataset | 0.757 | 0.751 | 0.765 | 0.793 | 0.814 | 0.747 | 0.780 | 0.744 | 0.760 | 0.757 |
| Fully disordered proteins | 0.475 | 0.505 | 0.612 | 0.687 | 0.666 | 0.636 | 0.801 | 0.502 | 0.547 | 0.621 |
| Low disorder content with short IDRs | 0.715 | 0.698 | 0.654 | 0.703 | 0.736 | 0.708 | 0.651 | 0.675 | 0.687 | 0.678 |
| Low disorder content with binding long IDRs | 0.669 | 0.664 | 0.649 | 0.723 | 0.751 | 0.661 | 0.711 | 0.635 | 0.693 | 0.658 |
| Low disorder content with non-binding long IDRs | 0.801 | 0.785 | 0.747 | 0.802 | 0.816 | 0.778 | 0.806 | 0.771 | 0.779 | 0.779 |
| High disorder content with binding IDRs | 0.732 | 0.718 | 0.686 | 0.732 | 0.731 | 0.735 | 0.760 | 0.716 | 0.732 | 0.726 |
| High disorder content with non-binding IDRs | 0.824 | 0.815 | 0.799 | 0.726 | 0.737 | 0.816 | 0.811 | 0.866 | 0.808 | 0.824 |
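
The per-class AUC values in the table above suggest how a class-aware selection of predictors could work. The following is a minimal sketch, not the authors' implementation: it simply picks, for each disorder class, the predictor with the highest AUC in that row. The function name `best_predictor_per_class` and the shortened class labels are hypothetical, introduced only for this illustration. Selecting on the same data used for evaluation would give an optimistic estimate, so a real meta-method would need an independent way to assign proteins to classes and to choose predictors.

```python
# AUC values copied from the table above (the CAID dataset row is omitted).
# Predictor order follows the table's column order.
PREDICTORS = [
    "AUCpreD", "AUCpreD-np", "DisoMine", "flDPlr", "flDPnn",
    "Predisorder", "RawMSA", "SPOT-Disorder1", "SPOT-Disorder2",
    "SPOT-Disorder-Single",
]

# Class labels are shortened, hypothetical keys for the six table rows.
AUC_BY_CLASS = {
    "fully disordered": [0.475, 0.505, 0.612, 0.687, 0.666, 0.636, 0.801, 0.502, 0.547, 0.621],
    "low content, short IDRs": [0.715, 0.698, 0.654, 0.703, 0.736, 0.708, 0.651, 0.675, 0.687, 0.678],
    "low content, binding long IDRs": [0.669, 0.664, 0.649, 0.723, 0.751, 0.661, 0.711, 0.635, 0.693, 0.658],
    "low content, non-binding long IDRs": [0.801, 0.785, 0.747, 0.802, 0.816, 0.778, 0.806, 0.771, 0.779, 0.779],
    "high content, binding IDRs": [0.732, 0.718, 0.686, 0.732, 0.731, 0.735, 0.760, 0.716, 0.732, 0.726],
    "high content, non-binding IDRs": [0.824, 0.815, 0.799, 0.726, 0.737, 0.816, 0.811, 0.866, 0.808, 0.824],
}


def best_predictor_per_class(auc_by_class, predictors):
    """Return {class label: (predictor name, AUC)} for the top AUC per class.

    Ties are resolved in favour of the predictor listed first.
    """
    best = {}
    for label, aucs in auc_by_class.items():
        if len(aucs) != len(predictors):
            raise ValueError(f"row '{label}' does not match the predictor list")
        top = max(range(len(aucs)), key=aucs.__getitem__)
        best[label] = (predictors[top], aucs[top])
    return best


if __name__ == "__main__":
    for label, (name, auc) in best_predictor_per_class(AUC_BY_CLASS, PREDICTORS).items():
        print(f"{label}: {name} (AUC {auc:.3f})")
```

On these values the sketch selects RawMSA for the fully disordered proteins and for the high-content binding class, flDPnn for the three low-content classes, and SPOT-Disorder1 for the high-content non-binding class.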

| Predictors | AUC | AUPR | MCC | F1 |
|---|---|---|---|---|
| Meta-method that selects the best predictor for each disorder class | 0.855 | 0.605 | 0.474 | 0.560 |
| flDPnn | 0.814 * | 0.475 * | 0.358 * | 0.462 * |
| flDPlr | 0.793 * | 0.422 * | 0.323 * | 0.433 * |
| RawMSA | 0.780 * | 0.414 * | 0.288 * | 0.404 * |
| DisoMine | 0.765 * | 0.388 * | 0.244 * | 0.367 * |
| SPOT-Disorder2 | 0.760 * | 0.340 * | 0.200 * | 0.351 * |
| AUCpreD | 0.757 * | 0.479 * | 0.258 * | 0.399 * |
| SPOT-Disorder-Single | 0.757 * | 0.318 * | 0.221 * | 0.348 * |
| AUCpreD-np | 0.751 * | 0.428 * | 0.226 * | 0.349 * |
| Predisorder | 0.747 * | 0.325 * | 0.227 * | 0.359 * |
| SPOT-Disorder1 | 0.744 * | 0.268 * | 0.143 * | 0.284 * |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, B.; Kurgan, L. Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions. Biomolecules 2022, 12, 888. https://doi.org/10.3390/biom12070888