The intrinsically disordered regions (IDRs) of proteins have been extensively investigated over the past several years, and their numbers are significantly higher in eukaryotes than in bacteria or archaea [1
]. More than one-third of eukaryotic proteins contain IDRs of more than 30 residues in length [3
]. IDRs are characterized by a low abundance of order-promoting amino acids, including bulky hydrophobic and aromatic residues, such as Ile, Leu, Val, Cys, Asn, Trp, Tyr, and Phe; a high abundance of disorder-promoting amino acids such as Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro; and a high net charge at neutral pH [1
]. Although IDRs lack a fixed structure, sequence-based predictions of which regions of a protein are disordered can be used to select target regions for 3D structural analysis. For example, a deletion mutant of the human nei-like DNA glycosylase (NEIL) protein lacking the disordered C-terminal 100 amino acids was successfully expressed in Escherichia coli
, purified, and crystallized [10
]. More than 50 methods have been developed since the early 2000s for protein disorder predictions [6
]; this work has included the development of structure-related databases and the design of IDR prediction algorithms.
IDRs often contain sites for various post-translational modifications (PTMs) such as phosphorylation, glycosylation, ubiquitination, acylation, and methylation, suggesting that PTMs favor the easily accessible and flexible disordered regions [13
]. PTMs are involved in enzyme activity, protein localization, protein-protein interactions, signaling cascades, DNA repair, and cell division. PTMs commonly cause only small changes on the surface of the disordered region of a protein, whereas PTM-based protein-protein interactions often cause large-scale three-dimensional structural changes such as disorder-to-order transitions of IDRs [15
]. Importantly, the larger structural changes are generally considered to have significant effects on protein functional diversity.
PTM prediction methods are well established. Several studies have reported that protein phosphorylation on Ser, Thr, and Tyr predominantly occurs in the disordered regions of animal proteins [16
]. Proteome-wide analyses have also shown that there are correlations between IDRs and other PTMs such as glycosylation, acetylation, and methylation [20].
Proteins with IDRs have also been examined in plant proteomes [1
]. For example, Arabidopsis thaliana (A. thaliana) proteins were found to be generally less disordered than human proteins. However, certain functional classes of A. thaliana
proteins are significantly more enriched in disordered regions in response to their environment compared to human proteins [23
]. This suggests that plants may use disorder independently as an easy and fast mechanism for introducing versatility into the protein interaction networks that underlie complex biological processes, allowing them to adapt quickly and efficiently to environmental changes from which they cannot escape [23
]. Additionally, the relationships between PTMs and other cellular features are increasingly being explored in plant studies. For instance, phosphoproteomics and genetics analyses of A. thaliana
have provided insights into signaling pathways [24
]. However, proteome information in many plant species is lacking. For example, A. thaliana
is one of the most commonly studied plants, and detailed genomic data are available. Nevertheless, one-third of the proteins in A. thaliana
still lack functional annotations in terms of biological processes in the Gene Ontology database [24
]. Recently, several large-scale experimental and computational approaches have been used to enhance our knowledge of plant species proteomes [27
]. One such higher plant study was a proteome-wide computational analysis of the correlation between protein disorder and PTMs including Ser/Thr/Tyr-phosphorylation and O/N-linked protein glycosylation [19
]. However, too few studies of this type have been performed to date, and plant researchers are currently developing better resources for omics analyses of plant species.
To enrich gene annotation in plants and to better understand gene and protein functions, we performed a proteome-wide correlation analysis of protein disorder with the major types of eukaryotic PTMs, including Ser/Thr/Tyr-phosphorylation, O/N-linked glycosylation, and ubiquitination, in 20 typical eukaryotic algae. We also analyzed transmembrane helices and sequences enriched in proline (P), glutamic acid (E), serine (S), and threonine (T) residues (PEST). Using proteome sequence sets, we tested whether phosphorylation, glycosylation, and ubiquitination sites, as well as PEST regions and transmembrane helices, preferentially occur in disordered regions. The analysis revealed that protein disorder and multiple PTMs are favored by species-specific proteins among the algae.
It was previously reported that disordered protein content was dependent on the length of the protein [54
], whereas it is independent of proteome size [22
]. In the current study, the degree of protein disorder was not associated with algae proteome size (Figure 1
B), which was consistent with a previous analysis [21].
Sequence redundancies before and after filtering were examined for every algae proteome (Figure S2
). Redundancy was calculated as the number of sequences removed by filtering (the number of sequences before filtering minus the number after filtering) divided by the number before filtering. The numbers of sequences before and after filtering are given in Table S1
. A strong positive correlation was observed between the content of redundant sequences and proteome size, similar to previous results [21
]. Additionally, it was found that the redundancy of amino acid sequences in the algae proteomes, which was approximately 1% to 27% (Figure S2
), was lower as a whole than that in higher plants, which has been reported to be approximately 7% to 37% [21
]. This suggests that gene duplication occurs more often in higher plants than in algae, and that results may be inaccurate when sequences are not filtered for redundancy.
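The redundancy measure described above can be written out as a short calculation. The following is a minimal sketch; the function name and the example counts are hypothetical and are not taken from Table S1.

```python
def redundancy_fraction(n_before: int, n_after: int) -> float:
    """Fraction of sequences removed by redundancy filtering.

    n_before: number of sequences in the proteome before filtering
    n_after:  number of sequences remaining after filtering
    """
    if n_before <= 0:
        raise ValueError("n_before must be positive")
    if not 0 <= n_after <= n_before:
        raise ValueError("n_after must lie between 0 and n_before")
    # Removed sequences = before - after; normalize by the unfiltered count.
    return (n_before - n_after) / n_before


# Hypothetical counts for illustration only.
print(f"{redundancy_fraction(10_000, 8_500):.1%}")
```

Expressing redundancy as a fraction of the unfiltered proteome makes values comparable across proteomes of very different sizes.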
In the current study, we systematically investigated disordered protein content, sites of multiple PTMs, and regions of PEST and transmembrane helices in algae proteomes. In addition, we investigated the correlation between disordered protein content and the number of PTM sites or regions of PEST and transmembrane helices in algae proteomes. Previous studies have reported on the content of protein disorder in many species. For example, in prokaryotes, varying disordered protein content in different thermal groups has been observed [55
]. However, disordered protein content is much higher in eukaryotes than in prokaryotes [2
]. In higher plants, disordered protein content has been reported to be higher in monocot proteomes than that in dicot proteomes [21
]. However, in the present analysis of algae proteomes, no trend in disordered protein content was found among the algae groups (green algae, oomycetes, diatoms, yellow algae, and red algae). This may be due to the inclusion of algae from a wide range of groups. The disordered protein content ranged from approximately 20% to 35% (Figure 1
A). Thus, a more detailed comparison of integral algae characteristics should be performed to understand the differences in disordered protein content. A previous study has suggested that surface accessibility of enzyme-mediated reversible PTMs is closely related to protein-protein interactions in disordered regions [58
]. Accordingly, we found that disordered protein content positively correlated with the presence of predicted PTM sites in the algae proteomes (Table 2
). Specifically, there were strong correlations between disordered protein content and Ser- and Thr-phosphorylation, O-glycosylation, ubiquitination, and PEST regions (Table 1
). Prior human and plant proteome-wide analyses have revealed that phosphorylation sites are significantly enriched in disordered proteins [16
]. Thus, the results of our proteome-wide analysis of algae are similar to those of other eukaryotic proteome-wide analyses. This suggests that the present approach can be applied to other eukaryotic data sets to determine common features of protein disorder and PTMs. No trend was observed in the average abundance of phosphorylation sites among algae groups (Figure 2
A). However, a positive correlation between the predicted number (from 0 to ≥7) of Ser, Thr, and Tyr phosphorylation sites and disordered protein content was observed for every algae proteome (Figure 2
B–D). These results are in agreement with past analyses of higher plant species [21
]. Therefore, we suggest that the enrichment of phosphorylation sites in disordered proteins is common to algae and land plants.
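The kind of relationship reported above, between the number of predicted phosphorylation sites and disordered protein content, can be illustrated with a correlation coefficient. The sketch below assumes Pearson's r, since the type of coefficient is not stated in this section; the function name and the binned values are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical bins: predicted site count (0 to >=7, with >=7 coded as 7)
# against the mean disordered content (%) of proteins in each bin.
site_counts = [0, 1, 2, 3, 4, 5, 6, 7]
mean_disorder = [12.0, 15.5, 19.0, 22.5, 27.0, 30.5, 34.0, 41.0]
print(round(pearson_r(site_counts, mean_disorder), 3))
```

A rank-based coefficient such as Spearman's could be substituted if the relationship is monotonic but not linear.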
We found that there was a low correlation between disordered protein content and Y-phosphorylation in algae, especially in yellow algae (Table 1
). Similarly, small ratio values for Y-phosphorylation are presented in Table 2
. Many PTMs including phosphorylation have been shown to occur in disordered regions, because IDRs are easily accessible and flexible. However, a recent study demonstrated that Y-phosphorylation does not tend to occur in disordered regions, which is in contrast to S- and T-phosphorylation [59
]. Similarly, we found that Y-phosphorylation was predicted to occur preferentially in ordered regions.
It has previously been reported that O-linked glycosylation sites are predominantly located in the IDRs of many eukaryotic proteins, such as those in Homo sapiens
, Drosophila melanogaster
, Caenorhabditis elegans
, A. thaliana
, Oryza sativa
, and Schizosaccharomyces pombe
]. In the present study, a strong positive correlation between protein disorder and O-glycosylation was also observed in algae (Figure 3
B, Table 1
and Table 2
), which is similar to a previous report [21
]. The positive correlation between protein disorder and O-glycosylation was expected because O-glycosylation in plants occurs on the Pro in consensus sequences [A/S/T/V]-P(1,4)-X(0,10)-[A/S/T/V]-P(1,4) [40
], whose amino acids tend to form disordered regions [1
]. Hence, the positive correlation between protein disorder and O-glycosylation can be considered as a universal trend in eukaryotes, including algae. In contrast to O-glycosylation, it is known that N-glycosylation does not strongly correlate with IDR content because N-glycosylation generally occurs co-translationally before a protein is fully folded. As a result, there is a lack of structural preference for N-glycosylation. Accordingly, no clear structural preference has been reported for N-glycosylation in proteins [29
]. The correlation analysis in this study also failed to reveal a strong association between N-glycosylation and protein disorder in algae (Figure 3
D, Table 1
and Table 2
). However, the results for the diatoms showed a positive correlation between disordered protein content and N-glycosylation (Figure 3
D, Table 1
). N-glycosylation is thought to be related to folding and stability of proteins, the extracellular matrix and cell-adhesion molecules [61
]. In addition, most diatoms exude polymers from a slit or apical pore field in the siliceous cell wall. The exuded polymers consist of extracellular matrices, and are assembled into a variety of structures, such as trails (material left behind during motility), sheaths (organic matrices tightly associated with the cell wall), capsules (organic matrices loosely associated with the cell walls), and stalks (permanent attachment structures) [62
]. Therefore, the result of correlation analysis between N-glycosylation and protein disorder implies that the living environment and lifestyle of diatoms may be related to N-glycosylation and protein disorder.
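The plant O-glycosylation consensus quoted earlier in this paragraph, [A/S/T/V]-P(1,4)-X(0,10)-[A/S/T/V]-P(1,4), can be expressed as a regular expression. The sketch below assumes a PROSITE-style reading in which P(1,4) denotes one to four consecutive prolines and X(0,10) denotes zero to ten residues of any type; the function name and the test sequence are hypothetical, and this is not the prediction tool used in the study.

```python
import re

# Assumed translation of [A/S/T/V]-P(1,4)-X(0,10)-[A/S/T/V]-P(1,4).
O_GLYC_MOTIF = re.compile(r"[ASTV]P{1,4}.{0,10}[ASTV]P{1,4}")


def find_o_glyc_motifs(sequence: str):
    """Return (start, end, matched_text) for non-overlapping motif matches.

    Positions are 0-based, with `end` exclusive, as in Python slicing.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in O_GLYC_MOTIF.finditer(seq)]


# Hypothetical peptide used only to show the call.
print(find_o_glyc_motifs("MASPPGGTPK"))
```

Because `finditer` reports non-overlapping matches only, a lookahead-based pattern would be needed to enumerate every overlapping occurrence.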
Ubiquitination is well known to be involved in protein degradation in eukaryotic cells. Similarly, PEST sequences represent a universal target for proteolytic degradation [48
]. In several studies, IDR-related ubiquitination sites and regions of PEST within IDRs have been identified [4
]. Therefore, we investigated the correlation between IDRs and the abundance of sequence motifs for ubiquitination and PEST regions. A strong positive correlation was observed between protein disorder and the predicted presence of both ubiquitination sites and PEST regions in all algae proteomes (Figure 4
B,D and Table 1
). These results provide further evidence that ubiquitination sites and PEST motifs preferentially occur in the flexible disordered regions of proteins including those in algae.
Transmembrane helices in proteins are important in a variety of critical and diverse biological processes. It has been reported that disordered protein content and the abundance of transmembrane helices influence protein expression and solubility [4
]. Therefore, we investigated the relation between the IDR content and the presence of transmembrane helices. A strong negative correlation was observed between the predicted presence of transmembrane helices and protein disorder in all algae proteomes (Figure 5
B). This result was expected because transmembrane helices commonly consist of hydrophobic residues, whereas disordered regions primarily contain hydrophilic residues. Thus, the function of transmembrane helices is associated with structural stability rather than flexibility.
To further analyze correlations between PTM sites and IDRs, we determined the relative abundances of PTM sites, including phosphorylation, O-glycosylation, and ubiquitination sites, in the ordered and disordered regions of algae proteomes. Most Rd/o values exceeded 1.0, indicating that PTMs primarily occur in disordered regions (Table 2
). However, Y-phosphorylation and N-glycosylation showed no preference for disordered regions, whereas S- and T-phosphorylation and ubiquitination showed a strong preference for disordered regions.
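A ratio of this kind can be sketched as follows. The exact definition of Rd/o is not given in this section, so the code assumes it is the density of predicted PTM sites in disordered residues divided by the density in ordered residues; the function name, the per-residue mask representation, and the example values are all hypothetical.

```python
def disorder_order_ratio(disorder_mask, ptm_positions):
    """Site density in disordered residues divided by density in ordered ones.

    disorder_mask: one boolean per residue, True if predicted disordered
    ptm_positions: 0-based residue indices of predicted PTM sites
    """
    n_disordered = sum(1 for flag in disorder_mask if flag)
    n_ordered = len(disorder_mask) - n_disordered
    if n_disordered == 0 or n_ordered == 0:
        raise ValueError("both ordered and disordered residues are required")
    sites_disordered = sum(1 for pos in ptm_positions if disorder_mask[pos])
    sites_ordered = len(ptm_positions) - sites_disordered
    if sites_ordered == 0:
        raise ValueError("no sites in ordered regions; the ratio is undefined")
    # Normalizing by region length corrects for unequal region sizes.
    return (sites_disordered / n_disordered) / (sites_ordered / n_ordered)


# Hypothetical 10-residue protein: 4 disordered residues, then 6 ordered.
mask = [True] * 4 + [False] * 6
sites = [0, 2, 7]  # two sites in the disordered part, one in the ordered part
print(disorder_order_ratio(mask, sites))
```

Under this assumed definition, a value above 1.0 means that sites are denser in disordered regions than in ordered regions, which matches the interpretation given for Table 2.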
Previous studies have reported that disordered regions are associated with higher amino acid substitution rates [51
]. By comparing protein-protein interactions that occur in humans, flies, and yeast, it was found that interactions in disordered regions were significantly less conserved than in ordered regions [52
]. Additionally, while evolutionarily conserved sequences within disordered regions are important for function, these conserved sequences account for only 5% of the disordered regions [66
]. Therefore, disordered regions are less conserved than structured regions. In the present study, the association between protein disorder and PTM sites was higher in species-specific protein clusters than in common protein clusters in the algae proteomes as a whole (Table 3
). Taken together, these results indicate that disordered regions, PTM sites, and species-specific protein sequences are significantly related in algae proteomes.