In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae

Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, a correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae, which are regarded as common ancestors of plants, have also been sequenced. However, the relationship between IDRs and protein properties such as structure and PTMs has not been analyzed in algae. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and both regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices, in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST regions preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly higher in species-specific protein clusters than in protein clusters common among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups.


Introduction
The intrinsically disordered regions (IDRs) of proteins have been extensively investigated over the past several years, and their numbers are significantly higher in eukaryotes than in bacteria or archaea [1,2]. More than one-third of eukaryotic proteins contain IDRs of more than 30 residues in length [3]. IDRs are characterized by a low abundance of order-promoting and bulky hydrophobic aromatic amino acids such as Ile, Leu, Val, Cys, Asn, Trp, Tyr, and Phe, a high abundance of disorder-promoting amino acids such as Ala, Arg, Gly, Gln, Ser, Glu, Lys, and Pro, and a high net charge at neutral pH [1,4-9]. Although IDRs form unstructured domains, sequence-based predictions of which regions of a protein are disordered can be used to select target regions for 3D structural analysis. For example, the disordered C-terminal 100 amino acid deletion mutant of the human nei-like disorder and PTMs including Ser/Thr/Tyr-phosphorylation and O/N-linked protein glycosylation [19,21]. However, too few studies of this type have been performed so far, and plant researchers are currently developing better resources for omics analyses of plant species.
To enrich gene annotation in plants and to understand gene and protein functions, we performed a proteome-wide correlation analysis of protein disorder with the major types of eukaryotic PTMs, including Ser/Thr/Tyr-phosphorylation, O/N-linked glycosylation, and ubiquitination, in 20 typical eukaryotic algae. We further analyzed sequences enriched in proline (P), glutamic acid (E), serine (S), and threonine (T) residues (PEST), as well as transmembrane helices. Using proteome sequence sets, we analyzed whether phosphorylation, glycosylation, and ubiquitination preferentially occur in disordered regions. We further investigated whether PEST regions and transmembrane helices preferentially occurred in disordered regions. In the process, it was revealed that protein disorder and multiple PTMs are favored by species-specific proteins among the algae.

Comparison of Data Sets and Global Intrinsic Disorder in Algae Proteomes
We prepared non-redundant sequence sets from the whole-protein sequences of 20 algae using the OrthoMCL tool [30] to remove sequences with >90% identity (for details, see Experimental Section 4.1). The number of sequences in the analyzed algae proteomes varied greatly, from approximately 4800 in Cyanidioschyzon merolae to 20,000 in Fragilariopsis cylindrus (Table S1). Disordered protein content, determined using DISOPRED [31], ranged from approximately 20% in Pyropia yezoensis to 35% in Chlamydomonas reinhardtii (Figure 1A). A similar trend was observed with another disorder prediction tool, RONN [32] (Figure S1). Additionally, we found that there was a correlation between disordered protein content and the total number of amino acids among the algae proteomes (Figure 1B

Correlation between Intrinsic Disorder and Phosphorylation
Reversible protein phosphorylation, principally on serine, threonine, or tyrosine residues, is considered the most important PTM because of its role as a major regulatory mechanism in every living cell. Many studies have reported that protein phosphorylation sites are predominantly observed in the disordered regions of eukaryotic proteins, including those of plants [17,18,21]. Furthermore, the amino acid composition, sequence complexity, hydrophobicity, charge, and other sequence attributes of regions adjacent to phosphorylation sites are very similar to those of IDRs [18].
Recently, the bioinformatics tool Musite was developed as a stand-alone application for large-scale prediction of phosphorylation sites from protein sequences [17]. We therefore used Musite to predict phosphorylation sites in the target algae proteomes. The number of predicted phosphorylation sites was normalized to a uniform length of 400 amino acids, rather than per sequence, to account for differences in average protein length among the datasets. Similarly, we used normalized values in the analyses of the number of glycosylation sites, ubiquitination sites, PEST regions, and transmembrane helices. No trend in the average abundance of phosphorylation sites in algae proteomes was found (Figure 2A). However, a strong positive correlation was observed between protein disorder and phosphorylation. For every algae proteome analyzed, there was a positive correlation between the predicted number (from 0 to >7) of Ser, Thr, or Tyr phosphorylation sites and disordered protein content (Figure 2B-D). The correlations between the number of Ser and Thr phosphorylation sites and disordered protein content were statistically significant as determined by one-tailed probability values and false discovery rates controlled by the Benjamini-Hochberg procedure [33] (Table 1). The positive correlation between disordered protein content and the number of phosphorylation sites was confirmed by using the alternative disorder prediction tool, RONN (Figure S3).
Table 1. Pearson correlation coefficients, their associated p values, and false discovery rates (Benjamini-Hochberg procedure) are presented in the upper, middle, and lower rows, respectively, for all analyzed correlations between IDRs and proteomic parameters. Shaded values indicate p values greater than 0.05 that were not considered statistically significant. The column names stand for S phosphorylation, T phosphorylation, Y phosphorylation, N-linked glycosylation, O-linked glycosylation, region enriched in proline (P), glutamic acid (E), serine (S), and threonine (T) residues (PEST), ubiquitination, and transmembrane helices.
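The statistics named above (Pearson correlation coefficients and Benjamini-Hochberg false discovery rates) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `pearson_r` and `benjamini_hochberg` are hypothetical, and the one-tailed p values are assumed to have been computed separately and passed in as a list.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In this sketch, a correlation would be called significant when both its p value and its adjusted value fall below the chosen threshold (0.05 in Table 1).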

Correlation between Intrinsic Disorder and Glycosylation
Protein glycosylation is one of the most common PTMs, in which the attachment of glycans to specific residues affects protein folding, localization, and stability. Glycosylated proteins play an important role in essential biological functions such as immunogenicity, catalytic activity, viral clearance, and ligand-receptor interactions [34][35][36][37]. Two major types of protein glycosylation exist: O-linked and N-linked. O-glycosylation occurs frequently in eukaryotic cells and is involved in many basic cellular functions. However, significant differences in O-glycosylation patterns between plants and animals have been described, including the sites of glycan addition and glycan composition. O-glycosylation in plants mainly occurs on hydroxyproline (Hyp) residues [38]. The freely available NetOGlyc tool can be used to predict O-glycosylation sites in mammalian proteins [39]. However, there is currently no available prediction tool for O-glycosylation sites in plant proteins. Therefore, in the present study, we designed an original program to predict Hyp-O-linked glycosylation sites utilizing the consensus sequence for Hyp-O-linked glycosylation previously reported for plants [40]. The average abundance of O-glycosylation sites varied in the studied algae proteomes from approximately 0.8 to more than 1.7 sites per protein (Figure 3A). Additionally, we observed that the disordered protein content was positively correlated with the number of O-glycosylation sites (from 0 to >7) in every algae proteome analyzed (Figure 3B). The correlations between IDRs and O-glycosylation had high correlation coefficients and were found to be statistically significant (Table 1). The positive correlation between disordered protein content and the number of O-glycosylation sites was confirmed by using the alternative disorder prediction tool, RONN (Figure S4A).
Another major type of glycosylation, N-linked glycosylation, is arguably the most conserved form of protein glycosylation in eukaryotes [41]. As in other eukaryotic cells, N-linked glycosylation occurs frequently in plant cells, and it has been shown that eukaryotic N-glycoproteins have invariant sequence recognition patterns, structural constraints, and subcellular localization [42]. N-linked glycosylation occurs on many secreted and membrane-bound glycoproteins as a co-translational process through attachment to an asparagine (N) at the consensus motif asparagine-X-serine/threonine (NXS/T), in which X is any amino acid except proline [43]. For N-glycosylation site prediction, we used the prediction tool NetNGlyc 1.0 [44], which is designed to predict N-glycosylation sites in human proteins. However, the N-linked glycosylation pathway in algae shares a high degree of homology with that in other eukaryotic organisms [40,45]. Therefore, we predicted N-linked glycosylation sites in the 20 algae proteomes by combining the results of the NetNGlyc 1.0 algorithm with the presence of signal peptides predicted by SignalP [46] and transmembrane regions predicted by TMHMM [47] (Figure 3C). N-glycosylation correlated negatively with the disordered protein content in most of the studied algae proteomes. However, the results for the diatoms showed positive correlations between disordered protein content and N-glycosylation (Figure 3D, Table 1). These results were confirmed using the alternative disorder prediction tool, RONN (Figure S4B).
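As an illustration of the NXS/T consensus motif described above, the following Python sketch locates candidate sequons in a protein sequence. It is only a motif scan under simple assumptions: NetNGlyc additionally scores each sequon with a trained predictor, and the combination with SignalP and TMHMM results is not reproduced here. The name `find_sequons` is hypothetical.

```python
import re

# N-X-S/T sequon with X != Pro. The lookahead consumes no characters,
# so overlapping sequons (e.g. "NNSS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]
```

For example, in the sequence `MNASNPTQNGT` the Asn at position 5 is followed by proline and is therefore skipped, whereas the Asn residues at positions 2 and 9 are reported.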

Correlation between Intrinsic Disorder and Ubiquitination and PEST (Region Enriched in Proline (P), Glutamic Acid (E), Serine (S), and Threonine (T) Residues)
Ubiquitination sites and PEST regions within a protein sequence are related to protein degradation [48]. We used the freely available UbPred tool for ubiquitination site prediction and the epestfind tool for PEST region prediction [49,50]. The content of ubiquitination sites varied from approximately 0.4 to 1.4 sites per 400 amino acids, and the content of PEST regions varied from approximately 0.2 to 0.5 regions per 400 amino acids in the analyzed algae proteomes (Figure 4A,C). We observed that the predicted numbers of both ubiquitination sites and PEST regions were significantly and positively correlated with disordered protein content in all algae proteomes (Figure 4B,D), with high correlation coefficients (Table 1). The observed correlations of ubiquitination and PEST with protein disorder were confirmed with the alternative disorder prediction tool, RONN (Figure S5).

Correlation of Intrinsic Disorder with Transmembrane Helices
Transmembrane helices in proteins play an important role in the transport of various substances across biological membranes. In the present study, we used the freely available TMHMM tool for transmembrane helix prediction. The content of transmembrane helices varied in the analyzed algae proteomes from approximately 0.6 to 0.8 regions per 400 amino acids (Figure 5A). A negative correlation was observed between the number of predicted transmembrane helices and disordered protein content in all algae proteomes (Figure 5B). The observed correlations of the number of transmembrane helices with disordered protein content were confirmed with the alternative disorder prediction tool, RONN (Figure S6).

Relative Content of PTMs (Post-Translational Modifications) in Ordered and Disordered Regions of Algae Proteins
To further analyze the correlation between PTM sites and disordered regions, we determined the relative abundance of specific PTM sites, including those for phosphorylation, O-glycosylation, and ubiquitination, in ordered or disordered regions of the algae proteomes. In accordance with a previous analysis [21], we used relative abundance values, which are presented as the ratio of normalized PTM content in disordered and ordered segments (Rd/o) of algae proteomes (for details, see Experimental Section 4.3). This approach allowed us to estimate the robustness of the observed correlations between PTM sites and disordered regions. This analysis focused on site-specific PTMs rather than region-specific features such as PEST and transmembrane helix content. Higher values of the Rd/o parameter were obtained for S- and T-phosphorylation as well as ubiquitination in the algae proteomes as a whole (18.8, 10.5, and 12.5, respectively). In contrast, the Rd/o values for Y-phosphorylation and N-glycosylation in total algae proteomes (1.3 and 1.2, respectively) indicated less robust relationships (Table 2). Incidentally, we found that oomycetes had the strongest correlation between phosphorylation and protein disorder compared to the other algae groups, namely green algae, diatoms, yellow algae, and red algae (Table 2).

Relative Disordered Protein Content and the Number of PTMs (Post-Translational Modifications) in Specific and Common Protein Clusters of Algae Proteomes
In a previous study, disordered protein regions were associated with higher amino acid substitution rates [51]. Therefore, disordered proteins may evolve faster and hence be less conserved than structured proteins [52,53]. As such, there should be differences in the disordered protein content and the number of PTMs between species-specific protein clusters and common protein clusters of algae proteomes. For these calculations, we used the OrthoMCL tool to classify common protein clusters involving all 20 algae species used in this study, and species-specific protein clusters involving just one algae species. In total, we analyzed 15,205 common protein clusters and 66,858 species-specific protein clusters. The content of each parameter, such as protein disorder, phosphorylation, glycosylation, and ubiquitination, in species-specific protein clusters and in common protein clusters is shown in Table 3. The content of these parameters was 1.4 to 3.4 times higher in species-specific protein clusters than in common protein clusters, as expected. Notably, in green algae, diatoms, and yellow algae, the occurrence of S/T-phosphorylation in species-specific protein clusters was more than three times higher than in common protein clusters (Table S2).
Table 3. Preference of protein disorder and PTM in species-specific protein clusters and common protein clusters of algae proteomes. Total values of each parameter in species-specific protein clusters and common protein clusters are presented. The one-tailed p values as determined with t-tests between species-specific protein clusters and common protein clusters were less than 0.05. Ratio of s/c = the value of specific/the value of common.

Discussion
It was previously reported that disordered protein content depends on the length of the protein [54], whereas it is independent of proteome size [22]. In the current study, the degree of protein disorder was not associated with algae proteome size (Figure 1B), which was similar to a previous analysis [21].
Sequence redundancy was examined for every algae proteome (Figure S2). It was calculated as the number of sequences removed by filtering (the number of sequences before filtering minus the number after filtering) divided by the number before filtering. The numbers of sequences before and after filtering are given in Table S1. A strong positive correlation was observed between the content of redundant sequences and proteome size, similar to previous results [21]. Additionally, the redundancy of amino acid sequences in the algae proteomes, which was approximately 1% to 27% (Figure S2), was lower as a whole than that in higher plants, which has been reported to be approximately 7% to 37% [21]. This implies that gene duplication occurs more often in higher plants than in algae, and that inaccurate results may be obtained when sequences are not filtered for redundancy.
In the current study, we systematically investigated disordered protein content, sites of multiple PTMs, and regions of PEST and transmembrane helices in algae proteomes. In addition, we investigated the correlation between disordered protein content and the number of PTM sites or regions of PEST and transmembrane helices in algae proteomes. Previous studies have reported on the content of protein disorder in many species. For example, in prokaryotes, varying disordered protein content in different thermal groups has been observed [55]. However, disordered protein content is much higher in eukaryotes than in prokaryotes [2,56,57]. In higher plants, disordered protein content has been reported to be higher in monocot proteomes than in dicot proteomes [21]. However, in the algae proteome analysis herein, no trend in disordered protein content was found among the algae groups, namely green algae, oomycetes, diatoms, yellow algae, and red algae. This may be due to the inclusion of algae from a wide range of groups. The disordered protein content ranged from approximately 20% to 35% (Figure 1A). Thus, a more detailed comparison of integral algae characteristics should be performed to understand the differences in disordered protein content. A previous study has suggested that surface accessibility of enzyme-mediated reversible PTMs is closely related to protein-protein interactions in disordered regions [58]. Accordingly, we found that disordered protein content positively correlated with the presence of predicted PTM sites in the algae proteomes (Table 2). Specifically, there were strong correlations between disordered protein content and Ser- and Thr-phosphorylation, O-glycosylation, ubiquitination, and PEST regions (Table 1). Prior human and plant proteome-wide analyses have revealed that phosphorylation sites are significantly enriched in disordered proteins [16,19,23]. Thus, the results of our proteome-wide analysis of algae are similar to those of other eukaryotic proteome-wide analyses. This suggests that the present analysis can be applied to other eukaryotic data sets to determine common features of protein disorder and PTMs. No trend was observed for the average abundance of phosphorylation sites between algae groups (Figure 2A). However, a positive correlation between the predicted number (from 0 to >7) of Ser, Thr, and Tyr phosphorylation sites and disordered protein content was observed for every algae proteome (Figure 2B-D). These results are in agreement with past analyses of higher plant species [21]. Therefore, we suggest that the abundance of phosphorylation in disordered proteins is common in various algae and land plants.
We found that there was a low correlation between disordered protein content and Y-phosphorylation in algae, especially in yellow algae (Table 1). Similarly, small ratio values for Y-phosphorylation are presented in Table 2. Many PTMs, including phosphorylation, have been shown to occur in disordered regions because IDRs are easily accessible and flexible. However, a recent study demonstrated that Y-phosphorylation, in contrast to S- and T-phosphorylation, does not tend to occur in disordered regions [59]. Similarly, we found that Y-phosphorylation was predicted to occur preferentially in ordered regions.
It has previously been reported that O-linked glycosylation sites are predominantly located in the IDRs of many eukaryotic proteins, such as those in Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, A. thaliana, Oryza sativa, and Schizosaccharomyces pombe [5,16]. In the present study, a strong positive correlation between protein disorder and O-glycosylation was also observed in algae (Figure 3B, Tables 1 and 2), which is similar to a previous report [21]. The positive correlation between protein disorder and O-glycosylation was expected because O-glycosylation in plants occurs on the Pro in the consensus sequence [A/S/T/V]-P(1,4)-X(0,10)-[A/S/T/V]-P(1,4) [40], whose amino acids tend to form disordered regions [1,4-8]. Hence, the positive correlation between protein disorder and O-glycosylation can be considered a universal trend in eukaryotes, including algae. In contrast to O-glycosylation, it is known that N-glycosylation does not strongly correlate with IDR content because N-glycosylation generally occurs co-translationally before a protein is fully folded. As a result, there is a lack of structural preference for N-glycosylation. Accordingly, no clear structural preference has been reported for N-glycosylation in proteins [29,60]. The correlation analysis in this study also failed to reveal a strong association between N-glycosylation and protein disorder in algae (Figure 3D, Tables 1 and 2). However, the results for the diatoms showed positive correlations between disordered protein content and N-glycosylation (Figure 3D, Table 1). N-glycosylation is thought to be related to the folding and stability of proteins, the extracellular matrix, and cell-adhesion molecules [61]. In addition, most diatoms exude polymers from a slit or apical pore field in the siliceous cell wall. The exuded polymers consist of extracellular matrices and are assembled into a variety of structures, such as trails (material left behind during motility), sheaths (organic matrices tightly associated with the cell wall), capsules (organic matrices loosely associated with the cell walls), and stalks (permanent attachment structures) [62]. Therefore, the result of the correlation analysis between N-glycosylation and protein disorder implies that the living environment and lifestyle of diatoms may be related to N-glycosylation and protein disorder.
Ubiquitination is well known to be involved in protein degradation in eukaryotic cells. Similarly, PEST sequences represent a universal target for proteolytic degradation [48,63]. In several studies, IDR-related ubiquitination sites and regions of PEST within IDRs have been identified [4,64,65]. Therefore, we investigated the correlation between IDRs and the abundance of sequence motifs for ubiquitination and PEST regions. A strong positive correlation was observed between protein disorder and the predicted presence of both ubiquitination sites and PEST regions in all algae proteomes (Figure 4B,D and Table 1). These results provide further evidence that ubiquitination sites and PEST motifs preferentially occur in the flexible disordered regions of proteins, including those in algae.
Transmembrane helices in proteins are important in a variety of critical and diverse biological processes. It has been reported that disordered protein content and the abundance of transmembrane helices have an influence on protein expression and solubility [4,64,65]. Therefore, we investigated the relation between the IDR content and the presence of transmembrane helices. A strong negative correlation was observed between the predicted presence of transmembrane helices and protein disorder in all algae proteomes (Figure 5B). This result was expected because transmembrane helices commonly consist of hydrophobic residues, whereas disordered regions primarily contain hydrophilic residues. Thus, the function of transmembrane helices is associated with structural stability rather than flexibility.
To further analyze correlations between PTM sites and IDRs, we determined the relative abundances of site-specific PTMs, including phosphorylation, O-glycosylation, and ubiquitination, in ordered regions and disordered regions of algae proteomes. As a result, we observed that most values of Rd/o were over 1.0, indicating that PTMs primarily occur in the disordered regions (Table 2). However, Y-phosphorylation and N-glycosylation did not have a preference for disordered regions, whereas S- and T-phosphorylation and ubiquitination had a strong preference for disordered regions.
Previous studies have reported that disordered regions are associated with higher amino acid substitution rates [51]. By comparing protein-protein interactions that occur in humans, flies, and yeast, it was found that interactions in disordered regions were significantly less conserved than those in ordered regions [52,53]. Additionally, while evolutionarily conserved sequences within disordered regions are important for function, these conserved sequences account for only 5% of the disordered regions [66]. Therefore, disordered regions are less conserved than structured regions. In the present study, the association between protein disorder and PTM sites was higher in species-specific protein clusters than in common protein clusters in the algae proteomes as a whole (Table 3). Taken together, these results indicate that disordered regions, PTM sites, and species-specific protein sequences are significantly related in algae proteomes.

Prediction of Multiple Properties of Protein Sequences
We used prediction tools for multiple calculations on the proteome sequences of the above 20 algae. To calculate intrinsic disorder in proteins, DISOPRED (version 2.4.2) [31] and RONN (version 3) [32] were employed. The disordered protein content of each protein was obtained by dividing the number of disordered amino acid residues by the protein length. We analyzed the sites of Ser, Thr, and Tyr phosphorylation, N-linked Asn glycosylation, O-linked Pro glycosylation, and ubiquitination. Moreover, transmembrane helix regions and PEST regions were investigated. These PTMs and regions were predicted using the following bioinformatics algorithms and tools. Sites of phosphorylation were predicted with Musite (version 1.0.1) [17], downloaded from http://musite.sourceforge.net, using "Eukaryote-General-Ser-Thr; Eukaryote-General-Tyr" rather than "A. thaliana-General-Ser-Thr" as the prediction model. This was because oomycetes, which are regarded as colorless algae [85], were included in the algae species set in this analysis and the A. thaliana model did not contain Tyr-phosphorylation information. Sites of O-glycosylation were predicted based on the previously reported consensus sequence [A/S/T/V]-P(1,4)-X(0,10)-[A/S/T/V]-P(1,4) [40,45] for plant proteins containing sites of O-linked Pro glycosylation. Because there is currently no prediction tool to detect O-glycosylation sites in plants, we wrote an original Perl script utilizing the aforementioned consensus sequence. Sites of N-glycosylation were predicted by combining the results of the NetNGlyc tool (version 1.0) [44] with the presence of signal peptides predicted by SignalP (version 4.0) [46] and transmembrane regions predicted by TMHMM (version 2.0) [47]. Sites of ubiquitination were predicted with the UbPred tool [86], downloaded from http://ubpred.org. Sites predicted with medium confidence (score range 0.69 ≤ s ≤ 0.84) were considered valid ubiquitination sites from the UbPred prediction results. Transmembrane helix regions were predicted with TMHMM (version 2.0) [47], and PEST regions were predicted with the epestfind tool of EMBOSS [49,50]. We applied the TMHMM and epestfind tools with default runtime parameters. The numbers of predicted PTM sites or regions (transmembrane helix and PEST regions) in proteins were normalized to a uniform length of 400 amino acids, rather than per sequence, to account for differences in average protein length among the datasets.
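The consensus-based O-glycosylation scan, the disordered protein content, and the normalization to 400 amino acids described above can be sketched in Python as follows. The original Perl script is not reproduced in the text, so this is only an approximation under stated assumptions: each non-overlapping match of the consensus is counted once, the X(0,10) gap is matched lazily, and the names `count_motif_matches`, `per_400`, and `disorder_content` are hypothetical.

```python
import re

# [A/S/T/V]-P(1,4)-X(0,10)-[A/S/T/V]-P(1,4), written as a regular expression.
# Assumption: the X(0,10) gap is matched lazily (shortest gap first).
HYP_O_GLYC = re.compile(r"[ASTV]P{1,4}.{0,10}?[ASTV]P{1,4}")

def count_motif_matches(sequence):
    """Count non-overlapping matches of the consensus in one protein."""
    return sum(1 for _ in HYP_O_GLYC.finditer(sequence.upper()))

def per_400(count, protein_length):
    """Normalize a site or region count to a uniform length of 400 residues."""
    return count * 400.0 / protein_length

def disorder_content(disorder_flags):
    """Fraction of residues flagged as disordered (one boolean per residue)."""
    return sum(1 for flag in disorder_flags if flag) / len(disorder_flags)
```

With this normalization, a protein of 800 residues carrying two predicted sites contributes the same value (1.0 per 400 amino acids) as a protein of 400 residues carrying one.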

Relative Content of PTMs in Ordered and Disordered Segments
The tendency of specific PTM sites to occur in the disordered and ordered regions of each algae proteome was calculated following previously described methods [21]. Specifically, the relative abundance of a specific PTM in the ordered and disordered segments of algae proteomes was analyzed using the following ratio: Rd/o = (Nd/Ld)/(No/Lo), where No is the total number of PTM sites in the ordered segment of a proteome, Lo is the length of the ordered proteome segment, Nd is the total number of PTM sites in the disordered segment of a proteome, and Ld is the length of the disordered proteome segment. By this definition, the Rd/o value equals 1 if the relative abundances of a PTM in the ordered and disordered regions are the same. Furthermore, a value of >1 indicates that a PTM preferentially occurs in disordered regions, and a value of <1 indicates that a PTM tends to occur in ordered regions.
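The Rd/o ratio defined above can be computed with a short Python sketch. The input layout (one boolean disorder flag per residue and 1-based PTM site positions per protein) and the function names are assumptions made for illustration, as is the handling of a zero site density in ordered segments.

```python
def rd_o(n_dis, len_dis, n_ord, len_ord):
    """Rd/o = (Nd/Ld) / (No/Lo) for one proteome."""
    if len_dis <= 0 or len_ord <= 0:
        raise ValueError("both segments must have non-zero length")
    density_dis = n_dis / len_dis
    density_ord = n_ord / len_ord
    if density_ord == 0:
        # Assumption: report infinity (or NaN when both densities are zero).
        return float("inf") if density_dis > 0 else float("nan")
    return density_dis / density_ord

def rd_o_from_proteins(proteins):
    """proteins: iterable of (disorder_flags, site_positions) pairs, where
    disorder_flags holds one boolean per residue and site_positions holds
    1-based positions of predicted PTM sites."""
    n_dis = n_ord = len_dis = len_ord = 0
    for disorder_flags, site_positions in proteins:
        len_dis += sum(1 for flag in disorder_flags if flag)
        len_ord += sum(1 for flag in disorder_flags if not flag)
        for pos in site_positions:
            if disorder_flags[pos - 1]:
                n_dis += 1
            else:
                n_ord += 1
    return rd_o(n_dis, len_dis, n_ord, len_ord)
```

For instance, 30 sites in 100 disordered residues and 10 sites in 300 ordered residues give a ratio of about 9, meaning the site density is roughly nine times higher in disordered segments.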

Classification of Species-Specific Proteins and Common Proteins of Algae Proteomes
The protein sequences were analyzed using the OrthoMCL tool [30] to classify species-specific and common proteins among the 20 algae proteomes. Pairwise sequence similarities between all protein sequences were calculated with BLASTP using an e-value cutoff of 1 × 10⁻⁵. Using these results, protein clusters equivalent to orthologous groups were estimated using the Markov clustering algorithm employed in OrthoMCL with the default runtime parameters. In this study, a singlet or a cluster consisting of proteins from only one species was regarded as a species-specific protein cluster. In contrast, a cluster containing proteins from all 20 species was regarded as a common protein cluster.
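The classification rule stated above can be expressed as a small Python sketch operating on already-computed clusters. The input format (a dictionary mapping a cluster identifier to a list of species and protein identifier pairs) is an assumption for illustration and does not reflect the actual OrthoMCL output format; the label "other" for clusters that are neither species-specific nor common is likewise introduced here.

```python
def classify_clusters(clusters, n_species=20):
    """Label each cluster as 'species-specific', 'common', or 'other'.

    clusters: dict mapping cluster_id -> list of (species, protein_id) pairs.
    Singlets are represented as clusters with a single member.
    """
    labels = {}
    for cluster_id, members in clusters.items():
        species = {sp for sp, _ in members}
        if len(species) == 1:
            # Singlets and clusters drawn from a single species.
            labels[cluster_id] = "species-specific"
        elif len(species) == n_species:
            # Clusters that contain every species in the data set.
            labels[cluster_id] = "common"
        else:
            labels[cluster_id] = "other"
    return labels
```

Clusters labeled "other" (present in more than one but not all species) would be excluded from the comparison between species-specific and common clusters.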

Conclusions
This analysis is the first large-scale bioinformatics study of IDRs in the proteomes of algae, which are considered common ancestors of plants. Our findings regarding the correlation between disordered protein content and multiple PTM sites, PEST regions, and transmembrane helices in algae proteomes may be useful for investigations into the unsolved biological roles of these proteins, and for understanding algae evolution. Information resources are currently being developed for various plant species from comprehensive analyses such as those in genetics and metabolomics [87][88][89][90][91][92][93]. As such, we aim to perform an integrative analysis of protein properties combined with the above proteomic data, and expect to reveal relationships between proteomics and other biological features in the future.