Evolutionary Divergence of Phosphorylation to Regulate Interactive Protein Networks in Lower and Higher Species

The phosphorylation of proteins affects their functions in extensively documented circumstances. However, the role of phosphorylation in many interactive networks of proteins remains very elusive due to the experimental limits of exploring the transient interaction in a large complex of assembled proteins induced by stimulation. Previous studies have suggested that phosphorylation is a recent evolutionary process that differently regulates ortholog proteins in numerous lineages of living organisms to create new functions. Despite the fact that numerous phospho-proteins have been compared between species, little is known about the organization of the full phospho-proteome, the role of phosphorylation to orchestrate large interactive networks of proteins, and the intertwined phospho-landscape in these networks. In this report, we aimed to investigate the acquired role of phosphate addition in the phenomenon of protein networking in different orders of living organisms. Our data highlighted the acquired status of phosphorylation in organizing large, connected assemblages in Homo sapiens. The protein networking guided by phosphorylation turned out to be prominent in humans, chaotic in yeast, and weak in flies. Furthermore, the molecular functions of GO annotation enrichment regulated by phosphorylation were found to be drastically different between flies, yeast, and humans, suggesting an evolutionary drift specific to each species.


Introduction
Kinases elicit the inhibition and activation of a large spectrum of biological events via the addition of phosphate groups to amino acid residues. The end of the process involves the action of specific phosphatases removing the phospho group by cleavage. This appears as an essential mode of intracellular transduction pathways triggered by outside signaling inputs [1][2][3][4]. The phosphorylation of amino acids on eukaryotic proteins targets mostly serine, threonine, and tyrosine [5]. However, more recently, histidine is also a candidate substrate for some kinases, which is likely underestimated because of the phospho-histidine escaped analytical procedures due to its intrinsic chemical instability [6]. Most membrane kinases in the metabolic pathways are tyrosine kinase receptors activated by extracellular peptides or protein association involved in cell/cell interaction. In contrast, cytosolic serine and threonine kinases are activated by the transduction of external activators into cells through the transient elevation of calcium, cAMP, IP3, or any other second messengers [1][2][3][4][5]. The phosphorylation cascade integrates multiple regulatory steps and extends the fine tuning of interconnected metabolic pathways. This cascade is a key phenomenon for propagating the signals that take place within cells that associate multiple partners to inhibit or activate numerous types of enzymatic activities or protein functions. Moreover, the hubs of scaffold proteins that spatially and temporally organize the functional protein assemblages in eukaryotic cells are intrinsically linked to phosphorylation events, thus defining a phospho-dependent interactome that regulates the interaction of multiple proteins and During the last decade, powerful technologies have drastically advanced and allowed the realization of public databases in which the exact and exhaustive phospho sites found by the experimental procedures for multiple species models are quantified. These curated lists, which are perfectly annotated and searchable, represent a powerful tool for the comparison of phospho-proteome between species regarding the homologous conserved proteins through different taxa from lower to higher orders [27][28][29][30][31][32][33][34][35]. The curated data are accessible on multiple websites, such as PhosphoSitePlus [31][32][33], Phospho.ELM [27,28], phosphoPep [19], PhosphoNET [36], phosphoGRID [37], and dbPAF [18] websites. In the past, these database resources have reported phosphorylation sites in proteins from H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. pombe, and S. cerevisiae, and these numbers have gradually increased due to their complementation with more recent experimental datasets. The detailed annotations of proteins are accessible for each entry, and the duplication or the multiple redundancy between these separate databases coming from divergent experimental procedures constitutes a powerful method for cross examination. The CUCKOO workgroup at Huazhong University of Science and Technology has been working for some years on the development of an integrative database of protein phosphorylation in animals, plants, and fungi [18,38]. In 2021, they released a database called EPSD (http://epsd.biocuckoo.cn, accessed on 8 March 2022- [39]) that combines the information stored in dbPPT [38], dbPAF [18], and 13 additional databases, including Phos-phoSitePlus [31][32][33], Phospho.ELM [27,28], UniProt (The UniProt Consortium, 2019) [35], PhosphoPep [19], PhosphoGRID [37], dbPTM [23], FPD [40], HPRD [34], MPPD [41], P3DB [42], PHOSIDA [30], PhosPhAt [43], and SysPTM [44]. Altogether, EPSD contains 1,616,804 experimentally identified phosphorylation sites in 209,326 phosphoproteins from 68 eukaryotic species.
Extremely abundant literature has therefore documented the biology of phosphoproteome. Numerous individual orthologous phospho-proteins have been compared in different contexts and in different species. However, little is known about how phosphorylation operates with large scale genomes and the full spectrum of proteomes. Very rare studies have explored the overall interacting protein landscapes induced by thousands of P-sites together that coincidentally act with each other. In a reverse effort compared to the general trend, which consists of analyzing proteins one by one with their few partners, we investigated the idea that global phosphorylation analysis might unmask unsuspected biological principles. Therefore, we aimed to highlight whether massive and integrated P-sites coincidentally contribute to the creation of matrices of proteins without which their function would not be fulfilled.
In this report, we used bioinformatics tools to investigate, using a wide genome scale, the full phosphoproteome of a few generic biological models with respect to major integrated functions. First, we explored whether the degree of phosphorylation of hortologs is conserved across phylogenetically distant species. Second, we determined the comparative rate of phospho-sites in hortolog proteins for each of these species. Then, we investigated the correlation between the degree of phosphorylation of proteins and the size of interacting networks in which they were a component, which we validated using a statistically significant p-value. The protein networks were described by the number of nodes, edges, and the average number of neighbors, according to the standard tools of bio-informatic analysis. The GO annotation of molecular functions was equally scrutinized based on the degree of phosphorylation of proteins. Finally, the integrated substrate maps specific to six different kinases families were graphically superimposed to highlight the significant overlapping of P-sites within protein networks. In addition, an extended analysis of the time course of phospho-proteome induced by EGF factor was performed as an example to show the time-dependent, specific, and changing networks. The multi components and systemic analysis reported in this work show substantial divergence between higher and lower organisms regarding the role of phosphate addition to orchestrate larger protein complexes and phosphorylation dependent GO annotation specificities. All these elements argue in favor of an evolutionary achievement in higher organisms that has selected phosphorylation for which more phospho-sites in proteins guide larger sizes of interaction networks. These multiple and complementary computational analyses highlight the evidence supporting a meta-organization of proteins in different modalities of matrix, orchestrated by thousands of P-sites coincidentally acting with each other.

Results
Large scale proteome was scrutinized in different model species representing major orders and taxa to determine a survey regarding their respective phosphorylation. The distribution of the number of phosphorylation sites per 1000 amino acids was carried out, and the results were compared between species. We aimed to investigate whether the phosphorylation status was correlated with the size of proteins and whether differences occurred between lower and higher species. Proteome was also divided in two panels: homologous protein found in other species versus non-homologous proteins with any resemblance elsewhere to establish whether phosphorylation targeted mainly one group or the other. Results are summarized in Table 1. Briefly, the main information of the computational searching on the large-scale proteome indicated few unambiguously trends, including the following: the Homo sapiens, Mus musculus proteome are massively phosphorylated (90% and 80%, respectively), Drosophila and C. elegans are weakly phosphorylated (38% and 28%, respectively), A. thaliana was moderately phosphorylated, and Danio rerio presents a uniquely low percentage (9%). Finally, the yeast Saccharomyces cerevisiae was found, surprisingly, at the level of humans and mice (87%). The analysis on the subcategories (homologous versus non-homologous proteins) showed no trends in phosphorylation status in each model species, which suggests that the addition of phosphate is not related to an ancient and primitive mechanism linked to the conservation of proteins through the evolutionary drift of living organisms. Few more elements highlighted by the computational analysis were striking. Homologous proteins found phosphorylated in Homo sapiens and Mus musculus showed a high percentage of the total protein (85% and 69%, respectively), whereas this percentage fell drastically in Danio rerio (6%), and significantly in Drosophila, C. elegans, and Arabidopsis (16%, 14%, and 19%, respectively). This result should be underestimated by the fact that in humans, the homologous protein component is massively represented (the non-homologous component represents less than 10% of the total). Moreover, the phosphorylated proteins had a higher molecular weight than the non-phosphorylated ones, which was constantly observed in all the species. The average length for the non-phosphorylated versus phosphorylated was 189/590 for Homo sapiens, 286/612 for Mus musculus, 535/847 for Danio rerio, 408/582 for Drosophila, 349/542 for Arabidopsis, and finally, 275/556 for Saccharomyces cerevisiae. This strongly suggests that phosphate addition occurs in larger sizes of proteins likely orchestrating the functionality of subdomains. The distribution of phosphate groups added to the individual protein in the seven species in proposed in Figure 1.
For this purpose, two parameters were graphically plotted: the number of phosphorylated sites per 1000 amino acids protein units and the frequency of such events in the whole proteome. Figure 1 shows the log-log plot of the relationship between these two entities. Interestingly, the observed parallelism of the distribution of phosphorylation sites did not correlate with the distance between species in a phylogenetic tree. The three most phosphorylated species already found in Table 1, namely H. sapiens, M. musculus and S. cerevisiae, showed a very similar slope. The two least phosphorylated species, D. rerio and C. elegans, presented an equally similar slope. Finally, A. thaliana and D. melanogaster, the two medium phosphorylated species, showed the same profile. What is remarkable is that the similarity of the slopes on the log-log plot between the seven analyzed species suggests that the different orders are highly uniform and homogenous. For this purpose, two parameters were graphically plotted: the number phosphorylated sites per 1000 amino acids protein units and the frequency of such ev in the whole proteome. Figure 1 shows the log-log plot of the relationship between th two entities. Interestingly, the observed parallelism of the distribution of phosphoryla sites did not correlate with the distance between species in a phylogenetic tree. The th most phosphorylated species already found in Table 1, namely H. sapiens, M. musc and S. cerevisiae, showed a very similar slope. The two least phosphorylated species rerio and C. elegans, presented an equally similar slope. Finally, A. thaliana and melanogaster, the two medium phosphorylated species, showed the same profile. Wh remarkable is that the similarity of the slopes on the log-log plot between the se analyzed species suggests that the different orders are highly uniform and homogeno   The table shows the average length and weight of proteins distributed according to different criteria (including the presence of a phosphorylation site and the existence of homologous proteins in another species). For phosphorylated proteins, the average number of phosphorylation sites and the average number of phosphorylation sites per 1000 amino acids are also specified.

Conservation of Phosphorylation in Homolog Proteins of Phylogenetically Distant Species
The degree of conservation of the phospho-sites on the heterologous proteins was investigated between the seven species representing different orders and taxa by computational search and quantification based on a full scale proteome. For each pair of analyzed species, we calculated a Spearman correlation between the phosphorylation rate of homologous proteins (see Materials and Methods). Figure 2 summarizes the results presented in the form of a heatmap where correlations were considered strong for values > 0.6, moderate for values between 0.6 and 0.4 and weak for values between 0.4 and 0.2. All correlations were positive. At the top of the heatmap, the phylogenetic tree of the seven species is drawn. The importance of the correlation was roughly linked to the distance of the species on the phylogenetic tree. The only exception was Danio rerio which was, surprisingly, the species whose distribution of phosphorylation sites was the furthest from humans. We expected to observe a correlation between the number of phosphorylation sites per protein and their lengths (or masses). We investigated whether this correlation between masses and the number of phosphorylation sites for each phosphorylated and orthologous protein was divergent within each species. Figure S1 plots the regression line for each species as an index of phosphorylation density. The slopes of the regression lines were found in accordance with the distances in the phylogenetic tree, except for Danio rerio. To reduce the influence of outliers on the determination of the regression line, we We expected to observe a correlation between the number of phosphorylation sites per protein and their lengths (or masses). We investigated whether this correlation between masses and the number of phosphorylation sites for each phosphorylated and orthologous protein was divergent within each species. Figure S1 plots the regression line for each species as an index of phosphorylation density. The slopes of the regression lines were found in accordance with the distances in the phylogenetic tree, except for Danio rerio. To reduce the influence of outliers on the determination of the regression line, we zoomed the figure to select only proteins with a mass of less than 500,000 daltons and with less than 500 phosphorylation sites. Results, presented in Figure 3A, confirmed the marked differences between species.

Comparative Rate of Phospho-Sites in Homolog Versus Unique Proteins for Each of These Species
The computational analysis of full scale proteome was pursued to investigate the comparative analysis of phosphorylation in proteins that have a homolog in another species versus those without a homolog. Figures 3B and S2 show the data for Homo sapiens.

Comparative Rate of Phospho-Sites in Homolog Versus Unique Proteins for Each of These Species
The computational analysis of full scale proteome was pursued to investigate the comparative analysis of phosphorylation in proteins that have a homolog in another species versus those without a homolog. Figures 3B and S2 show the data for Homo sapiens. This analysis confirms that proteins with homologs were much longer than those without. In terms of the distribution of phosphorylation sites, we noticed that proteins with homologs were slightly more phosphorylated than those without. The same observation applied to Mus musculus (see Figures 4A and S3). The proportion of non-homologous protein is very low in these two species (humans and mice); therefore, a bias for rigorous interpretation is possible. For Danio rerio, the opposite was observed as proteins without homologs were longer than those with homologs, with similar distribution of phosphorylation sites (see Figures 4B and S4). The same observation related to Danio rerio applied to Drosophila melanogaster ( Figures 4C and S5). For Caenorhabditis elegans, the proteins with homologs were shorter and also had fewer phosphorylation sites (see Figures 4D and S6). For Saccharomyces cerevisiae, proteins without homologs were shorter than those with homologs and had significantly more phosphorylation sites (see Figures 4E and S7). For Arabidopsis thaliana, a pattern where the proteins without homologs were longer than the others with the regression lines almost in superposition was observed (see Figures 4F and S8). Altogether, these data argue in favor of an evolutionary drift inside each species that exhibits, to some extent, little coherence and a chaotic history.

Correlation between the Degree of Phosphorylation of Proteins and the Size of Interacting Networks
Comparative evolutionary analyses based on computational tools were performed to extract phosphorylation information to uncover the role of phosphate addition in mastering a larger network of interacting proteins. For Homo sapiens, we divided the 19,645 proteins in our dataset into deciles according to the number of phosphorylation sites. Thus, there were 2368 proteins with zero, one, or two phosphorylation sites in the first decile, 2130 proteins between three and five phosphorylation sites in the second decile, etc. (see Table 2). Human proteins are divided in deciles depending on the number of their phosphorylation site. Interaction data were collected using the String database with a confidence score > 0.7. For each decile, the number of proteins (nb nodes), the number of interactions between the proteins (nb edges), the average number of neighbors of each protein with interactions (avg. neighbors), the number of connected components (nb connected), and the percentage of proteins with no interactions (% singletons) are given.
The interactions between proteins in the same decile were calculated using the String database and a confidence score greater than 0.7. This determined the number of interactions between proteins (edges), the average number of interactions of each protein (avg. Neighbors), the number of connected components (nb. Connected), and the percentage of proteins with no interactions (% non-connected nodes). The graphical representation of the average number of interactions for each decile is given in Figure 5A.
proteins with homologs were shorter and also had fewer phosphorylation sites (see Figures 4D and S6). For Saccharomyces cerevisiae, proteins without homologs were shorter than those with homologs and had significantly more phosphorylation sites (see Figures  4E and S7). For Arabidopsis thaliana, a pattern where the proteins without homologs were longer than the others with the regression lines almost in superposition was observed (see Figures 4F and S8). Altogether, these data argue in favor of an evolutionary drift inside each species that exhibits, to some extent, little coherence and a chaotic history.   melanogaster, the non-phosphorylated proteins constitute the first bin, and the remaining proteins are divided into quartiles. The height of the first bar, which represents the category of nonphosphorylated proteins, cannot be compared to the others because it contains seven times more proteins and the probability of obtaining interactions in this network is much higher. (C) For S. cerevisiae, each bar represents a quintile. For this species, the non-phosphorylated proteins constitute the first bin, and the remaining proteins are divided into quartiles. Interaction data were collected using the String database with a confidence score > 0.7. For each bin, the number of proteins (nb nodes), the number of interactions between the proteins (nb edges), the average number of neighbors of each protein with interactions (avg. neighbors), the number of connected components (nb connected), and the percentage of proteins with no interactions (% singletons) are given. We noticed that the average number of interactions was correlated with the number of phosphorylation sites (or the more phosphorylation sites there were on a protein, the more interactors it had). In humans, the massive networks measured by the number of edges, nodes, and averaged neighbors were found to be clearly dependent on and correlated with the increased number of phospho-sites per protein.
We chose to do the same type of analysis for two other species: D. melanogaster and S. cerevisiae. D. melanogaster contains a large proportion of proteins without a phosphorylation site (more than 60% of the total), which makes it impossible to divide into deciles. We chose to consider the non-phosphorylated proteins as a category and to split the rest into quartiles. The data for D. melanogaster are specified in Table 3. We noticed a relative correlation between the average number of interactions and the number of phosphorylation sites, but this trend does not seem highly significant compared to Homo sapiens. The graphical representation of the average number of interactions for each category of phospho site density is given in Figure 5B (the high number of interactions obtained for the non-phosphorylated proteins might be biased because there are seven times more proteins in this category). For S. cerevisiae, the phospho-proteins was divided into quintiles (see Table 4). In this organism, a relative correlation between phosphorylation sites and interactions was observed, although the quintile of the most phosphorylated proteins showed a relative decrease. As for humans, the number of edges and neighbors increased with phosphorylation and were inversely correlated with the number of connected components for these two species (see Table 4 and Figure 5C). For this species, the non-phosphorylated proteins constitute the first bin, and the remaining proteins are divided into quartiles. Interaction data were collected using the String database with a confidence score > 0.7. For each bin, the number of proteins (nb nodes), the number of interactions between the proteins (nb edges), the average number of neighbors of each protein with interactions (avg. neighbors), the number of connected components (nb connected), and the percentage of proteins with no interactions (% singletons) are given. For this species, proteins are divided in quintiles depending on the number of their phosphorylation site. Interaction data were collected using the String database with a confidence score > 0.7. For each quintile, the number of proteins (nb nodes), the number of interactions between the proteins (nb edges), the average number of neighbors of each protein with interactions (avg. neighbors), the number of connected components (nb connected), and the percentage of proteins with no interactions (% singletons) are given.

GO Annotations Analysis Based on the Degree of Phosphorylation of Proteins
The same categories defined by the degree of phosphorylation (including deciles for humans and quintiles for flies and yeast) were statistically assigned to a GO enrichment under the terms of a biological process. We found that in humans, non-phosphorylated proteins (those in the first two deciles) have a role in the immune response and perception. The most phosphorylated proteins (belonging to the last four deciles) have multiple regulatory roles (transcription, cell cycle, cellular response, splicing, and chromatin remodeling) and functions related to development and morphogenesis. All the significant annotations (which are associated with an adjusted p-value < 0.05) are listed in Table S1. Overall, we saw that significant annotations were much more numerous for the deciles corresponding to the most phosphorylated proteins, including: 780 for decile 10, 369 for decile 9, 21 for decile 8, 15 for decile 7, and two for decile 6 were found. No significant enrichment was found for deciles 4 and 5, and there were three significant annotations for decile 3 and 2, and 15 for decile 1. Table S2 contains a list of selected annotations for humans. A visualization of the p-values associated with the selected annotations for each of the deciles is proposed as a heatmap in Figure 6.
Overall, we saw that significant annotations were much more numerous for the deciles corresponding to the most phosphorylated proteins, including: 780 for decile 10, 369 for decile 9, 21 for decile 8, 15 for decile 7, and two for decile 6 were found. No significant enrichment was found for deciles 4 and 5, and there were three significant annotations for decile 3 and 2, and 15 for decile 1. Table S2 contains a list of selected annotations for humans. A visualization of the p-values associated with the selected annotations for each of the deciles is proposed as a heatmap in Figure 6.  In flies and yeast, paradoxically, there were more annotations on the list of no or low phosphorylated proteins than on the most phosphorylated ones. For yeast, only the first two quintiles (the least phosphorylated proteins) were significantly annotated with GO terms. Significant annotations for humans did not emerge for flies and yeast. However, there were shared annotations between these two latter organisms like the ones corresponding to mitochondrial translation. Remarkably, in flies and yeast, we found GO annotations mostly in the low or very low-level quantiles of phosphorylation, which appeared to be the opposite in humans. All the significant annotations for D. melanogaster and S. cerevisiae are listed in Tables S3 and S4 respectively. The corresponding heatmaps are in Figures 7 and 8. two quintiles (the least phosphorylated proteins) were significantly annotated with GO terms. Significant annotations for humans did not emerge for flies and yeast. However, there were shared annotations between these two latter organisms like the ones corresponding to mitochondrial translation. Remarkably, in flies and yeast, we found GO annotations mostly in the low or very low-level quantiles of phosphorylation, which appeared to be the opposite in humans. All the significant annotations for D. melanogaster and S. cerevisiae are listed in Tables S3 and S4     In the yeast, the less phosphorylated proteins (those in the first two quintiles) have a role in protein formation and transport. Oligosaccharide biosynthetic processes are associated with proteins in the second quintile, and no significant annotation are found for the most phosphorylated proteins.

Integrated Substrate Map for Six Different Kinases Families Highlights a Significant Overlapping
The site http://kinase.com/wiki/index.php/Kinase_classification was used to perform a survey based on the full spectrum of proteomes to determine the extent to which the kinases had overlapping targets. Among these 14 groups of kinases, eight had identified targets that were listed in publicly available databases for different species. The number Figure 8. A heatmap of selected GO enrichments identified for S. cerevisiae proteins grouped by quintile based on their number of phosphorylation sites. The figure lists the significant GO enrichments by biological process terms for each quintile of S. cerevisiae proteins defined according to their degree of phosphorylation. Each table row represents a different GO term, and each column represents a different quintile. The numbers in the cells are the log10 p-values with FDR correction of the enrichment of a bin with a specific GO term. Empty cells are associated with a log10 p-value of 0, which corresponds to a p-value of 1. In the yeast, the less phosphorylated proteins (those in the first two quintiles) have a role in protein formation and transport. Oligosaccharide biosynthetic processes are associated with proteins in the second quintile, and no significant annotation are found for the most phosphorylated proteins.

Integrated Substrate Map for Six Different Kinases Families Highlights a Significant Overlapping
The site http://kinase.com/wiki/index.php/Kinase_classification (accessed on 18 March 2022) was used to perform a survey based on the full spectrum of proteomes to determine the extent to which the kinases had overlapping targets. Among these 14 groups of kinases, eight had identified targets that were listed in publicly available databases for different species. The number of phosphorylated proteins by this later group is listed in Table S5. The whole dataset represents a total of 7073 proteins. From the list in Table S5, we distinguished that six kinases had more than 3000 targets (AGC, CAMK, STE, CMGC, TK, and CK1) and three kinases had less than 1500 ones (TKL, Other, and Atypical). Plotting of the phosphorylated proteins by the six major kinase groups in a Euler diagram representation shows an intertwined complex network which mixes strict specificity and dense overlapped targeting. The representation is shown in Figure 9. . An euler diagram of the proteins phosphorylated by six major kinase groups. Euler diagram representation of the overlapping sets of proteins phosphorylated by the groups of kinases AGC, CAMK, CK1, CMGC, STE, and TK. In the figure, each kinase group is identified by a number between brackets (from 1 for AGC to 6 for TK). The number of proteins corresponding to each region of the diagram is specified, followed by the identifiers of the kinase groups that phosphorylate them. In dense areas, the content of some regions is taken out of the body of the diagram and linked to the corresponding region by arrows. The area of each region in the diagram is proportional to the number of elements included.
A large number of proteins were phosphorylated by all kinase groups, but the TK (Tyrosine Kinase) group differed from the others due to the fact that it phosphorylated restrictively and specifically most of its proteins targets that were insensitive to the action of the other kinase groups. In order to simplify this very dense graphical representation, we kept only the most prominent kinases, i.e., those that target more than 2% of the total proteins. By doing this, we obtained a drastic simplification. We noticed that the STE, CK1, and CAMK groups phosphorylated exactly the same group of proteins. The same is true for a group composed of Other, TKL, and Atypical. By regrouping these clusters, their overlapping, and their intersection, we propose a simplified graph that highlights the intertwined landscape of interconnected phosphorylation processes (see Figure 10). . An euler diagram of the proteins phosphorylated by six major kinase groups. Euler diagram representation of the overlapping sets of proteins phosphorylated by the groups of kinases AGC, CAMK, CK1, CMGC, STE, and TK. In the figure, each kinase group is identified by a number between brackets (from 1 for AGC to 6 for TK). The number of proteins corresponding to each region of the diagram is specified, followed by the identifiers of the kinase groups that phosphorylate them. In dense areas, the content of some regions is taken out of the body of the diagram and linked to the corresponding region by arrows. The area of each region in the diagram is proportional to the number of elements included.
A large number of proteins were phosphorylated by all kinase groups, but the TK (Tyrosine Kinase) group differed from the others due to the fact that it phosphorylated restrictively and specifically most of its proteins targets that were insensitive to the action of the other kinase groups. In order to simplify this very dense graphical representation, we kept only the most prominent kinases, i.e., those that target more than 2% of the total proteins. By doing this, we obtained a drastic simplification. We noticed that the STE, CK1, and CAMK groups phosphorylated exactly the same group of proteins. The same is true for a group composed of Other, TKL, and Atypical. By regrouping these clusters, their overlapping, and their intersection, we propose a simplified graph that highlights the intertwined landscape of interconnected phosphorylation processes (see Figure 10). Figure 10 shows that some proteins were only phosphorylated by kinases belonging to the TK, CMGC, and AGC categories. Two hundred and twenty-four identified proteins (in the center of the diagram) were phosphorylated by all kinase groups. Interestingly, the targets of Other, TKL, and Atypical kinase groups were also targets for all the other kinases with any restrictive specificity. The two remaining groups of proteins were one group phosphorylated by all kinase groups except Other, TKL, and Atypical and the other group phosphorylated by STE, CK1, CAMK, CMGC, and AGC.  Figure 10. A filtered Euler diagram of the significant groups of proteins phosphorylated by five major kinase groups. The Euler diagram represents the overlapping sets of proteins phosphorylated by the most prominent kinase groups, i.e., those that target more than 2% of the total proteins. The STE, CK1, and CAMK groups, which phosphorylate exactly the same proteins, are grouped in a same category. The same goes for the groups of kinases Other, TKL, and Atypical. In the figure, each kinase group is identified by a number between brackets (from 1 for TK to 5 for AGC). The number of proteins corresponding to each region of the diagram is specified, followed by the identifiers of the kinase groups that phosphorylate them. The area of each region in the diagram is proportional to the number of elements include.
To further investigate the clustering of the Figure 10, a GO annotation for enrichment analysis was performed with the corresponding lists (see Table S6). The functions of AGCphosphorylated proteins are more diverse and concern biological regulation (FDR = 2.6 × 10 −4 ), signaling (FDR = 1.

Time Course of Phospho-Proteome and Time Dependent Specific Changes in Networks Induced by EGF Factor
Finally, we compared changing and non-changing subsets of the phospho-proteome Figure 10. A filtered Euler diagram of the significant groups of proteins phosphorylated by five major kinase groups. The Euler diagram represents the overlapping sets of proteins phosphorylated by the most prominent kinase groups, i.e., those that target more than 2% of the total proteins. The STE, CK1, and CAMK groups, which phosphorylate exactly the same proteins, are grouped in a same category. The same goes for the groups of kinases Other, TKL, and Atypical. In the figure, each kinase group is identified by a number between brackets (from 1 for TK to 5 for AGC). The number of proteins corresponding to each region of the diagram is specified, followed by the identifiers of the kinase groups that phosphorylate them. The area of each region in the diagram is proportional to the number of elements include.

Time Course of Phospho-Proteome and Time Dependent Specific Changes in Networks Induced by EGF Factor
Finally, we compared changing and non-changing subsets of the phospho-proteome using as example human Phospho-sites that were unaffected or affected by the stimulation of EGF in a culture cell system in a series of time course-controlled experiments. The aim of this study was to visualize the changes in the networks of correlated and interactive proteins upon a stimulus action in a time dependent manner. Data relative to the variation of phosphorylation following EGF exposure on Hela cells were measured using SILAC (stable isotope labelling by amino acids in cell culture) at six time-points: after 1 min, 5 min, 10 min, 15 min, 20 min, and 30 min, were retrieved from the qPhos database. These data were then processed to associate each protein with a count that corresponds to the variation in phospho-sites at each time point (see Materials and Methods). The summary of the number of phosphorylations per protein for each time point is specified in Table S7. As a function of time after EGF exposure, the change in the number of phosphorylations per protein ranged from zero (no change) to 20 (the maximum measured for the Desmoyokin protein-AHNAK) after 10 min. Using the data in Table S7, it is easy to list, for each time point, the proteins that were the most differentially phosphorylated. However, these lists highlight potentially unrelated proteins from which it is difficult to infer a meaning. In transcriptomics, following Rapaport et al.'s 2007 statement [45] that "a small but coherent difference in the expression of all the genes in a pathway should be more significant than a larger difference occurring in unrelated genes", the focus has shifted to analyzing the activity of genes in the light of our knowledge of their molecular interactions, i.e., researchers seek to identify dense parts of an interaction network that contain genes that are significantly dysregulated. Those parts of an interaction network that are both dense and active are called "modules" or "active modules". In the present study, we took a similar approach by identifying the most differentially phosphorylated protein modules on a protein-protein interaction network. These modules of interacting proteins differentially phosphorylated at each time point were identified using AMINE (Active Module Identification through Network Embedding- [46]) and listed in Tables S8-S13. Depending on the time elapsed since the exposure to EGF, the number of modules varied from three at 1 min to a maximum of 18 at 15 min and decrease to two at 30 min. The structure of the modules was similar to a large one composed of several proteins ranging from 17 to 118 and other smaller ones. In this study, we focused on the first module. The enrichments of this module with annotations from Gene Ontology (Biological Process), KEGG, Reactome, and WikiPathway for the different time points are shown in Table S14. Module 1 contains, for all the time points, the three proteins EGFR, SHC1, and MAPK1. Overall, there was a large overlap of proteins belonging to module 1 at all the time points. The evolution of the phosphorylation of the proteins included in the prominent module 1 representing protein interaction for each time point is shown in Figure 11. The proteins in red and those in brown were differentially phosphorylated compared to the control. The proteins in red represent those that were differentially phosphorylated at this time point compared to the previous time point (or newly phosphorylated proteins). The proteins in brown are those that were differentially phosphorylated but were already differentially phosphorylated at the previous time point. Finally, the proteins with a white background are those that were no longer differentially phosphorylated at this time point but were differentially phosphorylated at the previous time point (or back to normal). This analysis highlights the dynamic protein networking upon external stimulation. A dynamic and variable landscape of networks linked to stimulus-specific phospho-proteome can be considered as the rewiring of a signaling network. ol. Sci. 2022, 23, 14429 18 of 25 Figure 11. An interaction graph between the proteins included in the most striking module identified by AMINE for each of the time points after EGF exposure. The proteins in red and those in brown are those that were differentially phosphorylated compared to the control. The proteins in red represent those that were differentially phosphorylated at this time point compared to the previous time point (newly phosphorylated proteins). The proteins in brown are those that were differentially phosphorylated but were already differentially phosphorylated at the previous time point. Finally, the proteins with a white background are those that were no longer differentially phosphorylated at this time point but were at the previous time point (or back to normal).

Discussion
Phosphorylation constitutes a major post-translational modification ubiquitous in prokaryotic and eukaryotic cells, inducing transformation in protein steric configuration and activation and inhibition of various enzymatic activity and promoting and impeding protein-protein interactions. Consequently, phosphorylation orchestrates many biological processes, acting as a signal transduction triggered by extracellular factors. As expected, aberrant phosphorylation has been described in numerous reports as the cause of a large number of diseases, such as cancer and dysfunctional developmental programs across species [47]. Alzheimer disease in humans has been particularly scrutinized [48][49][50], along with other neuronal afflictions [51]. The immense effort to build a Figure 11. An interaction graph between the proteins included in the most striking module identified by AMINE for each of the time points after EGF exposure. The proteins in red and those in brown are those that were differentially phosphorylated compared to the control. The proteins in red represent those that were differentially phosphorylated at this time point compared to the previous time point (newly phosphorylated proteins). The proteins in brown are those that were differentially phosphorylated but were already differentially phosphorylated at the previous time point. Finally, the proteins with a white background are those that were no longer differentially phosphorylated at this time point but were at the previous time point (or back to normal).

Discussion
Phosphorylation constitutes a major post-translational modification ubiquitous in prokaryotic and eukaryotic cells, inducing transformation in protein steric configuration and activation and inhibition of various enzymatic activity and promoting and impeding protein-protein interactions. Consequently, phosphorylation orchestrates many biological processes, acting as a signal transduction triggered by extracellular factors. As expected, aberrant phosphorylation has been described in numerous reports as the cause of a large number of diseases, such as cancer and dysfunctional developmental programs across species [47]. Alzheimer disease in humans has been particularly scrutinized [48][49][50], along with other neuronal afflictions [51]. The immense effort to build a comprehensive inventory of the phospho-proteome across species in order to document the phosphorylation across all the spectrum of eukaryotic and prokaryotic living organisms has been undertaken in the past two decades [52][53][54][55][56]. The comparative catalog of this inventory amenable to the full genome scale of protein byproducts in key species could deepen our comprehension of this important biological mechanism that has been selected in evolution for five hundred millions years. Published advances have facilitated our evolutionary analysis, including searching highly conserved mechanisms that might have spread over living organisms and the singular derived events within each species [57].
Regarding the prokaryotes, or the most primitive organisms, only a few hundred proteins per organism have been reported to be phosphorylated, whereas for the eukaryotic species, the detection of thousands of phosphorylated proteins have been comparatively found. Bacteria have been intensively studied due to the simplicity of cells amenable to experimental procedures [58][59][60]. In this report, an evolutionary analysis was conducted to investigate whether a core of phosphoproteins is preferentially conserved in all species from lower to higher evolution hierarchy. We aimed to document whether protein orthologs across living organisms might share a hidden prominent mark like a signature that would be conserved in phosphorylated sites. However, when the phosphoproteome of yeast and Caenorhabditis elegans are compared, the results suggest that phosphorylation evolved separately after the divergence of higher eukaryotes from yeast, which is corroborated by the apparition of a large number of species-specific kinases after the split [61]. Interestingly, mitochondria have a bacterial origin in accordance with the symbiotic paradigm and are found in pro and eukaryotic cells. Their phosphorylation status shows unambiguous, limited conservation, which seem to indicate that phosphorylation is a more recent process evolving separately inside each taxa and orders. Moreover, the quantification of phosphosites in mitochondria in diverse species shows a drastic elevation in mammals compared to that of yeast [61].
An analysis of phosphoproteome between non-heterolog and heterolog proteins in humans was informative-no clear-cut status can dissociate the two groups, although the non-ortholog proteins constitute a very small component versus the total. Within the eukaryotic reign, identified phosphoproteins are no more orthologs than non-orthologs depending on the species.
Interestingly, in flies and worms, which have roughly the same phylogenetic distance to mammals, the same low level of phosphorylation associated with overlapping GO enrichment was observed. This pattern of GO annotation obtained in a comparative analysis between humans on one side and worms and flies on the other side is drastically different, suggestinglittle conservation of the phosphorylation and its biological implication across evolution.
Together, comparative large scale phosphorylation datasets in yeast, flies, mice, and humans suggests striking differences in structural features. The intensity of phosphorylation is correlated with increases in the size of interacting networks in humans, whereas this trend is unclear or perhaps non-existent in lower species like flies. This discrepancy between the conservation of alignments in protein sequences and the evolutionary divergence in the phosphorylation process indicates that each species has manipulated the regulatory processes of phosphate addition in a very specific manner. In accordance with this finding, the size of the worm kinome is two times larger than of that of the fly kinome, with a high level of divergence between them as half of worm kinone are specific to this species [61][62][63]. The fact that 90% of human proteins are phosphorylated highlights the evolutionary success of primates in creating large, connected protein networks whose transient association and dissociation is obviously guided by phosphorylation [64]. Primates have invented robust phosphorylation processes to integrate multiple connected partners, which is not seen in other species, such as flies, zebra fish, and worms. However, in E. coli, for example, around 100 of its 4000 (3%) proteins are known to be phosphorylated, versus more than half of all proteins in eukaryotic cells [61]. Site-specific phosphorylation co-evolved with the adaptation of the species according to their evolutionary history. This suggests that regulation via phosphorylation occurred late in prokaryotic evolution from the split between yeast and higher eukaryotes. This might explain why eucharyotic species far remote in the phylogenetic tree share some common features regarding the kinone [61,[65][66][67][68].
Based on one example of biological stimulus acting on a simple cultured cell, we present a comprehensive time course of protein networks triggered by time dependent changes of phospho-sites. After stimulation, the networks arranged by phosphate addition extend in a wave like manner and then retract to the basal level.
It is generally accepted that amino acid changes in protein sequences have evolved at a constant rate in history within each species. Phylogenetic analysis based on genome sequencing and RNA-sequencing determination have shown that the switch between acidic and phosphorylatable residues took place for some proteins in the evolutionary tree between eukaryotes and prokaryotes. Thus, the acidic residues (Asp or Glu) might have been replaced by serine or threonine-in such a case, the salt bridge can be restored via the addition of a phosphate group. In this scenario, the proteins authorizing chemical modification due to the phosphate addition are multi-phasic, allowing fine regulation [8].
In this report, we tried to address when an ordered apparition of kinases took place in the evolutionary timing of living organisms, and some major events of phosphorylation could explain the primate status [67,68].

Conclusions
In this report, we did not focus our study on known substrate motifs as signaling modules across species for major kinases. We assumed that in the evolutionary processes, some motifs can appear with modified kinases, diminishing the pertinence of identical motifs searching in full scale phosphoproteome. Instead, we preferred the strategy of determining the global phosphorylation in proteins and its incidence to guide molecular networks.
Few, if any, comparative quantifications of overall phosphorylation and its evolutionary conservation across species have been undertaken. Using bioinformatics tools on genome wide scale eukaryotic phosphoproteomes, including humans, flies, worms, and yeast, we present the most comprehensive evolutionary landscape of eukaryotic phosphoproteomes thus far. We scrutinize the phosphoproteomes in these different models of generic models for which extensive full scale genome datasets are available. Most reports describe the phosphorylation in individual proteins along with their few regulatory partners and their comparison between species. These studies are usually also establish the degree of evolutionary conservation. Our approach documents a new paradigm: the role of the integrated P-sites landscape, including phosphorylation noise without known biological function, in the shape and assembly of proteins networks. Taking into account the massive P-sites without any known function (only 3% to 5% of P-sites have been found with a regulatory function), we explored the relationship between the full spectrum of phosphorylation and the meta organization of protein matrixes. A panel of computational tests addressed the role of full spectrum P-sites, including: (i) in relation to the dynamics of protein networking upon stimulation; (ii) in relation to the conserved networking between species; and (iii) in relation to the landscape of intertwined P-sites by different kinases into the networks. More importantly, our datasets highlight that P-sites numbers in proteins, including mostly "noisy" phosphorylation, increase drastically with the size of networks in which these proteins are component. These datasets might unmask a prominent role of the "P-site noise", which is attributed to P-sites without a known function that constitute more than 95% of the total, regarding shaping proteins matrixes.
We might conclude that phosphorylation orchestrates protein networks in humans. Amazingly, this observation appears to be more chaotic in lower species. The functional role of many phospho-sites remains unknown and poorly documented despite a considerable inventory of phosphorylated proteins with sequences information of modified residues across many species, ranging from lower to higher organisms. In the massive background and noise of phosphate addition, only a few highly conserved phosphorylation hotspot regions have been documented with functional importance [69]. If 90% of human proteins are found phosphorylated, only 5% have a reported regulatory function situated mostly in the catalytic domain or the protein-protein interaction domain [67][68][69].
Overall, all these imbricated studies highlight that the kinome and phospho landscapes evolve separately in each species, with a mixed of conserved processes between them and innovative new kinases and P-sites specific to each one. Our overall analysis at full proteome level demonstrates clearly that the paradigm of the great proportion of phosphate addition without any known function and considered metabolic noise should be re-evaluated. Phosphate addition, besides the known regulation of biochemical function regarding few individual proteins, likely intervenes with the orchestrating network of interacting proteins upon stimulation through what is unanimously viewed as its noise component. Moreover, stable or long-lasting P-sites without any attributed function might be a chemical key to assemble proteins in networks.

Computing Correlation between the Distribution of the Number of Phosphorylation Sites Per Protein in Different Species
HomoloGene is a system that automatically detects homologs, including paralogs and orthologs, among the genes of 21 completely sequenced genomes. Homologene database attributes a HID (HomoloGene group ID) to proteins from different species that represents a grouping of homologous genes among species. This data has been combined with the number of phosphorylation sites per proteins given by EPSD. For each pair of analyzed species, we calculated a Spearman rank correlation, which gave a value in the range [−1, +1] where a value of +1 meant a perfect positive association between ranks, a value of 0 meant no association between ranks, and a value of −1 meant a perfect negative association between ranks. Data about the phylogenetic tree of the seven selected species were from Hedges (2002) [72].

Correlation between Phosphorylation and Protein Interactions
For humans, we divided the 19,645 proteins in our dataset into deciles according to the number of phosphorylation sites. There were 2368 proteins with zero, one, or two phosphorylation sites in the first decile, 2130 proteins with between three and five phosphorylation sites in the second decile, etc. (see Table 2). For S. cerevisiae, we divided the data into quintiles (Table 4). D. melanogaster contained a large proportion of proteins without a phosphorylation site (more than 60% of the total) which made it impossible to divide it into bins of equal size. For this organism, we considered the non-phosphorylated proteins as a category and sub-divided the rest into quartiles ( Table 3). The interactions between proteins in the same bin were retrieved from the String database and used a confidence score greater than 0.7. Calculations of the number of interactions, the average number of interactions of each protein, the number of connected components, and the percentage of proteins having no interaction were done with Cytoscape [73].

Enrichment Analysis
Functional enrichments for Gene Ontology Biological Process terms, KEGG, WikiPathways, and Reactome Pathways were retrieved using Cytoscape [73] and the StringApp plugin [74]. Only terms associated with a False Discovery Rate (FDR) of less than 0.05 were retained.

Dynamic Analysis
In the qPhos database, we selected data relative to the variation of phosphorylation following EGF exposure to Hela cells measured using SILAC (stable isotope labelling by amino acids in cell culture) at six time-points: after 1 min, 5 min, 10 min, 15 min, 20 min, and 30 min. The dataset contained the variation of phosphorylation at each time-point compared to the control for multiple phosphorylation sites. For the same protein, increases of phosphorylation were observed on some sites, while some other sites showed a decrease. In order to obtain a unique value reflecting its change in phosphorylation for each protein, we counted the number of sites where the variation in phosphorylation exceed a factor of two (a Log2 FoldChange EGF vs. control ≤ −1 or ≥ 1). Our count of the number of phosphorylation variation by protein for every time point is specified in Table S7. Modules of interacting proteins differentially phosphorylated at each time point were identified with AMINE (Active Module Identification through Network Embedding- [46]). The method was adapted to use the number of phosphorylation variation per protein computed above instead of a p-value generated by a differential analysis pipeline. The String database [52] was used to retrieve protein interaction data with a combined evidence score greater than 0.7. Measures of protein activity were represented by phosphorylation change counts.