3.1. A Brief Analysis of the Behavior That We Expect for S1 Free in Solution
Chemical-physical data of the S1 subunit of SARS-CoV-2 Spike (Gene ID: 43740568 (ncbi.nlm.nih) and UniProtKB: P0DTC2) were calculated by the web server CIDER [
35], while their interpretation was exclusive to the author of this manuscript. The mature S1 subunit contains 711 aa, as decoded by its mRNA. We know many structural details of S1 in the Spike structure [
38] and its mechanism of action in penetrating a living cell through ACE2 [
24]. Spike (1273 aa in humans) is a precursor protein (
Figure 1) that is proteolytically cleaved by furin into an N-terminal S1 subunit and the C-terminal hydrophobic S2 subunit. S1 mediates cell attachment, whereas S2 mediates membrane fusion. S1 is visible in the extracellular environment.
Its key role is to interact with ACE2 and, through its immunogenic epitopes, manage interactions with external proteins. The fate of the S1 particles released by furin has never received much attention [
39]. However, S1 is detectable in the blood for a long time, both after infection and vaccination. The free structure of S1 at 3.6 Å is on PDB (7A91; entry DOI:10.2210/pdb7a91/pdb). Neutralizing antibodies target the epitopes on the S1 surface [
40]. Thus, researchers fragmented S1 into peptides to understand which epitopes have the greatest antigenic power. They found a high number of strong epitopes in the 440–600 region, where the HLA epitope (448–456), called NF9 peptide, dominates [
41]. What the researchers have overlooked is the conformational behavior in solution and the chemical-physical characteristics of the subunit, since they have studied techniques to develop detection assays [
42]. These are important features for assessing the interaction tendency of proteins.
I used the CIDER platform from Pappu’s lab for the calculations (see Methods for details). By combining chemical-physical parameters calculated from the sequence, we can predict, as an approximation, that S1 (FCR = 0.160) has the conformational preference of a weak polyampholyte (FCR < 0.3) [
24,
29]. The S1 subunit shows a fraction of charged residues (FCR) of 0.160 and a net charge per residue (NCPR) of 0.006. This translates to a net positive charge at pH 7.0 of +4.95. The strong positive net charge of S1 at neutral pH favors interactions with negatively charged proteins or surfaces, but also suggests the excellent solubility of the protein in aqueous media. The FCR and NCPR values, as calculated by the CIDER server, suggest a dispersed positive charge distribution over the entire protein, with a high number of charged patches in the region 400–600 and in the N-terminal tail. Also, a κ value (charge patterning parameter) of 0.175 suggests segregation of charged residues within the protein, with conformational fluctuations caused by long-range electrostatic attractions. This value is zero for sequences with well-mixed charges [
34].
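The two charge parameters quoted above follow directly from residue counts, and a minimal sketch can make the definitions concrete. This is not CIDER's code. It assumes that K and R count as positive, D and E as negative, and histidine as neutral, and it takes the 0.3 threshold for weak polyampholytes from the text. The function names and the toy sequence are hypothetical.

```python
# Illustrative sketch of sequence-based charge parameters (not CIDER's code).
# Assumption: K and R are positive, D and E are negative, histidine is neutral.

POSITIVE = set("KR")
NEGATIVE = set("DE")


def charge_parameters(sequence: str) -> dict:
    """Return FCR, NCPR and the integer net charge of a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_pos = sum(1 for aa in seq if aa in POSITIVE)
    n_neg = sum(1 for aa in seq if aa in NEGATIVE)
    length = len(seq)
    return {
        "FCR": (n_pos + n_neg) / length,   # fraction of charged residues
        "NCPR": (n_pos - n_neg) / length,  # net charge per residue
        "net_charge": n_pos - n_neg,       # integer count, no pKa correction
    }


def is_weak_polyampholyte(fcr: float, threshold: float = 0.3) -> bool:
    """Apply the FCR threshold quoted in the text (FCR < 0.3)."""
    return fcr < threshold


if __name__ == "__main__":
    toy = "MKRDEGPSAKLLDEGRK"  # hypothetical toy sequence, not S1
    params = charge_parameters(toy)
    print(params, is_weak_polyampholyte(params["FCR"]))
```

The integer net charge from this counting scheme will generally differ from a pH-dependent value such as the +4.95 reported above, because that value accounts for partial ionization.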
CIDER calculated a value of disorder-promoting capacity (DPC) of 0.549 [
34]. The terminal segment, from residue 440 to about 700, is the one with the greatest number of disorder-promoting residues, according to Dunker [
43]. This suggests that this region comprises many disordered segments. Flexibility analyses and structural models support this conclusion (see 
Figure S1 [
44,
45,
46] in 
Supplements). The proline content of S1 (5.2%) is high (37 P residues, and, on average, one P every 19 residues). P is a residue that acts as a mobile hinge inducing structural orientation change, but it is also the protein residue with the most disorder-promoting potency [
37]. We also need to put glycine on the same level as P because of its strong disorder-promoting ability [
37]. Its content is also high, 46 residues (6.5%), with one G residue every 15.4 residues. This residue induces a strong structural flexibility in the structural environment that surrounds it, favoring broad structural fluctuations [
47]. The set of highlighted parameters suggests an extended and mobile globule-like structure [
34], very flexible with disordered segments but also very soluble in solution (see 
Figure S1 in Supplementary Material) and susceptible to electrostatic interactions. This contradicts the general idea of compactness that arises from snapshot views of Spike by three-dimensional techniques. This extended flexible conformation benefits the virus by enhancing its ability to bind to receptors. The flexibility of the receptor-binding domain allows it to explore a wider space, increasing the likelihood of encountering a receptor and boosting its virulence. A fast equilibrium between protein–solvent and intra-protein interactions should control the conformational properties of IDR segments in solution, which are crucial for interactions [
36,
48].
These chemical-physical and structural properties of S1, free in solution, explain well the reason for the many interactions with human proteins found in BioGRID. But S1 also displays 87 sites for PTMs on its sequence. They modulate the structure and functionality of the protein in the different metabolic and temporal contexts in which it operates alone. Many sites in the IDRs undergo phosphorylation by serine/threonine kinases, which, by modifying the conformational properties of S1, allows it to coordinate many cellular signaling events [
49]. These results highlight the importance of considering the intrinsic conformational behavior of this protein free in solution when developing vaccines because the final step releases S1 into the cell, free to interact with human proteins [
50].
3.2. Data Source
All features highlighted in this study are based on experimental data extracted from BioGRID (see Methods, 
Section 2.1). I selected 158 human proteins out of the 1371 unique interactors of S1, which give rise to 3002 raw interactions. The selected proteins are all characterized by a high significance level (level score ≥ 2) and bind S1 with at least one low-throughput (LT) interaction. BioGRID prioritizes molecular interactions detected with LT experiments over those detected with high-throughput (HT) experiments. This is because LT experiments are more targeted and accurate in identifying specific, biologically relevant interactions. HT experiments can produce a larger number of interactions, including interactions that are not significant [
22].
Table S1 (Supplementary Material) reports the proteins extracted from BioGRID that interact with S1 at LT. 
Figure S2 (Supplementary Material) shows the interactome calculated by STRING for these proteins. I used a low confidence score (0.400) and kept all 7 channels open to collect as much information as possible, postponing the pruning of the less significant nodes to a later time. Even so, the proteins form a compact network with an excellent 
p-value, suggesting shared biological activities. However, twelve nodes remained disconnected. The lack of connection suggests either little research on these proteins or that they are not involved in this specific functional context [
51]. Therefore, I eliminated them so as not to distort the calculation of the topological parameters [
52,
53,
54,
55]. I pruned these nodes of low significance (CACNA1C, CLPTM1, CNTN1, CSNK1G3, IL1RAPL2, LYSMD3, MPZL1, MSMP, PIM2, SLC6A15, SLC7A4, and WDR45B) to increase the accuracy and robustness of my conclusions. In 
Figure S3, I show the new 146-node interactome. This new interactome appears well organized, with a central compact body and many peripheral subgraphs, locations of specific biological activities. For instance, the subgraph on the left (ZDHHCXX-GOLGA7) is the palmitoyl–transferase complex involved in protein transport from the Golgi to the cell surface [
56]. The pentagonal subgraph (bottom compared to the previous one) shows components of the Coatomer cytosolic protein complex II (COPII), which promotes forming transport vesicles from the endoplasmic reticulum (ER) and regulates the intracellular membrane trafficking, from the formation of transport vesicles to their fusion with membranes [
57]. Many of these 146 nodes have all the characteristics of functional compactness and high rank to maximize the metabolic processes of the network through enrichment as functional seeds [
58]. Efficient seed selection should select the most influential nodes to achieve the maximum level of functional influence. This is because, in the enrichment phase, the robustness of seeds is essential to counteract potential disturbances, such as topological alterations. An accurate selection of influential seeds reduces perturbations in the network [
59].
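The pruning step described above (removing nodes that remain disconnected before computing topological parameters) can be sketched as follows. This is only an illustration under stated assumptions: the network is given as a node list and an undirected edge list, "disconnected" is taken to mean degree zero, and the function name and toy data are hypothetical rather than the protocol reported in the Excel files.

```python
# Illustrative sketch: drop degree-zero nodes from an undirected network.
# Assumption: "disconnected" means a node with no edge to any other node.

def prune_isolated(nodes, edges):
    """Return (kept_nodes, removed_nodes), preserving the input node order."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        if a == b:
            continue  # a self-loop does not connect a node to the network
        if a in degree and b in degree:
            degree[a] += 1
            degree[b] += 1
    kept = [n for n in nodes if degree[n] > 0]
    removed = [n for n in nodes if degree[n] == 0]
    return kept, removed


if __name__ == "__main__":
    # Hypothetical toy network, not the S1 interactome.
    toy_nodes = ["A", "B", "C", "D"]
    toy_edges = [("A", "B"), ("B", "C")]
    kept, removed = prune_isolated(toy_nodes, toy_edges)
    print("kept:", kept, "removed:", removed)
```

Keeping the removed list explicit makes it easy to report which nodes were pruned, as the text does for the twelve disconnected proteins.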
 Functional enrichment is based on statistical parameters related to biological functions associated with the gene set extracted from BioGRID. It identifies biological and functional themes (pathways, Gene Ontology, diseases, etc.) that, although sometimes over-represented, apply to the topic under study. Integrating multiple pathways (KEGG, Reactome, etc.) offers advantages in terms of more probable, more extensive, and robust functional annotations, necessary for a better understanding of the functions and metabolic regulation existing in a complex biological system such as the virus–host one.
I have two major goals: extracting useful information from the functional processes of the proteome that are related to functional seeds, as a strategy; defining the topological space in which to represent and visualize the structural organization of the extracted metabolic processes as a method. STRING implemented the calculation for the functional enrichment of these nodes by adding 500 first and 500 second order proteins (direct and secondary interactions) until obtaining an interactome of 1146 nodes (see 
Figure S4). This interactome, despite being very compact and with an excellent statistic, still has disconnected nodes, most likely because of heterogeneous data. Although access to scientific documents in natural language by Text Mining is easier, the results of this automated search are often not relevant to the needs of the user looking for experimental and quantitative data [
60,
61]. In fact, extracting information through key phrases and relationships used by these systems leads to heterogeneous results with differences among the scientific databases from which the articles were retrieved, even if articles are similar [
60]. It is important to note that bioinformatic platforms treat less studied genes/proteins as if they were background noise and often eliminate them from the calculations [
62,
63]. This generates uncertainty in predictions or information, so I eliminated the disconnected nodes. 
Excel File S1 sheet 1 reports the pruning protocol with the degree-lists of the 1060 residual nodes and with the 87 nodes eliminated. 
Figure 2 shows this interactome (from now on “interactome-1060”) as calculated by STRING.
3.5. Reverse Engineering
Reverse engineering in biology applies an engineering concept, that of dismantling a process to understand it and discover the biological strategy. Thus, it is often used to discover the design principles of a biological system when the relationships between microscopic and higher-level processes are degenerate (many-to-many or one-to-many). It addresses the understanding of a complex system when the non-linear relationships between the system’s capabilities and its deep molecular mechanisms change. This suggests its usefulness in analyzing a complex functional system, faced with limited a priori knowledge of its “design principles” [
134]. At first glance, “disassembling to reassemble” may seem like a reductionist approach to systems biology. However, data-intensive biological fields use reverse engineering approaches to recognize nonrandom connectivity patterns and identify the functional capabilities of the overall network architecture [
20]. This enables a topological analysis that abstracts from the context of network connectivity to identify functional capabilities. It is an approach to understanding how certain components are wired to create a functional whole. The search for these design principles allows us to know lower-level causal details and becomes robust when integrated with external experimental data tested in vivo and which can therefore biologically validate an interaction as real [
135].
My goal is to understand whether, in the same system but with changed organizational features, S1 performs similar operations. Although specific metabolic parameters influence the activation of molecular mechanisms and functions, we might identify parameter spaces that support the same functions. In a broader discussion, we should consider that groups of viral proteins contribute to enhancing the virulence of the virus by attacking single human proteins with multiple interactions, but some also do so alone. This should not be surprising because I have already observed it in the liver affected by COVID-19 [
10]. SARS-CoV-2 shows a broad tissue tropism, although of varying degrees, perhaps greater than what clinicians can appreciate through observation. This tropism is, however, expressing the steps necessary to progress the viral infection, even in phenotypically different individuals, and represents a strategic adaptation to the host. Achieving success in replication requires the virus to leverage Spike’s interactivity and adapt its proteome strategy to the host’s unique metabolic landscape. Among the main variables that the virus encounters are age, sex, nutritional status, and previous individual pathologies.
In the 
Excel File S2, sheet 1, I show the overall results of the reverse engineering analysis. I checked, one by one, all 1060 nodes of the interactome in 
Figure 2 against the 25,521 interactions collected in the SHPID database [
10]. These interactions derive from individual proteins encoded by the SARS-CoV-2 proteome and those of the human proteome, as reported in BioGRID.
This file shows that there are multiple interactions of S1 together with other viral proteins (
sheet 3). I also found many viral proteins that interact in a one-to-one manner with specific human proteins (
sheet 2). They could be pharmacological targets. Many human proteins are not involved in any viral activities. This result confirms my previous observations on COVID-19 [
10]. Proteins not involved control metabolic processes that are beneficial for both the virus and the human host. The most interesting observation (
sheet 2) is a set of 27 unique one-to-one interactions of S1 with human proteins (ACE2, AGTR1, AKT2, APOE, ASGR1, AVPR1B, C1QB, C1QC, CD46, CFH, CFP, CLEC4M, COP1, CR2, DPP4, ESR1, F10, FLT1, IL12RB1, ITGB6, LYPLA2, MBL2, NID1, SDC1, SDC2, SNCA, TLR4). Through these proteins, we can try to understand in which functional processes they are involved, with which human proteins, and whether these interactions could represent a functional framework exclusive to S1, or to the infection.
The multiple interactions refer to the attacks conducted by groups of viral proteins, including S1, against single human proteins. The file (
sheet 3) lists 148 human proteins attacked in this way. A careful observation shows that the number of viral proteins attacking a single human protein is often considerable. As an example, the gene EIF2S1 (Eukaryotic Translation Initiation Factor 2 Subunit Alpha) encodes the protein IF2A_HUMAN. This is a protein of 315 aa, with a mixed alpha/beta structure. It is a member of the eIF2 complex that functions in the early stages of protein synthesis by forming a ternary complex with GTP and initiator tRNA. Nineteen viral proteins (nsp1, nsp3, nsp4, nsp5, nsp6, nsp13, nsp14, E, M, N, S, ORF10, ORF3a, ORF3b, ORF6, ORF7b, ORF7a, ORF8, ORF9b) attack this protein. Given its small size, and that of the ternary complex itself, there cannot be enough surface area for 19 proteins to interact concurrently. This means that the interactions, all brief and momentary, occur at different times. However, without the time sequence, it is impossible to define the actual functional mechanism affected by these interactions (even without considering the “where”). On this basis, it once again seems futile to define an overall mechanism through chronologically undefined single interactions.
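The sorting logic behind the sheets described above (human proteins targeted by S1 alone, by S1 together with other viral proteins, by other viral proteins only, or by none) can be sketched as a simple grouping. This is a hedged illustration, not the procedure used with the SHPID database: the input format (pairs of viral and human protein names), the function name, and the toy data are all assumptions.

```python
# Illustrative sketch: classify human proteins by their viral partners.
# Assumption: interactions are (viral_protein, human_protein) name pairs.
from collections import defaultdict


def classify_targets(interactions, network_nodes, focus="S1"):
    """Group human nodes by which viral proteins interact with them."""
    partners = defaultdict(set)
    for viral, human in interactions:
        partners[human].add(viral)

    result = {"exclusive": [], "shared": [], "other_only": [], "untargeted": []}
    for human in sorted(network_nodes):
        viral_set = partners.get(human, set())
        if not viral_set:
            result["untargeted"].append(human)   # no viral partner at all
        elif viral_set == {focus}:
            result["exclusive"].append(human)    # one-to-one with the focus
        elif focus in viral_set:
            result["shared"].append(human)       # focus plus other viral proteins
        else:
            result["other_only"].append(human)   # only other viral proteins
    return result


if __name__ == "__main__":
    # Hypothetical toy data, not taken from BioGRID or SHPID.
    toy_pairs = [("S1", "H1"), ("S1", "H2"), ("nsp1", "H2"), ("N", "H3")]
    toy_nodes = ["H1", "H2", "H3", "H4"]
    print(classify_targets(toy_pairs, toy_nodes))
```

The "exclusive" group corresponds to the one-to-one S1 interactions discussed in the text, and the "shared" group to the multiple interactions.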
3.6. Interactome-814
I used these twenty-seven proteins as functional seeds in the human proteome. 
Figure 7 shows the new interactome calculated by STRING.
This interactome (from now on “interactome-814”) comprises 814 nodes (
Excel File S3, sheet 1). The first observation is that, although I added 1000 proteins for the enrichment, the system only accommodates 787 of them (814 − 27 = 787). This seems to reflect a low number of experimentally proven interactions. We can consider that STRING classifies only 21.46% of them as High or Highest (
Excel File S3, sheet 2), which brings us back to the considerations made in 
Appendix A. 
Table 4 shows that this interactome too has a periphery rich in subgraphs, but is on average less dense (0.22), with a value of the average number of neighbours about 50% lower than interactome-1060. Heterogeneity (1.042) suggests the tendency of this network to contain hub nodes, while the centralization value (0.138) still supports compactness, even if the distance between two nodes (diameter) is lower but still high, and supports the almost asymmetrical architecture we observe. In conclusion, we have an interactome with a global organization quite like the previous one, although smaller and less dense in terms of connectivity.
Also, interactome-814 shows a power law characteristic of scale-free networks (
Figure 8). Unlike interactome-1060, the log–log distribution plot shows a fit with a good R² value of 0.7528, so this log–log fit is the signature of a system well described by the power law equation. Hence, interactome-814 should show a very balanced and linear overall growth, without distorting effects. Also, in this case, the exponent y is greater than 1, showing a central component that does not prevent peripheral modules. The two slopes are both negative and not very different in value. However, they indicate different growth rates, with the number of nodes increasing faster in interactome-1060. The two interactomes, although similar, react differently to internal or external factors, and this could be because of the greater heterogeneity found in interactome-1060. All this suggests that, despite the considerable underlying biological complexity, the relationships between metabolic processes and population sizes of the interactomes seem to obey a simple relationship, given by the power equation. This is a further fact that justifies the comparisons I am making between the two interactomes and also the search for the specific functional activities of the S1 protein.
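A log–log power-law fit of a degree distribution, with its coefficient of determination, can be sketched as an ordinary least-squares line on log-transformed data. This is a minimal illustration and not necessarily the fitting routine used to produce the figure. It assumes the model count = a · degree^b, and the function name and toy data are hypothetical.

```python
# Illustrative sketch: least-squares power-law fit on log-log axes.
# Assumption: model count = a * degree**b, fitted as a line in log10 space.
import math


def fit_power_law(degrees, counts):
    """Return (a, b, r_squared) for count = a * degree**b."""
    pairs = [(k, c) for k, c in zip(degrees, counts) if k > 0 and c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive data points")
    xs = [math.log10(k) for k, _ in pairs]
    ys = [math.log10(c) for _, c in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all degrees are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # exponent b (negative when decaying)
    intercept = mean_y - slope * mean_x     # log10(a)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return 10 ** intercept, slope, r_squared


if __name__ == "__main__":
    # Hypothetical toy distribution following count = 100 * degree**-2.
    toy_degrees = [1, 2, 4, 8]
    toy_counts = [100.0, 25.0, 6.25, 1.5625]
    print(fit_power_law(toy_degrees, toy_counts))
```

Note that R² computed in log space measures the quality of the straight-line fit on the log–log plot, which is the quantity the text discusses.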
In 
Figure 9, I show the centrality distributions of interactome-814. I reported the numerical values of the first 26 terms for each distribution in the 
Excel File S3, sheet 3. The same procedure adopted for interactome-1060 was used to assign the highest-ranking values. We can see in the betweenness centrality distribution that the upper range of the distribution is very wide, involving proteins with both a high degree and medium-low degree. What is striking is that some of them are also present in the centrality distribution of the eigenvectors. Since these are different topological properties, this, as we will see later, suggests mixed proteins (hub/bottleneck), a situation not present in interactome-1060. 
Table 5 reports the results showing the highest-ranking hub and bottleneck nodes. A comparison with 
Table 3 shows that although the architecture of the two interactomes may seem quite similar, the main proteins that underlie their structural and functional organization are different and behave differently. The individual nodes in 
Table 3 perform only one activity, either as a hub or as a bottleneck. There are no mixed-activity nodes. We can consider those in 
Table 3 as pure hub and bottleneck nodes [
136], while many of those in 
Table 5 show mixed activity.
 The functional coincidence between some hubs and bottlenecks in the interactome shows that these proteins not only cover many interactions but play a critical role in maintaining connectivity and stability in the network [
137]. The coincidence also suggests that these proteins are fundamental to the function of the biological system and may represent key points for therapeutic interventions or functional analyses [
136]. In fact, both categories of strategic positions in the network help to understand the robustness and vulnerability of the interactome, revealing potential regulatory mechanisms [
138]. This allows us to consider that the S1 subunit behaves differently when it interacts alone with the human proteome. To obtain a more reliable picture, I verified whether a hub-spoke scheme also exists in this case and the main allocations these proteins have in the cellular compartments. 
Figure 10 shows a hub-spoke scheme where the central system is mixed because it comprises both pure hubs and bottlenecks.
The processes shown in the table are just some of the most relevant terms in which the nodes of the hub-and-spoke organization are involved. The graph shown is the structural backbone of the network. These activities, as well as many others not reported, support the deep involvement of S1 in metabolic activities, even with worrying negative aspects (hsa05200 or HSA-199418). All processes show high strength values, which suggest coordinated and active processes, and are well supported even at the gene expression level. The graph includes 23 nodes among the highest-level ones, and 22 of them are involved in well-supported and significant negative processes.
Figure 11 shows the four most significant distributions relative to the cellular compartments (cytosol and nucleus) and tissues (nervous system and blood) populated by the proteins of interactome-814. The upper parts of the distributions exhibit dense populations between the values 4 and 5. This shows the high functional activity of the proteins that populate them. The extent of involvement of high-ranking proteins can be determined by analyzing the distribution along the abscissa (degree). In this interactome, the cytosol and nucleus stand out as the most involved and populated cellular compartments. However, the extracellular area and membrane level also exhibit intense metabolic activity. Among the tissues, the nervous system involves proteins that include many of the high-ranking ones.
 In summary, the two interactomes, despite their similar structure, perform distinct functions that are only broadly defined. It becomes important to focus on the functional activity to understand if, and how much, they differ from the point of view of metabolic purposes and, above all, which genes oversee these processes. Are they the same genes or are they different genes? What is surprising is that interactome-814, despite having a lower total number of nodes than interactome-1060, controls 7120 terms and 40% more functions based on Gene Ontology terms (see 
Table 6). 
Figure S6 (Supplementary Material) summarizes the major functional roles of interactome-814. I highlight the major subgraphs, showing one of their primary functions. The Data Merging approach will implement and detail this still-rough functional summary later.
In Barabási–Albert network models, enrichment arises from a network growth process governed by the preferential attachment of nodes. The same protein can exert different functions by binding to different partners. A fundamental question is to understand how the opportunistic choices of individual nodes shape the properties of the global network. Identifying these influential nodes is a challenging and still understudied task. We also have to consider that nodes are biological agents and links represent their functional interactions, which can also be modeled as cooperative activities. Nodes, taking part in an ever-increasing number of molecular processes, can change their local behavior or topology, maximizing their cooperative activity [
139,
140].
How does a protein select among a multitude of potential binding partners within a cell, expanding its functional repertoire? An adequate response should consider the location and translation rate of messenger RNA (mRNA), as both factors can cause spatial regulation of protein synthesis, affecting local protein concentrations and interactions [
141,
142]. The rate of translation elongation can indeed influence protein folding and its interactions with other proteins [
143,
144]. Slow translation can allow more time for co-translational folding and interaction with certain partners, while rapid translation might favor interactions with different proteins or lead to misfolding [
145,
146]. However, additional considerations can also come from other types of comparisons of the two interactomes.
3.7. Data Merging
The two interactomes, 1060 and 814, although induced by the same viral protein, appear to operate in different metabolic contexts. Characterizing the behavior of these two networks is essential to understand the complexity of S1 action [
147]. The differences appear clear if we compare the set of GO processes controlling each interactome. The enrichment of interactome-814 shows 7120 terms in 15 categories. Interactome-1060 shows 4989 terms in 15 categories. Interactome-814 thus has 1.42-fold more terms overall and 40% more ontological terms, and the three Ontologies reflect the functions. The size and reliability of the datasets under study, the scientific design, and the phenotype specificity affect the identification of critical nodes and functional processes in any system. I standardized these variables by making the methodological procedures as similar as possible and, most importantly, using only experimental data and selecting only those with the highest reliability. I considered the topological properties of nodes and evaluated their functional roles based on their ability to transmit information within and between modules in the network. Using Gene Ontology for genomic functional annotation is crucial, as it can reveal important biological information. Gene Ontology (GO) comprises three categories: molecular function, cellular component, and biological process [
148]. But it has redundancy problems when analyzing them together, especially because of gene overlaps. The redundancy in GO annotations can complicate interpreting biological data [
149]. Therefore, the analysis of a single ontology, such as Biological Processes, which are also the most abundant and all-encompassing, can be a useful strategy to limit the redundancy and improve the clarity and significance of the results [
150,
151].
By comparing the Biological Processes (GO) of the two interactomes, I again found the large functional differences already noted. There are 2557 processes for interactome-814 versus 1430 processes for interactome-1060; that is, interactome-1060 has 44.1% fewer processes (equivalently, interactome-814 has about 79% more). A closer look at the two interactomes (see 
Excel File S1, sheet 4 and Excel File S3, sheet 4) shows that many functions are similar, while others appear specific to each of them. The same happens for many of the nodes involved. In fact, some of them appear many times in different Biological Processes associated with the same interactome. All this suggests the important and central role of these genes in regulating some cellular functions related to COVID-19 [
152], but it also raises questions we cannot yet answer today. For example, if the same gene appears in dozens of different Biological Processes, does this occur in a narrow window or over a long-time horizon? The analysis of cellular systems requires the coordination of large numbers of events, but identifying the temporal cues underlying interactions is the critical part of understanding cellular functions. With current knowledge, we could have a variety of interpretations, but they may be distorted [
153]. This has led me to investigate the overall behavior of Biological Processes, rather than searching for the single gold process at all costs.
Existing multiple interactions within the interactome show a complex network of gene regulation, in which some genes can influence a myriad of Biological Processes. However, when we say many genes and a “myriad of biological processes”, we need to know what we are talking about in quantitative terms. To my knowledge, no study related to SARS-CoV-2 has ever made such an assessment. To understand the similarity and dissimilarity of functions and the genes that support them, I used an analysis borrowed from marketing methods to compare the two data sets represented by the Biological Processes (GO). I compared the two interactomes through Data Merging (details in Methods, 
Section 2), combining the two large biological data sets into one (see 
Excel File S4, sheets 1 and 2). Data Merging is used to evaluate interaction parameters, append observations, and find repetitions. Therefore, the logic I used was to distinguish the common processes (coupled processes) from the single processes (uncoupled processes) of each interactome. Merging the data optimizes the collection of all information into a single set, maximizing the completeness with which critical information can be extracted and analyzed. 
Excel File S4, sheet 1 also reports in full all the genes involved in the individual terms, both paired and unpaired, for a total of 68,300 genes, which are also listed (
sheet 2 of Excel File S4). These genes are redundant, because the same gene can take part in dozens of different molecular processes, as shown below in 
Table 7. This table illustrates the general picture that emerges from the merging of the two data sets, both containing common processes, but also specific to one or the other data set.
Table 7 shows how the Data Merging reveals thousands of genes with widespread gene redundancy, but also many uncoupled processes. These results show the activities exerted by the S1 subunit alone in its one-to-one relationships (in 814) have a relevant functional incidence (53%). However, the large number of high-scored genes in the same processes also means that multiple genes will have to appear multiple times in Biological Processes associated with the same interactome. An average value of over twenty genes per process shows how difficult it is to single out a single signaling pathway, or even a metabolic process, and assign genes to it.
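The coupled/uncoupled logic of the merge can be sketched with set operations on process names, plus a counter for gene redundancy. This is only an illustration under assumptions: each interactome's enrichment is represented as a mapping from process name to a set of genes, and the function name and toy data are hypothetical rather than the contents of the Excel files.

```python
# Illustrative sketch: merge two process-to-genes tables.
# Assumption: each data set maps a Biological Process name to a set of genes.
from collections import Counter


def merge_processes(procs_a, procs_b):
    """Split processes into coupled/uncoupled and count gene occurrences."""
    keys_a, keys_b = set(procs_a), set(procs_b)
    occurrences = Counter()
    for procs in (procs_a, procs_b):
        for genes in procs.values():
            occurrences.update(set(genes))  # one count per process membership
    return {
        "coupled": sorted(keys_a & keys_b),      # processes present in both
        "uncoupled_a": sorted(keys_a - keys_b),  # specific to the first set
        "uncoupled_b": sorted(keys_b - keys_a),  # specific to the second set
        "gene_occurrences": occurrences,         # redundancy of each gene
    }


if __name__ == "__main__":
    # Hypothetical toy data, not the enrichment results of either interactome.
    toy_a = {"p1": {"G1", "G2"}, "p2": {"G2"}}
    toy_b = {"p1": {"G2", "G3"}, "p3": {"G1"}}
    merged = merge_processes(toy_a, toy_b)
    print(merged["coupled"], merged["uncoupled_a"], merged["uncoupled_b"])
    print(merged["gene_occurrences"].most_common())
```

Summing the occurrence counts gives the redundant gene total, while the number of distinct keys gives the count of unique genes, which is the distinction the text draws when it calls the listed genes redundant.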
 The observed differences in gene composition suggest that gene expression and its involvement can vary depending on the specific context, such as different tissue types, conditions, or stages of development. This can cause different genes to be highlighted, even at different times, within the same larger biological process. We should not overlook the different ways in which 68,003 genes can be organized into 2837 different processes. About twenty genes are actually responsible for many processes. The overall number of processes is 23²⁸³⁷, while S1 comprises 26¹⁵¹⁵. This is an astronomical number of combinations, which makes it clear why adequate and correct experimental data, and their control, are necessary to reduce the combinations to a few when studying specific functional processes in any design context. As an illustration, IL12A, involved in coupled processes, and RACK1, involved in uncoupled processes of 814 (
Table 7), each exert a wide range of biological functions, so many that each of them is involved in over 100 processes. How, then, can we identify the precise biological pathway in which each of these proteins takes part, given that each occurs over 100 times within the interactomes under investigation?
Studies on HeLa cells have revealed that protein expression levels exceeding 90% are consistent with the average level of protein expression [
154]. This provides ample evidence of an excess of protein copies, even at the level of gene expression, encompassing a significant portion of the transcripts that encode functional proteins. However, this excess ensures the efficient functioning of the processes in which these proteins are involved [
154,
155,
156]. Protein abundance can be determined by many factors, such as transcription, translation, or RNA/protein decay [
157]. Therefore, these factors can combine to produce a certain expression value. The load balance between transcription and translation regulates the gene expression necessary to optimize cellular fitness [
158]. Low expression of essential proteins slows growth [
159], but even generalized overexpression of proteins slows growth because it increases metabolic load [
160] and energetic costs. Today, we can only say that the over-representation of genes in an interactome can have multiple implications, and each hypothesis influences the understanding of disease and cellular interactions [
161]. The correct regulation of genes in space is necessary for proper function.
These claims may raise many questions, but no clear evidence yet supports any particular hypothesis on this matter. Despite technological advances in high-throughput sequencing, our ability to draw functional conclusions from expression data lags behind and remains largely qualitative [
162,
163,
164]. The cell organizes its biochemistry in space by forming distinct chemical compartments separated by membranes. Differentiating the functions of cells within a multicellular tissue requires standardizing spatial transcriptomics data and correlating them with cellular maps using bioinformatics tools. This will enable the identification of various subpopulations with their distinct transcriptional profiles [
165,
166]. In addition, when we evaluate protein–protein interactions present in an interactome, we realize that, despite the integrations between different sources, they are far from complete in experimental terms [
167,
168]. This can leave gaps in the physical characterization and certainty of the interactions, which in turn distorts functional knowledge in GO processes. Overlap between gene sets can lower the specificity of over-representation analysis, affecting the results and conclusions.
Over-representation analysis (also called enrichment analysis) nevertheless plays a crucial role in several aspects of genomic analysis. It works by identifying pathways or gene/protein sets whose overlap with a known gene/protein set of functional interest is higher than expected by chance. For example, it helps identify significant biological pathways associated with certain conditions or diseases by revealing how over-represented genes/proteins interconnect. The interconnectivity of genes, i.e., their membership in functional communities, enables us to unravel complex biological mechanisms that we cannot resolve by analyzing individual processes or signaling pathways alone. In summary, over-representation analysis is fundamental to interpreting genomic data, but when the data are overabundant and complex, with high protein redundancy, as we find here, it may be more appropriate to identify sets of genes that are interconnected and that share specific functional activities. This should give a more precise view of the functional strategies in an interactome.
Therefore, I eliminated redundancies from the three gene sets by retaining a single copy of each coding gene. I obtained three sets of coding genes: 944 genes for the coupled processes of interactomes 1060 + 814, 689 for the uncoupled processes of interactome-1060, and 771 for the uncoupled processes of interactome-814. I performed a clustering analysis of the decoded products of each of the three sets (
Excel File S5, sheets 1–3). The sets encompass proteins related to common and interconnected functional processes (1060 + 814), proteins involved in the one-to-one activity of S1 (814), and proteins derived from interactome-1060 that do not fall into the other two sets.
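The two operations discussed in this paragraph, removing redundant gene copies and testing whether an overlap exceeds what chance would produce, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the analysis: the function names unique_genes and hypergeom_pvalue are hypothetical, the one-sided hypergeometric test is one common way to formalize "higher overlap than expected by chance", and the gene names GENE_X and GENE_Y, as well as the process labels, are placeholders.

```python
from math import comb


def unique_genes(process_to_genes):
    """Collapse per-process gene lists into one set holding a single copy of each gene."""
    return {gene for genes in process_to_genes.values() for gene in genes}


def hypergeom_pvalue(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of genes in the background
    set_size:      number of background genes annotated to the pathway
    query_size:    number of genes in the list under study
    overlap:       number of query genes annotated to the pathway
    """
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy input: the same gene may appear in several processes (redundancy).
# Process labels and GENE_X / GENE_Y are placeholders, not data from the study.
processes = {
    "process_A": ["IL12A", "RACK1", "GENE_X"],
    "process_B": ["IL12A", "GENE_Y"],
    "process_C": ["RACK1", "GENE_X"],
}

non_redundant = unique_genes(processes)
print(len(non_redundant), "unique genes from",
      sum(len(g) for g in processes.values()), "entries")

# Toy enrichment test: background of 10 genes, 4 annotated, 3 queried, 2 overlapping.
p_value = hypergeom_pvalue(universe_size=10, set_size=4, query_size=3, overlap=2)
print(f"p = {p_value:.4f}")
```

Exact integer binomials are used here to keep the sketch free of external dependencies; for gene sets of realistic size, a statistics library would normally be preferred.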