Interactome of Arabidopsis Thaliana

More than 95,000 protein–protein interactions of Arabidopsis thaliana have been published and deposited in databases. This dataset was supplemented by approximately 900 additional interactions, which were identified in the literature from the years 2002–2021. These protein–protein interactions were used as the basis for a Cytoscape network and were supplemented with data on subcellular localization, gene ontologies, biochemical properties and co-expression. The resulting network has been exemplarily applied in unraveling the PPI-network of the plant vacuolar proton-translocating ATPase (V-ATPase), which was selected due to its central importance for the plant cell. In particular, it is involved in cellular pH homeostasis, providing proton motive force necessary for transport processes, trafficking of proteins and, thereby, cell wall synthesis. The data points to regulation taking place on multiple levels: (a) a phosphorylation-dependent regulation by 14-3-3 proteins and by kinases such as WNK8 and NDPK1a, (b) an energy-dependent regulation via HXK1 and the glucose receptor RGS1 and (c) a Ca2+-dependent regulation by SOS2 and IDQ6. The known importance of V-ATPase for cell wall synthesis is supported by its interactions with several proteins involved in cell wall synthesis. The resulting network was further analyzed for (experimental) biases and was found to be enriched in nuclear, cytosolic and plasma membrane proteins but depleted in extracellular and mitochondrial proteins, in comparison to the entity of protein-coding genes. Among the processes and functions, proteins involved in transcription were highly abundant in the network. Subnetworks were extracted for organelles, processes and protein families. The degree of representation of organelles and processes reveals limitations and advantages in the current knowledge of protein–protein interactions, which have been mainly caused by a high number of database entries being contributed by only a few publications with highly specific motivations and methodologies that favor, for instance, interactions in the cytosol and the nucleus.


Introduction
Most proteins do not act alone but interact with others to fulfill a number of processes and functions within a cell [1]. For this reason, they often form protein complexes and molecular machines to facilitate and enable their proper function for the cell [1][2][3]. Proteinprotein interactions (PPI) are understood as physical contact, and molecular attaching between proteins and are a prerequisite for protein functioning in living cells [4]. Therefore, a better understanding of proteins and their respective function is facilitated by identifying protein-protein interactions. Analyzing and characterizing PPI contributes to an understanding of proteins' activity and response to biochemical, signaling and functional processes [1].
PPIs are essential or responsible for a broad spectrum of processes and functions, such as cell-cell interactions, the transformation and transportation of information through the cell and different transportation processes of molecules within a cell, such as the transport of ions, electrons and protons [5]. For instance, an impressive interaction of multiple proteins cooperates in mitochondria: electrons are transported to generate a proton motif The consensus subcellular localization of SUBA was applied to have a first look at the contribution of compartments (Tables S4-S9). This revealed an enrichment of nuclear proteins (26% to 34%) and a depletion of extracellular (12% to 5%) and mitochondrial proteins (9% to 6%) ( Figure 1).  The consensus subcellular localization of SUBA was applied to have a first look at the contribution of compartments (Tables S4-S9). This revealed an enrichment of nuclear proteins (26% to 34%) and a depletion of extracellular (12% to 5%) and mitochondrial proteins (9% to 6%) ( Figure 1). Figure 1. Subcellular distribution of proteins in the network and according to protein-coding AGI listed in the genome. The diagram is based on the consensus prediction provided by SUBA4. Nuclear proteins (green) were enriched in the network, and a depletion was visible for extracellular (grey) and mitochondrial proteins (blue).
The GO distribution confirmed the high content of nuclear proteins through enrichment of proteins involved in DNA-binding, transcription factor activity, nucleic acid binding, chromatin binding and transcription regulator activity ( Figure 2). Figure 1. Subcellular distribution of proteins in the network and according to protein-coding AGI listed in the genome. The diagram is based on the consensus prediction provided by SUBA4. Nuclear proteins (green) were enriched in the network, and a depletion was visible for extracellular (grey) and mitochondrial proteins (blue).
The GO distribution confirmed the high content of nuclear proteins through enrichment of proteins involved in DNA-binding, transcription factor activity, nucleic acid binding, chromatin binding and transcription regulator activity ( Figure 2).
Among the process GOs, a depletion of proteins involved in translation was obvious. Although one large-scale experiment focused on the interaction network of membrane proteins, we could only observe limited enrichment in plasma-membrane, ER, vacuolar or Golgi-proteins, which increased from 18% to 21%. The process and function categories showed a consistently reduced abundance in the lipid metabolism process and lipidbinding function, as well as a reduced cell-cell signaling process and signaling receptor binding function ( Figure 2). Reduction in the processes of translation regulator activity and translation was accompanied by a reduction in the RNA-binding function ( Figure 2). . The ratio has been calculated as the number of GOs in the network/the number of GOs of protein-coding genes in A. thaliana. GOs with a ratio >1 were considered as enriched in the network and labeled in red. Yellow columns correspond to ratios <1.
Among the process GOs, a depletion of proteins involved in translation was obvious. Although one large-scale experiment focused on the interaction network of membrane proteins, we could only observe limited enrichment in plasma-membrane, ER, vacuolar or Golgi-proteins, which increased from 18% to 21%. The process and function categories . The ratio has been calculated as the number of GOs in the network/the number of GOs of protein-coding genes in A. thaliana. GOs with a ratio >1 were considered as enriched in the network and labeled in red. Yellow columns correspond to ratios <1.
The network was constructed based on UniProt-IDs. For enhanced identification, commonly used AGIs were used as node labels, with the node color providing information on subcellular localization, which corresponds to the consensus localization provided by SUBA4 (Network S10). The network also contains experimental data on the subcellular localization, which were available for 9207 and 3162 proteins as mass spectroscopy and GFP data, respectively. Biochemical data were included as node attributes; for instance, TAIR provided the number of transmembrane helices for 725 proteins.
Subnetworks were defined by subcellular localization and GO-terms/gene descriptions and contain neighboring proteins/nodes of the selected nodes. Using this definition, ten subnetworks contain compartment-specific data. The respective number of interactions varied between the organelles. The largest subnetworks were associated with cytosolic and nuclear proteins ( Table 2). The organelle-subnetworks contain the following respective keyplayers: the ER network contains components of the COPII-vesicle coat (Sec12, Sec23/24, Sec13), protein sorting (cornichons and p24-proteins; Rer1) and proteins for protein biosynthesis, such as the Get-complex and ERAD-related proteins (IRE1, DERLIN1). The Golgi network comprises typical Xylosyl-, oligosaccharyl-, galacturonosyl-transferases, glucuronyltransferases, hydroxyproline O-galactosyltransferase, N-acetylglucoseaminyl transferase I, Golgi-SNAREs Gos1 and Gos11. The vacuolar network contains seven V-ATPase subunits, a V-PPase and several other vacuolar transporters. Keyplayers of respiration were found in the mitochondria, including subunits of complexes I-IV and alternative oxidases, but also subunits of the TOM-complex, mitochondrial RNA-editing enzymes and mitochondrial ribosomal proteins. Catalases, catalase chaperones, photorespiratory enzymes and peroxin 11 are present in the peroxisomal network. The chloroplast network contains key players of photosynthesis, such as D1 and D2, CP43 and CP47; subunit IV of the cytochrome b6/f complex; PsaF, PsaG, PsaL, PsbP-proteins; PsbQ; ferredoxin-NADP(H) oxidoreductases and proteins of the light-harvesting complexes; but also cytosolic, nuclear and plasma membrane proteins. It is remarkable that a high content of plasma membrane and nuclear and cytosolic proteins was found in all organelle subnetworks. The nuclear network is characterized by DNAand RNA-modifying enzymes, such as polymerases, helicases, polyadenylation, ribonucleases, histones and enzymes of histone modification; such as E3 ubiquitin ligases, histone chaperones, histone methylation complex and histone acetyltransferases; while enzymes for deacetlyation, transcription factors and other transcription-regulating factors were also found in this subnetwork. Last but not least, a large variety of proteins were present in the cytosolic subnetwork, reflecting its involvement in many processes: components of the translational machinery (ribosomal proteins, elongation factors, chaperones); of signal transduction (calmodulins, thioredoxins, 14-3-3 proteins, kinases, phosphatases, enzymes of ubiquitinylation and sumoylation) and subunits of the proteasome contribute to the cytosolic network of protein-protein interactions.
The process-related subnetworks show a high contribution of protein biosynthesis (transcription and translation) to the number of protein-protein interactions, although the number of proteins with a function in translation was reduced in the network (Table 3). Conserved and fundamental processes, such as vesicle transport, glycolysis and respiration, were characterized by a surprisingly low level of representation in the network (Table 3). Some proteins showed an apparent sticky behavior with vastly more interaction counts than other protein family members. This includes the ER cargo receptor Cnih1 (AT3G12180), which interacts with 648 out of 662 Cnih-interaction partners, in comparison to its isoforms Cnih3 (AT1G62880) and Cnih4 (AT1G12390), which count eight and ten interacting proteins, respectively. Ubiquitin 3 (UBQ3) showed a similar behavior by interacting with 1317 proteins out of 2937 in the Ubiquitin subnetwork.
Looking at signaling subnetworks revealed organelle-specificity for thioredoxins with plasma membranes and plastidic clusters, while 14-3-3 proteins showed no organelle specificity ( Figure 3). Furthermore, the 14-3-3 subnetwork contains primary active transporters such as proton pumps and calcium pumps, pointing to a role of 14-3-3 proteins in regulating calcium and pH-homeostasis, as recently suggested for pH [163].
The SNARE network contains members of the tethering complex exocyst and cell plate assembly TRAPPII tethering factors, heavy chain subunits of clathrin-coated vesicles and SNAP25-like proteins, α-SNAP proteins and members of the GET-complex, all of which are tightly connected to the function of SNAREs-despite an overrepresentation of plasma membrane proteins. The Ubiquitin subnetwork reflects the multiple functions and sticky behavior of Ubiquitin, with more than 4500 interactions in the network.
Next, the network was applied to gain new insights into the interactions of the plant proton-translocating ATPase of the vacuolar type (V-ATPase) and thereby to test the usefulness of the network for addressing biological questions. The V-ATPase is a multi-subunit proton pump, which transduces conformational alterations due to ATP-hydrolysis within the catalytic subunits VHA-A into rotation of the central stalk formed by VHA-D and -F. VHA-d transfers the rotation to the proteolipids VHA-c and VHA-c". These bear the proton binding sites, while half channels of VHA-a C-terminus serve as cytosolic proton inlet and luminal exhaust. VHA-a's N-terminal domain contributes to a stator structure together with VHA-C, -E, -G and -H, stabilizing and fixing the catalytic head to the membrane [163]. For instance, the analysis aimed to provide evidence for the existence of a Golgi-dependent transport route to the vacuole and involvement of 14-3-3 proteins in V-ATPase regulation, which might indicate a light-dependent regulation, as reported for barley [164]. The V-ATPase subnetwork contained 448 interactions. Nearly all subunits were listed as interacting with the plasma membrane intrinsic proteins PIP1B, PIP2A, Ubiquitin UBQ3 and the ammonium transporter AMT1;3, probably originating from experiments that did not distinguish between the subunits. The provided subcellular localization was applied to omit proteins of the chloroplast and mitochondria from subsequent analyses. According to the previously mentioned ER-export and transport routes along the secretory pathway, the data revealed an interaction between VHA-e and the ER exit site cargo receptor Corni-chon 1 (Cnhi1) with a high co-expression level and between VHA-e1 and the ER to Golgi SNARE BET12. Interactions with SNAREs were also observed for VHA-A and the vacuolar SYP22, while the interaction of VHA-c and VAP27-1 links the V-ATPase subsector V 0 to ER-plasma membrane tethering, and VHA-D interacts with an adapter protein (AP19) of Clathrin-coated vesicles. On the regulatory level, interactions with 14-3-3 proteins were found for VHA-A and GRF3 as well as for VHA-B and GRF1 (Figure 4). Next, the network was applied to gain new insights into the interactions of the plant proton-translocating ATPase of the vacuolar type (V-ATPase) and thereby to test the usefulness of the network for addressing biological questions. The V-ATPase is a multi-subunit proton pump, which transduces conformational alterations due to ATP-hydrolysis within the catalytic subunits VHA-A into rotation of the central stalk formed by VHA-D and -F. VHA-d transfers the rotation to the proteolipids VHA-c and VHA-c". These bear the proton binding sites, while half channels of VHA-a C-terminus serve as cytosolic proton inlet and luminal exhaust. VHA-a's N-terminal domain contributes to a stator structure together with VHA-C, -E, -G and -H, stabilizing and fixing the catalytic head to the membrane [163]. For instance, the analysis aimed to provide evidence for the existence of a Golgi-dependent transport route to the vacuole and involvement of 14-3-3 proteins in V-ATPase regulation, which might indicate a light-dependent regulation, as reported for barley [164]. The V-ATPase subnetwork contained 448 interactions. Nearly all subunits were listed as interacting with the plasma membrane intrinsic proteins PIP1B, PIP2A, Ubiquitin UBQ3 and the ammonium transporter AMT1;3, probably originating from experiments that did not distinguish between the subunits. The provided subcellular localization was applied to omit proteins of the chloroplast and mitochondria from subsequent analyses. According to the previously mentioned ER-export and transport routes along the secretory pathway, the data revealed an interaction between VHA-e and the ER exit site cargo receptor Cornichon 1 (Cnhi1) with a high co-expression level and between VHA-e1 and the ER to Golgi SNARE BET12. Interactions with SNAREs were also observed for VHA-A and the vacuolar SYP22, while the interaction of VHA-c and VAP27-1 links the V-ATPase subsector V0 to ER-plasma membrane tethering, and VHA-D interacts with an adapter protein (AP19) of Clathrin-coated vesicles. On the regulatory level, interactions with 14-3-3 proteins were found for VHA-A and GRF3 as well as for VHA-B and Subnetworks of Thioredoxins (A) and 14-3-3 proteins (B). Node color corresponds to subcellular localization (Cytosol-grey, nucleus-cyan, plastid-green, mitochondrionred, peroxisomes-orange, ER-light blue, Golgi-light purple, vacuole-dark purple, plasma membrane-purple, extracellular-white), node size indicates the molecular weight. Edge size displays co-expression strength.
Interaction between the hexokinase 1 and the glucose receptor RGS1 with VHA-B points to an energy-dependent regulation that acts directly on the head structure of the V-ATPase, which catalyzes the ATP-hydrolysis. Furthermore, VHA-A and VHA-B (as well as VHA-E and G) are targets of salt overly sensitive 2 (SOS2), a Ca 2+ -sensor with a role in K +homeostasis and salinity. V-ATPases also interact with the Calmodulin-binding IDQ6, but with a low co-expression level and steroid-binding proteins such as MSBP2 and RBL11. The nucleoside diphosphate kinase NDPK1A, which modulates the auxin-transport, interacts with VHA-c, while the soluble kinase Without No Lysine 8 (WNK8) interacts with VHA-C. Interestingly, VHA-C binds to APG5, which is responsive to nitrogen starvation, and to the nitrate reductase NIA1. A couple of transcription factors were reported to interact with V-ATPase subunits, including three members of the MYB-family, two of the TCP-family, two of the NAC-family, UNE12, scarecrow-like 3, transparent testa 1 and WRKY60.
The interaction of V-ATPase subunits and proteins involved in cell wall synthesis, such as EMB3135, TBL18, ECH and YIP4B, point to a function of V-ATPase in cell wall synthesis. Here, V-ATPase joins the complex of ECH and YIP4B at the TGN via VHA-a and VHA-c.
The interaction between V-ATPase and other transporters is also apparent, where the interaction of the phosphate transporter PHT3 and the phosphate-starvation-induced IPS2 with V 0 -subunits might correspond to the terminal signal-transduction ( Figure 4). Interaction between the hexokinase 1 and the glucose receptor RGS1 with VHA-B points to an energy-dependent regulation that acts directly on the head structure of the V-ATPase, which catalyzes the ATP-hydrolysis. Furthermore, VHA-A and VHA-B (as well as VHA-E and G) are targets of salt overly sensitive 2 (SOS2), a Ca 2+ -sensor with a role in K + -homeostasis and salinity. V-ATPases also interact with the Calmodulin-binding IDQ6, but with a low co-expression level and steroid-binding proteins such as MSBP2 and RBL11. The nucleoside diphosphate kinase NDPK1A, which modulates the auxintransport, interacts with VHA-c, while the soluble kinase Without No Lysine 8 (WNK8) interacts with VHA-C. Interestingly, VHA-C binds to APG5, which is responsive to nitrogen starvation, and to the nitrate reductase NIA1. A couple of transcription factors were reported to interact with V-ATPase subunits, including three members of the MYB-family, two of the TCP-family, two of the NAC-family, UNE12, scarecrow-like 3, transparent testa 1 and WRKY60.
The interaction of V-ATPase subunits and proteins involved in cell wall synthesis, such as EMB3135, TBL18, ECH and YIP4B, point to a function of V-ATPase in cell wall

Discussion
Data in the databases are dominated by yeast-2-hybrid experiments, and the main portion has been contributed by only a few publications ( Figure 5): the share of yeast-2-hybrid experiments is approximately one-third, and four publications account for more than half of the data. This creates a bias in the data, which results from the aim of the studies and their methodical limitations. For instance, one of the dominant studies involved screening for transcription factor interactions [165], while another focused on interactions of membrane proteins [166] so that the observed enrichment of nuclear proteins and plasma membrane protein interactions is plausible. hybrid experiments is approximately one-third, and four publications account for more than half of the data. This creates a bias in the data, which results from the aim of the studies and their methodical limitations. For instance, one of the dominant studies involved screening for transcription factor interactions [165], while another focused on interactions of membrane proteins [166] so that the observed enrichment of nuclear proteins and plasma membrane protein interactions is plausible. The experimental design has an additional impact on the subcellular localization of the proteins in the network: yeast-2-hybrid relies on protein-protein interactions in the nucleus or applies artificial nuclear localization sequences. The mating-based split-Ubiquitin system works for proteins with cytosolic C-termini in particular for the C-terminal half of Ubiquitin due to the fusion with a transcription factor, which is released by Ubiquitin-dependent proteases. The N-terminal half of Ubiquitin (Nub) can be fused to either a cytosolic N-or C-terminus [175]. For instance, mitochondrial proteins cannot be addressed by these methods, potentially contributing to a low representation of mitochondrial proteins in the network.
Another consideration applies to the probability of an interaction; complementation assays such as bimolecular fluorescence complementation and mbSUS are often performed in heterologous expression systems, partially with strong promoters. This uncouples protein-protein interactions from their spatial and temporal specificity in the plant. The specificity is widely ensured by pulldown experiments, co-immunoprecipitations, affinity chromatography and genetic interference, since proteins are mostly under control The experimental design has an additional impact on the subcellular localization of the proteins in the network: yeast-2-hybrid relies on protein-protein interactions in the nucleus or applies artificial nuclear localization sequences. The mating-based split-Ubiquitin system works for proteins with cytosolic C-termini in particular for the C-terminal half of Ubiquitin due to the fusion with a transcription factor, which is released by Ubiquitin-dependent proteases. The N-terminal half of Ubiquitin (Nub) can be fused to either a cytosolic Nor C-terminus [175]. For instance, mitochondrial proteins cannot be addressed by these methods, potentially contributing to a low representation of mitochondrial proteins in the network.
Another consideration applies to the probability of an interaction; complementation assays such as bimolecular fluorescence complementation and mbSUS are often performed in heterologous expression systems, partially with strong promoters. This uncouples protein-protein interactions from their spatial and temporal specificity in the plant. The specificity is widely ensured by pulldown experiments, co-immunoprecipitations, affinity chromatography and genetic interference, since proteins are mostly under control under their endogenous promoters and in their native environment. However, to compensate for the problem of low specificity and the risk of false-positive interactions in the network, the data were supplemented by the co-expression lsMR-values as a measure of probability. The higher the co-expression, the higher the probability of a protein-protein interaction in the plants. Low co-expression might then be an indicator of false-positive interactions. However, many proteins with related functions were found in the expected and typical clusters, demonstrating the value of the databases for proposing new hypotheses. The visible crosstalk between signaling pathways, such as 14-3-3-protein-mediated signaling and calcium signaling, or the control of proton pumps by 14-3-3 proteins, reveals the complexity of regulation and signal transduction in the plant.
Interaction between V-ATPase subunit VHA-A and 14-3-3 proteins has been shown to be blue-light-dependent in barley [164], and the observed interaction of 14-3-3 proteins and VHA-A/-B points to similar regulation in A. thaliana, which links the V-ATPase activity to daylight. In this context, the observed interplay of V-ATPase and nitrogen assimilation and availability fits well into light-dependent activation. A glucose-dependent regulation by reversible dissociation of the V-ATPase complex has been described for the yeast and the mammalian V-ATPase [176], but could not be observed in plants [177]. Instead, VHA-B1 was reported to interact with hexokinase 1 in the nucleus as part of the glucose-dependent gene-regulation complex [178]. However, this interaction was observed for VHA-B1 in the absence of other VHA-subunits, describing a moon-lighting function of VHA-B1. Interestingly, interaction with glycolytic aldolase was not found in the dataset, though its regulatory function has been described for the V-ATPase in rice [179]. The interaction with the glucose receptor RGS1 might be an alternate way to sense the glucose level in order to regulate the V-ATPase and its ATP consumption.
On the cell-biological level, the data show a clear linkage to cell wall synthesis and is in good agreement with the previously reported cell wall defects of V-ATPase knock out lines affected in the TGN [180]. Actually, the PPI-data indicate a contribution of V-ATPase to the TGN-located complex of ECHIDNA and the YPT/Rab-interacting protein 4B, which is essential for the secretion of cell wall saccharides.
Last but not least, the mechanism of the transport of V-ATPases is less understood. V 0 -subunits such as VHA-a and VHA-c were observed to be transported on a fast track from the ER to the vacuole and thus bypassed the Golgi and TGN, while the V 1 -subunit VHA-E followed the canonical secretory pathway [181][182][183]. The interaction with components of the ER to Golgi transport, such as Cnih1 and Bet12, supports the finding of COP II-dependent transport without ruling out an alternate direct pathway to the vacuole. Furthermore, the interaction found between V-ATPase subunit VHA-c and VAP-27-1 even links the V-ATPase to contact sites between the ER and the plasma membrane [184].

Materials and Methods
Databases-all gene descriptions and GO-terms were bulk downloads from The Arabidopsis Information Resource (TAIR, www.arabidopsis.org, accessed on 16 July 2021). First, a list of all Arabidopsis genome identifiers (AGI) was downloaded from TAIR and used for interaction searches in IntAct and Biogrid. If required, UniProt-IDs were mapped to AGI using the UniProt ID-mapping (https://www.uniprot.org/uploadlists/, accessed on 16 July 2021).
Networks-required data sheets were created using Microsoft Excel (Tables S1-S9); all networks were designed using Cytoscape version 3.8.2 (Network S10). Simple grid layout was chosen for the entire master network, compound spring embedder layout for subnetworks. LsMR-values were chosen as edge-attributes; all other data were node attributes. Cytoscape styles were defined to express the co-expression, subcellular localization and molecular weight (Table 4).

Conclusions
The dataset contains known protein-protein interactions in A. thaliana, which were combined with expression, subcellular localization and GO-association data. Interactions can be screened by gene identifiers and found pairs evaluated based on co-expression and localization data to estimate the probability of interaction. The exemplary analysis of the V-ATPase revealed interactions that were well described in the literature but also others that have not been considered before. Such novel interactions might be worthwhile to investigate further in order to open new perspectives on the characterization of a protein or complex.