How to understand cells well enough to predict evolution: polarity establishment in budding yeast as a case study

A bottom-up route towards predicting evolution relies on deep understanding of the complex network proteins form inside cells. In a rapidly expanding panorama of experimental possibilities, the most difficult question is how to conceptually approach the disentangling of such complex networks. These can exhibit varying degrees of hierarchy and modularity, which obfuscate certain protein functions that may prove pivotal for adaptation. Using the well-established polarity network in budding yeast as a case study, we first organize current literature to highlight protein entrenchments inside polarity. Following three examples, we see how alternating between experimental novelties and subsequent emerging design strategies can construct a layered understanding, potent enough to reveal evolutionary targets. We show that if you want to understand a cell’s evolutionary capacity, such as possible future evolutionary paths, seemingly unimportant proteins need to be mapped and studied. Finally, we generalize this research structure to be applicable to other systems of interest.


Introduction
How cells work and how they evolve is at the heart of cell biology. In this work we will review how cellular architecture ("how cells work") and its evolutionary properties ("how they evolve") are related to each other. Understanding evolution and possible mutational paths of protein networks, and especially the cell polarity network, not only satisfies our curiosity but may also help us understand and possibly predict cancer progression [1]. Every cell consists of many different interconnected functional protein networks (for definitions, see Table 1), such as transcription, translation, or polarity establishment [2]. The network's architecture (for example, which protein binds to or reacts with which other protein) impacts the evolutionary possibilities of a network in multiple ways. For example, hubs, proteins with many binding partners, tend to mutate more slowly [3]. Less connected proteins, which may be deleted in a cell without a detectable change in cell physiology, can permit duplication of other genes and thus promote evolution [4]: duplicates of a gene enable new options for diversification, which facilitate further evolution of a gene/protein and the surrounding network [5][6][7]. Interestingly, many mutations in a cell (from 3% of non-silent mutations in bacteria to 30% in hominids [8]) show very weak or no effect on the cell's function, a phenomenon called neutrality [9]. Thus, proteins that may seem unimportant for how the cell works now, in this environment, may become important when changes occur in the network architecture due to a mutation or switch in environment [10,11].

Degree to which parts of the network are embedded with other parts in the network. In this sense, it can be perceived as the reciprocal of modularity.

Modularity
Potential to group parts of a protein network given a certain representation of the protein network (e.g., in terms of mechanisms, genetic or physical interactions)

Hub protein
Highly connected protein in a network (often essential)

Neutrality
No consequence of a mutation to phenotype (in current environment)

Hierarchy
Clear layering of pathways inside a protein network

Redundancy
Multiple mechanisms that can to some extent interchangeably contribute to the same function

A suitable model system to concretize how these proteins, without a detectable phenotype, shape a network is polarity establishment in Saccharomyces cerevisiae, or budding yeast. During polarity establishment, yeast must choose a direction in which to divide, which involves directing dozens of proteins in a process of breaking its internal spherical symmetry (see e.g., [12]). The organism itself is well studied and many network properties have surfaced, such as hierarchy and presence of hubs [2]. Here too, neutrality is pervasive, as only 40% of homozygous gene deletions for the entire organism initially had obvious phenotypes [13]. Moreover, the environment has been shown to have a notable influence on neutrality, as lethal heterozygous deletions can be compensated by poor medium [14]. As a general rule, both the network architecture and the environment can mask the function of many proteins. In addition, the budding yeast field has over time been successful in categorizing most proteins to reconstruct several quasi-modular networks interpretable as distinct biological functions, one of which is polarity [15]. Even inside polarity a grouping exists (based on time spent at the polar cortex) that forms submodules [16]. We will focus on the establishment of polarity (symmetry breaking), which has also been a topic for evolutionary studies.
For example, in [17], a mutant strongly defective in polarity establishment was experimentally evolved and found to recover remarkably reproducibly; e.g., the first rescuing mutation to sweep the population was always the same. Because of this tractability of the adaptations, network structures within the polarity network that facilitated evolution could be concretely interpreted in terms of redundancies [18]. In another approach to determine the flexibility of the polarity network, historical evolution was studied for 40+ proteins in almost 300 fungal species in [11]. Again, the polarity network exhibited sufficient modularity so that studying its evolution separately from other functions still yielded interpretable results. For example, the authors showed that polarity network size is shaped in part by the fungal lifestyle (e.g., uni- or multicellular). This motivates the use of the yeast polarity system for studies in network organization (with properties such as hierarchy, modularity/connectivity and redundancy that can 'hide' proteins) as well as evolution.
To illustrate how to move from detecting the strong phenotypes to the important hidden features, we first present a concise review of the polarity protein network in the form of a Venn diagram, ideal for depicting a hierarchical and semi-modular protein grouping. The feasibility of deciphering a protein's role in this Venn diagram turns out to depend on how deeply the protein is embedded inside the network, as does the feasibility of attributing genotypes to phenotypes. Therefore, we present in the sections thereafter three quests towards complete understanding of a protein (class) with varying depth of embedding, and consequently, with the current stage of the research ranging from far advanced, through advanced, to relatively preliminary, respectively. The three examples serve to show how improved understanding of the non-trivial parts of the networks can elucidate evolutionary trajectories. Ultimately, we believe that the work done in yeast over many decades by many researchers contains a common recipe that is applicable and useful for many protein networks, as expanded upon in the outlook.

Polarity overview
Within the yeast polarity network, four pathways to polarize exist which cannot easily be considered modular. Their interconnectivity can be conveniently visualized in the form of a Venn diagram (Figure 1). These pathways are hierarchically set up, in the following order: the mating pathway at the top, then the bud scar pathway, followed by the reaction-diffusion (RD) pathway and finally the actin pathway. In short, their function boils down to the act of condensing the GTPase Cdc42 bound to GTP molecules (i.e., active Cdc42) to one point on the plasma membrane, which can signal downstream effectors to proceed with the cell cycle [19]. To prevent premature or overdue localization of active Cdc42 and allow some influence on the hierarchy of pathways, a fifth pathway exists to control the previous four, namely the timing pathway. The next section summarizes the most important interactions in and across all pathways, starting with the timing cue, before expanding upon the three examples. As a side note, we have done our best to include all relevant papers, but apologize for important papers we have missed (see Table S1 for references).

Timing: the control knob
During isotropic growth in G1, active Cdc42 localization is suppressed by overactivity of its associated GTPase activating proteins (GAPs) and sequestration of its guanine nucleotide exchange factor (GEF). The consequence of both circumstances is the vast abundance of inactive Cdc42, which is bound to GDP instead of GTP [20], rendering it impossible to signal the polarity cue. The purpose of this pathway (see top dark-purple region in Figure 1) is hence to reduce GAP activity and release the GEF at the right time, which must be in response to important physiological parameters that indicate the readiness of the cell: sufficient protein production, a sufficient size, and sufficient nutrition.
The physiological state of the cell enters the equation through nuclear levels of cyclin Cln3. Upon sufficient nutrition and size, Cln3 levels rise either more directly through higher Cln3 mRNA abundance [21], or more indirectly through Ydj1 disturbing Cln3 localization by Whi3 [22], the latter also being an inhibitor of Cln3 mRNA translation [23]. The arrival of nuclear Cln3 allows binding partner and cyclin-dependent kinase Cdc28 [24] to phosphorylate Whi5, which had inhibited expression of Cln2, another cyclin [25,26]. Cln2 can then reinforce its own expression, consolidating the original Cln3 signal [27]. Now, the Cdc28-Cln2 complex can distribute the physiological signal to the aforementioned targets, the GAPs and the GEF. The kinase Cdc28 phosphorylates all four GAPs Bem2, Bem3, Rga1 and Rga2 [28][29][30][31] and Far1, which was keeping the GEF Cdc24 in the nucleus [32]. Now cytoplasmic levels of active Cdc42 can rise, leading to polarity establishment through subsequent pathways.
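The signal flow described above can be written down as a small directed graph to check which downstream targets a Cln3 signal reaches. The sketch below is only an illustration under simplifying assumptions: the node labels (such as "GAP phosphorylation" and "cytoplasmic Cdc24") and the helper `reachable` are hypothetical names introduced here, every regulatory step is reduced to an unweighted "promotes" edge, and no kinetics are modelled.

```python
from collections import deque

# Signal-flow edges paraphrased from the paragraph above. Every edge is read
# as "promotes"; relief of an inhibitor (Whi5, Far1) is collapsed into a
# promoting edge. Node labels are illustrative assumptions, not standard names.
TIMING_EDGES = {
    "nuclear Cln3": ["Cdc28-Cln3"],
    "Cdc28-Cln3": ["Whi5 phosphorylation"],
    "Whi5 phosphorylation": ["Cln2"],
    "Cln2": ["Cln2", "Cdc28-Cln2"],  # self-loop: Cln2 reinforces its own expression
    "Cdc28-Cln2": ["GAP phosphorylation", "Far1 phosphorylation"],
    "GAP phosphorylation": ["active Cdc42"],
    "Far1 phosphorylation": ["cytoplasmic Cdc24"],
    "cytoplasmic Cdc24": ["active Cdc42"],
}


def reachable(graph, start):
    """Return every node reachable from start by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


downstream = reachable(TIMING_EDGES, "nuclear Cln3")
print(sorted(downstream))
```

In this representation, "active Cdc42" is reached from "nuclear Cln3" along two branches (GAPs and GEF), which mirrors the two-pronged release described in the text.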
Importantly, the completion of the timing pathway causes the hierarchy of the subsequent pathways to change. While the mating pathway is otherwise dominant, the kinase Cdc28 phosphorylates Ste5, a crucial hub in the mating pathway, to stop mating in its tracks [33][34][35]. In the following discussion of the mating pathway, the situation is considered where the timing pathway has not overridden its behavior.

Mating: heavily cross-linked
The mating pathway is the dominant force across the four symmetry-breaking pathways. While polarization in a random orientation is possible after the timing cue (see the section on Reaction-Diffusion further on), the presence of pheromones of the opposite mating type (a or α) should redirect the Cdc42 localization to the side of the pheromone signal. This process revolves around Ste5, as also depicted in the left, blue-grey circle of Figure 1.
Briefly put, once pheromones bind membrane proteins Ste2 and Ste3 [36,37], Ste4 is released from the membrane [38,39] and binds Ste20 and scaffold Ste5. This scaffold binds Ste7, Ste11 and Fus3, which are activated by sequential phosphorylation [40][41][42]. Fus3 may inhibit the GAPs Bem2 and Bem3 [29], while Ste5 binds the GEF Cdc24 [43], substituting for the missing output of the timing pathway to stimulate activity of Cdc42. While this simplified view would suffice to redirect the Cdc42 localization, the mating pathway is much more intertwined with the other pathways than seemingly necessary, particularly with the actin pathway. The abundant mechanistic redundancies result in a more complex picture, obfuscating the role of the proteins involved. For example, active Cdc42 stimulates Ste11 phosphorylation/activation [44]. Another form of positive feedback, as well as a bridge to the actin pathway, is the Cdc42 recruitment of formin Bni1 [45]. The resulting nucleation of actin cables may transport Ste5-GEF Cdc24 complexes [46], possibly also through Bem1. This scaffold co-immunoprecipitates with Act1 [47] and Far1 [48], which is bound to the GEF Cdc24 [49], but is itself in turn also bound to Bem1 [50]. Another actin cross-link is the phosphorylation and localization of Bni1 through Fus3 [51]. Clearly, care must be taken in assigning roles to different proteins, as many are overloaded.
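To make the notion of a hub concrete, the physical associations named in the paragraph above can be tallied per protein. This is a minimal sketch, not a curated interactome: the edge list contains only the pairs mentioned in this paragraph, the reading of the Cdc24-Bem1 link is our interpretation of the sentence, and the names `MATING_INTERACTIONS` and `degree` are hypothetical.

```python
from collections import Counter

# Physical associations named in the paragraph above (binding, recruitment or
# co-immunoprecipitation). Direction and strength are ignored.
MATING_INTERACTIONS = [
    ("Ste4", "Ste20"),
    ("Ste4", "Ste5"),
    ("Ste5", "Ste7"),
    ("Ste5", "Ste11"),
    ("Ste5", "Fus3"),
    ("Ste5", "Cdc24"),
    ("Cdc42", "Bni1"),
    ("Bem1", "Act1"),
    ("Bem1", "Far1"),
    ("Far1", "Cdc24"),
    ("Cdc24", "Bem1"),  # assumed reading of "itself in turn also bound to Bem1"
]


def degree(interactions):
    """Count the number of listed partners for every protein."""
    counts = Counter()
    for protein_a, protein_b in interactions:
        counts[protein_a] += 1
        counts[protein_b] += 1
    return counts


degrees = degree(MATING_INTERACTIONS)
for protein, n_partners in degrees.most_common(3):
    print(protein, n_partners)
```

With this edge list, Ste5 has the most partners, consistent with its description as a hub of the mating pathway; a different or more complete list could of course change the ranking.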

Bud scar: mostly modular and ordered
In absence of a mating cue, the timing pathway reduces GAP activity and releases the GEF, while the mating pathway is repressed. Under the new hierarchy, the bud scar pathway is normally dominant. The scar refers to leftover proteins from the previous division, named septins [52,53]. This spatial cue can be exploited for polarity establishment; a new bud forms adjacent to the scar (axial budding, haploids) or also at the opposing side (bipolar budding, diploids) [54,55]. The bottom, dark blue circle of Figure 1 represents this path from septins to Cdc42 recruitment graphically. More background information about the core bud scar protein group Bud1 to Bud5 is discussed separately in one of the three case studies, and only a brief overview of the pathway as a whole is discussed in this section.
After Bud5, localization of Bud2 [67], the GAP for Bud1 [68], follows to complete the control of the GTPase cycle of Bud1. Finally, Bud1 localizes GEF Cdc24 and indirectly Bem1 [69], to redirect the pattern formation made possible after the timing cue. As linkage of Cdc42 GAP Rga1 to septins prevents re-use of the previous location [70], the new bud forms adjacent to the bud scar.
As a whole, the bud scar pathway is not completely modular either. Aside from nudging the reaction-diffusion pathway (see next section), an example of a cross-link is that Bud8 and Bud9 are delivered by actin transport [71]. The highest position in the hierarchy in absence of the mating cue is also not absolute; multiple ways to promote the subsequent reaction-diffusion pathway exist, such as deletion of Bud1 and Bud8 [72], Axl2 and Rax1 [73], or Bem1 [17]. Therefore, it has been possible to retrieve the information discussed in the following section.

Reaction-diffusion (RD): ample redundancy
Even in absence of chemical or spatial cues, the shift in balance towards activation of Cdc42 induced by the timing pathway still provides the conditions for swift symmetry breaking. As accurately modelled in [74], the strong positive feedback generated by the Bem1-GEF Cdc24 complex is sufficient, making polarity success rather insensitive to GAP abundance. More details on the GAPs are uncovered in [18] and placed in a broader context in the case study further on.
What makes this pathway special is the limited number of proteins that are unique to this pathway, as seen from the central, emerald circle in Figure 1, namely only Cla4 and Rdi1. The latter is the less cross-linked of the two, providing a possible justification for referring to the WT mechanism as the Rdi1 polarity mechanism, as in [75]. Cla4 is more context-dependent, possibly having two opposing roles, promoting and inhibiting polarity [76,77,18]. Yet both Rdi1 and Cla4 are dispensable for polarity [76,78].
An even stronger addition to the redundancy within this pathway is on the positive feedback side. Without Bem1, generic rescuing feedbacks suffice [18], among which Cla4 could account for 20% of their function [79]. More feedbacks may be found in the GAPs (see GAP case study) through actin transport as described in e.g., [75]. This brings us to the actin pathway as the final layer to discuss.

Actin: the mysterious auxiliary layer
The actin pathway (right-most green circle in Figure 1) has featured several times already in the previously discussed pathways, but its individual role is still quite uncertain. Yeast formin Bni1, which nucleates actin cables, binds active Cdc42 [45] and is known to be involved in exo- and endocytosis [80,81]. This suggests transport of polarity proteins from and to the presumptive bud site. The resulting actin pathway has been confusingly implicated in two opposing roles: promoting Cdc42 polarization, see e.g., [82,75,83], as well as negatively impacting Cdc42 polarization [84,85].
A way to reconcile these findings is that actin transport contributes to a process promoting Cdc42 polarization but without relying on significant transport of Cdc42 itself. As mentioned in the mating pathway, Bem1 and Act1 co-immunoprecipitate [47] suggesting that Bem1, and concordantly its multiple binding partners, might get transported through the actin pathway. However, in absence of Bem1 80% of the positive feedback is still unidentified [79], which may very well be actin-related. Instead, a prime candidate is the GAP group, which is known to bind the epsin-coating of actin cables involved in endocytosis [86]. This is further discussed in the case study on the GAPs. In any case, it is quite difficult to decipher the actin pathway, in large part due to its low positioning in the yeast polarity hierarchy.

Case studies
In the introduction, we explained the need to determine the (potential) functions of a protein inside a complex network even when these are normally hidden, as evolution may exploit them later. The following sections illustrate the reconstitution of the mechanistic details for three proteins or protein classes that are currently at different stages in their discovery (see graphical mechanistic summary in Figure 2): the Bud1-5 bud scar protein subset, the GAP proteins for Cdc42, and finally Nrp1.
In any case, formation of the holistic picture for a particular protein is a result of decades of work. The three cases encompass two common routes towards complete understanding. On the one hand, technical advances that improve detection of phenotypes were the most prominent method of progress for the bud scar proteins. On the other hand, the GAP protein and Nrp1 cases show that progress can rely more on very specific designs of experiments than on new technology. Together, these cases illustrate the foundation under the general strategy to approach networks put forward in the outlook.

Bud scar proteins
One of the strongest phenotypes to observe for budding yeast is the location of the next bud. Already more than half a century ago, it was documented that S. cerevisiae exhibits two possible budding patterns [54]. Normally for haploids, the next bud grows next to the previous division site (axial budding), whereas diploids can also pick the location opposite of the previous site (bipolar budding). Changes in this pattern are clear phenotypes, and are therefore a useful detection tool in bulk mutagenesis screens. In 1991, five genes were identified that affect (seemingly as their sole phenotype) the bud site selection, and were therefore named BUD1 (formerly known as RSR1 [58]), BUD2, BUD3, BUD4 and BUD5 [55,56]. These genes and their associated proteins are therefore straightforwardly localized in the bud scar circle in the Venn diagram (Figure 1).
The next step, retrieving information on physical interactions, is more elaborate. For example, authors in [56] reason the physical mechanism of Bud5 (amongst others a GEF for Bud1) based on sequence information, which is further substantiated in [68], together with the GAP action of Bud2. But with the advent of GFP [87], it also became possible to get in vivo spatio-temporal information of yeast proteins, later even in bulk [88]. Now armed with the tool of fluorescence microscopy, substantial progress was made for validating the roles for Bud2 and Bud5 [67].
The final confirmation on the mechanisms comes when zooming into the domains of a protein, by expressing truncated versions or carefully chosen point mutations. For example, the domains responsible for how Bud5 affects whether budding is axial or bipolar are described in [89]. But for other proteins it was necessary to consider in greater detail the protein network that the bud genes compose. Therefore, it has taken more time to reverse-engineer the details for Bud3 and Bud4. Authors in [62] combine their findings with earlier literature to demonstrate that septins from the previous division localize Bud4, which can then bind Bud3. Subsequently, Bud4 binds Axl2, and after this Axl1 as well. Domain analysis of Bud3 further uncovered more details [90] to show Bud3 can serve as a GEF for Cdc42 (but early in G1, before Start), which is important for the Bud4-Axl1 link.
This leaves only a few mechanistic unknowns. For example, the exact recruitment of Bud2 as depicted in Figure 2 is putative, as it is unclear how Bud2 is recruited even in absence of Bud1 or Bud5 [64]. Bud1 can recruit itself by dimerization and Cdc42 at the bud site of the membrane [57]. In the same paper, it is shown that while Bud5 also recruits Bud1, the former may only indirectly be involved in the Bud1 dimerization, considering the full GTP-GDP cycle is critical for appropriate dimerization. This suggests guidance of Bud2 for the self-recruitment, but the exact order of Bud1 binding and hydrolysis of its GTP is open for interpretation, given the alternative model in [64].
As an alternative representation of the current state of research of the bud scar proteins, Figure 3 depicts an ordering in the form of a mind map. For clarity, only Bud1 is considered. As can be seen, the bud scar protein case is relatively well advanced, with ample coverage of all categories.

GAPs
Central to the function of GTPase Cdc42 are proteins that promote GTP hydrolysis, i.e., its GTPase activating proteins. Bem3 was the first to be identified [91], followed by Rga1 and Rga2 [92][93][94]. A summary of Bem3 studies/information is given in Figure 4. The specific molecular function of GAPs allowed validation in vitro, where Cdc42-GTP was incubated with a GAP to determine whether the amount of GTP indeed decayed faster than without a GAP. In [91] Bem2 was not found to exhibit detectable GAP activity, and not until [95] could Bem2 be convincingly considered a GAP as well.
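The logic of such an in vitro assay can be illustrated with simple first-order kinetics: if a GAP adds a hydrolysis route on top of the intrinsic GTPase activity, the GTP-bound fraction should decay faster in its presence. The sketch below assumes single-exponential decay with additive rate constants; the function name `gtp_bound_fraction` and the rate values are hypothetical and chosen purely for illustration, not taken from the cited measurements.

```python
import math


def gtp_bound_fraction(t, k_intrinsic, k_gap=0.0):
    """Fraction of Cdc42 still GTP-bound after time t.

    Assumes first-order hydrolysis where the GAP contributes an additional
    rate constant k_gap on top of the intrinsic rate k_intrinsic.
    """
    return math.exp(-(k_intrinsic + k_gap) * t)


# Hypothetical rate constants (per minute), for illustration only.
K_INTRINSIC = 0.05
K_GAP = 0.5

for minutes in (0, 5, 10, 20):
    without_gap = gtp_bound_fraction(minutes, K_INTRINSIC)
    with_gap = gtp_bound_fraction(minutes, K_INTRINSIC, K_GAP)
    print(f"t={minutes:>2} min  no GAP: {without_gap:.3f}  with GAP: {with_gap:.3f}")
```

A candidate without detectable GAP activity would correspond to `k_gap` being indistinguishable from zero, in which case the two curves overlap.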
After their establishment as a GAP, the localization of Bem3, Rga1 and Rga2 was determined using either antibody staining or GFP-tagging [96]. These GAPs colocalized with Cdc42 at the bud site (although Rga1 is slightly more dispersed). Bem3 is also found in Spitzenkörper-like structures, but seemingly after polarity establishment during polarized growth [97]. The GAP localizations affirm their role in polarity, although they also seemed somewhat redundant, as a triple mutant is still viable [94].
More information on GAPs was gathered through their interactions. Bem2 and Bem3 were found to be associated with the mating pathway [29], Bem3, Rga1 and Rga2 relate to the actin pathway [86,98], while [70] suggests a link between Rga1 and the bud scar pathway. However, several open questions remain. For example, why is there more than one GAP? Why and how are these distributed across multiple pathways?
More clues were to follow from the tedious deciphering of the mechanistic action of the GAPs. The difficulty behind this problem is clearly elucidated by the model of [74]. There, authors show the dominance of the Bem1-mediated positive feedback minimizes the phenotypical influence of varying GAP concentration. Consequently, this led to the realization that GAP details only emerge in a Δbem1 background.
In [18] a proposed mechanistic model for the GAPs was validated in this background. The idea is that GAPs are temporarily retained on the membrane by Cdc42 during the GTP hydrolysis process. Only in the location in the cell with high membrane concentrations of Cdc42-GTP will this lead to a local depletion of available GAPs, which cannot be compensated by the cytosolic diffusive flux. Elsewhere on the membrane, this is not a problem and Cdc42 is promptly inactivated and recycled, leading to only one spot on the plasma membrane where active Cdc42 accumulates. A generic positive feedback mechanism for Cdc42 (due to e.g., Cla4) completes the symmetry breaking. This provides a good example of the added value of revealing non-obvious mechanistic details. This GAP model, which is obscured in presence of Bem1, allowed authors in [18] to provide a detailed explanation of the evolutionary trajectory observed in [17].
Yet, this may not be the complete role of the GAPs. Their mechanism partly supplements the need for the Bem1-mediated positive feedback, but a further deletion of CLA4 and BEM3 reveals that 80% of the positive feedback originates elsewhere [79] (p. 101). Here the interconnectivity with another pathway can surface, which initially seemed elusive and redundant.
As discussed in the actin pathway section, there could be a critical link between actin and the GAPs. The aforementioned local depletion of unbound GAPs may be reinforced by actin transport [79] (p. 34). Epsin coatings of endocytic vesicles colocalize with polarized growth and bind GAPs [86,98]. Moreover, active Cdc42 releases the auto-inhibition of kinases Ste20 [44] and Cla4 [99], both of which phosphorylate myosins 3 and 5 to ultimately lead to activation of the Arp2/3 complex [100], critical for endocytosis. In this way, sites of active Cdc42 can promote the endocytosis which may reinforce the stability of the site, providing the feedback needed to establish polarity (model F in [101]). The combination of interactions of recyclable GAPs and active Cdc42 would also fulfil the requirement of actin-mediated recruitment formulated in [102], provided the GAPs diffuse slowly on the membrane [84].
Figure 4 graphically summarizes the available information for one of the GAPs, Bem3. However, there is still room for improvement with experiments whose design can only now be established. As with the bud scar pathway, subtle information may be retrieved through experiments at the domain level, to test e.g., the GAP trafficking hypothesis. One could remove the link between GAPs and actin through deletion of epsins ENT1 and ENT2 (in the Δbem1 Δcla4 background) and replace them with only a weakly expressed ENTH domain of Ent1 (truncation). Modulating this expression should show how strong this effect is. More information on GAP interactors in this role may also be retrieved by using this mutant as the crippled starting point in an evolution experiment, akin to [17].
Furthermore, the resulting scatter of the GAPs across the other polarity pathways currently leaves room for interpretation and speculation. Given Figure 1, the components that are most shared also seem the most critical. This is most obvious when noting that actin pathway components are not just essential for polarity establishment; for example, Rho1 is needed for cell wall synthesis and integrity later on during polarized growth [103][104][105]. Extrapolating, the location (wedged between pathways) of the GAPs in the Venn diagram may suggest an important (but not essential) function for each of them in establishing polarity.
Hypothetically, the GAPs might serve as an evolutionary control knob to mediate the relative hierarchy between pathways. This could be favourable in situations where different hierarchies are optimal, such as when mating is infrequent (e.g., the diploid state becomes the default), or when the bud scar is not often used (frequent sporulation). Strategic dispersal of multiple GAPs may therefore provide more handles for the cell to optimize the pathways than simply having one GAP in larger copy numbers.

Nrp1
After discussing two well-studied protein classes, we address an underexposed protein, namely Nrp1. With this we would like to show the difficulties and possibilities that are still open in a case when the most straightforward experiments do not provide obvious, interpretable phenotypes. Although its deletion does not have a detectable phenotype in standard lab conditions, Nrp1 is evolutionarily important [17,106]. Usually, Nrp1 is mentioned merely peripherally in articles as bycatch in studies with an alternative focus. Therefore, a chronologically ordered literature overview does not make sense here, as very little of the research findings actually builds on previous work. An overview of the Nrp1 knowledge is given in Figure 5, where it is apparent that there are some gaps in our understanding. Nrp1 was first described by [107], who gave the protein its name. NRP1 stands for 'Asparagine rich protein'; the name refers to the region of the protein sequence that has many asparagines (one-letter code: "N"). Genes are often named for their defining characteristics or functions. Reynaud and co-workers [107] did not find a phenotype or function for Nrp1, hence the seemingly nondescript name.
Nrp1 has been linked to stress response and stress granules. For example, it has been implicated in the response to glucose and oxygen [108][109][110], although the precise mechanism or function of Nrp1 in this response is not known. A hypothesis is that Nrp1 forms an aggregate or prion (like a stress granule) through its low-complexity domains, because repeated asparagine sequences such as that in Nrp1 are often found to form prions for other proteins [111]. Nrp1 itself also seems to form a prion. In addition, Nrp1 can potentially bind and regulate mRNA. The mRNA regulation occurs through another documented domain, namely an RNA binding motif [111]. This implies that Nrp1 can bind specific mRNA sequences; however, no specific mRNAs have been identified so far [112].
The supposed link between Nrp1 and the polarity network was found by authors in [17]. They show that null mutations in NRP1 could rescue a bem1Δ. This prompted interest in a search for the function of Nrp1. Another connection to the polarity network and a possible explanation for what was found in [17] is the synthetic lethality of NRP1 with CLA4 [113].
Diepeveen et al. [11] have found that Nrp1 is highly conserved within the Ascomycota, which hints at a function of some importance. One typically expects that essential genes are the most conserved parts as they cannot easily be mutated [3]. However, this is not always the case; for example, CDC42 is conserved in most fungal species (and also outside of fungi) but it is not present in others [11,114,115]. Interestingly, NRP1 is more conserved than several essential genes [11]. The reason for this conservation is unclear. Conserved sequence does not mean conserved function or interactions. So, although a highly similar Nrp1 protein is present in a species, this does not mean it has the same function in that species.
Whi3 offers an instructive example of the kind of experiments performed on a protein that is similar in sequence to Nrp1 and probably also functionally related to it [116]. Whi3, like Nrp1, has an RNA recognition motif and a low-complexity / repeated region. A difference is that Whi3 has repeated glutamines and Nrp1 has repeated asparagines, though these amino acids are chemically similar. In [116], the authors remove these domains and check functionality and localization of Whi3 in Ashbya gossypii. Whi3 in A. gossypii and budding yeast share important functionality [23,116]. Whi3 localizes to stress granules, indirectly regulates the G1/S phase transition via Cln3 and affects many mRNAs in yeast [23]. A similar role may be hypothesized for Nrp1. It also localizes to stress granules [109], affects the G1 exit [17], and may bind some mRNAs [112]. Different from Whi3, Nrp1 does not have a clear RNA target (like Cln3) that explains its functioning.
Here we will provide some research paths from different areas of research to test our hypothesis from the previous paragraph. First, from a more chemical perspective, a strategy is to look at the structure of Nrp1. The order of the domains is known, but it is unclear whether the low-complexity domain will fold, as such domains have been shown to be disordered [117]. This unstructured part of the protein may move about freely. It would be interesting to see whether the C-terminal part of the protein does again fold specifically. One would be able to visualize the structure of the protein by way of NMR [118]; however, it might be difficult in this case to determine the structure if it is indeed unstructured/moving.
Second, from a cell biological perspective, deletion studies are often used. Unfortunately, this does not work as easily for NRP1, as the single deletion does not yield any phenotype. However, a phenotype can be found in a different genetic background, namely bem1Δ [17]. Analyzing what happens in these cells will help to understand Nrp1. This can be done at the population level with a fitness assay, and at the single-cell level with microscopy. A previous high-throughput study shows that Nrp1 is present in the cytoplasm [88]. A more detailed study focusing on polarity establishment can give more insight into the functioning of Nrp1, especially when combined with the previous approach (different genetic backgrounds).
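As an illustration of how a population-level fitness assay could be quantified, the following minimal Python sketch estimates an exponential growth rate from optical density (OD) readings and expresses it relative to a reference strain. It assumes that the readings come from the exponential growth phase; the function names and the example readings are hypothetical and are not taken from the cited studies.

```python
import math


def growth_rate(times_h, od_values):
    """Estimate the exponential growth rate (per hour).

    Assumption: the readings are from exponential phase, so ln(OD) is
    linear in time and the least-squares slope is the growth rate.
    """
    if len(times_h) != len(od_values) or len(times_h) < 2:
        raise ValueError("need at least two matching time/OD points")
    logs = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def doubling_time(rate_per_h):
    """Doubling time in hours for a given exponential growth rate."""
    return math.log(2) / rate_per_h


def relative_fitness(rate_mutant, rate_reference):
    """Fitness of a strain expressed as a ratio of growth rates."""
    return rate_mutant / rate_reference


if __name__ == "__main__":
    # Hypothetical OD readings, for illustration only.
    times = [0.0, 1.0, 2.0, 3.0]
    reference_od = [0.10, 0.15, 0.22, 0.33]
    mutant_od = [0.10, 0.13, 0.17, 0.22]
    r_ref = growth_rate(times, reference_od)
    r_mut = growth_rate(times, mutant_od)
    print("relative fitness:", relative_fitness(r_mut, r_ref))
```

Defining fitness as a ratio of growth rates is only one possible choice; competition assays or lag-phase measurements would require a different analysis.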
By the same token, a more in-depth analysis of the RNA binding ability of Nrp1 is also relevant. Again, probing Nrp1 in different environments may give different results and reveal different specific RNAs that are bound. RNA chip-seq has become easier to execute over the last few years [119], so the proposed experiments may now be feasible.
The ultimate goal is to find a molecular mechanism for Nrp1, but this goal cannot be reached without knowledge of other aspects of the network. Nrp1 is a good example of a protein that is buried deep in the network, which makes the investigation challenging. However, it is worthwhile to dig deep and find the hidden functionality of proteins like Nrp1 that seem neutral at first glance but have a significant evolutionary role.

Outlook
Now that we have explored the past and present of the polarity network, it is time to look to the future. Which questions in the field are still open? What are the research opportunities? Although the yeast polarity network has been studied for many years, many unsolved mysteries remain. For example, the study by [17] gave much insight into the evolution and possible back-up mechanisms of the polarity network, but gave rise to the question of how Nrp1 is involved.
Much research has been done under ideal lab conditions, which yields results specific to those conditions. It would be interesting to also design experiments that explore the genotype-phenotype map in different conditions. This would make it possible to find previously hidden components that do not show up under standard lab conditions. Another possibility is changing the environment together with the expression level of specific genes, which can affect fitness [120], as has been shown for Cdc42 [18].
In vitro work is also an important next step to isolate parts of the network and see whether these parts can independently perform a function [121]. As an example in budding yeast, in vitro reconstitution of a She-protein mediated mechanism of asymmetric mRNA transport revealed subtle details that were hard to demonstrate in vivo [122].
From an evolutionary standpoint, it is interesting to see how the network in current strains came to be and what variation can be found in the wild. The variation within a wild population shows the spread that is available in the genotype map. It provides insight into which genotypes are preferable in certain environments. Besides the genotype, which reflects the history of a population, the expression levels of proteins, for example, can also be inherited [123]. Such epigenetic inheritance can be important for the reaction of an organism to stress and other environmental factors [124].
The yeast polarity examples discussed also delineate a general route forward in dismantling complexly connected protein networks. A graphical overview is depicted in Figure 6. The arrowheads indicate the type of information obtained from proteins inside the network. Higher degrees of information are successively more difficult to obtain, and usually rely on first reaching the previous level. In this way, we work our way deeper into the protein network and slowly but steadily elucidate beyond the obvious phenotypes. In the top category, much work has been done determining the effects of simple deletions. For budding yeast, a large knock-out database has been available for more than two decades [13]. The ease of experimentally parallelizing the deletion construction even allows for ample double deletion data constituting genetic interactions, which for multiple model systems is bundled in the BioGRID database [125]. Yet, there is a myriad of combinatorial possibilities for gene deletions, and it has become clear from the GAP and Nrp1 examples that these need to be explored intelligently, rather than by brute force. Well-designed starting points for evolution experiments, such as those that revealed that GAPs and Nrp1 genetically interact with Bem1 [17], or SATAY assays [126], can elucidate interactions of genes of interest with important domains that only surface with strong phenotypes in the right genetic background.
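To make the notion of a genetic interaction from double deletion data concrete, the sketch below scores epistasis in Python. It assumes the commonly used multiplicative null model, in which two non-interacting deletions yield a double-mutant fitness equal to the product of the single-mutant fitnesses (all relative to a wild type with fitness 1). The function names, the tolerance and the example values are hypothetical and are not taken from the cited data sets.

```python
def epistasis(w_a, w_b, w_ab):
    """Deviation of the double-mutant fitness from the multiplicative
    expectation: epsilon = W_ab - W_a * W_b.

    Assumption: all fitness values are relative to wild type (= 1).
    """
    return w_ab - w_a * w_b


def classify_interaction(eps, tolerance=0.05):
    """Label an interaction; the tolerance is an arbitrary illustrative
    cutoff, not a statistically derived threshold."""
    if eps > tolerance:
        return "positive"
    if eps < -tolerance:
        return "negative"
    return "none detected"


if __name__ == "__main__":
    # Hypothetical fitness values for two genes, for illustration only:
    # gene A deletion is costly, gene B deletion is neutral on its own,
    # but deleting B restores much of the fitness lost by deleting A.
    w_a, w_b, w_ab = 0.5, 1.0, 0.9
    eps = epistasis(w_a, w_b, w_ab)
    print(classify_interaction(eps), eps)
```

In this hypothetical case, a deletion that looks neutral on its own only reveals its effect in the right genetic background, which is the situation described above for seemingly unimportant proteins.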
However, genetic interactions can be very indirect. Epistasis is known to act globally even on unrelated networks during adaptation [127], so gathering physical information is a welcome next step. Efficient two-hybrid screens date back more than three decades [128]: two proteins, fused to a DNA binding domain and a transcriptional activator respectively, promote transcription of a reporter gene when they interact. To follow interactions across the cell (see also next paragraph), FRET imaging, where two fluorescent fusion proteins cause the emission spectrum to shift when in close proximity, has also been extensively used [129], but this method also relies on the proteins of interest tolerating protein fusions. More subtle modifications for tagging are used for coimmunoprecipitation [130], which is still heavily used in budding yeast [131]. Once physical interactors have been established, the interactions and their relevant binding sites can be further confirmed by point mutations that influence the binding, as in e.g., [73] for Bem1. This level of precision also means a move towards low-throughput data gathering, but can be useful to conjecture how cells evolved after the deletion of BEM1 [18].
While the physical interactions constitute a rudimentary form of the protein network, the next step in understanding originates from adding spatio-temporal information. For example, Bud3 is a GEF for Cdc42, but only during early G1 phase [90], before the timing pathway gives the cue for symmetry breaking. If the function of interest is symmetry breaking, Bud3 can therefore be excluded from the network overview. Generally, protein localization in vivo has been established in a high-throughput manner with fluorescent protein fusions, particularly with GFP variants [88]. As aforementioned, immunostaining provides a similar option for tagging and hence localization, but requires fixation of the cells, making the temporally transient contributions of components more difficult to trace. If temporal rather than spatial information on the importance of a protein is of the essence, ingenious solutions exist that conditionally disable the protein of interest, after which the results can be swiftly observed. Examples include degron systems [132,133] and optogenetic tools [134]. Finally, as with Nrp1, localization of mRNAs may be the important function; options to trace these are also available through in situ hybridization, as described in e.g., [135].
Ultimately, when all previous steps have been performed, it becomes possible to take the next step towards complete understanding: mechanistic understanding. If insufficient information is available, modelling can lead to uncertain and contradictory results, as is the case for the role of actin in polarity establishment [84,75,83]. If possible, the most unambiguous results would come from bottom-up approaches, such as modelling Michaelis-Menten kinetics for metabolism (e.g., [136] (pp. 165-180)) or, in case proteins cannot be assumed to be uniformly distributed, solving reaction-diffusion systems, as done for polarity in [74,18]. The more complete understanding of the protein network is then put to the test by in vitro reconstitution.
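As a small illustration of such a bottom-up model, the Python sketch below implements the Michaelis-Menten rate law, v = Vmax·S/(Km + S), and follows the depletion of a substrate in a well-mixed volume with a simple explicit Euler scheme. The parameter values, function names and step size are illustrative assumptions and do not describe any specific enzyme or any model from the cited works; a spatial (reaction-diffusion) model would additionally require a diffusion term.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten reaction rate for substrate concentration s."""
    return vmax * s / (km + s)


def simulate_substrate(s0, vmax, km, dt, steps):
    """Integrate dS/dt = -v(S) with explicit Euler steps.

    Assumptions: a well-mixed system and a time step dt small enough
    for Euler integration to be adequate. Returns the list of
    substrate concentrations, starting with s0.
    """
    s = s0
    trajectory = [s]
    for _ in range(steps):
        # Clamp at zero so numerical error cannot make S negative.
        s = max(s - mm_rate(s, vmax, km) * dt, 0.0)
        trajectory.append(s)
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units, for illustration only.
    traj = simulate_substrate(s0=10.0, vmax=1.0, km=2.0, dt=0.01, steps=1000)
    print("substrate remaining after 10 time units:", traj[-1])
```

At high substrate concentration the rate saturates near Vmax, while at low concentration the decay becomes approximately exponential; comparing such predictions with measurements is one way to test whether the assumed mechanism is sufficient.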
In summary, the general path to full network understanding as outlined in Figure 6 brings us from genetic and physical interactions to visualizing precise protein dynamics, modelling and full reconstitution. While initially even constructing the genetic interaction map was a tedious chore, high-throughput studies and bioinformatics tools continue to facilitate the gathering of information. Considering the speed of technological advances, gathering the data necessary for network understanding is becoming feasible for many more functions and organisms. This marks the relevance of establishing a generalizable and efficient workflow to obtain the right network data and use it for understanding and predicting its evolution.

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/s1, Table S1: Literature supporting information underlying Figure 1.

Acknowledgments:
We would like to thank Enzo Kingma and Leila Inigo de la Cruz for careful reading of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Table S1 shows the corresponding literature references for the information in the Venn diagram in Figure 1. In addition, gene essentiality was obtained from [137].