Evolutionary Analysis of the B56 Gene Family of PP2A Regulatory Subunits

Protein phosphatase 2A (PP2A) is an abundant serine/threonine phosphatase that functions as a tumor suppressor in numerous cell-cell signaling pathways, including Wnt, myc, and ras. The B56 subunit of PP2A regulates its activity, and is encoded by five genes in humans. B56 proteins share a central core domain, but have divergent amino- and carboxy-termini, which are thought to provide isoform specificity. We performed phylogenetic analyses to better understand the evolution of the B56 gene family. We found that B56 was present as a single gene in eukaryotes prior to the divergence of animals, fungi, protists, and plants, and that B56 gene duplication prior to the divergence of protostomes and deuterostomes led to the origin of two B56 subfamilies, B56αβε and B56γδ. Further duplications led to three B56αβε genes and two B56γδ in vertebrates. Several nonvertebrate B56 gene names are based on distinct vertebrate isoform names, and would best be renamed. B56 subfamily genes lack significant divergence within primitive chordates, but each became distinct in complex vertebrates. Two vertebrate lineages have undergone B56 gene loss, Xenopus and Aves. In Xenopus, B56δ function may be compensated for by an alternatively spliced transcript, B56δ/γ, encoding a B56δ-like amino-terminal region and a B56γ core.


Introduction
Although signal transduction cascades are intensely studied, relatively little is known about the role that serine/threonine phosphatases play in them. While there are over 400 serine/threonine kinase genes in the human genome, there are only around 40 serine/threonine phosphatase catalytic subunits to counter them. This was initially interpreted to mean that phosphatases have broad, constitutive activities; however, it was later found that phosphatases are highly specific and that the majority achieve diversity by forming numerous distinct multimeric protein complexes. In the case of PP2A, at least three different B regulatory subunit gene families (B55/PR55/B, B56/PR56/B', and B72/PR72/B'') bind to the structural A subunit and the catalytic C subunit, each of which is encoded by two genes in humans. Therefore, through the combinatorial association of multiple subunits, and with the inclusion of alternative splicing, PP2A could form as many as 200 different heterotrimers. As the B subunits are more diverse than the A and C subunits, they are the major contributors to substrate specificity and subcellular localization of the PP2A holoenzyme [1,2].
PP2A carries out essential cellular functions, and therefore its subunits are encoded by one of the most highly conserved sets of genes. The C subunit is the most conserved, with 75% identity between human and yeast proteins; human and yeast A subunit proteins share 44% identity. In humans, B56 isoforms are encoded by five widely expressed genes. B56 proteins are highly conserved between species, sharing approximately 60% identity between human and yeast. Even though individual B56 isoforms have distinct functions, the five human B56 proteins share 66% to 81% identity. B56 genes encode proteins with a highly conserved core of about 400 amino acids and variable amino- and carboxy-termini ranging from approximately ten to one hundred amino acids in length in humans. The divergent amino- and carboxy-termini are thought to provide specificity to the different isoforms. Alternative splicing occurs at the B56γ locus to produce a transcript with either a B56γ amino-terminal extension (B56γ/γ) or a mixed-isoform transcript containing a B56δ-like amino-terminal extension (B56δ/γ) [3]. As the B56 amino- and carboxy-termini are proposed to determine substrate specificity, these alternative splice products are likely to have distinct roles in the cell.
B56 isoforms have roles in numerous cell-cell signaling pathways. B56 isoforms modulate canonical Wnt signaling: most B56 isoforms inhibit Wnt signaling, whereas B56ε is required for it [4-6]. B56 isoforms also have a role in ras signaling, as transgenic mice carrying both an A subunit mutation that abolishes B56 binding and an activating ras mutation have a reduced lifespan compared to mice carrying activated ras alone [7]. B56α inhibits Myc signaling by promoting proteasome-mediated degradation of Myc [8]. B56γ inhibits cell spreading and metastasis by dephosphorylating paxillin [9]. B56ε also has a role in hedgehog signaling [10].
Here we explored the evolution of the B56 gene family of PP2A regulatory subunits to provide us with a deeper understanding of B56 and how its evolution has resulted in five vertebrate genes that differentially regulate cell-cell signaling pathways. This characterization is especially important, as it will aid the integration of B56 studies in diverse organisms, especially when comparing functional analyses between species containing different complements of B56 genes. In addition, B56 isoforms can have antagonistic effects on signaling pathways, resulting in either growth inhibition or growth promotion. Understanding the origin of the antagonistic isoforms may be useful in understanding their disparate roles in signaling pathways. We performed a hierarchical clustering and a phylogenetic analysis to examine the highly conserved B56 isoforms. We traced the expansion of the B56 gene family from simple to complex organisms, and also found interesting patterns of gene duplication and deletion throughout the evolution of the B56 gene family.

Identification of B56 Gene Family Homologs
We analyzed B56 sequences from thirty-three species, each possessing between one and nine B56 genes, for a total of 105 B56 sequences (Table 1). The best match of each vertebrate B56 protein sequence to the corresponding human B56 ortholog is shown in Table 2. We examined B56 genes from sixteen diverse species of mammals, birds, reptiles, amphibians, and fish. Each of the vertebrate B56 isoforms matched the corresponding human ortholog, as can be seen from their low E-values and their high maximum scores, percent query coverages, percent identities, and percent similarities. Their amino acid identities ranged from 75% to 100%, while their similarities ranged from 80% to 100%. The B56ε gene is highly conserved, as the B56ε protein from Homo sapiens is identical to that in six species: Macaca mulatta, Bos
B56 is also well conserved in simple chordates, nonchordate animals, fungi, protists, and plants. The amino acid identities between both simple chordate and nonchordate animals versus human B56 proteins were 59% to 84%, while their similarities were 77% to 94% (Table 3). The identities and similarities between fungi and protists versus human B56 proteins ranged from 51% to 62% and 69% to 80%, respectively (Table 3). The identities and similarities between plant and human B56 proteins were slightly lower than those observed with fungi and protists, and ranged from 47% to 57% and 61% to 77%, respectively (Table 4). The high conservation of B56 proteins between animals, fungi, protists, and plants suggests that B56 plays a key role in basic cellular functions. The details of the protein similarities of vertebrates; simple animals, fungi, and protists; and plants, including data from all B56 pair-wise comparisons with human B56 isoforms, are listed in supplementary Tables S1-S3, respectively. An alignment of all analyzed B56 sequences is shown in supplementary Figure S1.
Table 3. Blast summary of simple chordate/nonchordate animal/fungi/protist B56 sequence alignment. Homo sapiens B56 isoforms were used as queries in Blastp searches against the NCBI database. Each of the five H. sapiens B56 isoforms was similar in its identity and similarity to each of the hits, and therefore no specific B56 isoform orthologs could be identified. However, the NCBI hits are listed with the H. sapiens query with which they had the lowest E-value. The NCBI hits are provided along with their protein accession number (Accession #), E-value (E), maximum score (M), percent query coverage (Q), percent identity (% I), and percent similarity (% S). The superscript 1 denotes sequences retrieved from JGI [14], while the superscript 2 denotes sequences from UniProt [13]. The high level of conservation of the B56 isoforms in distant species can be seen through the low E-values, high maximum scores, and high query coverages. Identities range from 51% to 84% and similarities range from 69% to 94%.

Hierarchical Clustering
Hierarchical clustering was undertaken to gain insight into the relationships among the 105 B56 genes from animal, fungal, protist, and plant species. This analysis is based on sequence identity obtained through BLAST hits. The identity matrix was populated with the percent identity values, where rows correspond to the 105 genes used as queries, and columns correspond to the same 105 genes used as the target database. The identity matrix was then visualized using hierarchical clustering (Figure 1). The dendrograms and heat maps clearly delineate separate gene clusters for animal and plant B56 genes, with the animal cluster further subdivided into two clusters, B56αβε and B56γδ. Within the animal B56 genes, the B56αβε cluster has clearly grouped into its three isoforms and the B56γδ cluster has segregated into its two isoforms. The increased heterogeneity in the B56αβε cluster may suggest that the duplicate copies were retained because they acquired novel functions. The plant B56 genes do not segregate into distinct families, suggesting that plant B56 family genes underwent duplication later than in animal lineages. However, we only examined three plant species, and a broader analysis may reveal additional information. Species possessing a single B56 gene of each B56 subfamily (Amphimedon queenslandica, Hydra vulgaris, Drosophila melanogaster, Caenorhabditis elegans, and Strongylocentrotus purpuratus) generally fall in line with either the B56αβε or B56γδ subfamilies. Although this data visualization clearly delineates B56 subfamilies and suggests relationships between the B56 genes, it provides only an overview of B56 gene family divergence and evolution. A phylogenetic analysis was therefore performed to trace the diversification of the B56 family.
Figure 1. B56 hierarchical cluster based on percent identity. Each B56 protein sequence was chosen in turn as the query sequence in a Blastp search. The resultant pair-wise percent identities were plotted. The identity is indicated by color, ranging from the highest to lowest identity, progressively colored light red, red, maroon, black, dark green, medium green, and light green. The B56 isoform designation refers to the vertebrate isoforms; fp refers to fungal and protist B56 genes; plant refers to plant B56 genes.
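The construction of the identity matrix described in this section can be illustrated with a minimal sketch. It assumes the Blastp results have already been reduced to (query, target, percent identity) triples; the sequence labels and identity values below are hypothetical placeholders rather than data from this study, and the choice to leave unreported pairs at 0.0 is an assumption of the sketch.

```python
def build_identity_matrix(ids, hits):
    """Return a square matrix (list of lists) of percent identities.

    ids  : ordered list of sequence identifiers; each one is used both
           as a query (row) and as a target (column)
    hits : iterable of (query_id, target_id, percent_identity) tuples
    Pairs with no reported hit stay at 0.0 (an assumption of this sketch).
    """
    index = {seq_id: i for i, seq_id in enumerate(ids)}
    matrix = [[0.0] * len(ids) for _ in ids]
    for query, target, pident in hits:
        row, col = index[query], index[target]
        # Keep the best identity if a pair is reported more than once.
        matrix[row][col] = max(matrix[row][col], pident)
    return matrix


# Hypothetical labels and values, for illustration only.
ids = ["seqA", "seqB", "seqC"]
hits = [
    ("seqA", "seqA", 100.0), ("seqA", "seqB", 70.0), ("seqA", "seqC", 50.0),
    ("seqB", "seqA", 70.0), ("seqB", "seqB", 100.0), ("seqB", "seqC", 48.0),
    ("seqC", "seqA", 50.0), ("seqC", "seqB", 48.0), ("seqC", "seqC", 100.0),
]
identity = build_identity_matrix(ids, hits)
for seq_id, row in zip(ids, identity):
    print(seq_id, row)
```

Each row then holds the identities of one query against every target, which is the layout the clustering step takes as input.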

B56 Gene Family Phylogeny
B56 was present as a single gene in eukaryotes prior to the divergence of animals, fungi, protists, and plants (Figure 2). Subsequently, four separate B56 clades evolved, mirroring species divergence. The plant B56 clade displayed the deepest division, followed by the protist B56 clade, with a local support value of 1.0, and then the fungal and animal B56 clades, with a local support value of 0.91. The B56αβε and B56γδ clades separated with a local support value of 0.93. B56 gene products are composed of a conserved core domain of approximately 400 amino acids and variable amino- and carboxy-termini; this global analysis included the core domain but excluded the termini. The exclusion was a consequence of the lack of significant identity between the amino- and carboxy-termini of distant B56 isoforms, as the algorithm used for these analyses eliminates from phylogenetic tree construction any region of the alignment that contains a gap in any sequence. To study the evolution of B56 genes in more detail, we examined individual B56 gene clades separately from the remaining B56 gene family, thereby reducing the exclusion of the less-conserved termini.
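The effect of excluding gapped regions can be sketched as follows. This is only an illustration of the column filtering described above, not the code of the tree-building software itself; the function name and the toy sequences are hypothetical.

```python
def drop_gapped_columns(alignment, gap="-"):
    """Remove every alignment column in which at least one sequence has a gap.

    alignment : dict mapping sequence name -> aligned sequence string
                (all strings must have the same length)
    """
    if not alignment:
        return {}
    rows = list(alignment.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # A column is kept only if no sequence has a gap at that position.
    keep = [i for i in range(len(rows[0])) if all(r[i] != gap for r in rows)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences): the gapped termini are discarded,
# while the gap-free central columns are retained.
toy = {
    "seq1": "MKT-LLAEV",
    "seq2": "--TPLLAEV",
    "seq3": "MKTPLLA-V",
}
trimmed = drop_gapped_columns(toy)
print(trimmed)
```

Adding a distantly related sequence with many gaps shrinks the retained region for every sequence, which is why analysing each clade separately preserves more of the variable termini.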

Plants
Two different sets of nomenclature were initially used to describe the B56 genes, B56 and B', as several laboratories concurrently isolated the genes; the B' designations have been retained to describe the plant B56 genes [15]. The separate analysis of B56 plant genes yielded a phylogenetic tree with more sequence coverage than the global B56 analysis, as fewer sequence gaps reduced the extent of the sequences excluded in the FastTree phylogenetic tree construction. As Chlamydomonas reinhardtii, a unicellular green alga, is believed to be representative of a terrestrial plant progenitor, the single B56 gene present in C. reinhardtii likely represents the B56 progenitor of multicellular plants [16]. The C. reinhardtii B56 gene is named wdb, which is a misnomer. It is not more highly related to its namesake, which was initially identified in D. melanogaster, than to the other B56 isoforms, and would more appropriately be renamed B56, without an isoform designation [17]. The B56 gene was duplicated numerous times within multicellular plant species, as Arabidopsis thaliana has nine B56 genes while Oryza sativa (Japanese rice) has seven (Figure 3). A previous report proposed a B56 family tree composed of eight A. thaliana and five O. sativa genes based on the distance-based UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm. The tree consisted of three B56 subfamilies named B'α, B'η, and B'κ, with two A. thaliana genes, B'γ and B'δ, placed outside of the defined subfamilies [18]. Our analysis employed several multiple sequence alignment algorithms and maximum likelihood methods for phylogenetic tree construction, and differs from that previously proposed (Figure 3 and data not shown). Three distinct clades were resolved. Each of these clades was present in both A. thaliana and O. sativa, and therefore likely present prior to the divergence of monocots (O.

Protists and Fungi
The protists Dictyostelium discoideum, Dictyostelium purpureum, Dictyostelium fasciculatum, and Polysphondylium pallidum each contain a single B56 gene, as do the fungi Saccharomyces cerevisiae, Ashbya gossypii, Aspergillus nidulans, and Aspergillus niger (Figure 4). In contrast, the fungus Schizosaccharomyces pombe possesses two B56 genes, likely resulting from a gene duplication occurring after the divergence of Aspergillus and S. pombe. The lineage of the B56 gene does not precisely follow that of the fungal species. With regard to species divergence, S. cerevisiae and A. gossypii form a clade separate from S. pombe and the Aspergillus species, whereas in the B56 gene tree, S. cerevisiae, A. gossypii, and Aspergillus form a clade separate from S. pombe. Such incongruence is not uncommon, as many fungal species have acquired genes by horizontal gene transfer, not only from distantly related fungal species but also from bacteria and plants [19,20].

Animals
A duplication of the B56 gene prior to the divergence of diploblastic and triploblastic species (animals with two or three germ layers, respectively) led to the formation of two animal B56 clades, B56αβε (B56-1) and B56γδ (B56-2), with a local support value of 0.93 (Figure 2). The diploblasts Amphimedon queenslandica (sponge) and Hydra vulgaris (fresh water polyp) maintained one representative from each B56 subfamily (A. queenslandica: B56β and B56δ; H. vulgaris: B56α and B56δ). Within the triploblasts, the protostomes D. melanogaster and Caenorhabditis elegans retained a single B56 gene from each subfamily: wdb and PPTR-1 from B56αβε, and B56-1 and PPTR-2 from B56γδ, respectively. Among deuterostomes, Strongylocentrotus purpuratus (sea urchin) possesses a B56 gene from each subfamily, named B56α and B56δ. Although current nomenclature suggests that these genes may be more closely related to an individual isoform within the subfamilies, the B56αβε and B56γδ subfamily genes of A. queenslandica, H. vulgaris, D. melanogaster, C. elegans, and S. purpuratus are derived from branches that diverged prior to divergence within the B56αβε and B56γδ subfamily clades. Consequently, A. queenslandica B56β, H. vulgaris B56α, D. melanogaster wdb, C. elegans PPTR-1, and S. purpuratus B56α would be more appropriately renamed; we suggest B56-1. In addition, A. queenslandica B56δ, H. vulgaris B56δ, D. melanogaster B56-1, C. elegans PPTR-2, and S. purpuratus B56δ would be more appropriately renamed to signify that they diverged prior to divergence of the B56γδ subfamily clade, perhaps with the name B56-2. In congruence with this nomenclature, the B56αβε subfamily would become the B56-1 subfamily and the B56γδ subfamily would become the B56-2 subfamily.
Two rounds of whole-genome duplication occurred after the divergence of urochordates (e.g., sea squirt) and cephalochordates (e.g., lancelets) but prior to the divergence of cyclostomes (e.g., lamprey) and gnathostomes (jawed vertebrates) [21]. Many paralogous genes present on duplicated genomes were lost, but some remain. Not surprisingly then, chordates contain higher copy numbers of B56 genes than simpler organisms. Branchiostoma floridae (lancelet, a chordate containing a nerve cord and notochord but lacking vertebrae), whose genome sequence was first reported in 2008, has a full complement of five B56 genes [22]. Three of these genes share 70%-90% identity and 82%-96% similarity with one another and fall into the B56αβε subfamily, but have not separated into distinct B56α, B56β, and B56ε isoforms; the other two B56 genes share 88% identity and 90% similarity and are within the B56γδ subfamily (Figure 5). This suggests that B. floridae branched off from vertebrate progenitors after two rounds of whole-genome duplication, but prior to the time at which the B56αβε or B56γδ subfamilies evolved into the five vertebrate isoforms. In addition, the presence of three B56αβε genes and two B56γδ genes suggests that one B56αβε gene and two B56γδ genes were lost after the whole-genome duplications (or one B56γδ gene was lost after the first genome-wide duplication). The genome sequence of Petromyzon marinus (sea lamprey, a primitive vertebrate) was first reported in 2013, and is available at 5.0× whole-genome coverage [23,24]. We identified three P. marinus B56 genes: two B56αβε subfamily members and one B56γδ subfamily member (Table 2). Similar to B. floridae, one B56αβε subfamily member diverged from the B56αβε clade prior to isoform specialization (S4RGA7, Figure 6). However, S4RHV1 forms a clade with B56β, while S4RN43 forms a clade with B56δ. This suggests that P. marinus branched off from vertebrates after isoform specialization had started, but before it had been completed. The phylogenetic position of P. marinus suggests that it will possess a full complement of B56 genes; these genes will likely be revealed once more complete coverage of the P. marinus genome is obtained.
The five chordate B56 genes present in B. floridae are maintained in all chordates examined, with two exceptions, as described below. The B56γδ clade has a two-fold lower substitution rate than the B56αβε clade before they first branch (leading to 12% and 25% divergence, respectively) (Figure 2). This finding correlates with the heat map (Figure 1), suggesting that the B56γδ clade is either newer than the B56αβε clade, or that it is under stronger selection to maintain its sequence. Our data suggest that the B56γδ clade has fewer substitutions because it resulted from the second genome-wide duplication, with the paralogs from the first genome-wide duplication being lost. However, our data do not rule out the possibility that the B56γδ clade may be more constrained. Future studies of synonymous/non-synonymous changes may determine the mechanism behind the conservation of the B56γδ clade, as well as the mechanism behind the limited B56 subfamily divergence in B. floridae and P. marinus.

The B56αβε Subfamily
Within the B56αβε subfamily, individual B56 isoforms exhibited distinct levels of evolutionary change. D. rerio B56αβε genes were the most divergent among the species examined (Figures 2 and 7). This was not unexpected, as D. rerio (zebrafish) is the outlier of the vertebrate species examined. B56ε displayed the most stringent conservation, as it underwent 4% amino acid changes excluding D. rerio, and 13% amino acid changes including D. rerio. B56α displayed an intermediate level of conservation, as it underwent 8% amino acid changes excluding D. rerio, and 18% amino acid changes including D. rerio. B56β was the least conserved, as it underwent 23% amino acid changes excluding D. rerio, and 29% amino acid changes including D. rerio (each also excluding M. mulatta (rhesus macaque)). The M. mulatta B56β gene displayed an exceptionally high amino acid substitution rate, 25% since its divergence from other mammals. This was due in large part to a 63 amino acid region in the amino half of its core that lacks significant conservation with other B56 sequences. In addition, unlike B56α and B56ε, reptilian and amphibian B56β displayed a relatively high amino acid substitution rate, 14% versus 8% and 4% in B56α and B56ε, respectively, again suggesting reduced constraint on B56β sequence in these species (Figure 2). In summary, B56ε was under the strongest selective pressure to maintain its sequence, whereas B56α was under moderate selective pressure. The selective pressure on B56β was similar to that on B56α in mammalian genes (excluding M. mulatta), but much looser in reptiles and amphibians. Alternatively, B56α and B56β may have been under positive selection. The evolution of the B56αβε subfamily is of particular interest, as isoforms within this subfamily have antagonistic effects on the canonical Wnt signaling pathway [4,6]. B56ε is required for canonical Wnt signaling, whereas B56α inhibits Wnt signaling. There is also evidence suggesting that B56β has an inhibitory role [4].
An earlier report used UPGMA to suggest that B56α and B56ε are more highly related to one another than to B56β [1]. We carried out several analyses to resolve the relationships within the B56αβε subfamily, using FastTree 2, Bayesian, and neighbor-joining programs (Figure 7 and data not shown). The majority of our analyses showed that B56ε diverged prior to B56α and B56β. However, there were also instances where B56α and B56ε appeared more closely related. Therefore, our data suggest that B56ε is more distantly related to B56α and B56β, in agreement with the functional data, but this conclusion is not robust. This ambiguity was likely due to the small number of phylogenetically informative differences within the B56αβε clade.

The B56γδ Subfamily
A separate analysis of the B56γδ subfamily was carried out to construct a phylogenetic tree based on sequences specific to this subfamily and to gain insight that was not obtained from the global B56 analysis, which was based on the core domain. Both B56γ and B56δ vertebrate isoforms differed by approximately 12% when the B56δ/γ splice variants were not included in the analysis (Figures 2 and 8, and data not shown). With the inclusion of B56δ/γ, B56γ differed by 29%. This is because B56δ/γ has an 82 amino acid amino-terminal region that is not related to the 19 amino acid amino-terminal region of B56γ/γ. H. vulgaris contains one B56 gene from each subfamily. The B56γδ family member of H. vulgaris segregated within the B56δ clade in the larger phylogenetic analysis of B56 (Figure 2). However, none of the other B56 proteins examined from nonchordate animal species segregated into distinct isoforms within the B56 subfamily clades. We therefore included the H. vulgaris B56γδ protein in our analysis of the vertebrate B56γδ subfamily to more accurately place H. vulgaris B56γδ within the B56 tree. This B56γδ-specific analysis placed H. vulgaris B56γδ within the B56γδ subfamily but outside of the B56γ and B56δ isoform clades. Therefore, the H. vulgaris B56γδ protein now falls in line with those of other diploblasts (A. queenslandica), protostomes (D. melanogaster and C. elegans), and primitive deuterostomes (S. purpuratus) in which the B56 genes have not evolved into distinct isoforms.

The Loss of Vertebrate B56 Genes
B56δ was not found in X. laevis or X. tropicalis but was present in A. mexicanum, a closely related amphibian (Figure 9). As the X. tropicalis genome has been completely sequenced, this strongly suggests that the B56δ gene was lost in these two Xenopus species. Within archosaurs, B56β was not found in G. gallus and F. peregrinus but was present in Alligator mississippiensis. As the G. gallus, F. peregrinus, and A. mississippiensis genomes have all been completely sequenced, this strongly suggests that the B56β gene was lost in the Aves lineage. These two separate B56 gene losses suggest that B56 isoforms may share some overlapping functions. Since the amino- and carboxy-terminal variable domains of the protein are likely to be key in carrying out isoform-specific functions, similarities in these regions may be important in understanding the potential for functional overlap between B56 isoforms. Overlapping functions would be more likely to occur within a B56 subfamily. For example, the function of B56δ in Xenopus is more likely to have been maintained by B56γ than by a B56αβε family member, whereas the function of B56β in Aves would more likely be carried out by B56α or B56ε. Indeed, the amino-terminal variable regions of human B56α, B56β, and B56ε are approximately 50% identical and 60% similar, while their carboxy-termini lack significant similarity. Therefore, the similarity of the amino-terminal domains in the B56αβε subfamily may provide sufficient functional overlap to allow the loss of one family member. The amino-terminal variable regions of human B56γ and B56δ lack significant similarity, but their carboxy-termini possess approximately 50% identity and 56% similarity; therefore, their carboxy-termini, but not their amino-termini, may provide some overlapping functions. Alternatively, we previously identified an evolutionarily conserved alternative splice form of B56γ that contains a B56δ-like amino-terminal variable region [3]. This B56δ/γ isoform may be sufficient to carry out B56δ-specific functions in Xenopus. Indeed, as B56δ/γ and B56γ share their B56γ core and carboxy-termini, they are somewhat intermingled on the phylogenetic tree, with B56δ/γ and B56γ from the same species, such as B. taurus, C. lupus familiaris, and D. rerio, often segregating together (Figure 8).
Figure 9. Distribution of B56 genes in plants, protists, fungi, and animals. A species tree was constructed based on the Tree of Life [25]. B56 genes are represented by rectangles; the absence of a B56 gene is signified with an X; uncertainty in the presence of a B56 isoform is signified by a question mark.

Identification of B56 Gene Homologs
All members of the B56 gene family (B56α, B56β, B56γ, B56δ, and B56ε) were identified from diverse species of animals, fungi, protists, and plants from the NCBI. Amino acid sequences of Homo sapiens B56 isoform proteins were used as queries to identify the corresponding target homologs of different species using Blastp [26]. A symmetrical similarity search scheme was employed to perform all pair-wise comparisons to confirm the homologs, and their accession numbers were then retrieved. The following stringency criteria were used for the identification of the best matches: percent query coverage ≥ 50, maximum score ≥ 100, percent identity ≥ 40, and E-value ≤ 10⁻³.
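The stringency criteria above can be expressed as a simple filter. The following is a minimal sketch: the `BlastHit` record, the accession strings, and the scores are hypothetical, and picking the passing hit with the lowest E-value as the best match is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str           # hypothetical identifier
    query_coverage: float    # percent
    max_score: float
    percent_identity: float  # percent
    e_value: float


def passes_stringency(hit, min_cov=50.0, min_score=100.0,
                      min_ident=40.0, max_e=1e-3):
    """Apply the four thresholds stated in the text."""
    return (hit.query_coverage >= min_cov
            and hit.max_score >= min_score
            and hit.percent_identity >= min_ident
            and hit.e_value <= max_e)


def best_match(hits):
    """Return the passing hit with the lowest E-value, or None (assumed rule)."""
    passing = [h for h in hits if passes_stringency(h)]
    return min(passing, key=lambda h: h.e_value) if passing else None


# Hypothetical hits, for illustration only.
hits = [
    BlastHit("HYP_001", 98.0, 850.0, 92.0, 0.0),
    BlastHit("HYP_002", 45.0, 300.0, 70.0, 1e-80),  # fails query coverage
    BlastHit("HYP_003", 90.0, 120.0, 41.0, 1e-30),
    BlastHit("HYP_004", 95.0, 60.0, 35.0, 0.5),     # fails score, identity, E-value
]
kept = [h.accession for h in hits if passes_stringency(h)]
top = best_match(hits)
print(kept, top.accession)
```

Running the same filter with each homolog as the query in turn would correspond to the symmetrical search scheme used to confirm the matches.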

Hierarchical Clustering
In this analysis, each of the B56 protein sequences was chosen in turn as the query sequence in a Blastp search. We collected the pair-wise amino acid identity values for all possible pairs of the 105 members of the B56 protein family, and used the resulting percent identity matrix for data visualization. We used agglomerative hierarchical clustering to visualize similarities within and between B56 isoforms. Hierarchical clustering constructs a hierarchical structure from the input data, and it has become a standard visualization method since its seminal application to microarray data [27,28]. In particular, agglomerative clustering creates a hierarchical structure through a bottom-up approach, in which the closest pair of clusters is merged at each step. Agglomerative clustering takes as input the pair-wise similarities (or distances) among data items, from which cluster similarities (or distances) are inferred for grouping data items. We utilized the clustergram function in the Bioinformatics Toolbox of the commercial software package MATLAB 7.11 (R2010b) (MathWorks, Natick, MA, USA) and generated the heat map with dendrograms shown in Figure 1. Each row of the identity matrix was transformed so that its mean is 0 and its standard deviation is 1 for better visualization. Euclidean distance was used as the distance measure between data points, and average linkage (i.e., UPGMA) was used to compute the distance between a data point and a cluster.
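The clustering procedure described above can be sketched in plain Python as follows. This is not the MATLAB clustergram implementation: the function names are hypothetical, the toy identity values are invented for illustration, and the use of the population standard deviation for row standardization is an assumption of the sketch.

```python
import math
from statistics import mean, pstdev


def standardize_rows(matrix):
    """Rescale each row to mean 0 and standard deviation 1."""
    out = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)  # population SD (assumption)
        out.append([(x - mu) / sigma if sigma else 0.0 for x in row])
    return out


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Bottom-up clustering; returns the merge history as
    (members_a, members_b, distance) tuples, closest pair first."""
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean of all between-cluster distances.
                d = mean(euclidean(rows[p], rows[q])
                         for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy 4 x 4 percent identity matrix (hypothetical values): rows 0 and 1
# resemble each other, as do rows 2 and 3.
toy_identity = [
    [100.0, 80.0, 45.0, 50.0],
    [80.0, 100.0, 50.0, 45.0],
    [45.0, 50.0, 100.0, 75.0],
    [50.0, 45.0, 75.0, 100.0],
]
merges = average_linkage(standardize_rows(toy_identity))
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

The merge history is the information a dendrogram draws; this naive search over all cluster pairs is adequate for a matrix of about one hundred sequences but would be slow for much larger inputs.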

Phylogenetic Analysis
Phylogenetic analysis was run on the Geneious version 7.1.5 platform [29]. B56 protein sequences from selected species were input into Geneious, and sequences were aligned using MUSCLE (MUltiple Sequence Comparison by Log-Expectation) [30]. A phylogenetic tree was inferred from these aligned protein sequences with FastTree version 2.1.5 using default settings [31]. The resulting phylogeny was rooted using the plant B56 genes as an out-group. FastTree 2 is an approximately maximum-likelihood phylogenetic method that efficiently handles alignments containing a large number of gene or protein sequences [31]. It is openly available software, and it rapidly produces phylogenetic trees that are as accurate as trees constructed by other maximum-likelihood methods such as PhyML 3.0 or RAxML 7.0. FastTree 2 uses the CAT (category) approximation [32] to account for variation in rates across sites and also implements the Shimodaira-Hasegawa (SH) test [33] to estimate the reliability of each split in the phylogeny, which is the same as the SH-like local support values of PhyML 3 [34]. A species phylogenetic tree was constructed based on the Tree of Life [25].

Conclusions
The B56 gene family is highly conserved. B56 was present as a single gene in simple eukaryotes, but was duplicated prior to the divergence of protostomes and deuterostomes. Further duplications occurred in chordates, resulting in three B56αβε and two B56γδ genes. These genes remained similar to one another in simple chordates, but diverged into five distinct isoforms in vertebrates. B56ε was most highly conserved, followed by B56α, B56γ, and B56δ, which displayed an intermediate level of conservation; B56β was the least conserved. This divergence in vertebrates likely led to the ability of B56 family members to regulate numerous signal transduction pathways.
The deletion of B56δ in Xenopus species and B56β in Aves suggests that some B56 isoforms may have overlapping functions. However, in the case of B56δ, there exists an evolutionarily conserved mixed-isoform alternative splice form that contains a B56δ-like amino-terminal variable domain upstream of the B56γ core region [3]. This strengthens the argument that the variable regions largely determine isoform specificity, as the presence of a B56δ-like amino-terminal variable domain appears to compensate for the loss of B56δ.