Expansion and Functional Divergence of Inositol Polyphosphate 5-Phosphatases in Angiosperms

Inositol polyphosphate 5-phosphatase (5PTase), a key enzyme that removes the phosphate from the 5′ position of the inositol ring, has essential functions in growth, development, and stress responses in plants, yeasts, and animals. However, the evolutionary history and patterns of 5PTases have not been examined systematically. Here, we report a comprehensive molecular evolutionary analysis of the 5PTase gene family and define four groups. These four groups differ from former classifications, which were based on in vitro substrate specificity. In Groups II–IV, most orthologous groups appear to be conserved as single or low-copy genes in all lineages, whereas 5PTase genes in Group I underwent several duplication events in angiosperms, resulting in multiple gene copies. Whole-genome duplication (WGD) was the main mechanism for 5PTase duplications in angiosperms. The 5PTase family has more members in plants than in animals, and most plant 5PTase genes appear to have evolved under strong purifying selection. The paralogs have diverged in substrate specificity and expression pattern, showing evidence of selection pressure. The increase in the number of 5PTases and their divergence in sequence, expression, and substrate might have contributed to the divergent functions of 5PTase genes, allowing the angiosperms to adapt successfully to a great number of ecological niches.

Eukaryotes have a family of diverse 5PTases, with four members in yeast, ten in mice, ten in humans, 15 in Arabidopsis, and 21 in rice [1,9,11-16]. All of the 5PTases are characterized by a highly conserved phosphatase domain of about 350 residues, and some plant 5PTases also contain several WD40 domains that function in protein-protein interactions [17][18][19][20][21][22]. Loss of any single yeast 5PTase has little phenotypic effect, while the loss of any two 5PTases results in marked phenotypic changes, suggesting a degree of redundancy [23]. In mammals, 5PTases play important roles in the regulation of hematopoietic stem cell proliferation, synaptic vesicle recycling, insulin signaling, endocytosis, vesicular trafficking, and actin polymerization. Loss of 5PTase function in mammals results in a broad spectrum of diseases and disorders [7,24]. For example, mutations in OCRL1 result in the X-linked oculocerebrorenal syndrome of Lowe and type 2 Dent disease [25,26]. Loss of function of SHIP1 results in myeloproliferative syndrome [27,28]. SHIP2 is implicated in obesity, insulin resistance, and hypertension [29][30][31]. In addition, mutations in INPP5E can cause Lowe cerebrorenal syndromes [32]. These results reveal the functional differences among mammalian 5PTases.
Evolutionary studies of plant and mammalian 5PTase genes are very limited. Previous reports have focused on a single species or only a few species, and there have been no systematic studies of the 5PTase family in plants or mammals. Large-scale sequenced genome data provide deeper sampling to gain insights into the evolution of the 5PTase family during the history of angiosperms. In this study, we present an evolutionary analysis of the 5PTase genes in major plant lineages. Phylogenetic analyses were performed to delineate the evolutionary history of the 5PTase family in major angiosperm lineages, and exon/intron structure analyses were performed to gain insight into the possible mechanisms underlying the structural diversity of the 5PTase gene family. The tissue specificity and inducibility of 5PTase gene expression in plants were characterized by examining publicly available microarray data. The results obtained here will broaden our understanding of the roles and evolution of 5PTases and provide a framework for further functional investigations of these genes in plants.

Data Sources and Homolog Searches
Several datasets (Ensembl Genome, JGI Genome, Phytozome) and multiple steps were used to search for 5PTase sequences. We selected representative species from animals, plants, and fungi, including 16 plants, five animals, and one fungus. Proteomic data from animals and fungi were downloaded from the Ensembl (release 91, http://www.ensembl.org) and JGI (http://genome.jgi.doe.gov/) databases, respectively. Plant sequences were obtained from Phytozome v12.1 (http://www.phytozome.net/). The sequences for Amborella trichopoda were retrieved from the Amborella Genome Database (http://www.Amborella.org/). The number of 5PTase genes in different organisms can be found in Figure 2.
A hidden Markov model (HMM) search program (version 3.0) [43] was used to retrieve all eukaryotic 5PTase homologs. The HMM profile of the 5PTase protein domain (PF03372 in the Pfam database) was downloaded and used in local searches of the datasets [44]. The retrieved sequences were verified against the protein families (Pfam) database (http://pfam.xfam.org/search), the Conserved Domains Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) of the National Center for Biotechnology Information, and the SMART (Simple Modular Architecture Research Tool) database (http://smart.embl-heidelberg.de/), with an e-value threshold of less than 1e-10 [44-46].
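The e-value cutoff described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the search results are available as whitespace-delimited text in the style of an HMMER `--tblout` table (target name in the first column, full-sequence E-value in the fifth); the function name `filter_hits` and the example identifiers are hypothetical and are not taken from the study.

```python
def filter_hits(tblout_text, max_evalue=1e-10):
    """Keep targets whose full-sequence E-value is below max_evalue.

    Assumes HMMER-style tabular output: comment lines start with '#',
    column 1 is the target name and column 5 is the full-sequence E-value.
    Returns a dict mapping each retained target to its best E-value.
    """
    kept = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            # keep the smallest E-value if a target appears more than once
            kept[target] = min(evalue, kept.get(target, evalue))
    return kept


# Made-up example table (identifiers and scores are illustrative only).
demo = """# target  acc  query  acc  E-value  score  bias
seqA - Exo_endo_phos PF03372 1e-35 120.0 0.0
seqB - Exo_endo_phos PF03372 1e-5 20.0 0.0
"""
print(filter_hits(demo))
```

Only `seqA` passes the threshold in this toy table; `seqB` would be discarded as a weak hit.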

Phylogenetic Analyses
Phylogenetic analyses were conducted using two methods: neighbor-joining (NJ) and maximum likelihood (ML). Multiple sequence alignments were performed using ClustalW with default parameters in MEGA 5.0 (Table S1, Figure S2). NJ trees were constructed using MEGA 5.0 with Poisson correction, pairwise deletion, and bootstrap (1000 replicates) [47]. PhyML 3.0 and RAxML v7.0.4 were used to construct ML trees with the Jones, Taylor, and Thornton (JTT) model and the gamma distribution option. The trees were rooted using the phosphoinositide-specific phospholipase C genes AT1G13680 and AT1G49740 as outgroups, as these two phospholipases belong to families other than the 5PTases [48,49].
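As an illustration of the distance measure named above, the following sketch computes a Poisson-corrected distance, d = -ln(1 - p), between two aligned protein sequences, where p is the proportion of differing sites. Pairwise deletion is modeled by skipping any column in which either sequence has a gap. This is a simplified stand-in and not the MEGA implementation; the function name and the choice of gap characters are assumptions.

```python
import math


def poisson_distance(seq_a, seq_b, gap_chars="-?"):
    """Poisson-corrected distance between two aligned protein sequences.

    Columns where either sequence has a gap/missing character are ignored
    (pairwise deletion). Returns math.inf when every compared site differs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differing / compared  # proportion of differing sites
    if p >= 1.0:
        return math.inf
    return -math.log(1.0 - p)


# Toy alignment: 5 comparable sites, 1 difference, so p = 0.2.
print(round(poisson_distance("MKT-LL", "MRTALL"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure clusters into a tree.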

Motif and Synteny Analyses
All 5PTase amino acid sequences were searched against the Pfam, CDD, and SMART databases to uncover known domains or motifs other than the 5PTase domain. Multiple EM for Motif Elicitation (MEME) software (v4.9.0) was used to discover novel conserved motifs that might not be recorded in public databases [50]. Duplicate gene pairs were examined for evidence of synteny using the Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication) [51,52].

Expression Analysis
For the expression profile analysis of 5PTase genes in Arabidopsis and rice, ATH1 22k and Os 51k microarray data, respectively, were obtained from the Genevestigator V3 database, and heat maps were constructed from these gene expression datasets [53]. Transcriptome data of soybean 5PTase homologs in representative tissues were analyzed [54]. For heat, cold, and drought stress analysis, two-week-old soybean plants were exposed to 42 °C, 4 °C, and PEG400 for 1 h, respectively. Each treatment consisted of three biological replicates, and each biological replicate consisted of a pool of six plants. Total RNA was isolated, and RNA sequencing (RNA-seq) was performed on the Illumina HiSeq 2000 platform.

Calculation of Ka/Ks-Values
The ratio of non-synonymous to synonymous substitutions (Ka/Ks) was estimated to investigate the molecular evolution of 5PTase genes. The coding sequences (CDS) of duplicated 5PTase genes were aligned using ClustalW (http://www.genome.jp/tools/clustalw/), and the alignments were used to predict the divergence periods. Based on previous research, a Ka/Ks ratio of <1 indicates purifying selection, a ratio of 1 indicates neutral evolution, and a ratio of >1 indicates positive selection [55].
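The interpretation rule above can be written as a small sketch. The function name is hypothetical, and the `tolerance` band around 1 is an assumption added here so that ratios very close to 1 are treated as neutral; the text itself states only the <1, =1, and >1 criteria.

```python
def classify_selection(ka, ks, tolerance=0.05):
    """Return (Ka/Ks ratio, interpretation) for one duplicated gene pair.

    `tolerance` is an illustrative assumption: ratios within
    1 +/- tolerance are reported as neutral evolution.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        label = "purifying selection"
    elif ratio > 1 + tolerance:
        label = "positive selection"
    else:
        label = "neutral evolution"
    return ratio, label


# Illustrative Ka and Ks values chosen to give ratios of 0.06, 1.01, and 1.87.
for ka, ks in [(0.06, 1.0), (1.01, 1.0), (1.87, 1.0)]:
    print(classify_selection(ka, ks))
```

With the default tolerance, a ratio of 1.01 is reported as neutral rather than positive, which reflects the usual caution about treating small deviations from 1 as evidence of selection.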

Identification of 5PTase Genes in Plants, Animals, and Fungi
Using the hidden Markov model (HMM) algorithm, the complete set of 5PTase genes was identified from a comprehensive dataset that contained selected plants, animals, and fungi. In total, 318 sequences were retrieved from 16 plants, five animals, and one fungus (Figure 2). Genes were classified as 5PTase genes if the corresponding protein contained the 5PTase domain. The identified 5PTase proteins ranged in size from 169 to 1916 amino acids. Among the major lineages of green plants, 5PTase genes are present in algae, bryophytes, gymnosperms, and angiosperms (Figure 2). Further investigation revealed that the copy number of the 5PTase genes varied considerably among plants, ranging from three in the green alga Chlamydomonas reinhardtii to 12 in Physcomitrella patens (bryophyte), 21 in Oryza sativa (monocot), and 15 in Arabidopsis thaliana (eudicot), with the highest copy number being 39 in Glycine max (eudicot). Land plants contained higher copy numbers than algae, indicating that duplications of 5PTase genes likely occurred after land plants diverged from green algae. The 5PTase genes are also widespread among animals, from simple invertebrates, such as Caenorhabditis elegans, to mammals, such as humans, with the gene copy number ranging from four to ten (Figure 2).
To standardize the gene names, we adopted a common nomenclature system from previous studies based on the names of Arabidopsis, human, and yeast genes [11,16]. For yeast, human, and Arabidopsis genes with known functions or previous reports, the published gene names were retained and used as a reference. Rice genes that were found to be orthologous to Arabidopsis genes were named after the Arabidopsis genes. Finally, recent paralogs were distinguished using a lower-case letter after the number.
Figure 2. Phylogenetic relationships of the species used in this study. The species tree was constructed based on the phylogeny of conserved nuclear genes [56][57][58]. The total number of 5PTase proteins found in the genome of each species is indicated.

Phylogenetic Classification of 5PTase Genes into Four Groups
The overall sequence identity between different 5PTases was very low (approx. 10%). To explore the evolutionary relationships of eukaryotic 5PTase genes, we conducted phylogenetic analyses with amino acid sequences containing the conserved 5PTase domain from representative species (Figure 3, Table S1, and Figure S2). Based on the phylogenetic analyses, the eukaryotic 5PTase genes can be divided into four major groups. Notably, these four groups are inconsistent with the four types of animal 5PTases reported by Majerus et al. [2]. Group I contains genes from fungi, animals, and plants, indicating an early origin from the most recent common ancestor (MRCA) of the three kingdoms. More 5PTase genes were identified in Group I for angiosperms than for fungi and animals, indicating that this group of genes expanded in angiosperms. Group II contains plant and animal 5PTase genes, indicating that they originated before the separation of plants and animals. Group III contains only plant genes (Figure 3). Four Arabidopsis 5PTase genes (At5PTase12/13/14 and AtFRA3) were identified in Group III, all of which encode proteins with several WD40 domains at the N-terminus [11]. Group IV also contains genes from fungi, animals, and plants, but its homologs were lost in angiosperms (Figure 3). In addition, angiosperms had higher gene copy numbers in Groups I and III than fungi and animals, suggesting the expansion and functional divergence of 5PTase genes in angiosperms.

Structural Analysis of 5PTase Family Genes
Intron loss and gain are important in eukaryotic evolution [62]. The exon/intron organizations of different 5PTase genes were examined for available genome sequences. Most plant members exhibit a similar exon/intron organization in terms of exon length and intron number, whereas the number and length of introns are considerably more diverse in human 5PTase genes (Figure 4). For instance, 5PTase genes in plants have 6-12 introns; 36% (27/75) of the genes have nine introns, 20% (15/75) have ten, 2.7% (2/75) have 11, and 2.7% (2/75) have 12 introns (Figure 4A). In contrast, human 5PTase genes have 9-31 introns and are strikingly distinct in intron arrangement and number between different genes (Figure 4B). The great differences in exon/intron organization between plant and human 5PTases suggest that intron gain or loss may play an important role in the structural evolution of the 5PTase family.
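The percentages quoted for plant genes follow directly from the counts given in parentheses, with 75 genes as the denominator. A short sketch of the arithmetic (the variable names are illustrative):

```python
# Counts of plant 5PTase genes by intron number, as given in parentheses
# in the text (denominator: 75 genes).
total_genes = 75
genes_by_intron_count = {9: 27, 10: 15, 11: 2, 12: 2}

percentages = {
    introns: round(100 * count / total_genes, 1)
    for introns, count in genes_by_intron_count.items()
}
print(percentages)

# Genes not covered by the four listed classes (i.e., with 6-8 introns).
remaining = total_genes - sum(genes_by_intron_count.values())
print(remaining)
```

The four listed classes account for 46 of the 75 genes, so the remaining 29 genes fall in the 6-8 intron range.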

Identification of Multiple Duplication Events in Land Plant 5PTases
To further investigate the evolution and duplication events of the 5PTase gene family, we conducted phylogenetic analyses in plants (Figure 5). The 5PTase genes were divided into two subgroups, with the corresponding proteins of most members in Subgroup II containing several WD40 domains (Figure 4). Within Subgroup I, there were three independent duplications in angiosperms and eudicots, suggesting that the duplication events likely occurred in the ancestor of the corresponding lineage (Figure 5). The duplication of eudicot At5PTase10/6 occurred in the common ancestor of core eudicots, and the two duplications of At5PTase3/5/7 occurred before the divergence of eudicots and monocots. Within Subgroup II, there was one independent duplication in core eudicots, suggesting that this duplication event likely occurred in the ancestor of core eudicots (Figure 5). In addition, 39 5PTase genes were identified in Glycine max, along with 16 soybean leaf pairs (Figure 5), demonstrating that more gene duplications occurred among soybean 5PTases.
Figure 5. Phylogenetic analysis of plant 5PTase genes. The tree topology of 5PTase protein sequences was constructed by maximum likelihood (ML) using the Jones, Taylor, and Thornton model among Arabidopsis, rice, moss, Medicago truncatula, soybean, Lotus japonicus, and Chlamydomonas reinhardtii. The 5PTase protein sequences were aligned using MAFFT (an MSA tool that uses fast Fourier transforms). The red stars represent major duplication events.
WGDs are common in angiosperms. To explore whether the multiple plant 5PTase gene copies arose from genome duplication, we searched for possible synteny in genomic regions containing the 5PTase genes. Two and seven pairs of Arabidopsis and rice 5PTases, respectively, were found in syntenic genomic regions, indicating that these multiple gene copies are the result of whole-genome or segmental duplications (Figure 6, Table 1). To investigate the molecular evolution of 5PTase genes, the ratio of non-synonymous to synonymous substitutions (Ka/Ks) was estimated, which revealed that the Ka/Ks ratios varied from 0.06 to 1.87 among the three species (Table 1). Most of the 5PTase duplicates displayed low Ka/Ks ratios (less than 1), indicating that these duplicates experienced strong purifying selection. The Ka/Ks ratio of Os5PTase9b/9c was 1.01, suggesting that these two 5PTase genes evolved neutrally in rice, whereas the Ka/Ks ratio of Os5PTase1a/1b was 1.87, indicating that its evolution was driven by positive selection, which may have allowed it to neo-functionalize. Soybean contains multiple gene duplicates, with Ka/Ks ratios that range from 0.07 to 0.39, none being greater than 1 (Table 1).

Expression and Functional Divergence of 5PTase
To investigate differences in the expression of 5PTase genes, we analyzed the expression profiles of At5PTases and Os5PTases during plant development, based on the microarray data available in Genevestigator (Figure 7). Overall, all of the At5PTase and Os5PTase genes are expressed during the vegetative and reproductive development stages and display strong organ specificity. In Arabidopsis, most Group I 5PTases are mainly expressed in vegetative organs: At5PTase3, At5PTase5, and At5PTase9 are most highly expressed in the root, and At5PTase2, At5PTase4, and At5PTase7 are mainly expressed in the leaf, whereas most of the Group III 5PTases are mainly expressed in the reproductive organs (Figure 7A). In addition, differential expression patterns also exist among 5PTases in the same group. For example, At5PTase8, a Group I 5PTase gene, is mainly expressed in old flowers but not in the vegetative organs. In rice, most Os5PTases are highly expressed in the reproductive organs, and other Os5PTases are mainly expressed in the leaves and shoots. Consistent with At5PTase expression, rice 5PTases also display both similar and differential expression patterns within the same groups (Figure 7B). Specifically, Os5PTase6, Os5PTase4b, and Os5PTase10b are mainly expressed in early inflorescence; Os5PTase11, Os5PTase7a, Os5PTase5a, Os5PTase5b, and Os5PTase12b are highly expressed only in anthers; and Os5PTase1b, Os5PTase3, Os5PTase9b, Os5PTase8, Os5PTase4a, and Os5PTase10a are highly expressed only in the pistil. All of these results indicate both the conservation and diversification of expression among 5PTase genes in Arabidopsis and rice. The Ka/Ks ratios were less than 1 in soybean 5PTase paralogs, suggesting that soybean 5PTase genes have undergone strong negative selection pressure (Table 1).
To gain insight into the potential developmental roles of Gm5PTase genes, the expression profiles of Gm5PTase genes were analyzed in 28 developmental stages/tissues, based on previous transcriptome data [54]. Overall, many genes displayed a distinct tissue-specific expression pattern, suggesting specific roles in particular stages of development (Figure S3). For example, seven Gm5PTase genes (Gma03G081200, Gma20G000800, Gma09G285200, Gma08G093200, Gma19G173900, Gma05G070400, and Gma13G185500) are mainly expressed in the root, and seven Gm5PTase genes (Gma17G153000, Gma10G045100, Gma03G173000, Gma07G107000, Gma13G132700, Gma05G180400, and Gma08G138000) are mainly expressed in flowers. These tissue-specific expression patterns are consistent with the expression patterns of At5PTases and Os5PTases (Figure 7).
To further investigate the functional roles of Gm5PTase genes, we analyzed the transcriptomic datasets of soybean 5PTases under different abiotic stresses, namely drought, cold, and high temperature (Li, Wang, and Yuan, unpublished data) (Figure S3). Many Gm5PTase genes showed specific induction under cold, heat, or drought. For example, seven, nine, and seven Gm5PTase genes were induced under cold, drought, and heat stress, respectively (Figure S4), indicating that soybean 5PTase genes might play important roles in stress responses.

5PTase Genes Have Been Conserved during Speciation
Multiple 5PTase genes were found in each of the plant, animal, and fungal genomes analyzed, including dicots, monocots, bryophytes, vertebrates, and invertebrates (Figure 2). This indicates that 5PTase genes were present prior to the diversification of plants, animals, and fungi (Figure 3). More 5PTase genes were identified in land plants than in animals, indicating that the 5PTase family underwent expansion in plants. Based on the phylogenetic analyses, the 5PTases diverged early during speciation (Figure 3). In addition, the Ka/Ks ratios revealed that most plant 5PTase genes underwent purifying selection (Table 1), suggesting that the 5PTase protein family tended to be stable during the long evolutionary process.
Multiple gene copies were found for the plant 5PTases, and many of them were localized in syntenic genomic regions, indicating that they resulted from whole-genome duplications (WGDs) or segmental duplications. In our study, 17 Gm5PTase paralogous pairs were identified in the soybean genome. In a previous study, the soybean genome was shown to have experienced two WGD events; one occurred 59 million years ago (mya) and the second occurred 13 mya [63,64]. The divergence period of Gm5PTase gene pairs ranged from 6.41 to 15.68 mya, indicating that these Gm5PTase pairs were formed by the most recent WGD and suggesting that WGD was the main mechanism for the expansion of the 5PTase gene family during the evolution of soybean. It has been previously reported that regulatory genes (i.e., protein kinases and transcription factors) and signaling genes are more likely to be retained after duplication events compared to the genome-wide average [65][66][67]. During cell growth, development, and stress responses, 5PTases are key enzymes of the phosphatidylinositol (PI) pathway. They serve as molecular "hubs" during PI signaling and have, therefore, been conserved throughout evolution.
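The dating argument above can be sketched in code. The text does not state how the divergence periods were computed; the sketch assumes the widely used relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The rate used in the example (6.1e-9, a value often applied to soybean in the literature) and both function names are assumptions and do not come from this study.

```python
def divergence_time_mya(ks, rate_per_site_per_year):
    """Estimate divergence time in million years from Ks, using T = Ks / (2r)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year) / 1e6


def nearest_wgd(age_mya, wgd_events_mya=(13.0, 59.0)):
    """Return the WGD event (in mya) closest to an estimated divergence time.

    The default event ages are the two soybean WGDs cited in the text.
    """
    return min(wgd_events_mya, key=lambda event: abs(event - age_mya))


# Hypothetical Ks value; 6.1e-9 is an assumed rate, not stated in the text.
print(divergence_time_mya(0.122, 6.1e-9))

# The reported range of divergence periods (6.41 to 15.68 mya) lies
# closer to the 13-mya event than to the 59-mya event.
print(nearest_wgd(6.41), nearest_wgd(15.68))
```

Both ends of the reported range map to the 13-mya event, which is the basis for attributing these pairs to the most recent WGD.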

Functional Divergence in the Evolution of 5PTase Genes
Gene duplication provides the raw material for functional innovation [68]. 5PTase proteins are present across fungi, plants, and animals and have an ancient origin that predates the divergence of these kingdoms (Figure 3). This long evolutionary history has allowed a great deal of sequence divergence and resulted in low sequence identities between different subfamilies. Yet, the key functional domain of 5PTases has remained constant in most members, and the likely conservation of the original 5-phosphatase activity may be the result of strong selection.
Previous studies have shown that functional divergence could have occurred through changes in gene expression patterns or protein subcellular localizations [69][70][71]. Intron loss or gain is important in generating structural diversity and complexity, which further promotes variability [71,72]. Members of the 5PTase pairs showed diversity in their transcription profiles. As shown in Figure 7, At5PTases and Os5PTases differ in expression across developmental stages. At5PTases also differ in sub-cellular localization (nucleus, chloroplast, cytoplasm, and plasma membrane) and in substrate specificity toward inositol polyphosphates and phosphoinositides (Figure 8). In addition, the number and arrangement of introns are strikingly distinct between plants and animals, which also contributes to the functional divergence and diversity of 5PTase family proteins during the evolutionary process in plants and animals. Therefore, the functional divergence of 5PTases has apparently arisen from protein characteristics (including sub-cellular localization and substrate specificity), gene structural diversity (exon/intron structure), and expression patterns. These results indicate the functional divergence of 5PTases in plants and animals during their long-term evolutionary history.
Figure 8. (A) Substrate specificity and protein localization of At5PTases. FRA3, 5PTase9, 5PTase12, and 5PTase14 were localized in the nucleus; 5PTase2, 5PTase6, 5PTase7, and 5PTase8 were localized in the cytoplasm; 5PTase1 was localized in chloroplasts; 5PTase13 was localized in the nucleus and cytoplasm; 5PTase11 was localized in the cytoplasm and plasma membrane; and 5PTase4 was localized in the nucleus, cytoplasm, and plasma membrane. Different localizations are shown in different colors. 5PTase1, 5PTase2, 5PTase6, 5PTase12, and 5PTase13 belong to Type I 5PTases, and purple lines connect them with the 5-phosphatase substrates. 5PTase4, 5PTase7, 5PTase8, 5PTase9, 5PTase11, 5PTase14, and FRA3 belong to Type II 5PTases, and black lines connect them with the 5-phosphatase substrates [2]. (B) 5PTases are crucial for multiple processes of plant growth, including seed germination [35], vascular patterning [36,37], root hair initiation [35], and salt stress [40,41].

Conclusions
Phylogenetic analysis showed that a high proportion of 5PTases has been retained in plants and animals. 5PTases in plants and animals play complex roles in many growth and metabolic processes, and different 5PTases show complex and overlapping substrate specificities (Figure 8). However, the function of each plant 5PTase and its molecular mechanisms are still unclear, and the interactions among 5PTases are still barely understood. Further studies will be carried out to explore the functions of 5PTase genes in plants.
Author Contributions: Z.Z., H.Y. and L.C. designed the research. Z.Z. wrote the manuscript. Y.L., Z.L., S.K., and Y.Z. performed the identification of 5PTases, protein structure, evolution analysis, and expression analysis. C.Z. and W.Z. participated in manuscript preparation and revision. All authors read and approved the final manuscript.
Funding: This research received no external funding.