Evolutionary Understanding of Aquaporin Transport System in the Basal Eudicot Model Species Aquilegia coerulea

Aquaporins (AQPs) play a pivotal role in the cellular transport of water and many other small solutes, influencing many physiological and developmental processes in plants. In the present study, extensive bioinformatics analysis of AQPs was performed in Aquilegia coerulea L., a model species belonging to basal eudicots, with a particular focus on understanding the AQPs role in the developing petal nectar spur. A total of 29 AQPs were identified in Aquilegia, and their phylogenetic analysis performed with previously reported AQPs from rice, poplar and Arabidopsis depicted five distinct subfamilies of AQPs. Interestingly, comparative analysis revealed the loss of an uncharacterized intrinsic protein II (XIP-II) group in Aquilegia. The absence of the entire XIP subfamily has been reported in several previous studies, however, the loss of a single clade within the XIP family has not been characterized. Furthermore, protein structure analysis of AQPs was performed to understand pore diversity, which is helpful for the prediction of solute specificity. Similarly, an AQP AqcNIP2-1 was identified in Aquilegia, predicted as a silicon influx transporter based on the presence of features such as the G-S-G-R aromatic arginine selectivity filter, the spacing between asparagine-proline-alanine (NPA) motifs and pore morphology. RNA-seq analysis showed a high expression of tonoplast intrinsic proteins (TIPs) and plasma membrane intrinsic proteins (PIPs) in the developing petal spur. The results presented here will be helpful in understanding the AQP evolution in Aquilegia and their expression regulation, particularly during floral development.


Introduction
Aquaporins (AQPs) are pore-forming membrane proteins that belong to the major intrinsic protein (MIP) family. Aquaporins facilitate selective transport of water and many other small solutes across the biological membranes. The small solutes transported through AQPs include urea; CO 2 ; H 2 O 2 ; and metalloids such as silicon (Si), boron (B), and germanium (Ge) [1,2]. Aquaporins are present in in several plant species has revealed tissue, growth stage, and environmental conditions of specific expression [13,20,31,34]. However, the RNAseq approach has limitations due to inadequate correlation between level of transcription and protein abundance. In this regard, the present study was aimed to characterize the AQP gene family in Aquilegia to understand their evolution and gene expression dynamics during the flower development. The aim of the present study was to understand the evolution of AQPs in the Aquilegia genome and subsequently study the expression profiles, gene structure, protein motifs, substrate specificity, and pore morphology of AQPs using computational tools.

Genome-Wide Identification of Aquaporins in Aquilegia coerulea
The Aquilegia coerulea genome v3.1 was retrieved from the Phytozome database (www.phytozome. net) [35]. A local database of protein sequences of Aquilegia was created using BLAST utilities provided in the Bioedit software package Version 7.0.9.0 [36]. The AQP genes were identified with BLASTp search performed using known AQPs from rice, Arabidopsis, and poplar as query sequences. An e-value of 10 −5 and a bit-score greater than 100 were used as a cut-off to identify significant matches. Aquaporin homologs identified with these criteria were used for further analysis.

Classification and Phylogeny of Aquilegia coerulea Aquaporins
Multiple sequence alignment of AQPs was performed using the CLUSTALW tool implemented in the MEGA7 software suite [37]. A phylogenetic tree of AQPs was generated using the maximum likelihood (ML) method, and 1000-bootstraps was performed to measure the stability of branch nodes in the ML tree. The classification of AQP subgroups was performed in accordance with the nomenclature of known AQPs from rice, Arabidopsis, and poplar, which were used as a query. A combined phylogenetic tree of AQPs from Aquilegia, Arabidopsis, rice, and poplar was also constructed. Phylogenetic tree of plant orders was generated based on National Center for Biotechnology Information (NCBI) taxonomy using the online server phyloT (http://phylot.biobyte.de/).

Gene Structure and Conserved Motif Analysis of Aquaporins in Aquilegia coerulea
Conserved domains for Aquilegia AQPs were identified by using the batch mode of NCBI's Conserved Domain Database (CDD www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). All the known features of AQPs were identified using Microsoft excel utilities and protein sequence alignments generated with MEGA7. Aquaporins with a single or missing NPA motif were manually curated or by using the blastp search in NCBI. The identified AQPs were further screened with TMHMM, SOSUI, and TOPCONS software tools (www.cbs.dtu.dk; http://bp.nuap.nagoya-u.ac.jp) to identify transmembrane (TM) domains [38]. All the AQPs have two half TM helix which make it difficult to correctly predict the TM domains. Therefore, three different softwares were used to confirm the TM domain prediction and, subsequently, homology-based tertiary protein structure modeling was also used to confirm the TM domains.
Conserved motifs in the AQP proteins were identified using the 'Multiple Expression Motifs for Motif Elicitation' (MEME), an online accessible software program [39]. The default settings (minimum width 6 and maximum width of 50 amino acid motifs) were used for the initial MEME scan. The final output of MEME was manually examined.
The gene structure of Aquilegia AQPs was visualized using the Gene Structure Display Server (GSDS) (http://gsds.cbi.pku.edu.cn/) [43]. The chromosomal locations in the Aquilegia genome were assigned to each AQP genes and subsequently visualized using Mapchart software [44].

RNA-Seq Data Analysis
Gene expression of all the identified AQPs in the Aquilegia genome was evaluated by using previously published RNA-seq data [29]. The distal 0.5 mm of flower petal spur cups and blades at three stages of development (1 mm, 3 mm, and 6-7 mm spur length) were used for RNAseq analysis [30]. The transcript abundance was normalized by the FPKM (fragments per kilobase of transcript per million mapped reads) using cufflink tools. The normalized expression data for all Aquilegia AQPs were analyzed further with the MultiExperiment Viewer (MeV_4-9-0) software tool (http://www.tm4.org/mev.html). RNA-Seq data for different petal and spur tissues and conditions are deposited in the public database under the BioProject ID PRJNA270946 and previously described in Yant, Collani, Puzey, Levy and Kramer [29].

Genome-Wide Identification, Classification and Phylogenetic Distribution of Aquaporins in Aquilegia coerulea
Initially, a total of 35 AQPs were identified in the Aquilegia genome based on a homology search performed using known AQPs from rice, Arabidopsis, and poplar. Subsequently, 29 AQP genes were sorted after removing alternate transcripts of the same gene and truncated genes (Table 1). A similarity search performed using the conserved domain database (CDD) tool categorized all 29 AQPs as members of the MIP family and confirmed the presence of two NPA motifs (Table S1). Transmembrane domain prediction tools identified six signature TM domains in 24 out of 29 AQPs (Table S2).
Phylogenetic analysis of Aquilegia AQPs using the maximum likelihood (ML) method was performed along with the known AQPs from Arabidopsis, rice, and poplar, and revealed five distinct subfamilies: PIP, TIP, NIP, SIP, and XIP ( Figure 1). Based on the phylogenetic distribution and homology with known AQPs, the nomenclature of 29 Aquilegia AQPs was assigned ( Figure 1, Figure S1). The largest subfamily, TIP, comprises ten genes that were further divided into five distinct groups ( Figure S1). The nomenclature for the second-largest subfamily, NIP, was performed based on the similarity with rice or Arabidopsis NIPs. Therefore, a similar nomenclature followed for the NIPs in Aquilegia. The third and fourth subfamilies, PIP and SIP, both contained two groups. All the four members of the XIP family clustered together in the XIP1 group. The XIP2 group was entirely absent in Aquilegia when compared to the other three species used in the phylogenetic tree ( Figure 1, Figure S1). Comparisons of AQPs previously identified in 25 plant species have also indicated the entire loss of one of the two XIP group in Citrus clementina, Cajanus cajan, and many others [1]. The number of XIPs was more in aquilegia compared to other species ( Figure 2). The XIP family is absent in monocots and some of the dicots (Figure 2). Information about the taxonomical distribution of AQPs in different plant species, including basal eudicots, provides the basis for an evolutionary understanding of the subfamilies ( Figure 2).

Gene Structure and Conserved Motif Analysis of Aquaporins in Aquilegia coerulea
The considerable variation of amino acid residues in the NPA motifs, ar/R SF, and Froger's residues was observed across Aquilegia AQPs ( Table 1). Most of the AQPs contained dual NPA motifs, with the exception of seven genes that show a variation at the first or last amino acid of the NPA motif. All the members of the PIP and TIP subfamilies possessed the conserved NPA motifs. In the NIP subfamily, seven out of nine members showed the conserved NPA motifs, but two genes of this subfamily showed a variation in one of the two NPA motifs. In the case of NIP4-3, serine was substituted for arginine in the first NPA motifs, changing it to NPS. Contrastingly, in the case of NIP5-1, valine was substituted for arginine at the second NPA motif, making it NPV (asparagine-proline-valine) instead of NPA. The presence of NIP2s with required selective filters and 108 spacing between two NPA motifs suggest its ability to uptake silicon. In the SIP subfamily, SIP1-1 showed the substitution of threonine instead of alanine (NPT), and SIP2-1 showed the substitution of leucine at the first NPA motif (NPL). In the XIP subfamily, three out of four members showed substitution of valine in the first NPA motif and substitution of threonine in the case of XIP1-4 (Table 1). Interestingly, XIP1-1 also showed substitution of histidine at the second NPA motif, making it HPA (histidine-proline-alanine) instead of NPA, which has not been reported previously.

Subcellular Localization and Exon-Intron Organization of Aquilegia Aquaporins
The entire set of Aquilegia AQPs was predicted to be localized to the plasma membrane by the CELLO server, while the WoLF PSORT sever predicted vacuole localization for the seven AQPs (Table S2). Exon-intron structure analysis detected the presence of a varying number of introns among the AQPs, contributing to a variation in gene length ( Figure 3A). A range of two to five introns per gene was observed. The distinct pattern of intron-exon organization observed among the Aquilegia AQP gene family correlated well with their phylogenetic distribution. Most of the phylogenetically related AQPs shared similar gene organization, suggesting the organization was present in their common ancestor before duplication. Based on the physical distribution of the loci, the highest number of AQPs were found on chromosome 7, while chromosome number 2 possessed the least number ( Figure 3B). High levels of variation across the five AQP subfamilies for isoelectric point (PI) and molecular weight were observed. The highest molecular weight was observed in NIP2-1 (33.5 kDa), while TIP4-2 possessed the lowest molecular weight (24.4 kDa). Varying degrees of average PI were calculated for the various subfamilies, with NIPs having average PI of 6.42; PIPs with 8.4; and XIPs with 6.62 ( Figure S2). SIPs showed the highest value for PI at 9.3, while the lowest PI of 5.9 was observed in TIPs (Table S3).

Characteristic Secondary and Tertiary Protein Structure of Aquilegia Aquaporins
The 2D structure analysis of the Aquilegia AQPs provided the key to understand the basic structure and organization, such as the visualization and prediction of NPA spacing, Ar/R selectivity filters, and Froger's residue. The homology-based tertiary (3D) protein structure of all 29 Aquilegia AQPs was predicted to confirm 6 TM domains, hourglass-like configuration, and transcellular pores capable of transporting solutes ( Figure 4A,B). The entire set of Aquilegia AQPs showed the presence of six TM domains, typically arranged to form an hourglass-like structure ( Figure 4C). A transmembrane pore was also predicted in all the AQPs. Pore morphology studied using the MOLE2 software tool showed vast diversity for the pore-lining residues in terms of channel radius, length, polarity, charge, hydrophobicity, and hydropathy ( Table 2). Based on the biochemical properties of pore-lining residues, regional hydrophobicity across the pore was predicted ( Figure 5). In TIPs, hydrophilic pore openings towards both ends and a hydrophobic nature in the middle are generally uniform features. The only notable exception was TIP1-2, where the pore ends were observed to be hydrophobic, but the middle region was hydrophilic ( Figure 5). Another interesting observation is the presence of hydrophilic residues at one end but a hydrophobic nature at the opposite end, for instance, in NIP1-1, NIP4-3, and PIP2-1. More interestingly, SIP2-1 has unique hydrophilic pores and lacks a hydrophobic region ( Figure 5).   Table 2. Physicochemical properties of the pore structure predicted for the Aquilegia aquaporins (AQPs). Three-dimensional structures of AQPs were predicted by using the Phyre2 server and subsequently visualized and analyzed using the Mole 2.5 server to predict the physicochemical properties of AQP pores, such as hydropathy, bottleneck, polarity, and mutability.

RNA-Seq Data Analysis
Previously published transcriptome data of three early developmental stages of the Aquilegia petal spur cup and blade were analyzed to understand the expression dynamics of AQPs [29]. A total of 26 out of 29 AQPs were expressed in at least one of the tissues/stages. The comparison of normalized RNA-seq data in terms of FPKM (fragment per kilobase of transcript per million mapped reads) identified the TIPs and PIPs as highly expressed AQPs subfamilies. In particular, TIP1-2 and PIP1-2 were found to be the most highly expressed AQPs ( Figure 6). The magnitude of expression observed in TIPs and PIPs was several-fold higher than the other subfamilies. For instance, the most highly expressed member of TIP showed about 1792 FPKM in a 3-mm blade, PIP about 1492 FPKM in a 3-mm cup and SIP about 136 FPKM in a 1-mm cup, whereas the most highly expressed genes from NIPs and XIPs have 5.75 and 24.42 FPKM values, respectively. Since RNA expression does not necessarily correspond to protein presence and functionality, additional proteomic efforts are needed to confirm the AQP abundance.

Discussion
In the present study, genome-wide identification and characterization of AQPs were performed in Aquilegia to understand the evolution and molecular role of this family, particularly in the development of the petal nectar spur. When compared to animals, plants usually possess a higher number of AQPs, which have evolved into specific subfamilies that differ in solute specificity, subcellular localization, and molecular functions [12]. Plant AQPs are also diversified in terms of their expression profiling, with specific patterns in various tissues and under certain environmental conditions [13,15]. The higher number of AQPs in plants compared to animals is probably due to the higher frequency of whole-genome duplication events during plant evolution [18], and also possibly due to the plant physiology and sessile nature of life. The AQP gene family appears to be more diverse in moss and spike moss lineages compared to the seed plants [12,49]. For instance, moss and spike moss have seven subfamilies, including two additional HIP and GIP, along with the five subfamilies of AQPs found in most of the seed plants, including Aquilegia. Among the seed plants, variation was observed in the distribution of five subfamilies. The entire monocot clade, as well as some dicot families such as the Brassicaceae, have lost the entire XIPs lineage [1,15,50]. The absence of the XIPs in rice and Arabidopsis, the two extensively studied model plants, limits the efforts to characterize the functions of XIPs when compared to other subfamilies. In this regard, Aquilegia may serve as a useful model, first, because Aquilegia is roughly phylogenetically equidistant from monocots and dicots and, secondly, it has only lost the XIP2s lineage. The absence of the entire XIP subfamily has been reported in several studies; however, loss of a single group from XIP has not been previously reported. The occurrence of XIP1s only in basal eudicots like Aquilegia suggests a subsequent loss in monocots and diversions of two distinct XIP groups in the majority of dicots. As with Aquilegia, three other species were found to have only one XIP, which provides an opportunity to study the XIP's role more efficiently. The higher number of XIPs may share the functional role among the members, or they may work together in a dose-dependent manner, but having only one XIP may have a more versatile role.
In Aquilegia, the PIP subfamily is differentiated into two groups, as observed in all previous studies. However, the TIPs were grouped into five groups that were comparable to only one group observed in Physcomitrella patens [51]. The observations suggest that the diversification of PIPs occurred much earlier than that of TIPs.
Among NIPs, NIP2 (NIP-III) involved in silicon transport was identified in the Aquilegia, but was reported to be absent from many dicot species, including the entire Brassicaceae family. An earlier study has shown that only NIP-III present in the plant species is capable of accumulating a significant amount of silicon (>0.5%) under a condition of supplemented Si [52][53][54][55]. Furthermore, some exceptions have been observed; tomato and citrus, for example, are unable to uptake Si. The inability of these species appears to be associated with the deviation from 108 amino acid spacing between the NPA motifs [1]. Based on the NPA spacing criteria, several novel Si transporters have been identified. Interestingly, NIP-IIIs from horsetails, a primitive plant species which accumulates over 10% Si, have precise 108 amino acid spacing [56,57]. To date, no exception was observed where NIP-III with 108 NPA spacing was absent, but the plant species are able to uptake a significant amount of Si [58,59]. The presence of NIP-III (NIP2-1) with exact 108 NPA spacing indicates that Aquilegia species are capable of Si uptake. However, to confirm the predisposition, experimental validation is required.
Gene structure organization is helpful to illustrate the evolution of a gene family. As expected, the similar exon-intron organization was observed in the phylogenetically close homologs. This pattern of conserved exon-intron organization of AQPs has also been observed in several plant genomes, including rice, Arabidopsis, soybean, brassica, and flax [1,[12][13][14]60,61]. Overall, two to five intron per gene is the most frequently observed structure across the species. None of the Aquilegia AQPs were intron-less, as observed previously. However, intron-less genes are predicted to be newly evolved compared to genes abundant in intron [62]. The intron-less genes are thought to be evolved through retrotransposon activity. All the AQPs with introns suggest a shared ancestral origin and indicate at least the role of a retrotransposon in AQP family expansion. As with exon-intron organization, biochemical and physical properties of AQPs were also well aligned with the phylogenetic distribution. Molecular weight and pI [63] of AQPs are indicative of their subcellular localization. Recently, proteomic analysis in the Arabidopsis genome revealed a relatively lower pI (6.69) for vacuolar proteins when compared to all other proteins (pI 7.40). This is consistent with the low PI for Aquilegia TIPs compared to PIPs and SIPs, as observed in the present study.
Usually, the solute specificity of the AQPs is determined by the pore-lining amino acids and constriction. Solute specificity of the AQPs has largely been predicted based on conserved amino acids present at ar/R SF, NPA domains, and Forger's residues (Figure 4). These conserved positions are usually predicted based on sequence alignment. However, protein 3D structure has never been considered for the solute prediction. In this regard, the pore morphology studied in the present study will provide novel insight for the prediction of solute specificity. The pore morphology and hydropathy plots showed varying degrees of hydrophobic and hydrophilic regions in the AQPs pertaining to the transport of hydrophilic or hydrophobic solutes. The physicochemical analysis of the pore showed a high variation in polarity and hydropathy, which signifies the specificity of the various AQPs to different solutes ( Table 2).
The RNA-seq analysis provided insights into the expression patterns of the various AQPs during different stages of early spur development. Recently, it has been reported that the spur development in Aquilegia is the result of localized cell division at the early stages, followed by prolonged anisotropic cell elongation [29,64]. In the present study, higher expression of TIPs and PIPs in the developing spur cup relative to the blade signifies a potential role of AQPs during these developmental stages of the spur. The PIPs and TIPs are known to be highly permeable to water [65]. Hence, their abundance indicated their role in the spur. In Aquilegia, TIPs from different groups were found highly expressed in cups at a different stage of development, indicating a role in spur development, consistent with previous studies showing that TIPs are highly involved in the cell division and elongation process. In Arabidopsis, the expression of the β-glucuronidase (GUS) protein under the AtTIP1;1 promoter showed high activity in stem-and root-elongating tissues [66]. The tightly regulated expression of AtTIP1;2 and AtTIP2;1, leading to spatial and temporal control of the cellular water transport, which is essential during the highly regulated lateral root primordium morphogenesis and emergence. Increased expression of TIPs is also observed in elongating hypocotyls in different crop plants [31,67,68]. In Zea mays, abundant expression of ZmTIP1;1 is observed in expanding cells in roots, leaves, and reproductive organs [69,70]. In a 3-mm cup of Aquilegia, PIP1-2 showed higher expression, similar to the higher expression of DcaPIP1;1 DcaPIP1;3 previously observed in Dianthus during the flower opening stages [71]. In tulips [72,73] and roses [74], members of the PIP1 subfamily have been shown to be involved in water transport and petal expansion.
The XIPs are known to transport several solutes such as glycerol and boric acid in the plants. The characterization of abundantly expressing XIPs with respect to solute permeability is important to establish their role in spur elongation. Notably, the role of boron in flower development is well established in Arabidopsis and Brassica species [75]. The hypothesis for a probable role of XIPs in the development of petal nectar spurs in Aquilegia needs further investigation.

Conclusions
The genome-wide analysis and characterization of AQPs in Aquilegia will be helpful in understanding the evolution of the gene family across angiosperms. The occurrence of only one group in the XIP subfamily suggests the possibility of an independent loss of the entire XIP family from monocots and some of the dicot families (Brassicacea, for example), and further expansion into two groups in the majority of dicots. The expression pattern of AQPs evaluated during different stages of early spur development was helpful in illustrating the probable role. The higher expression of PIPs and TIPs observed in the developing spur indicates the role in cell elongation and differentiation, as previously observed in the root development. Additional efforts are needed to confirm the expression profiling performed here using available RNAseq data. Furthermore, extensive analysis of AQPs revealed the variation in the pore radius, morphology, and hydropathy, which will be suggestive for the prediction of the solute specificity. The AQPs identified in the present study provide an ample amount of information for further characterization and investigation to understand their role in the various developmental stages and physiological processes.
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/6/799/s1, Figure S1: Phylogenetic analysis of Aquilegia coerulea aquaporins (AQPs). Phylogenetic tree showing the distribution of aquaporins in five different subfamilies namely NOD26-like intrinsic proteins (NIPs), tonoplast intrinsic proteins (TIPs), plasma membrane intrinsic proteins (PIPs), small basic intrinsic proteins (SIPs) and uncharacterized intrinsic proteins (XIPs). The outgroup was taken from Glycine max (Gm)., Figure S2: Molecular weight and isoelectric point of Aquilegia coerulea aquaporins (AQPs)., Figure S3: Heatmap showing expression profile of aquaporin genes in petal spur of AquilIegia coIerula. Genes with no expression (0 FPKM values) were removed from the heatmap. Table S1: Conserved domain analysis of AQPs identified from Aquilegia coerula using CDD tool from NCBI. Table S2: Transmembrane domains and cellular localization of AQPs identified from Aquilegia coerula using TMHMM, TargetP, Cello and wolfpsort. Author Contributions: H.S., R.D. and T.R.S. designed the research, contributed to data analysis and manuscript writing; S.S., S.M.S., P.K., V.K., P.S., V.B., S.K. and A.N. performed data analysis and wrote the first draft of the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.