Evolutionary and Predictive Functional Insights into the Aquaporin Gene Family in the Allotetraploid Plant Nicotiana tabacum

Aquaporins (AQPs) are a class of integral membrane proteins that facilitate the membrane diffusion of water and other small solutes. Nicotiana tabacum is an important model plant, and its allotetraploid genome has recently been released, providing us with the opportunity to analyze the AQP gene family and its evolution. A total of 88 full-length AQP genes were identified in the N. tabacum genome, and the encoding proteins were assigned into five subfamilies: 34 plasma membrane intrinsic proteins (PIPs); 27 tonoplast intrinsic proteins (TIPs); 20 nodulin26-like intrinsic proteins (NIPs); 3 small basic intrinsic proteins (SIPs); 4 uncharacterized X intrinsic proteins (XIPs), including two splice variants. We also analyzed the genomes of two N. tabacum ancestors, Nicotiana tomentosiformis and Nicotiana sylvestris, and identified 49 AQP genes in each species. Functional prediction, based on the substrate specificity-determining positions (SDPs), revealed significant differences in substrate specificity among the AQP subfamilies. Analysis of the organ-specific AQP expression levels in the N. tabacum plant and RNA-seq data of N. tabacum bright yellow-2 suspension cells indicated that many AQPs are simultaneously expressed, but differentially, according to the organs or the cells. Altogether, these data constitute an important resource for future investigations of the molecular, evolutionary, and physiological functions of AQPs in N. tabacum.

While many plant AQPs primarily function as water channels, they can also transport a wide range of substrates, such as ammonia (NH 3 ), antimony (Sb), arsenic (As), boron (B), glycerol, hydrogen

Genome-Wide Identification and Characterization of NtAQP Genes
The whole genome shotgun sequence of N. tabacum and its two ancestors, N. tomentosiformis and N. sylvestris, were searched for AQP genes, using pBLAST and AQP sequences from S. tuberosum and S. lycopersicum as queries. NtAQP protein sequences were analyzed and compared with SlAQP and StAQP for domain identification and functional annotation. Of 101 initial unique hits for NtAQPs, 13 were considered AQP pseudogenes and discarded after a manual inspection of their nucleotide and amino acid sequences and their TM domains. We finally obtained 88 genes encoding 90 full-length AQP proteins, and NtXIP1;1 and NtXIP1;2 genes encoding two splice variants (α and β), as shown in Table 1. This represents the greatest AQP gene number in a Solanaceae plant genome. We identified 49 AQP genes encoding 51 and 50 full-length proteins in two N. tabacum ancestors, namely N. tomentosiformis and N. sylvestris, respectively, as shown in Table S1. The phylogenetic protein analysis showed that NtAQPs cluster into five subfamilies (PIPs, TIPs, NIPs, SIPs, and XIPs) similar to NtoAQPs, NsAQPs, and SlAQP and StAQP, as shown in Figures 1-3. NtAQPs nomenclature was done from protein sequence comparison with the known SlAQP and StAQP, as shown in Figure 1. Sequences belonging to hybrid intrinsic proteins (HIPs) and GlpF-like intrinsic proteins (GIPs) reported in the non-vascular moss Physcomitrella patens [14] were not found. In N. tabacum, we identified 34 PIPs, 27 TIPs, 20 NIPs, 3 SIPs, and 6 XIPs, including two splice variants. Figure 1 shows that the PIPs cluster either into the PIP1 or PIP2 groups, and the NtTIPs into five groups (TIP1 to TIP5), similar to the potato and tomato TIPs [3,7]. Eight NIP groups were found in N. tabacum, contrary to the seven groups in Arabidopsis and soybean [5,11], and three to four NIP groups in poplar, rice, and maize [6,10,12]. Similar to Arabidopsis, rice, maize, poplar, and soybean, N. tabacum had two SIP groups, namely SIP1 and SIP2s, with two and one isoforms, respectively. Two XIP subgroups were observed in N. tabacum, and four XIP subgroups in potato [3]. Subcellular localization prediction was conducted using WoLF PSORT software, and the results were as follows: NtPIPs-plasma membrane (PM) and chloroplast, as shown in Table 1; TIPs-vacuole and PM; NIPs-PM and vacuole; SIPs-PM (SIP2;1 in both the PM and chloroplast); XIPs-PM. These localizations are just predictions and need to be experimentally demonstrated. Part of the predictions are in agreement with the data reported in the literature, but many differences are also observed. For instance, plant PIP2s are not found in the chloroplasts, TIPs are mostly located in the vacuole (and not in the PM, as predicted for many NtTIPs), and NIPs were not identified in the vacuole. SIPs were localized in the PM and/or the ER in Arabidopsis and maize [40] (Lebrun and Chaumont, unpublished data), but never in the chloroplast. The amino acid number, calculated molecular weight (MW), and isoelectric point (pI) of NtAQP homologs are shown in Table 1.
Like their counterparts in other plant species, all PIPs, TIPs, NIP1s, NIP2s, NIP3s, NIP4s, NIP7s, and NIP8s from N. tabacum, have two conserved NPA motifs in loops B and E, as shown in Figure 3 and Figures S1-S5. NIP5s and NIP6s have unusual NPA motifs, in which the alanine in loop E is substituted by a valine, and have a characteristic arginine-rich C-terminus, as shown in Figure 3 and Figure S3). In N. tabacum SIPs, the alanine in the first NPA motif is substitued by either a threonine (SIP1;1) or a leucine (SIP2s) residue, as shown in Figure 3 and Figure S4. All the SIPs have the conserved NPA motif in loop E with a unique characteristic lysine-rich C-terminus, as shown in Figure S4, which contains an ER retention signal [1,41] (Lebrun and Chaumont, unpublished). In the N. tabacum genome, there are four XIP genes, including NtXIP1;1 and NtXIP1;2, which encode two splice variants (α and β) [15]. In N. tabacum XIPs in the first NPA motif (loop B), alanine is substituted by a valine residue, as shown in Figure 3 and Figure S5.  were aligned with all NtAQPs using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/ ClustalOmega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. AQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs, and XIPs). Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black; StAQPs and SlAQPs are in red and blue, respectively. aligned with all NtAQPs using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/Clustal Omega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. AQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs, and XIPs). Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black; StAQPs and SlAQPs are in red and blue, respectively.  (Ns) and N. tomentosiformis (Nto) AQPs. The deduced amino acid sequences of NtAQPs, NtoAQPs, and NsAQPs were aligned using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/Clustal Omega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. The NtAQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs and XIPs), with the corresponding NtoAQP and NsAQP subfamilies. Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black, NtoAQPs are in blue, and NsAQPs are in magenta.

Figure 2.
Phylogenetic relationships among N. tabacum (Nt) AQPs and its two ancestors, N. sylvestris (Ns) and N. tomentosiformis (Nto) AQPs. The deduced amino acid sequences of NtAQPs, NtoAQPs, and NsAQPs were aligned using the Clustal Omega server (http://www.ebi.ac.uk/Tools/msa/ClustalOmega/) and a phylogenetic tree was constructed using Maximum Likelihood method based on the JTT matrix-based model with 1000 bootstraps. The NtAQPs clustered into five different subfamilies (PIPs, TIPs, NIPs, SIPs and XIPs), with the corresponding NtoAQP and NsAQP subfamilies. Each AQP subfamily is shown with a specific background color. NtAQPs are indicated in black, NtoAQPs are in blue, and NsAQPs are in magenta. Subcellular localization prediction was conducted using WoLF PSORT software, and the results were as follows: NtPIPs-plasma membrane (PM) and chloroplast, as shown in Table 1; TIPs-vacuole and PM; NIPs-PM and vacuole; SIPs-PM (SIP2;1 in both the PM and chloroplast); XIPs-PM. These localizations are just predictions and need to be experimentally demonstrated. Part of the predictions are in agreement with the data reported in the literature, but many differences are also observed. For instance, plant PIP2s are not found in the chloroplasts, TIPs are mostly located in the vacuole (and not in the PM, as predicted for many NtTIPs), and NIPs were not identified in the vacuole. SIPs were localized in the PM and/or the ER in Arabidopsis and maize [40] (Lebrun and Chaumont, unpublished data), but never in the chloroplast. The amino acid number, calculated molecular weight (MW), and isoelectric point (pI) of NtAQP homologs are shown in Table 1.
Like their counterparts in other plant species, all PIPs, TIPs, NIP1s, NIP2s, NIP3s, NIP4s, NIP7s, and NIP8s from N. tabacum, have two conserved NPA motifs in loops B and E, as shown in Figure 3 and Figures S1-S5. NIP5s and NIP6s have unusual NPA motifs, in which the alanine in loop E is substituted by a valine, and have a characteristic arginine-rich C-terminus, as shown in Figure 3 and Figure S3. In N. tabacum SIPs, the alanine in the first NPA motif is substitued by either a threonine (SIP1;1) or a leucine (SIP2s) residue, as shown in Figure 3 and Figure S4. All the SIPs have the conserved NPA motif in loop E with a unique characteristic lysine-rich C-terminus, as shown in Figure S4, which contains an ER retention signal [1,41] (Lebrun and Chaumont, unpublished). In the N. tabacum genome, there are four XIP genes, including NtXIP1;1 and NtXIP1;2, which encode two splice variants (α and β) [15]. In N. tabacum XIPs in the first NPA motif (loop B), alanine is substituted by a valine residue, as shown in Figure 3 and Figure S5. Figure 3. Grouping of N. tabacum PIPs, TIPs, NIPs, SIPs, and XIPs based on the ar/R and FPs. The phylogenetic tree was generated as described in Figure 1. The residues in the ar/R selectivity filter and the FPs were identified from the multiple sequence alignment, shown in Figure S1-S5. The ar/R and FP groupings within each subfamily were done based on the corresponding amino acid compositions, which are indicated on the right side of the phylogenetic tree. The solutes predicted, based on substrate specific signature sequences to be transported, are mentioned in square brackets. As, B, C, H, N, Si, Sb, and U indicate arsenic, boron, CO2, H2O2, ammonia, silicon, antimony, and urea, respectively.

NtAQP Gene Structures
The N. tabacum AQP genomic sequences were analyzed for introns and exons, as shown in Figure  4 and Figure S6. Apart from a few inconsistencies, the number and position of introns are conserved within each AQP subfamily. NtPIP genes have two or three introns, except for NtPIP2;3, which has a single intron, and NtPIP1;5, NtPIP1;7, NtPIP1;11, and NtPIP2;8, which have no introns, as shown in Figure 4. Among them, NtPIP2;2 has a very long intron (~15 kb), as shown in Figure S6. The NtTIP The phylogenetic tree was generated as described in Figure 1. The residues in the ar/R selectivity filter and the FPs were identified from the multiple sequence alignment, shown in Figures S1-S5. The ar/R and FP groupings within each subfamily were done based on the corresponding amino acid compositions, which are indicated on the right side of the phylogenetic tree. The solutes predicted, based on substrate specific signature sequences to be transported, are mentioned in square brackets. As, B, C, H, N, Si, Sb, and U indicate arsenic, boron, CO 2 , H 2 O 2 , ammonia, silicon, antimony, and urea, respectively.

NtAQP Gene Structures
The N. tabacum AQP genomic sequences were analyzed for introns and exons, as shown in Figure 4 and Figure S6. Apart from a few inconsistencies, the number and position of introns are conserved within each AQP subfamily. NtPIP genes have two or three introns, except for NtPIP2;3, which has a single intron, and NtPIP1;5, NtPIP1;7, NtPIP1;11, and NtPIP2;8, which have no introns, as shown in Figure 4. Among them, NtPIP2;2 has a very long intron (~15 kb), as shown in Figure S6. The NtTIP subfamily exhibits relatively stable gene structure in comparison with other subfamilies. The majority of them have two introns except for TIP1;2-4 and TIP1;8-9, which have a single intron and NtTIP1;1 with no intron, as shown in Figure 4. The majority of NtNIPs have four introns with variable intron-exon organization, as shown in Figure 4 and Figure S6. NtNIP5;1 has three introns, and NtNIP3s and NtNIP6;1 have five introns, while NtNIP8;2 possesses a unique gene structure with six introns (the greatest number of introns in an AQP gene), one of which is 10 kb long, as shown in Figure S6. The NtSIP genes have two introns, except for NtSIP2;1, which has no intron. The NtXIPs gene structure was very conserved with two introns, except for NtXIP2;1, which has a single intron, as shown in Figure 4. subfamily exhibits relatively stable gene structure in comparison with other subfamilies. The majority of them have two introns except for TIP1;2-4 and TIP1; [8][9], which have a single intron and NtTIP1;1 with no intron, as shown in Figure 4. The majority of NtNIPs have four introns with variable intronexon organization, as shown in Figure 4 and Figure S6. NtNIP5;1 has three introns, and NtNIP3s and NtNIP6;1 have five introns, while NtNIP8;2 possesses a unique gene structure with six introns (the greatest number of introns in an AQP gene), one of which is 10 kb long, as shown in Figure S6. The NtSIP genes have two introns, except for NtSIP2;1, which has no intron. The NtXIPs gene structure was very conserved with two introns, except for NtXIP2;1, which has a single intron, as shown in Figure 4.

Analysis of NtAQPs Ar/R Selectivity Filter and Froger's Position
We identified the four amino acid residues at the ar/R selectivity filter and the five residues in the FPs using sequence alignments, and used them to group the NtAQPs based on the amino acid residue properties and to compare these groups with those of other species, such as tomato and potato, as shown in Figure 3 [3,7,9]. In addition, all NtAQPs were subjected to the ScanProsite tool (http://prosite.expasy.org/scanprosite/), to identify the substrate specificity-determining positions (SDPs) based on the ar/R, FP, and NPA motifs, and thereby the predicted substrate(s) of each isoform, as shown in Table 2, Figure 3, and Table S2. Water is considered as the universal substrate for AQPs, even though some isoforms were shown not to facilitate its diffusion through the membrane [15].

Analysis of NtAQPs Ar/R Selectivity Filter and Froger's Position
We identified the four amino acid residues at the ar/R selectivity filter and the five residues in the FPs using sequence alignments, and used them to group the NtAQPs based on the amino acid residue properties and to compare these groups with those of other species, such as tomato and potato, as shown in Figure 3 [3,7,9]. In addition, all NtAQPs were subjected to the ScanProsite tool (http://prosite.expasy.org/scanprosite/), to identify the substrate specificity-determining positions (SDPs) based on the ar/R, FP, and NPA motifs, and thereby the predicted substrate(s) of each isoform, as shown in Table 2, Figure 3, and Table S2. Water is considered as the universal substrate for AQPs, even though some isoforms were shown not to facilitate its diffusion through the membrane [15]. The ar/R selectivity filter in all the NtPIPs is composed of F, H, T, and R residues in TM2, TM5, LE1, and LE2, respectively, and is identical to the ar/R filter found in all the plant PIPs, as shown in Figure 3. According to the residues located at the P1 of FPs, M or Q (G), NtPIPs cluster into two groups, I and II, as shown in Figure 3. Twelve PIPs (mainly PIP1s) are predicted CO 2 channels and thirteen PIPs (mainly PIP2s) are predicted H 2 O 2 channels, as shown in Figure 3. Based on the ar/R filter, the NtTIPs cluster into four groups (I, II, III, and IV), as shown in Figure 3. The P3-P5 positions in FPs of all NtTIPs are conserved and consist of A, Y, and W residues, respectively, as shown in Figure 3. Based on the disparities in P1 and P2 positions, all TIPs could be divided into two groups. TIP1s and TIP2s are predicted H 2 O 2 channels, and TIP1s and TIP4s are predicted urea channels, as shown in Figure 3. TIP2s and TIP4s are also predicted as NH 3 channels, which is in agreement with experimental evidence in other species [18,42]. Based on the ar/R selectivity filters, all NtNIPs are divided into four different groups, as shown in Figure 3. On the other hand, based on the FPs, NtNIPs cluster into three groups, as shown in Figure 3, such as potato and tomato, but unlike other plants (Arabidopsis, maize, etc.) [3,6,7,11]. Our analysis predicted that the As transporters are only distributed among the NtNIPs (10 NIPs belonging to Group I, based on the ar/R filter and FPs), as shown in Figure 3. NIP2;1, NIP5;1, and NIP3;2 are predicted as Si, B, and H 2 O 2 channels, respectively. The NtSIPs are grouped into two groups based on both the ar/R selectivity filter and FPs, as shown in Figure 3. Very few studies have examined the channel specificity of SIPs. Two SIPs from Arabidopsis showed some water channel activity when expressed in yeast [40]. The NtXIPs are clustered into two groups based on the ar/R selectivity filter. However, based on FPs, all NtXIPs were grouped in a single group, as shown in Figure 3. XIP1;1 and XIP1;2 are predicted as B, urea, and H 2 O 2 channels, as shown in Figure 3. The specificity and function of NtXIP1;1, including its splice variant, were studied in detail and were shown to facilitate the diffusion of B, H 2 O 2 , NH 3 , and urea, but not water [15,43].

Expression of NtAQP Genes in Roots, Leaves, and Flowers as well as BY-2 Suspension Cells
The heatmap based on FPKM values shows the NtAQPs transcript levels in roots, leaves, and flowers, as shown in Figure 5. Among the 88 NtAQPs genes, 73, 75, and 71 are expressed in mature flowers, leaves, and roots, respectively, and 68 genes are ubiquitously expressed in all analyzed organs. PIPs are expressed in flowers, leaves, and roots but differently according to the isoforms. A greater number of NtPIP1 genes are expressed in flowers and leaves than in roots-NtPIP1;1 and NtPIP1;10 being the most expressed isoforms in flowers and leaves, respectively, and NtPIP1;3-8 and NtPIP1;11 not being expressed in roots. A decreased amount of NtPIP2 transcripts is generally observed, but all NtPIP2s are expressed in the three organs with the exception of NtPIP2;9 and NtPIP2;18, which are not expressed or are expressed very little, as shown in Figure 5. NtTIP gene expression levels are often greater in the leaves compared with the other organs, even if a greater number of NtTIP genes are expressed in roots, as shown in Figure 5. Among the 20 NtNIP genes, seven (NtNIP3;2 and all the NtNIP4s) are not or very lowly expressed in the three organs in the tested conditions. The other NtNIP genes are relatively less expressed compared to the other AQP subfamily members, as shown in Figure 5. All NtSIP genes were ubiquitously expressed in flowers, leaves, and roots, NtSIP1;2 being the most expressed NtSIP in the leaf, as shown in Figure 5. Finally, NtXIP1;1 was the most expressed NtXIP in the three organs with the expression of the others being very decreased.
N. tabacum BY-2 suspension cells are widely used to study different physiological processes, the role of specific proteins, or as a heterologous expression system to produce high value pharmaceutical antigens or antibodies [44][45][46][47][48]. We determined which AQP genes are expressed in those cells that grow in suspension in an aqueous environment. RNA from wild-type BY-2 cells was extracted and RNA-seq data analyzed for the expression of the 88 NtAQP genes. The heatmap based on FPKM values is shown in Figure 6. mRNA of 53 NtAQP genes were detected in BY-2 cells growing in a standard MS medium. The most expressed NtAQP genes were 11 PIP1s, TIP1;1, the three SIPs, and XIP1;1.

Discussion
By screening the N. tabacum genome databases, we identified 88 complete AQP genes, almost twice the number of AQP genes identified in tomato and potato [3,7]. The number of AQP homologs always varies between plant species, the dicot plant genomes usually encoding more homologs than the monocot plants, except for the 68 full-length AQP genes found in P. virgatum, a polyploid monocot species [9]. The great number of AQP genes in the N. tabacum genome arose from an allotetraploidization event that occurred about 200,000 years ago [36,38] between N. tomentosiformis and N. sylvestris, which each have 49 AQP genes. The difference between the identified gene number in N. tabacum (88) and the sum of the N. tomentosiformis and N. sylvestris AQP genes (98) suggests that some were lost after the polyploidization event. In addition, we also could not exclude the recent local duplication events in each species, as deduced by the protein phylogenetic tree, shown in Figure 2, in which two very close isoforms from the same species are found on the same branch (i.e., NtPIP2;1 and 2;2, NtoPIP2;2 and 2;3, NsNIP3;1 and 3;2, NtoNIP6;1 and 6;2, etc.). Models have been proposed to explain duplicated gene fate: pseudogenization, sub-functionalization, and neo-functionalization [49]. Redundancy also allows one of the copies to accumulate mutations without affecting plant fitness, and new allelic variants or changes in the gene expression pattern can be observed [50]. While activity determination of the duplicated isoforms would be required to determine a sub-or neo-functionalization, changes in expression patterns can be deduced from the rough NtAQP expression data analysis. For instance, the duplicated NtPIP2;1 and NtPIP2;2 showed different expression levels, which can be organ dependent.
We identified five subfamilies (PIP, TIP, NIP, SIP, and XIP) among the three Nicotiana species, similar to most other dicots, except for Brassicaceae and monocots, which have no XIP subfamily [15]. Several N. tabacum AQPs have been characterized [51][52][53][54][55], and some became paradigms in the plant AQP community [21,22]. NtAQP1, corresponding to NtPIP1;5 in our study, is a PIP1 protein located both in the plasma membrane and the chloroplast envelope, which exhibits water and CO 2 channel permeability [21]. This discovery highlighted the important diverse roles of AQPs in plant physiology and, more particularly, in photosynthesis, through their contribution in facilitating CO 2 membrane diffusion [28]. More recently, the membrane diffusion of another gas, O 2 , was reported to be facilitated by NtPIP1;3 when expressed in yeast, and an increased NtPIP1;3 transcript level was measured in N. tabacum roots after a seven day hypoxia treatment [22], suggesting a potential new physiological role of plant AQPs in O 2 membrane permeability. NtXIP1s are the first plant XIP isoforms that have been functionally characterized [15]. NtXIP1;1 is located in the plasma membrane and is shown in a functional assay in heterologous systems to facilitate the membrane diffusion of H 2 O 2 , glycerol, boron, and urea, but not water [15]. NtXIP1;1 overexpression in N. tabacum results in disturbed boron tissue distribution, leading to boron deficient phenotypes in meristems and young leaves [43]. Interestingly, the NtXIP1;1 gene contains a sequence motif in the first intron that initiates an RNA-processing mechanism that results in two splice variants (α and β), resulting in two amino acid residue differences [15]. We also identified XIP spliced variants for NtXIP1;2, NtoXIP1;1, NsXIP1;1, and NtoXIP2;1 isoforms, and also XIPs from S. tuberosum and S. lycopersicum [15], indicating a conservation of this genomic feature in the Solanaceae family.
To elucidate the substrate specificity of NtAQPs, different signature sequences, including SDPs, NPA motifs, ar/R filter, and FPs were identified, as shown in Figure 3 and Table 2. From this multiple analysis, a majority of PIP1s and PIP2s were predicted to facilitate CO 2 and H 2 O 2 diffusion, respectively, in addition to water, as shown in Figure 3. This was confirmed in functional assays performed for NtPIP1;5 (NtAQP1) and NtPIP2;1 [21,53,54]. Most TIPs have similar NPA and FPs, suggesting that differences in their substrate transport selectivity might be regulated by the ar/R filter residues. Based on this ar/R filter, Group I and Group II TIPs have a wider pore aperture, which might facilitate the diffusion of relatively larger substrates than water, such as urea, ammonia, and H 2 O 2 [56][57][58]. NtTIP4;1 (NtTIPa) was indeed shown to be permeable to water and urea, but also glycerol [51]. NIPs are most diverse in their NPA motifs, ar/R filter, and FPs, suggesting various substrate transport selectivities for these subfamily members and putatively important physiological roles. NIPs are also known to facilitate the transport of metalloids, such as arsenic and boron, as shown in Figure 3. NtXIPs are predicted to transport H 2 O 2 , boric acid, and urea, and were confirmed in transport assays performed with NtXIP1;1 [15,43]. In addition, NtXIP1;1 is not a water channel but is able to facilitate glycerol diffusion [15]. Finally, limited information is available for plant SIP specificity. Water channel activity was determined for AtSIP1s, unlike for AtSIP2;1 [40]. This global substrate specificity study, based on prediction is, however, to be taken with caution, as a single amino acid change, even in the transmembrane domains, could affect the channel characteristic or conformation [59]. Therefore functional assays in heterologous or homologous systems will have to be carried out when analyzing the functional role of specific NtAQP.
As expected, NtAQP transcript levels are dependent on the plant organs, but it is quite surprising to observe that 68 of 88 AQP genes are ubiquitously expressed in roots, young leaves, and flowers. PIP and TIP transcripts are relatively more abundant than other subfamily mRNAs, as shown in Figure 5. Considering that the main role of these isoforms is the water facilitated permeation through plasma and vacuolar membranes, this observation confirms their primordial role in water movement through plant tissues, in cell expansion, and cell water homeostasis [24,60]. The NIP expression level is low, except for NtNIP5;1, but due to their metalloid substrate specificity, a more restricted tissue/cell expression pattern in specific physiological conditions might be expected [43,61,62]. mRNA of 53 NtAQP genes were also detected in BY-2 suspension cells growing in a standard MS medium, even if the relative expression level between them was different to what was observed in plant organs. This could be due to the dedifferentiated nature of those cells and/or the specific cell environment of the culture medium. The most expressed NtAQP genes in BY-2 cells are 11 NtPIP1s, NtTIP1;1, the three NtSIPs, and NtXIP1;1. High NtPIP gene expression was also reported in maize Black Mexican Sweet (BMS) suspension cells [63], but in this case, the two most expressed genes belonged to the PIP2 group. Plant PIP1s physically interact with PIP2s within heterotetramers, leading to PIP1 relocalization from the endoplasmic reticulum to the plasma membrane [59,64]. We might wonder whether PIP2 abundance in BY-2 cells is sufficient to bring all PIP1s to the plasma membrane. The increased PIP expression in suspension cells suggests that they are important in controlling membrane water permeability during suspension cell growth. In fact, PIP expression varies according to BMS cell growth stages, and this is correlated with greater cell water permeability, measured at the end of the log phase and stationary phase [63]. This might be dependent on variations in the medium composition and/or internal osmotic pressure. PIP and TIP gene expression in BY-2 suspension cells might also be involved in the control of cell expansion. Cauliflower BobTIP26-1 overexpression in suspension cells (N. tabacum cv. Wisconsin 38) increases the cell volume [65], cell enlargement being mostly accounted by vacuole swelling. The quite high expression of NtSIPs is also intriguing, knowing that SIPs are mostly expressed in the endoplasmic reticulum and their function is still unknown. ZmSIP1;2 is also expressed in BMS suspension cells, and its expression is not dependent on the growth stage [63]. Suspension cells might be a promising model to investigate the physiological role at the cell level as well as the biochemical properties of this AQP subfamily. Actually, BY-2 suspension cells represent very useful tools to study AQP function, localization regulation, substrate specificity, and structure, as the cells are easily transformed by Agrobacterium tumefaciens or biolistics, and great cell amounts could be obtained for protein purification and reconstitution [66].
In this comprehensive analysis, we identified a highly diverse AQP gene family in N. tabacum as well as in its two ancestors, N. tomentosiformis and N. sylvestris. The signature sequence for substrate selectivity and the possible biological function of NtAQPs were predicted. The transcriptomic data of N. tabacum and BY-2 suspension cells represent an excellent resource to guide further analysis of the function of any selected AQP isoform.

Identification and Sequence Analysis of NtAQPs
The genomes of N. tabacum, N. tomentosiformis, and N. sylvestris available at the Sol Genomics Network (https://solgenomics.net/organism/Nicotiana_tabacum/genome), were searched for AQPs using BLASTp (http//http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PAGE=Proteins) tools with the protein sequences of 47 AQPs from S. lycopersium (tomato) and 41 AQPs from S. tuberosum (potato) as queries. Every sequence from each species was individually compared with functional annotations by browsing the N. tabacum databases.

Phylogenetic Analysis of N. Tabacum AQPs (NtAQPs)
NtAQPs amino acid sequences were separately aligned with S. lycopersium AQPs (SlAQPs) and S. tuberosum AQPs (StAQPs) using the Clustal Omega program (https://www.ebi.ac.uk/Tools/msa/clustalo/) and a phylogenetic tree was built using Molecular Evolution Genetic Analysis (MEGA), version 7.0 [67]. The phylogenetic analysis was conducted using the Maximum Likelihood method, based on the Jones-Taylor-Thornton (JTT) matrix-based model with 1000 bootstraps. The identified NtAQPs were classified into different subfamilies according to the phylogenetic relationships with SlAQPs and StAQPs.

Identification of Substrate Specificity Determining Positions (SDPs)
The aligned NtAQP sequences were searched manually for SDPs by following the prediction explained previously [9,17] and clustered into different functional groups. The functional group sequences were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).

Expression Profile of NtAQP Genes
Transcript levels as FPKM (Fragments per Kilobase of Transcript per Million Mapped Reads) values of NtAQP genes in different organs (mature flowers, leaves and roots) were obtained from the Gene Expression Omnibus (GEO) repository and GenBank Sequence Read Archive (SRA) under the accession code SRP029183 (SRX338104: N. tabacum TN90 root; SRX338101: N. tabacum TN90 leaf; SRX495520: N. tabacum TN90 mature flower). Three biological replicates were obtained from each organ. The FPKM values of the respective NtAQP genes were extracted from the databases and transformed into logarithmic (log 10 ) values to generate the heatmap. A heatmap showing the logarithmic NtAQPs transcript levels in root, leaf, and flower was generated using Microsoft Excel conditional formatting, based on the normalized FPKM values. In our analysis, a logarithmic FPKM value > 0 was used as a threshold to consider whether a gene is expressed.

RNA-Seq Experiment
N. tabacum cv. BY-2 suspension cells were grown in the dark at 25 • C with agitation on a rotary shaker (90 rpm) in liquid MS medium (4.4 g/L Murashige and Skoog salts (MP BIOMEDICALS, Solon, OH), 30 g/L sucrose, 0.2 g/L KH 2 PO 4 , 2.5 mg/L thiamine, 50 mg/mL myo-inositol, and 0.2 mg/L 2,4-D, pH 5.8 (KOH)). Cultures were grown in 50 mL of medium in a 250 mL Erlenmeyer flask and a 5% inoculum was transferred each week into fresh medium. BY-2 cells (100 mg) were collected three days after inoculation (exponential phase) and the total RNA was extracted from three biological replicates and sent to the Macrogen Company, which performed the library preparation, RNA sequencing, and data analysis. For the library preparation, the mRNA was purified from total RNA and transformed into a template molecule library, appropriate for subsequent cluster generation using the Illumina ® TruSeq™ RNA Sample Preparation Kit. The first step in the workflow encompassed purifying the poly-A-containing mRNA molecules using poly-T oligo-attached magnetic beads. After purification, the mRNA was split into small pieces using divalent cations under high temperature. The cleaved RNA fragments were copied into first strand cDNA using reverse transcriptase and random primers. This was followed by the second strand cDNA synthesis using DNA polymerase I and RNase H. These cDNA fragments then went through an end repair process, the addition of a single "A" base, and then the ligation of adapters. Finally, the products were purified and enriched with PCR to generate the final cDNA library. The library was then submitted for paired-end 2 × 100 bp sequencing in Illumina HiSeq2000. Sequencing data were analyzed through the Trinity pipeline, which permitted de novo transcriptome reconstruction. The transcript abundances were calculated using RSEM (1.2.15) software [68]. Blast-X (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE= BlastSearch&LINK_LOC=blasthome) was used to compare the six-frame translation products of a nucleotide query sequence against a protein sequence database (go_v20150407). Finally, the FPKM values for the respective AQP genes were identified from the annotated BY-2 cell transcriptomic data. A heatmap was generated based on the transformed logarithmic (log 10 ) FPKM values. Similar to organ specific expression data, FPKM values > 0 were used as a threshold to consider whether a gene is expressed.

Conflicts of Interest:
The authors declare no conflict of interest.