Genome-Wide Identification, Structure Characterization, and Expression Pattern Profiling of the Aquaporin Gene Family in Betula pendula

Aquaporin water channels (AQPs) constitute a large family of transmembrane proteins present throughout all kingdoms of life. They play key roles in the flux of water and many solutes across the membranes. The AQP diversity, protein features, and biological functions of silver birch are still unknown. A genome analysis of Betula pendula identified 33 putative genes encoding full-length AQP sequences (BpeAQPs). They are grouped into five subfamilies, representing ten plasma membrane intrinsic proteins (PIPs), eight tonoplast intrinsic proteins (TIPs), eight NOD26-like intrinsic proteins (NIPs), four X intrinsic proteins (XIPs), and three small basic intrinsic proteins (SIPs). The BpeAQP gene structure is conserved within each subfamily, with exon numbers ranging from one to five. The predictions of the aromatic/arginine selectivity filter (ar/R), Froger’s positions, specificity-determining positions, and 2D and 3D biochemical properties indicate noticeable transport specificities to various non-aqueous substrates between members and/or subfamilies. Nevertheless, overall, the BpePIPs display mostly hydrophilic ar/R selective filter and lining-pore residues, whereas the BpeTIP, BpeNIP, BpeSIP, and BpeXIP subfamilies mostly contain hydrophobic permeation signatures. Transcriptional expression analyses indicate that 23 BpeAQP genes are transcribed, including five organ-related expressions. Surprisingly, no significant transcriptional expression is monitored in leaves in response to cold stress (6 °C), although interesting trends can be distinguished and will be discussed, notably in relation to the plasticity of this pioneer species, B. pendula. The current study presents the first detailed genome-wide analysis of the AQP gene family in a Betulaceae species, and our results lay a foundation for a better understanding of the specific functions of the BpeAQP genes in the responses of the silver birch trees to cold stress.


Introduction
Aquaporins (AQPs) are small (21 to 34 kDa) channel-forming transmembrane proteins that belong to the major intrinsic protein (MIP) superfamily. They are found in all tissue types, and are localized in the plasma membrane, endomembrane system (endoplasmic reticulum, the Golgi apparatus, lysosomes, vesicles, endosomes, vacuoles), and in the membranes of chloroplasts and mitochondria. They are involved in the bidirectional transfer of water, many small solutes and gases across cell membranes in response to osmotic and hydrostatic pressures, or concentration gradients.
First discovered in animals, AQPs were subsequently found in almost all living organisms [1]. Compared to other living kingdoms, plants harbor a remarkable abundance and divergence of AQP sequences (e.g., 35 AQPs in Arabidopsis thaliana [2], 28 in Beta vulgaris [3], 40 in Sorghum bicolor [4], 52 in Olea europaea [5], 54 in Populus trichocarpa [6], 71 in Gossypium hirsutum [7], and 120 in Canola [8]). This distinctiveness is most probably due to the result of gene duplication and the higher ploidy levels in plants, the higher degree of subcellular compartmentalization of the plant cells, and the need for better water control capacity related to the sessile nature of plants [9]. In this respect, and based on their primary sequence, plant aquaporin homologs are divided into up to five major subfamilies: the plasma membrane intrinsic proteins (PIPs), primarily localized in the plasma membrane; the tonoplast intrinsic proteins (TIPs), predominantly targeted to the vacuolar membrane; the nodulin 26-like intrinsic proteins (NIPs), localized in the plasma membrane and the endoplasmic reticulum; the small basic intrinsic proteins, (SIPs) localized in various endomembrane systems; and the uncharacterized intrinsic proteins (XIPs), mostly related to the plasma membrane [10,11]. Three additional subfamilies also exist: GlpF-like intrinsic proteins (GIPs) and hybrid intrinsic proteins (HIPs) have been described in basal Plantea lineage species (Lycopodiopsida and Bryopsida), and the large intrinsic proteins (LIPs) in the algal phylum Heterokontophyta. These subfamilies are thought to have been lost in vascular plants during the course of evolution, plausibly due to function redundancies [12,13]. Such purging events are not exclusive to these ancestral subfamilies, as evidenced by genome-wide identification and comparison studies on AQPs which have confirmed the loss of the XIPs in monocots, certain plant lineages (Brassicaceae) [14,15], and the NIP2s (categorized as NIPIII) in a large group of plant species [5,16].
Despite this high sequence diversity, high resolution coupled to three-dimensional prediction studies of AQPs from different organisms have revealed that the AQP structure displays highly conserved tetrameric characteristics where each protomer resembles an hourglass-like model [17][18][19]. An AQP monomer is typically formed with six transmembrane (TM) alpha helices (helix 1 to helix 6) connected by five loops (A-E) with both the Nand Cterminals having a cytoplasmic orientation. This then embeds in the lipid bilayer to make route for water and solute transport inside a so-called pore, described within each monomer.
There are two highly conserved Asn-Pro-Ala (NPA) motifs in two half helices (HB and HE) of loopB and loopE in the middle of the pore. Their steric configuration and distance from each other define the size selectivity barrier for various permeants [20,21]. An additional constriction zone consists of four amino acids known as the "aromatic/Arginine" (ar/R: F58-H182-C191-R197 in AQP1) selectivity filter which defines the substrate specificity and permeability [22]. These residues are located in the TM helix 2 (H2), TM helix 5 (H5), and loop E (LE1 and LE2), respectively.
In addition to these two significant selectivity barriers placed within the pore, several other conserved features are known to play a role in AQP solute specificity, including Froger residues (aka Froger's positions) (FPs) composed of five conserved amino acids (P1-P5: T116-S196-A200-F212-W213 in AQP1) known to discriminate glycerol-transporting aquaglyceroporins (GLPs) from water-conducting AQPs [23]. Similarly, AQPs exhibit nine specificity-determining positions (SDPs) that were predicted to facilitate the diffusion of some non-aqueous solutes [24]. These motifs, and/or the position of a particular amino acid, are viewed as key elements for the selection and the transport of permeants across the cell membranes.
AQPs play a key role in maintaining the homeostasis of water and solutes by facilitating the transport of water and a variety of inorganic and organic solutes such as inter alia boric and silicic acids, ammonia, glycerol, hydrogen peroxide, urea, polyols, glycine, and lactate [25]. The fact that the diffusion of small molecules is of prime importance for the maintenance of cell life implies that the AQPs play major roles in numerous plant physiological processes like reproduction, anther dehiscence, seed germination, fruit ripening, photosynthesis, stomatal regulation, petal-and leaf-water movements, xylem embolism repair, maintenance of cell turgor, and cell elongation [10,26,27]. AQPs are also involved in plant responses to biotic and abiotic environment stress [28]. As a result, huge efforts are being made to assess their role in improving tolerance to abiotic stresses such as drought, salinity, heat, cold, and heavy metal toxicity, as well as in alleviating biotic stresses including the induction of disease immunity pathways with the diffusion of the pathogen-induced apoplastic hydrogen peroxide (H 2 O 2 ) to the cytoplasm [29][30][31].
Overall, mining the key AQP genes controlling growth tolerance to various adverse environmental conditions becomes increasingly significant for modern forestry, particularly through high-throughput technologies. To our knowledge, no systemic characterization of the AQP genes has been conducted in Betula genus, and the ever-growing availability of plant genomes and transcriptomes (including some Betula sp. [32,33]) offers unprecedented access to specific gene families, which can be exhaustively evaluated.
The regulation of the annual cycle of growth and dormancy is of great significance for the growth and survival of boreal and temperate tree species under northern climatic conditions. Air temperature (together with photoperiod) is a main environmental driving-force for this regulation [34], and the cold stress sensing and tolerance deployed by northern tree species, including silver birch, in a context of profound climate change are under investigation [35][36][37]. Cold stress reduces the root (Lp r ) and leaf (K leaf ) hydraulic conductivities [38,39], correlated with a global dehydration [40]. However, cell responses to cold stress and adaptation to the freezing condition are distinct processes overlapped only by some multifaceted interactions of many gene products. It remains that the cold stress response and the water conductivities (i.e., Lp r and K leaf ) have been connected to the plant's water uptake mechanism and cell membrane water permeability, i.e., two physiological processes that are tightly regulated by aquaporins.
No study has been conducted on genome-wide identification, sequence characterization (i.e., diversity and structure), or the functional prediction of aquaporins in Betulaceae, although some birch genomes have been sequenced and released [32,38,41]. In the present study, the genome of Betula pendula was analyzed to identify the aquaporin encoding genes (named BpeAQPs). The genomical, structural, and biochemical features were inferred based on present knowledge for the entire set of BpeAQPs published here from raw and unannotated sequences from NCBI databases, therefore classifying each sequence in the general plant AQP system. Furthermore, the expression profiling of BpeAQPs were analyzed and compared using transcriptome sequencing data available for B. pendula from different organs (flower, leaf, xylem, and root), specifically in leaves under cold stress. The knowledge obtained from this study is expected to provide the first comprehensive classification of AQPs in the B. pendula species in particular, and in the Betulaceae clade in general. In addition, it provides key fundamentals for exploring the functions and mechanisms of B. pendula AQP proteins, and their potential roles in physiology and stress alleviation in silver birch.

General Considerations
The increasing number of sequenced plant genomes opens new horizons in the study of functional genomics and the precision of annotating new subfamily genes, manipulating them to potentially enhance plant performance and resilience to ever-changing environmental conditions. In this regard, AQP diversity has been studied in different ligneous species, including Camellia sinensis [42], Citrus sinensis [43], Coffea canephora [44], Eucalyptus grandis and E. globulus [45,46], Gossypium hirsutum [7,47], Hevea brasiliensis [48], Jatropha curcas [49], Malus domestica [50], Olea europaea [5], Populus trichocarpa [6,13], and Vitis vinifera [51]. Irrespective of whether AQP studies are carried out on perennial or annual species, they all demonstrate how AQPs support significant roles in the cellular homeostasis of different organs and tissues and in the radial and systemic transport of permeants, by increasing the permeability of various membranes to water and essential nutrients in response to any fluctuating demands (nutrition, transpiration, etc.) [10].
The AQP gene family exhibits a particularly high diversity and abundance in plants, most of which require further study for a better understanding. Current study on Betula AQPs has its own worth because, to the best of our knowledge, no study has been conducted on genome-wide identification, sequence characterization (i.e., diversity and structure), and the functional prediction of aquaporins in Betulaceae. Accordingly, a systematic genomewide screening of the content of AQP-encoding genes was performed in the B. pendula genome. Sequence homology analysis and protein domain validation based on the analysis of conserved domains and protein topology resulted in the identification of 33 putatively functional and non-redundant AQP genes, named BpeAQPs (Table 1). BpeAQP full-length cDNAs and related amino acids and complete nomenclature are detailed in Figure S1 and Table S1. The 33 BpeAPQs include members of each of the five major AQP subfamilies common to most other Viridiplantae (i.e., NIPs, PIPs, TIPs, SIPs, and XIPs) ( Figure 1 and Figure S2). Ten proteins were grouped in the PIP and were differentiated according to the pattern of conserved motifs into two groups: PIP1 and PIP2; eight TIPs into five groups: TIP1, TIP2, TIP3, TIP4 and TIP5; eight NIPs into six groups: NIP1, NIP2, NIP4, NIP5, NIP6, and NIP7; four XIPs into two groups: XIP1 and XIP2; and three SIPs into two groups: SIP1 and SIP2 (Table 1 and Table S1). Similar proportion and diversity of members were established in woody and annual species. It is worth mentioning that Brassicaceae and Monocots have no XIP subfamily [15]. Sequences belonging to hybrid intrinsic proteins (HIPs) and GlpF-like intrinsic proteins (GIPs) reported in the moss Physcomitrella patens were not found [12]. These present results are in agreement with those of earlier studies that showed that the number and diversity of AQPs in each subfamily are highly specific in plants.
When compared with other woody plants, the number of AQPs identified in B. pendula is similar to that in J. curcas (32 members [49]), V. vinifera (32 members [51]), or C. sinensis (34 members [43]), but significantly lower than that present in M. domestica (42 members [50]), P. trichocarpa (55 members [6]), H. brasiliensis (51 members, [48]), G. hirsutum (71 members [7]), or O. europaea (79 members [5]). The lower number of AQP genes encoded in the B. pendula genome could be explained in part by whole-genome duplication events different from those observed in the species where they are relatively higher [52][53][54]. However, the proximal localization of several homologs (<100 kb; [55]), their phylogenetic relatedness (>70% of similarity, [56]), and similar gene structures strongly allude to the occurrence of gene tandem duplication events in the evolutionary history of some regions of the B. pendula genome. It concerns two PIP2s (BpePIP2;2 and BpePIP2;3) on chromosome 9 and the XIP subgroup on chromosome 3 and the contig 2937. The AQP gene families might have expanded through duplication [57,58], but, more generally, changes in the number of genes among the different species may be due to the size of their genome and/or to evolutionary processes for adaptation in the natural environment. Accordingly, gene duplication represents a main driving force in increasing genetic diversity, gene family size, and creating novel genes in Eukaryotes in general, and, in particular, in plants [59][60][61]. Our results for BpeAQPs suggest that some PIP and XIP members might have expanded through duplication. Further investigations are therefore needed to clarify the evolutionary events that took place within the aquaporin family in the Betula genus, and whether the duplication of some members could be responsible for functional divergences within and between subfamilies.  Bold italic letters denote unusual amino acids in the NPA motifs. f ar/R SF, ar/R selectivity filters (H2-H5-LE1-LE2). g Froger's residues (P1-P2-P3-P4-P5). h Potential substrate transported prediction using the signature sequences developed by [21]. The five aquaporin subfamilies are highlighted in the shaded rows. Figure 1. Phylogenetic analysis of the BpeAQP full-length and the truncated BpeXIP2;1-like protein sequences from Betula pendula. Deduced amino acid sequences were aligned using ClustalW, and the phylogenetic tree was constructed using the maximum parsimony method. Maximum parsimony analysis was conducted using the subtree-pruning-regrafting algorithm. The number next to the branch's nodes represents bootstrap values ≥50% based on 5 000 resamples. The distance scale denotes the number of amino acid substitutions per site. The name of each subfamily is indicated next to the corresponding group. The BpeAQP accession numbers and sequences are listed in Figure  S1 and Table S1. The complete phylogenetic analysis of aquaporin family proteins of Betula pendula (BpeAQPs, filled circles) with the AQP sequences from Arabidopsis thaliana (AtAQPs), Lycopersicon esculentum (SlAQPs), Olea europaea (OeuAQPs) and Populus trichocarpa (PoptrAQPs) is proposed in Figure S2.

Genome Structure of BpeAQP Genes Models
The 33 BpeAQPs in our study could be assigned to twelve chromosome-forming groups (Table S1). The chromosomal distribution varies greatly, from one member in four chromosomes (6,7,11,19) to a high of 7 in chromosome 5. Chromosomes 5 and 8 contain three to five subfamilies of aquaporin genes, whereas other chromosomes carry one or two subfamilies of aquaporin genes. The chromosomal location of BpePIP1;1, BpeTIP5;1, BpeXIP2;1-like and BpeXIP1;3, BpeNIP1;2, and BpeSIP1;2 could not be determined because of an incomplete physical map for B. pendula. This genomic distribution is consistent with previous studies showing that AQPs are unevenly distributed over a large number of chromosomes [5,44,50].
A comprehensive analysis of the structural properties of the genes, including exon/intron number and position, led to interesting conclusions regarding the possible origin, evolutionary relationship, and gene function among the different AQP isomers. As is commonly observed in multigenic families, such as plant AQPs, the exon-intron structure analysis detects the presence of a varying number and length of introns among the BpeAQPs, contributing to significant variations in gene length (Figure 2b). The number of introns per BpeAQP also varies from 1 to 4: the BpePIPs are characterized by three introns, the BpeTIPs by two introns, the BpeXIPs by two introns for the XIP2 clade and one intron for the XIP1 clade, the BpeNIPs by a conserved four-intron structure (except for BpeNIP5;2, which features three introns), and the majority of the BpeSIPs display one to three introns. The fact that most members of the same subfamily share a similar exon/intron structure reinforces the observed phylogenetic distributions. Finally, BpePIP1;2, BpePIP1;4, and several members from the NIP and SIP subfamilies display the longest introns (>3 kbp). were aligned using ClustalW, and the phylogenetic tree was constructed using the maximum parsimony method. Maximum parsimony analysis was conducted using the subtree-pruning-regrafting algorithm. The number next to the branch's nodes represents bootstrap values ≥ 50% based on 5000 resamples. BpeAQPs clustered into five AQP subfamilies: BpePIPs, BpeXIPs, BpeTIPs, BpeNIPS and BpeSIPs. (B) Exons and introns of the BpeAQP genes are represented by red boxes and black lines, respectively. Gene structures were compared using GSDS software. Gene orientations are indicated (5 -3 ) in the x-axis. (C) Distribution of the conserved motifs among the BpeAQP proteins. Motif analysis was performed by using the MEME web server. Ten conserved motifs were identified, and the different motifs are identified using different colored boxes, as indicated at the bottom of the Figure. Each color block in the different proteins indicates a specific motif, for which the amino acids are detailed in Figure S3. Protein orientations are indicated (Nter-Cter) on the x-axis.
Previous studies have established that the gain or loss of exons tied to natural selection are common features of the evolutionary process in plants genomes [62]. Perhaps most importantly, introns would interfere with the regulation of gene expression in different biological contexts, a phenomenon that is still under-estimated. Although most of them are spliced out during transcriptional or post-transcriptional maturations, introns can serve as important biological functions by elevating gene expression without functioning as a binding site for transcription factors; the phenomenon is called 'intron-mediated enhancement' [63,64]. The small number of differences in the gene structure displayed by some BpeAQPs belonging to a same subfamily or between subfamilies suggest that these genes have evolved to perform different functions. Finally, the intron-exon organization of the B. pendula AQP subfamilies is similar to those in many other plants (cf references listed below), showing that AQPs are highly conserved in plants in an evolutionary way, whatever their taxonomic group.

Expertizing Sequence Discrepancies for Few AQP Candidates
Among the BpeAQPs, two PIP1 encoding partial aquaporin-like sequences, which are truncated and lack both of the two NPA motifs, were classified as pseudogenes and excluded from further sequence analysis (Table S1 and Figure S1).
Similarly, an XIP2 encoding partial aquaporin-like sequence (Bpev01.c1577.g0027) could also be classified as a pseudogene due to a truncated N-ter region of 51 amino acids ( Figure S1). The genomes of Betula plathyphylla and B. nana display these two XIP2 versions in full-length and truncated versions (BPChr09G20962 and BPChr09G20935 in B. plathyphylla, respectively, and CAOK01002722 and CAOK01020514.1 in B. nana, respectively). However, an alignment of the two BpeXIP2s with their XIP2 orthologs reveals that the BpeXIP2;1 pseudogene sequence is 100% identical to the equivalent region in the full-length sequences of its related orthologs, whereas the BpeXIP2;1 full-length version (Bpev01.c2937.g0004) has several divergent residues that are beyond the truncated region in question, but which are 100% identical to the equivalent region in the full-length sequences of its related orthologs ( Figure S4). Rebuilding contigs with the equivalent truncated portion from Bpev01.c2937.g0004 with the pseudogene sequence (Bpev01.c1577.g0027) which occurs at the first intron generates a redefined theoretical protein which is 100% identical to its B. platyphylla and B. nana orthologs.
To reinforce our views on these sequence discrepancies, and the solution to solve them, it appears that transcriptional analyses by RNA-sequencing shows significant expression only for the pseudogene sequence (Bpev01.c1577.g0027), whereas the full original sequence (Bpev01.c2937.g0004) gives null or quasi-null expressions in all of the organs analyzed (cf part 2.4). Lastly, the alignments of the B. pendula whole transcriptome mRNA-sequencing reads cover and validate the existence of this new putative sequence. From our experience, the reason for the presence of this type of sequence in the database is usually due to an unvalidated identification method after genome sequencing and a de novo assembly of reads, which, in the case of B. pendula, was performed without a reference genome and experimental evidence. Therefore, this new theoretical full-length protein was included in all our analyses.

Protein Transporter Structure Analysis, and Conserved Substrate-Specific Residues of Betula AQP
The function of a protein is based on its 3D structure, determined by the primary protein sequence and amino acid interaction in the folded protein. For AQPs, the pore structure is key to understanding AQP function and examining the consequences of primary sequence variations to the 3D pore organization itself. A global 3D analysis is mandatory before any detailed scrutinization of the primary sequences.

Protein and Pore 2D and 3D Basic Structure Organizations of BpeAQPs
The solute specificity of the AQPs is determined by a series of amino acids whose orientation of the side chain lines the pores, creating various spatial constrictions. Although predictive, the conserved positions of selectivity motifs provide the key to understanding the basic organization of the pore, and these analyses can be strengthened by 2D and 3D analyses of the pore structure. In this regard, the homology-based tertiary protein structure and the biochemical properties of pore-lining residues (i.e., channel radius, length, polarity, charge, hydrophobicity, hydropathy, solubility, and ionizable residues) across the pore were predicted on individual BpeAQPs protomers in the membranes by using Phyre2 and MOLE2.5 software tools (Table 2 and Figure S5). The results confirm that each Betula AQP displays the six expected TM domains, which adopt an hourglass-like configuration and a transmembrane pore. Pore morphology analyses show significant diversity for the pore-lining residues which, de facto, affects regional hydrophobicity and charge across the tunnel. This diversity implies the occurrence of specificities between different BpeAQPs to facilitate the permeation of different hydrophilic or hydrophobic solutes with different volume and chemical profiles.
Finally, every BpeAQP member displays a hydrophilic zone along the channel, generally located at the most constricted areas that correlate with the NPA and/or ar/R motifs, depending on the member. Furthermore, the average pore diameters for the different subfamilies at the ar/R constriction range from 1.8 ± 0.22 Ångströms for the BpePIPs (i.e., 1.52 ± 0.11 Å and 2.2 ± 0.2 Å for the BpePIP2 and BpePIP1 groups, respectively), 1.92 ± 0.28 Å for the BpeSIPs, 3.68 ± 0.55 Å for the BpeNIPs, and 3.96 ± 0.31 Å for the BpeXIP. The BpeTIP subfamily exhibits an intermediate average pore diameter of 2.92 ± 0.44 Å.
The pore diameter averages of all the BpePIPs and of four BpeTIPs are smaller than the ≈2.8 Å diameter of a water molecule. However, PIP and TIP protomers are able to form homo-and hetero-tetramer structures. These associations lead to intrinsic changes in each protomer structure, resulting in greater stability and protein folding which, in fine, enhances the performance of water transport across biological membranes [65]. These structural events concern the PIP1 protomers in particular which, originally, have no water permeability [66,67].
These molecular adjustments, which are functionally crucial for the global activity of the quaternary structure in water permeation of the BpeAQP protomer diameters embedded in a tetramer configuration, need to be evaluated further, particularly for the BpePIPs and BpeTIPs, by adopting appropriate computational models [68] involving functional and structural validations. Finally, the pore diameters of 5.2 Å for the BpeNIP2 or the >3 Å for some of the BpeTIPs, BpeNIPs, and BpeXIPs correlate with the diameters of the various predicted permeants described in this study (cf chapter 2.3.3), i.e., silicic acid and/or bulky hydrophobic solutes (e.g., glycerol, lactate, boric acid, and ammonia), respectively.
Overall, it would then be very interesting to specifically mutate some key pore-lining residues in order to validate the importance of several regulation points in the permeability control of BpeAQPs.
Nevertheless, the purposes of asymetric electric charges remain an open research question. It is noteworthy that many AQPs are considered as "electrical dipoles". The reasons for this electrical polarity are not fully understood yet, and still undergo investigation. It could be related to the electrical polarity of the biomembranes, therefore stabilizing the embedded protein. In addition, this contrast of charges could be key in the regulation of the intrinsic channel activities of the AQPs, knowing that these charges are quantitative and that the pH, phosphorylation, and Ca 2+ could modulate them.

Sequence Structure and Protein Function Relationship of BpeAQPs
Plant AQPs present different functional characteristics driven by intrinsic biochemical properties [9,10]. The biochemical features of the 33 BpeAQPs identified in our study were predicted, i.e., the protein size, molecular weight (MW), isoelectric point (pI), and the grand average of hydropathy (GRAVY) ( Table 1). BpeAQPs range from 231 to 321 (average 272) amino acids in length, molecular weights range from 24.49 to 32.84 (average 28.64) kD, and pI values range from 5 to 10.05 (average 7.59). The AQP subfamilies from different plant species classically share these features. Interestingly, the vacuolar proteins classically show a relatively lower pI (6.69) when compared to all other cytosolic proteins (pI 7.40) [69]. This lower average pI value of the BpeTIPs (average 5.72), compared with the four other subfamilies (pI > 6.5, except for BpePIP2;2-3, BpeXIP1s and BpeNIP5;1 which have pI < 6.5), is linked to the absence of the basic residues in the C-terminal domains of the proteins. This relationship was previously described in Arabidopsis [2], eucalyptus [45] and olive [5]. The pI could reflect a functional constraint imposed on the MIPs, including the specific key regulation sites across the C-terminal regions involved in phosphorylation, methylation, sorting signals, and the interaction with regulating partners, which differ between MIP families. These results further demonstrate that aquaporins are involved in a versatile and dynamic regulation of solute movement [70]. Concerning the GRAVY parameters which reflect protein hydrophobicity and hydrophilicity, the scores are all positive (ranging from 0.314 to 0.913), indicating their hydrophobic nature, which, for an aquaporin, is a key property that facilitates high water and solutes permeability across membranes [17].
The putative conserved motifs among the Betula aquaporin members were detected by the MEME program. The results showed that the majority of the BpeAQPs within the same group shared similar motifs, and that three of the motifs (motif 4 which corresponds to TM4, motif 8 to NPA2, and motif 3 to TM6) are highly conserved between all BpeAQPs (Figure 2c and Figure S3). The other motifs are specific to one or several subfamilies, meaning that they may play a crucial role in particular functions. Moreover, the BpePIP subfamily shows the most motifs, whereas the BpeSIP subfamily displays the fewest. These singularities are consistent with those of other plant species [5,44,52].

Conserved Substrate-Specific Residue, and Solute Permeability of BpeAQPs per Subfamily
Plant AQPs facilitate the transport of water and some small neutral molecules, such as glycerol, urea, boric acid, silicic acid, NH 3 , CO 2 , and H 2 O 2 . Multiple sequence alignment, coupled with atomic resolution and molecular dynamic simulations, reported residue compositions at key positions in the protein known to regulate AQP function, including dual NPA motifs, ar/R filter, FPs, and predictive SDPs. All these specific pore-lining residues contribute to determining which substrates would permeate through the AQP pore [71].
The conserved NPA motifs, which face each other on opposite sides, create an electrostatic repulsion of protons and act as a size barrier, contributing specifically to the selectivity of water molecules [72]. The ar/R filter, FPs, and predictive SDPs (SDP1-9) are essential for the selective transport of water and non-aqua substrate molecules [10,23,24].
Multiple sequence alignments reveal that NPA motifs are relatively well-conserved between subfamilies, whereas the ar/R filters and FPs are divergent and are characterized by high subfamily-and/or subgroup-specific residues (Table 1 and Figure S6).

PIP Subfamily
Multiple sequence alignments reveal that the dual NPA motifs are highly conserved in the BpePIPs and BpeTIPs, showing the typical NPA residues of aquaporins ( Figure S6a,b). The BpePIP and BpeTIP functions in water transport are very specialized, and their structures are strongly preserved in Viridiplantae [73], suggesting that they are subjected to strong selection pressure in most plant taxonomic lineages [74].
Some divergence occurs between the BpePIP members in the ar/R filter and FP signatures. The ar/R filters in all the BpePIPs are well-conserved and are composed of F, H, T, and R residues in TM2, TM5, LE1, and LE2, respectively. They are identical to the ar/R filters found in all plant PIPs; this hydrophilic motif is a signature of water and solute transporting aquaporins compared with the other subfamilies, which include more hydrophobic residues [21,22]. Additionally, FP P2-P5 in the BpePIPs exhibited identical amino acids (i.e., S, A, F, W); only the P1 position showed mixed residues (Q, G and M). This particularity has been reported in other plants such as olive [5], upland cotton [7], sweet orange [43], grapevine [51], rubber tree [48], jatropha [49] and cassava [52].
The sequence homology analysis of the BpePIPs with various orthologues shows that, depending on the PIP member, the BpePIP1s could be able to transport boric acid, H 2 O 2 , urea, and CO 2 , and the BpePIP2s could be able to transport H 2 O 2 and urea ( Figure S7) [24]. Some gases (e.g., CO 2 and O 2 ) diffuse through the membranes passively. However, several experimental data show that PIPs increase membrane permeability to CO 2 in mesophyll and stomatal guard cells. With the noticeable exception of holoparasitic plants, CO 2 is considered as a key substrate for plants (i.e., photosynthesis), and the facilitation of its membrane diffusion at the mesophyll seems to be exclusive of the plant PIPs [75,76]. More recently, the membrane diffusion of another gas, O 2 , was reported to be facilitated by NtPIP1;3, and an increased NtPIP1;3 transcript level was measured in Nicotiana tabacum roots after a seven-day hypoxia treatment [77], expanding the range of gas permeation possibilities provided by PIPs.
The conserved P2-P5 residues (A-F-W) were reported to be a signature for the CO 2 transporter, which is shared by all BpePIPs. In this regard, SDP analysis predicts CO 2 permeabilities for three BpePIP1s (BpePIP1;2, -3, -4), but not for BpePIP2s. Several PIP2s can mediate CO 2 diffusion [78]. However, this feature would be made effective with PIP1-PIP2 tetramer configurations [74], and with the presence of an isoleucine (I) located at the end of the loop E [79]. This residue is present in five of the BpePIP2s (not for BpePIP2;4 where I is substituted by Leucine (L)) ( Figure S6a). However, SDP analysis does not predict CO 2 diffusion for the BpePIP2s due to the replacement of I by a methionine (M) (SDP2) for all BpePIP2s, and some residue divergences in the SDP6 (D vs G-A-T) for three BpePIP2s (BpePIP2;1 -4 -6) ( Figure S7). Most especially, it also reveals that three of the BpePIP2s (i.e., BpePIP2;2 -3 -5) show this single divergence on SDP1 (I vs M). To our knowledge, no experimental data has targeted this particular SDP2 residue. This observation might be related to the fact that mutations in the conserved amino acids in PIPs alter the capacity of the protein in the diffusion of CO 2 [79]. All this together opens the possibility that CO 2 can diffuse through some PIP2s in concert with certain PIP1s in B. pendula.
Our results show that, except for BpePIP2;2 and -3, all BpePIPs are predicted to facilitate the diffusion of H 2 O 2 . At low nanomolar levels, H 2 O 2 acts as a central hub integrating the signaling network in response to biotic and abiotic stress and during reproduction and development processes [80]. Although it can diffuse passively through organelle and plasma membranes, AQPs were identified as the main candidate for the transport of H 2 O 2 ; in this respect, some AQPs are now called peroxyporins [81]. Interestingly, H 2 O 2 regulates the subcellular redistribution of peroxyporins [82] and their permeability by phosphorylation [83]. However, aside from the SDPs, the pore lining-residues involved in the selectivity and diffusion of H 2 O 2 are not clearly characterized. Recently, growth assays of yeast cells expressing mutated HvPIP2;5 have shown that Ser126 has a large impact on H 2 O 2 transport with a minor influence on HvPIP2;5-mediated water transport [84]. This serine is shared by all BpeAQPs (Figure S6a).
All the conserved domains observed in the BpePIPs support the idea that the PIP family is involved in various physiological reactions involving small molecules, which remain to be elucidated experimentally.

TIP Subfamily
Based on the ar/R filter homology, the BpeTIPs are classified into four groups [85]. Group I is composed of BpeTIP1;1, BpeTIP1;2 and BpeTIP1;3 (with ar/R residues: H, I, A, V), group IIa is composed of BpeTIP2;1 and BpeTIP2;2 (H, I, G, R), group IIb is formed with BpeTIP3;1 and BpeTIP4;1 (H, I, A, R), and group III includes only one member, BpeTIP5;1 (N,V,G,C) ( Table 1 and Figure S6b). This last group is relatively rare in most plant species [86]; and our results confirmed the variability of the TIPs observed in plants. The diversity observed in the ar/R selectivity filter in the different BpeTIP subgroups is similar to the TIPs from other plant species [5][6][7]13,[42][43][44][45][46][47][48][49][50][51]. The FPs are well conserved between members, mainly composed of (T, S, A, Y, W), except for BpeTIP5;1 with (V, A, A, Y, W) residues. The TIPs, conjointly with the PIPs, are considered to be the main channels controlling water balance in plants. However, when compared to the BpePIPs, the BpeTIPs exhibit a greater divergence in the residues of the ar/R selectivity filter and FPs. Consequently, based on the ar/R filter, the BpeTIP-I and -II groups have a wider pore aperture, which might facilitate the permeation of relatively larger substrates than water, such as H 2 O 2 , urea, and ammonia [7,87].
Depending on the BpeTIP member, the conserved residues ensure the transport of various solutes across membranes (Table 1) [24]. For example, TIPs are able to transport nitrogenous compounds [88], such as urea or ammonia [87,89]. Ammonia transport is a singularity of the TIPs in plants. It is dependent on H2 and H5 positions (H and I residues, respectively) with a non-polar LE1 (A/G) [90] and on the R vs V substitution at the LE2 position [91]. Interestingly, a histidine (H), localized in the loop C and only observed in BpeTIP2s and BpeTIP4;1, would be key for the NH 4 + deprotonation [92]. One member per TIP group is predicted to transport H 2 O 2 ( Table 1) [24]. The transport of these solutes has been proven experimentally in heterologous systems [87,89,93,94] and mutagenic studies [95]. Our results suggest that Betula TIPs from groups I and II are able to transport these solutes across membranes (Table 1). Even though the role of H 2 O 2 in the vacuole is unclear, it is thought to be mostly associated with ROS detoxification and the activities of heme-containing and flavin-dependent oxidoreductases, Cu/Zn SOD, or type III peroxidases (PRX) that are able to generate/utilize H 2 O 2 to oxidize a wide range of metabolites.
To close, BpeTIP5;1 is the only aquaporin in group III. Its ortholog, AtTIP5;1, shares similar FPs, and can transport urea [87]. It is therefore plausible that BpeTIPs provides the same transport.

XIP Subfamily
The BpeXIP family is composed of four members (and two pseudogenes), and has a lower overall sequence identity compared with other AQP subfamilies. The three BpeXIP1s belong to the XIP-A group, and the XIP2;1 belongs to the XIP-B clade [14]. Compared with other subfamilies, little is known about this recently discovered family, probably due to its absence in the most extensively studied model plants, including several dicot families, such as Brassicaceae (i.e., A. thaliana), or the entire monocot clade with T. aestivum, O. sativa and Z. mays, for example.
Every BpeXIP harbors conserved (N/S)P(I/L)-NPA motifs which are usually found for this XIP subfamily in other plant species ( Figure S6c). Several variations in ar/R selective filter for XIP family members have been reported in different studies [6,12]. BpeXIPs display H2, H5, and LE1 amino acids with polar uncharged (Threonine) or hydrophobic side chains (Valine, Isoleucine, Alanine), increasing the hydrophobicity of the constriction. The hydrophobic nature of XIPs facilitates transport of bulky and hydrophobic molecules, such as glycerol, urea, and boric acid in plants [96].
However, it is still unknown what kind of functional effect this residue change might have on the protein, and no transport function has been described for the XIP-A subfamily, although they are predicted to transport urea or glycerol [6,24].
Regarding the XIP-B subfamily, BpeXIP2;1 is predicted to transport H 2 O 2 and urea [21], which is supported by experimental studies proving that NtXIP1;1 [92] or HbXIP2;1 [11] facilitate the diffusion of these solutes with glycerol. Besides that, NtXIP1;1 overexpression in N. tabacum results in disturbed boron tissue distribution, leading to boron deficient phenotypes in meristems and young leaves [97].
Froger's positions are globally conserved; however, no clear role has been attributed to them in the transport selectivity of the XIP protein.
A singularity concerns PoptrXIP2;1, which is the only XIP known to be capable of transporting small amounts of water (for information, NtXIP1;1 [97] and HbXIP2;1 [11] are non-permeable to water). The truncated version of the BpeXIP2;1 shares similar dual NPAs and ar/R SF signatures with PoptrXIP2;1 (as well as with the XIP2 truncated sequences from B. nana and B platyphylla). These conserved dual NPA motifs appear to be rare [15] and are observed with PoptrXIP2;1 from P. trichocarpa [14]. Therefore, like PoptrXIP2;1, it is plausible that a complete version of the BpeXIP2;1 copy which displays these dual NPAs is able to transport small amounts of water. However, these sequences are amputated of their first exon (which corresponds to the N-ter extension), making them silent. The fact that a plausible but "rare" water-permeable version of XIP can exist in a species, although becoming non-functional, is intriguing. As such, this evolutionary occurrence is very interesting and needs to be proven experimentally.

NIP Subfamily
Concerning the NIP subfamily, the two NPAs show the same sequence as the PIPs and TIPs, except for the BpeNIP5s and BpeNIP6;1 where the alanine is replaced by a serine residue in the first NPA, and by a valine in the second NPA motif ( Figure S6d). The divergences concern the ar/R filter and the FPs, suggesting various substrate transport selectivity for these subfamily members and, putatively, the involvement in differential physiological roles.
The NIP family forms a monophyletic group, mainly predicted to localize in the plasma membrane [98]. It belongs to the aquaglyceroporins, a subset of the AQP family, and displays more diverse substrate specificities than the PIP and TIP subfamilies, playing a major role in the transport of glycerol, lactic acid, urea, and many metalloids, including arsenic, boric and silicic acids, and antimony [99,100].
Eight NIPs were identified in the Betula genome. They were classified into the three NIP subgroups based on their ar/R filter, suggesting that each subgroup of the NIPs has its own function. The group I is composed of BpeNIP1;1, BpeNIP1;2 and BpeNIP4;1; the group II is composed of BpeNIP5;1, BpeNIP5;2, BpeNIP6;1, and BpeNIP7;1; and the group III includes the BpeNIP2s. The BpeNIP ar/R filters were almost identical to those of C. sinensis [43] and O. europaea [5], where these genes are predicted to act as water facilitators [85].
The BpeNIPs belonging to the group I harbor the classical (W-V-A-R) in H2, H5, LE1, and LE2 residues that would be more hydrophobic than those observed for the group II members which harbor (A-I-G/A-R), (S-I-A-R) or (A-V-G-R) residues. These divergences in residues would impact the transport ability of the protein, suggesting that each subgroup of the NIPs has its own function. For example, although the typical (W-V-A-R) residues are directly involved in the water transport, it was reported that this signature confers a low water permeability compared with the other groups, while enhancing the transport of uncharged solutes, such as glycerol and formamide [101]. Furthermore, the NIPs that combine the ar/R signature (A-I-G/A-R) with the dual NPS/NPV motifs are able to transport arsenite, boric acid, and silicic acid in rice [102,103].
In addition, group II is very interesting in many ways. The BpeNIP5s are very close to AtNIP5;1, which seem to be capable of transporting arsenite, boric acid, and antimony in experimental assays [100,104]. The BpeNIP6;1 is phylogenetically close to AtNIP6;1, which is able to transport urea and the amino acid glycine (Gly) in addition to metalloids [105,106]. As for the BpeNIP7;1, its phylogenetic divergence does not separate it from group II and its proximity to AtNIP7;1 suggests that it can perform similar functions by transporting arsenite, boric acid, antimony, urea, and Gly [104]. Interestingly, the presence of a specific tyrosine residue in the TM2 (Y81) of AtNIP7;1 has been directly related to gating and regulation of urea and Gly transport [107]. The BpeNIP7;1 displays this polar and aromatic residue in TM2 (Y91) (Figure S6d), and the ability of the BpeNIP7;1 to transport these permeants must be considered.
The NIP-group III is present in the Betula genome, and the NIP2 members compose it. Their ar/R selectivity filter is unique and is composed of the typical G, S, G, and R residue (Table 1 and Figure S6d). The small size of the amino acid in the H2 and H5 positions, and their precise distance of 108 amino acids distance to the NPA domains, generate the largest pore diameter (>4.38 Å) described in aquaporins, allowing the passage of very large solutes such as silicic acid, arsenite, and boric acid when expressed in Xenopus oocytes [16]. Mutations in the H5 position result in a loss of transport activity for these molecules [101]. In the plant kingdom, different species accumulate varying degrees of silicon (Si) [108,109]; this group III is present in several silicon-accumulating plants [105]. In addition, NIP2 members from Oryza sativa or Zea mays are able to transport arsenite, boric acid, antimony, urea, H 2 O 2 , and Gly (Table 1 and Figure S7) [21,100,[110][111][112]. BpeNIP2;1 is phylogenetically assigned to the same clade as the Populus trichocarpa and Solanum lycopersicon NIP2s ( Figure S2) (as well as O. sativa and Z. mays NIP2s, data not shown), suggesting that the BpeNIP2;1 could be viewed as a Si transporter. The plant species considered as high accumulators of Si are known to accumulate up to 10% of Si on a dry weight basis [113], whereas the species with a low Si-accumulating capacity (i.e., around 0.2% or less of Si) lack NIP2s or functional NIPs without a precise distance between the NPA domains and the ar/R selectivity filter. Among the angiosperms, Fagales, which includes B. pendula, presents significant shoot Si concentrations [114], suggesting that Betula can be a potential silicon-accumulating species. However, to confirm the spatial predispositions of the NPA and the ar/R selectivity filter of the BpeNIP2;1 and its Si-permeability, experimental validation is required.

SIP Subfamily
The SIPs are divided into two groups based on sequence alignments and on both the ar/R selectivity filter and FPs. In general, the NPA domains of the SIPs are highly divergent from conventional AQPs, especially in the first NPA domain [115]. These features are found in the BpeSIPs (Table 1 and Figure S6e); the first NPA motif shows the replacement of alanine by threonine (T; BpeSIP1;1 and PvSIP1;2), serine (S, BpeSIP1;2) or leucine (L; BpeSIP2;1), while the second NPA motif is completely conserved in the other subfamilies. The ar/R positions from SIP aquaporins show valine/alanine/serine in H2, threonine/lysine in H5, proline/glycine in LE1, and asparagine/serine in LE2. All these residue variations are classically shared by the SIP subfamily in other plant species.
Even today, the SIP protein basic structure, solute specificity, and function in planta are still under characterization. Water channel activity was determined for the AtSIP1s, unlike for AtSIP2;1; they are subcellularly localized to the endoplasmic reticulum [116]. However, no data on transport of non-aqueous substrates is available. The variety in the ar/R selectivity filters and FPs, in comparison with those of other MIPs, suggests that the water channel function may not be the sole function of the SIPs. In this regard, homologuesand structure-based analyses predict that the SIP1s could be capable of transporting urea, while the SIP2s could transport H 2 O 2 and arsenite [117].
Finally, although every global substrate specificity study, mainly based on prediction, has to be carefully considered, our phylogenetic analysis links the BpeSIPs to their orthologues in A. thaliana, P. trichocarpa, L. esculentum, and O. europaea ( Figure S2). Even though the BpeSIPs show slight residue variations at the selectivity filters, an attribution of solute transport to these proteins can be predicted.

Subcellular Localization Prediction of BpeAQPs
Predicted spatial features are helpful for the functional characterization of AQPs. Every BpeAQP member exhibits a highly conserved tridimensional structure composed by six transmembrane helices (Table 1 and Figure S5). The highly conserved hydrophobic regions with specific amino acid sequence and the TM number support the transmembrane structural integrity of the AQPs, and possibly for PIP and TIP protomers. The spatial orientation of specific highly conserved residues is key in the formation of a stable tetramer [118].
The plant aquaporins are mainly located in the plasma membrane. However, specific membrane localizations can vary between the different AQP subfamilies, which, ultimately, influence sub-cellular flow and compartmentalization of solutes. Almost all of the BpeAQPs are predicted to be localized to the plasma membrane, except for the BpeTIP5;1 which would be exclusively assigned to the chloroplast (Table 1 and Figure S8). The BpePIPs are all predominantly assigned to plasma membranes and, very marginally, to tonoplasts, or to ER and Golgi endomembranes. As for the other BpeAQP subfamilies, subcellular localizations are contrasted to varying degrees between members from a same subfamily, being assigned to the vacuole, endoplasmic reticulum, Golgi complex, peroxisomes, mitochondria, and chloroplasts.
Different PIP and TIP members were detected in mitochondrial and chloroplast membranes [119,120], where they could transport key solutes (CO 2 , water and H 2 O 2 , nitrogen) to various plant metabolisms [121,122]. A part of the predictions we obtained in this study is in agreement with the data reported in the literature, but some differences were also observed. In addition, some predicted cell compartmentalizations of several BpeAQPs seem "unusual", such as the TIP and SIP in chloroplast membranes, or the NIP and XIP in tonoplasts. No location should be excluded without further experimental demonstrations, and more studies are required to verify these preliminary data.

Profiling of BpeAQP Transcriptome
Plants are constantly exposed to a large variety of abiotic stresses. Cold stress (CS, chilling, 0-15 • C; freezing, <0 • C) has a strong impact on plant growth and development, and directly influences the species expansion, crop distribution, and yield [123].
Cold stress tolerance in plants is a very complex trait. Studies on chilling and freezing tolerance from woody species, including silver birch, have shown that CS directly influences various physiological and metabolic reactions which are regulated by profound changes in gene expression [124,125]. However, the involvement of AQPs in the acclimation and/or tolerance to chilling is poorly understood. To provide insight into the physiological roles of the various B. pendula AQP isoforms, publicly available whole-transcriptome RNA-seq datasets [33] were processed to study the early expression patterns of the 33 BpeAQP genes in leaves under a cold stress of 6 • C (Figure 3).
The early responses of the BpeAQPs to CS show that the strongest transcriptional modulations are limited to a few isoforms and mainly vary at 1 and 1.5 h after stress application ( Figure 3). However, although several trends in the trimmed mean of M-values (TMM) can be distinguished, they are not statistically significant (p > 0.05) ( Figure S9), mainly due to the paucity of biological repetitions (i.e., two or three biological replicates depending on the kinetic point), which calls for the repetition of the experiment.
The major trends are an up-regulated group with BpePIP2;5, BpePIP2;1, BpeNIP6;1 and BpeSIP2;1, and a down-regulated one with BpePIP1;2, BpePIP1;3, BpePIP2;4, BpeNIP1;2, and BpeTIP1;1. The early responses of various plants to cold stress are reported in the literature, and they concern transcriptomic analysis and transgenic expression [126]. Authors have reported differential expression patterns with down-or up-regulations according to the AQP members (including several BpeAQP orthologues), organs, and genotypes. Interestingly, some BpeAQP genes show similar transcription accumulation kinetic with orthologs from O. europaea (OleurPIP1;2, OleurPIP1;3, OleurPIP1;1, OleurPIP2;5 [5]), Musa acuminate (MaPIP2;4, MaPIP2;5 [127]), rice (OsPIP1;2, OsPIP2;5, OsTIP1;1 [128]) or A. thaliana (At-PIP1;2, AtPIP1;3, AtPIP2;4, AtPIP2;5 [129]) during cold stress. These data suggest that a set of specific aquaporin genes respond to cold acclimation in the leaf of birch, and that particular MIP sequences (e.g., PIP1;2, PIP2;5, TIP1;1) shared by different plant species plausibly play key roles for adaptation to cold stress in plants. In addition, transgenic plants overexpressing some PIPs or TIPs displayed an enhanced tolerance to cold [130,131]. In our results, we could not detect the involvement of the few modulated BpeAQP genes in the CS stress; however the down co-modulation of the BpePIP1s and BpePIP2;4, for example, suggest plausible hetero-tetramerization events between protomers, which would minimize water loss, or the up-regulation of two BpePIP2s which may facilitate CO 2 transport from leaves under cold stress. In spite of these trends, we are not able to propose definitive functional interpretations of the modulation of the BpeAQP isoforms. However, it is becoming clear that the role of the AQPs in the regulation of physiological status under hydraulic constraint is highly complex and still largely unraveled, especially for the plant species that share important multigenic families [132]. In this regard, the two highly expressed BpePIP2s in leaves are the BpePIP2;1 and the BpePIP2;4 (Figure 4), which harbor up-and down-regulated expression during CS, respectively. The opposing patterns suggest compensating events between isoforms, where one AQP group supplements the functions of another group. This compensation could create a differential flow of permeants and possibly a preferential orientation of their flow, revealing a complex participation of different aquaporins in matter flow between cytosol and the different subcellular compartments and/or apoplasm [68]. It is particularly noteworthy that low temperatures have a very discrete impact on only a few isoforms in B. pendula. Several other studies showed substantial and large AQP modulations during cold stress, such as olive [5] or banana [127]. In contrast to these Mediterranean or tropical species that are extremely cold-sensitive, B. pendula is a rather cold-resistant species. In this regard, the Betula genus is represented by a pioneer species which occupy a broad latitudinal range in the Northern Hemisphere [133], from the sub-tropics to the Arctic, and populating various contrasted habitats, ranging from bogs, highlands, to alpine tundra, swampy mountain habitats, and forests with more or less drained soil. Therefore, in view of the prime functions that aquaporins play in the regulation of water and solute homeostasis under diverse environmental conditions, it becomes legitimate to ask the following question: do the discrete modulations in the early expression patterns of BpeAQP genes correlate with the high resilience of Betula leaf to CS, and, more widely, with the high plasticity of this species? It will be very interesting to study the involvement of AQPs in plants growing in very contrasting environments and under various abiotic stresses, under the prism of the phenotypic plasticity of B. pendula.
We also compared the expression patterns of the 33 BpeAQP genes in different organs, including flowers, leaves, roots, and xylem ( Figure 4).
The comparison of normalized gene expression levels shows that 23 BpeAQPs were expressed in at least one of the organs, 11 of which showed significant variations regarding organ type (Figure 4a and Figure S9). Overall, the Betula AQP gene family displays complex differential expression profiles according to the organs, including 18 members that show ubiquitous expression, whereas 11 members exhibit organ-specific expression patterns, thereby suggesting different physiological functions in the organs during developmental, and mineral and/or water absorption processes. Furthermore, out of all the BpeAQPs, the levels of expression observed for the PIP and TIP subfamilies are higher than that for the NIP subfamilies. In addition, transcript accumulations are greater in the root than all of the other organs, xylem being the least represented in this abundance (Figure 4b). These expression profiles are in agreement with previous studies in other plant species such as Brassica rapa [55], Oryza sativa [58], and Eucalyptus [46].
The main aquaporin expressed in the different organs is BpeTIP1;1, predominantly in the leaves (1 253 TMM, Figure S9) and in the flowers (1 165 TMM, Figure S9), as compared with the average of the other BpeAQPs at 250 TMM ( Figure S9). BpeTIP1;2 seems to be specific to roots (888 TMM, Figure S9), but the highest BpeAQPs expressed in this organ is BpePIP2;5. The TIPs ensure high water transport capacity, especially with the TIP1s, reportedly responsible for the high permeability of the tonoplasts [134].
The fact that only one isoform (i.e., BpeTIP1;1) among the TIPs is highly expressed in all of the organs, and that some other BpeAQPs exhibit organ-specific expressions (e.g., BpeTIP1;2, BpePIP1;3, BpePIP2;5 and BpePIP2;6 in the roots; BpeNIP1;2 and BpePIP1;2 in the flowers; BpeTIP1;1 and BpePIP2;4 in the leaves and flowers; BpePIP2;1 in xylem) could be due to a redundancy of functions between counterparts, or to the specialization in particular organs and/or tissues, as revealed with BpeTIP2;2, which is the only isoform that would be able to facilitate the diffusion of NH 4 + (Table 1 and Figure S7) [24]. Moreover, the TIPs in general and the TIP1s in particular are highly involved in the cell division and elongation processes [135,136], suggesting that the BpeTIP1;1 could be mainly responsible for the water transport between the cytoplasm and vacuole in B. pendula plants. The BpeTIP2;2, which expresses in the roots, suggests an ammonium transport facilitation and compartmentation into the vacuole, protecting cells from ammonium toxicity [137].
The functions carried out by the BpeTIP1;2 and BpeTIP2;2 in the root require further elucidation. In our analysis, the BpeTIP3;1 gene showed no expression in any of the organs studied, even in the flowers for which the TIP3 subfamily is known to be specifically and/or strongly expressed [138]. These results suggest that the TIP3 function might not be strictly conserved between plant species, as has also been reported with olive [5], J. curcas [49], banana [127], and carnation [139]. We hypothesize that other AQP proteins, such as the BpeTIP1;1, which shows high expression in the flowers and in the leaves, may be a functional substitute in these organs. Although the aquaporins are quite functionally redundant overall (Table 1), these results support the notion that the functions of the same subgroup can be differentiated between organs, as well as between different species.
The PIP subfamily has the largest number of expressed isoforms, including all BpePIP1s and five out of the six BpePIP2s. Every BpePIP1 expresses in all of the organs tested, but contrasted accumulations can be highlighted with BpePIP1;2, which is expressed in xylem as well as in the leaves and flowers, whereas BpePIP1;3 predominantly accumulates in the flowers and roots. The involvement of PIP1s in the gene network associated with leaf, root, and xylem responses is relatively well-documented, although more so under drought, salinity, and heat conditions. However, there is less study on the involvement of PIP1s with the flower responses, where PIP1;2 showed higher expression in some plant species, and where the related protein was shown to be involved during the flower opening stages, in water transport, and petal expansion [139][140][141]. Regarding the BpePIP2s, BpePIP2;1 mainly accumulates in xylem, BpePIP2;4 in the flowers and leaves, and BpePIP2;5 and BpePIP2;6 in the roots. Interestingly, BpePIP2;3 shows very weak expression in the roots and xylem ( Figure S9), whereas BpePIP2;2 shows no expression in any organ. These two genes are duplicated genes. However, although the duplicate sequences did not diverge significantly during their evolution (Ks/Ka < 1), and where purifying selection might have contributed greatly to the maintenance of the function in common, each duplicate appears to have undergone different fates, including one of the copies which has become silenced over time (i.e., non-functionalization). Gene duplication events are considered as a primary driving force, providing raw material for evolutionary novelty. In the case of BpePIP2;2 and -3, the duplication does not appear to have occurred with the accumulation of mutations in both copies, meaning that the creation of the duplicate should not create functional redundancy, which would affect the fitness of the plant. However, overlapping functions between duplicate genes can result in interfering interactions between paralogues, including major repercussions on related steady-state mRNA and/or protein pools and, potentially, on the regulatory mechanisms controlling various physiological processes with far-reaching phenotypic effects [142,143]. Clearly, the exact functional contribution of each BpePIP2 paralogue will need to be established.
The more weakly expressed subfamilies (TMM < 250, Figure S9) are the NIP with BpeNIP1;2, BpeNIP5;1 and BpeNIP6;1; SIP (where the three members show expression with TMM averaging 50); and the XIP with the BpeXIP2;1 mainly (TMM ≈ 50). These low expression levels are characteristically observed in other species. However, it is likely that their expression might change in response to a specific stimulus or that they are expressed at higher levels but in very specific cell types making up a small population of the total organ sampled and analyzed for RNA-seq.
Overall, RNA expression is a very informative first step for assessing the involvement of the particular members in a biological context, and additional proteomic efforts are needed to confirm these differential AQP abundances in various organs.

Identification of BpeAQP Family Members and Subfamily Classification
BpeAQP genes were searched against the sequenced birch genome available on NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi; accessed on 2 July 2021) and the CoGe comparative genomics platform (https://genomevolution.org/coge/). The investigations were conducted using keyword queries ("Major Intrinsic Protein" and "Aquaporin"), complemented by tBLASTn searches with conservative criteria requiring a cut-off of E-value of 1.0 −5 by using the amino acid sequences of aquaporin sequences from A. thaliana (AtAQP) [2] and Olea europaea L. (OleaAQP) [5] as queries. The B. pendula AQP sequences (BpeAQP) were named according to the standardized MIP nomenclature, and each name (i.e., PIP, TIP, NIP, SIP, and XIP) was guided by sequence similarity and phylogenetic analysis with orthologs from A. thaliana, O. europaea, Lycopersicon esculentum (SlAQP; [144]) and Populus trichocarpa (PoptrAQP, [14]). Each BpeAQP gene accession is derived from the identifier of the GenBank GSS contig on which it is located. When necessary, it is subscripted with an additional lowercase letter to indicate that BpeAQP genes accessions differing only by this letter were originally assembled in the same GSS contig. The Betula AQP protein names, accession numbers, and sequences (genomic, CDS, and deduced proteins) used in this work are listed in Figure 1, Figure S1 and Table S1, respectively.

Bioinformatics Analysis
Every putative BpeAQP sequence was carefully scrutinized, including expected AQP motifs (NPA, ar/R selectivity filter, and the FPs) and the prediction of the transmembrane topology, with Interproscan from EMBL (http://www.ebi.ac.uk/Tools/pfa/iprscan/; accessed on 2 July 2021) and the NCBI's conserved domain database (https://www.ncbi. nlm.nih.gov/Structure/cdd/wrpsb.cgi; accessed on 2 July 2021). Motifs were identified by multiple sequence alignment and performed using MUSCLE (https://www.ebi.ac.uk/ Tools/msa/muscle/; accessed on 2 July 2021) [145] with the default parameters. Specificitydetermining positions (SDP1-9; [21]), which were essential to the transport of non-aqua substrates, were identified from multiple sequence alignments as described by [146]. Molecular weight, theoretical isoelectric point, and the grand average of hydropathicity (GRAVY) of the BpeAQPs were analyzed by the ExPaSy compute pI/Mw tool (https://web.expasy.org/ protparam/; accessed on 2 July 2021). The transmembrane regions were predicted using the TMHMM-2.0 software tool (www.cbs.dtu.dk/services/TMHMM/; accessed on 2 July 2021) [147], SOSUI tools (http://harrier.nagahama-i-bio.ac.jp/sosui/sosui_submit.html; accessed on 2 July 2021) [148], and were, when necessary, manually corrected with heterologous AtAQP, SlAQP and OleaAQP sequences. The subcellular localizations of the BpeAQPs were predicted using the online tools WoLF PSORT (http://wolf-psort.hgc.jp; accessed on 2 July 2021) and Plant-mPLoc (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/; accessed on 2 July 2021) [149]. The phylogenetic studies were conducted with the BpeAQP-deduced peptide sequences, and then with a total of 221 amino acid sequences, including BpeAQPs, AtAQPs, OleaAQPs, PoptrAQPs, and SlAQPs. Alignments were performed using the ClustalW progressive alignment method, and the phylogenetic trees were inferred using the maximum parsimony method. Maximum parsimony analyses were conducted using the subtree-pruning-regrafting (SPR) algorithm, and were bootstrapped with 5000 replicates. All analyses were performed using the MEGA X program. The graphical representation and the edition of the phylogenetic trees were performed with iTOL software (https://itol.embl.de/; accessed on 2 July 2021) [150]. The intron and exon structures of BpeAQP genes were analyzed by online software GSDS 2.0 (http://gsds.gao-lab.org/; accessed on 2 July 2021) [151]. Conserved motifs of the BpeAQP proteins were identified by MEME Suite 4.11.1 (Multiple Expectation Maximization for Motif Elicitation; http://meme-suite.org/tools/meme; accessed on 2 July 2021) [144]. The motif discovery mode was selected as a classic mode by using the default settings, i.e., the zero or one occurrence per sequence (ZOOPS) site distribution per sequence, of a minimum width of 6, a maximum width of 50 amino acid motifs, and a maximum number of motifs to find which was set to 10.

Construction of Homology-Based Tertiary Protein Structure
Homology-based protein tertiary structures (PDB files) of all the Betula AQPs were predicted using the Phyre2 protein-modeling server (http://www.sbg.bio.ic.ac.uk/phyre2 /html/page.cgi?id=index; accessed on 2 July 2021) [152], using the intensive modelling mode as parameter. The results obtained in the form of PDB files were then uploaded to the Mole2.5 server to predict the transmembrane pores and various biochemical properties of the pore-lining residues (https://mole.upol.cz/; accessed on 2 July 2021) [153]. The following default parameters were also used: heteroatoms were ignored for modelization and pore merging was set on. Channel modelization was obtained with cavity probe radius, 5 Å; cavity interior threshold, 1.1 Å; channel origin radius, 5 Å; channel surface cover radius, 5 Å; channel weight function, the Vororoi scale; bottleneck radius, 1.2 Å; bottleneck tolerance, 3 Å; and maximum tunnel similarity, 0.7 Å.

Differential Expression Profile of BpeAQP Gene Family (RNA-Seq Analysis)
The transcriptional expression patterns of all of the BpeAQPs from various plant organs and, subsequently, in leaves undergoing cold stress were recorded from the plant samples that were previously studied in two experimental assays on B. pendula [33]. The transcriptomic data are available in the NCBI SRA (Sequence Read Archive) database with accession numbers PRJNA535361 and PRJNA532995, respectively.
Samples were collected in triplicate from the roots, young leaves, female inflorescences, and xylem (upper stem, about 10th nodes) of healthy two-year-old birch planted in the experimental field of the Northeast Forestry University (Harbin, China). Two biological replicates per sample were sequenced, and each biological replicate consisted of a pool of three plant RNAs. The Northeast Forestry University experimental field is situated at 45.72 • north latitude and 126.63 • east longitude. The average daylight is 14.25 h and the nighttime is 9.75 h, the mean temperature is 17 • C to 27 • C, and the average relative humidity about was 76% at the date of sampling.
Concerning the cold stress assay, samples were taken from young leaves from twomonth-old B. pendula plants grown in the greenhouse at a constant temperature of 25 • C with a photoperiod of 16 h of light and 8 h of dark. A total of six 28W sunlamps were used for illumination, but the intensity of illumination was not measured specifically. For the experimentation, the plants were exposed to cold stress (6 • C) for 0.5 h, 1 h, 1.5 h, 2 h, 2.5 h, and 3 h. Plants left at 25 • C were used as the control (0 h). Two biological replicates for time points of 1 h, 2 h, 2.5 h, and 3 h, control plants, and three biological replicates for time points of 0.5 h and 1.5 h were generated.
Total RNA was extracted using the CTAB (Cetyltrimethylammonium Bromide) method [154]. The constructed cDNA libraries were subjected to paired-end sequencing using the Illumina HiSeq platform at Biomarker Technologies Corporation (Beijing, China). The clean reads of each sample were obtained by filtering out any reads of low quality, and then aligned to the B. pendula reference genome using Bowtie2 [155]. The RNA-sequencing data were analyzed using the RNA-seq by Expectation-Maximization (RSEM) pipeline [156]. RSEM could compute transcript abundance, estimating the number of RNA-seq fragments corresponding to each gene, and normalized expression values as TMM (trimmed mean of M-values). Every methodological step is detailed in [33].
AQPs with zero expression values were considered as non-available data, when distribution analysis showed non-normal distribution of such zero values. Normality and homoscedasticity were checked before analysis of variance (tested at 5% risk level) (Figures S10, S11 and S12). A post-hoc Tukey Honestly Significant Difference (HSD) test was performed at 5% risk level and a Student's T test was performed at 5% risk level with Benjamini and Hochberg FDR adjustments for all AQPs p-values [167]. Principal component analysis was carried out on the first two dimensions and calculated using the first five dimensional variables as parameterized by default in FactoMineR package. The models of variance considered cold stress (CS), time duration of the cold stress (6 • C), and the organ type for the samples analyzed in the different libraries of the organs sampled at 25 • C.

Conclusions
Species of the Betula genus occupy large natural landscapes, playing a key role in forestry and horticulture. Profound climate changes increase the vulnerability of various woody species, making aquaporins (as being vital regulators of the plant water relationship) relevant candidates in developing stress-resistant forest plants. The present experiment identifies that the B. pendula genome harbors 33 genes encoding putative AQPs. Our results determined the variability of the AQPs in Betula, allowing us to establish five BpeAQP subfamilies. The evaluation of the BpeAQP sequences reveals a great diversity in the NPA motifs, ar/R selectivity filters, and FPs, which might promote the transportation of water and a large panel of unconventional non-aqua substrates throughout the plants. This exhaustive study on the BpeAQP specificity to different substrates is only based on predictions. Functional tests in heterologous or homologous systems need to be performed to analyze the functional role of specific BpeAQPs in more detail. Nevertheless, the diverse transport characteristics of BpeAQPs are highly informative and potentially indicate their various roles in the key physiological processes involved in development and low-temperature tolerance in Betula, thus opening up new horizons for very interesting research.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/ijms22147269/s1, Figure S1. Detailed BpeAQP sequences (genomic, CDS and protein sequences) from Betula pendula. Figure S2. Phylogenetic analysis of aquaporin family proteins of Betula pendula (BpeAQPs, filled circles), Arabidopsis thaliana (AtAQPs), Lycopersicon esculentum (SlAQPs), Olea europaea (OeuAQPs) and Populus trichocarpa (PoptrAQPs). Figure S3. Amino acids constituting the ten motifs of the 33 full-length BpeAQP protein sequences. Figure S4. Manual analysis of the BpeXIP2 group leading to the theoretical redefinition of a BpeXIP2;1 used in this work. Figure S5. Predicted 3D models of the BpeAQP monomers (BpePIP1s; BpePIP2s; BpeTIPs; BpeXIPs; BpeNIPs; BpeSIPs). Figure S6. Amino acid sequence alignments per subfamily of the BpeAQP members from Betula pendula. Figure S7. Specificity determining positions (SDP) analysis of BpeAQP transporting non-aqua substrates from Betula pendula. Figure S8. Predicted subcellular localization of Betula pendula aquaporins. Figure S9. Trimmed mean of M-values (TMM) values related to the organ-specific expression profiles of BpeAQP genes from Betula pendula in leaves under cold stress (6 • C), and in various organs (flower, leaf, root and xylem). Table S1. Nomenclature of the BpeAQPs from Betula pendula. Figure S10. Comparison of raw data distribution of regularized-log transformed AQP gene expression in Betula pendula. Figure S11. Exploratory data analysis of cold stress experiment. Figure S12. Exploratory data analysis of organ experiment.

Acknowledgments:
The authors thank those contributors who made the Betula pendula genome data accessible in the public databases of NCBI and CoGe. We are grateful to Céline Sac and Caroline Savel for their technical assistance in molecular biology, and Norbert Frizot for his technical computer support services. We are grateful to the Mésocentre Clermont Auvergne University and AuBi platform for providing computing and storage resources. The authors wish to acknowledge Patricia Mabrut for providing the final linguistic revision of the manuscript. We are also grateful to the anonymous reviewers for their constructive comments.

Conflicts of Interest:
The authors declare no conflict of interest.