Genome-Wide Identification and Expression Profiling Analysis of the Xyloglucan Endotransglucosylase/Hydrolase Gene Family in Tobacco (Nicotiana tabacum L.)

Xyloglucan endotransglucosylase/hydrolase genes (XTHs) encode enzymes required for the reconstruction and modification of xyloglucan backbones, which changes cell wall extensibility during growth. A total of 56 NtXTH genes were identified in common tobacco, and 50 cDNA fragments were verified by PCR amplification. According to their phylogenetic relationships, the 56 NtXTH genes could be classified into two subfamilies: Group I/II and Group III. The gene structure, chromosomal localization, conserved protein domains, and sub-cellular localization of NtXTH proteins, as well as the evolutionary relationships among Nicotiana tabacum, Nicotiana sylvestris, Nicotiana tomentosiformis, Arabidopsis, and rice, were also analyzed. Expression profiles obtained from the TobEA database and by qRT-PCR revealed that NtXTHs display different expression patterns in different tissues. Notably, the expression patterns of 12 NtXTHs in response to environmental stresses (salinity, alkali, heat, and chilling) and plant hormones (IAA and brassinolide) were characterized. These results should be useful for functional studies of NtXTHs across growth stages and under stress.


Introduction
Xyloglucan is the most abundant hemicellulose in the primary cell walls of dicotyledonous and non-graminaceous monocotyledonous plants, where it coats and cross-links adjacent cellulose microfibrils through non-covalent bonds to form a dynamic, load-bearing cellulose-xyloglucan framework [1,2]. In addition, xyloglucan acts as a storage reserve in the seeds of some plant species, such as nasturtium (Tropaeolum majus), where it accumulates as large deposits on the inside of the cotyledon cell wall during seed development and is subsequently hydrolyzed during germination [3].
The xyloglucan endotransglucosylase/hydrolases (XTH) gene family is a subfamily of GH16 based on the carbohydrate-active enzymes (CAZy) classification [4]. Multigene families of XTHs have been identified in a wide variety of plant species using genome sequencing, including Arabidopsis (33), rice (Oryza sativa, 29), poplar (Populus spp. L., 41), wheat (Triticum aestivum L., >57), sorghum (Sorghum bicolor, 35), tomato (Solanum lycopersicum, 25), kiwifruit (Actinidia deliciosa, 14), and apple (Malus sieversii, 11) evolution and divergence after multiple duplications in relation to tobacco genome fusion. To provide useful information for the further functional study of XTH genes in N. tabacum, the expression patterns of XTH genes in different tobacco tissues, as well as in response to stresses such as plant hormones, salinity, alkali, heat, and cold, were characterized by qRT-PCR analysis. We further isolated 50 NtXTH cDNA fragments by PCR amplification. To provide more information for the functional characterization of NtXTHs, the sub-cellular localization of four NtXTHs was identified as well.

Identification of Xyloglucan Endo-Transglucosylase/Hydrolase Family Members in Tobacco and Other Species
Two methods were used to identify tobacco XTH proteins in this study. The first method was based on a previous study [31]. All annotated proteins of the tobacco variety K326 from the Solanaceae crops genome database (SOL Genomics Network, SGN, https://solgenomics.net/organism/Nicotiana_tabacum/genome) were considered. We used the Hidden Markov Model (HMM) profiles of the XTH protein domains PF00722 and PF06955 [32] as queries to search the database with HMMER3.0 at the default E-value. The online program SMART (http://smart.embl-heidelberg.de/) was used to assess the conserved domains of candidate tobacco XTHs with an E-value < 0.1. Only proteins that contained both the PF00722 and PF06955 domains were regarded as NtXTHs and retained for further analysis. The second method was based on a BLAST search for homologous proteins. Thirty-three published Arabidopsis thaliana XTH protein sequences acquired from TAIR 10 (http://www.arabidopsis.org) were used as queries to search the Solanaceae crops genome database for homologous tobacco XTH proteins, with an E-value < $10^{-15}$ and identity > 50%. After redundant sequences were removed manually, candidate XTH protein sequences were screened with the SMART tool for the conserved PF00722 and PF06955 domains. Finally, tobacco XTH proteins were identified based on the two methods described above. Information regarding coding sequences (CDS), genomic sequences, and chromosome locations of NtXTHs was obtained from the SGN database. Tobacco XTH genes were named according to a previous study [12]. Physicochemical parameters of each predicted protein were calculated using the ProtParam tool (http://web.expasy.org/protparam/). Signal peptide cleavage sites were predicted with the SignalP v4.1 server (http://www.cbs.dtu.dk/services/SignalP/). Sub-cellular localizations were predicted with ProtComp 9.0 (http://linux1.softberry.com).
The protein sequences of XTH genes from Nicotiana tomentosiformis and Nicotiana sylvestris, which are considered to be the paternal and maternal donors of N. tabacum [33], were also downloaded from the SGN database to investigate the evolutionary relationships of the XTH gene family in ancestor species and common tobacco. In addition, unique members of the XTH gene family in tomato, potato, pepper, eggplant, petunia, and coffee were also downloaded from the SGN database.
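The two-track screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the `BlastHit` record, the function names, and the input dictionaries are hypothetical, and combining the HMM and BLAST candidates as a union before the domain check is an assumption, since the text does not state how the two candidate lists were merged.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is illustrative.
REQUIRED_DOMAINS = {"PF00722", "PF06955"}
MAX_EVALUE = 1e-15
MIN_IDENTITY = 50.0


@dataclass(frozen=True)
class BlastHit:
    """One BLAST hit against the tobacco proteome (hypothetical record)."""
    subject_id: str
    percent_identity: float
    evalue: float


def blast_candidates(hits):
    """Return subject IDs with at least one hit passing both BLAST thresholds."""
    return {
        h.subject_id
        for h in hits
        if h.evalue < MAX_EVALUE and h.percent_identity > MIN_IDENTITY
    }


def has_both_domains(domains_by_protein, protein_id):
    """True if the protein carries both PF00722 and PF06955."""
    return REQUIRED_DOMAINS <= set(domains_by_protein.get(protein_id, ()))


def identify_xth(hmm_candidates, blast_hits, domains_by_protein):
    """Merge both candidate lists (assumed union) and keep two-domain proteins."""
    candidates = set(hmm_candidates) | blast_candidates(blast_hits)
    return sorted(p for p in candidates if has_both_domains(domains_by_protein, p))
```

Here `domains_by_protein` stands in for the parsed SMART or Pfam domain annotation of each candidate. Replacing the union with an intersection would give a stricter screen.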

Phylogenetic Analysis
Multiple sequence alignments and phylogenetic tree construction of the full-length XTH protein sequences at amino acid level were performed using ClustalW and MEGA 6.0 software [34], respectively. Detailed parameters have been described previously [35]; the gap extension penalty 0.2 was replaced with 0.1.

Gene Structure Analysis and Weblogo of Conserved Catalytic Site
The exon/intron organization of the NtXTH genes was identified with the Gene Structure Display Server (GSDS) tool (http://gsds.cbi.pku.edu.cn/) [36] by aligning the cDNA sequences with the corresponding genomic DNA sequences. The weblogo of conserved catalytic domains was illustrated using the MEME tool (http://meme-suite.org/tools/meme).

Chromosomal Location and Gene Duplication
The chromosomal locations of NtXTHs were determined based on the chromosomal information derived from the SGN database. The positions of the NtXTH genes were physically mapped to each chromosome according to their coordinates on the tobacco genome. Tandem duplications were defined as adjacent NtXTH genes separated by five or fewer genes in a 100 kb region within an individual chromosome [37].
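The tandem duplication criterion above can be expressed as a short check over sorted gene coordinates. The sketch below is illustrative only: the `Gene` record and `tandem_pairs` function are hypothetical names, and measuring the 100 kb region from the start of the first gene to the end of the second is an assumption, because the text does not specify how the distance was measured.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Minimal gene record (hypothetical); coordinates in base pairs."""
    name: str
    chrom: str
    start: int
    end: int
    is_xth: bool


def tandem_pairs(genes, max_intervening=5, max_span=100_000):
    """Return neighbouring XTH gene pairs that satisfy the tandem criterion.

    Two consecutive XTH genes on one chromosome are reported when at most
    `max_intervening` other genes lie between them and the region they span
    is no longer than `max_span` bp (assumed: first start to last end).
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda g: g.start)
        prev_idx = None
        for idx, gene in enumerate(ordered):
            if not gene.is_xth:
                continue
            if prev_idx is not None:
                prev = ordered[prev_idx]
                intervening = idx - prev_idx - 1
                span = max(gene.end, prev.end) - prev.start
                if intervening <= max_intervening and span <= max_span:
                    pairs.append((prev.name, gene.name))
            prev_idx = idx
    return pairs
```

The input must contain every annotated gene on the chromosome, not only XTH genes, because the count of intervening genes depends on the non-XTH neighbours.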

Ka/Ks Values Estimation
The Ka/Ks value is the ratio between the number of nonsynonymous substitutions per nonsynonymous site (Ka) and the number of synonymous substitutions per synonymous site (Ks). To estimate the type of selection acting on NtXTH genes, Ka/Ks was calculated for all sister pairs at the terminal branches of the N. tabacum phylogenetic tree using DnaSP v5 software [38]. A Ka/Ks ratio greater than 1, less than 1, or equal to 1 indicates positive selection, negative selection, or neutral selection, respectively [39]. For each gene pair, the Ks value was used to estimate the divergence time based on a rate of $6.1 \times 10^{-9}$ substitutions per site per year; the divergence time (T), in million years ago (Mya), was calculated as $T = K_s / (2 \times 6.1 \times 10^{-9}) \times 10^{-6}$ [40].
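As a worked illustration of the two calculations above, the following Python sketch classifies a gene pair by its Ka/Ks ratio and converts Ks into a divergence time with the stated rate of 6.1 × 10^-9 substitutions per site per year. The function names and the tolerance used to decide when a ratio counts as equal to 1 are assumptions for illustration; Ka and Ks themselves would come from DnaSP or a similar tool.

```python
# Synonymous substitution rate per site per year, as given in the text.
SUBSTITUTION_RATE = 6.1e-9


def selection_type(ka, ks, tol=1e-9):
    """Classify selection from Ka/Ks: >1 positive, <1 negative, =1 neutral."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "negative"


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Divergence time in million years: T = Ks / (2 * rate) * 1e-6."""
    return ks / (2.0 * rate) * 1e-6
```

For example, a pair with Ks = 0.122 corresponds to a divergence time of about 10 Mya under this rate, because 2 × 6.1 × 10^-9 × 10^7 = 0.122.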

Microarray Expression Profiles of NtXTHs
The 56 NtXTH sequences were used as BLAST queries against the tobacco SGN Unigene database to identify the corresponding Unigene IDs in the TobEA microarray database [30], in order to analyze the expression profiles of NtXTHs in different tobacco tissues. Expression data for NtXTHs in 19 tobacco tissues were then extracted from TobEA and normalized as log10(FPKM + 1). A hierarchical map was constructed from the normalized data and viewed with the software MeV [41].
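The normalization step can be written compactly. This is a minimal sketch assuming the expression values are held as a dictionary mapping each gene to a list of per-tissue values; the function name and data layout are hypothetical and are not taken from TobEA or MeV.

```python
import math


def log10_normalize(expression):
    """Apply log10(x + 1) to every per-tissue value of every gene.

    `expression` maps a gene name to a list of non-negative expression
    values, one per tissue (layout assumed for illustration).
    """
    normalized = {}
    for gene, values in expression.items():
        if any(v < 0 for v in values):
            raise ValueError(f"negative expression value for {gene}")
        normalized[gene] = [math.log10(v + 1.0) for v in values]
    return normalized
```

Adding 1 before taking the logarithm keeps genes with zero expression at a value of 0 instead of an undefined logarithm.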

Plant Materials, Growth Conditions, and Stress Treatments
Seedlings of N. tabacum variety K326 were used to study gene expression in all experiments. To analyze the expression of tobacco XTH genes under abiotic stresses, 4-week-old seedlings were grown in a culturing room at 23 ± 1 °C under a 16-h light/8-h dark cycle, with relative humidity controlled at approximately 60%. Solutions of 150 mM NaCl [42], 50 mM Na2CO3, 1 µM 2,4-epi-brassinolide (epiBL, Solarbio, B8780, Beijing, China), and 1 µM 3-indole acetic acid (IAA, Solarbio, I8020, Beijing, China) [10] were used to water the tobacco plants. Leaves of stress-treated plants were collected at 0.5, 2, and 6 h after treatment initiation. Leaves collected at the same time points from plants that received only water served as the corresponding controls. To induce heat and chilling stress, tobacco seedlings were placed in a constant temperature incubator at 37 °C [25] or a refrigerator at 4 °C, respectively. Leaves of stress-treated plants were collected at 0.5, 2, and 6 h after treatment initiation. Non-treated seedlings were used as a control for the heat and chilling stress treatments. After collection, the materials were immediately frozen in liquid nitrogen and stored at −80 °C for RNA extraction. Three biological replicates were used per sample. The root, stalk, leaf, and flower tissues of adult plants grown in the culturing room were also collected, stored at −80 °C, and subsequently used for the amplification of NtXTH cDNA fragments.
Because members of the NtXTH gene family share homology in their coding regions, gene-specific primers based on the 3′ non-coding regions of NtXTHs were designed for qRT-PCR using Oligo Calc (http://biotools.nubic.northwestern.edu/OligoCalc.html) to avoid non-specific amplification. The qRT-PCR primers used in this study are shown in Supplementary File 1.

Amplification and Sequencing of NtXTH cDNAs
A mixture of cDNA prepared from root, leaf, stem, and flower tissues of variety K326 was used as the amplification template. Gene-specific primers to amplify each of the NtXTHs were designed using the Oligo Calc software. Forward and reverse primers were designed in the nearest upstream and downstream regions close to the start and stop codons of the CDS, respectively. To obtain accurate cDNA sequences, the super-high fidelity TranStart FastPfu Fly DNA polymerase (Transgen, AP231-13, Beijing, China) was used to amplify the NtXTHs. PCR products that generated a single bright band, or multiple bands that could be easily separated, were sequenced directly. Other PCR products with multiple bands that did not separate well were cloned into the pEasy-Blunt vector (Transgen, CB101-02, Beijing, China). After transformation into Escherichia coli, the positive colonies were sequenced to obtain the NtXTH sequences.

Subcellular Localization of NtXTH Proteins
The CDS fragments of NtXTH11, NtXTH19, NtXTH40, and NtXTH45 were amplified from K326 leaf cDNA with gene-specific primers carrying a homologous recombination arm (Supplementary File 2), using TranStart FastPfu Fly DNA polymerase as mentioned above. The fragments were cloned into the binary pcam35tlegfps2#4 vector (modified based on pCAMBIA1300) to generate 35S::NtXTHs-GFP fusion proteins using seamless cloning kits (CloneSmarter, 5891-25, Houston, TX, USA). AtCESA1 was cloned into the binary pcam35tlerfps2#4 vector (modified based on pCAMBIA1300) to generate the 35S::AtCESA1-RFP fusion protein as a plasma membrane-anchored marker [45]. Positive clones were confirmed by DNA sequencing and transformed into Agrobacterium tumefaciens strain GV3101. Nicotiana benthamiana plants with six leaves were used for transient expression by the infiltration method [46]. Confocal laser scanning microscopy was used to analyze the GFP and RFP fluorescent signals.

Identification of NtXTHs Based on the SGN Database
Gene family members are generally conserved in different species. In the present study, 56 candidate tobacco XTH proteins containing both PF00722 and PF06955 domains were identified. All of these have orthologous genes in Arabidopsis, with identity ranging from 54.68 to 82.53% (Table 1). The 56 NtXTHs were renamed NtXTH1 to NtXTH56 based on the results of the phylogenetic analysis (Table 1, Figure 1A). The length of the proteins encoded by the NtXTH genes varied from 217 to 365 amino acids, with an average length of 296 amino acids. All of the NtXTH proteins possess a signal peptide sequence. The sub-cellular localization predictions indicated that 48 NtXTHs were plasma membrane-anchored proteins and the other eight NtXTHs were extracellular proteins. Information on parameters such as isoelectric point (pI), molecular weight (MW), and intron numbers of the NtXTHs is provided in Table 1.
In our study, 56 XTH genes were identified in tobacco, which is greater than the number identified previously in other representative species, including Arabidopsis, rice, sorghum, poplar, tomato, kiwifruit, and apple. To gain insight into the size of the XTH gene family in other species, members of the XTH gene family in N. sylvestris (37), N. tomentosiformis (30), N. benthamiana (47), potato (32), pepper (23), eggplant (22), petunia (32), and coffee (23) were identified from the SGN database using the same method mentioned above. The unique accession numbers used to search the genome database are listed in Supplementary File 6. The number of XTH genes in these 17 species varies, ranging from 11 to 57. We noticed that polyploid plants, such as wheat and tobacco, have a higher number of XTH genes than other species. In particular, N. tabacum carries more XTH genes than either of its two ancestral species, N. tomentosiformis and N. sylvestris, but fewer than the sum of the two.
Genes 2018, 9, x 6 of 24

Phylogenetic and Structural Analyses of NtXTH Genes
Unlike AtXTH genes, which could be classified into three groups, the divergence between Group I and Group II NtXTHs was no longer apparent. However, Group III was clearly distinct from Groups I and II, indicating that the NtXTH genes could be divided appropriately into two major subfamilies: subfamily I/II and subfamily III (Figure 1A). Subfamily I/II consisted of 42 members and subfamily III of 14 members. A total of 23 sister pairs were identified according to the phylogenetic analysis (Table 2), and all sister pairs showed high bootstrap support (>96%). Interestingly, the Ka/Ks ratios of these 23 NtXTH sister pairs were all less than 1, and their divergence was estimated at between 5.4 and 20.4 Mya (Table 2).
To gain further insight into the structural diversity of the NtXTH genes, we compared the exon/intron organization in the coding sequences of individual NtXTH genes in tobacco. As shown in Figure 1B, the most closely related genes within the subfamilies share similar exon/intron structures and intron numbers, which is consistent with the phylogenetic analysis. Overall, 55 out of 56 NtXTH genes contained three to four exons; the exception was NtXTH15, which contained five exons (Table 1, Figure 1B). To obtain intron gain/loss information for all sister pairs, the intron/exon structures of the genes that clustered together at the terminal branches of the phylogenetic tree were also compared. Among these, four pairs showed substantial changes in their intron/exon structure: NtXTH2/-3, NtXTH14/-15, NtXTH20/-21, and NtXTH28/-29. Within these pairs, NtXTH2 and NtXTH14 lost one exon compared to NtXTH3 and NtXTH15, respectively. The sister pairs NtXTH20/-21 and NtXTH28/-29 showed exons and introns of different lengths, which presumably diverged over a long evolutionary period. In particular, NtXTH28 and NtXTH29 presented longer introns than the other genes (Figure 1B). Notably, the splice site at the boundary of the first exon and first intron of NtXTH10, NtXTH11, NtXTH37, NtXTH38, NtXTH39, and NtXTH40 was GC, which differs from the regular GT splice site (Supplementary File 4).
SignalP analysis predicted a signal peptide for entry into the secretory pathway for each of the 56 candidate NtXTH proteins (Table 1). For 54 of the 56 proteins, the signal peptide was encoded within the first exon; the exceptions were NtXTH28 and NtXTH29, whose signal peptides were encoded by two exons (Figure 1B). Furthermore, the conserved diagnostic amino acid sequence motif DEIDFEFLG was found in all the NtXTHs (Supplementary File 5, Figure 1C). Unlike in Arabidopsis, the DEIDFEFLG motif is not confined to the second exon and is found in the second, third, or fourth exon. Among the 56 NtXTHs, 18 genes encode the motif in the second exon, 35 genes in the third exon, and three genes in the fourth exon (Figure 1B).
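Assigning the motif to an exon amounts to converting its position in the protein into a coding-sequence coordinate and comparing that with cumulative exon lengths. The sketch below illustrates the idea; it is not the procedure used in the study. The function name is hypothetical, the search assumes an exact match to DEIDFEFLG (variant motifs would need a pattern search), and the exon is assigned by the first codon of the motif.

```python
from itertools import accumulate

MOTIF = "DEIDFEFLG"


def motif_exon(protein, exon_cds_lengths, motif=MOTIF):
    """Return the 1-based exon number holding the motif's first codon.

    `exon_cds_lengths` lists the coding length (in nucleotides) contributed
    by each exon, in transcript order. Returns None when the motif is absent
    or lies beyond the annotated coding length.
    """
    pos = protein.find(motif)
    if pos < 0:
        return None
    nt_offset = 3 * pos  # 0-based CDS coordinate of the motif's first codon
    for exon_number, cumulative_end in enumerate(
        accumulate(exon_cds_lengths), start=1
    ):
        if nt_offset < cumulative_end:
            return exon_number
    return None
```

A motif that spans an exon boundary is reported for the exon where it starts, which is a simplification worth checking against the gene models in Figure 1B.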

Chromosomal Location, Gene Duplication, and Evolution
A total of 37 of the 56 tobacco XTH genes were mapped to 18 chromosomes, while the remaining 19 genes were not (Figure 2). The mapped tobacco XTH genes are unevenly distributed among the chromosomes. No NtXTH genes were mapped to six chromosomes: 2, 12, 15, 16, 19, and 24. Chromosome 8 carried five NtXTH genes, and chromosomes 7, 11, and 13 carried four, four, and three, respectively. All other chromosomes contained only one or two NtXTH genes. Among the four NtXTH genes previously reported, NtXTH11, NtXTH19, and NtXTH35 were mapped to chromosomes 17, 18, and 8, respectively, while NtXTH18 was mapped to unattributed scaffolds. Genes separated by five or fewer genes within a 100 kb region are usually regarded as tandem duplicates [47]. Notably, among the 37 mapped genes, eight gene pairs were detected within distances ranging from less than 10 kb to 100 kb on chromosomes 7, 8, 10, 11, 13, and 17, which may have resulted from tandem duplication (Figure 2). Alignment with the Smith-Waterman algorithm (http://www.ebi.ac.uk/Tools/psa/) showed that the sequence similarities of five pairs of genes (NtXTH21/-22, NtXTH21/-23, NtXTH9/-11, NtXTH22/-23, and NtXTH24/-25) exceeded 90%. Hence, these five pairs were regarded as tandem duplicates. Detailed information on the sequence similarity of these paralogous pairs is shown in Supplementary File 7.
To investigate the evolutionary relationships of the identified XTH proteins, the predicted full-length amino acid sequences from N. sylvestris, N. tomentosiformis, and N. tabacum were used to generate a phylogenetic tree. The XTH protein sequences of N. sylvestris (Supplementary File 8) and N. tomentosiformis (Supplementary File 9) were also downloaded from the SGN database using the aforementioned method. As illustrated in the Neighbor-Joining phylogenetic tree (Figure 3), XTH proteins from the two ancestral species and N. tabacum were organized in a similar way, and some orthologous relationships were identified. The XTH genes of the two ancestral species were named based on their relationship to known N. tabacum XTHs according to the phylogenetic tree.
The phylogenetic tree results suggested that gene fusion or elimination greatly impacted the evolution of this gene family in the common tobacco genome. Forty-seven of the 56 NtXTHs were found to be orthologous to genes of N. sylvestris and/or N. tomentosiformis. Among them, NtXTH1, NtXTH24, and NtXTH27 clustered with XTH genes from both N. sylvestris and N. tomentosiformis, indicating that those donor XTHs fused during evolution. However, other genes, such as NtXTH14, NtXTH20, NtXTH29, NtXTH30, NtXTH32, NtXTH37, and NtXTH56, clustered closely with other NtXTHs, indicating that gene duplications occurred during evolution (Figure 3). In addition, some donor XTH genes, for example, NtoXTH23, NsyXTH25, NsyXTH26, and NsyXTH27, did not cluster with any NtXTHs, suggesting that gene loss occurred in the XTH gene family after polyploidization in the allotetraploid tobacco (Figure 3).

Analysis of XTH Genes in N. tabacum, N. tomentosiformis, N. sylvestris, Arabidopsis, and Rice
To further examine the phylogenetic relationships of XTH proteins in dicotyledons and monocotyledons, a phylogenetic tree was constructed using full-length XTH protein sequence alignments from N. tabacum, N. tomentosiformis, N. sylvestris, Arabidopsis, and rice. The tree was no longer divided into two or three subfamilies; instead, eight subclades were resolved, as indicated in Figure 4 (I to VIII). To clarify the evolutionary relationships among dicotyledons and monocotyledons, subclades II, IV, and VII were further divided into five, three, and three subclasses, respectively (labeled a to e). The phylogenetic tree shows a clear species bias in the distribution of plant XTH sequences (Figure 4). For example, subclade I contained only rice XTH genes, indicating that they may have been lost in dicotyledons, such as Arabidopsis and the tobacco species, or acquired in the monocotyledon rice after divergence from the last common ancestor. Conversely, subclades II-c, IV-c, VI, and VII-b contained only genes from the dicotyledons N. tabacum, N. tomentosiformis, N. sylvestris, and Arabidopsis, while no rice genes were detected in these subclades. Subclades II-a, II-b, and II-e contained mainly genes from the tobacco species or Arabidopsis (Figure 4), suggesting that these XTHs may have arisen by duplication in these species after the dicot-monocot split. However, subclades II-d, III, IV-a, IV-b, V, VII-a, VII-c, and VIII included genes from all five species, suggesting that some XTH genes were generated before the dicot-monocot split.

Isolation and Sequencing of NtXTH cDNAs
To verify the NtXTH genes predicted in silico, NtXTH cDNAs were amplified and sequenced. A total of 55 primer pairs were designed based on the SGN database sequences to amplify the NtXTHs. The same primers were used for NtXTH38 and NtXTH39 due to their high level of sequence similarity. Additionally, NtXTH48 and NtXTH49 share the same forward primer. The primers that amplified NtXTH11, NtXTH19, NtXTH40, and NtXTH45 were fused to a homologous arm so that they could also be used to construct GFP fusion proteins (Supplementary File 2). In total, 52 NtXTHs were amplified successfully; the exceptions were NtXTH35, NtXTH37, NtXTH38, and NtXTH39 (Figure 5). Among the successfully amplified NtXTH genes, 44 showed a single clear and sharp band, while the other eight (NtXTH2, NtXTH14, NtXTH15, NtXTH20, NtXTH23, NtXTH24, NtXTH25, and NtXTH49) presented at least two bands. Among the 44 NtXTH genes with a single band, the sequences of 38 were consistent with the SGN database. However, NtXTH41 and NtXTH44 each presented one single nucleotide polymorphism (SNP) relative to the SGN database. NtXTH3, NtXTH4, and NtXTH28 exhibited a 6-64 bp indel before the stop codon. The sequence we obtained for NtXTH36 did not match the SGN database sequence.
Intriguingly, among the eight genes with multiple bands, the sequence of one splice variant was consistent with the SGN database, while the other splice variants carried insertions of one to four short fragments that introduced a stop codon near the start codon and would truncate the proteins (Supplementary File 10). Additionally, none of the splice variants obtained for NtXTH2 matched the SGN database sequence.
Unfortunately, amplification of NtXTH35, NtXTH37, NtXTH38, and NtXTH39 failed after at least four attempts using different optimization conditions. In total, the cDNA fragments of six genes (NtXTH35, NtXTH37, NtXTH38, NtXTH39, NtXTH2, and NtXTH36) were not obtained by PCR amplification.

Subcellular Localization of NtXTH Proteins
Information on the subcellular localization of proteins can help to clarify their functions, and can also be used to predict the subcellular localization of homologous proteins in other species. Most NtXTHs were predicted to be localized at the plasma membrane (Table 1). To confirm this, four NtXTH genes (NtXTH11, NtXTH19, NtXTH40, and NtXTH45), which belonged to different groups and were predicted to localize to the plasma membrane or the extracellular space, were selected to construct NtXTHs-GFP fusion proteins. After transient expression of the NtXTHs-GFP fusion proteins in tobacco leaves, the GFP signals from all four NtXTHs overlapped with the RFP signals of the plasma membrane marker AtCESA1 (Figure 6), suggesting that NtXTH11, NtXTH19, NtXTH40, and NtXTH45 were localized to the plasma membrane.

BLASTN analysis identified 22 NtXTHs with Unigene IDs corresponding to the microarray data. A hierarchical map was built with MeV software to analyze the expression profiles of the 22 NtXTHs in 19 different tobacco tissues, representing the tobacco growth cycle (Figure 7). The expression patterns of the NtXTHs fell into two distinct clusters (parts 1 and 2). In part 1, nine NtXTHs belonging to the Group I/II subfamily and five NtXTHs belonging to the Group III subfamily presented similar expression patterns, with high gene expression in all 19 tissues. NtXTH9 and NtXTH11 were expressed at lower levels in mature roots, young roots, and seeds compared with other tissues. NtXTH51 and NtXTH52 were expressed at lower levels in the floral shoot apex, vegetative shoot apex, and young leaves compared with other tissues. Among the Group I/II genes in part 1, NtXTH4, NtXTH5, NtXTH12, NtXTH19, NtXTH24, and NtXTH27 generally presented relatively high expression in all 19 tissues. In contrast, in part 2, NtXTH13, NtXTH21, NtXTH22, NtXTH30, NtXTH33, NtXTH34, NtXTH35, and NtXTH36, which belong to the Group I/II subfamily, were expressed at low levels in the 19 studied tissues. NtXTH13 and NtXTH21 appeared to have the lowest expression among the 22 NtXTHs (Figure 7). However, among the 22 NtXTHs identified by microarray, NtXTH35 and NtXTH36 were not amplified successfully and two transcripts of NtXTH24 were found (Figure 5, Supplementary File 10).
To verify whether the microarray data represented true variation in the transcripts, 12 NtXTH genes were randomly chosen and their expression was confirmed in four tobacco tissues (young leaf, mature leaf, young root, and mature root) using qRT-PCR. The results showed that most of the qRT-PCR data were consistent with the microarray data (Figure 5B). Linear regression analysis showed a significant correlation (R² = 0.6571), which indicates good reproducibility between the transcript abundance generated by the microarray data and the expression profiles obtained from the qRT-PCR data (Figure 5C).
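The R² reported above comes from a simple linear regression of one expression measurement on the other. As a minimal sketch of that calculation, assuming paired log-scale expression values for the same gene/tissue combinations (the numbers below are hypothetical placeholders, not data from this study, and the function name is illustrative):

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical paired values (e.g. log2 microarray signal vs. log2 qRT-PCR level)
microarray = [1.0, 2.0, 3.0, 4.0, 5.0]
qrt_pcr = [1.2, 1.9, 3.4, 3.8, 5.1]
print(round(r_squared(microarray, qrt_pcr), 4))
```

A value near 1 means the two platforms rank and scale the genes almost identically, while a value such as 0.66 means about two thirds of the variance in one measurement is explained by the other.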
Genes 2018, 9, x 17 of 24

Expression Profiling of NtXTH Genes under Abiotic Stress Using qRT-PCR
To explore the expression profiles of NtXTHs in response to different abiotic stress conditions, the expression patterns of 12 selected NtXTHs in response to salinity and alkaline stresses, epiBL and IAA hormone treatments, and heat and chilling stresses were studied by qRT-PCR (Figure 8A-F). Differential expression of NtXTHs was observed under the different conditions, and almost all of the NtXTHs were up-regulated following the various treatments (Figure 8). Under NaCl stress, NtXTH expression peaked at 0.5 h. NtXTH4 was significantly induced and NtXTH5, NtXTH19, NtXTH27, NtXTH30, NtXTH40, and NtXTH54 were moderately induced at 0.5 h compared with the control (Figure 8A). Under Na2CO3 stress, NtXTH expression peaked at 0.5 h and 6 h. Most NtXTHs were first up-regulated, then down-regulated, and finally up-regulated again. NtXTH4 was clearly induced at 0.5 h and 6 h, and NtXTH19 expression was highest at 6 h. NtXTH45 was significantly induced at 0.5 h compared with the control (Figure 8B). Under epiBL treatment, the expression of most NtXTHs peaked at 0.5 h and 2 h. NtXTH4, NtXTH19, NtXTH27, NtXTH40, NtXTH45, and NtXTH54 were clearly induced at 0.5 h, whereas NtXTH30 was significantly induced at 2 h compared with the control (Figure 8C). Under IAA treatment, the expression of NtXTHs was significantly induced at 0.5 h compared with the control (Figure 8D). The expression patterns of NtXTHs in response to heat and chilling stresses were similar: most NtXTHs were induced at 0.5 h and 6 h compared with the control (Figure 8E,F).
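Induction "compared with control" in qRT-PCR time courses is commonly expressed as a fold change derived from threshold cycle (Ct) values. The quantification method is not stated in this section, so the following is only a sketch assuming the widely used 2^(-ΔΔCt) approach with a single reference gene; the function name and all Ct values are hypothetical.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method (assumed, not from the study).

    Each target Ct is first normalised to the reference gene in the same
    sample (dCt), then the treated dCt is compared with the control dCt.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values for one gene at one time point
print(fold_change(22.0, 18.0, 24.0, 18.0))  # -> 4.0 (four-fold induction)
```

A fold change above 1 corresponds to up-regulation relative to the untreated control, and a value below 1 to down-regulation.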

Discussion
Xyloglucan is a hemicellulosic polysaccharide present in the primary cell walls of all land plants studied to date [1,48]. XTHs reconstruct the cell wall by cutting and/or rejoining xyloglucan to regulate the composition and organization of the cell wall [12]. Only a few tobacco XTH sequences have been isolated so far, including AB017025.1 [28], HQ108341.1, KJ730270, and D86730.1 [27]. The recent release of the genome database of common tobacco provides the opportunity to systematically study certain gene families [29]. In our study, 56 NtXTHs were divided into two subfamilies, Group I/II and Group III, for the first time (Figure 1).
Common tobacco is an allotetraploid species formed about 200,000 years ago by the hybridization of two diploid species: N. sylvestris (maternal donor) and N. tomentosiformis (paternal donor) [33]. Polyploid evolution has been accompanied by changes in chromosome number and genome size, probably through dysploid reductions via chromosome deletions or fusions [33]. A previous study reported that genome downsizing was a widespread biological response to polyploidization based on a large-scale analysis of the genome size of 3008 angiosperms [49]. The mean genome size of three N. tabacum varieties was estimated to be 4.53 Gb [29], which represents a reduction of 5.8% compared with the sum of the ancestral N. sylvestris (2.59 Gb) and N. tomentosiformis (2.22 Gb) genomes. This result is broadly consistent with the previously estimated downsizing of 3.7% [33]. The genome downsizing of N. tabacum was also reflected in the XTH gene family. We found that the number of NtXTHs was higher than in either ancestral donor, but lower than the sum of the two donors (Supplementary File 6). Phylogenetic analysis of the XTH gene family among N. tabacum, N. sylvestris, and N. tomentosiformis in our study revealed that most NtXTHs assembled at the same terminal branches as N. sylvestris and/or N. tomentosiformis XTHs (Figure 2). These results suggested that gene loss and fusion occurred during the long evolutionary history. Thus, the genome sequences of N. sylvestris and N. tomentosiformis are assumed to have high identity to the genome of N. tabacum. Furthermore, we could speculate that the two members of each of the 23 NtXTH sister pairs were derived from N. sylvestris and N. tomentosiformis, respectively, not considering the sister pairs NtXTH22/-23 and NtXTH24/-25, which were regarded as tandem duplicated genes (Supplementary File 7). This hypothesis would also explain the 5.4 to 20.4 Mya divergence time of these 23 sister pairs estimated by the Ka/Ks ratio (Table 2). This divergence time is consistent with the findings of a previous study, whereby N. sylvestris and N. tomentosiformis diverged about 15 million years ago [50].
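The 5.8% downsizing quoted above follows directly from the three genome sizes given in the text. A minimal sketch of the arithmetic (the function name is illustrative and not taken from the study):

```python
def downsizing_percent(polyploid_gb, parent_sizes_gb):
    """Percentage reduction of a polyploid genome relative to the summed parents."""
    expected = sum(parent_sizes_gb)  # additive expectation from both donors
    return 100.0 * (expected - polyploid_gb) / expected

# Genome sizes in Gb as stated in the text:
# N. tabacum 4.53, N. sylvestris 2.59, N. tomentosiformis 2.22
print(f"{downsizing_percent(4.53, [2.59, 2.22]):.1f}%")  # -> 5.8%
```

The additive expectation is 2.59 + 2.22 = 4.81 Gb, so the observed 4.53 Gb genome is 0.28 Gb smaller, or about 5.8% of the expected size.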
Comparison of amino acid sequences of XTH genes isolated from azuki bean (Vigna angularis), soybean (Glycine max), tomato (Lycopersicon esculentum), Arabidopsis (A. thaliana), and common wheat (Triticum aestivum) revealed that they were highly conserved despite considerable variability in protein size [51]. XTH proteins were reported to contain several conserved modular structures, such as a short hydrophobic amino acid region that probably functions as a signal peptide to guide the protein to the plant cell wall, and a highly conserved DEIDFEFLG domain that acts as the catalytic site for both XET and XEH activity [52]. The high degree of conservation of XTH amino acid sequences among various plant species implies the functional conservation of these proteins in the plant kingdom. The catalytic site domain DEIDFEFLG was reported to be conserved among all the XTHs characterized thus far [10,12], and is a structural feature of all XTH proteins. In our study, the conserved domain DEIDFEFLG was identified in each NtXTH (Table 1). However, only the second, sixth, eighth, and ninth residues of the DEIDFEFLG motif were conserved in all 56 NtXTHs, and the other residues showed slight variation among NtXTHs (Table 1). Campbell and Braam compared some Arabidopsis XTH catalytic domains and found that the third residue, isoleucine (I), may be replaced by another hydrophobic residue, either leucine (L) or valine (V), and that the fifth residue, phenylalanine (F), may be substituted by I [53]. These substitutions also occurred in NtXTH proteins. These changes are predicted to have no effect on the cleavage of xyloglucan glycan chain linkages because the apolar and uncharged nature of the residues is maintained. However, a change in the first glutamate residue (E), which acts as the active site, has been shown to inactivate the protein [53].
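The substitution rules described above can be written as a sequence pattern. The following is a minimal sketch, not the motif-prediction procedure used in this study: it encodes the tolerated substitutions at the third (I/L/V) and fifth (F/I) residues as a regular expression, plus a looser pattern that fixes only the four residues reported as invariant in all 56 NtXTHs. The pattern names, function name, and peptide strings are hypothetical.

```python
import re

# Residue 3 (I) may be L or V; residue 5 (F) may be I (substitutions from the text).
TOLERANT_MOTIF = re.compile(r"DE[ILV]D[FI]EFLG")
# Only residues 2, 6, 8 and 9 (E, E, L, G) are fixed; other positions are free.
INVARIANT_CORE = re.compile(r".E...E.LG")

def find_motif(protein, pattern=TOLERANT_MOTIF):
    """Return (1-based start, matched residues) of the first hit, or None."""
    match = pattern.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.group()

# Hypothetical peptide carrying a valine at motif position 3
toy_peptide = "MASSLHDEVDFEFLGNKT"
print(find_motif(toy_peptide))  # -> (7, 'DEVDFEFLG')
```

Scanning with the looser pattern first and the stricter one second is one way to separate sequences that keep the canonical catalytic site from those that vary at the less constrained positions.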
Although the putative signal peptides varied in length, most of the NtXTHs were predicted to be secreted to the cell wall, exhibiting a membrane sub-cellular localization (Table 1), which is consistent with their ability to modify the cell wall. A previous study used a transient expression assay and reported that DkXTH8, derived from persimmon, was anchored to the cell wall in onion epidermal cells [54]. The cell wall localization of DkXTH8 is consistent with its function in the promotion of fruit ripening and softening through xyloglucan endotransglycosylase activity. Furthermore, DkXTH8 overexpression in Arabidopsis resulted in increased leaf senescence [54], which is an important sign of maturity in tobacco leaves. In our study, four NtXTHs derived from different phylogenetic groups and predicted to reside in the plasma membrane (NtXTH11, NtXTH19, and NtXTH40) or the extracellular space (NtXTH45) were co-localized with the plasma membrane-anchored protein AtCESA1 (Figure 6). This indicates that the four NtXTH proteins may have some biological activity in the cell wall due to the close linkage of the cell wall and plasma membrane. Further study is needed to confirm whether the four NtXTH proteins are anchored to the cell wall and to determine the enzymatic activity (XET or XEH) they exhibit.
TobEA is a useful database for examining gene expression in tobacco, because it covers a large number of genes and a wide range of tissues. The expression profiles of 22 NtXTHs in 19 different tissues of tobacco were presented in this study (Figure 7). Interestingly, NtXTH35 and NtXTH36 were found to be highly expressed in the open bud, stem, flower, and leaves based on the microarray data, while no cDNA fragments were obtained despite several rounds of amplification (Figure 5). It is possible that differences in environmental conditions between the present study and the previous study, such as soil nutrients and the various stress stimuli affecting tobacco growth, altered the gene expression profile. In addition, unidentified factors may have prevented normal amplification. We also do not exclude the possibility that the complex allotetraploid genome of common tobacco led to some errors in sequence alignment for the microarray data. Furthermore, NtXTH35, NtXTH36, and the other four unsuccessfully amplified NtXTHs may be pseudogenes.
Environmental factors such as salinity, alkali, heat, chilling, and plant endogenous hormones affect the expression of plant genes, resulting in altered morphology and physiology. Given that the growth and productivity of tobacco are frequently threatened by environmental factors during the growth cycle, we simulated abiotic stresses, including salinity, heat, chilling, and alkali conditions, to examine the expression profiles of NtXTHs. For hormone treatments, only IAA and epiBL were used, because a previous comprehensive expression analysis of all AtXTHs revealed that AtXTHs can be induced by a variety of plant hormones, especially brassinolide and IAA [10]. Consistent with previous studies, our qRT-PCR results revealed a complex expression profile of NtXTHs under a variety of stress treatments (Figure 8). To our knowledge, this is the first exploration of the expression patterns of NtXTHs under alkali stress. A previous study reported that mRNA expression of AtXTH22 peaked 2 h after treatment with 1 µM brassinolide and then declined [55]. NtEXGT (NtXTH35) was reported to be induced by auxins, brassinosteroids, salinity, and cold in leaves [27]. In our study, all of the stress treatments induced high expression of NtXTHs at the beginning of treatment, which was maximal at 0.5 or 2 h and then declined (Figure 8). The expression profiling suggested that NtXTHs might be involved in resistance to adverse environmental factors or be regulated by plant hormones. A previous study reported that CaXTH1, CaXTH2, and CaXTH3, isolated from chili pepper, another Solanaceae species, were concomitantly induced by a broad range of abiotic stresses, including cold temperature, drought, and high salinity [20]. Importantly, the CaXTH3 gene was reported to enhance tolerance to high salinity in transgenic Arabidopsis [20] and tomato [23].
Although XTH showed multifunctional activity in relation to biotic stress tolerance, the actual function of NtXTHs in plant growth and development must be verified using overexpression or knockout plants. Xyloglucan is one of the most important components of the primary cell wall and plays an essential role in the cellulose-xyloglucan load-bearing cell wall framework [1,2]. The XET and XEH activities of XTH proteins can modify the structure of xyloglucan, resulting in changes such as wall loosening, wall strengthening, cell elongation, and fruit softening [13]. A previous study reported that XET activity was greatest in the root hair elongation region, where the epidermal cell wall formed a bulge, in Arabidopsis [56], suggesting a key role for XET in root hair cell elongation. In addition, the mutants atxth27-1 and atxth27-2 showed multiple phenotypic alterations, such as short tracheary elements in tertiary veins and yellow lesion-mimic spots in mature leaves [16]. These results indicated that XTHs were required for proper morphogenesis.

Conclusions
The physiological properties of the plant cell wall are modified by XTHs, which can cut or reconnect xyloglucan molecules. In this study, 56 NtXTH genes were identified from the recently completed genomic database of tobacco. Phylogenetic analysis of XTH proteins from N. tabacum, N. sylvestris, N. tomentosiformis, Arabidopsis, and rice revealed highly conserved evolutionary relationships and similar characteristics of XTHs across species. The high expression level of NtXTH4, NtXTH5, NtXTH12, NtXTH19, NtXTH24, and NtXTH27 in all 19 tissues based on the TobEA database suggests that they may play a general role in cell wall modification in various tissues. In addition, the rapidly induced expression of most NtXTHs under different stresses indicates possible functions of NtXTHs in the response of tobacco to stress. To our knowledge, this is the first time the expression pattern of NtXTHs in response to alkali stress has been reported. The plasma membrane localization of NtXTH11, NtXTH19, NtXTH40, and NtXTH45, which belong to different phylogenetic groups, supports the validity of the subcellular localization prediction for NtXTH proteins. The results presented in this study provide a foundation for further investigation of the function of XTH genes in common tobacco.