Genome Wide Analysis of Family-1 UDP Glycosyltransferases in Populus trichocarpa Specifies Abiotic Stress Responsive Glycosylation Mechanisms

Populus trichocarpa (Black cottonwood) is a dominant timber-yielding tree that has become a notable model plant for genome-level insights in forest trees. The efficient transport and solubility of various glycoside-associated compounds is linked to Family-1 UDP-glycosyltransferase (EC 2.4.1.x; UGTs) enzymes. These glycosyltransferase enzymes play a vital role in diverse plant functions, such as regulation of hormonal homeostasis, growth and development (seed, flower, fiber, root, etc.), xenobiotic detoxification, stress response (salt, drought, and oxidative), and biosynthesis of secondary metabolites. Here, we report a genome-wide analysis of the P. trichocarpa genome that identified 191 putative UGTs distributed across all chromosomes (with the exception of chromosome 20) based on 44 conserved plant secondary product glycosyltransferase (PSPG) motif amino acid sequences. Phylogenetic analysis of the 191 Populus UGTs together with 22 referenced UGTs from Arabidopsis and maize clustered the putative UGTs into 16 major groups (A–P). Whole-genome duplication events were the dominant pattern of duplication among UGTs in Populus. A well-conserved intron insertion was detected in most intron-containing UGTs across eight examined eudicots, including Populus. Most of the UGT genes were found preferentially expressed in leaf and root tissues in general. The regulation of putative UGT expression in response to drought, salt and heat stress was observed based on microarray and available RNA sequencing datasets. Up- and down-regulated UGT expression models were designed, based on transcripts per kilobase million values, confirmed their maximally varied expression under drought, salt and heat stresses. Co-expression networking of putative UGTs indicated their maximum co-expression with cytochrome P450 genes involved in triterpenoid biosynthesis. Our results provide an important resource for the identification of functional UGT genes to manipulate abiotic stress responsive glycosylation in Populus.


Introduction
The global ecological and economic importance of a terrestrial ecosystem is contingent upon the number of forest trees present in that geographical area [1,2]. Populus is a model species for perennial woody plants and a valuable forest resource for timber yield worldwide. It is a model woody plant because its genome resources (including cDNA clones, expressed sequence tags, microarray data sets and RNA seq data sets) are readily available [3][4][5]. Additionally, it is known as a potential candidate tree for carbon sequestration, nutrient cycling, and phytoremediation [1,6]. Populus trees have evolved a diverse set of defense systems to cope with a diverse range of biotic and abiotic stresses over their evolution [1].
Here, we employed bioinformatics techniques to identify 191 putative UGTs in Populus, and then subjected these to genomic and expression analyses. We then further confirmed the publicly available microarray and transcriptomic data on UGT expression in plant tissues, drought, salt, cold and heat stresses. We designed UGT expression models for up-and down-regulation of UGT genes under drought and control conditions. Finally, co-expression networks were constructed in order to gain a better understanding of the involvement of these genes in other pathways.

Identification of Putative UGTs in the Populus Genome
The Populus genome was searched for putative UGTs through the popgenie database (http://popgenie.org/ accessed on 1 February 2022). Forty previously characterized UGTs were used as models to produce an HMM model of the typical PSPG-box found in UGTs [5]. In addition, 19 reference UGT protein sequences were obtained from the A. thaliana cytochrome P450 website (http://www.p450.kvl.dk accessed on 1 February 2022) and a further 3 from the Z. mays genome database (https://www.maizegdb.org/ accessed on 2 February 2022) for subsequent phylogenetic tree construction (Table S1). To retrieve all the putative UGT sequences from the Populus genome, BLAST (BLASTP) 2.2.28 was performed on the popgenie website using the PSPG motif from each phylogenetic group published by [12] as the query sequence. The following search parameters for BLASTP were used: scoring matrix (BLOSUM62) and expectation value (1e-10). The consensus pattern of the designed HMM model was further used to search for this pattern in all retrieved sequences by using the FUZZPRO program (http://www.bioinformatics. nl/cgi-bin/emboss/fuzzpro accessed on 4 February 2022). Graphical representations of the PSPG motif in putative UGTs were also obtained using the ScanProsite website (http://prosite.expasy.org/prosite.html accessed on 5 February 2022). Functional annotations, such as Alias and GO ontology, chromosomal location, peptide length, PFAM id, pathway information and Arabidopsis synonyms of each UGT were obtained from the popgenie and Phytozome databases.

PCR Cloning and Sequencing of Selected Populus UGTs
To validate the UGT sequences obtained, we extracted genomic DNA from P. trichocarpa leaves using a DNeasy Plant Mini Kit (Qiagen GmbH, Germany). The extracted DNA quality and quantity were measured using a NanoDrop 1000 spectrophotometer (Thermo Scientific Inc., USA). PCR amplification was performed using 30 ng of genomic DNA as a template, with TAKARA Ex Taq HS polymerase (Clonetech, Japan). The UGT genes Potri.016G097400, Potri.006G120600, Potri.010G182575, Potri.016G057300, and Potri.016G057300 were selected, and their full reading frame was amplified using the primers and annealing temperature shown in Table S1. Amplicons were visualized on 0.8% agarose gels and eluted using an AccuPrep Gel Purification Kit (Bioneer Inc., Daejeon, Korea). Eluted amplicons were cloned into pGEM-T Easy vectors (Promega, USA) and sequenced using the BigDyeTM terminator method in an ABI 3730XL DNA analyzer (Bioneer Inc.).

Motif Alignment, Phylogenetic Analysis, and Comparison
Downloaded sequences were aligned with MUSCLE using MEGA 7 software [54]. Sequences that were too short, too divergent, or too long were removed from the input file after initial alignment and were re-aligned. The obtained alignment file contained only those sequences similar to the desired PSPG motif. Phylogenetic analysis was performed using the Neighbor-joining statistical method with 1000 bootstrap replications in MEGA 7.0.1.18. A 100% data coverage was used to construct the phylogeny. The phylogenetic tree was visualized using the iTOL website (http://itol.embl.de/ accessed on 5 February 2022). To compare the phylogenetic groups of putative UGTs in Populus, we collected published data on the number of putative UGTs and the number of phylogenic groups including Prunus persica, Malus domestica, Vitis vinifera, Linum usitatissimum, G. max, Glycine soja, G. hirsutum, Gossypium raimondii and Gossypium arboreum [12,[18][19][20][21][22][23][24][25][26][27][28].

Gene Duplication and Chromosomal Distribution
The physical location of each UGT on chromosomes was retrieved using the start and stop positions of genes taken from the Phytozome database (https://phytozome.jgi.doe. gov/pz/portal.html accessed on 6 February 2022). Mapchart 2.2 was used to visualize UGT gene distribution on Populus chromosomes [55]. A gene cluster was defined as two or more copies located in a chromosomal region <200 kb [56].
To predict UGT duplication patterns, we examined segmental and tandem duplication types. Tandem duplications were defined based on more than one gene family member located in the same or neighboring regions of the genome. If the two UGTs were located on duplicated blocks and highly similar at the amino acid level, they were considered segmental duplication events. The Plant Genome Duplication Database (PGDD) server (http://chibba.agtec.uga.edu/duplication/ accessed on 7 February 2022) was used to retrieve duplications for each UGT gene. Within a 100-kb range, anchors with identity >1.0 were rejected to avoid saturation [57]. Assuming the operation of a molecular clock, the synonymous substitution (Ks) values of duplicated genes are probably similar over time [58]. Hence, to estimate the dates of segmental duplication events, Ks values were cast off and their means calculated for each gene pair inside a duplicated block. The estimated date of the duplication event was then predicted using the mean Ks values (T = Ks/2λ), assuming clock-like rates (λ) of 9.1 × 10 −9 identical substitutions/synonymous site/year for Populus [59].
Gene duplications were further confirmed by searching for all branching points in the topology with at least one species that is present in both subtrees of the branching point. An unrooted gene tree was used for the analysis, such that the search for duplication events was performed by identifying the placement of the root on a branch or branches that produced the minimum number of duplication events. Again, MEGA7 was used to determine the evolutionary relationships of taxa and a duplication tree was generated. Finally, a syntenic analysis among all the retrieved sequences was performed using the Circoletto webtool (http://tools.bat.infspire.org/circoletto/ accessed on 10 February 2022) by providing the FASTA sequences. The resulting image was captured.

Gene Structure Analysis
The exon/intron organization for each phylogenetic group was illustrated using the Gene structure display server (GSDS) program (http://gsds.cbi.pku.edu.cn/ accessed on 10 February 2022), by aligning the coding and genomic sequences obtained from Phytozome. Introns were classified based on the structure in the genome, including phase, length, and number. A UGT intron map was constructed in accordance with a previously established method [18,23]. The insertion events were serially numbered as I-1 to I-10, according to their positions [18,25]. The splice sites of intron-containing UGTs were mapped onto all aligned sequences of intron-containing UGT peptides using the PIECE web tool (https://wheat.pw.usda.gov/piece/ accessed on 11 February 2022). The gene structure of putative UGTs was also compared with other UGTs from Arabidopsis, P. persica, M. domestica, V. vinifera, L. usitatissimum, Glycine max, and Gossypium raimondii using the PIECE webtool. The gene structure was compared based on PFAM domain PF00201.

Digital Expression Analysis
Microarray data of the putative UGTs in different tissues, such as leaves, phloem/cambium, roots, petioles, twigs, buds, flowers, and suckers, were obtained from the popgenie database using the Asp201 EA Affymetrix dataset. A heatmap of the data was generated with distance function (Euclidean) and hierarchical clustering (Average), using the exHeatmap tool on the popgenie website. The same Affymetrix dataset was used to check the expression of these UGT genes under different conditions, such as drought, heat, cold and salt stresses. A heatmap was generated using the previously mentioned conditions.

RNA Sequencing based UGTs Model in Tissues and under Drought Stress
Digital expression of all UGTs related to their maximum varied expression under drought conditions was further evaluated using RNA sequence data taken from [60]. The 24,013 transcripts were analyzed from non-stressed (control) and drought-stressed leaves of Populus euphratica, and we retained those with transcripts per kilobase million (TPM) values of one to maximum in both control and drought-stressed samples [60]. On average, a 1128-bp length was recorded for the yielded transcripts after assembly [60]. Normalized TPM values were calculated for all expressed UGTs under control and drought conditions. The TPM values of each expressed UGT under control and drought were tabulated according to their phylogenetic group and a final graph was drawn for the up-regulated and down-regulated UGTs under drought.
Furthermore, RNA seq data values for tissues (leaf, root, shoot, xylem, phloem, fiber and vessel) and stresses were sourced from Plant FIBer Expression database https://ssl. cres-t.org/cgi-fibex/cluster0.cgi?sp=lu_pt (accessed on 12 February 2022) and then were visualized in the Morphus heat map tool.

Co-Expression Networking of Putative UGTs
Gene and protein interaction networks were examined to detect associations between putative UGTs and biological processes using the famNet database (http://aranet. mpimp-golm.mpg.de/famnet.html accessed on 14 February 2022) tool, which generated co-expression networks for each UGT gene. The gene identifier of each UGT was searched for in the famNet database, which enabled us to generate images of the resulting networks. The networks contained all nodes supported by ELA (ensemble label association) and all genes one step away from the query gene were drawn [61]. We considered gene networks of a single query gene having maximum co-occurrences in order to enhance our understanding of the gene families co-expressed with putative UGTs in Populus. An overall co-expression image was drawn by comparing the individual gene networks.

Identified Putative UGTs in the Populus Genome
We identified 191 putative UGTs possessing the C-terminal PSPG motif from the projected soybean proteome (Table S1 and Figure 1A, B). The tools used for identification of UGTs produced consistent results in terms of PSPG motif presence. The lengths of the deduced UGT proteins ranged from 511 to 117 amino acids, with an average length of 465. All UGT sequences started with a methionine and were full-length. The 191 putative UGTs were found to be involved in nine different biosynthesis pathways, namely, saponin biosynthesis, scopolin and esculin biosynthesis, daphnetin modification, myricetin gentiobioside biosynthesis, rutin biosynthesis, cytokinin-glucoside biosynthesis, anthocyanin biosynthesis, and dhurrin biosynthesis (Table S1). Go ontology analysis showed that most of the UTGs are involved in transferring hexosyl groups and metabolism.

Motif Alignment, Phylogenetic Analysis, and Comparison
Cloned UGTs from chromosomes 16, 10, and 6 have sequences similar to those obtained from the popgenie database. The final alignment file contained 503 aligned positions with two highly conserved, 494 variable, and 20 singleton sites. The PSPG motif was found to vary in each phylogenetic group (Table S1 and Figure 1B). The histidines at positions 10 and 19 in the PSPG motif were the most conserved.
The constructed phylogenetic tree indicated a classification of 16 major groups (A-P), including two newly discovered groups (O and P), which are absent in Arabidopsis but present in higher plants like maize, cotton, peach, and apple ( Figure 1A) [18,20,21,25]. All the known phylogenetic groups were present in Populus except Q, which is present only in Zea mays [21].
Overall, among all eudicots, groups A, D, G, E, and L have the largest numbers of UGTs ( Figure 2). Phylogenetic group E in Populus was the most expanded one among all the compared eudicots except M. domestica, which had 66 genes ( Figure 2). Group N was highly conserved among all eudicots, each having one gene ( Figure 2). The newly identified groups O and P each contained three genes ( Figure 2). Genes 2022, 13, x FOR PEER REVIEW 6 of 20

Motif Alignment, Phylogenetic Analysis, and Comparison
Cloned UGTs from chromosomes 16, 10, and 6 have sequences similar to those obtained from the popgenie database. The final alignment file contained 503 aligned positions with two highly conserved, 494 variable, and 20 singleton sites. The PSPG motif was found to vary in each phylogenetic group (Table S1 and Figure 1B). The histidines at positions 10 and 19 in the PSPG motif were the most conserved.
The constructed phylogenetic tree indicated a classification of 16 major groups (A-P), including two newly discovered groups (O and P), which are absent in Arabidopsis but present in higher plants like maize, cotton, peach, and apple ( Figure 1A) [18,20,21,25]. All the known phylogenetic groups were present in Populus except Q, which is present only in Zea mays [21].
Overall, among all eudicots, groups A, D, G, E, and L have the largest numbers of UGTs ( Figure 2). Phylogenetic group E in Populus was the most expanded one among all the compared eudicots except M. domestica, which had 66 genes ( Figure 2). Group N was highly conserved among all eudicots, each having one gene ( Figure 2). The newly identified groups O and P each contained three genes ( Figure 2).

Gene Duplication and Chromosomal Distribution
With the exception of chromosome 20, all Populus chromosomes contained UGTs

Gene Duplication and Chromosomal Distribution
With the exception of chromosome 20, all Populus chromosomes contained UGTs (Figure 3), although the numbers varied. Chromosomes 16, 06, 01, and 17 possessed the highest numbers of UGTs (44,24,19, and 17, respectively), whereas chromosomes 03, 11, and 13 had only three UGTs. A total of 49 clusters were found, with sizes ranging from 2 to 15 genes per cluster (Figure 3). Chromosomes 16, 06, 09, and 17 had the largest clusters. Maximum clustering was found in groups E and G. Twenty-two UGTs did not cluster.

Gene Duplication and Chromosomal Distribution
With the exception of chromosome 20, all Populus chromosomes contained UGTs (Figure 3), although the numbers varied. Chromosomes 16, 06, 01, and 17 possessed the highest numbers of UGTs (44,24,19, and 17, respectively), whereas chromosomes 03, 11, and 13 had only three UGTs. A total of 49 clusters were found, with sizes ranging from 2 to 15 genes per cluster (Figure 3). Chromosomes 16, 06, 09, and 17 had the largest clusters. Maximum clustering was found in groups E and G. Twenty-two UGTs did not cluster. Sixty-five of the total 191 UTGs exhibited segmental duplication (Table S2). Of these, 44 have single segmentally duplicated loci and 21 have two to four loci. Interestingly, two Sixty-five of the total 191 UTGs exhibited segmental duplication (Table S2). Of these, 44 have single segmentally duplicated loci and 21 have two to four loci. Interestingly, two UGTs from group G, Potri.002G098300 and Potri.002G098300, were duplicated at four different loci. Moreover, three tandem duplications were also observed. Phylogenetic groups G, E, D, and L contain the largest number of segmentally duplicated UGTs (19, 12, 9, and 5, respectively), whereas both O and P have only four. Group I is the only phylogenetic group having three tandemly duplicated UGTs. All phylogenetic groups contain duplicated genes but the numbers vary.
The estimated time for segmental duplication in the 65 segmentally duplicated UGTs ranged from 11 to 189 mya. Approximately 40 UGTs were segmentally duplicated at around 11-65 mya, whereas duplication of the remaining 25 UGTs occurred during the period 65-189 mya (Table S2). We also calculated the time of duplication within each group (Table S2). For some UGTs, there were two different duplicated loci, duplicated at different times on different chromosomes. For example, Potri.002G168600 was segmentally duplicated from Potri.003G138200 and Potri.014G095900 at around 107 and 138 mya. Some UGT genes have recently undergone segmental duplication because they have no Ka/Ks values in the duplication database (Table S2). Thus, Populus UGT evolution involves a series of segmental duplications, during which members of groups E, M, C, D, L, I, M, and F have been duplicated twice, whereas members of groups A and G have been duplicated two, three, or four times. Whole-genome duplication (WGD) and syntenic analysis using MEGA7 and Circoletto confirmed that all 191 UGTs were duplicated from each other (Figures 4 and S1).
Ka/Ks values in the duplication database (Table S2). Thus, Populus UGT evolution involves a series of segmental duplications, during which members of groups E, M, C, D, L, I, M, and F have been duplicated twice, whereas members of groups A and G have been duplicated two, three, or four times. Whole-genome duplication (WGD) and syntenic analysis using MEGA7 and Circoletto confirmed that all 191 UGTs were duplicated from each other (Figures 4 and S1).

Gene Structure Analysis
Of the 191 UGTs, 83 possessed no introns, whereas 92, 12, one, and three possessed one, two, three, and four introns, respectively (Table S3 and Figure S2). A maximum of four introns were found in the following UGT genes: Potri.014G082500 (UGT73), Potri.001G303100 (UGT73), and Potri.007G140600 (UGT74). A total of 133 introns were found across all intron-containing UGTs. Among these, 7 introns were present in phase 2, 35 in phase 1, and 31 in phase 0 (Table S3 and Figure S2). A total of 77 introns were found in phase 1, 27 in phase 0, and 29 in phase 2 ( Figure 5). Collectively, intron sizes ranged from 6 to 6299 bp. Of the 68 introns, nine were longer than 2000 bp Table S3). An average of 1.23 introns/gene was found among all intron-containing UGTs. The intron size ranged from 1 to 8308 bp in length among all the intron-containing UGTs. Intron size was also compared between the phylogenetic groups:   After mapping the introns to the amino acid sequence alignments, at least 10 independent intron insertion events were observed by following [18,25]. These insertion events are serially numbered I-1 to I-10 according to their positions ( Figure 5). Insertion between 150-200 aligned amino acids is highly conserved in phase 1. For group G, 35 out of 36 UGTs have highly conserved introns between these amino acids insertions. Another insertion in intron was predominantly observed only in group L. Interestingly, all the conserved intron insertions were found only in phase 1. Seventeen UGTs out of 53 from group E have introns with deletion of the conserved insertion.
We identified 1632 UGT transcripts in Arabidopsis, P. persica, M. domestica, V. vinifera, L. usitatissimum, G. max, G. raimondii, and Populus, based on a comparison of gene structure and conservation of introns using the search keyword PFAM00201, on the PIECE website. Six hundred and thirty-two transcripts were found to contain no introns. All the remaining 1010 intron-containing UGTs from these eudicots have the same conserved intron and phase 1 ( Figure S3).

Digital Expression Analysis
Microarray data obtained from the popgenie database identified 164 UGTs with varied expression across roots, leaves, flowers, buds, phloem/cambium, twigs, seeds, and suckers ( Figure S4). Most of the UGTs were abundantly expressed in flowers, roots, and leaves. Least expression was found in phloem/cambium tissues. A range of UTG expression levels was also found with respect to the following conditions: drought, mechanical damaged, beetle damage, dormancy, girdled and non-girdled, whole sucker, expanding and expanded, young and freshly expanded ( Figure 6). On the basis of this microarray data, there is a clear up-and down-regulation of these UGTs genes in response to drought conditions ( Figure 6). Genes preferentially expressed in response to drought were observed from the following phylogenetic groups: Potri.001G030600 (UGT91) from group A, Potri.007G030300 (UGT72) and Potri.012G036000 (UGT88) from group E, Potri.001G303300 (UGT73) and Potri.009G099000 (UGT73) from group D, Potri.001G281800 (UGT87) and Potri.009G077500 (UGT87) from group J, Potri.009G039000 (UGT76), Potri.004G119700 (UGT83), and Potri.006G022500 (UGT85) (Figure 6). Genes from groups B, C, M, and F were also expressed but at low levels.

RNA-Seq-based UGT Model under Drought
On the basis of RNA-seq data, we identified 152 UGTs differentially expressed under control and drought stress condition in Populus euphratica leaves (Table S4). Eighty-four UGT genes were found to be either up-or down-regulated under control and drought conditions based on their TPM values ( Figure 7A,B). In total, we identified 42 UGT genes that were down-regulated under drought stress, belonging to the following phylogenetic groups: E, M, B, C, D, L, K, and F. Groups E and L contained the highest numbers of down-regulated UGTs-20 and 13, respectively ( Figure 7A). Groups, B, C, M, D, L, and F share only nine UGT genes in common. Potri.016G014500 (UGT71) from group E was maximally down-regulated under drought conditions, with 23.32 TPM compared to the control with 218.76 TPM. Potri.009G095100 (UGT84) from group L was down-regulated under drought conditions, with a TPM value of 76.42 compared to the control with a TPM value of 137.38. Groups A, P, and O contained no UGT genes that were down-regulated under conditions of drought stress (Figures 7A and 8).

RNA-Seq-based UGT Model under Drought
On the basis of RNA-seq data, we identified 152 UGTs differentially expressed under control and drought stress condition in Populus euphratica leaves (Table S4). Eighty-four UGT genes were found to be either up-or down-regulated under control and drought conditions based on their TPM values ( Figure 7A, B). In total, we identified 42 UGT genes that were down-regulated under drought stress, belonging to the following phylogenetic groups: E, M, B, C, D, L, K, and F. Groups E and L contained the highest numbers of downregulated UGTs-20 and 13, respectively ( Figure 7A). Groups, B, C, M, D, L, and F share only nine UGT genes in common. Potri.016G014500 (UGT71) from group E was maximally The same number of genes (42) was found to be up-regulated under drought conditions, and these belong to the following phylogenetic groups: E, A, M, D, O, L, J, H, and P ( Figure 7B). The highest differential expression under drought conditions was observed in the UTGs from groups L, E, and P. For example, Potri.017G032300 (UGT74), Potri.017G032500 (UGT74), Potri.017G032700 (UGT74), Potri.001G389200 (UGT74), Potri.007G140500 (UGT74), and Potri.007G117200 (UGT74) from group L were highly upregulated under drought stress. The group E UGTs Potri.009G044600 (UGT71) and Potri.006G010000 (UGT71) were also found to be up-regulated under drought stress. Interestingly, one UGT, Potri.016G105400 (UGT70), from the newly formed phylogenetic group   The same number of genes (42) was found to be up-regulated under drought conditions, and these belong to the following phylogenetic groups: E, A, M, D, O, L, J, H, and P ( Figure 7B). The highest differential expression under drought conditions was observed in the UTGs from groups L, E, and P. For example, Potri.017G032300 (UGT74), Potri.017G032500 (UGT74), Potri.017G032700 (UGT74), Potri.001G389200 (UGT74), Potri.007G 140500 (UGT74), and Potri.007G117200 (UGT74) from group L were highly up-regulated under drought stress. The group E UGTs Potri.009G044600 (UGT71) and Potri.006G010000 (UGT71) were also found to be up-regulated under drought stress. Interestingly, one UGT, Potri.016G105400 (UGT70), from the newly formed phylogenetic group P was also found to be up-regulated, with a TPM value of 61.25, under drought stress compared with 36.2 TPM for the control. The UTGs in groups, C, B, and F were not preferentially up-regulated under drought stress (Figures 7B and 8).

Co-Expression Networking of Putative UGTs
Out of the 191 identified UTGs, 71 UGT identifiers were found in the famNet coexpression network database. Only 12 UGT genes were found to be co-expressed with their closest neighboring genes ( Figure S5). The remaining UGTs were independently expressed. These 12 UGTs were found to be co-expressed with cytochrome P450, prenyl-transferases, plant lipid transfer proteins, hemolysin-III, and protein-tyrosine phosphatase-like genes ( Figure S5). For example, Potri.016G014500 (UGT71) from group E, Potri.006G179700 (UGT79) from group A, Potri.016G097400 (UGT73) from group D, and Potri.009G133300 (UGT78) from group F were found to be co-expressed with cytochrome P450 genes as their closest neighboring gene. Potri.016G022500 (UGT85) from group G was found to be co-expressed with both prenyl-transferases and tyrosine phosphatase-like genes, and Potri.003G210400 (UGT92) from group M was found to be co-expressed with a glucosemethanol-choline oxidoreductase gene. Some genes showed duplicated modules within Populus ( Figure S5).
Structural investigation of the PSPG motifs in each phylogenetic group revealed the role of specific amino acid residues that are highly conserved at positions 1 (W), 4 (Q), 10 (H), 19 (H), and 44 (Q) ( Figure 1B). The occurrence of these specific amino acids at these positions in the sequence provides certain evolutionary and functional information that could be helpful for enzyme discovery [12].
On the basis of phylogenetic analysis, all the 191 UGTs were clustered into 16 phylogenetic groups (A-P), including two newly defined groups, O and P ( Figure 1A). Groups O and P groups are observed in all the examined eudicots, including Populus, although they are absent in Arabidopsis, indicating that they have been lost at some stage during the evolution of this plant ( Figure 1A) [12]. Only one member of group N was detected in dicot plants, whereas the same group is larger in monocots [25]. In addition, Populus has a larger number of UGT genes than Arabidopsis (107 members), which is due primarily to an expansion within groups E, D, G, and M. This expansion indicates that in Populus these groups of UGTs are involved in several secondary metabolite glycosylations.
Groups E, D, G, L, and M were the most expanded of the identified phylogenetic groups, indicating that multiple functions are associated with these groups of UGTs, and they have a broad substrate specificity. Groups B, C, and O are not expanded in eudicots, which suggests that they have a conserved substrate specificity [12] (Figure 2). Surprisingly, among all eudicots, group H is the only phylogenetic group with a lower number of UGT genes compared with Arabidopsis, which indicates that these UGTs are no longer required in eudicots and have a limited function associated with them. An expansion, conservation, and reduction of UGT genes in each phylogenetic group of eudicots reflects the physiological challenges that plant has to overcome for survival on land [12].
To understand the evolution of UGTs, a knowledge of intron gain and loss events and the positions and phases of introns relative to the protein sequence is very important [25]. An equal percentage (43.45%) of intron-less UGT genes are found in Populus and P. persica, which is close to the percentage for flax (40%) and peach (42%), but lesser than that for Z. mays (60%) and Arabidopsis (58%) [17,18,21,25]. Conserved intron is the most widespread and oldest intron observed in most members of the different phylogenetic groups in Populus. In Z. mays, P. persica, and flax, intron 5 is also considered to be the oldest intron [18,21,25] ( Figure 5). Most of the intron insertions were observed in phase 1 across all the compared eudicots gene structures, suggesting that the majority of conserved introns are ancient elements and that their phases remain stable [25,63] (Figure S3). Intron size within each phylogenetic group appears to be variable (Table S3), suggesting that intron size was gene-specific during evolution [23].
With the exception of chromosome 20, the 191 Populus UGTs are dispersed throughout all the chromosomes of the Populus genome ( Figure 3). A clustering of 2 to 15 genes per cluster showed high-sequence similarity and were frequently classified into the same phylogenetic group. This tendency reflects the occurrence of recent gene duplication events and close phylogenetic relationships, which is consistent with the findings for Arabidopsis, soybean, and cotton UGTs [20,23,31,64]. Gene duplication plays a critical role in increasing the number of these genes, the generation of new genes, and dispersing them throughout the genome [59]. The expansion of the UGT gene family is primarily due to WGD events ( Figure S1). Within the Populus genome, 65 UGTs have been segmentally duplicated and three UGTs have been tandemly duplicated, suggesting that segmental duplication has been the dominant form of duplication during the evolution of UGTs in Populus (Table S2) [23]. These results are consistent with data showing that the Populus genome has undergone at least one round of WGD [4]. Some UGTs from groups D and L have evolved through a series of duplications, indicating that they have been individually duplicated and later became the part of the genome. Sixty-one percent of the segmentally duplicated UGTs were duplicated around 11-65 mya, which indicates that the genome duplication in Populus is very recent 8 to 13 mya, as reported by [64]. However, the fossil record shows that the Populus and Salix lineages diverged 60 to 65 mya [4]. All the three tandemly duplicated UGT genes were duplicated 19-65 mya, suggesting that both tandem and segmental duplication events occurred at the same time as WGD (Table S2).
Prior to the current study, the role of UGTs in drought tolerance was not clearly defined. The exception being the UGT74E2 and UGT76E11 genes in Arabidopsis, which has been found to be involved in IBA glycosylation and flavonoid accumulation which is associated with the water stress response [39,65]. A high expression of all UGTs across leaves and roots suggests their drought responsiveness because these are the primary tissues affected under water deficit conditions. Eighty-four droughts responsive UGTs from Populus were marked based on microarray and RNA-seq data ( Figure 7A, B). The UGTs in phylogenetic groups E, B, C, D, K, F, and L were down-regulated in Populus, showing differential expression between control and drought treatments ( Figure 7A). Most of the down-regulated UGTs from phylogenetic group E belong to subgroup UGT71 genes, which have been functionally characterized for ABA glycosylation in strawberry and Arabidopsis [43,44,66,67]. Genes in the other two subgroups, UGT72 and UGT88, from phylogenetic group E are also down regulated but their functional characterization regarding hormonal glycosylation has yet to be reported ( Figure 7A). Hence, they could be considered target genes to investigate their substrate specificity for drought responsive phytohormones. The UGT gene Potri.016G014500 (UGT71) from phylogenetic group E was the most down-regulated UGT observed under drought conditions. All the subgroups UGT74, UGT75, and UGT84 were found to be negatively regulated under drought conditions and have already been characterized for glycosylation of IAA and IBA in Arabidopsis [39,45,68,69]. In Arabidopsis, UGT84A1, A2, A3, and A4 have been functionally characterized against UV-B radiation by [70]. In G. hirsutum, Arabidopsis, and tomato, UGT73 from group D has been characterized for glycosylation of ABA, brassinosteroids, and salicylic acid [42,50,[66][67][68]. The UGTs of groups B, C, K, and F still need to be assessed for their putative role in drought response.
Phylogenetic groups E, A, M, D, O, L, I, J, H, P, and G were actively up-regulated in response to drought conditions ( Figure 7B). Again, the most up-regulated UGTs belong to groups E and L, indicating that these UTGs are both up-and down-regulated under drought stress. Regarding the UGTs from group J, UGT87A2 from Arabidopsis has been characterized for ABA glycosylation (P. Li et al., 2017). We propose here the UGTs from phylogenetic groups M, O, I, H, and P are potential candidates for hormonal substrate specificity, and still need to be assessed regarding their response to drought. Overall, we obtained consistent results based on the microarray data and RNA-seq data generated in this study.
Most of the UGTs in Populus were found to be expressed independently under specific circumstances. The maximum co-expression of UGTs with cytochrome P450 from phylogenetic groups E, A, D, and F confirmed that they are members of a triterpenoid biosynthesis pathway involved in several secondary metabolite glycosylations ( Figure S5).

Conclusions
We identified 191 UGT genes in the Populus genome. On the basis of phylogenetic analysis, these genes were found to cluster into 16 distinct evolutionary groups (A-P). The PSPG motif amino acid sequence was found to vary within each phylogenetic group, with the exception of amino acid positions 1, 4, 10, 19, and 44. An intron between amino acids 150 and 200 is the oldest intron found across eudicots, including Populus. Segmental duplication in the Populus genome contributed dominantly during the evolution of the UGT family. Most of the identified UGT genes are expressed in roots, leaves, flowers, and seeds. The results of digital expression analysis were confirmed by RNA-seq data. Eighty-four UGT genes were found to be up-and down-regulated in response to drought, and were shown to be putatively involved in ABA, IBA, and IAA glycosylation. Phytohormonal glycosylation is the most important phenomena associated with UGTs in their response to drought stress. Molecular evolution and transcriptome analyses can be useful for understanding the structure-function relatedness of the UGT family members and might further facilitate their functional analysis.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/genes13091640/s1, Figure S1: Gene duplication tree and evolutionary relationships of 191 UGTs; Figure S2: Gene structure of all 191 UGTs in Populus against their phylogenetic groups; Figure S3: Gene structure comparison among eight eudicots; Figure S4: Heat map of 168 UGTs expressed in various tissues of Populus; Figure S5: Co-expression networking of UGT genes in Populus; Table S1: Identified UGTs in Populus and their genomic information alongside PSPG motif sequences; Table S2: Gene duplication estimation of UGTs in Populus using Ka/Ks calculations; Table S3: Gene structure analysis in Populus UGTs; Table S4 Expression

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.