Genomic-Wide Identification and Characterization of the Uridine Diphosphate Glycosyltransferase Family in Eucommia ulmoides Oliver

Eucommia ulmoides Oliver is a woody plant with great economic and medicinal value. Its dried bark has a long history of use as a traditional medicinal material in East Asia, which led to many glycosides, such as aucubin, geniposide, hyperoside, astragalin, and pinoresinol diglucoside, being recognized as pharmacologically active ingredients. Uridine diphosphate glycosyltransferases (UGTs) catalyze a glycosyl-transferring reaction from the donor molecule uridine-5′-diphosphate-glucose (UDPG) to the substrate, which plays an important role in many biological processes, such as plant growth and development, secondary metabolism, and environmental adaptation. In order to explore the biosynthetic pathways of glycosides in E. ulmoides, 91 putative EuUGT genes were identified throughout the complete genome of E. ulmoides through function annotation and an UDPGT domain search. Phylogenetic analysis categorized them into 14 groups. We also performed GO annotations on all the EuUGTs to gain insights into their functions in E. ulmoides. In addition, transcriptomic analysis indicated that most EuUGTs showed different expression patterns across diverse organs and various growing seasons. By protein–protein interaction predication, a biosynthetic routine of flavonoids and their glycosides was also proposed. Undoubtedly, these results will help in future research into the biosynthetic pathways of glycoside compounds in E. ulmoides.


Introduction
Glycosyltransferases (GT, EC 2.4.x.y) catalyze to transfer a glycosyl moiety from a donor molecule to an acceptor substrate to form a glycosidic bond, which is employed by cells for the biosynthesis of glycolipids, glycoproteins, hormones, and various glycosides. GTs are ubiquitous in all kinds of organisms, from bacteria to animals. To date, 114 families of glycosyltransferases containing more than 850,000 members are recognized and classified in the CAZy database (http://www.cazy.org, accessed on 5 August 2021) [1,2], among which, the largest family (GT1) transfers sugars onto numerous small molecules.
In plants, GTs typically utilize uridine diphosphate (UDP)-activated sugars as donor molecules, by which glycosylation reactions not only synthesize essential substances for growth, but also glycosylate various secondary metabolites, such as monolignols, anthocyanins, and terpenoids [3,4]. Plant UDP-glycosyltransferases (UGTs) have a conserved motif of 44 residues called PSPG box (putative secondary plant glycosyltransferase box) in the C-terminal, which is featured as a UDP-sugar binding domain [5]. In light of their important roles in many cellular processes, genomic-wide identification and phylogenetic analyses of UGTs were carried out for many plant species. For example, 123, 147, 179, 180 and 137 UGTs have been identified in Arabidopsis thaliana [6], Zea mays [7], Triticum In (A), each color arc corresponds to a phylogenetic group and each gray arc to a class of plant UGTs according to the UGT nomenclature system [5].
The distribution of UGTs in each group were compared among the selected plant species [12] ( Figure 1B). Most of the EuUGTs were categorized into the groups A, D, E, and L. The number of EuUGTs in these groups are believed to expand rapidly during the evolution of higher plants [9,12]. In general, the UGTs of group E and L have a wide range of Figure 1. Phylogenetic tree of UGTs in E. ulmoides, L. usitatissimum and A. thaliana constructed by neighbor-joining method (A) and the distribution of members in the phylogenetic groups among selected plants species (B). In (A), each color arc corresponds to a phylogenetic group and each gray arc to a class of plant UGTs according to the UGT nomenclature system [5].
The distribution of UGTs in each group were compared among the selected plant species [12] ( Figure 1B). Most of the EuUGTs were categorized into the groups A, D, E, and L. The number of EuUGTs in these groups are believed to expand rapidly during the evolution of higher plants [9,12]. In general, the UGTs of group E and L have a wide Plants 2021, 10,1934 5 of 21 range of substrates. Group E has been identified in most plants [9], indicating the existence of Group E is conservative during plant evolution. Group E mainly contains flavonoids glycosyltransferases, abscisic acid glycosyltransferases, and monolignol glycosyltransferases. Group L appears only in vascular plants [9]. Substrates of members of Group L contain auxin, anthranilate, anthocyanin, phenylpropanoids, xenbiotics, triterpenes, and coumarins, etc [11]. The enzymes of group A are more specific to the substrates, such as flavonoids, anthocyanidins, and triterpenes. Members of groups B, C, and D are mainly flavonol-7-O-glycosyltransferases, benzoic acid glycosyltransferases, and brassinosteroids glycosyltransferases. Most of group F are flavonoid glycosyltransferases [11]. Three EuUGTs were clustered into group O, which are generally associated with the Oglycosylation of the hydroxylated isoprenoid side chain of cytokinin [12]. Group O is absent in A. thaliana and L. usitatissimum, and may have been lost during the evolution of these two species [12]. The members of group R are found in few plants, but E. ulmoides has a single UGT occurring in group R, which has been reported to catalyze the C-glycosylation and O-glycosylation of flavonoids in maize [35] and gallic acid in bamboo [36], respectively.

Analysis of Conserved Motifs and Prediction of Cis-Acting Elements
All of the EuUGTs' amino acid sequences were submitted to the MEME website for analysis of the conserved motifs. Like the previous analysis [6,37], nine conserved motifs were identified within almost all of the EuUGTs (Figure 2A). However, some EuUGTs may lack one or more motifs. For example, motifs 2 and 3 are missing from three EuUGTs (GWH-PAAAL015516, GWHPAAAL024578, and GWHPAAAL024577) in group O (Figure 2A). Motif 5 is considered to separate the UGT sequence into the N-terminal and C-terminal regions. The C-terminal is better conserved than the other terminals. Motif 7-also called the PSPG box-is the UGT-defining consensus sequence and exists across all the putative EuUGTs ( Figure 2B). It is almost identical to the PSPG motif within the UGTs of A. thaliana and L. usitatissimum. The consensus sequence is highly conserved, especially at sites such as 1W, 4Q, 8L, 10H, 16F, 19H, 21GWNS24, 27E, 39P, 43D/E and 44Q. Further analysis showed some differences within the PSPG box in each group of EuUGTs ( Figure 3A). In addition to a few positions of motif 7 conserved in most EuUGTs, there are also positions conserved within EuUGT proteins from the same group, such as: 14G in group A; 14G and 28G in group D; 26L, 38W, 42A in group E; and 36V in group L.
Except for the PSPG box, the other eight motifs of the EuUGTs were found to be somewhat conservative ( Figure 3B). However, the catalytic functions of these motifs remained unclear until now. By comparing the 3-D structures of several GT-B fold enzymes, it was found that these enzymes showed high structural similarity despite their lower sequence identity [38]. Therefore, it could be speculated that these motifs may play a role in maintaining the stability of the GT-B fold.
The cis-acting elements, including the promoter, enhancer, silencer, and various response elements, are important in the regulation of gene transcription. Therefore, the 2000 bp 5 -upstream sequence of each EuUGT gene coding sequence was retrieved and used to predicate the putative cis-acting elements on the PlantCARE webserver (http: //bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 8 December 2020). Overall, 26 various cis-acting elements and their locations were predicted ( Figure 4). Almost all the EuUGT genes have one or more cis-elements, except for GWHPAAAL015755 and GWHPAAAL018171. It is worth noting that some EuUGT genes have multiple tandem elements, such as light-responsive elements (GWHPAAAL014996 and GWHPAAAL018716), low-temperature-responsive elements (GWHPAAAL011200), plant-hormone-responsive elements (GWHPAAAL014206), and wound-responsive elements (GWHPAAAL019177), etc. Therefore, the expression of the EuUGT genes is supposed to be extensively regulated.

GO Annotation of EuUGTs
Through GO ontology annotation, the putative EuUGTs were categorized into 45 molecular functions and 93 biological processes ( Figure 5A,B, Table S3). Most of the EuUGTs were annotated as flavonoids glucosyltransferase (quercetin 3-O-glucosyltransferase and quercetin 7-O-glucosyltransferase) and hormonal glycosyltransferase (indole-3-acetate beta-glucosyltransferase, indole-3-butyrate beta-glucosyltransferase, and abscisic acid glucosyltransferase). In addition, GWHPAAAL025999 was annotated as coniferyl alcohol glycosyltransferase, which may be related to the biosynthesis of lignans, such as PDG, a unique and valuable secondary metabolite in E. ulmoides. In view of these bioprocesses, most of the EuUGTs were enriched to be involved in the responses to the various stresses, suggesting that the glycoside molecules could participate in the defense of abiotic and biotic stresses.

GO Annotation of EuUGTs
Through GO ontology annotation, the putative EuUGTs were categorized into 45 molecular functions and 93 biological processes ( Figure 5A,B, Table S3). Most of the EuUGTs were annotated as flavonoids glucosyltransferase (quercetin 3-O-glucosyltransferase and quercetin 7-O-glucosyltransferase) and hormonal glycosyltransferase (indole-3-acetate beta-glucosyltransferase, indole-3-butyrate beta-glucosyltransferase, and abscisic acid glucosyltransferase). In addition, GWHPAAAL025999 was annotated as coniferyl alcohol glycosyltransferase, which may be related to the biosynthesis of lignans, such as PDG, a unique and valuable secondary metabolite in E. ulmoides. In view of these bioprocesses, most of the EuUGTs were enriched to be involved in the responses to the various stresses, suggesting that the glycoside molecules could participate in the defense of abiotic and biotic stresses.

Transcriptome Analysis for Organ-Specific and Time-Specific Gene Expression
The third-party transcriptome data [33] were used to analyze the expression patterns of the EuUGT genes in various organs and in different seasons. The relative transcriptional levels of the EuUGT genes are summarized in Table S4, with TPM values ranging from 0 to 277,336, indicating that most of the EuUGT genes were differentially expressed. Figure  6A shows the expression patterns of EuUGT genes in various organs. Globally, many EuUGT genes exhibited an organ-specific expression pattern, which was highly expressed in one or two organ(s), but lower in others. Based on the expression levels of all of the EuUGT genes, the five organs could be roughly clustered into three groups: root, bark/stem, and flower/leaf. The root showed a greater difference from the other four organs in terms of EuUGT gene transcription. Some EuUGT genes could be recognized as organ-specific based on their transcriptional level. For example, GWHPAAAL009436 and GWHPAAAL020336 tended to be expressed at higher levels in the root than in other organs; GWHPAAAL023788 was only expressed in the bark and stem. In contrast, GWHPAAAL007266 was highly expressed in the flower, but had extremely low expression in the other four organs. It was observed that all the eight EuUGT genes from group D were expressed primarily in the root and flower. Meanwhile, 6 out of the 16 EuUGT genes belonging to group A had higher expression in the flower than in the other organs. In addition, there were more highly expressed genes in the leaf than in the other four organs, which may be consistent with the fact that plant leaves contain many polysaccharides [39].

Transcriptome Analysis for Organ-Specific and Time-Specific Gene Expression
The third-party transcriptome data [33] were used to analyze the expression patterns of the EuUGT genes in various organs and in different seasons. The relative transcriptional levels of the EuUGT genes are summarized in Table S4, with TPM values ranging from 0 to 277,336, indicating that most of the EuUGT genes were differentially expressed. Figure 6A shows the expression patterns of EuUGT genes in various organs. Globally, many EuUGT genes exhibited an organ-specific expression pattern, which was highly expressed in one or two organ(s), but lower in others. Based on the expression levels of all of the EuUGT genes, the five organs could be roughly clustered into three groups: Root, bark/stem, and flower/leaf. The root showed a greater difference from the other four organs in terms of EuUGT gene transcription. Some EuUGT genes could be recognized as organ-specific based on their transcriptional level. For example, GWHPAAAL009436 and GWHPAAAL020336 tended to be expressed at higher levels in the root than in other organs; GWHPAAAL023788 was only expressed in the bark and stem. In contrast, GWHPAAAL007266 was highly expressed in the flower, but had extremely low expression in the other four organs. It was observed that all the eight EuUGT genes from group D were expressed primarily in the root and flower. Meanwhile, 6 out of the 16 EuUGT genes belonging to group A had higher expression in the flower than in the other organs. In addition, there were more highly expressed genes in the leaf than in the other four organs, which may be consistent with the fact that plant leaves contain many polysaccharides [39].
Furthermore, several EuUGTs, such as GWHPAAAL015368, GWHPAAAL017961, GWHPAAAL000493, GWHPAAAL018716, GWHPAAAL022859, GWHPAAAL025754, GWHPAAAL025999, and GWHPAAAL07799, had higher expression levels in all five organs, with a TPM value greater than ten thousand, suggesting that these genes might be widely involved in the entire growth and development process. The contents of many secondary metabolites exhibited a fluctuation over the seasons in E. ulmoides. For example, the content of some active ingredients, including PDG, aucubin, and chlorogenic acid, was reported to be higher from May to July than during the other seasons [40,41]. Therefore, we analyzed UGT gene expression in the bark and leaves during various seasons (May, July, and September). A large number of the EuUGT genes did not show obvious fluctuations in transcription over the seasons in bark or leaves ( Figure 6B). Nevertheless, a small fraction of the genes exhibited a season-dependent expression pattern. For example, the expression of genes such as GWHPAAAL000847, GWHPAAAL000531, GWHPAAAL020336, GWHPAAAL025545, and GWHPAAAL018171 in the leaf increased significantly in September compared to May. In contrast, some genes' expression levels in the leaf were much higher in May compared to September. A similar situation occurred in the bark. Genes including GWHPAAAL025080, GWHPAAAL025081, GWHPAAAL025999, GWHPAAAL021381, GWHPAAAL007800, Furthermore, several EuUGTs, such as GWHPAAAL015368, GWHPAAAL017961, GWHPAAAL000493, GWHPAAAL018716, GWHPAAAL022859, GWHPAAAL025754, GW-HPAAAL025999, and GWHPAAAL07799, had higher expression levels in all five organs, with a TPM value greater than ten thousand, suggesting that these genes might be widely involved in the entire growth and development process.
The contents of many secondary metabolites exhibited a fluctuation over the seasons in E. ulmoides. For example, the content of some active ingredients, including PDG, aucubin, and chlorogenic acid, was reported to be higher from May to July than during the other seasons [40,41]. Therefore, we analyzed UGT gene expression in the bark and leaves during various seasons (May, July, and September). A large number of the EuUGT genes did not show obvious fluctuations in transcription over the seasons in bark or leaves ( Figure 6B). Nevertheless, a small fraction of the genes exhibited a season-dependent expression pattern. For example, the expression of genes such as GWHPAAAL000847, GWHPAAAL000531, GWHPAAAL020336, GWHPAAAL025545, and GWHPAAAL018171 in the leaf increased significantly in September compared to May. In contrast, some genes' expression levels in the leaf were much higher in May compared to September. A similar situation occurred in the bark. Genes including GWHPAAAL025080, GWHPAAAL025081, GWHPAAAL025999, GWHPAAAL021381, GWHPAAAL007800, and GWHPAAAL015516 had significantly increased expression in the bark during September compared to May.

RT-qPCR Analysis of the Selected EuUGT Genes
Lignans, generated by the polymerization of three major monolignols (ρ-coumaryl, coniferyl, and sinapyl alcohols), constitute a major class of secondary metabolites [42,43]. In E. ulmoides, up to 46 lignans were identified in various organs or tissues. Most of these lignans belong to glycosidic compounds [16]. In order to discover the key UGT genes possibly involved in the glycosylation of lignans in E. ulmoides, RT-qPCR was performed to detect the expression level of six selected EuUGTs belonging to groups E and C in root, bark, and leaves, which were sampled from three 18-year-old trees in the harvesting season (in June). Figure 7 shows the relative expression levels of the six genes in the three samples. Among them, GWHPAAAL002229, GWHPAAAL015368, and GWHPAAAL025081 were expressed more efficiently in root than that in the bark and leaves. The expression levels in the root were 5.5, 6.4 and 6.9 times as high as those in bark, and 7.0, 15.3 and 9.1 times as high as those in the leaves, respectively. The transcriptional level of GWH-PAAAL005635 was increased by 5.9-and 11.3-fold in bark compared to the root and leaf, respectively. GWHPAAAL001760 had a higher expression level in the bark and root than in the leaves. The expression of GWHPAAAL025999 was noticeably up-regulated in leaf, but its transcripts were observed at a high level in all three organs. and GWHPAAAL015516 had significantly increased expression in the bark during September compared to May.

RT-qPCR Analysis of the Selected EuUGT Genes
Lignans, generated by the polymerization of three major monolignols (ρ-coumaryl, coniferyl, and sinapyl alcohols), constitute a major class of secondary metabolites [42,43]. In E. ulmoides, up to 46 lignans were identified in various organs or tissues. Most of these lignans belong to glycosidic compounds [16]. In order to discover the key UGT genes possibly involved in the glycosylation of lignans in E. ulmoides, RT-qPCR was performed to detect the expression level of six selected EuUGTs belonging to groups E and C in root, bark, and leaves, which were sampled from three 18-year-old trees in the harvesting season (in June). Figure 7 shows the relative expression levels of the six genes in the three samples.
Among them, GWHPAAAL002229, GWHPAAAL015368, and GWHPAAAL025081 were expressed more efficiently in root than that in the bark and leaves. The expression levels in the root were 5.5, 6.4 and 6.9 times as high as those in bark, and 7.0, 15.3 and 9.1 times as high as those in the leaves, respectively. The transcriptional level of GWHPAAAL005635 was increased by 5.9-and 11.3-fold in bark compared to the root and leaf, respectively. GWHPAAAL001760 had a higher expression level in the bark and root than in the leaves. The expression of GWHPAAAL025999 was noticeably upregulated in leaf, but its transcripts were observed at a high level in all three organs.

Construction of Protein-Protein Interaction (PPI) Network of EuUGTs
All 91 EuUGTs were used to construct the PPI network on the String web server (https://string-db.org/, accessed on 9 September 2021) [44]. As a result, a PPI network was obtained (Figure 8), in which nine EuUGTs constituting three nodes (in blue circle) were predicted to be interacted with ten protein partners. All of the proteins included in this network are summarized in Table S5. The UGT79 (GWHPAAAL009324) was predicted to be

Discussion
E. ulmoides is one of the important medical plants in East Asia. More than 46 lignans and their glycosides were identified in different tissues or organs of E. ulmoides [16]. However, no enzyme was found to be involved in their biosynthesis. Therefore, we performed a comprehensive analysis on the UGTs in silico. By searching the draft genome of E. ulmoides, 91 UGT genes were identified, which is less than the number in other plants, which in general have more than 100 UGTs. For example, A. thaliana contains 123 UGTs [6]; the tree Quercus robur has 244 UGTs [46]. In addition, three EuUGTs (GWHPAAAL000153, GWHPAAAL020411, and GWHPAAAL023819) are clearly missing an N-terminal. This may be due to the incomplete genome of E. ulmoides. The E. ulmoides draft genome was assembled previously, with a size of 1.18 Gb which consisted of 29,348 scaffolds with an N50 of less than 1.03 Mb [33]. More than 10 pairs of EuUGT sequences exhibited high similarity (Table S1). The same phenomenon was found in the other plants and is thought In view of the catalytic function of the proteins in the network, three possible biological processes were assigned in the biosynthesis of flavonoid, anthocyanin-containing compound, and pigment. For example, a large fraction of the protein partners, such as DFR, FLS1, TT5, TT4, TT7, FLS3, is involved in the biosynthesis of flavonoid; UGT75 and LDOX are included in the biosynthesis of anthocyanin-containing compound, which is downstream of the flavonoid biosynthesis [45].

Discussion
E. ulmoides is one of the important medical plants in East Asia. More than 46 lignans and their glycosides were identified in different tissues or organs of E. ulmoides [16]. However, no enzyme was found to be involved in their biosynthesis. Therefore, we performed a comprehensive analysis on the UGTs in silico. By searching the draft genome of E. ulmoides, 91 UGT genes were identified, which is less than the number in other plants, which in general have more than 100 UGTs. For example, A. thaliana contains 123 UGTs [6]; the tree Quercus robur has 244 UGTs [46]. In addition, three EuUGTs (GWHPAAAL000153, GWHPAAAL020411, and GWHPAAAL023819) are clearly missing an N-terminal. This may be due to the incomplete genome of E. ulmoides. The E. ulmoides draft genome was assembled previously, with a size of 1.18 Gb which consisted of 29,348 scaffolds with an N50 of less than 1.03 Mb [33]. More than 10 pairs of EuUGT sequences exhibited high similarity (Table S1). The same phenomenon was found in the other plants and is thought to be a result of tandem duplication during long-term evolution [2,6,47]. It may also reflect that the plant UGTs expand in order to produce more secondary metabolites [11,12]. As in other plant species, EuUGT genes are mostly intronless, except for a small fraction of family members containing one to three introns, which may be derived from gain or loss events during their evolution [47].
The EuUGTs were categorized into 14 previously identified groups, such as A to E, G to M, and O and R. Although the amnio acid sequence similarity is low among the UGT groups, the protein structure of the EuUGTs was shown to be highly conserved. Nine motifs could be distinguished among the EuUGTs (Figure 2A). Among these, motif 7 (PSPG box) was previously discussed extensively in an analysis of plant UGTs due to the highly conserved amino acid residues [5,6]. All 91 of the EuUGTs also have a PSPG box with nearly identical conservative amino acid residues ( Figure 2B). The function of the PSPG box is to serve as a binding domain of UDP-sugar [5]. Masada et al. (2007) replaced the PSPG box of CaUGT2 with the NtGT1b's, leading to the loss of catalytic activity in the CaUGT2 variant, indicating its important role in catalysis [48]. Further, site-directed mutagenesis of a non-conservative Cys377 in the PSPG box of CaUGT2 to Arg led to the variant losing catalytic activity [48], suggesting that the non-conservative residues in the PSPG box in CaUGT2 may also participate in substrate recognition. As Figure 2A shows, the other eight motifs were found to be conservative to a certain extent, but almost no studies have focused on them. Through C-terminal domain swapping, it was found that the affinity of the chimeric enzyme of AtUGT71C1 and AtUGT71C2 to the substrate etoposide changed significantly, suggesting that the substrate specificity of UGT may have resided in the N-terminal as well as the C-terminal domains [49]. Taken together, both the C-terminal and N-terminal domains of plant UGTs may be involved in substrate recognition, which is an important possibility to explore further in the future.
The cis-elements within the 2000 bp upstream sequence of each EuUGT gene were identified. It was shown that up to 26 cis-elements were recognized, suggesting that EuUGTs' transcription might be extensively regulated. For example, after being treated with methyl jasmonate, several UGTs in the adventitious roots of P. ginseng were upregulated [13]; in Brassica and Arabidopsis, changes in the transcriptional level of many UGTs happened in response to various stresses [50]. Our transcriptome analysis also confirmed that the transcription of most EuUGT genes was organ-specific ( Figure 6). In addition, it is worth noting that many of the cis-elements were enriched to be related to stress responses, such as low temperatures, wounds, and the stresses caused by bacteria and fungi, etc. GO annotation also categorized these EuUGTs into the biological processes of stress responses ( Figure 5B). Taken together, the tissue-specific expression of plant UGTs suggests that specific glycosylation may largely take place in a given tissue or organ. For example, several UGT genes were reported to be highly expressed in germinating seeds of Cicer arietinum and were localized in the regions of rapidly dividing cells; thus the authors suggested that these tissue-specific expressed UGTs may be involved in cell cycle regulation [51]. Furthermore, the transcriptomic data may help to identify the UGTs potentially involved in the biosynthetic pathways of a given glycoside [13,14,52].
Since many bioactive compounds isolated from E. ulmoides belong to lignan and their glycosylated derivatives, such as aucubin, PDG, etc., it is interesting to identify the genes involved in their biosynthesis [25,26,53]. For example, PDG is a kind of natural glycosylated lignan existing in several plant species, such as E. ulmoides [16], L. usitatissimum [54], Actinidia arguta [55], and Valeriana officinalis [56]. Previous studies have proposed that the upstream biosynthesis pathway of PDG starts from phenylalanine, then synthesizes cinnamic acid, coumaric acid, coffee acid, ferulic acid, and finally coniferyl alcohol, through the phenylpropanoid pathway [57]. This pathway has been reconstituted in E. coli, which could utilize glucose as a substrate to produce coniferyl alcohol [58]. Coniferyl alcohol has been proposed as a precursor molecule to the synthesis of PDG through dimerization to form pinoresinol, which is then glycosylated to produce PDG [59,60]. The pathway of PDG biosynthesis from coniferyl alcohol in plants is not defined and no enzyme has been identified to catalyze the glycosylation of pinoresinol to produce PDG. However, a coniferyl alcohol glucosyltransferase was purified and characterized biochemically from the cambial sap of spruce (Picea abie L.) [61]. Two genes, UGT71A18 (found in Forsythia koreana [62]) and UGT71C1 (found in A. thaliana [63]), were shown to catalyze the glycosylation of pinoresinol to produce pinoresinol monoglucoside. In addition, some other UGTs were also found to be related to the lignin biosynthesis [52,[64][65][66]. For this purpose, we screened out six EuUGTs for an expression analysis by combining the sequence characteristics and their expression patterns in various organs. GWHPAAAL001760 and GWHPAAAL015368 were clustered into group E with a sequence similarity close to UGT71, and two proteins belonging to this clade in F. koreana and A. thaliana were reported to catalyze pinoresinol to form pinoresinol monoglucoside [62,63]. In addition, GWHPAAAL025999 was annotated to be coniferyl alcohol glucosyltransferase. The other two EuUGTs (GWHPAAAL002229 and GWHPAAAL025081) were also selected from group E. As shown in our transcriptomic data, another EuUGT (GWHPAAAL005635) showed higher transcription levels in the root and bark, which were previously reported to contain higher PDG content [67,68]. Thus, an RT-qPCR was performed to verify the transcription levels of the selected UGT genes in root, bark, and leaves. It was shown that five selected EuUGT genes, with the exception of GWHPAAAL025999, were transcribed much more efficiently in root or bark than in the leaf, which is consistent with the higher content of PDG and other glycosidic metabolites in the bark and root [67,68]. However, a deviation appears between the qPCR data and transcriptome profiling for some EuUGT genes. For example, the highest expression of the GWHPAAAL005636 gene was observed in bark when using a qPCR analysis but appeared to be in the root when using a transcriptome analysis. These differences may arise from the different sampling locations and trees used in each analysis. Taken together, these results suggest that these EuUGTs could be candidates for future research on the biosynthesis of PDG and other glucosides. The catalytic functions of the selected EuUGTs should be verified experimentally. For this purpose, our laboratory is studying the cloning and expression of the cDNAs and the encoding these EuUGTs.
In addition to the lignan compounds, various organs or tissues of E. ulmoides also contain plenty of flavonoids and their glycoside derives, such as quercetin, hyperin, and rutin [16]. Thus, we construct a PPI network with 91 EuUGTs to identify the UGTs that may be involved in biosynthesis of flavonoids and their glycosides. Through the PPI network, a biosynthetic pathway of flavonoids in E. ulmoides could be schemed based on the previous proposal [45,69]. As shown in Figure 9, almost all the key enzymes can be deduced in E. ulmoides (Table S5) in this pathway. In the PPI network, UGTs constitute three nodes. UGT79 (GWHPAAAL009324) from group A was annotated to encode anthocyanidin 5-Oglucosyltransferase, which may participate in the biosynthesis of anthocyanidin glycosides. The other two nodes consist of UGT90 (belonging to group C, containing four EuUGT members: GWHPAAAL005635, GWHPAAAL005636, GWHPAAAL005637, GWHPAAAL005638, GWHPAAAL009069) and UGT91 (belonging to group A, containing three members: GWH-PAAAL007941, GWHPAAAL015754, and GWHPAAAL015755) in E. ulmoides. All of the above EuUGTs may be involved in the conversion of quercetin or kaempferol into various quercetin or kaempferol glycosides, respectively, such as isoquercitrin, rutin and so on. However, the exact EuUGT cannot be assigned into each glycosylation reaction such as that present in Figure 9, which must be confirmed experimentally. Plants 2021, 10, x FOR PEER REVIEW 16 of 22

Data Resources Used
The draft genome and annotation of E. ulmoides [33] were downloaded from the National Genomics Data Center under BioProject accession PRJCA000677 (https:// bigd.big.ac.cn/bioproject/browse/PRJCA000677, accessed on 17 March 2020). The transcriptome data of E. ulmoides were downloaded from NCBI under BioProject accession PRJNA357336. One hundred and seven UGT sequences of A. thaliana (two UGT80s not included) were obtained from Uniprot (https://www.uniprot.org/, accessed on 17 March 2020) and 137 UGT sequences of Linum usitatissimum L. were obtained from NCBI (JN088282 to JN088418). In addition, the UGT708C1 (XP_007216617) of Prunus persica [70] and the UGTPg36 (AKA44596.1) of P. ginseng [71] were used to construct the phylogenetic tree.

Genome-Wide Identification of Putative UGTs in E. ulmoides
The hidden Markov model UDPGT (PF00201.18) obtained from the Pfam database (http://pfam.xfam.org/, accessed on 5 May 2020) was used to search for the putative UGTs against the E. ulmoides draft genome by employing the HMMER-3.3 software with a cutoff E-value 1 × 10 −5 [72]. Key word searches were also performed through genome annotation using the words 'UGT', 'glycosyltransferase', and 'glucosyltransferase'. All sequences screened by the two above methods were merged and de-duplicated. ClustalW was used to perform multiple sequence alignment and the MEME web server (http://meme-suite.org/tools/meme, accessed on 6 May 2020) [73] was used to find conserved motifs within the candidate EuUGTs. We re-defined the complete sequence of EuUGTs, since parts of genes appear to lack the N-terminal or the C-terminal when compared to the UGTs of A. thaliana and L. usitatissimum. The upstream and downstream sequences of putative UGT genes were extracted by Python 3.7 in order to re-annotate them using FGENESH and Softberry Online Services (http://www.softberry.com/, accessed on 1 September 2020) [74]. The transcriptome was assembled with and without the reference genome, using Trinity software [75] in order to help to determine the correct CDS of putative genes. After removing sequences with conservative motifs of less than 3 and without a complete PSPG box, the putative EuUGTs were obtained and further verified by

Data Resources Used
The draft genome and annotation of E. ulmoides [33] were downloaded from the National Genomics Data Center under BioProject accession PRJCA000677 (https://bigd.big.ac. cn/bioproject/browse/PRJCA000677, accessed on 17 March 2020). The transcriptome data of E. ulmoides were downloaded from NCBI under BioProject accession PRJNA357336. One hundred and seven UGT sequences of A. thaliana (two UGT80s not included) were obtained from Uniprot (https://www.uniprot.org/, accessed on 17 March 2020) and 137 UGT sequences of Linum usitatissimum L. were obtained from NCBI (JN088282 to JN088418). In addition, the UGT708C1 (XP_007216617) of Prunus persica [70] and the UGTPg36 (AKA44596.1) of P. ginseng [71] were used to construct the phylogenetic tree.

Genome-Wide Identification of Putative UGTs in E. ulmoides
The hidden Markov model UDPGT (PF00201.18) obtained from the Pfam database (http://pfam.xfam.org/, accessed on 5 May 2020) was used to search for the putative UGTs against the E. ulmoides draft genome by employing the HMMER-3.3 software with a cut-off E-value 1 × 10 −5 [72]. Key word searches were also performed through genome annotation using the words 'UGT', 'glycosyltransferase', and 'glucosyltransferase'. All sequences screened by the two above methods were merged and de-duplicated. ClustalW was used to perform multiple sequence alignment and the MEME web server (http://meme-suite.org/tools/meme, accessed on 6 May 2020) [73] was used to find conserved motifs within the candidate EuUGTs. We re-defined the complete sequence of EuUGTs, since parts of genes appear to lack the N-terminal or the C-terminal when compared to the UGTs of A. thaliana and L. usitatissimum. The upstream and downstream sequences of putative UGT genes were extracted by Python 3.7 in order to reannotate them using FGENESH and Softberry Online Services (http://www.softberry. com/, accessed on 1 September 2020) [74]. The transcriptome was assembled with and without the reference genome, using Trinity software [75] in order to help to determine the correct CDS of putative genes. After removing sequences with conservative motifs of less than 3 and without a complete PSPG box, the putative EuUGTs were obtained and further verified by matching the Pfam families on the HMMER website (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan, accessed on 13 September 2020).

Prediction of the Cellular Location, Physical and Chemical Properties of EuUGTs
The subcellular localization of putative EuUGTs was analyzed by the CELLO v2.5 online server (http://cello.life.nctu.edu.tw/, accessed on 18 September 2020) [76,77]. The Protparam tool on the ExPASy web server (https://web.expasy.org/protparam/, accessed on 18 September 2020) [78] was used to the calculate sequence length, molecular weight, theoretical pI, instability index, and the grand average of hydropathicity of the EuUGTs. The amino acid sequences of the selected UGTs, including 107 UGTs of A. thaliana, 137 UGTs of L. usitatissimum, and 91 putative UGTs of E. ulmoides, along with UGT708C1 as a member of phylogenetic group R [70] and UGTPg36 as a member of phylogenetic group O [71], were aligned by ClustalW in MEGAX software [79]. Then, the output file was used to construct a neighbor-joining tree with the p-distance method and pairwise deletion.

Analysis of Conserved Motifs and Prediction of Cis-Acting Elements
MEME web server [73] was used to search conserved motifs within all the EuUGTs. The 44 residues of the PSPG box at the C-terminus were selected to draw a motif logo using the online Weblogo server (https://weblogo.berkeley.edu/logo.cgi, accessed on 18 September 2020) [80]. PlantCARE (https://bioinformatics.psb.ugent.be/webtools/ plantcare/html, accessed on 20 September 2020) was applied to predict the cis-acting elements within the 2000 bp upstream of each EuUGT gene [81].

GO Annotation of EuUGTs
The protein sequences of putative EuUGTs were used to blast against 109 UGTs of A. thaliana using Omicsbox software followed with Blast2Go [82] to obtain the molecular function and biological process of each EuUGT.

Transcriptome Analysis for Organ-Specific and Season-Specific Gene Expression
The RNA-seq datasets generated from different organs and seasons were downloaded from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/, accessed on 12 July 2020) [33]. The information on plant samples and the RNA-seq are summarized in Table S6. The expression level of each EuUGT gene were calculated in TPM value by using the Hisat2 [83], Samtools, and Stringtie [84] software, during which 91 EuUGT gene sequences were input as references. The heatmaps of EuUGT gene expression patterns were constructed by TBtools [85].

Quantitative Real-Time PCR Analysis of the Selected EuUGT Genes
In June 2020, three plant samples of E. ulmoides, including bark, leaves, and root, were collected from three 18-year-old trees at a local E. ulmoides forest farm (103 • E, 30 • N) in Chengdu, China. A piece of bark approximately 5 × 5 cm 2 in size was peeled from each individual trunk at approximately 0.5 m above the ground. The roots were collected from the fibrous roots with a diameter of approximately 5 mm. Three mature leaves were sampled on an individual tree. All of the plant materials were immediately immersed in liquid nitrogen after sampling. The plant materials were smashed in liquid nitrogen, and then used to isolate the complete RNAs using the CTAB protocol [86]. Reverse transcription and the real-time PCR were performed using the PrimeScript RT kit and TB Green Premix Ex Taq II (Takara Bio Co. Dalian, China) following the manufacturer's guides, respectively. Six EuUGT genes were selected to confirm the organ-specific expression, with β-actin as the reference gene. The primers are listed in Table S7. The qPCR reaction was performed with the following parameters: An initial denaturation for 30 s at 95 • C; then, 45 cycles, including denaturation, for 5 s at 95 • C; annealing for 30 s at 55 • C; and extension for 30 s at 72 • C. The result was calculated with the following formula [87]:

Construction of Protein-Protein Interaction (PPI) Network of EuUGTs
In order to exploit the biological functions of EuUGTs, a PPI network was constructed. Briefly, 91 sequences of EuUGTs were input into the String web server with A. thaliana as the reference (https://string-db.org/, accessed on 9 September 2021) [44] to predict the PPI network. The generated result was imported into Cytoscape [88] for plotting.

Conclusions
An increasing number of studies have shown that UGTs play an important role in various biological processes in organisms. Therefore, the identification and in silico analysis of UGT gene families in different species have attracted many researchers. Up to now, almost no studies have been performed on the UGT family of E. ulmoides, an important pharmacological woody plant. Here, we identified 91 EuUGTs in E. ulmoides through genome-wide analysis, which could be clustered into 14 out of the 18 previously described phylogenetic groups (A to R). However, no EuUGT was categorized into groups F, N, P, or Q. Group L was the largest group of EuUGTs, which has a wide variety of substrates. Through transcriptome analysis, differentially expressed patterns were observed for most of the EuUGT genes, which tended to be expressed at higher levels in one or two organs. The organ-specific high expression of the EuUGT genes may be due to the specific metabolism requirements in the organs or in response to various stresses. Through the in silico analysis of the protein-protein interaction network of EuUGTs, a set of proteins including some UGTs were identified, perhaps to constitute a biosynthetic pathway of flavonoids and their glycosides in E. ulmoides. Combining multiple types of information from the phylogenetic analysis, the structural characteristics, and the function predictions was helpful to our exploration of the UGTs of interest. Our results could lay the foundation for further studies on UGTs and the biosynthesis of secondary metabolites of E. ulmoides.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10 .3390/plants10091934/s1, Supplementary file S1: List of the nucleotide sequences of 91 EuUGT genes; Supplementary file S2: List of the amino acid sequences of 91 EuUGTs; Figure S1: Organization of the EuUGT genes; Table S1: Characteristics of EuUGT genes with highly similar sequences; Table S2: Physical and chemical features and subcellular location of EuUGTs; Table S3: GO annotation of EuUGTs; Table S4: The transcriptional level of EuUGTs in TPM value in various organs and on various seasons; Table S5: Summary of proteins involved in the protein-protein interaction network; Table S6: Summary of RNA-seq datasets used for transcriptome analysis; Table S7: The nucleotide sequences of primers used for RT-qPCR.