Insights into Lignan Composition and Biosynthesis in Stinging Nettle (Urtica dioica L.)

Stinging nettle (Urtica dioica L.) has been used as herbal medicine to treat various ailments since ancient times. The biological activity of nettle is chiefly attributed to a large group of phenylpropanoid dimers, namely lignans. Despite the pharmacological importance of nettle lignans, there are no studies addressing lignan biosynthesis in this plant. We herein identified 14 genes encoding dirigent proteins (UdDIRs) and 3 pinoresinol-lariciresinol reductase genes (UdPLRs) in nettle, which are two gene families known to be associated with lignan biosynthesis. Expression profiling of these genes on different organs/tissues revealed a specific expression pattern. Particularly, UdDIR7, 12 and 13 displayed a remarkable high expression in the top internode, fibre tissues of bottom internodes and roots, respectively. The relatively high expression of UdPLR1 and UdPLR2 in the young internodes, core tissue of bottom internode and roots is consistent with the high accumulation of lariciresinol and secoisolariciresinol in these tissues. Lignan quantification showed a high abundance of pinoresinol in roots and pinoresinol diglucosides in young internodes and leaves. This study sheds light on lignan composition and biosynthesis in nettle, providing a good basis for further functional analysis of DIRs and PLRs and, ultimately, engineering lignan metabolism in planta and in cell cultures.


Introduction
Lignans, a large group of phenylpropanoid dimers, are widely distributed across the plant kingdom. Their primary biological function in planta is supposed to be associated with plant defence [1,2], particularly in response to pathogen attack [3]. In addition, lignans have received great interest due to their numerous beneficial effects in mammals, such as antihypertensive, antitumor, hepatoprotective, insecticidal, estrogenic, sedative and antioxidant activities [4]. For centuries, plants with a high lignan content have been used as an important and popular herbal medicine in the Eastern World [5]; one of these plants is stinging nettle (Urtica dioica L.), a perennial dioecious plant spread throughout the temperate zones of the world [6][7][8].
Stinging nettle is commonly considered an invasive weed; nevertheless, the leaf and root of this herbaceous plant have been widely used to treat many ailments including arthritis, rheumatism, hypertension, eczema, allergic rhinitis and muscular paralysis [9][10][11][12][13]. The extracts of nettle roots have been used in the treatment of benign prostatic hyperplasia and prostatic disease [8]. It was

Identification of UdDIRs
The previously established U. dioica de novo transcriptome was used to identify UdDIR genes [30]. Sixteen contigs were annotated as DIRs using Blast2GO against the A. thaliana and Viridiplantae database. BLASTN and BLASTX analyses against nettle leaf transcriptome at oneKP database were further carried out to examine and verify the obtained contigs. The sequence of some contigs was reconstructed to obtain the full length. A total of 14 DIRs were ultimately identified in U. dioica and 8 contain a full-length predicted open reading frame (ORF), ranging from 178 (UdDIR6) to 203 (UdDIR14) amino acids (Table 1). UdDIR nucleotide and protein sequences are listed in Text S1. We performed the alignment on the protein sequences of UdDIRs and selected AtDIRs. The five conserved motifs described previously [22] were identified in the sequences of UdDIRs ( Figure 1A).   (At) sequences. The alignment was generated with CLUSTAL-Ω and the conserved residues were highlighted using Jalview. Five conserved motifs (I-V) reported previously in [22] were identified in the amino acid sequences of UdDIRs and are underlined in red. (B) Phylogenetic analysis of UdDIRs. The tree was built by the maximum likelihood method with 1000 bootstraps. The scale bar indicates 0.1 amino acid substitutions per site. To investigate the similarities and divergences of the 14 UdDIRs, a multiple alignment of amino acid sequences was used to build a maximum likelihood tree ( Figure 1B). UdDIRs were mainly separated into two main groups. UdDIR14/5/3/13/12 clustered into Group I, while UdDIR1/2/6/7/9/10/11 into Group II. UdDIR4 and UdDIR8 did not cluster into any group, indicative of the high sequence divergence of these two DIRs as compared to the others.
To further understand the evolutionary relationships of DIRs among U. dioica and other plant species, an unrooted phylogenetic tree was constructed using 218 DIRs protein sequences from different plant species. As shown in Figure 2, all DIRs were classified into six subfamilies based on the classification of [22], with subfamily-c consisting of only angiosperm monocot DIRs, as previously reported [22]. UdDIRs from Group I (5 DIRs) and Group II (7 DIRs) were assigned to subfamily-a and b/d, respectively. UdDIR8 and UdDIR4 clustered into subfamily-e and g, respectively. Interestingly, the majority of UdDIRs clustered closely with DIRs derived from C. sativa and L. usitatissimum, suggesting a phylogenetic relatedness of DIRs among fibre crops. For example, in the subfamily-a, a grouping was observed for UdDIR12, UdDIR13 and CsaDIR6A, as well as for UdDIR14 and LuDIR1/2/3. highlighted using Jalview. Five conserved motifs (I-V) reported previously in [22] were identified in the amino acid sequences of UdDIRs and are underlined in red. (B) Phylogenetic analysis of UdDIRs. The tree was built by the maximum likelihood method with 1000 bootstraps. The scale bar indicates 0.1 amino acid substitutions per site.
To investigate the similarities and divergences of the 14 UdDIRs, a multiple alignment of amino acid sequences was used to build a maximum likelihood tree ( Figure 1B). UdDIRs were mainly separated into two main groups. UdDIR14/5/3/13/12 clustered into Group I, while UdDIR1/2/6/7/9/10/11 into Group II. UdDIR4 and UdDIR8 did not cluster into any group, indicative of the high sequence divergence of these two DIRs as compared to the others.
To further understand the evolutionary relationships of DIRs among U. dioica and other plant species, an unrooted phylogenetic tree was constructed using 218 DIRs protein sequences from different plant species. As shown in Figure 2, all DIRs were classified into six subfamilies based on the classification of [22], with subfamily-c consisting of only angiosperm monocot DIRs, as previously reported [22]. UdDIRs from Group I (5 DIRs) and Group II (7 DIRs) were assigned to subfamily-a and b/d, respectively. UdDIR8 and UdDIR4 clustered into subfamily-e and g, respectively. Interestingly, the majority of UdDIRs clustered closely with DIRs derived from C. sativa and L. usitatissimum, suggesting a phylogenetic relatedness of DIRs among fibre crops. For example, in the subfamily-a, a grouping was observed for UdDIR12, UdDIR13 and CsaDIR6A, as well as for UdDIR14 and LuDIR1/2/3.  (Li). The tree was constructed by the maximum likelihood method with 1000 bootstraps. Bootstrap values are indicated for nodes with support higher than 90% (black circles; the bigger the circle, the higher the value). The scale bar indicates 1 amino acid substitutions per site.  (Li). The tree was constructed by the maximum likelihood method with 1000 bootstraps. Bootstrap values are indicated for nodes with support higher than 90% (black circles; the bigger the circle, the higher the value). The scale bar indicates 1 amino acid substitutions per site.

Identification of UdPLRs
The same approach described above was used to obtain PLR genes in U. dioica. Three genes encoding PLR were identified (nucleotide and protein sequences are listed in Text S1). Among them, UdPLR2/3 contain a complete ORF sequence with 312 and 309 amino acids, respectively (Table 1).
It has been reported that PLRs display different affinity and enantiospecificity for the substrates (i.e., PINO and LARI enantiomers), which results in the complexity of the action of PLRs and consequently the difficulties in understanding their catalytic function. The foregoing phylogenetic analyses demonstrated that PLRs with similar catalytic activity clustered together [18,31]. In order to shed some light on the function of UdPLRs, a phylogenetic tree was constructed using 170 PLRs full-length protein sequences from 73 plant species, including 12 PLRs that were characterized for their enantiomeric selectivity ( Figure 3). The generated tree illustrates that UdPLR1 clusters together with PLRs preferring (+)-PINO and (+)-LARI to form (-)-SECO, namely LuPLR2 [32], LaPLR1 [32], LcPLR1 [33], FiPLR1 [34] and PhPLR [35,36]. Moreover, UdPLR2 is distributed in the same cluster with PLRs from Prunus persica (Pp), Malus domestica (Md) and Fragaria vesca (Fv), which all belong to the order Rosales. However, no PLR in this cluster has been functionally characterized so far. UdPLR3 does not branch together with other PLRs, which is indicative of low similarities in the protein sequence when comparing this gene with other PLRs. An analogous result was obtained when the phylogenetic tree was built using UdDIRs and PLRs that were characterized by their enantiospecificity ( Figure S1).
Molecules 2019, FOR PEER REVIEW 6 of 17 The same approach described above was used to obtain PLR genes in U. dioica. Three genes encoding PLR were identified (nucleotide and protein sequences are listed in Text S1). Among them, UdPLR2/3 contain a complete ORF sequence with 312 and 309 amino acids, respectively (Table 1).
It has been reported that PLRs display different affinity and enantiospecificity for the substrates (i.e., PINO and LARI enantiomers), which results in the complexity of the action of PLRs and consequently the difficulties in understanding their catalytic function. The foregoing phylogenetic analyses demonstrated that PLRs with similar catalytic activity clustered together [18,31]. In order to shed some light on the function of UdPLRs, a phylogenetic tree was constructed using 170 PLRs fulllength protein sequences from 73 plant species, including 12 PLRs that were characterized for their enantiomeric selectivity (Figure 3). The generated tree illustrates that UdPLR1 clusters together with PLRs preferring (+)-PINO and (+)-LARI to form (-)-SECO, namely LuPLR2 [32], LaPLR1 [32], LcPLR1 [33], FiPLR1 [34] and PhPLR [35,36]. Moreover, UdPLR2 is distributed in the same cluster with PLRs from Prunus persica (Pp), Malus domestica (Md) and Fragaria vesca (Fv), which all belong to the order Rosales. However, no PLR in this cluster has been functionally characterized so far. UdPLR3 does not branch together with other PLRs, which is indicative of low similarities in the protein sequence when comparing this gene with other PLRs. An analogous result was obtained when the phylogenetic tree was built using UdDIRs and PLRs that were characterized by their enantiospecificity ( Figure S1).  It was shown that the enantiospecificity of PLRs could be determined by certain amino acids [31,37]. In light of this observation, we conducted a multiple sequence alignment using amino acid sequences of UdPLRs and others catalysing opposite enantiospecific conversions. As shown in Figure 4, similarly to other PLRs, all UdPLRs contained K138, which is associated with the general base catalysis and the NAD(P)H-binding motif "GxxGxxG". The stabilization of 2'-phosphate group of NADPH and nicotine amide ring was attributed to two sites, namely K52 and F160. The latter was observed in all UdPLR sequences, while the former residue was absent in UdPLR3. Previous studies revealed that some amino acids in PLRs are conservative and discriminative with respect to their enantiospecificity, such as residue 164, 174, 267 and 271. Interestingly, these residues in UdPLR1 were consistent with the ones of PLRs that convert (+)-PINO to (-)-SECO via (+)-LARI, namely LaPLR1, FiPLR1, TpPLR2, LuPLR2 and CasPLR2.  Table S2. The scale bar indicates 0.1 amino acid substitutions per site.
It was shown that the enantiospecificity of PLRs could be determined by certain amino acids [31,37]. In light of this observation, we conducted a multiple sequence alignment using amino acid sequences of UdPLRs and others catalysing opposite enantiospecific conversions. As shown in Figure  4, similarly to other PLRs, all UdPLRs contained K138, which is associated with the general base catalysis and the NAD(P)H-binding motif "GxxGxxG". The stabilization of 2'-phosphate group of NADPH and nicotine amide ring was attributed to two sites, namely K52 and F160. The latter was observed in all UdPLR sequences, while the former residue was absent in UdPLR3. Previous studies revealed that some amino acids in PLRs are conservative and discriminative with respect to their enantiospecificity, such as residue 164, 174, 267 and 271. Interestingly, these residues in UdPLR1 were consistent with the ones of PLRs that convert (+)-PINO to (-)-SECO via (+)-LARI, namely LaPLR1, FiPLR1, TpPLR2, LuPLR2 and CasPLR2. The conserved motif ''GxxGxxG'' of the NADPH binding domain is enclosed in the yellow frame. The asterisk indicates amino acid K138 that is involved in general base catalysis [37]. Amino acids that are involved in the enantiospecificity are enclosed with red boxes [31,32,37]. K52 and F160 are associated with the stabilisation of 2'-phosphate group of NADPH and the nicotine amide ring [31,37] and are indicated with triangles. The numbering of amino acids is based on the sequence of UdPLR2. The conserved motif "GxxGxxG" of the NADPH binding domain is enclosed in the yellow frame. The asterisk indicates amino acid K138 that is involved in general base catalysis [37]. Amino acids that are involved in the enantiospecificity are enclosed with red boxes [31,32,37]. K52 and F160 are associated with the stabilisation of 2'-phosphate group of NADPH and the nicotine amide ring [31,37] and are indicated with triangles. The numbering of amino acids is based on the sequence of UdPLR2.

Targeted Quantification of Lignans in Different Tissues of Stinging Nettle
To understand lignan composition in different tissues, we quantified the six most common lignans, namely pinoresinol (PINO), lariciresinol (LARI), secoisolariciresinol (SECO), matairesinol (MATA), pinoresinol diglucoside (PDG) and secoisolariciresinol diglucoside (SDG) in the young internodes (TOP and MID), core and cortical tissues of old internodes (BOT-C and BOT-F), leaves and roots ( Figure 5A). SDG and MATA were not detected in any of the studied tissue, possibly due to their very low abundance. To understand lignan composition in different tissues, we quantified the six most common lignans, namely pinoresinol (PINO), lariciresinol (LARI), secoisolariciresinol (SECO), matairesinol (MATA), pinoresinol diglucoside (PDG) and secoisolariciresinol diglucoside (SDG) in the young internodes (TOP and MID), core and cortical tissues of old internodes (BOT-C and BOT-F), leaves and roots ( Figure 5A). SDG and MATA were not detected in any of the studied tissue, possibly due to their very low abundance. Error bars represent the standard deviation calculated from four independent biological replicates and two technical replicates. Significant differences among groups were analysed using one-way ANOVA followed by Tukey's post-hoc test and are indicated with different letters.
As can be seen in Figure 5B, the accumulation of four lignans varied substantially among different tissues/organs. More specifically, TOP and MID internodes displayed a similar composition, in which the predominant lignan was PDG (5.49 ± 0.76 and 3.97 ± 0.85 µg/g DW in TOP and MID, respectively). In the BOT-C, four lignans were present without a significant difference in the amount. Neither LARI nor SECO were detected in BOT-F, while PDG showed a 4-fold higher amount (2.43 ± 0.44 µg/g DW) with respect to PINO (0.58 ± 0.08 µg/g DW). The predominant lignan in LEAF was PDG (6.23 ± 0.52 µg/g DW), showing a comparable amount as compared to TOP, while no LARI was detected in LEAF. Strikingly, ROOT displayed a remarkably high amount of PINO (87.19 ± 13.73 µg/g DW), which was > 30-fold higher with respect to BOT-F and > 50-fold higher as compared to all other tissues (i.e., TOP, MID, BOT-C and ROOT). It is interesting to note that PINO and PDG accumulated in different tissues and organs in an opposite fashion; tissues/organs with high PINO content have low PDG content and vice versa.

Gene Expression Analysis of DIRs and PLRs in Different Tissues
To provide further insight into the possible roles of U. dioica DIRs and PLRs, we investigated their gene expression in different tissues. The RT-qPCR analysis was carried out on eight UdDIRs that were differentially expressed in different internodes and tissues of nettle stem based on previously published RNA-Seq data. The RPKM value of UdDIRs and UdPLRs in different tissues are given in Table S3 [30]. We reasoned that these UdDIRs, rather than those showing a constant expression level, would be the best targets to understand, at the gene level, the differences in the abundance of lignans in different tissues and organs. Of these UdDIRs, three genes (i.e., UdDIR5/12/13) clustered in subfamily-a and five (i.e., UdDIR1/2/7/9/11) were assigned to subfamily-b/d (Figure 1). All genes showed distinct expression patterns over different tissues/organs and interestingly, some genes was performed using LC-HRMS. Error bars represent the standard deviation calculated from four independent biological replicates and two technical replicates. Significant differences among groups were analysed using one-way ANOVA followed by Tukey's post-hoc test and are indicated with different letters.
As can be seen in Figure 5B, the accumulation of four lignans varied substantially among different tissues/organs. More specifically, TOP and MID internodes displayed a similar composition, in which the predominant lignan was PDG (5.49 ± 0.76 and 3.97 ± 0.85 µg/g DW in TOP and MID, respectively). In the BOT-C, four lignans were present without a significant difference in the amount. Neither LARI nor SECO were detected in BOT-F, while PDG showed a 4-fold higher amount (2.43 ± 0.44 µg/g DW) with respect to PINO (0.58 ± 0.08 µg/g DW). The predominant lignan in LEAF was PDG (6.23 ± 0.52 µg/g DW), showing a comparable amount as compared to TOP, while no LARI was detected in LEAF. Strikingly, ROOT displayed a remarkably high amount of PINO (87.19 ± 13.73 µg/g DW), which was > 30-fold higher with respect to BOT-F and > 50-fold higher as compared to all other tissues (i.e., TOP, MID, BOT-C and ROOT). It is interesting to note that PINO and PDG accumulated in different tissues and organs in an opposite fashion; tissues/organs with high PINO content have low PDG content and vice versa.

Gene Expression Analysis of DIRs and PLRs in Different Tissues
To provide further insight into the possible roles of U. dioica DIRs and PLRs, we investigated their gene expression in different tissues. The RT-qPCR analysis was carried out on eight UdDIRs that were differentially expressed in different internodes and tissues of nettle stem based on previously published RNA-Seq data. The RPKM value of UdDIRs and UdPLRs in different tissues are given in Table S3 [30]. We reasoned that these UdDIRs, rather than those showing a constant expression level, would be the best targets to understand, at the gene level, the differences in the abundance of lignans in different tissues and organs. Of these UdDIRs, three genes (i.e., UdDIR5/12/13) clustered in subfamily-a and five (i.e., UdDIR1/2/7/9/11) were assigned to subfamily-b/d ( Figure 1). All genes showed distinct expression patterns over different tissues/organs and interestingly, some genes showed an exceedingly high expression in certain tissues ( Figure 6A,B). As shown in the heat map hierarchical clustering of UdDIRs expression profiles, four unique expression patterns can be identified by setting 0.68 as the threshold value for the correlation coefficient. These patterns were characterised by those genes that were highly expressed in (1) ROOT (UdDIR1/7), (2) young internodes at the TOP and MID (UdDIR13 and UdDIR2), (3) both young internodes and BOT-C (UdDIR5/9/11) and (4) BOT-F (UdDIR12), respectively ( Figure 6A). showed an exceedingly high expression in certain tissues ( Figure 6A,B). As shown in the heat map hierarchical clustering of UdDIRs expression profiles, four unique expression patterns can be identified by setting 0.68 as the threshold value for the correlation coefficient. These patterns were characterised by those genes that were highly expressed in (1) ROOT (UdDIR1/7), (2) young internodes at the TOP and MID (UdDIR13 and UdDIR2), (3) both young internodes and BOT-C (UdDIR5/9/11) and (4) BOT-F (UdDIR12), respectively ( Figure 6A). More specifically, for UdDIR1, while an extremely low expression was observed in BOT-F and LEAF, its expression level showed over 17-fold higher value in ROOT with respect to young internodes (TOP and MID) and 6-fold higher abundance as compared to the BOT-C ( Figure 6B). UdDIR13 was predominantly expressed in the TOP (FC TOP vs. MID and ROOT > 4 and 10, respectively), with low expression in BOT tissue and LEAF. The expression level of UdDIR12 displayed a sharp peak in the BOT-F. All other UdDIRs were nevertheless expressed at low levels in BOT-F with respect to other tissues.
Concerning PLRs, while UdPLR3 showed a comparable expression level in different tissues, UdPLR1 and UdPLR2 both displayed a significantly low expression in BOT-F and LEAF, as compared to the other tissues ( Figure 6C).

Discussion
In this study, with the aim of improving the knowledge on lignan biosynthesis in U. dioica, nettle members of DIRs and PLRs were identified and analysed. Interestingly, the expression of these genes, as well as the lignan profile, showed organ/tissue-specific patterns, which is summarized in Figure  7. More specifically, for UdDIR1, while an extremely low expression was observed in BOT-F and LEAF, its expression level showed over 17-fold higher value in ROOT with respect to young internodes (TOP and MID) and 6-fold higher abundance as compared to the BOT-C ( Figure 6B). UdDIR13 was predominantly expressed in the TOP (FC TOP vs. MID and ROOT > 4 and 10, respectively), with low expression in BOT tissue and LEAF. The expression level of UdDIR12 displayed a sharp peak in the BOT-F. All other UdDIRs were nevertheless expressed at low levels in BOT-F with respect to other tissues.
Concerning PLRs, while UdPLR3 showed a comparable expression level in different tissues, UdPLR1 and UdPLR2 both displayed a significantly low expression in BOT-F and LEAF, as compared to the other tissues ( Figure 6C).

Discussion
In this study, with the aim of improving the knowledge on lignan biosynthesis in U. dioica, nettle members of DIRs and PLRs were identified and analysed. Interestingly, the expression of these genes, as well as the lignan profile, showed organ/tissue-specific patterns, which is summarized in Figure 7. Mining of the U. dioica transcriptome revealed a family of (at least) 14 DIRs, showing less gene members with respect to other herbaceous plant species, such as 45 members in M. truncatula [27,38], 54 in O. sativa [39], 35 in Picea spp. [22] and 44 in L. usitatissimum [40]. This suggests that some DIRs are missing from our analysis, possibly due to an incomplete transcriptome. According to our phylogenetic analysis, UdDIRs mainly clustered into subfamily-a and subfamily-b/d (Figure 2). It was demonstrated that members of subfamily-a have capabilities to form either (+)-PINO or (-)-PINO by stereoselective coupling of coniferyl alcohol, such as DIRs from T. plicata (TpDIR5) [41], Podophyllum peltatum [42], F. x intermedia [20] and L. usitatissimum (LuDIR1) [21]. Therefore, five UdDIRs that belong to subfamily-a (i.e., UdDIR3/5/12/13/14) are most likely involved in lignan formation ( Figure 2). Moreover, RT-qPCR analyses of three subfamily-a UdDIRs showed spatial (i.e., different stem heights; TOP, MID and BOT internodes) and tissue-specific expression patterns ( Figures 6A,B and 7). Based on the gene expression level, UdDIR12 could potentially contribute to the PINO biosynthesis in fibres, while UdDIR13 could play an important role in forming PINO in young internodes.
Interestingly, a recent study showed that DIR22 from Glycine max (GmDIR22), a member of subfamily-b/d, is also involved in lignan biosynthesis [43]. In addition, a significant amino acid sequence homology was found between GmDIR22 and UdDIRs from subfamily-b/d (i.e., UdDIR1/2/6/7/9/10/11) (Table S4). Hence, it is plausible that these seven UdDIRs may partake in lignan biosynthesis. This may explain the predominant abundance of PINO in roots ( Figure 5B), despite the low expression of subfamily-a members (UdDIR5/12/13) observed in the same tissue ( Figure 6B). It is, therefore, tempting to speculate that UdDIR1 and UdDIR7, two members of subfamily-b/d, could be implicated in PINO biosynthesis in roots, due to their high transcript abundance ( Figure 6B). Further studies are needed to confirm the biochemical role of these UdDIRs.
Glycosylation via uridine diphosphate-glycosyltransferases (UGT) is a commonly occurring modification in lignan biosynthesis, as evidenced by the presence of diverse lignan glycosides in divergent plant species [44][45][46][47][48][49][50]. For instance, the most abundant lignan glycoside in sesame seeds is sesaminol triglucoside, while secoisolariciresinol diglucoside (SECO) is the major lignan in flaxseed. Mining of the U. dioica transcriptome revealed a family of (at least) 14 DIRs, showing less gene members with respect to other herbaceous plant species, such as 45 members in M. truncatula [27,38], 54 in O. sativa [39], 35 in Picea spp. [22] and 44 in L. usitatissimum [40]. This suggests that some DIRs are missing from our analysis, possibly due to an incomplete transcriptome. According to our phylogenetic analysis, UdDIRs mainly clustered into subfamily-a and subfamily-b/d (Figure 2). It was demonstrated that members of subfamily-a have capabilities to form either (+)-PINO or (-)-PINO by stereoselective coupling of coniferyl alcohol, such as DIRs from T. plicata (TpDIR5) [41], Podophyllum peltatum [42], F. x intermedia [20] and L. usitatissimum (LuDIR1) [21]. Therefore, five UdDIRs that belong to subfamily-a (i.e., UdDIR3/5/12/13/14) are most likely involved in lignan formation ( Figure 2). Moreover, RT-qPCR analyses of three subfamily-a UdDIRs showed spatial (i.e., different stem heights; TOP, MID and BOT internodes) and tissue-specific expression patterns ( Figure 6A,B and Figure 7). Based on the gene expression level, UdDIR12 could potentially contribute to the PINO biosynthesis in fibres, while UdDIR13 could play an important role in forming PINO in young internodes.
Interestingly, a recent study showed that DIR22 from Glycine max (GmDIR22), a member of subfamily-b/d, is also involved in lignan biosynthesis [43]. In addition, a significant amino acid sequence homology was found between GmDIR22 and UdDIRs from subfamily-b/d (i.e., UdDIR1/2/6/7/9/10/11) (Table S4). Hence, it is plausible that these seven UdDIRs may partake in lignan biosynthesis. This may explain the predominant abundance of PINO in roots ( Figure 5B), despite the low expression of subfamily-a members (UdDIR5/12/13) observed in the same tissue ( Figure 6B). It is, therefore, tempting to speculate that UdDIR1 and UdDIR7, two members of subfamily-b/d, could be implicated in PINO biosynthesis in roots, due to their high transcript abundance ( Figure 6B). Further studies are needed to confirm the biochemical role of these UdDIRs.
Glycosylation via uridine diphosphate-glycosyltransferases (UGT) is a commonly occurring modification in lignan biosynthesis, as evidenced by the presence of diverse lignan glycosides in divergent plant species [44][45][46][47][48][49][50]. For instance, the most abundant lignan glycoside in sesame seeds is sesaminol triglucoside, while secoisolariciresinol diglucoside (SECO) is the major lignan in flaxseed. In this study, we observed the presence of PDG in all tissues of U. dioica clone 13, but, interestingly, with an opposite trend in relation to PINO content ( Figure 5B). PDG is preferentially accumulated in young internodes (TOP and MID) and leaves, rather than older tissue (BOT) and roots, suggesting a possible role of PDG in the regulation of plant development. This is particularly relevant if one considers that, on the one hand, lignans were shown to affect plant growth [51][52][53] and, on the other hand, that fibre cells in the TOP and MID internodes are in the rapid elongation phase under a strict control involving gene regulatory network, reactive oxygen species and secondary metabolites [30].
Currently, the demand for PDG is rapidly increasing due to its pharmacological effects, such as antihypertension [54,55] and prevention of osteoporosis [56]. To date, Tu-chung (Eucommia ulmoides Oliv.) is the main source of PDG in nature; however, this tree grows very slow and the yield of PDG is also low [57]. Therefore, considerable efforts have been devoted to increasing the yield of PDG in vitro using a fungal strain [57][58][59]. Given the shorter growth cycle as compared to Tu-chung, nettle could be a good alternative source of PDG. In line with this, leaf extracts of U. dioica were demonstrated to be able to decrease both systolic and diastolic blood pressure [12], which could be partially attributed to the relatively high level of PDG in the leaves ( Figure 5B). It would be very interesting to identify the UGTs that are involved in the glycosylation of PINO for metabolic engineering of PDG biosynthesis.
Most of PLRs catalyse two subsequent reductions from PINO to SECO via LARI [18,32], except for the ones from A. thaliana that have low or no affinity towards LARI [60]. Furthermore, it has been demonstrated that PLRs display enantiospecificity in both reduction steps, adding more complexity to the interpretation of the catalytic function of PLRs [18]. In our study, 3 UdPLRs were differentially expressed in various tissues (Figures 6C and 7) and, therefore, could be responsible for the tissue-specific accumulation of lignans (perhaps with different enantiomeric composition) (Figures 5B and 7). The phylogenetic analysis displayed that UdPLR1 clustered together with PLRs from other plant species that convert (+)-PINO to (+)-LARI and then to (-)-SECO, indicative of a similar enantiospecificity for UdPLR1. This result was further supported by the multiple sequence alignment of PLRs, showing that the amino acids associated with enantiospecificity were consistent between UdPLR1 and PLRs that convert (+)-PINO to (-)-SECO via (+)-LARI ( Figure 4). Further investigations on the enantiospecificity and kinetic properties of each UdPLR will shed more light on this. Moreover, determining the enantiomeric configuration of nettle lignans via chiral HPLC will also advance our understanding of the biochemical role of UdPLRs. We observed a higher expression level of UdPLR1 and UdPLR2 in TOP, MID, BOT-C and ROOT as compared to that in BOT-F and LEAF ( Figure 6C), which is in line with the higher accumulation of LARI and SECO in these tissues ( Figure 5B). It is worthwhile mentioning that the transcripts of PLRs were also found to be highly abundant in the inner stem tissue of flax, although no differences in the level of PINO and LARI were observed between inner and outer stem tissues (corresponding to BOT-C and BOT-F in this study, respectively) [61]. LARI and SECO were not detected in BOT-F ( Figure 5B), notwithstanding UdPLRs were all expressed in BOT-F ( Figure 6B). This discrepancy suggests a possible involvement of post-translational regulation mechanism. Concurrently with the development of prediction methods and bioinformatic tools [62], as part of future research, it is important to further identify the sites of post-translational modification for a better understanding of the molecular mechanisms involved in lignan biosynthesis. Post-translational regulation is known to play an important role in controlling key biosynthetic pathways of secondary metabolites, notably phenylpropanoids, through the regulation of the gateway enzyme phenylalanine ammonia lyase (PAL) [63].
As shown in a series of recent publications (see, e.g., [62,64]) demonstrating new findings or approaches, user-friendly and publicly accessible webservers will significantly enhance their impacts [65]), driving medicinal chemistry into an unprecedented revolution.

Plant Material
The propagation, growth condition and sampling of U. dioica "clone 13" [66] were performed as described previously [30,67]. Briefly, the stem cuttings of plants were grown in growth chambers until about 70 cm tall under standard conditions, i.e., 25 • C 16 h light and 20 • C 8h dark. Different internodes were sampled along the stem. The TOP internode is located just below the apex. The middle internode (MID) shows a kink when tilting the plant (and may, therefore, include the snap point) and the BOT bottom internode (BOT) is the third internode underneath the MID. BOT internodes were peeled to separate core (BOT-C) and cortical tissues containing bast fibres (BOT-F). Four biological replicates (7 plants for each replicate) were collected for all stem samples, leaves (LEAF) and roots (ROOT).

Gene Identification and Phylogenetic Analysis
The predicted nettle DIRs and PLRs were identified via blasting the Arabidopsis thaliana DIRs and PLRs against the transcriptome of nettle "clone 13" [30] and nettle leaf database at China National GeneBank DataBase-oneKP (https://db.cngb.org/onekp/). The sequences of each gene were further examined via BLASTX analysis against the Viridiplantae database. The CDS and protein sequences of 14 DIRs and 3 PLRs are listed in Text S1.
Sequence alignments of DIRs and PLRs were carried out using CLUSTAL-Ω [68] and conserved residues were highlighted with Jalview [69].
The phylogenetic tree of both DIRs and PLRs was constructed. The alignment of full-length amino acid sequences constructed with CLUSTAL-Ω was subjected to phylogenetic analysis using maximum likelihood method via W-IQ-TREE with 1000 bootstraps [70]. The online program iTOL (https://itol.embl.de/) was used to visualise the tree.

Lignan Extraction and Quantification
The extraction of lignans was performed on different organs and tissues of nettle plants with four biological replicates (7 plants for each replicate) and two technical replicates. The extraction procedure was adapted from [72]. Samples of 30 mg of ground lyophilised material were suspended in 1 mL of 300 mM NaOH (in 70% methanol, v/v) and placed in a thermomixer at 60 • C and 750 rpm for 1 h. To neutralise the extract, 20 µL of acetic acid were added after cooling to room temperature and 500 µL of supernatant were collected after centrifugation. The extract was then diluted 10 times with ultrapure water and filtered on a polytetrafluoroethylene membrane (cut-off 0.45 µm) and an injection volume of 10 µL was used for each analysis.
Quantitative analyses were carried out using Ultra-High Performance Liquid Chromatography (U-HPLC, 1290 Infinity II, Agilent, Waldbronn, Germany) coupled to high-resolution Quadrupole-Time of Flight (Q-ToF) mass spectrometry (X500R, Sciex, Darmstadt, Germany). The chromatographic separation was carried out on a BEH C18 column, 50 × 2.1 mm ID with the particle size of 1.8 µm (Waters, Milford, MA, USA). The flow rate of the mobile phase was kept constantly at 0.3 mL/min and the column oven was set at 40 • C. The mobile phases were LC-MS grade acetonitrile and 2.5 mM ammonium acetate in ultrapure water. The eluent gradient started with 5% of acetonitrile for 1 min, increased to 30% within 3 min, then to 90% within 1 min, kept at 90% for 2 min, returned to 5% within 1 min and equilibrated for 2 min. The mass spectrometer was operated in negative electrospray ionisation mode. The ion spray voltage was -4.5 kV and the source temperature 500 • C. Quantitative results were provided in High-Resolution Multiple Reaction Monitoring (MRM-HR) mode. Results were confirmed by a second MRM-HR transition and by the ToF-MS signal. Six standard lignans (Sigma-Aldrich, Darmstadt, Germany) were used, i.e., pinoresinol (PINO), pinoresinol diglucoside (PDG), lariciresinol (LARI), secoisolariciresinol (SECO), secoisolariciresinol diglucoside (SDG) and matairesinol (MATA). The amount of each lignan was calculated against an external calibration curve obtained by different standard concentrations ranging from 1-200 ng/mL. A one-way ANOVA with Tukey's post-hoc test was carried out to determine the significant differences among groups using SPSS 13.0 (SPSS Inc., Chicago, IL, USA).

Total RNA Extraction, cDNA Synthesis and Quantitative Real-Time PCR (RT-qPCR)
Total RNA extraction, cDNA synthesis and RT-PCR were carried out as previously reported [64]. Primers were designed using "Primer3Plus" [73]. Seven serial dilutions of cDNA (12.5, 2.5, 0.5, 0.1, 0.02, 0.004, 0.0008 ng/µL) were used to calculate the primer efficiency. Primer sequences and their primer efficiency are provided in Table 1. Five reference genes published previously were used in this study (i.e., RAN, EF2, tubulin and eTIF4E) [64] and the normalisation of data was performed using RAN and EF2, which were identified as the most stable genes by geNorm TM , as implemented in the qbase+ software (Biogazelle, Zwijnaarde, Belgium). The log2 transformed data were used for statistics using a one-way ANOVA followed by a Tukey's post-hoc test (SPSS 13.0, SPSS Inc.). Hierarchical clustering of gene expression data was carried out using Cluster 3.0 [74] with Pearson correlation and complete linkage and the heat map was visualised with Java TreeView.
Supplementary Materials: The following are available online, Text S1: CDS and protein sequences of nettle DIRs and PLRs, Figure S1: phylogenetic analysis of UdPLRs and PLRs that were characterised for the enantiospecificity, Table S1: list of the dirigent protein (DIR) accession numbers and amino acid length from different plant species used in this study, Table S2: list of the pinoresinol-lariciresinol reductase (PLR) protein accession numbers and amino acid length from different plant species used in this study, Table S3: the RPKM value of UdDIRs and UdPLRs in different tissues obtained from our previous transcriptomic analysis, Table S4: amino acid sequence homology between DIR22 from Glycine max (accession number: ADX66343.1) and UdDIRs from subfamily-b/d.
Author Contributions: X.X. and G.G. conceived this study and designed the experiment. X.X. performed the bioinformatics analyses and RT-qPCR analysis, interpreted the data and wrote the paper. C.G. analysed lignan content. G.G., J.R., J.-F.H., E.G. and S.P. contributed to data interpretation and manuscript revision.
Funding: The Fonds National de la Recherche, Luxembourg (Project CABERNET C16/SR/11289002), financially supported this study.