Comparative Transcriptome Analyses of Different Rheum officinale Tissues Reveal Differentially Expressed Genes Associated with Anthraquinone, Catechin, and Gallic Acid Biosynthesis

Rheum officinale Baill. is an important traditional Chinese medicinal herb, its dried roots and rhizomes being widely utilized to cure diverse diseases. However, previous studies mainly focused on the active compounds and their pharmacological effects, and the molecular mechanism underlying the biosynthesis of these ingredients in R. officinale is still elusive. Here, we performed comparative transcriptome analyses to elucidate the differentially expressed genes (DEGs) in the root, stem, and leaf of R. officinale. A total of 236,031 unigenes with N50 of 769 bp was generated, 136,329 (57.76%) of which were annotated. A total of 5884 DEGs was identified after the comparative analyses of different tissues; 175 and 126 key enzyme genes with tissue-specific expression were found in the anthraquinone, catechin/gallic acid biosynthetic pathway, respectively, and some of these key enzyme genes were verified by qRT-PCR. The phylogeny of the PKS III family in Polygonaceae indicated that probably only PL_741 PKSIII1, PL_11549 PKSIII5, and PL_101745 PKSIII6 encoded PKSIII in the polyketide pathway. These results will shed light on the molecular basis of the tissue-specific accumulation and regulation of secondary metabolites in R. officinale, and lay a foundation for the future genetic diversity, molecular assisted breeding, and germplasm resource improvement of this essential medicinal plant.


Introduction
Rheum officinale Baill. is an essential medical plant that belongs to the genus Rheum within Polygonaceae, which has been formally included in the Chinese Pharmacopoeia as one of the resource plants of rhubarb (Da-Huang in Chinese), a traditional medicine in China. Its dried roots and rhizomes have been extensively used in many countries with various therapeutic effects such as clearing body heat, cooling blood and detoxifying toxins, relieving dampness, and abating jaundice [1][2][3]. Numerous studies have indicated that rhubarb contains various chemical ingredients, and anthraquinones which make up the major bioactive composition with multiple pharmacological effects [3][4][5]. Especially, Rolta et al. [6] reported emodin can inhibit the spike protein of SARS-CoV-2 binding to the ACE2 receptor for the treatment of COVID-19. However, research about the candidate genes for the biosynthesis of effective ingredients in R. officinale is limited, and the molecular mechanism of the biosynthesis in different tissues is still poorly understood due to inadequate genomic information.
In embryophytes, the biosynthesis of anthraquinones is principally involved in the upstream pathways consisting of the shikimate pathway, methyl-D-erythritol 4-phosphate (MEP) pathway, mevalonate (MVA) pathway, polyketide pathway, and the downstream pathways for anthraquinone modification derivatized by UDP-glycosyltransferases (UGTs) and Cytochrome P450s (CYPs) genes [7]. Nevertheless, a previous study indicated that the synthesis of emodin-type anthraquinone is predominantly through the polyketide pathway, while the Rubia-type anthraquinones are derived from the shikimate pathway [8]. Moreover, the biosynthesis of catechin and gallic acid of our concern is a complex process involving the shikimate pathway, phenylpropanoid biosynthetic pathway, and flavonoid biosynthetic pathway [9]. Secondary metabolites are usually differentially distributed in diverse tissues of higher plants, which are associated with the tissue-specific expression of biosynthetic enzyme genes and transcript factors (TFs) related to the corresponding biosynthetic pathways [10][11][12]. Therefore, elaborating the expression differences of key enzyme genes in different tissues would deepen our understanding of the molecular mechanisms underlying the tissue specificity of the active compounds [5,13,14].
Currently, RNA sequencing (RNA-seq) is broadly used as an important approach for gene expression level analyses and molecular marker development [7,12,13,15]. Many studies have utilized transcriptomic analyses to identify critical candidate genes and tissuespecific differentially expressed genes (DEGs) for the biosynthesis of terpenoid, stilbene, saponin, lignin, anthraquinone, and flavonoid [11,16,17]. Zhou et al. [7] have reported the candidate genes associated with anthraquinone biosynthesis and their tissue-specific expression patterns in Rheum tanguticum. Liu et al. [18] combined transcriptomic and metabolomic analyses to further reveal the differences in the expression of anthraquinones and flavonoids between two Rheum species. Although previous studies have confirmed the differences in the accumulation of secondary metabolites in different tissues of R. officinale and used RNA-seq to identify candidate genes involved in the biosynthesis of anthraquinones, etc. [16,[19][20][21][22], the differences in the expression of the enzyme genes associated with the synthesis of these components are still indefinite in R. officinale.
In this work, we generated a comprehensive transcriptome for R. officinale and identified the candidate genes associated with anthraquinone, catechin, and gallic acid biosynthesis. Furthermore, comparative transcriptome analyses were carried out to compare and elucidate the expression profiles in R. officinale root, stem, and leaf tissues. The results could be valuable genetic resources for providing comprehensive insights into molecular mechanisms of tissue-specific distribution of active components and improving the quality and production of this essential medicinal plant.

Plant Materials
Root, leaf, and stem samples of R. officinale were gathered from the three individuals as biological replicates at the time of blossoming in Pingli County, Shaanxi Province, China (32 • [23], then paired-end sequencing with the length of 150 bp was performed to generate raw data for nine samples by the Illumina Hiseq X Ten platform.

De Novo Assembly and Functional Annotation
To obtain high-quality clean reads, all the raw reads of R. officinale transcriptome generated by Illumina sequencing were processed through the removal of the adaptor, artificial primers, read with uncertain bases "N" over 5%, and low-quality sequences using Trimmomatic v0.35 [24]. After trimming, all clean reads were de novo assembled using Trinity v2.13.2 with the default parameters to generate transcripts [25]. Furthermore, such assembled transcripts were clustered to remove redundancy using CD-HIT-EST v.4.8.1 with a 95% identity and 95% query coverage (for the shorter sequence) threshold, and the remaining unigenes were then used for the subsequent analyses [26].
To predict the potential functions of assembled unigenes, all non-redundant unigenes were searched against public databases including Nr

Phylogenetic Analyses of Type III Polyketide Synthases (PKS III) Genes in R. officinale
To identify the PKS III genes in R. officinale, 145 protein sequences belonging to PKS III were selected and downloaded from NCBI (https://www.ncbi.nlm.nih.gov/, accessed on 20 November 2021) ( Table S9). The Getorf program (https://emboss.bioinformatics.nl/ cgi-bin/emboss/getorf, accessed on 23 November 2021) was used to detect open reading frames (ORFs) containing at least 150 amino acids for the unigenes of R. officinale, and then we used BLASTP to query 147 protein sequences with the ORFs with the e-value threshold of 10 −5 . Seven unigenes that displayed high similarity to PKS III were identified and used in the subsequent analyses. To infer the phylogenetic relationships of the PKS III family in Polygonaceae, 50 protein sequences of PKS III downloaded from NCBI (Table S9) and 7 protein sequences detected from R. officinale were prepared, and the bacterial PKS III Mycobacterium tuberculosis PKS10 (CAB06631.1) was used as the outgroup. Alignment of all protein sequences was conducted using the MUSCLE algorithm in MEGA-X, and then alignment was trimmed with trimAL. Finally, a maximum likelihood (ML) phylogenetic tree was constructed using MEGA-X with the JTT + G model selected by modeltest, and the bootstrap replicate was set to 1000 [28].

Differential Gene Expression Analyses
Clean reads of nine R. officinale samples were matched to the unigenes through Bowtie v2.4.4 software, and quantification of gene expression level was performed using RSEM v1.3.3 with default parameters to assess the expression abundance of all unigenes based on FPKM (fragments per kilobase of transcript per million mapped reads) [29,30]. Analyses of DEGs between two samples were conducted by DESeq2 package in RStudio and set thresholds with |log 2 (foldchange)| ≥1 and false discovery rate (FDR) ≤ 0.05 (Control/Treated) to evaluate the statistically significant level [31]. In three comparisons of the root, stem, and leaf tissues in R. officinale, unigenes having foldchange (FC) value greater than 2 were regarded as up-regulated, while those less than 0.5 were down-regulated. DEGs of different tissues were also functionally annotated with the aforementioned eight public databases to elaborate the functions and metabolic pathways. Afterward, GO and KEGG enrichment analyses of DEGs were carried out with ClusterProfile package [32]. The FPKM values were log-transformed and normalized using the Pheatmap package to draw a heatmap describing transcript abundance levels.

Transcription Factor Analysis
The predicted longest ORF of each sequence was considered as the potential coding region sequence (CDS). The CDS of DEGs of R. officinale roots, stems, and leaves were used to predict the putative TF families through the Transcription Factor Prediction (http:// planttfdb.gao-lab.org/prediction.php, accessed on 3 November 2021) in Plant Transcription Factor Database (PlantTFDB).

Quantitative Real-Time PCR Verification
qRT-PCR was conducted for 16 DEGs related to the anthraquinone, catechin and gallic acid biosynthetic pathways to validate the reliability of the RNA-seq. We used NCBI Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/, accessed on 20 December 2021) to design and validate the primers (Table S10). The total RNA of each sample was extracted using the E.Z.N.A. ® Plant RNA Kit (Omega Bio-Tek). Then, the extracted RNA was removed from the genomic DNA using DNase (Tiangen, Beijing, China), and the cDNA was synthesized using Thermo Scientific RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher, Beijing, China). A 20 µL qRT-PCR reaction volume (10 µL of 2× NovoStart ® SYBR qPCR SuperMix Plus, 0.4 µL of each forward and reverse primers, 2 µL of cDNA template, and 7.2 µL of RNase Free water) was prepared with the NovoStart ® SYBR qPCR SuperMix Plus (Novoprotein, Shanghai, China), and three technical replicates were conducted for each sample to guarantee reliability. Next, all the reactions were further performed with BIO-RAD CFX Connect Real-Time PCR detect System (BIORAD, Hercules, CA, USA) based on the following protocol: 95.0 • C for 60 s (initial denaturation), followed by 42 cycles of 95.0 • C for 20 s (denaturation), 56.0 • C for 20 s (annealing), 72.0 • C for 30 s (extension), and melt curve: 97.0 • C for 10 s, 65.0 • C for 30 s, 95.0 • C for 30 s. Eventually, the housekeeping gene actin was normalized to other genes and relative expression levels of genes were determined based on the 2 −∆∆Ct method [33].

Identification of Simple Sequence Repeat (SSR)
The candidate SSRs of all the non-redundant R. officinale unigenes were detected by MISA software with parameters set to dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide, and the motifs minimum repeats were 6, 4, 3, 3, 2, respectively [34]. In this study, mononucleotide repeats were excluded due to possible sequencing errors, mismatch, and the difficulty of distinguishing this SSR type on a polypropylene electrophoresis gel. In addition, we used the program Primer 3 to select SSRs with appropriate flanking lengths to design PCR primers (Table S11). The following criteria were considered for designing the primers: primer length of 18-23 nucleotides; PCR product size range of 100 to 300 bp; GC content of 30-70% and annealing temperature between 50 and 70 • C with 55 • C as the optimum melting temperature.

Subsection RNA Sequencing and De Novo Assembly
The nine cDNA libraries from the roots, stems, and leaves of R. officinale were named Ro_R1, Ro_R2, Ro_R3, Ro_S1, Ro_S2, Ro_S3, Ro_L1, Ro_L2, Ro_L3, respectively, and sequenced via the Illumina Hiseq X Ten platform. After sequencing and quality control, we obtained libraries with clean reads ranging from 24,265,150 to 32,077,349, GC content ranging from 48.61 to 50.72%, and Q30 values up to 94.26% (Table S1). These high-quality reads were available for the subsequent correlation analyses.
After de novo assembly, a sum of 463,056 transcripts with an average sequence length of 839.83 bp and N50 of 1675 bp was generated. Then CD-HIT-EST was used to remove redundant parts of these transcripts, resulting in a total of 236,031 unigenes with an average length of 563.98 bp, a GC content of 45.83%, and N50 of 769 bp for the following analyses ( Table 1). The length distribution of the transcripts and unigenes is demonstrated in Figure  S1, where it can be seen that the unigene length of R. officinale was mainly distributed between 200 and 400 bp.

Differentially Expressed Gene Analyses between Different Tissues
The overall quality assessment of gene expression in the nine samples using FPKM values showed that the correlation coefficients of samples between biological replicates of R. officinale were greater than those outside the biological replicates. Among the 3 biological replicates of roots: the correlation coefficients between Ro_R1 and Ro_R2, Ro_R2 and Ro_R3, Ro_R1 and Ro_R3 were 0.66, 0.70, and 0.92, respectively, smaller than the coefficients between replicates from stems (over 0.74) and leaves (over 0.75), which indicated that the differences of gene expression in roots were higher than those in stems and leaves ( Figure S3).
We found that 3362, 2984, and 2118 DEGs in Ro_R vs. Ro_L, Ro_R vs. Ro_S, and Ro_S vs. Ro_L could be annotated against the Nr, COG, KOG, GO, KEGG, Swiss-Prot, Pfam, and TrEMBL databases, respectively. We further performed GO and KEGG enrichment analyses of these DEGs in different tissues. In the GO enrichment analysis, the enriched items of all paired comparison groups were "embryo development ending in seed dormancy" (GO:0009793) for the BP category, and "chloroplast" (GO:0009507) and "chloroplast stroma" (GO:0009570) in the CC category. In the MF category, "xyloglucan:xyloglucosyl transferase activity" (GO:0016762) was significantly enriched in Ro_R vs. Ro_L and Ro_R vs. Ro_S, and "sucrose synthase activity" (GO:0016157) was enriched in Ro_S vs. Ro_L ( Figure S4). KEGG enrichment results demonstrated that the enriched terms for Ro_R vs. Ro_L, Ro_R vs. Ro_S, and Ro_S vs. Ro_L were "starch and sucrose metabolism", "ribosome biogenesis in eukaryotes", and "biosynthesis of amino acids", respectively ( Figure S5). Further statistics of the secondary metabolic pathways in the KEGG enrichment analysis revealed that phenylpropanoid biosynthesis was the most abundant in the three comparison groups. In addition, flavonoid biosynthesis, carotenoid biosynthesis, isoquinoline alkaloid biosynthesis, ubiquinone, and other terpenoid-quinone biosyntheses were present (Table S3).

Identification of Genes Associated with the Anthraquinone Biosynthesis
In this study, we identified 175 structural enzyme unigenes potentially regulating the biosynthesis of anthraquinone in the shikimate pathway, MEP pathway, MVA pathway, and polyketide pathway based on homology search and functional annotation ( Figure 4A). In total, 50 unigenes encoding 11 structural enzymes were screened in the shikimate pathway. The expression analysis showed that DAHPS and EPSPs were expressed in 3 tissues, DHQS was highly expressed in stems and leaves, DHQD/SDH, SK, CS, and MenE were mainly expressed in stems, while ICS, MenC, MenB had higher expression levels in roots. There were 25 unigenes encoding seven enzymes of the MEP pathway. Among them, CMS and CMK tended to be highly expressed in roots; DXR, CMK, and HDS were highly expressed in stems; while DXS, HDR, and HDS were mainly expressed in leaves. We also found that the expression patterns of the 93 unigenes encoding seven enzymes of the MVA pathway were similar, all being highly expressed in roots and stems ( Figure 4A and Table S5). Interestingly, the volcano plot demonstrated the up-regulated and down-regulated unigenes in the three comparative groups of anthraquinone biosynthetic pathways. Of these genes, MenE was shown to be up-regulated in all three comparative groups; IDI3 and HMGR11 were upregulated in Ro_R vs. Ro_L and Ro_R vs. Ro_S; DAHPS1 was up-regulated in Ro_R vs. Ro_L and Ro_S vs. Ro_L; HMGR3 was up-regulated in Ro_R vs. Ro_S; and DHQD/SDH was down-regulated in Ro_S vs. Ro_L ( Figure 4B-D). In addition, we identified seven candidate genes probably encoding PKSIII in the polyketide pathway (Table S5). The results of the phylogenetic tree revealed that only PL_741 PKSIII1, PL_11549 PKSIII5, and PL_101745 PKSIII6 were clustered in the CHS group, while the other four unigenes (PL_2139 PKSIII2, PL_3171 PKSIII3, PL_50026 PKSIII4, and PL_182516 PKSIII7) were classified to the non-CHS group ( Figure 5). Finally, we also found that 112 unigenes were predicted to be CYP family members and 2 unigenes could encode UGT enzymes (Table S5).

Catechin and Gallic Acid Biosynthesis in R. officinale
Here, we detected a total of 126 structural enzyme unigenes for the biosynthetic pathway of catechin and gallic acid in R. officinale, of which nine encoding HCT, seven encoding CHS, three encoding C3 H, two encoding F3H and F3 H, and one encoding F3 5 H, DFR, LAR were identified (Table S6). Interestingly, DAHPS expressed in roots, stems, and leaves, HCT and CHS showed high expression levels in roots, while the rest of the unigenes were mainly expressed in stems and leaves ( Figure 6A). Furthermore, we also specifically focused on DEGs involved in catechin and gallic acid biosynthesis in three comparison groups and found the number of DEGs between Ro_S vs. Ro_L (12 DEGs) was more than that of Ro_R vs. Ro_L (5 DEGs), and Ro_R vs. Ro_S (4 DEGs) (Table S6). In Ro_R vs. Ro_L, DAHPS1, CHS4, and F3 5 H were up-regulated in leaf tissues, while the expression of CHS7 and HCT5 was higher in roots than in the leaves ( Figure 6B and Table S6). PAL3, HCT9, and C3 H2 were up-regulated and CHS7 was down-regulated in Ro_R vs. Ro_S ( Figure 6C and Table S6). For Ro_S vs. Ro_L, PAL3, CHS7, HCT9, and C3 H2 were found to express more highly in stems, while the expression of the remaining eight unigenes (DAHPS1, CHS4, F3 5 H, F3 H2, DHQD/SDH9, F3H1, LAR, ANS2) were up-regulated in leaves ( Figure 6D and Table S6).

Detection of SSR Loci
We identified a total of 127,612 SSR loci in 236,031 unigene sequences of R. officinale using MISA, and 26,674 unigenes contained at least two SSR sites. The SSRs in the transcriptome of R. officinale were abundant, and all repeat types from dinucleotide to hexanucleotide were present, with the largest number of hexanucleotide repeats accounting for 74.24% of the total SSRs, followed by trinucleotides (18,635, 14.6%), and the remaining four nucleotide repeat types were relatively few, accounting for only 11.16% ( Figure S7). In addition, the count of repeat motifs was mainly 2 to 5, which accounted for 95.06% of the total, followed by 6-10 repeats (4.54%) mainly distributed in tri-and hexanucleotide repeats, and more than 15 repeats were primarily observed in dinucleotide repeat types (Table S7). Among the trinucleotide repeats, AGG/CCT, with 3165 SSRs (16.98%), was the most dominant repeat motif, while ACT/AGT (1.44%) occupied the least. The most frequent hexanucleotide repeat motif was AAAAAG/CTTTTT with the number of 2086, accounting for 2.14% (Table S8).

De Novo Assembly and Functional Annotation
Following the emergence of next-generation sequencing (NGS) technology, RNA-seq has become a central and powerful tool for studying metabolites of interest in non-model species, with the advantage of high throughput and relative rapidity [35][36][37]. RNA-seq can be an effective method to study gene function annotation, the discovery of novel biosynthetic enzyme genes, differential distribution of secondary metabolites in diverse tissues, gene expression levels, gene regulation, environmental factors, and gene expression networks [13,37,38]. Here, we report 236,031 assembled unigenes with N50 of 769 bp from diverse tissues (i.e., roots, stems, and leaves) of nine R. officinale samples. This result far exceeds the gene database of the transcriptome of R. palmatum seedlings [39] but is similar to the genes yielded by high-throughput sequencing in the roots, stems, and leaves of R. tanguticum [7], which may be related to different sampling periods and immature tissues in seedlings containing limited transcriptome information.
After sequencing and assembly, functional annotation is one of the most important steps, which can offer relevant biological insights into genomic and transcriptomic data through homology alignment [40]. For example, DEGs screened from a certain secondary metabolite biosynthetic pathway in different tissues are functionally annotated to reveal the molecular mechanism of tissue-specific distribution of metabolites [11,13,41]. We found a total of 136,329 (57.76%) of the assembled unigenes could be successfully annotated in eight public databases, indicating that there are many unigenes with unknown sequence characteristics and functions in the transcriptome of R. officinale. Wang et al. [11] obtained a total of 80,981 unigenes from the root, stem, and leaf tissues in Polygonum cuspidatum, a related species to R. officinale, and 40,729 of them (50.29%) were annotated. The result may be due to the scarcity of reference genomic resources for Rheum or even in the Polygonaceae. In brief, the assembly and functional annotation of R. officinale are broadly available, which will offer an abundant genetic resource for studying the molecular mechanisms of tissue-specific distribution of active secondary metabolites and aiding efficient breeding to alleviate this medicinal plant shortage.

Candidate Genes Identification and DEG Analyses Associated with Secondary Metabolite Biosynthesis
Anthraquinone and its derivatives are aromatic polyketides that can be widely synthesized by bacteria, fungi, lichens, insects, and plants, and have various functions such as photoprotection, improvement of plant disease resistance, as well as various medicinal effects [42]. The biosynthesis of anthraquinone mainly results from the shikimate/osuccinylbenzoic acid pathway and polyketide pathway, and the products can be modified with UGTs, CYP450s [7,42,43]. PKSs can be divided into types I, II, and III PKSs according to their structure [44]. Type III PKSs, the chalcone synthase superfamily, are mainly distributed in the plants, but have also been identified recently in bacteria and fungi [19,44,45]. At present, the mechanisms underlying most polyketide biosynthetic pathways are still elusive, but PKS III of plants certainly plays a key role in the initial reaction of these pathways [46,47]. The 26 unigenes found in the roots, stems, and leaves of P. cuspidatum could be annotated to 18 enzymes participating in the formation of the anthraquinone skeleton [11].
Here, we carried out comparative transcriptome analyses with specific attention to the expression levels of the enzyme genes in the biosynthetic pathways of anthraquinone, catechin, and gallic acid in different tissues. Combined with functional annotation, we identified 175 tissue-specific candidate genes in the anthraquinone biosynthetic pathway of R. officinale, and only three of the seven candidate genes that might encode PKSIII in the polyketide pathway ( Figure 5 and Table S5). The results of the analysis revealed that many DEGs displayed tissue-specific expression (e.g., ICS, MenC, MenB were highly expressed in roots; DHQD/SDH, SK, CS, and MenE were mainly expressed in stems), which implied that the accumulation levels of secondary metabolites might differ among different tissues [12]. The biosynthetic pathways of catechins and gallic acid start with the common precursors, phosphoenolpyruvate and erythrose 4-phosphate, and share part of the shikimate pathway with anthraquinone biosynthesis (Figures 4A and 6A). Nonetheless, there are two synthetic pathways for gallic acid, but it is generally accepted that the short pathway directly from 3-dehydroshikimic acid is predominant. A total of 126 structural enzyme genes for the biosynthesis processes were found, among them, DAHPS, HCT, and CHS had high expression levels in roots, while the rest were expressed in stems and leaves ( Figure 6A). Indeed, these results further demonstrate that RNA-seq is an important method for exploring key enzyme genes associated with the biosynthesis of interest secondary metabolites, enabling potential genetic improvements, enhancing the content of compounds, assisting molecular breeding, and facilitating functional gene research in rhubarb.

Transcription Factor Analysis
TFs are proteins that bind DNA in a sequence-specific manner and regulate transcription by acting as recruitment required for RNA polymerase during transcription initiation [16,48]. In plants, TFs can improve disease and stress resistance, mediate growth and development, affect intercellular signaling, regulate the synthesis and accumulation of secondary metabolites, as well as influence plant evolution [10,16,49,50]. Several studies have demonstrated that the MYB-bHLH-WD40 complex in plants can regulate the biosynthesis of flavonoids, changing the types and contents of metabolites [51]. In addition, NAC, ERF, C2H2, C3H, WRKYs, and MYB-related TF families can also mediate the synthesis of secondary metabolites [10,16,52]. The most abundant TF families predicted in R. officinale included bHLH, followed by ERF, C2H2, bZIP, and MYB ( Figure 3C). Notably, the bHLH and ERF families were the most differentially expressed in Ro_R vs. Ro_L and Ro_R vs. Ro_S, respectively ( Figure 3D). The bHLH and bZIP TF families mainly play crucial roles in plant growth and development, physiological metabolism, and stress response, and the ERF TF family is involved in signaling pathways such as salicylic acid, jasmonic acid, and ethylene [13,53]. However, TFs related to the biosynthesis of anthraquinones have not yet been reported, and this should be explored more in-depth in the future. The MYB TF family is necessary for the phenylpropanoid synthesis and metabolism pathway, therefore, it can regulate the biosynthetic pathway of catechins [54]. TFs identified from the R. officinale database may contribute in altering the content and tissue-specific accumulation of secondary metabolites as well as responding to adverse effects.

Simple Sequence Repeat (SSR) Analysis
With the rapid development of molecular biology-related detection technologies such as high-throughput sequencing, the use of RNA-seq expressed sequences to develop molecular markers has obvious advantages, especially for non-model species without a reference genome [20,23,55]. The developed SSRs are tandem repeat sequences consisting of 1-6 nucleotides in the genome, distributed in both coding and non-coding regions of genes, and are the most suitable markers for constructing high-throughput genotyping with co-dominant inheritance, high polymorphism, good reproducibility, extensive genomic coverage, and cost-saving [20]. Moreover, because SSR originates from expressed gene regions and can directly reflect the diversity of related genes, it is widely used in plant genetic breeding, germplasm resource conservation and development, etc. [23,56]. Through SSR analysis of the R. officinale transcriptome database, we obtained a large number of unigene sequences containing SSR sites, with an average frequency of 1 SSR/1.1Kb, which is higher than that of species such as coffee (1 SSR/2.16 kb) [57], chickpea (1 SSR/8.66 kb) [58], and Medicago truncatula (1 SSR/1.8 kb) [59], but similar to that in Cassia angustifolia (1 SSR/1.08 kb) [37]. The SSRs discovered in our work can not only assist the molecular reproduction of R. officinale, but also provide more candidate molecular markers to study the genetic variation of R. officinale and even Rheum species.

Conclusions
R. officinale is a well-known traditional Chinese medicine that is needed for its root and rhizome which have important pharmacological effects. Here, we performed comparative transcriptome analyses on leaves, stems, and roots of R. officinale. A total of 236,031 unigenes with N50 of 769 bp was generated, 136,329 (57.76%) of the assembled unigenes were annotated, and 5884 DEGs were identified in the three comparison groups with 175 and 126 key enzyme genes being found in the anthraquinone and catechin/gallic acid biosynthesis pathways, respectively. Interestingly, the phylogenetic analysis of the PKSIII superfamily in Polygonaceae indicated only PL_741 PKSIII1, PL_11549 PKSIII5, and PL_101745 PKSIII6 of the seven candidate genes probably encoding PKSIII in the polyketide pathway, which belonged to the CHS group. This valuable genetic information could lay a solid foundation for improving the content of bioactive secondary metabolites in R. officinale. Furthermore, the SSRs identified for R. officinale would supply substantial genetic molecular markers, together with the transcriptome dataset also providing useful genetic resources for genetic diversity analysis and molecularly assisted breeding at the genomic level.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes13091592/s1, Figure S1: Length distribution of the assembled transcripts and unigenes obtained from transcriptome sequencing data of R. officinale. Figure S2: (A) COG classification of assembled unigenes of R. officinale. The assembled unigenes were classified into 25 categories in the COG classification. The x-axis indicates the COG classification, and the y-axis indicates the number of unigenes in the category. (B) Distribution of R. officinale unigenes in different KEGG pathways. The mainly four representative categories according to corresponding biological pathways are "metabolism", "genetic information processing", "cellular processes", and "organismal systems". Figure S3: Overview of gene expression in R. officinale. (A) Boxplot analysis of gene expression profiles in the nine samples. The hyphens above and below the boxes represent the maximum and minimum values, respectively; the upper and lower boundaries of the boxes represent the interquartile ranges, the line in the boxes represents the median, and dots represent the outliers. (B) The correlation between samples of R. officinale. The more orange in color and the larger the circle, the closer the correlation between the two samples becomes. Figure Figure S7: Number distribution of the SSR types identified in unigenes of R. officinale. Table S1: Reads information of nine cDNA libraries from three different tissues. Table S2: Statistics of metabolic pathways in KEGG functional annotations. Note: The 19 secondary metabolic pathways are marked in orange. Table S3: DEGs involved in secondary metabolic pathways in KEGG enrichment analysis. Table S4 Table S7: The repeat number of SSR repeat types. Table S8: Statistics of the number of SSR motifs. Table S9: Candidate type III polyketide synthases downloaded from NCBI. Table S10: The primers used in this study for qRT-PCR of randomly selected unigenes. Table S11: SSR primer pairs designed based on the candidate SSR loci in the transcriptome of R. officinale.  Data Availability Statement: The data and materials supporting the conclusions of this study are included within the article and its additional files. All the raw reads generated in this study have been deposited in the NCBI with the BioProject accession number of PRJNA827652.