Genomic and Transcriptional Profiling Analysis and Insights into Rhodomyrtone Yield in Rhodomyrtus tomentosa (Aiton) Hassk

Rhodomyrtus tomentosa is a source of a novel antibiotic, rhodomyrtone. Because of the increasing industrial demand for this compound, germplasm with a high rhodomyrtone content is the key to sustainable future cultivation. In this study, rhodomyrtone genotypes were verified using the plastid genomic region marker matK and nuclear ribosomal internal transcribed spacer ITS. These two DNA barcodes proved to be useful tools for identifying different rhodomyrtone contents via the SNP haplotypes C569T and A561G, respectively. The results were correlated with rhodomyrtone content determined via HPLC. Subsequently, R. tomentosa samples with high- and low-rhodomyrtone genotypes were collected for de novo transcriptome and gene expression analyses. A total of 83,402 unigenes were classified into 25 KOG classifications, and 74,102 annotated unigenes were obtained. Analysis of differential gene expression between samples or groups using DESeq2 revealed highly expressed levels related to rhodomyrtone content in two genotypes. semiquantitative RT-PCR further revealed that the high rhodomyrtone content in these two genotypes correlated with expression of zinc transporter protein (RtZnT). In addition, we found that expression of RtZnT resulted in increased sensitivity of R. tomentosa under ZnSO4 stress. The findings provide useful information for selection of cultivation sites to achieve high rhodomyrtone yields in R. tomentosa.


Introduction
Rhodomyrtus tomentosa-has long been used as a traditional medicine in Asian countries such as China, Vietnam, Indonesia, Malaysia, and Thailand. It has been reported that many parts of the plant contain various phytochemical compositions. Indigenous people in Southeast Asia use the berries as a remedy for dysentery and diarrhea. Parts of the roots and stem are used for stomach ailments and as a traditional medicine for women after childbirth. In Thailand, R. tomentosa is used as an antipyretic, antidiarrheal, and antidysenteric agent [1].
Rhodomyrtone, an acyl phloroglucinol compound from R. tomentosa, has a strong antibacterial activity against Gram-positive bacteria, including important pathogens such as Enterococcus faecalis, Staphylococcus aureus, Staphylococcus epidermidis, Streptococcus gordonii, Streptococcus mutans, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus salivarius, and Propionibacterium acnes. In addition, it has notably specific activity against methicillin-resistant S. aureus (MRSA) with a minimum inhibitory concentration (MIC) and a minimum bactericidal concentration (MBC) of 0.39 to 0.78 mg/mL [2,3]. Its ability

Genotyping Analysis in Rhodomyrtus tomentosa
DNA barcoding has recently become a widely used and effective tool for rapid and accurate species discrimination. Several plastid DNA barcoding regions, e.g., matK, that function in all plant species have been evaluated and the combination of non-coding intergenic spacer identification (ITS) was found to be the best option for correct plant species discrimination [27,28]. In this work, based on the available information on plant DNA barcoding aimed at identifying the botanical origin of different species of R. tomentosa, different sites were selected in Surat Thani and Songkhla provinces in southern Thailand.
The total genomic DNA of the R. tomentosa samples was extracted and tested for quality using universal 18S rDNA primers (18SF/18SR), as described by Nakkaew et al. [29]. This showed positive amplification in all samples, confirming that there were no false negatives. Therefore, the loci matK and ITS were amplified from the R. tomentosa samples from Surat Thani (RtST1-RtST6) and Songkhla provinces (RtSK1-RtSK6), with direct sequencing of the two strands in opposite directions because the platform often does not allow for perfect resolution of the first 40-60 bp at the 5 end of the sequence. The nucleotide sequences of matK and ITS were searched using BLASTN to identify homologous sequences. The nucleotide sequences were aligned using the Multiple Sequence Alignment option in ClustalX [30] and GeneDoc [31] standalone software, and the alleles amplified from the R. tomentosa samples were sequenced for further analysis. The alignment results showed that two single nucleotide polymorphisms (SNPs) existed in the sequences together with the matK and ITS loci ( Figure 1A,B). The first SNP from the matK locus was located at the 569th nucleotide position and was found to be a transition SNP (C to T). The nucleotide 'C' was present in all the R. tomentosa samples from Surat Thani, namely, the Surat Thani genotype, whereas 'T' was present in all the samples from Songkhla province, namely, the Songkhla genotype. The second SNP, from the ITS1 locus, was a transitional SNP (from A to G) found at nucleotide position 561. The nucleotide 'A' was present only in the Surat Thani genotype, whereas nucleotide 'G' was present only in the Songkhla genotype. From the above results, it is evident that the two SNPs contribute to the differences in the genotyping of R. tomentosa from Surat Thani and Songkhla provinces. Similar results were found in relation to Amomum villosum Lour., which is used in traditional Chinese medicine. SNP genotypes were identified in the plastid genes rbcL., matK, and ITS regions which could be used as a multiregional DNA barcode to accurately identify Amomi Fructus landraces in different producing areas. This may imply the typing of a single nucleotide polymorphism based on DNA barcoding markers in order to identify the germplasm of Amomi Fructus [32]. In a recent study on the rhizomatous medicinal herb Polygonatum odoratum, plastid genomic information and nuclear SNP markers were used to assess the quality of P. odoratum cultivars and classify them as medicinal resources with premium medicinal and edible properties [33].

Rhodomyrtone Quantitation Analysis in Rhodomyrtus tomentosa
Based on genotyping analysis, R. tomentosa was divided into two groups: the Surat Thani and Songkhla genotypes. Next, the rhodomyrtone in these two genotypes was obtained via supercritical fluid extraction (SFE) and then determined using high-performance liquid chromatography (HPLC). The amounts of rhodomyrtone found in the two genotypes were very different. The average amounts of rhodomyrtone in the Surat Thani and Songkhla genotypes were 149.83 ± 0.08 and 25.89 ± 0.09 µg/mL, respectively ( Figure 2). Based on the above results, in this study we describe the Surat Thani genotype (RtST) as a high-rhodomyrtone and the Songkhla genotype (RtSK) as a low-rhodomyrtone genotype.

Rhodomyrtone Quantitation Analysis in Rhodomyrtus tomentosa
Based on genotyping analysis, R. tomentosa was divided into two groups: the Surat Thani and Songkhla genotypes. Next, the rhodomyrtone in these two genotypes was obtained via supercritical fluid extraction (SFE) and then determined using highperformance liquid chromatography (HPLC). The amounts of rhodomyrtone found in the two genotypes were very different. The average amounts of rhodomyrtone in the Surat Thani and Songkhla genotypes were 149.83 ± 0.08 and 25.89 ± 0.09 µg/mL, respectively ( Figure 2). Based on the above results, in this study we describe the Surat Thani genotype (RtST) as a high-rhodomyrtone and the Songkhla genotype (RtSK) as a low-rhodomyrtone genotype.

De Novo Transcriptome Analysis of High-and Low-Rhodomyrtone R. tomentosa
Four cDNA libraries were prepared from the low-(RtSK1 and RtSK2) and highrhodomyrtone genotypes (RtST1 and RtST2) of R. tomentosa. The libraries were sequenced using the BGISEQ-500 sequencer for transcriptome analysis. The libraries sequenced with paired-end sequence reads (PE) contained approximately 9.85, 9.97, 9.90, and 9.80 Gb of total clean bases from the RtSK1, RtSK2, RtST1, and RtST2 libraries, respectively (BioProject submission No. PRJNA956731). Then, the adapter and low-quality reads were trimmed, and the short reads (<20 bp) were removed. Quality analyses of the clean reads from the RtSK1, RtSK2, RtST1, and  (Table 1). In addition, the average GC content was 47.79% and the assembled transcripts that were longer than 500 bp were 37.55%, 40.53%, 43.83%, and 34.27% of each library. Then, all transcripts from the four libraries were collected for further analysis by clustering the assembled transcripts and using the TGI Clustering Tool (TGICL) [35] to remove redundant transcripts from each library and obtain all the unigenes and quality metrics ( Table 2 and Figure 3). The length of most of the unigenes ranged from 300 bp to ≥3000 bp. In addition, 186,033 contigs were predicted with an average length of 584 bp, and 52,018 (30.08%) of the predicted contigs were longer than 500 bp. The high-quality sequencing results support the following functional annotations.

R. tomentosa Unigene Annotation and Classification
A total of 186,033 contig sequences were identified using the Nr database (Figure 4), the NCBI NT database, the Swiss Institute of Bioinformatics (Swiss-Prot) databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Clusters of Orthologous Groups of Proteins (COG) database, and the Gene Ontology (GO) resource with a statistical significance E value < 0.00001 (Table 3). A total of 186,033 contig sequences were identified using the Nr database (Figure 4), the NCBI NT database, the Swiss Institute of Bioinformatics (Swiss-Prot) databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Clusters of Orthologous Groups of Proteins (COG) database, and the Gene Ontology (GO) resource with a statistical significance E value < 0.00001 (Table 3).  The 83,402 unigenes were analyzed for similarity to NR (nonredundant database) annotated plant datasets using BLAST searches with an e-value < 0.00001, and similarities with Eucalyptus grandis (53.49%), Hordeum vulgare subsp. Vulgare (4.32%), Punica granatum (1.06%), Zea mays (1.03%), and others (40.10%) were identified ( Figure 5). These results are  The 83,402 unigenes were analyzed for similarity to NR (nonredundant database) annotated plant datasets using BLAST searches with an e-value < 0.00001, and similarities with Eucalyptus grandis (53.49%), Hordeum vulgare subsp. Vulgare (4.32%), Punica granatum (1.06%), Zea mays (1.03%), and others (40.10%) were identified ( Figure 5). These results are consistent as E. grandis is closely related to R. tomentosa. respectively). The 83,402 unigenes were analyzed for similarity to NR (nonredundant database) annotated plant datasets using BLAST searches with an e-value < 0.00001, and similarities with Eucalyptus grandis (53.49%), Hordeum vulgare subsp. Vulgare (4.32%), Punica granatum (1.06%), Zea mays (1.03%), and others (40.10%) were identified ( Figure 5). These results are consistent as E. grandis is closely related to R. tomentosa. Figure 5. Distribution of NR annotated species from R. tomentosa. Figure 5. Distribution of NR annotated species from R. tomentosa.
In addition, the sequence homology of 83,402 unigenes was annotated. The results from the NT database showed that 58,242 unigenes matched the NCBI official protein database. Next, the unigenes were annotated by noting their alignment with the other func-  [36,37], and it was found that the WRKY transcription factor family might be involved in the accumulation of kaempferol and quercitrin in Camellia vietnamensis [38].
There were 17 categories in the cellular component domain and these were mainly involved in cell functions (19%, 18,160 unigenes), cell parts (19%, 17,794 unigenes), and membrane (16%, 14,791 unigenes) ( Figure 9B). Finally, 16 domains were found in the molecular function category. These were mainly involved in catalytic activity (43%, 25,711 unigenes), binding (40%, 24,001 unigenes), structural molecular activity (7%, 3898 unigenes), and the lowest with just 1 unigene was toxin activity ( Figure 9C). These GO mapping results indicate that most DEGs responding to heat stress are involved in metabolic processes, cell and catalytic activity, and enzymatic activity. These affected activities indicate that heat stress treatment primarily causes functional changes in physiological metabolism and cell differentiation.  involved in defense mechanisms; 3,418 unigenes (3.08%) in lipid transport and metabolism; 2513 unigenes (2.41%) in the biosynthesis, transport, and catabolism of secondary metabolites; 839 unigenes (0.81%) in nucleotide transport and metabolism, 628 unigenes (0.06%) in nuclear structure and the lowest, 67 unigenes (<0.06%) in cell motility ( Figure 8). The GO annotation identified 55 functional groups belonging to the three main GO ontologies: molecular function, cellular component, and biological process. Twenty-two groups, the majority, belonged to the domain of metabolic processes (33%, 20,500 unigenes), followed by cellular processes (32%, 20,116 unigenes), and biological regulatory processes (7%, 4131 unigenes) ( Figure 9A). To identify active biological pathways in the transcriptome, sequences were mapped to the canonical reference pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG). A total of 63,831 sequences were classified using the KEGG database and grouped into five main categories. The highest results were found in metabolic pathways (64.15%, 41,974 unigenes), followed by genetic information processing pathways (24.18%, 15,824 unigenes). These were followed by the cellular processes and environmental information processing pathways with 4.63% and 4.59% (3028 and 3004 unigenes), respectively, and the lowest value was that for organismic systems (4.63%, 1602 unigenes) ( Figure 10). Thus, the reported unigenes were found in the metabolic pathways subcategories such as biosynthesis of other secondary metabolites (2301 unigenes), metabolism of terpenoids and polyketides (1083 unigenes), and metabolism of other amino acids (1739 unigenes). These subcategories should therefore be considered as sequences that are potentially useful for defining metabolic pathways that might be related to the accumulation of rhodomyrtone in R. tomentosa.

Analysis of Differentially Expressed Genes of R. tomentosa
Based on the clean reads from four libraries with low (RtSK1 and RtSK2) and high rhodomyrtone contents (RtST1 and RtST2), an analysis of gene expression levels was performed. The gene expression levels in each library were calculated using RSEM and, to provide a more intuitive measurement, gene expressions at different FPKM intervals (FPKM ≤ 1, FPKM: 1~10, FPKM ≥ 10) were calculated for each library. In Figure 11  The gene expression levels of all 172,223 DEGs in the low-and high-rhodomyrtone R. tomentosa datasets were compared. The DEseq2 results were plotted using a Venn diagram, and scatter and volcano plots were used to show the expression levels of the different sample groups. From the scatter plot, it is clear that 4380 and 5513 DEGs were up-and downregulated, respectively, in the high-rhodomyrtone R. tomentosa datasets (Figure 12). grouped into five main categories. The highest results were found in metabolic pathways (64.15%, 41,974 unigenes), followed by genetic information processing pathways (24.18%, 15,824 unigenes). These were followed by the cellular processes and environmental information processing pathways with 4.63% and 4.59% (3028 and 3004 unigenes), respectively, and the lowest value was that for organismic systems (4.63%, 1602 unigenes) ( Figure 10). Thus, the reported unigenes were found in the metabolic pathways subcategories such as biosynthesis of other secondary metabolites (2301 unigenes), metabolism of terpenoids and polyketides (1083 unigenes), and metabolism of other amino acids (1739 unigenes). These subcategories should therefore be considered as sequences that are potentially useful for defining metabolic pathways that might be related to the accumulation of rhodomyrtone in R. tomentosa.  Additionally, a false discovery rate (FDR) < 0.05 and a |log2(fold-change)| > 1 were used as thresholds for identifying differentially expressed genes (DEGs) of the 9893 genes in the low-and high-rhodomyrtone samples of R. tomentosa. The DEGs were subjected to GO and KEGG enrichment analyses to determine the differences in biological processes and pathways between the low-and high-rhodomyrtone genotypes of R. tomentosa. Then, DEGs between the low-and high-rhodomyrtone genotypes were annotated using GO terms. In the biological processes category, the top GO terms were metabolic and cellular processes with 2083 and 1992 DEGs, respectively, followed by biological regulation, localization, and regulation of biological processes, which were assigned 378, 349, and 349 DEGs, respectively. In the cellular component category, there were only three terms, cellular anatomical unit, proteinaceous complex, and virion component, which were assigned 3086, 468, and 3 DEGs, respectively. In addition, the most common molecular function GO terms among the DEGs were catalytic activity, binding, and structural molecular activity, which were assigned to 2703, 2198, and 404 DEGs, respectively ( Figure 13B). rhodomyrtone contents (RtST1 and RtST2), an analysis of gene expression levels was performed. The gene expression levels in each library were calculated using RSEM and, to provide a more intuitive measurement, gene expressions at different FPKM intervals (FPKM ≤ 1, FPKM: 1~10, FPKM ≥ 10) were calculated for each library. In Figure 11, the depth of color indicates the different gene expression levels: FPKM ≤ 1 indicates an extremely low expression level and 1 < FPKM ≤ 10 indicates a high expression level.  Additionally, a false discovery rate (FDR) < 0.05 and a |log2(fold-change)| > 1 were used as thresholds for identifying differentially expressed genes (DEGs) of the 9893 genes in the low-and high-rhodomyrtone samples of R. tomentosa. The DEGs were subjected to GO and KEGG enrichment analyses to determine the differences in biological processes and pathways between the low-and high-rhodomyrtone genotypes of R. tomentosa. Then, DEGs between the low-and high-rhodomyrtone genotypes were annotated using GO terms. In the biological processes category, the top GO terms were metabolic and cellular processes with 2083 and 1992 DEGs, respectively, followed by biological  Out of 134 KEGG pathways, all low-and high-rhodomyrtone DEGs were classified into seven branches of the KEGG functional classification and matched to five KEGG pathway branches, with 4463 associated with the metabolic branches. This was followed by genetic information processing, environmental information processing, cellular processes, and organismal systems branches, which were associated with 1751, 387, 253, and 201 DEGs, respectively ( Figure 13B).

In Silico Analysis of Enhancement Gene Provides Insights into Rhodomyrtone Biosynthesis
Transcriptome profiling identified 9893 differentially expressed genes which were divided into the five KEGG pathway branches with the most abundant metabolites. The smallest P/Q values between the low-and high-rhodomyrtone genotypes were nucleocytoplasmic transport, RNA polymerase, the phenylpropanoid biosynthesis pathway, the plant MAPK signaling pathway, and plant hormone signal transduction ( Figure 14A,B). Based on these results, analysis of differentially expressed unigenes between the two groups was performed to identify potential genes involved in regulating the rhodomyrtone content of R. tomentosa. Using the DEseq2 method [39], compared with the low-rhodomyrtone genotype, 4,380 genes were upregulated and 5513 genes were downregulated in the high-rhodomyrtone genotype. In addition, the fifteen genes with the highest and lowest expressions were identified in the high-rhodomyrtone groups. The results showed that several hormone-signaling transcription factors (CL14267.Contig2_All, CL12904.Contig1_All, CL11462.Contig1_All, and CL5679.Contig2_All) and stress signaling pathways (CL1945.Contig1_All and CL3612.Contig4_All) were related to rhodomyrtone in R. tomentosa (Table 4 and Figure 15) and were therefore identified as having possible involvement in rhodomyrtone biosynthesis. A candidate gene that may contribute to further improving yield, nutraceutical quality, and secondary metabolites in R. tomentosa is the zinc transporter protein (CL1945.Contig1_All), because zinc is essential for plant growth and is involved in the accumulation of phytochemicals [40,41]. Out of 134 KEGG pathways, all low-and high-rhodomyrtone DEGs were classified into seven branches of the KEGG functional classification and matched to five KEGG pathway branches, with 4463 associated with the metabolic branches. This was followed by genetic information processing, environmental information processing, cellular processes, and organismal systems branches, which were associated with 1751, 387, 253, and 201 DEGs, respectively ( Figure 13B).

In Silico Analysis of Enhancement Gene Provides Insights into Rhodomyrtone Biosynthesis
Transcriptome profiling identified 9893 differentially expressed genes which were divided into the five KEGG pathway branches with the most abundant metabolites. The smallest P/Q values between the low-and high-rhodomyrtone genotypes were nucleocytoplasmic transport, RNA polymerase, the phenylpropanoid biosynthesis pathway, the plant MAPK signaling pathway, and plant hormone signal transduction ( Figure 14A,B). Based on these results, analysis of differentially expressed unigenes between the two groups was performed to identify potential genes involved in regulating the rhodomyrtone content of R. tomentosa. Using the DEseq2 method [39], compared with the lowrhodomyrtone genotype, 4,380 genes were upregulated and 5513 genes were downregulated in the high-rhodomyrtone genotype. In addition, the fifteen genes with the highest and lowest expressions were identified in the high-rhodomyrtone groups. The results showed that several hormone-signaling transcription factors (CL14267.Contig2_All, CL12904.Contig1_All, CL11462.Contig1_All, and CL5679.Contig2_All) and stress signaling pathways (CL1945.Contig1_All and CL3612.Contig4_All) were related to rhodomyrtone in R. tomentosa (Table 4 and Figure 15) and were therefore identified as having possible involvement in rhodomyrtone biosynthesis. A candidate gene that may contribute to further improving yield, nutraceutical quality, and secondary metabolites in R. tomentosa is the zinc transporter protein (CL1945.Contig1_All), because zinc is essential for plant growth and is involved in the accumulation of phytochemicals [40,41].

Expression Analysis of Rhodomyrtone Related Genes through qRT-PCR Validation
To verify the accuracy of the RNA-seq results, three genes associated with rhodomyrtone were randomly selected for semiquantitative RT-PCR validation (CL11462.Contig1_All, CL1945.Contig1_All, and CL3457.Contig3 as shown in Supplementary Table S1) and 18S rRNA primers were used as internal controls. These checks amplified only the Zn transporter, CL1945.Contig1_All. The RNA-seq data and recent studies suggest that Zn transporter may be a candidate gene potentially related to rhodomyrtone biosynthesis. Zn transporters (ZnTs) are the proteins mainly involved in Zn transport within the plant and zinc plays a role in plant defenses against pathogens, including the enhancement of bioactive compounds in the plant cells [40,42]. Based on these results, RtZnT was selected for RT-qPCR validation to verify the reliability of the transcriptome data. Semiquantitative RT-PCR was performed to evaluate the RtZnT gene expression patterns. The total RNAs from the two low-and high-rhodomyrtone genotypes of R. tomentosa were amplified using RtZnT-specific primers and 18S rRNA primers as internal controls. RtZnT showed relatively strong transcription in the Surat Thani genotype (ST1-6) samples, whereas no RtZnT expression was detected in the Songkhla genotype (SK1-6) ( Figure 16). These expression patterns were generally consistent with the RNA-seq results, confirming the accuracy of the transcriptome data.

Expression Analysis of Rhodomyrtone Related Genes through qRT-PCR Validation
To verify the accuracy of the RNA-seq results, three genes associated with rhodomyrtone were randomly selected for semiquantitative RT-PCR validation (CL11462.Contig1_All, CL1945.Contig1_All, and CL3457.Contig3 as shown in Supplementary Table S1) and 18S rRNA primers were used as internal controls. These checks amplified only the Zn transporter, CL1945.Contig1_All. The RNA-seq data and recent studies suggest that Zn transporter may be a candidate gene potentially related to rhodomyrtone biosynthesis. Zn transporters (ZnTs) are the proteins mainly involved in Zn transport within the plant and zinc plays a role in plant defenses against pathogens, including the enhancement of bioactive compounds in the plant cells [40,42]. Based on these results, RtZnT was selected for RT-qPCR validation to verify the reliability of the transcriptome data. Semiquantitative RT-PCR was performed to evaluate the RtZnT gene expression patterns. The total RNAs from the two low-and high-rhodomyrtone genotypes of R. tomentosa were amplified using RtZnT-specific primers and 18S rRNA primers as internal controls. RtZnT showed relatively strong transcription in the Surat Thani genotype (ST1-6) samples, whereas no RtZnT expression was detected in the Songkhla genotype (SK1-6) ( Figure 16). These expression patterns were generally consistent with the RNA-seq results, confirming the accuracy of the transcriptome data. In addition, the expression of RtZnT was observed in R. tomentosa grown hydroponically with two different ZnSO4 concentrations (100 mM and 500 mM) for 7 days. The genes were more highly expressed in the 500 mM ZnSO4 than in the 100 mM ZnSO4 conditions ( Figure 17). This result is consistent with the effects observed in habanero pepper plants, in which ZnSO4 was found to increase the phenol and flavonoid content [43].  In addition, the expression of RtZnT was observed in R. tomentosa grown hydroponically with two different ZnSO 4 concentrations (100 mM and 500 mM) for 7 days. The genes were more highly expressed in the 500 mM ZnSO 4 than in the 100 mM ZnSO 4 conditions ( Figure 17). This result is consistent with the effects observed in habanero pepper plants, in which ZnSO 4 was found to increase the phenol and flavonoid content [43].       22 30 E, respectively), Thailand, during the harvest season of May 2021. A quantity of 1 g of mature leaves from each group (high-and lowrhodomyrtone) was sampled with five replicates, frozen in liquid nitrogen, and then stored at −80 • C prior to de novo transcriptome analysis. In addition, 100 g of mature leaves from each group was sampled in triplicate and dried in a hot air oven at 60 • C for 16-18 h prior to rhodomyrtone extraction.

DNA Barcoding Analysis of R. tomentosa Using matK and ITS Markers
DNA extraction and PCR amplification of the matK and ITS loci of R. tomentosa. Plant leaves were frozen in liquid nitrogen and ground into a fine powder. Genomic DNA was then isolated and purified using a Tissue Genomic DNA Mini Kit (Geneaid, New Taipei City, Taiwan) in accordance with the manufacturer's instructions. The ribosomal region ITS was amplified with the primer pair ITSF (5 -CGTAACAAGGTTTCCGTAGGTGAAC-3 ) and ITSR (5 -TTATTGATATGCTTAAACTCAGCGGG-3 [44], and the chloroplastic matK loci were amplified with the primer pair MatKF (5 -CGTACAGTACTTTTGTGTTTACGAG-3 ) and MatKR (5 -ACCCAGTCCATCTGGAAATCTTGGTTC-3 ) [45]. The PCR reaction was performed in a total volume of 50 µL per 2 µL DNA template, 10 µL 5X HOT FIREPol Blend Master Mix (containing 10 Mm MgCl 2 ), 2 µL of the mixed primer pair, and 36 µL deionized water. The PCR profile for matK and ITS was optimized via initial denaturation at 95 • C for 15 min, followed by 35 cycles starting with denaturation at 95 • C for 30 s and annealing at 60 • C for 30 s, followed by a final extension at 72 • C for 10 min. The amplified PCR products were sequenced in both directions using an automated DNA ABI PRISM 3700 sequencer, (Applied Biosystems, Waltham, MA, USA). Nucleotide sequence homology searches were performed using the BLASTN tool, and the alignment sequence was analyzed using the clustalX2.0 and GENDOC programs.

Extraction and Quantitative Determination of Rhodomyrtone
A quantity of 10 g of R. tomentosa leaf powder obtained from Surat Thani and Songkhla provinces was filtered using a supercritical fluid extractor at the Office of Scientific Instrument and Testing (OSIT), Prince of Songkla University (PSU) (Hat Yai, Songkhla, Thailand) prior to extraction. The bioactive compound was extracted using 100% CO 2 at a flow rate of 18 g/min, an extraction temperature of 55 • C, and a pressure of 250 bar. While one operating parameter was changed, the other operating parameters were kept constant. The obtained extract was dissolved in 100% DMSO and stored at −20 • C until further use. All samples were filtered through a 0.45 µm syringe filter. Aliquots of these samples were then taken for the determination of the total rhodomyrtone content at OSIT, PSU (Hat Yai, Songkhla, Thailand). Quantitative determination of rhodomyrtone substances was performed using an HPLC Agilent 1100 liquid chromatography system consisting of an Agilent Chemstation for GC, LC, LC /MSD, CE, UV-VIS detector (Agilent Technologies Inc., Santa Clara, CA, USA) and A/D Systems-Rev. 08. Ox. An aliquot of 20 uL of each sample solution was directly injected into the HPLC system using a mixture of acetonitrile (Merck Millipore, Burlington, MA, USA)) and water, which was run in the analytical Symmetry C8 column (4.6 × 150 mm, particle size 3 um) at 30 • C, and the peak was detected at 254 nm. The column was equilibrated for 6 min before each subsequent run and saturated with the mobile phase for at least 3 h before each assay. Rhodomyrtone (Sigma-Aldrich, St. Louis, MO, USA) was used as the internal standard for the assay.

De Novo Transcriptome Analysis of the High-and Low-Rhodomyrtone R. tomentosa
Five independent samples of leaves (n = 5) of equal weights were collected at the same stage for each of the low-(two replicate samples: SK1 and SK2) and high-(two replicate samples: ST1 and ST2) rhodomyrtone groups and sent to BGI-Shenzhen (Shenzhen, China) for library construction. Total RNA was extracted and measured using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA) and RNA integrity was determined using an Agilent 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA). Sequencing libraries were constructed in accordance with the BGISEQ-500 library construction protocol. mRNA enrichment and reverse transcription to double-stranded cDNA (dscDNA) was conducted using N6 random primers and the synthesized cDNA was subjected to end repair and then 3 adenylated. Adaptors were ligated to the ends Plants 2023, 12, 3156 20 of 23 of these 3 -adenylated cDNA fragments. The ligation products were purified and many rounds of PCR amplification were performed to enrich the purified cDNA template with PCR primers. Then, the PCR product was denatured using heat, and the single-stranded DNA was cyclized via splint oligo and DNA ligase and sequenced using the BGISEQ-500 platform (BGI, Shenzhen, China) to generate 100-bp paired-end reads.
Then, quality control, de novo transcriptome composition, and functional annotation of genes were performed. A total of 4 RNA-seq datasets were used for the study, 2 of which were from a low-rhodomyrtone sample and 2 were from a high-rhodomyrtone sample deposited in the Sequence Read Archive database. The raw RNA-seq reads were trimmed and cleared of adapters using Trimmomatic v.0.36 [46] and AdapterRemoval (v. 2.1.3) [47]. Raw reads were then analyzed using FastQC [48] to calculate read quality metrics and Trinity was used to perform de novo assembly with clean reads. All unigenes were subjected to functional annotation analysis. Using Diamond, the assembled unigenes were aligned to the publicly available protein databases, including the NCBI nonredundant protein (NR), the Swiss-Prot protein (Swiss-Prot), Gene Ontology (GO), Clusters of Orthologous Groups (COG), and the Kyoto Encyclopedia of Genes and Genomes (KEGG). We used Blast2GO with a cutoff E value of 10 −5 and InterProScan5 for InterPro annotation analysis. In addition, for CD prediction of unigenes we used Transdecoder (version 3.0.1) [34] and for plant TF prediction analysis we used getorf to find the ORF for each unigene, then aligned the ORF to TF domains using hmmsearch and identified TFs according to the PlantfDB database.

Unigene Expression Analysis
Unigene expression levels were calculated as Reads Per Kilobase Million Mapped Reads (RPKM), eliminating the effects of sequencing depth and gene length on gene expression levels and allowing direct data comparison with the DESeq2 method [39]. The expression levels of Unigenes involved in metabolic pathways related to seed oil accumulation were calculated. The DEGs between different time points were identified with padj < 0.05 and |log2 (fold change value)| ≥ 1.

Verification of Related Genes Using Semiquantitative RT-PCR
The semiquantitative RT-PCR technique was used to test the reliability of the RNAseq data. In this experiment, DEGs were randomly selected and primers were designed based on the RNA-seq contig data; information on the design of the semiquantitative RT-PCR primers is set out in Supplementary Table S1. Total RNA of R. tomentosa collected from different locations in Surat Thani and Songkhla provinces in southern Thailand was extracted using a Plant Total RNA Mini Kit (Geneaid, New Taipei City, Taiwan) according to the manufacturer's protocol. The RT-PCR reactions contained 500 ng of RNA as template and were performed using the OneStep RT-PCR reaction kit in accordance with the manufacturer's instructions (Qiagen, Hilden, Germany). The reaction was started at 50 • C for 30 min, followed by an initial PCR activation step at 95 • C for 15 min, then 40 cycles at 94 • C for 25 s, 60 • C for 25 s, and 72 • C for 25 s, and terminated by a 10 min incubation step at 72 • C. RT-PCR was performed to amplify a fragment of four DEG contigs and the 18S rRNA gene was amplified as a reference control. The RT-PCR products were separated using gel electrophoresis, visualized using ethidium bromide staining, and photographed to analyze expression levels using Quantity One software (Bio-Rad, Hercules, CA, USA).

Conclusions
This study is the first attempt to use the SNP haplotypes of matK and ITS barcodes to identify the rhodomyrtone genotype and determine the species composition of highrhodomyrtone genotypes of Rhodomyrtus tomentosa. The results presented in this study could be a key to the sustainable cultivation of R. tomentosa plant genotypes with a high rhodomyrtone content. The results of RNA-Seq gene expression profiling showed that the two genotypes had different gene expression levels and that the highest gene expression level was related to the genotypes with a high rhodomyrtone content. The zinc transporter protein gene (CL1945.Contig1_All; RtZnT) was the most highly DEG in the Surat Thani genotype which had a high rhodomyrtone content, and the RNA-seq result was confirmed via semiquantitative RT-PCR analysis. Interestingly, we found that the expression of RtZnT was increased under ZnSO 4 stress in R. tomentosa. These findings could be used in the production of R. tomentosa to improve the yields of rhodomyrtone.