Transcriptional Comparison of New Hybrid Progenies and Clone-Cultivars of Tea (Camellia sinensis L.) Associated to Catechins Content

Heterosis or hybrid vigor is the improved performance of a desirable quality in hybrid progeny. Hybridization between high-productive Assam type and high-quality Chinese type clone-cultivar is expected to develop elite tea plant progenies with high quality and productivity. Comparative transcriptomics analyses of leaves from the F1 hybrids and their parental clone-cultivars were conducted to explore molecular mechanisms related to catechin content using a high-throughput next-generation RNA-seq strategy and high-performance liquid chromatography (HPLC). The content of EGCG (epigallocatechin gallate) and C (catechin) was higher in ‘Kiara-8’ × ‘Sukoi’, ‘Tambi-2’ × ‘Suka Ati’, and ‘Tambi-2’ × ‘TRI-2025’ than the other hybrid and clone-cultivars. KEGG (Kyoto Encyclopedia of Genes and Genomes) and GO (Gene Ontology) analysis found that most pathways associated with catechins content were enriched. Significant differentially expressed genes (DEGs) mainly associated with phenylpropanoid, flavonoid, drug metabolism-cytochrome P450, and transcription factor (MYB, bHLH, LOB, and C2H2) pathways appeared to be responsible for the high accumulation of secondary metabolites in ‘Kiara-8’ × ‘Sukoi’, ‘Tambi-2’ × ‘Suka Ati’, and ‘Tambi-2’ × ‘TRI-2025’ as were detected in EGCG and catechin content. Several structural genes related to the above pathways have been obtained, which will be used as candidate genes in the screening of breeding materials.


Introduction
Tea (Camellia sinensis (L.) O. Kuntze) is a perennial plantation crop grown in more than 60 countries for the production of the most popular non-alcoholic beverage, tea. Global statistics indicate that approximately five million tons of tea leaves are harvested annually, and more than two billion cups are consumed daily [1]. Tea quality is a complex characteristic that exhibits tremendous genetic variability due to the accumulation of an array of secondary metabolites [2]. Due to awareness and multiple health benefits of tea, quality remains a key target for its genetic improvement [3,4].
Tea breeders in different tea-growing countries have developed anthocyanin-rich purple-colored tea varieties [3,5]. A recent study revealed that the antioxidant values of purple-leaf tea were higher than green-leaf tea due to the presence of catechins and anthocyanins [5,6]. Therefore, tea from anthocyanin-rich cultivars may become specialty tea with higher antioxidant activity [6]. Hybridization between clone-cultivars Assam type that has green-leaf and Chinese type that have purple-leaf are expected to develop new hybrid progenies with a combination of morphological characteristics that have high productivity and quality.
New tea cultivars have been developed through conventional breeding methods. The main breeding approaches in the tea breeding system are controlled hybridization and individual selection. Controlled hybridization and individual selection have been performed based on phenotypic evaluations. In breeding a clonal crop, such as tea, the selection is a two-stage process. In the first stage, the highest yielding individual seedlings are chosen; these are then propagated and planted in clone trials in the second stage [7]. Integration of breeding tools is a powerful strategy to enhance the efficiency of the breeding programs with an early screening of genotypes, which leads to the release of a new, improved cultivar within a comparatively short period of time unlike conventional tools [8].
Heterosis is widely used in plant breeding programs. Heterosis is the phenomenon in which a hybrid progeny has greater biomass and higher yield and quality than the parents; volatile heterosis was obvious in F1 hybrids [9]. The catechins and free amino acids in hybrids showed negative heterosis compared to its maternal cultivar TGY [10]. Earlier, we crossed Chinese-type clone-cultivars 'Sukoi', 'Tambi-1', and 'Tambi-2' with Assam-type clone-cultivars 'TRI-2025', 'Suka Ati-40', and 'Kiara-8' and produced some hybrid progenies. We selected five hybrid progenies that showed high potential yield for transcriptomic study related to catechins content and compared them to the parents, which were widely planted (Tambi-2 and TRI-2025). The Chinese-type parents exhibited unique morphological characteristics such as dark purple pigmentation with dense pubescence in shoot leaves. The Assam-type parents showed morphological characteristics such as broad leaves, lack of pigmentation in shoot leaves and petiole, with pubescence. Characterization of hybrid progeny is an initial step toward proper utilization of genetic resources [11]. We used morphological, biochemical, and molecular approaches to characterize hybrid progenies and compare them to the parent.
Cell-cell differences are determined by the expression of different sets of genes. All such proteins are often called transcription factors (TFs). Each TF has a specific DNA binding domain that recognizes a 6-10 base pair motif in the DNA [12]. Transcription factors regulate a complex network of gene expression. A better understanding of the transcription factors and their shared interactions among a set of co-regulated or differentially expressed genes can provide powerful insights into the key pathways governing such expression patterns.
The study of morphological characteristics [13], biochemical characteristics [14,15], molecular characteristics [16], and the diversity of tea is important for plant genetic improvement programs. Biochemical and DNA-based markers have been explored to improve quality traits in tea [17,18]. Although most of these studies have provided clear evidence that genotypic variation determines quality characteristics in tea, the molecular basis of these variations remains obscure. In this study, we investigated comparative transcriptomic and expressional regulation of putative functional genes in quality-related pathways between selected F1 hybrids and clone-cultivars. The results reported here provide a comprehensive analysis of transcriptomic differences that are reflected at the phenotypic level in tea plants. Because there have been limited reports comparing EGCG and C content among hybrid progenies and clone-cultivars as parents, these results are important for determining the relationship between genetic background and catechins content and the early study of catechins heterosis. Several structural genes have been obtained, which will be used as candidate genes in the screening of breeding materials
The longest transcripts of each cluster were selected as unigenes. The total number of transcripts obtained was 333,310, and the total number of unigenes was 121,530, which, in turn, could be broken down into different groups of length intervals (Supplementary Table S2). The length distribution of transcripts and unigenes is shown in Supplementary Table S3. A total of 384,480,750 transcripts with a mean length of 1154 bp and an N50 length of 1649 bp were obtained. A total of 124,031,600 unigenes with a mean length of 1021 bp and an N50 length of 1477 bp were obtained.

Gene Functional Annotation and Classification
To achieve comprehensive gene functional annotation, seven databases were applied, resulting in a total of 121,530 functionally annotated genes (Supplementary Table S4). Gene functional classification is shown in Figure 3a. The cellular process and metabolic process were the most enriched GO terms, cellular anatomical entity, intracellular, and protein-containing complexes were the most enriched cellular component GO terms, while binding and catalytic activities were clearly enriched as molecular functions (Figure 3a). These results suggested that transcription factors (binding activity) and high enzymatic activity were involved in the biosynthesis of samples. The genes successfully annotated in KEGG could further be classified according to the KEGG pathway they were joined in. The metabolic pathways were mainly related to carbohydrate metabolism, amino acid metabolism, lipid metabolism, metabolism of cofactors and vitamins, biosynthesis of another secondary metabolism, nucleotide metabolism, and metabolism of terpenoids and polyketides ( Figure 3b).

Transcription Factor Analysis
Transcription factors (TFs) are well known to play a crucial role in second tabolite biosynthesis in plants [19]. We searched for genes encoding TFs and ob total of 2348 TFs classified into various families. The five most transcription fac ysis results were GNAT, LOB, bHLH, C2H2, and MYB family (Supplementary T

Gene Expression Analysis
To understand gene expression between samples, we measured transcript terms of fragments per kilo-base exon per million reads (FPKM) based on RNA Genes were called Differentially Expressed Genes (DEGs) if they exhibited a 2-fold change in transcript abundance using edgeR (p < 0.005). De novo trans filtered by Corset was used as a reference [20].

Transcription Factor Analysis
Transcription factors (TFs) are well known to play a crucial role in secondary metabolite biosynthesis in plants [19]. We searched for genes encoding TFs and obtained a total of 2348 TFs classified into various families. The five most transcription factor analysis results were GNAT, LOB, bHLH, C2H2, and MYB family (Supplementary Table S5).

Gene Expression Analysis
To understand gene expression between samples, we measured transcript levels in terms of fragments per kilo-base exon per million reads (FPKM) based on RNAseq data. Genes were called Differentially Expressed Genes (DEGs) if they exhibited at least a 2-fold change in transcript abundance using edgeR (p < 0.005). De novo transcriptome filtered by Corset was used as a reference [20].

Differential Expression Analysis
To uncover the genes involved in different samples, the gene expression values expressed as FPKM were compared between samples. The Venn diagram presents the number of genes that are uniquely expressed differentially within each group, with the overlapping regions showing the number of genes that are expressed in two or more groups ( Figure 4). The Venn diagram depicts common and unique genes from a differentially expressed gene.

KEGG Enrichment Analysis
To identify the molecular mechanism underlying specific and secondary metabolites of tea, we carried out a KEGG pathway enrichment analysis using KOBAS v3.0. KEGG enrichment analysis of DEGs indicates that the translation and secondary metabolic processes dominated the difference in biological processes between pairwise comparison samples. The top 20 DEGs enriched pathways of each sample were displayed ( Figure 6). The KEGG enrichment pathway shows that the DEGs significantly enriched the pathways. We have looked for gene clusters that were significantly different between pair-

KEGG Enrichment Analysis
To identify the molecular mechanism underlying specific and secondary metabolites of tea, we carried out a KEGG pathway enrichment analysis using KOBAS v3.0. KEGG enrichment analysis of DEGs indicates that the translation and secondary metabolic processes dominated the difference in biological processes between pairwise comparison samples. The top 20 DEGs enriched pathways of each sample were displayed ( Figure 6). The KEGG enrichment pathway shows that the DEGs significantly enriched the pathways. We have looked for gene clusters that were significantly different between pairwise comparison samples associated with tea quality. In the present work, KEGG enrichment analysis of DEGs in the pairwise comparison of each sample showed that biosynthesis and metabolism pathways were enriched, and the significantly different pathways associated with tea quality were phenylpropanoid, flavonoid, sesquiterpenoid and triterpenoid biosynthesis, drug metabolism-cytochrome P450, and alpha-linolenic acid metabolism ( Figure 6). Unigenes involved in these pathways were identified with FPKM values > 50 (Figure 7). We used KEGG mapping to map molecular objects to molecular interaction (Supplementary Figures S1-S5). In the present work, KEGG enrichment analysis of DEGs in the pairwise comparison of each sample showed that biosynthesis and metabolism pathways were enriched, and the significantly different pathways associated with tea quality were phenylpropanoid, flavonoid, sesquiterpenoid and triterpenoid biosynthesis, drug metabolism-cytochrome P450, and alpha-linolenic acid metabolism ( Figure 6). Unigenes involved in these pathways were identified with FPKM values > 50 (Figure 7). We used KEGG mapping to map molecular objects to molecular interaction (Supplementary Figures S1-S5). Flavonoids are a large and diverse group of secondary metabolites that are synthesized through a specific branch of a phenylpropanoid pathway [21]. Highly expressed structural genes involved in the phenylpropanoid biosynthesis identified were phenylalanine ammonia-lyase (PAL), 4-coumarate-CoA ligase 2 (4CL), beta-glucosidase-12 like isoform X2, caffeoyl-CoA O-methyltransferase At1g67980 (CCoAOMT), cinnamyl alcohol dehydrogenase 1 (CAD1), cytochrome P450 98A2, mannitol dehydrogenase, caffeic acid 3-O-methyltransferase (COMT), cinnamate-4-hydroxylase (C4H), and spermidine Flavonoids are a large and diverse group of secondary metabolites that are synthesized through a specific branch of a phenylpropanoid pathway [21]. Highly expressed structural genes involved in the phenylpropanoid biosynthesis identified were phenylalanine ammonia-lyase (PAL), 4-coumarate-CoA ligase 2 (4CL), beta-glucosidase-12 like isoform X2, caffeoyl-CoA O-methyltransferase At1g67980 (CCoAOMT), cinnamyl alcohol dehydrogenase 1 (CAD1), cytochrome P450 98A2, mannitol dehydrogenase, caffeic acid 3-O-methyltransferase (COMT), cinnamate-4-hydroxylase (C4H), and spermidine hydroxycinnamoyl transferase (SHT), which have shown differences in the number of DEGs among the analyzed samples (Figure 7a). Most of the genes encoding phenylpropanoid biosynthesis enzymes were identified and found to be differentially expressed in the shoot leaves of different tea clone-cultivars and hybrid progenies.  Figure S1). Figure 8 shows the genes that influence the formation of EGCG and C.  Figure S1). Figure 8 shows the genes that influence the formation of EGCG and C.  (Figure 7b). PAL, 4CL, CHS, CHI/CFI, F′3H, F3′5′H, ANS, and LAR genes are involved in the flavonoid pathway and showed a significant positive correlation with catechin content in tea; those genes with UGT may be associated with the modification and stabilization of these metabolites hence their abundance for enhancing flavor-related attributes in tea cultivars [2]. LAR, ANS, and ANR are downstream enzymes in the flavonoid biosynthetic pathway, which play key roles in determining individual catechins [22]. CHS (Cluster-19805.50241) showed up-regulation in 'Tambi-2′ × 'Suka Ati-40′,  Figure S1). Figure 8 shows the genes that influence the formation of EGCG and C.  (Figure 7b). PAL, 4CL, CHS, CHI/CFI, F′3H, F3′5′H, ANS, and LAR genes are involved in the flavonoid pathway and showed a significant positive correlation with catechin content in tea; those genes with UGT may be associated with the modification and stabilization of these metabolites hence their abundance for enhancing flavor-related attributes in tea cultivars [2]. LAR, ANS, and ANR are downstream enzymes in the flavonoid biosynthetic pathway, which play key roles in determining individual  (Figure 7b). PAL, 4CL, CHS, CHI/CFI, F 3H, F3 5 H, ANS, and LAR genes are involved in the flavonoid pathway and showed a significant positive correlation with catechin content in tea; those genes with UGT may be associated with the modification and stabilization of these metabolites hence their abundance for enhancing flavor-related attributes in tea cultivars [2]. LAR, ANS, and ANR are downstream enzymes in the flavonoid biosynthetic pathway, which play key roles in determining individual catechins [22].  Figure S3). In plants, a family of TPSs is responsible for the synthesis of the various terpene molecules from two isomeric 5-carbon precursor 'building blocks', leading 5-carbon isoprene, 10-carbon monoterpenes, 15-carbon sesquiterpenes, and 20-carbon diterpenes [23]. Methyl jasmonate resulted in the enhanced expression of the majority of CsTPS (terpenoid synthase gene) [24]. Methyl jasmonate (MeJA), a volatile organic compound, is a principal flowery aromatic compound in tea [25].
Annotation  (Figure 7f). Seven TFs, including those mentioned above, were identified as having possible crucial roles in controlling the transcriptomic regulation of flavonoids [28].
ANR, ANS, CHS, CHI, DFR, F3H, F35H, LAR, and UGT detected in the samples were some key enzymes that directly regulated the flavonoid biosynthesis in tea. The DEGs of those genes had higher expression numbers at 'Kiara-8' × 'Sukoi', 'Tambi-2' × 'Suka Ati-40', and 'Tambi-2' than the other samples. ANR is one of the two enzymes of the flavonoid biosynthesis pathway that produces flavan-3-ols (epicatechin) monomers from anthocyanidin [41]. ANS is the last key enzyme in anthocyanin synthesis, directly affecting anthocyanin accumulation [42]. Interestingly, the gene upstream and downstream in the flavonoid biosynthesis pathway differed significantly among the different tea clonecultivars and progenies. The downstream enzymes LAR and ANR are responsible for the conversion of anthocyanidin to catechin and epicatechin [43,44]. UDP-Glycosyltransferase (UGT), the final enzyme in anthocyanin biosynthesis, catalyzes glucosyl transfer from UDP-glucose to 3-hydroxyl group to form stable cyanine glucosides [42]. The expression of CsLAR in tobacco overproducing anthocyanin led to the accumulation of higher levels of epicatechin and its glucoside than of catechin [45].
R2R3-MYB families (Cluster-19805.57717 and Cluster-19805.51520) were identified in the samples that had high DEG numbers in 'Kiara-8' × 'Sukoi', 'Tambi-2', and 'Tambi-2' × 'Suka Ati-40' (Figure 7f). An R2R3-MYB transcription factor (CsMYB6A) can activate the expression of flavonoid-related structural genes, especially CHS and 3GT, controlling the accumulation of anthocyanins in the leaf [46]. In this study, the majority of identified DEGs encoding TFs beside MYB families were highly expressed, including bHLH, LOB, and C2H2 (Supplementary Table S5). A complex of MYB-bHLH-WDR (MBW) shows action in order to trigger the structural genes responsible for the process of flavonoid biosynthesis [47]. The C2H2-zinc finger protein (C2H2-ZFP) is essential for the regulation of plant development and is widely responsive to diverse stresses, including drought, cold, salt stress, and flavonoid accumulation, in higher plants [48]. The lateral organ boundary (LOB) domain (LBD) genes play important roles in abiotic stress and flavonoid biosynthesis [49].
Transcriptomic variation and expressional regulation of putative functional genes in quality-related pathways between green-leaf and purple-leaf clone-cultivars and selected F1 hybrids from crossing between clone-cultivar parents were detected using a comprehensive analysis of transcriptomic differences that are reflected at the phenotypic level in tea plants. 'Kiara-8' × 'Sukoi', 'Tambi-2' × 'Suka Ati', and 'Tambi-2' × 'TRI-2025' were selected as candidate high-quality tea hybrids based on EGCG and catechin content and DEGs involved in phenylpropanoid, flavonoid, and drug metabolism-cytochrome P450 pathways, and also high expression level of TF (MYB, bHLH, LOB, and C2H2) related to the phenylpropanoid and flavonoid pathways. These results are important for determining the relationship between genetic background and the catechin content of tea to improve future breeding materials.

Extraction and Quantification of Catechins
To determine major constituents in tea, fresh leaves were crushed into a fine powder with liquid nitrogen using a mortar and pestle. About 1 g of fine powder was dissolved in 10 mL methanol. Further, the solution was subjected to a continuous vortex for 24 h and subsequently centrifuged at 2000 rpm (15 min). The supernatant was then filtered through 5C filter paper. The solution was then collected in small bottles and evaporated to receive the crude extract. About 1.5 mg of crude extract was dissolved in 1.5 mL methanol for HPLC. The solution was subjected to a continuous vortex for 5 min. The solution was then filtered through a 0.45-micron syringe filter (nylon) into an HPLC vial screw. The standard solution of EGCG and catechin (C) was diluted in the mobile phase, yielding the standard solutions with different concentrations. Subsequently, the solution was filtered using a 0.45-micron syringe filter into an HPLC vial screw. Sample-extracted solutions and standard solutions were later degassed by sonication for 15 min before injecting into the HPLC system to obtain the chromatograms. The standard and sample-extracted solutions (1 mg/mL) were run to ensure that both analytes were chemically identical with similar molecular structures. This analysis was performed on a high-pressure liquid chromatograph (Shimadzu Prominence Detector SPD-M20A) installed with Analyst software. A C18 column (Shim-pack GIST 150 mm × 4.6 mm, 5 µm) was used at a flow rate of 1.0 mL/min. The detection wavelength was set to 200-600 nm, and the column temperature was 25 • C. The mobile phase consists of MeOH: 0.05% TFA aq 25:75. The total run time of the analytical method was set to 70 min in the HPLC system. The samples were analyzed in triplicate.

Transcriptome Sequencing, Assembly, and Quality Assessment
The transcriptome was sequenced by Novogene Technology using the Illumina HiSeq 2500 platform. After sequencing, the raw reads underwent quality control. Quality control for transcriptomic reads consists of error-rate distribution along with reads and GC content distribution. Raw transcriptomic sequencing data were preprocessed using Filter Tools from Galaxy Platform to remove reads with adaptor contamination, when uncertain nucleotides constitute more than ten percent of either read (N > 10%) and when low-quality nucleotides (base quality less than 5) constitute more than 50% of the reads. High-quality sequencing data were first assembled into contigs and assembled using Geneious Prime 2021.1.1 Align/Assemble tools. Clean reads were de novo assembled by Trinity 2.6.6 software [50] to get assembly transcriptome. 4.6. Transcription Factor Analysis, Reference Allignment, and Gene Expression Analysis iTAK was used to perform the transcription factor analysis of tea. De novo transcriptome filtered by Corset was used as a reference. To map reads back to the transcriptome and quantify the expression level, we used RSEM 1.2.28 [52]. To calculate the gene expression level, RSEM analyzed the mapping results of Bowtie and then got the read count for each gene of each sample. Furthermore, they were converted into FPKM (Fragments Per Kilobase of transcript sequence per Millions of base pairs sequenced) values to estimate the gene expression levels.

Differentially Expressed Genes, GO Enrichment Analysis, and KEGG Pathway Enrichment Analysis
We used edgeR 3.28.0 padj < 0.005 |log 2 (fold change)| > 1 for the differential expression analysis. Volcano plots were used to infer the overall distribution of differentially expressed genes. The Venn diagram presents the number of genes that are uniquely expressed differentially within each group, with the overlapping regions showing the number of genes that are expressed in two or more groups. A major bioinformatics classification system, Gene Ontology (http://www.geneontology.org/ accessed on 9 September 2021), was used to unify the presentation of gene properties across all species. We used GOSeq 1.32.0 software by Young & Davidson, California, U.S. state with a corrected p-value of < 0.05 for significant enrichment. For KEGG enrichment, we used KOBAS v3.0 with a corrected p-value of <0.05 Pathway enrichment analysis identified significantly enriched metabolic pathways or signal transduction pathways associated with differentially ex-pressed genes compared with the whole genome background (www.genome.jp) accessed on 9 September 2021.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants11151972/s1, Table S1: Data quality control summary of each sample. Table S2: The number of transcripts and unigenes in different length intervals. Table S3: The length distribution of transcripts and unigenes. Table S4: The ratio of successfully annotated genes by each database. Table S5: Five most-common transcription factors. Figure S1: KEGG pathway of phenylpropanoid biosynthesis. Figure S2: KEGG pathway of flavonoid biosynthesis. Figure S3: KEGG pathway of sesquiterpenoid and triterpenoid biosynthesis. Figure S4: KEGG pathway of alpha-linolenic acid metabolism. Figure