Characterization and Expression Profiling of Camellia sinensis Cinnamate 4-hydroxylase Genes in Phenylpropanoid Pathways

Cinnamate 4-hydroxylase (C4H), a cytochrome P450-dependent monooxygenase, participates in the synthesis of numerous polyphenoid compounds, such as flavonoids and lignins. However, the number and function of C4H genes in tea plants are not clear. We screened all available transcriptome and genome databases of tea plants and identified three C4H genes, which we named CsC4Ha, CsC4Hb, and CsC4Hc. Both CsC4Ha and CsC4Hb have 1518-bp open reading frames that encode 505-amino acid proteins. CsC4Hc has a 1635-bp open reading frame that encodes a 544-amino acid protein. Enzymatic analysis of recombinant proteins expressed in yeast showed that all three enzymes catalyzed the formation of p-coumaric acid (4-hydroxy trans-cinnamic acid) from trans-cinnamic acid. Quantitative real-time PCR (qRT-PCR) analysis showed that CsC4Ha was highly expressed in the 4th leaf, CsC4Hb was highly expressed in tender leaves, and CsC4Hc was highly expressed in the young stems. The three CsC4Hs were induced to varying degrees by abiotic stress treatments. These results suggest that they may have different subcellular localizations and different physiological functions.


Introduction
Tea, one of the most popular non-alcoholic beverages in the world, is rich in polyphenoid compounds derived from phenylpropanoid pathways, e.g., catechins (flavan-3-ols), flavonols, and their derivatives [1]. These compounds are closely related to the flavor of tea [2] and benefit human health through their anti-retroviral, anti-hypertensive, anti-inflammatory, anti-aging, and insulin-sensitizing activities. In addition, these compounds inhibit low-density lipoprotein (LDL) oxidation and reduce the risk of a wide range of chronic diseases, including cardiovascular disease, cancer, and osteoporosis [3].
C4H, a member of the cytochrome P450 CYP73A group, has been the most extensively studied of the plant P450s [10][11][12][13][14]. The number of C4H genes varies considerably among plants. Arabidopsis, Parthenocissus henryana, parsley, Scutellaria baicalensis, and Korean black raspberry are thought to contain only one C4H gene [13,15,16]. By contrast, in Leucaena leucocephala, Camptotheca acuminata, and Brassica napus, C4H is encoded by a small gene family [5,17,18].
In the transcriptome and genome databases of tea plants currently available online, three CsC4H transcripts have been identified, but their functions in tea plants remain unclear. For example, it remains to be determined whether the candidate CsC4H enzymes they encode are enzymatically active, and whether these genes participate in the response of tea plants to biotic or abiotic stresses.
In this study, we cloned the three genes, CsC4Ha, CsC4Hb, and CsC4Hc, verified their enzymatic functions using recombinant proteins expressed in yeast, and analyzed their expression profiles in tea plants subjected to various abiotic stresses.

Plant Materials
Samples of the tea plant Camellia sinensis cv. 'Shucazao' were obtained from the experimental tea garden of Anhui Agricultural University in Hefei, China. Leaves at six different developmental stages (bud, 1st leaf, 2nd leaf, 3rd leaf, 4th leaf, and mature leaf), young stems, and young roots were collected. The samples were immediately frozen in liquid nitrogen and stored at −80 °C.
For the abiotic stress treatments, shoots approximately 10 cm long were cultured in water for one day and then subjected to the treatments. The shoots were treated with 100 mM abscisic acid (ABA) or 90 mM sucrose for 12 h, or with 20 mM salicylic acid (SA) for 48 h. The control plants were cultivated in deionized water. For heat stress, the shoots were treated at 50 °C for 30 min, with the controls treated at 20 °C.
For the light treatments, the tender shoots were exposed to ultraviolet B (UVB) radiation for 30 min, blue light (455-460 nm) for 48 h, or red light (655-660 nm) for 48 h, or were kept in the dark for 12 h. The control plants were kept under white light.
All samples were immediately frozen in liquid nitrogen, and total RNA was extracted as described below.

RNA and cDNA Preparation
Total RNA was extracted from the tea plants using RNAiso Mate and RNAiso Plus (Takara, Dalian, China) according to the manufacturer's instructions. The quality of the RNA was checked using gel electrophoresis, and total RNA was quantified using NanoVue plus (GE Healthcare, Waukesha, WI, USA). The different cDNAs were reverse transcribed using a PrimeScript RT Reagent Kit (Takara), following the manufacturer's protocol.

Quantitative Real-Time PCR
All primers were blasted against the NCBI database (United States National Library of Medicine, Bethesda, MD, USA) to guarantee specificity. The values were normalized against the expression levels of the housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from the tea plant [19]. The first-strand cDNA samples for quantitative real-time PCR (qRT-PCR) were synthesized using the PrimeScript RT reagent Kit (Takara). The PCR mixture contained cDNA template (approximately 0.01 µg/µL), 10 µL SYBR Green PCR Master Mix (Takara), and 200 nmol/L of each gene-specific primer in a final volume of 20 µL.
Real-time PCR was performed as suggested by Lei Zhao et al. [20]. Data were expressed as the mean value of three biological replicates, normalized against the expression levels of GAPDH. The relative expression was derived using the $2^{-\Delta\Delta C_T}$ method: $\Delta C_T = C_{T,\text{target}} - C_{T,\text{internal standard}}$ and $-\Delta\Delta C_T = -(\Delta C_{T,\text{target}} - \Delta C_{T,\text{control}})$, where $C_{T,\text{target}}$ and $C_{T,\text{internal standard}}$ are the cycle threshold (CT) values for the target and housekeeping genes, respectively.
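The calculation can be illustrated with a minimal Python sketch. It assumes that "target" and "control" in the ∆∆CT expression refer to the treated and untreated samples, respectively; the function names and the CT values are hypothetical and are not data from this study.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Delta CT: target-gene CT minus housekeeping-gene (e.g., GAPDH) CT."""
    return ct_target - ct_reference


def relative_expression(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Fold change of the treated sample relative to the control: 2^(-ddCT)."""
    d_treated = delta_ct(ct_target_treated, ct_ref_treated)
    d_control = delta_ct(ct_target_control, ct_ref_control)
    dd_ct = d_treated - d_control
    return 2.0 ** (-dd_ct)


# Hypothetical CT values: the target crosses threshold two cycles earlier
# in the treated sample, while the reference gene is unchanged.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)  # 4.0
```

With three biological replicates, the function would be applied per replicate and the resulting fold changes averaged.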

Heterologous Expression and Enzymatic Activity Analysis of Recombinant CsC4Hs
The PCR products of CsC4Ha, CsC4Hb, and CsC4Hc, obtained using end-to-end PCR, were gel purified and ligated into pENTR/TEV/D-TOPO vectors using TOPO cloning (pENTR/D-TOPO Cloning Kits, Invitrogen, Carlsbad, CA, USA). Then, the inserts of the entry vectors pENTR-CsC4Ha, pENTR-CsC4Hb, and pENTR-CsC4Hc were transferred into the destination vector pYES-DEST52 using the Gateway LR Clonase enzyme (Invitrogen, Carlsbad, CA, USA). The resulting pYES-DEST52-CsC4Ha, pYES-DEST52-CsC4Hb, and pYES-DEST52-CsC4Hc were transformed into S. cerevisiae WAT11 using Frozen-EZ Yeast Transformation II (Zymo Research, Irvine, CA, USA). Yeast cells were propagated at 28 °C for 12 h in 10 mL of Synthetic Dropout-Ura (SD-U) liquid medium containing 20 g/L glucose, inoculated with a single colony from an SD-U plate. The collected cells were transferred into 10 mL of SD-U medium containing 20 g/L galactose and grown at 28 °C for 5 h. The substrate t-cinnamate was added to the yeast culture to a final concentration of 0.2 mM, and the culture was incubated at 28 °C for 1 h. The reactions were terminated by sonication for 15 min and the addition of methanol. Microzymes from each reaction were extracted using the same volume of methanol after high-speed HPLC. After removal of the denatured proteins by centrifugation, the formation of p-coumaric acid was analyzed using an HPLC system equipped with an Altima C18 analytical column (250 mm × 4.6 mm, 5 µm) (Agilent, Santa Clara, CA, USA), with a gradient elution of solvent B (CH3CN) and solvent A (1% acetic acid) at a flow rate of 1 mL/min at 35 °C over a 30-min period as follows: 0 min, 10% solvent B; 5 min, 15% solvent B; 15 min, 40% solvent B; 20 min, 60% solvent B; 25 min, 80% solvent B; and 30 min, 10% solvent B. A diode array detector (DAD) was used for monitoring. All experiments were performed in duplicate.
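As an illustration of the elution program, the following Python sketch returns the solvent B percentage at a given time. It assumes linear ramps between the listed set points, which the text does not state explicitly; the names GRADIENT and percent_b are hypothetical.

```python
# (time in min, % solvent B) set points taken from the gradient program above.
GRADIENT = [(0, 10), (5, 15), (15, 40), (20, 60), (25, 80), (30, 10)]


def percent_b(t_min: float) -> float:
    """Percent solvent B at time t_min, assuming linear ramps between set points."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-30 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: set points cover the whole program")


def percent_a(t_min: float) -> float:
    """Percent solvent A (1% acetic acid) is the remainder of the mobile phase."""
    return 100.0 - percent_b(t_min)


print(percent_b(10))  # 27.5
```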

Bioinformatics Analysis
Cinnamate 4-hydroxylase candidate genes were analyzed using online bioinformatics tools from NCBI and ExPASy (SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland). Open reading frame (ORF) identification was performed using an online program (National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD, USA) [21]. The amino acid sequence of the ORF was deduced and analyzed using the ProtParam tool (National Center for Biotechnology Information, U.S. National Library of Medicine) [22]. CsC4Hs and other known C4H sequences retrieved from NCBI database (U.S. National Library of Medicine) were aligned with DNAMAN (Lynnon Corporation, San Ramon, CA, USA) [20]. Subsequently, a phylogenetic tree was constructed using the neighbor-joining (NJ) method with MEGA 5.0 software (Mega, Raynham, MA, USA). The reliability of the tree was measured using bootstrap analysis based on 1000 replicates.

Screening, Analysis, and Cloning of CsC4H Candidate Genes
After careful analysis of the CsC4H sequences in the nine transcriptome databases and one genome database, three CsC4H transcripts were retained once redundancies had been removed. The accession numbers of the three CsC4H genes in GenBank are KY615675 (CsC4Ha), KY615676 (CsC4Hb), and KY61567 (CsC4Hc). CsC4Ha, CsC4Hb, and CsC4Hc were isolated by PCR with specific primers (Table 1), using cDNA from Camellia sinensis leaves as a template. Both CsC4Ha and CsC4Hb have a 1518-bp open reading frame that codes for a 505-amino acid protein. They have predicted molecular masses of 58.15 kDa and 58.00 kDa and predicted isoelectric points (pI) of 9.29 and 9.26, respectively. CsC4Hc has a 1635-bp open reading frame that codes for a 544-amino acid protein, with a predicted molecular mass of 62.95 kDa and a pI of 8.68 (Figure S1).
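The reported protein lengths follow from the open reading frame lengths if the ORF length is assumed to include the stop codon. A short Python check under that assumption (the function name is hypothetical):

```python
def protein_length(orf_bp: int) -> int:
    """Number of encoded residues for an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the final codon is the stop codon


print(protein_length(1518))  # 505 (CsC4Ha and CsC4Hb)
print(protein_length(1635))  # 544 (CsC4Hc)
```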

Bioinformatics Analysis
Analysis with DNAMAN software (Lynnon Corporation, San Ramon, CA, USA) shows that the amino acid sequence of CsC4Ha shares 93.07% identity with that of CsC4Hb. The amino acid sequences of CsC4Ha and CsC4Hb share only 59.93% and 58.82% identity with that of CsC4Hc, respectively.
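For readers unfamiliar with how such identity values are derived, the sketch below computes percent identity from two already-aligned sequences. It is an illustrative convention only (identical columns divided by the number of aligned columns, gaps included); the exact formula used by DNAMAN may differ, and the sequences shown are toy examples, not CsC4H sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("MDLLLLEK", "MDLLFLEK"))  # 87.5
```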
To investigate the evolutionary relationships among the CsC4Hs and C4Hs from other plant species, a phylogenetic tree was constructed using the neighbor-joining method. As shown in Figure 2, the phylogenetic tree was divided into five main groups: the Angiosperm Class I group (dicot and monocot), the Angiosperm Class II group (dicot and monocot), the Gymnosperm group, the Bryophyte group, and the Pteridophyte group. CsC4Ha and CsC4Hb belong to Angiosperm Class I, and CsC4Hc belongs to Angiosperm Class II.
Genes 2017, 8, 193
Figure 3 shows the amino acid sequence alignment of the three C. sinensis C4Hs with known functional C4Hs from other plants. These sequences contain five P450 substrate recognition sites (SRS), in addition to a heme-binding domain (P474FGXGRRSCPG484) and a hinge region (P64PGPXXXP72). The Angiosperm Class I proteins, including CsC4Ha and CsC4Hb, are conserved at the N-terminus and have a 21-amino acid N-terminal hydrophobic domain, which is flanked by an acidic residue (Asp) and several basic residues [23]. This region is predicted to be a signal-anchor sequence and to determine the correct orientation of P450s in the endoplasmic reticulum (ER) [24]. However, the Angiosperm Class II proteins, including CsC4Hc, are not conserved in this region, which may result in different subcellular localization and different physiological functions.
Figure 3. Amino acid sequence alignment of the three C. sinensis C4Hs with known functional C4Hs from other plants. Completely identical residues are reverse-displayed, while residues with dark gray, light gray, and white backgrounds are conserved, weakly similar, and non-similar residues, respectively. Underlined regions indicate P450-featured motifs, i.e., the hinge region, the T-containing binding pocket motif, the ERR triad, and the heme domain, while boxes represent the five P450 substrate recognition sites (SRS).
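The heme-binding and hinge motifs described above can be located in a protein sequence with a simple pattern search. The Python sketch below takes the motifs exactly as written in the text, with X treated as any residue; the sequence is a made-up toy string, not a CsC4H sequence, and the names MOTIFS and find_motifs are hypothetical.

```python
import re

# Motifs as written in the text; '.' stands for X (any residue).
MOTIFS = {
    "hinge": re.compile(r"PPGP...P"),
    "heme_binding": re.compile(r"PFG.GRRSCPG"),
}


def find_motifs(protein: str) -> dict:
    """Map each motif name to a list of (1-based start position, matched text)."""
    hits = {}
    for name, pattern in MOTIFS.items():
        hits[name] = [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]
    return hits


# Toy sequence constructed only to demonstrate the search.
toy = "MAAAPPGPLSIPKKKPFGVGRRSCPGAAA"
print(find_motifs(toy))
# {'hinge': [(5, 'PPGPLSIP')], 'heme_binding': [(16, 'PFGVGRRSCPG')]}
```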

Heterologous Expression in Yeast and Enzymatic Analysis of CsC4H Proteins
We expressed recombinant CsC4H proteins in a genetically modified S. cerevisiae strain, WAT11. PCR products of CsC4Ha, CsC4Hb, and CsC4Hc were cloned into the destination vector pYES-DEST52 (Figure 4A). The resulting pYES-DEST52-CsC4Ha, pYES-DEST52-CsC4Hb, and pYES-DEST52-CsC4Hc were transformed into S. cerevisiae WAT11. The catalytic activity of the three proteins was verified by enzyme assays using trans-cinnamic acid as a substrate, which was added to the yeast culture. The product p-coumarate was detected by HPLC analysis, with yeast carrying an empty vector and yeast carrying pYES-DEST52-CsC4Hs without substrate serving as controls (Figure 4C). The peak area of the enzymatic product indicates that all three recombinant CsC4Hs are able to convert t-cinnamate to p-coumarate when expressed in yeast.

Real-Time PCR Analysis of C4H Genes Expression in C. sinensis
To analyze the expression patterns of the CsC4H genes in various tissues and at different developmental stages, quantitative real-time PCR was performed using gene-specific primers. The expression patterns of the three CsC4H genes were distinct from each other (Figure 5). CsC4Ha is highly expressed in the 4th leaf and the roots. The expression level of CsC4Hb in tender leaves is significantly higher than that in old leaves, stems, and roots, which is consistent with the flavonoid accumulation pattern in tea plants [19]. Among the various tissues, CsC4Hc is mainly expressed in the young stems. Among leaves at different developmental stages, CsC4Hc is more highly expressed in old leaves than in tender leaves.
Phenylpropanoid compounds can be induced by various biotic or abiotic stresses, such as high UV light intensity, wounding, and pathogen attack [25]. Therefore, we analyzed the inducible expression patterns of the CsC4H genes in response to different abiotic stresses, including UVB, heat stress, ABA, sucrose, dark conditions, SA, red light, and blue light (Figure 6). The results show that the transcript levels of the three genes increased approximately 2- to 6-fold in the sucrose and SA treatments compared to the control. ABA and blue light significantly increased the expression of CsC4Ha and CsC4Hb. By contrast, the dark treatment decreased the expression of CsC4Ha and CsC4Hb. In addition, CsC4Hb was up-regulated under heat stress.
Figure 6. The CsGAPDH gene was used as an internal control. Induced expression analysis of CsC4Hs in tea leaves subjected to 100 mM ABA, 90 mM sucrose, or dark treatment for 12 h. Induced expression analysis of CsC4Hs in tea leaves under 20 mM SA and in tea leaves treated with red (655-660 nm) or blue light (455-460 nm) for 48 h. Induced expression analysis of CsC4Hs in tea leaves exposed to UVB or heat stress for 30 min. The data represent the mean ± standard deviation (SD) from three independent measurements.

Discussion
Tea leaves of C. sinensis are an important non-alcoholic beverage resource [26]. The tea beverage is becoming increasingly popular worldwide because of its refreshing, mildly stimulatory, and medicinal properties [27]. Thus, research on the genes encoding the key metabolic enzymes responsible for the biosynthesis of these chemicals is essential for understanding plant metabolic pathways. C4H is a key gene in the phenylpropanoid pathway; its function determines the downstream synthesis of flavonoid compounds and lignin.
C4H genes are known to exist as small gene families in various plants. For example, four homologous genes of C4H have been detected in Populus tremuloides and P. kitakamiensis [28]. In addition, two C4H genes have been detected in Leucaena leucocephala [5], and there are at least two C4H genes in C. acuminata and Brassica napus [17,18]. However, C4H is encoded by a single-copy gene in many species, such as Arabidopsis, parsley, P. henryana, and S. baicalensis [13,15,16,29]. To determine the number of CsC4H members, we carefully screened all available transcriptome and genome databases, and three CsC4H transcripts remained after redundancies were removed. All three C4H transcripts were cloned from C. sinensis, which indicates that there are at least three C4H genes in the tea genome. Adding the substrate to yeast cultures showed that the three recombinant C4H proteins had enzymatic activities that resulted in the formation of 4-coumarate (para-coumarate). Quantitative expression analysis indicated different expression patterns of the CsC4Hs in various tissues and under abiotic stresses. The tissue- and induction-specific expression of the CsC4Hs suggests that they have different functions in vivo.
According to the phylogenetic tree, the angiosperm C4Hs were classified into two groups (Class I and Class II), each containing both dicot and monocot sequences. CsC4Ha and CsC4Hb belong to Class I, whereas CsC4Hc belongs to Class II. This branching pattern indicates that a gene duplication prior to the divergence of monocots and dicots led to the divergent isoforms in angiosperms [23]. However, some dicots have lost their Class II gene; for example, the complete sequence of the Arabidopsis genome revealed no Class II gene [15].
Different functions have been reported for Class I and Class II C4Hs. Eucalyptus Class II C4H is primarily involved in stress responses, as well as in wood lignin biosynthesis, whereas Class I C4H is constitutively expressed in any tissue that requires phenylpropanoid metabolites [30]. There are three C4H gene models in the P. trichocarpa genome. Transcripts of PtrC4H1 and PtrC4H2 (belonging to Class I) are abundant in differentiating xylem, suggesting that both are important in monolignol biosynthesis. Transcripts of PtrC4H3 (belonging to Class II), not previously characterized, have been shown to have low or no expression in all examined tissues [31]. C4H genes from P. tremuloides and P. trichocarpa are differentially expressed in tissues, and individual isoforms have been shown to play specific physiological roles in development [32]. In this work, the expression pattern of CsC4Hb, which belongs to Class I, in different tissues is consistent with the flavonoid accumulation pattern [19], indicating that CsC4Hb is involved in flavonoid biosynthesis in tea plants.
Phenylpropanoid compounds can be induced by various biotic and abiotic stresses [25]. C4H is induced by light and UVB, for example in Arabidopsis, Dryopteris fragrans, and Salvia miltiorrhiza [33][34][35]; by wounding [36]; by NaCl [37]; by cold, H2O2, ABA, and SA, for example in Carthamus tinctorius and kenaf [38,39]; by pathogen attack, for example viral infection in cucumber and melon plants [40]; by drought [41]; and by elicitors [42]. Our work showed that CsC4Ha in Class I and CsC4Hc in Class II were clearly induced by SA. This suggests that these two genes are involved in the defense of tea plants.

Conclusions
We cloned three CsC4H transcripts, and the enzymatic activity of the encoded proteins was characterized in vitro. The amino acid sequence alignment of the CsC4H proteins, together with the expression patterns of the CsC4H genes in leaves at different developmental stages and under abiotic stress treatments, suggests that they may have different subcellular localizations and different physiological functions. Further work is under way.