Exploring the RNA Editing Events and Their Potential Regulatory Roles in Tea Plant (Camellia sinensis L.)

RNA editing is a post-transcriptional modification process that alters the RNA sequence relative to the genomic blueprint. In plant organelles (namely, mitochondria and chloroplasts), the most common type is C-to-U, and the absence of C-to-U RNA editing results in abnormal plant development, such as etiolation and albino leaves, aborted embryonic development and retarded seedling growth. Here, through PREP, RES-Scanner, PCR and RT-PCR analyses, 38 and 139 RNA editing sites were identified from the chloroplast and mitochondrial genomes of Camellia sinensis, respectively. Analysis of the base preference around the RNA editing sites showed that in the −1 position of the edited C had more frequent occurrences of T whereas rare occurrences of G. Three conserved motifs were identified at 25 bases upstream of the RNA editing site. Structural analyses indicated that the RNA secondary structure of 32 genes, protein secondary structure of 37 genes and the three-dimensional structure of 5 proteins were altered due to RNA editing. The editing level analysis of matK and ndhD in six tea cultivars indicated that matK-701 might be involved in the color change of tea leaves. Furthermore, 218 PLS-CsPPR proteins were predicted to interact with the identified RNA editing sites. In conclusion, this study provides comprehensive insight into RNA editing events, which will facilitate further study of the RNA editing phenomenon of the tea plant.


Introduction
RNA editing is a post-transcriptional process that modifies the nucleotide sequence of RNA [1]. During editing events, information from genomic DNA (gDNA) to the RNA and protein is concomitantly trimmed [2]. RNA editing events can be divided into nucleotide conversion and nucleotide deletion/insertion. Nucleotide conversion has three main forms [2]: cytosine (C) to uracil (U), uracil (U) to cytosine (C) and adenosine (A) to inosine (I). In plants, C-to-U conversion is the most prevalent form [3], while U to C transitions have only been reported in a few species, such as ferns, lycophytes and hornworts [4]. A to I editing commonly occurs in introns, as well as in the 5 and 3 untranslated regions (UTRs) of RNA sequences in a few mammals [2,5]. Nucleotide deletion/insertion mainly occurs at U and guanosine (G) [2,6].
RNA editing of C-to-U was first reported in the mitochondria of flowering plants [7][8][9], followed by in plant chloroplasts [10]. C-to-U conversion seems to occur only in these two energy-producing organelles [11]. In addition to the coding sequence (CDS) of messenger RNA (mRNA), C-to-U editing events occur in ribosomal RNA (rRNA) and transfer RNA (tRNA), as well as in the intron sequences and 5 and 3 UTRs of mRNAs [2]. Most C-to-U editing sites are located at the first or second position of a codon and can alter the codons by introducing a new initiation (ACG to AUG) or stop codon (CGA to UGA, CAA to
To determine RNA editing events further, DNA-seq and RNA-seq data of LJ43 were applied to detect editing sites using RES-Scanner software (https://github.com/ZhangLabSZ/ RES-Scanner (accessed on 1 October 2021)) [38]. A total of 125 RNA editing sites were found in the tea chloroplast genome (Table S3), 65.6% (82/125) of which were successfully annotated with BEDtools software (version 2.29.0, https://bedtools.readthedocs.io/en/latest/, accessed on 1 October 2021). Except for two editing sites in tRNA, the other 80 editing sites were annotated in protein-coding regions. A total of 35 editing sites had two types of editing, while the remaining 45 sites had only one type of editing. In mitochondria, 352 RNA editing sites were identified (Table S3), among which 227 were successfully annotated. A total of 91.2% (207/227) of the successfully annotated editing sites were in protein coding regions, and the remaining 8.8% (20/227) were in rRNA.

Confirmation of RNA Editing Sites in Tea Chloroplasts and Mitochondria
To confirm the RNA editing events, regions covering the editing sites were PCRamplified using the gDNA and cDNA of LJ43 as templates. Through sequencing, 38 RNA editing sites from 22 chloroplast genes were identified and confirmed, all of which were of the C-to-U type. Among them, eight sites occurred in ndhB, five sites in ndhD, three sites in matK, and two sites in atpA, rps2 and rpoC2, while the remaining 16 chloroplast genes had only one editing site (Table 1 and Figure S1). For mitochondria, 139 RNA editing sites (Table 2 and Figure S2) were detected distributing in the transcript sequences of 22 mitochondrial genes (average of 6.3 sites per gene). Among these genes, ccmB contained the most editing sites (n = 34), followed by nad5 (n = 17), cob (n = 15), cox2 (n = 15), atp4 (n = 11), atp1 (n = 6), rpl5 (n = 6), matR (n = 4), rpl10 (n = 4) and rps12 (n = 4).     We also investigated the effects of the editing events on the codons. Of the 177 sites, 57 (

Sequence Features around RNA Editing Sites
A previous study found that the −1 position of the edited C is always T [18]. To explore the base preference around the above-described editing sites, adjacent sequences were analyzed. As shown in Figure 3A, the −5 (50.8%), −2 (48%), −1 (66.1%) and +4 (40.6%) positions upstream and downstream of the edited C frequently tended to be T. Further analysis revealed three conserved motifs in the upstream regions of 37 RNA editing sites ( Figure 3B). Motifs 1 and 3 were observed upstream of six and seven RNA editing sites, respectively. The overall E-value cut-off for Motif 1 was 4.3 × 10 −4 , whereas that for Motif 3 was 1.6 × 10. Motif 2 was discovered 25 bp upstream of 23 RNA editing sites, and its overall E-value cut-off was 5.5 × 10 −4 . These three conserved motifs imply potential roles in the recognition of editing sites by RNA editing factors. We also investigated the effects of the editing events on the codons. Of the 177 sites, 57 ( (Table 1 and Table  2).

Sequence Features around RNA Editing Sites
A previous study found that the −1 position of the edited C is always T [18]. To explore the base preference around the above-described editing sites, adjacent sequences were analyzed. As shown in Figure 3A, the −5 (50.8%), −2 (48%), −1 (66.1%) and +4 (40.6%) positions upstream and downstream of the edited C frequently tended to be T. Further analysis revealed three conserved motifs in the upstream regions of 37 RNA editing sites ( Figure 3B). Motifs 1 and 3 were observed upstream of six and seven RNA editing sites, respectively. The overall E-value cut-off for Motif 1 was 4.3 × 10 −4 , whereas that for Motif 3 was 1.6 × 10. Motif 2 was discovered 25 bp upstream of 23 RNA editing sites, and its overall E-value cut-off was 5.5 × 10 −4 . These three conserved motifs imply potential roles in the recognition of editing sites by RNA editing factors.

Impact of RNA Editing on the Subsequent Interpretation of Genetic Information
To understand the effect of RNA editing on the target sequences, protein transmembrane domains, RNA secondary structures, protein secondary structures and partial protein 3-D structures of genes containing RNA editing sites were predicted before and after RNA editing. Although the transmembrane domains of all proteins were not altered by RNA editing (Table S4), the protein secondary structure of 16 chloroplast proteins (atpA, atpB, psbE, rpoA, psbZ, ndhF, ndhD, ndhB, ndhC, ndhH, ndhA, rps18, rps2, matK, rpoC1 and rpoC2) and 21 mitochondrial proteins except nad41 were changed, as were the RNA secondary structure of 11 chloroplasts genes (psbE, rpoA, psbZ, ndhF, ndhD, ndhB, ndhC, ndhH, ndhA, rps18 and rpoC2) and all mitochondrial genes except rps7. By contrast, neither the RNA nor protein secondary structures of psbF, atpF, rps8, ndhK and psaI were affected by RNA editing. In addition, 3-D models of six proteins with at least eight editing sites-ndhB, nad5, atp4, cox2, cob and ccmB-were constructed using the SWISS-MODEL. Although eight RNA editing events occurred in ndhB ( Figure S3A,B), the 3-D structure of its protein product was not affected. In mitochondria, the 3-D protein structure of atp4 also did not change significantly ( Figure S3C,D). However, RNA editing of cox2 introduced a DINUCLEAR COPPER ION monomer ( Figure 4A,B). In cob, two PROTOPORPHYRIN IX CONTAINING FE monomers were generated by RNA editing ( Figure 4C,D), and there were only five α-helices before ccmB editing, compared to nine α-helices and two β-sheets thereafter ( Figure 4E,F). nad5 gained one α-helix but lost two β-sheets after RNA editing ( Figure 4G,H). These results imply that the function of organelle proteins might be affected by RNA editing sites.

Impact of RNA Editing on the Subsequent Interpretation of Genetic Information
To understand the effect of RNA editing on the target sequences, protein transmembrane domains, RNA secondary structures, protein secondary structures and partial protein 3-D structures of genes containing RNA editing sites were predicted before and after RNA editing. Although the transmembrane domains of all proteins were not altered by RNA editing (Table S4), the protein secondary structure of 16 chloroplast proteins (atpA, atpB, psbE, rpoA, psbZ, ndhF, ndhD, ndhB, ndhC, ndhH, ndhA, rps18, rps2, matK, rpoC1 and rpoC2) and 21 mitochondrial proteins except nad41 were changed, as were the RNA secondary structure of 11 chloroplasts genes (psbE, rpoA, psbZ, ndhF, ndhD, ndhB, ndhC, ndhH, ndhA, rps18 and rpoC2) and all mitochondrial genes except rps7. By contrast, neither the RNA nor protein secondary structures of psbF, atpF, rps8, ndhK and psaI were affected by RNA editing. In addition, 3-D models of six proteins with at least eight editing sites-ndhB, nad5, atp4, cox2, cob and ccmB-were constructed using the SWISS-MODEL. Although eight RNA editing events occurred in ndhB ( Figure S3A,B), the 3-D structure of its protein product was not affected. In mitochondria, the 3-D protein structure of atp4 also did not change significantly ( Figure S3C,D). However, RNA editing of cox2 introduced a DINUCLEAR COPPER ION monomer ( Figure 4A,B). In cob, two PROTOPORPHYRIN IX CONTAINING FE monomers were generated by RNA editing ( Figure 4C,D), and there were only five α-helices before ccmB editing, compared to nine α-helices and two β-sheets thereafter ( Figure 4E,F). nad5 gained one α-helix but lost two β-sheets after RNA editing ( Figure 4G,H). These results imply that the function of organelle proteins might be affected by RNA editing sites.

Relationship between RNA Editing and Etiolation and Albino Tea Plants
Previous reports found that RNA editing events were associated with leaf color changes in plants [12,30]. To explore the potential relationship between RNA editing and

Relationship between RNA Editing and Etiolation and Albino Tea Plants
Previous reports found that RNA editing events were associated with leaf color changes in plants [12,30]. To explore the potential relationship between RNA editing and tea leaf color changes, editing levels of matK-445 (position 445 of matK), matK-701, ndhD-674 and ndhD-1310 sites in different cultivars (including LJ43 and SC1 with normal green leaves, HJYA and ZH3 with etiolation leaves and HB1 and BY1 with albino leaves) were analyzed. As shown in Figure 5, the ratio of the T peak to the sum of the C and T peak heights in the sequencing chromatogram represents the level of editing in the individual transcripts. The editing level of matK-445 was over 90% in SC1 and HB1, around 65% in BY1 and slightly more than 50% in LJ43, HJYA and ZH3. The editing extent of matK-701 was approximately 30% in the albino cultivars BY1 and HB1, and about 30% and 45% in etiolation cultivars HJYA and ZH3, respectively, whereas it exceeded 95% in green varieties LJ43 and SC1. In LJ43, SC1 and HB1, more than 80% of ndhD-674 was edited, whereas about half C was edited in ZH3 and around 30% in BY1 and HJYA. The extent of editing of ndhD-1310 was completely edited in LJ43,~80% in SC1 and HB1,~50% in ZH3,~40% in HJYA and <10% in BY1. These results indicated that the levels of RNA editing might have some physiological connection with the color change of tea leaves. tea leaf color changes, editing levels of matK-445 (position 445 of matK), matK-701, ndhD-674 and ndhD-1310 sites in different cultivars (including LJ43 and SC1 with normal green leaves, HJYA and ZH3 with etiolation leaves and HB1 and BY1 with albino leaves) were analyzed. As shown in Figure 5, the ratio of the T peak to the sum of the C and T peak heights in the sequencing chromatogram represents the level of editing in the individual transcripts. The editing level of matK-445 was over 90% in SC1 and HB1, around 65% in BY1 and slightly more than 50% in LJ43, HJYA and ZH3. The editing extent of matK-701 was approximately 30% in the albino cultivars BY1 and HB1, and about 30% and 45% in etiolation cultivars HJYA and ZH3, respectively, whereas it exceeded 95% in green varieties LJ43 and SC1. In LJ43, SC1 and HB1, more than 80% of ndhD-674 was edited, whereas about half C was edited in ZH3 and around 30% in BY1 and HJYA. The extent of editing of ndhD-1310 was completely edited in LJ43, ~80% in SC1 and HB1, ~50% in ZH3, ~40% in HJYA and <10% in BY1. These results indicated that the levels of RNA editing might have some physiological connection with the color change of tea leaves.

Interaction Prediction of PLS-CsPPR and the Target Sequences of RNA Editing Sites
Numerous studies indicate that PLS-PPR proteins are important RNA editing factors [2,3], and 295 PLS-CsPPR proteins were found in tea plants [36]. Therefore, we used the method of Kobayashi et al. (see material and methods) to analyze the possible interactions between the PLS-CsPPR proteins and target sequences containing validated RNA editing sites. As shown in Table S6, 159 of the 177 target sequences were predicted to interact with 218 PLS-CsPPR proteins. Among them, 36.7% of PLS-CsPPR proteins had only one target sequence; 25.2% had two target sequences; 19.7% had three target sequences; 8.3% had four target sequences; and 10% had ≥5 target sequences. The potential interactions between PLS-CsPPR proteins and target sequences implied that the PLS-CsPPR proteins might be involved in the recognition of these RNA editing sites.

Discussion
RNA editing seemed to be a challenge to the central dogma of molecular biology at the transcriptional level, and has received increasing attention [2,3]. In recent years, a large number of RNA editing factors were reported to be involved in the C-to-U deamination reaction [39][40][41], and many studies attempted to use expressed sequence tags (EST) sequences [42] or high-throughput sequencing technology to identify additional RNA editing sites in plants [43,44]. Moreover, in order to distinguish the changed sites due to heteroplasmy or the RNA editing events, total RNA-seq as well as DNA-seq was used for analysis [45], but the heteroplasmy could not be excluded in this work.
In this study, we used two software packages to investigate RNA editing sites. PREP predicted 52 and 337 RNA editing sites in the tea plant chloroplast and mitochondrial genomes, respectively (Figure 2), while RES-Scanner software (https://github.com/ZhangLabSZ/RES-Scanner (accessed on 1 October 2021)) identified 125 and 300 sites, respectively. One reason for the different numbers of editing sites was that PREP could only predict RNA editing sites in the protein-coding regions of 35 chloroplast genes and 43 mitochondrial genes; another was that RES-Scanner could cause read mismatches, resulting in false-positive editing sites. Therefore, PCR amplification with gDNA and cDNA templates was performed to verify the combined predicted results. In total, 38 and 139 C-to-U RNA editing events were found in the protein-coding regions of chloroplast and mitochondrial genes, respectively. Similar to previous studies [2,3], the C-to-U type is the main form of RNA editing in tea plants. Although this method is widely used to identify RNA editing sites, we cannot exclude that these editing sites are caused by heterogeneity [45].
Previous studies suggested that RNA editing events mostly occurred at the first or second position of a codon [2,11]. However, we found four RNA editing sites located at the third position of the codon, similar to Sweet Sorghum [46]. A study in Arabidopsis noted a higher editing level of base C when the adjacent −1 position was T, whereas a lower editing level was seen when the −1 position was G [18]. In the −1 position of the validated RNA editing sites in this study, T was more frequent (Figure 3A), similar to A. thaliana. Generally, 20-25 nucleotides upstream of the RNA editing site are involved in the binding of editing factors [47]. Consistent with previous results for S. miltiorrhiza [21], analysis with MEME software (http://meme-suite.org/tools/meme (accessed on 19 March 2022)) found three conserved motifs (with lengths of 15, 15 and 11 nucleotides) in tea plants ( Figure 3B). These motifs might be involved in the recognition of editing sites by RNA editing factors.
RNA editing changes the nucleotide at the corresponding position [2,11] and alters the transmembrane domain and number of alpha helices. However, we found no changes in the protein transmembrane domain before and after RNA editing in tea plants, possibly because RNA editing events rarely or never occurred in the transmembrane region. Similar to previous findings [18], our prediction data showed that the RNA secondary structure of 32 genes and protein secondary structures of 37 genes were affected by RNA editing. In addition, 3-D structure analysis showed that the 3-D structures of five mitochondrial proteins (nad5, atp4, cox2, cob and ccmB) were altered after RNA editing. Abnormal editing of mitochondrial genes can lead to a pale green leaf phenotype [46]. Therefore, RNA editing seems to be tightly associated with the color change of plant leaves.
To date, a total of five types of RNA editing factors were discovered, among which the PLS-PPR protein is considered to be the most important and has been extensively studied [2,3]. The second, fifth, and last positions of the PPR motif play an important role in the recognition of RNA bases by the PPR motif [23,48] and can help the PPR motif recognize RNA bases [49,50]. However, a recent study noted that using the second, fifth and last amino acids of the PPR motif to predict downstream target sequences containing RNA editing sites would obtain more accurate results [51]. An absence of RNA editing can cause albino leaves and etiolation in some Arabidopsis and rice mutants, such as atclb19, osppr6, osdua1 and atgun1 [12,13,30,52], and many albino and etiolation varieties of tea plants have been bred [32], such as Baiye 1, Huabai 1, Huangjinya, etc. However, whether the extent of RNA editing is affected by the color change of tea leaves remains unclear. We investigated the extent of editing of four RNA editing sites in six varieties. Most notably, matK-701 was edited more than 90% in green varieties (SC1 and LJ43), whereas it was about 30% in albino varieties (BY1 and HB1) and no more than 50% in etiolation varieties (HJYA and ZH3). A previous study found that OTP81 (QED1) plays a very important role in the editing of matK-706 [53], and it was found to be associated with the color change of plant leaves [52,54]. Perhaps a similar gene exists in the tea plants; it is our next research focus.

Identification of RNA Editing Sites in Tea Chloroplasts and Mitochondria
The RNA editing sites were identified using the Predictive RNA Editor for Plants suite (PREP-suite, http://prep.unl.edu (accessed on 11 November 2021)) [37]. DNA-seq and RNA-seq of LJ43 were used to determine RNA editing sites further. Firstly, the trimmomatic (Version 0.39) [56] was used to filter low-quality reads in DNA-seq and RNA-seq with the following parameters: LEADING and TRAILING: 3, MINLEN: 51. Then, the FastQC (Version 0.11.9, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (accessed on 1 October 2021)) was used to check the quality of the reads. Finally, the RES-Scanner [38] was used to identify RNA editing sites using the chloroplast genome of LJ43 and the mitochondrial genome of C. sinensis var. Assamica as reference sequences. After that, RNA editing sites were annotated using bedtools [57] according to gene features.

Validation of Predicted RNA Editing Sites
The tea leaves were divided into two parts along the main vein of the leaves. Half of them were used to extract gDNA (RNA free) using a Plant Genomic DNA Isolation Kit (Tsingke, Beijing, China). The total RNA was extracted from the other half of the samples using a Plant Total RNA Isolation Kit (Vazyme, Nanjing, China). A total of 1% agarose gel electrophoresis was used to check the integrity of total RNA. The concentration of total RNA was measured with a NanoDrop ND 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). A total of 1-2 µg of the total RNA was used to synthesize the first-strand complementary DNA (cDNA) with HiScript Q RT SuperMix for qPCR (+gDNA wiper) (Vazyme, Nanjing, China). Specific primers were used to amplify genes containing RNA editing sites using the gDNA and cDNA as templates, respectively (Table S1).

Contextual and Potential Motifs Analysis around RNA Editing Sites
To study the upstream and downstream bases' preference of RNA editing sites, 10 bases (except position 0) from the position −5 to +5 of the editing site were extracted. In addition, the upstream 25 bases of the RNA editing site were used to discover potential binding motifs for RNA editing factors using Multiple Em for Motif Elicitation [58] (MEME, http://meme-suite.org/tools/meme (accessed on 19 March 2022)).

Interaction Analysis between PLS-CsPPR Proteins and Target Sequence Containing RNA Editing Sites
Interaction analysis between the PLS-CsPPR protein and their target sequence was performed according to the method of Kobayashi et al. [49] with minor modifications. In brief, the PPR motifs of PLS-CsPPR were obtained from our previous study [36]; the second, fifth and last amino acid residues of these PPR motifs were extracted to form 3-, 2-and 1-letter codes (Table S5). The PPR code dataset of Kobayashi et al. [49] was used to compose the base preference matrix of PLS-CsPPR proteins. The 51 bases surrounding the RNA editing site (position −25 to position +25) were used as target sequences. The PPR matrices and target sequences as queries to the Find Individual Motif Occurrences [63] (FIMO, https://meme-suite.org/meme/tools/fimo (accessed on 13 April 2022)) program in the MEME suite software (https://meme-suite.org/meme (accessed on 13 April 2022)) in the MEME suite.

Conclusions
RNA editing is an important post-transcriptional process, which alters the nucleotide sequences of RNA, such that the genetic information and phenotype of plants are changed. PPR proteins widely exist in plants and are dominantly localized in plastids and mitochondria, which play essential roles in organellar RNA metabolism. This study systematically analyzed RNA editing events in tea chloroplast and mitochondria genomes; 38 and 139 RNA editing events were validated by PCR and sequence analysis. Analysis of the base preference around the RNA editing sites showed that in the −1 position, C to T were more frequent occurrences. Structural analyses indicated that RNA the secondary structure of 32 genes, protein secondary structure of 37 genes and the 3-D structure of 5 proteins were altered due to RNA editing. The editing level analysis of 4 RNA editing sites (matK-445, matK-701, ndhD-674 and ndhD-1310) in 6 cultivars (2 green varieties, 2 albino varieties and 2 etiolation varieties) indicated that matK-701 might be involved in the color change of tea leaves. Furthermore, the interaction between 159 target sequences and 218 PLS-PPR proteins were predicted through PPR-RNA code. These data provide new insights into of RNA editing phenomenon of the tea plant, which will facilitate further study of albino and etiolation in tea leaves.