Comparative Phylogenetic Analysis of Ancient Korean Tea ‘Hadong Cheon-Nyeon Cha (Camellia sinensis var. sinensis)’ Using Complete Chloroplast Genome Sequences

Wild teas are valuable genetic resources for studying evolution and breeding. Here, we report the complete chloroplast genome of the ancient Korean tea ‘Hadong Cheon-nyeon Cha’ (C. sinensis var. sinensis), which is known as the oldest tea tree in Korea. We determined the chloroplast genome sequences of seven Camellia sinensis var. sinensis cultivars, including Hadong Cheon-nyeon Cha (HCNC), using Illumina sequencing technology and de novo assembly. The chloroplast genome sizes ranged from 157,019 to 157,114 bp, and the genomes showed the quadripartite organization typical of chloroplast genomes. Further, differences in SNPs and InDels were detected across the seven chloroplast genomes through variant analysis. Principal component and phylogenetic analyses suggested that regional constraints, rather than functional constraints, strongly affected the sequence evolution of the cp genomes in this study. These genomic resources provide evolutionary insight into Korean tea plant cultivars and lay the foundation for a better understanding of the ancient Korean tea plant HCNC.


Introduction
Tea (an extract from the leaves of the tea plant Camellia sinensis (L.) O. Kuntze) is one of the most popular beverages worldwide, as it features attractive flavors and provides substantial economic value. The demand for tea has recently increased owing to its characteristic secondary metabolites, such as catechins, polyphenols, and caffeine, which have numerous human health benefits [1]. Based on morphological and physiological characteristics, the three main natural hybrids that make up the cultivated taxa of tea are C. sinensis (L.) O. Kuntze (also named China type), C. assamica (Masters) (also named Assam type), and C. assamica subsp. lasiocalyx (Planchon ex Watt.) (also called Cambod or Southern type) [2,3]. In Asia, tea plants have been grown for thousands of years. The tea plant is said to have originated in the Assam region of India and the Yunnan Province of China [4].
The origin of Korean tea, however, is not clear. Many Korean historical accounts state that tea seeds were first brought from China to Korea in the early 9th century, although widespread tea cultivation did not begin until the 12th century [5]. Whether wild tea plants are native to Korea is still debated. Two types of green tea have been produced in Korea: the native variety, which grows naturally in the area surrounding Mt. Jiri, and the cultivated type, whose breeding lines come from China and Japan [5]. Korean tea is mainly produced in Gyeongsang Province, Jeolla Province, and Jeju Island. Among these regions, the tea cultivation areas near Mt. Jiri in Hadong, South Gyeongsang Province, and in Boseong, South Jeolla Province, are known to be the most prominent. Because the genetic background of the wild tea population flourishing in the Hadong region remains open to discussion, a previous study compared and analyzed the genetic resources of the wild green tea population. That study reported that the cultivated and wild populations of green tea in China and Japan originated from sources distinct from the cultivated variety [6]. The wild tea tree of the Hadong region used in that study, named 'Hadong Cheonnyeon Cha' ('Cheonnyeon' means thousand years, while 'Cha' means tea in Korean), is known to be the oldest tea tree in Korea and is estimated to be over 800 years old. Due to its high historical value, the 'Hadong Cheonnyeon Cha (HCNC)' population is managed and protected in Korea. Moreover, the genetic resources found in the native tea plants remain crucial for the breeding of green tea.
Functional genomics is now essential to understanding tea plant biology thanks to the development of sequencing technology [7,8]. Advances in next-generation sequencing, and especially third-generation sequencing technology that produces reads longer than 10 kb, have recently led to the decoding of several cp genomes [9]. Because of this, phylogenetic analyses based on cp genome data are becoming more widely accepted, even within small taxonomic groups [10]. Chloroplast DNA, also called the chloroplast genome, is commonly abbreviated as cp DNA [11]. According to Jansen et al. [12] and Palmer [13], it has a quadripartite structure composed of a large single copy (LSC) region, a small single copy (SSC) region, and two inverted repeats (IRs). Plant cells or leaves typically contain 400-1600 copies of the chloroplast genome [14]. Taxonomic and phylogenetic analyses of numerous species have profited from the chloroplast genome sequence [15]. Additionally, there is a significant difference in the molecular evolution rate between the coding and non-coding regions [13]. However, cp DNA does not undergo genetic recombination, is highly conserved [16], and exhibits maternal inheritance [17]. The regulation of chloroplast gene expression is affected by environmental variables and the chloroplast's developmental program and occurs at multiple stages, namely transcription, post-transcription, translation, and post-translation [18]. Most of this regulation occurs at the post-transcriptional stage [18]. As a result, the plant chloroplast genome offers several benefits for illuminating species relationships and yields a wealth of crucial information regarding chloroplast genetic change [19]. Because cp genomes have a low mutation rate and limited recombination, and their gene content and structure are highly conserved, they are dependable sources of information for determining phylogeny and evolutionary history. Several cp genomes have recently been used extensively for phylogenetic reconstruction, particularly in the genus Camellia. Liu et al. [20] identified five SSR markers as a core marker set suggested for fingerprinting tea plant cultivars or accessions. Whole-genome sequencing and resequencing are now also valuable techniques for determining the differences between tea varieties, mining functional genes, and researching the origins of tea plants [21]. Lee et al. [22] completed the chloroplast genome sequence of the Korean C. sinensis cultivar Sangmok. According to their phylogenetic analysis, the C. sinensis L. cultivar Sangmok is closely related to Camellia pubicosta (KJ806277) [22]. By combining RNA-Seq data from 217 tea accessions with a high-quality chromosome-scale reference genome for an ancient tea tree (DASZ), Zhang et al. [23] clarified the lineage of tea cultivars and identified the major players in the breeding of Chinese tea. Furthermore, according to whole-genome resequencing of 139 tea accessions worldwide, the C. sinensis var. sinensis (CSS) populations underwent stronger selection for flavor and disease resistance during domestication than the C. sinensis var. assamica (CSA) populations [21].
In this study, we completed the cp genome sequences of seven C. sinensis var. sinensis cultivars to investigate the origin of HCNC and to compare it genetically with other C. sinensis cultivars. Our genetic analyses suggested that HCNC might have evolved differently from the Chinese or the Japanese C. sinensis cultivars.

Plant Materials and DNA Sequencing
Fresh leaves of seven specimens of C. sinensis cultivars (four Korean, one Chinese, and two Japanese) were collected in the experimental field of the Institute of Hadong Green Tea in Korea. The leaves were immediately preserved in liquid nitrogen until DNA extraction. Total genomic DNA was extracted with the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA), following the manufacturer's instructions. The final DNA concentration was measured using a NanoDrop spectrophotometer and a Qubit fluorometer. Genome libraries (631 bp) were constructed using the Covaris S series (Covaris, MA, USA), following the manufacturer's instructions. After purification, the extracted DNA was used to generate paired-end sequencing libraries according to the Illumina standard protocol (Illumina, San Diego, CA, USA). Genome sequencing was carried out on the Illumina HiSeq 2500 platform, following the manufacturer's protocol (Illumina, San Diego, CA, USA). After sequencing and data treatment, 157,511,170,132 to 196,874,632,856 bp of clean data were retrieved for the seven C. sinensis samples.

Genome Assembly
All raw reads were quality-checked with FastQC and trimmed with Sickle. Next, we performed genome assembly using Trimmomatic 0.38, BWA (v0.6.1), Picard 2.9.0, GATK (v4.1.3.0), and SnpEff v4.3t. Low-quality nucleotides and adapter sequences in each read were removed using Trimmomatic. The draft genome sequence of the tea plant (Camellia sinensis), downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/sra, accessed on 15 October 2023), was used as the reference genome. Paired-end reads were mapped to the tea reference genome with BWA (v0.6.1) using the default parameters. SAMtools (v1.10.2) was used to convert the mapping results into BAM format and to filter unmapped and non-unique reads. Duplicated reads were removed with Picard 2.9.0, and the variants in each sample were then called using GATK HaplotypeCaller. After alignment, SNP calling was conducted per individual using SAMtools. The coverageBed program in BEDtools (v2.17.0) was used to calculate the coverage of the sequence alignments.
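The coverage calculation can be illustrated with a small sketch. The code below is not BEDtools and does not reproduce its output format; it is a hypothetical pure-Python stand-in, assuming mapped reads are given as 0-based, end-exclusive (start, end) intervals on a single reference sequence. The function names and the toy intervals are illustrative only.

```python
def per_base_depth(intervals, region_length):
    """Return a list with the number of reads covering each position.

    intervals: iterable of (start, end) pairs, 0-based, end-exclusive.
    region_length: length of the reference region in bases.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (region_length + 1)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum of the difference array gives the depth at each base.
    depth = []
    running = 0
    for delta in diff[:region_length]:
        running += delta
        depth.append(running)
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of positions covered by at least min_depth reads."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= min_depth) / len(depth)


# Toy intervals, not data from this study.
toy_reads = [(0, 5), (3, 8), (7, 10)]
toy_depth = per_base_depth(toy_reads, 10)
```

The difference-array approach keeps the cost linear in the number of reads plus the region length, which matters little for a chloroplast-sized region but avoids a nested loop over reads and positions.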

Functional Annotation of Genetic Variants
SNP annotation was conducted based on the draft genome of C. sinensis using the SnpEff program. According to the annotation information, SNPs were distributed in upstream regions, downstream regions, intergenic regions, exonic regions, 3′ UTRs, 5′ UTRs, and splicing sites (which were distributed in 1 kb regions away from the transcription start site). Moreover, SNPs in exonic regions were further divided into synonymous SNPs (sSNPs) and non-synonymous SNPs (nsSNPs). The workflow of the annotation procedure is described in Figure 1. Phylogenetic reconstruction based on the complete cp genome data used two methods, neighbor joining and maximum likelihood (ML), with MAFFT (v7.123b) and MUSCLE (v3.8.31) for alignment, Gblocks (0.91b) and trimAl v1.4 for alignment trimming, and FastTree (2.1.11) for tree inference. PCA was also used to evaluate the genetic differentiation of the seven C. sinensis populations: SNP data of the seven cp genomes sequenced in this study were analyzed with GCTA v1.25.2, and the first two components were plotted. Plots were produced in R (v3.5.3) with the RColorBrewer, ggplot2, and phylogram packages.
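The split of exonic SNPs into synonymous and non-synonymous classes can be sketched as follows. This is not SnpEff's implementation; it is a minimal illustration that assumes the reference codon and the position of the SNP within it are already known and that the standard codon assignments apply. The function name `classify_snp` and the example codons are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard codon assignments).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon, pos, alt):
    """Classify a single-base change inside one codon.

    codon: reference codon, e.g. "GAA".
    pos:   0-based position of the SNP within the codon (0, 1, or 2).
    alt:   alternative base at that position.
    """
    codon = codon.upper()
    alt = alt.upper()
    if codon not in CODON_TABLE or pos not in (0, 1, 2) or alt not in BASES:
        raise ValueError("expected a DNA codon, a position 0-2, and a DNA base")
    if codon[pos] == alt:
        raise ValueError("alternative base equals the reference base")
    mutated = codon[:pos] + alt + codon[pos + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop) before and after
    if alt_aa == "*":
        return "nonsense"     # non-synonymous: introduces a stop codon
    if ref_aa == "*":
        return "stop_lost"    # non-synonymous: removes a stop codon
    return "missense"         # non-synonymous: changes the amino acid
```

For example, `classify_snp("CTT", 2, "C")` leaves leucine unchanged, while `classify_snp("GAA", 0, "T")` turns a glutamate codon into a stop codon.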

Comparative Analysis and Gene Prediction
The chloroplast genome size and organization were compared, and the differences in the IR borders of the seven C. sinensis chloroplast genomes were analyzed. First, repeats were masked with RepeatMasker (version open 4.0.7) and RepeatModeler (version open 1.0.11). Gene prediction combined three lines of evidence: ab initio prediction, transcript alignments, and related protein alignments. Ab initio prediction used Augustus (v3.3) and GlimmerHMM (v3.0.4). PASA (PASApipeline 2.2.0) was used for transcript alignments. Related protein alignments were performed with Exonerate (v2.2.0, x86_64) and GenomeThreader (gth 1.6.6). Consensus gene structures were built with EVidenceModeler (v1.1.1). Finally, UTRs were annotated with PASA (PASApipeline 2.2.0).

Principal Component Analysis (PCA) and Phylogenetic Analysis
Twenty-two complete chloroplast genome sequences were used in the phylogenetic analysis, including the seven C. sinensis genomes sequenced here and fifteen completed genomes of C. sinensis from GenBank. All chloroplast genome sequences were aligned using MAFFT v7.123b and adjusted manually as needed. Circular gene maps of the seven complete C. sinensis chloroplast genomes were drawn using OrganellarGenomeDRAW v. 1.3.1 (OGDRAW) [24]. Phylogenetic reconstruction used four methods: neighbor joining (NJ), maximum likelihood (ML), UPGMA, and minimum evolution (ME). Phylogenetic trees of the seven C. sinensis cultivars were constructed with MEGA-X (Version 10.0.5) using these four methods.
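Distance-based methods such as NJ, UPGMA, and ME start from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the uncorrected p-distance (the proportion of differing sites), and is only an illustration: it is not MEGA-X code, the substitution model actually used is not stated in the text, and the sequences are toy strings rather than cp genome data.

```python
def p_distance(seq_a, seq_b, skip_chars="-N?"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """Return (names, matrix) for a dict mapping name -> aligned sequence."""
    names = list(alignment)
    matrix = [
        [0.0 if x == y else p_distance(alignment[x], alignment[y]) for y in names]
        for x in names
    ]
    return names, matrix


# Toy alignment with hypothetical labels, not sequences from this study.
toy_alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTACGA",
    "taxon_C": "AC-TTCGA",
}
toy_names, toy_matrix = distance_matrix(toy_alignment)
```

Pairwise deletion keeps as many sites as possible per comparison; complete deletion (dropping any column with a gap in any sequence) is the common alternative and would need only a pre-filtering step.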

Chloroplast Genome Sequencing and Assembly
The chloroplast genomes of seven specimens of the C. sinensis cultivars were sequenced using the Illumina HiSeq 2500 system, producing clean data ranging from 157 to 196 giga base pairs (Gbp) (Table 1). After filtering, the clean reads (98.5%) were each mapped to the complete genome. The genome of HCNC was produced using 162 Gbp of clean reads, representing a total of 927 Gbp of paired-end reads (Table 1). The mapped reads of the seven C. sinensis samples ranged from 888,862,712 to 1,109,091,777 bp (Table 1). More than 24.3 billion bases from high-throughput sequencing (Q20 of 99.99%), along with the chloroplast genomes of the seven C. sinensis samples, were assembled according to the "depth range" (≥510) and aligned to the reference GCF_004153795.1.
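The Q20 statistic quoted above is the fraction of sequenced bases with a Phred quality score of at least 20. A minimal sketch of that calculation follows, assuming FASTQ quality strings in Phred+33 encoding; the function name and the toy quality strings are hypothetical and are not taken from this study's data.

```python
def q20_fraction(quality_strings, threshold=20, offset=33):
    """Fraction of bases whose Phred score is >= threshold.

    quality_strings: iterable of FASTQ quality lines (one per read).
    offset: ASCII offset of the encoding (33 is assumed here).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            # Phred score = ASCII code of the character minus the offset.
            if ord(ch) - offset >= threshold:
                passing += 1
    if total == 0:
        raise ValueError("no bases to evaluate")
    return passing / total


# Toy quality strings: 'I' is Q40, '5' is Q20, '4' is Q19, '#' is Q2.
toy_qualities = ["II5", "4#"]
toy_q20 = q20_fraction(toy_qualities)
```

A Phred score of 20 corresponds to an estimated base-calling error probability of 1 in 100, so Q20 summarizes how much of the data meets that accuracy level.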

Chloroplast Genome Features of HCNC
The total length of the seven C. sinensis chloroplast genomes ranged from 157,012 bp to 157,104 bp (Table 1, Figures 2 and 3). All these chloroplast genomes exhibited the typical quadripartite structure, consisting of a pair of IRs separated by the LSC and SSC regions (Figure 3). The entire chloroplast genome of HCNC has four standard structural regions: a large single copy (LSC) region, a small single copy (SSC) region, and two inverted repeat regions (IRa and IRb) (Figure 2). The total GC content of the HCNC cp genome was 38.3%. The chloroplast genomic characteristics of HCNC were compared with those of the six other C. sinensis cultivars (Figure 3). The whole genome length of HCNC was not much different from that of the other C. sinensis cultivars. Among them, the whole chloroplast genome sizes of HCNC and the Chinese wild type were the closest (Figures 2 and 3).
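Two of the summary quantities above follow from simple arithmetic: GC content is the share of G and C among the unambiguous bases, and the total length of a quadripartite genome is LSC + SSC + 2 × IR. The sketch below illustrates both; the function names are hypothetical, and the region lengths in the usage line are placeholders, since the LSC, SSC, and IR lengths of HCNC are not given in the text.

```python
def gc_content(sequence):
    """Percentage of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def quadripartite_length(lsc, ssc, ir):
    """Total genome length from region lengths; the IR is present twice."""
    return lsc + ssc + 2 * ir


# Placeholder values for illustration only, not measurements from this study.
toy_gc = gc_content("ATGCGCNN")
toy_total = quadripartite_length(lsc=10, ssc=4, ir=3)
```

Checking that the annotated region lengths add up to the assembled genome length is a quick sanity test for IR boundary annotation.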

Chloroplast Genome Sequence Variations in HCNC
Variations in the Chloroplast Genome of HCNC
The seven C. sinensis cp genome sequences were aligned to characterize their variations through variant analysis. As a result, alternative genotypes of InDels and SNPs were detected for each position (Tables 4 and 5). InDel variations were most numerous in the Chinese wild type, while HCNC had the largest number of homozygous variations. The SNP variation analyses revealed that SNPs were most frequent in the Chinese wild type, and HCNC has 74,542,948 SNP variations in the whole cp genome (Table 5). Table 6 and Figure 3 show a complete genome analysis, including the number and distribution of variations in each region. Apart from the intergenic region, the upstream region contained the most variations. Variations in the exon region accounted for 1.43% of the total (Table 6 and Figure 4).

Comparison of Chloroplast Genome Sequence Variations
The SNP analysis was performed to compare and analyze the genetic similarity between the seven C. sinensis cp genome sequences. There are two types of SNP in the exon region: synonymous SNPs and non-synonymous SNPs. Synonymous SNPs leave the encoded protein unchanged even though one base changes, whereas non-synonymous SNPs produce missense and nonsense mutations. Non-synonymous SNP analyses were performed in two ways: constructing a phylogenetic tree from the non-synonymous SNP data obtained in the variant annotation analysis, and principal component analysis (PCA). The phylogenetic tree was constructed by the neighbor-joining method (Figure 5). According to the findings, the seven cultivars form two distinct groups: three Korean cultivars clustered into one group, while the Chinese and Japanese cultivars clustered into another. The Hadong wild type was the first to be separated from the sister clade. Using SNP data, we performed PCA to evaluate the relationships between the seven C. sinensis cp genomes (Figure 6). The findings suggested substantial genetic diversity among the genomes. In the PCA plot, three Korean cultivars, Beachwisull, Keumsull, and Hadong wild type, were grouped according to their geographical origin. Interestingly, only HCNC was separated by a large distance from the other cultivars. The Japanese cultivar Saemidori lay in the opposite direction from HCNC. Fushun, the other Japanese cultivar, and the Chinese wild type were slightly separated from the Korean group.
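The PCA step can be sketched in a few lines. The code below is not GCTA and makes several assumptions: genotypes are coded numerically (here 0 for the reference allele and 1 for the alternative), each row is one sample, and the leading components are found by power iteration on the sample-by-sample Gram matrix of the centered data. All function names and the toy genotype matrix are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column (SNP) mean so every SNP has mean zero."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def gram_matrix(matrix):
    """Sample-by-sample matrix of dot products between centered rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def top_eigenpair(sym, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    n = len(sym)
    vec = [1.0 / (i + 1) for i in range(n)]  # fixed, non-constant start vector
    for _ in range(iterations):
        new = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in new]
    value = sum(vec[i] * sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n))
    return value, vec


def pca_scores(genotypes, n_components=2):
    """Return a list of components, each a list of per-sample scores."""
    gram = gram_matrix(center_columns(genotypes))
    n = len(gram)
    components = []
    for _ in range(n_components):
        value, vec = top_eigenpair(gram)
        scale = math.sqrt(max(value, 0.0))
        components.append([v * scale for v in vec])
        # Deflation: remove the component just found before the next search.
        gram = [[gram[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return components


# Toy matrix (4 samples x 4 SNPs), not data from this study.
toy_genotypes = [[0, 0, 0, 1], [0, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
pc1, pc2 = pca_scores(toy_genotypes)
```

In the toy matrix, the first two samples differ from the last two at three SNPs, so the first component separates those two pairs; the sign of each component is arbitrary, as in any PCA.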

Phylogenetic Analysis
Past research has shown that the chloroplast genome of terrestrial plants is a valuable source of information on relationships among related species and is widely applied in phylogenetic studies [25,26]. We aligned the chloroplast genomes of all seven cultivars, comprising three Korean cultivars, one Chinese cultivar, two Japanese cultivars, and Korean HCNC. The phylogenetic trees were constructed using the neighbor-joining, maximum likelihood, and UPGMA methods. First, we identified the relationships among the seven C. sinensis cultivars completed in this study (Figure 7a-c). In the neighbor-joining tree, HCNC was closely related to the Hadong wild type and the two Japanese cultivars. In the maximum likelihood analysis, HCNC was more closely associated with the Chinese wild type and two Korean cultivars than with the Japanese cultivars. The result of the UPGMA analysis was similar to that of the neighbor-joining method.
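Of the three tree-building methods named above, UPGMA is the simplest to write down: it repeatedly merges the two closest clusters and averages their distances to every other cluster, weighted by cluster size. The sketch below is a minimal pure-Python version that returns only the tree topology as nested tuples; it is not the MEGA-X implementation, and the taxon labels and distances are hypothetical.

```python
def upgma(names, dist):
    """Build a UPGMA topology from a symmetric distance matrix.

    names: list of taxon labels.
    dist:  square matrix (list of lists) of pairwise distances.
    Returns nested tuples, e.g. (("A", "B"), ("C", "D")).
    """
    # Each cluster id maps to (topology, number of taxa in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        (a, b), _ = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted average keeps all original taxa equally weighted.
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: value for pair, value in d.items()
             if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy distances with hypothetical labels, not values from this study.
toy_names = ["A", "B", "C", "D"]
toy_dist = [
    [0.0, 0.1, 0.8, 0.8],
    [0.1, 0.0, 0.8, 0.8],
    [0.8, 0.8, 0.0, 0.2],
    [0.8, 0.8, 0.2, 0.0],
]
toy_tree = upgma(toy_names, toy_dist)
```

UPGMA assumes a constant rate of evolution across lineages, whereas neighbor joining does not, which is one reason the two methods can place the same taxon differently.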

Discussion
It has been proposed that the cp genome can be helpful for low taxonomic-level phylogenetic reconstructions [25,27-29]. Using cp genome data, the evolutionary relationships between several species within Camellia (Theaceae) were clearly resolved [30,31].
In the present study, we constructed three phylogenetic trees from complete cp genomes to identify the relationships between HCNC and the other C. sinensis accessions among the 22 genomes analyzed, including those from GenBank. According to the phylogenetic relationships, the 22 C. sinensis accessions were mainly clustered into two clades with the neighbor-joining method (Figure 8a-c). HCNC grouped in the same clade as C. assamica cultivars from China and India (Figure 8a). In the minimum evolution method, HCNC tended to be closely related to the Chinese C. sinensis cultivar Anhua. However, it was difficult to confirm a close relationship because of the low bootstrap value (33%). Also, HCNC was more closely related to C. sinensis cultivars than to C. assamica cultivars. The Japanese cultivar Saemidori was also related to HCNC. This result is similar to the PCA result described above in the comparison of chloroplast genome sequence variations. Lastly, in the UPGMA tree, HCNC was closely related to C. sinensis cultivars. Likewise, HCNC tended to be closely associated with the Chinese C. sinensis cultivar Anhua. However, it was difficult to confirm a close relationship because of the low bootstrap value (54%).
Comparative analysis results indicate that the seven cp genome sequences of C. sinensis showed highly conserved genomic structures [26,32,33]. The ycf1 and infA genes were found to be pseudogenes in HCNC. Pseudogenization of ycf1 and variation in the locations of ycf1 copies are commonly found in other plants [34-36]. Although it was formerly believed that the pseudogene had lost its capacity to code for proteins, it is now understood to be an evolutionary remnant of the functional gene [37]. Comparative study results showed that, within cp genomes, CDS and IR areas were more conserved than IGS and SCs, respectively [32,38]. Because of their remarkable conservation, the IR regions help stabilize the chloroplast genome's structure. According to a comparison of the IR/SC boundary areas, the IR region may have expanded or contracted depending on the species [39]. Changes in cp genome length are frequently caused by the expansion and contraction of the IR regions. According to other studies, this conservation caused the IR regions to exhibit a lower degree of sequence divergence than the LSC and SSC areas in Camellia cp genomes. Numerous studies have examined species identification and molecular phylogeny at the interspecific level using the polymorphic non-coding regions of cp DNA [6]. Three variable areas that can be utilized for phylogenetic analysis and species identification have been identified: petA-psbJ, psbI-trnS, and ccsA-ndhD [6]. Chloroplast genomes have become popular in taxonomy research for assessing the relationships between closely related species [37,40]. For instance, the cp genomes of 35 species from the Ranunculaceae family, representing 31 genera, were sequenced and used to shed light on the long-standing systematic disputes within this family [35].
From our results and previous reports [41], we propose the following hypothesis. Considering that HCNC is close to the Chinese species, it may have originated in China and then evolved and differentiated independently in Korea. In addition, it can be assumed that HCNC differentiated first, as it differs from both the Hadong wild type and the cultivated types. Our phylogenetic analyses based on seven cp genomes resolved relationships within C. sinensis. However, as earlier research has shown, not all phylogenetic relationships can be fully resolved even when complete chloroplast genomes are available for analysis. Furthermore, our comparison did not include plants from other genera of the family Theaceae, which might provide useful information for the evolutionary study of HCNC. Future research on the evolution and differentiation of HCNC will be required, and investigations such as molecular age estimation and the examination of morphological differences in leaves are thought to be essential.
In summary, our research can promote the exchange of information between the nuclear genomes of Camellia species and provide valuable genomic resources for phylogenetic studies.

Conclusions
Our study provides a valuable resource for understanding the chloroplast structure, sequence variation, and phylogenetic relationships of ancient tea plants in Korea. The significant SNPs associated with favorable variants, selection signals, and candidate genes are a valuable resource for the further improvement of leaf traits and plant types in ancient tea plants in Korea. The complete cp genome sequence will contribute to additional molecular identification, genetic diversity, and phylogeny studies.

Figure 1. Workflow of annotation procedure. Overview of the data and workflow of the computational annotation and manual annotation.

Figure 2. Chloroplast genome map of HCNC. Genes outside the main circle are transcribed clockwise, while genes on the inside are transcribed counterclockwise. The Organellar Genome Draw (OGDraw) online software (v 1.3.1) was used to draw this map. Different colors represent genes with different functions. The inner circle's gray portion indicates the chloroplast genome's GC content. *: Gene containing a single intron.

Figure 3. Chloroplast genome maps of six C. sinensis cultivars. Genes outside the main circle are transcribed clockwise, while genes on the inside are transcribed counterclockwise. The Organellar Genome Draw (OGDraw) online software was used to draw this map. Different colors represent genes with different functions. The inner circle's gray portion indicates the chloroplast genome's GC content. (a) Beachwisull; (b) Keumsull; (c) Chinese wild type; (d) Saemidori; (e) Fushun; (f) Hadong wild type. *: Gene containing a single intron.

Figure 4. The distribution of the identified SNPs in the different genomic regions (intergenic, upstream, downstream, intron, exon, and others) of HCNC.

Figure 5. Phylogenetic trees of seven C. sinensis cultivars. Phylogenetic trees were constructed using complete cp genome data with the maximum likelihood method. The red color character indicates HCNC.

Figure 6. Principal component analysis (PCA) plots of the seven C. sinensis individuals. The legend at the right indicates the cultivars of each circle.

Figure 7. Phylogenetic trees of seven C. sinensis var. sinensis cultivars. (a) Phylogenetic trees constructed using complete cp genome data with the neighbor-joining method; (b) phylogenetic trees constructed using complete cp genome data with the maximum likelihood method; (c) phylogenetic trees constructed using complete cp genome data with the UPGMA method. The red color character indicates HCNC.

Figure 8. Phylogenetic trees of fifteen C. sinensis varieties. (a) Phylogenetic trees constructed using complete cp genome data with the neighbor-joining method; (b) phylogenetic trees constructed using complete cp genome data with the maximum likelihood method; (c) phylogenetic trees constructed using complete cp genome data with the UPGMA method. The red color character indicates HCNC.

Table 1. Sequencing data used for HCNC genome assembly.

Table 2. List of HCNC cp genome genes organized according to their location.

Table 3. List of genes encoded in the chloroplast genome of HCNC.

Table 4. List of InDels in cp genomes of seven C. sinensis var. sinensis.

Table 5. List of SNPs in cp genomes of seven C. sinensis var. sinensis.

Table 6. The distribution of the identified SNPs in the different genomic regions (intergenic, upstream, downstream, intron, exon, and others) of HCNC.