Complete Chloroplast Genome of Medicinal Plant Lonicera japonica: Genome Rearrangement, Intron Gain and Loss, and Implications for Phylogenetic Studies

The complete chloroplast (cp) genome of Lonicera japonica, a common ornamental and medicinal plant in North America and East Asia, was sequenced and analyzed. The L. japonica cp genome is 155,078 bp in length and contains a pair of inverted repeat regions (IRa and IRb) of 23,774 bp each, as well as large (LSC, 88,858 bp) and small (SSC, 18,672 bp) single-copy regions. A total of 129 genes were identified in the cp genome, 16 of which were duplicated within the IR regions. Relative to other plant cp genomes, the L. japonica cp genome had a unique rearrangement between trnI-CAU and trnN-GUU. In L. japonica cpDNA, rps19, rpl2, and rpl23 have moved from the IR region to the LSC region. The ycf1 pseudogene in the IR region is lost, and only one copy is located in the SSC region. Comparative cpDNA sequence analyses of L. japonica with other cp genomes reveal that the gene order, and the gene and intron contents, are slightly different. The introns in the ycf2 and rps18 genes are reported for the first time. Four genes (clpP, petB, petD, and rpl16) lost introns. However, its genome structure, GC content, and codon usage were similar to those of typical angiosperm cp genomes. All preferred synonymous codons were found to end with A/T. The AT-rich sequences were less abundant in the coding regions than in the non-coding ones. A phylogenetic analysis based on 71 protein-coding genes supported the idea that L. japonica is a sister of the Araliaceae species. This study identified unique characteristics of the L. japonica cp genome that contribute to our understanding of cpDNA evolution. It offers valuable information for the phylogenetic and specific barcoding of this medicinal plant.


Introduction
L. japonica is a sprawling and twining liana of the genus Lonicera in the family Caprifoliaceae (order Dipsacales). It is native to eastern Asia and is cultivated as a medicinal plant with great economic value. The Lonicera genus has almost 100 species in China, and half of them have medicinal effects, including L. japonica, L. macranthoides [1], L. similis [2], L. fulvotomentosa [3], and L. hypoglauca [4]. To date, more than 140 compounds have been isolated and identified from L. japonica [5]. The dried flowers, buds, and leaves of L. japonica are widely used with other Chinese medicines in the treatment of epidemic febrile and infectious diseases, such as SARS and avian influenza [6].
Centuries ago, L. japonica was introduced to North America, South America, and Oceania as an ornamental plant [7]. Now, it is well known in America as a horticultural plant with windbreak and sand-fixation properties. L. japonica does not wither, even in winter, where mean temperatures are at least −1 °C, and is very effective for ecological protection in China.
However, L. japonica is the only species within the genus used as traditional Chinese medicine, and distinguishing L. japonica from other Lonicera species is quite difficult. Molecular barcodes based on the cp genome have shown great potential for species discrimination, especially between closely related taxa [8]. The complete chloroplast genome sequence might enhance our ability to explore reliable barcoding for accurate plant identification, at both the species and population levels [9].
In higher plants, photosynthesis occurs in the cp, to provide the essential energy needed for plant growth and survival. New leaves of L. japonica have higher photosynthetic rates than other Lonicera species, whether they are under the forest canopy or in the open [7]. The annual carbon gain for Japanese honeysuckle was much greater in different light environments [10]. However, the molecular mechanism of photosynthetic adaptability of L. japonica is still beyond our understanding. The lack of the cp genome of L. japonica has become a bottleneck for investigating whether there are links between L. japonica's high level of adaptability and its photosynthetic adaptability, as well as chloroplast function. With rapid advances in sequencing technologies, Herbgenomics provides an effective tool to uncover the genetic information of herbs and to clarify their molecular mechanisms in related biological responses [11,12].
Although the transcriptome sequences of L. japonica have been previously reported [13], this study is the first to report its cp genome sequence. Comparative analyses among cp genomes of Apiales species revealed changes in the genome sizes, as well as the loss of genes and introns. Our data will help to identify the genetic and evolutionary mechanisms required for an in-depth study of L. japonica, and will be beneficial for DNA barcoding studies in Lonicera.

Characteristics of L. japonica cpDNA
The library was constructed from the cpDNA of L. japonica leaves with the 454 GS FLX Titanium platform, following the manufacturer's manual. A total of 22,185 reads were obtained, with an average length of 412 bp, yielding approximately 58× coverage of the cp genome. The complete cp genome of L. japonica is 155,078 bp in length (Accession No. KJ170923). Its genome exhibits a typical quadripartite structure that consists of a pair of IR regions (23,774 bp), separated by the LSC (88,858 bp) and SSC (18,672 bp) regions (Table 1, Figure 1). A total of 79 protein-coding genes, 30 tRNA genes, and 4 rRNA genes were annotated (Table S1). These genes have been retained in several angiosperms [14][15][16][17][18]. Among these genes, eight tRNA genes, four rRNA genes, and four protein-coding genes were duplicated in the IR regions. The LSC region contains 63 protein-coding and 22 tRNA genes, whereas the SSC region contains one tRNA gene and 12 protein-coding genes.
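The figures in this paragraph can be cross-checked with simple arithmetic. The following is a minimal sketch rather than part of the original analysis: the function names are hypothetical, and the depth estimate assumes that all 22,185 reads contribute their mean length to the assembly.

```python
# Figures quoted in the text for the L. japonica cp genome.
LSC, SSC, IR = 88_858, 18_672, 23_774   # region lengths in bp
GENOME = 155_078                         # reported total length in bp
READS, MEAN_READ_LEN = 22_185, 412       # number of 454 reads, mean read length


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def mean_coverage(n_reads: int, mean_len: float, genome_len: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    return n_reads * mean_len / genome_len


total = quadripartite_length(LSC, SSC, IR)
depth = mean_coverage(READS, MEAN_READ_LEN, GENOME)
unique_genes = 79 + 30 + 4        # protein-coding + tRNA + rRNA genes
duplicated_in_ir = 8 + 4 + 4      # tRNA + rRNA + protein-coding genes duplicated in the IR
print(total, round(depth, 1), unique_genes + duplicated_in_ir)
```

Under these assumptions, the region lengths sum to the reported genome length, and the 113 unique genes plus the 16 duplicated copies give the 129 genes quoted in the abstract.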
The majority (52%) of the L. japonica cp genome is composed of non-coding regions, including introns, intergenic spacers, and pseudogenes. The overall GC and AT content of the L. japonica cp genome was 38.6% and 61.4%, respectively. The AT content of the LSC, SSC, and IR regions was 62.9%, 66.6% and 56.5%, respectively (Table 1). Within the protein-coding regions (CDS), the AT content of the first, second, and third codon positions is 53.7%, 61.6% and 68.8%, respectively (Table 1). The bias toward a higher AT representation at the third codon position is generally found in plant cp genomes; this bias is used to distinguish cpDNA from nuclear DNA and mitochondrial DNA [14][15][16,19]. Based on the sequences of the protein-coding and tRNA genes, the frequency of codon usage was deduced for the L. japonica cp genome and summarized in Table 2. The high AT content at the third codon position reflects a codon usage bias for A or T. The codon usage frequencies of stop codons are similarly biased to A or T at the second and third codon positions. A total of 2692 codons (10.6%) encoded leucine, whereas 273 (1.1%) encoded cysteine, which are the most and least prevalent amino acids, respectively. A general excess of A- and U-ending codons was noted. Except for trnL-CAA, all of the types of preferred synonymous codons (RSCU > 1) ended with A or U (Table 2).
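The per-position AT content reported above can be illustrated with a short calculation. This is a minimal sketch, not the procedure used in the study: the function name and the toy sequence are hypothetical, and the input is assumed to be a single in-frame coding sequence.

```python
def at_content_by_codon_position(cds: str) -> list[float]:
    """Return the AT fraction at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]             # every third base, starting at pos
        at = sum(base in "AT" for base in bases)
        fractions.append(at / len(bases) if bases else 0.0)
    return fractions


toy_cds = "ATGGCATTAAAATGA"  # hypothetical 5-codon sequence, not taken from the genome
print(at_content_by_codon_position(toy_cds))
```

Applied to the concatenated CDS of a genome, the same calculation would yield the three percentages listed in Table 1.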

Intron Gain and Loss
Advances in phylogenetic research have demonstrated that cp genome evolution includes nucleotide substitutions and structural changes [20,21]. A few examples of these changes, including gene or intron losses, have been found in cp genomes [22][23][24][25][26][27]. Previous studies have demonstrated that introns have important roles in alternative splicing and can significantly stabilize the transcripts in some eukaryotic lineages [28]. Additionally, orthologous genes are also believed to have lost or gained introns throughout evolution. To provide more information for further study on cp genome evolution in Caprifoliaceae and the potential functional change from variations of intron gain and loss, we analyzed the cp genome of L. japonica.
In total, there are 16 intron-containing genes, 14 of which contain one intron, and two of which (rps18 and ycf3) contain two introns (Table 3). The ribosomal protein S18 is essential for plastid translation in plant development [29]. Ycf3 is required for the stable accumulation of the photosystem I complex. In the green alga Chlamydomonas reinhardtii, the ycf3 and rps18 genes belong to the rps9-ycf4-ycf3-rps18 polycistronic transcriptional unit. In land plants, ycf3 and rps18 are found in different clusters [30]. The presence of two introns in the rps18 gene in the L. japonica cp genome is rare. Similarly, the intron in ycf3 was not previously mentioned in other cp genomes. The intron gain in several L. japonica cp genes is reported here for the first time, and the intron gain in rps18 and ycf3 of L. japonica may be useful for further studies on the mechanism of photosynthesis evolution. Compared to other cp genomes, the introns in the clpP, petB, petD, and rpl16 genes were lost in the L. japonica cp genome. The rpl16 intron is a highly stable component of angiosperm cp genomes; this intron is absent from very few taxa, namely the Geraniaceae, Goodeniaceae, and Plumbaginaceae families [31]. Similarly, previous studies have shown that introns are also absent in the clpP gene of the Jasminum nudiflorum cp genome [22,31]. The intron loss of petB, petD, and rpl16 is reported here for the first time in the Asterid lineages. Introns are important in the regulation of gene expression. They can enhance the gene expression level at specific positions and at specific times [16]. Some introns are known to enhance, or are required for, normal levels of mRNA transcription, processing, and transport. Several unicellular eukaryotes appear to be under selection pressure to lose introns. However, no studies relating intron loss to gene expression, using transcriptome data from L. japonica, have been published.
The intron density in eukaryote genomes varies by more than three orders of magnitude. Therefore, extensive intron gain and/or intron loss must have occurred during evolution. A common partial explanation for the range of intron densities is the stochastic accumulation of introns in large eukaryote genomes during their evolution from an intron-poor ancestor. We still need more experimental information to reveal whether the variation of the introns in the L. japonica cp genome is related to the adaptability to stress.

Comparison with Other cp Genomes in the Order Apiales
Both L. japonica and Kolkwitzia amabilis [32] belong to the order Dipsacales (Figure S1). Apiales and Dipsacales are both Asterids. Some cp genomes in the Apiales clade have been reported, such as those of Eleutherococcus senticosus [16], Daucus carota [17], and Panax ginseng [18]. These representative cp genome sequences of Apiales were selected for comparison with those of L. japonica and K. amabilis. The overall sequence identity of the cp genomes was plotted using mVISTA, with the annotation of L. japonica as a reference (Figure 2). The length of the LSC and IR regions was the main difference between genomes (Table S2). The comparison showed that the two IR regions were less divergent than the LSC and SSC regions. Both coding and non-coding regions were present, and the most divergent regions among the four cp genomes were localized to the intergenic spacers. These highly divergent regions were included in the alignment.
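An identity plot of this kind reports percent identity over windows along an alignment and highlights windows that reach the 70% cut-off used for Figure 2. The following is a simplified sketch of that idea and not the mVISTA or Shuffle-LAGAN implementation: the function names, the window size, and the treatment of gaps as mismatches are all assumptions, and the two input sequences are assumed to be already aligned to equal length.

```python
def window_identity(ref: str, other: str, window: int = 100, step: int = 100):
    """Percent identity per window for two equal-length, pre-aligned sequences.

    Gap characters ('-') count as mismatches. Returns (start, percent) tuples.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(ref) - window + 1, step):
        a = ref[start:start + window]
        b = other[start:start + window]
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(ref: str, other: str, cutoff: float = 70.0, **kwargs):
    """Windows whose identity meets the cut-off (70% for the plots in Figure 2)."""
    return [(s, p) for s, p in window_identity(ref, other, **kwargs) if p >= cutoff]


# Hypothetical toy alignment: the first window is identical, the second diverges.
ref_toy = "AAAAAAAAAACCCCCCCCCC"
other_toy = "AAAAAAAAAACCCCGGGGGG"
print(window_identity(ref_toy, other_toy, window=10, step=10))
```

In this toy alignment only the first window would be reported as conserved, which mirrors how divergent intergenic spacers drop below the cut-off in the plot.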

IR Contraction in the L. japonica cp Genome
The contraction and expansion at the borders of the IR regions are common evolutionary events and represent the main reasons for the size variation of cp genomes [33]. The ends of the IRa and IRb regions, as well as the gene length, differ among plant lineages. Detailed comparisons of IR-SSC and IR-LSC boundaries among the five Asterid cp genomes are presented in Figure 3. Similar to N. tabacum [34] and Penthorum chinense [35], the rps19 gene was located in the LSC region. However, some unique structural differences exist: the IR region has contracted to exclude the rpl2, rpl23, and ycf1 genes, which typically lie in the IR region in other angiosperm cp genomes; in L. japonica, they lie outside the IR instead. Moreover, rps19, rpl2, rpl23, and ycf1 pseudogenes were found in L. japonica. In Spirogyra maxima, trnI-rpl23-rpl2-rps19 is a large operon of angiosperm chloroplast genomes. The rpl23 gene cluster of Spirogyra contains a distinct eubacterial promoter sequence upstream of rpl23, which is the first gene of the green algal rpl23 gene cluster. This sequence is completely absent in angiosperms, but is present in non-flowering plants. The results imply that, in the rpl23 gene cluster, early charophytes had at least two promoters, one upstream of trnI and another upstream of rpl23, the latter of which partially or completely lost its function in land plants [36]. The IRb/SSC border is generally located between the ycf1 pseudogene and the ndhF gene. However, the ycf1 pseudogene was absent in L. japonica. Ycf1 pseudogenes have proved useful for analyzing cp genome variation in higher plants and algae, even though their function is not thoroughly known. Ycf1 and ycf2 are essential for plant survival [37]. A combined analysis of the chloroplast genome and transcriptome of Deschampsia antarctica Desv. indicated that the rps19 gene was one of the most abundant transcripts in the chloroplast genome [38]. The portion of the ndhF gene located in the IRb region was 8 bp long. The rps19 and ycf1 genes were not found in the IR region, and the two pseudogenes were absent in the cp genome. These features are reported for Asterid plants for the first time. The cp genome sizes of the examined Apiales species were compared. The length of the IR (23,774 bp) in L. japonica was 3278 bp smaller than that of D. carota, 2157 bp smaller than that of E. senticosus, and 2298 bp smaller than that of P. ginseng. These differences could be attributed to the loss of the rpl2 and ycf15 genes, as well as the ycf1 and rps19 pseudogenes, in the L. japonica IR regions. However, the length of the whole genome was not significantly different among the five Asterid cp genomes. The genome of L. japonica (155,078 bp) was 834 bp smaller than that of D. carota, 1691 bp smaller than that of E. senticosus, and 1241 bp smaller than that of P. ginseng. Non-functional DNA was likewise rapidly deleted, resulting in the failure of pseudogenes to accumulate, despite the high rates of pseudogenes.

Phylogenetic Analysis
The gene content of cpDNA is highly conserved among most land plants. The cp genome sequence is a useful resource for studying the taxonomic status of the genus Lonicera in the angiosperm clade, and for analyzing evolutionary relationships within the family [22]. To obtain a reasonable phylogenetic status of Lonicera, we performed multiple sequence alignments of protein-coding genes from a variety of plant plastomes. A total of 15 complete cp genomes represented six families within five orders. Phylogenetic analysis was performed on a 71-gene data matrix, using MP and ML methods. MP analysis resulted in a single tree with a length of 17,973, a consistency index (CI) of 0.8080, and a retention index (RI) of 0.8285 (Figure 4). Bootstrap analysis showed that 12 out of the 13 nodes had bootstrap values >95%.

DNA Sequencing, Genome Assembly, and Validation
Fresh L. japonica leaves were collected from cultivated fields in Zhengcheng, Shandong Province, China. The samples used in this study were obtained from a local company (Jintai Yaoye Co., Ltd., Linyi, China). The total chloroplast DNA (cpDNA) was extracted from approximately 100 g of leaves via a sucrose gradient centrifugation method that was improved by Li et al. [39]. The cpDNA concentration for each sample was estimated by measuring A260 with an ND-2000 spectrometer (Nanodrop Technologies, Wilmington, DE, USA), whereas visual approximation was performed using gel electrophoresis. Pure cpDNA was used to construct shotgun libraries with the 454 GS FLX Titanium platform, according to the manufacturer's instructions. This resulted in approximately 58× coverage of the cp genome. The obtained SFF file was pre-processed, including the trimming of low-quality (Q < 20) and short (L < 50 bp) reads. The trimmed and cleaned reads were used for sequence assembly with the GS FLX De Novo Assembler Software (Newbler V2.6). To verify the assembly, four junction regions between the IR regions and LSC/SSC were confirmed by PCR amplifications and Sanger sequencing, with the primers listed in Table S3. The final cp genome sequence of L. japonica was then submitted to GenBank (Accession Number: KJ170923).
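The read pre-processing step can be pictured as a two-stage filter. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: the text gives only the thresholds (Q < 20, L < 50 bp), so clipping from the 3' end is an assumed strategy, and the function names and the in-memory read representation (sequence plus a list of Phred scores) are hypothetical.

```python
def trim_read(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Clip bases with Phred quality below min_q from the 3' end of a read."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def clean_reads(reads, min_q: int = 20, min_len: int = 50):
    """Trim each (seq, quals) read and keep those at least min_len bases long."""
    kept = []
    for seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_read(seq, quals, min_q)
        if len(trimmed_seq) >= min_len:
            kept.append((trimmed_seq, trimmed_quals))
    return kept


# Hypothetical reads: the first keeps 55 good bases, the second only 40.
toy_reads = [
    ("A" * 60, [30] * 55 + [10] * 5),
    ("C" * 60, [30] * 40 + [5] * 20),
]
print(len(clean_reads(toy_reads)))
```

Only reads that remain at least 50 bp long after clipping would be passed on to the assembler in this scheme.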

Gene Annotation and Sequence Analyses
Gene annotation was performed using BLAST and DOGMA [40]. The tRNA genes were identified using DOGMA and tRNAscan-SE [41]. The circular cp genome map was drawn using the OGDRAW program [42]. To analyze the variation in synonymous codon usage independently of amino acid composition, the relative synonymous codon usage (RSCU) values, codon usage, and AT content were determined using MEGA5.2 [43].
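RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark preferred codons. The following is a minimal sketch of that definition, not the MEGA5.2 implementation: the function name is hypothetical, and the standard genetic code is assumed (stop codons are treated as one synonymous family).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU per codon: observed count / mean count of its synonymous family."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the input, RSCU undefined
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy input: three TTA codons and one CTT codon, all leucine.
print(rscu("TTATTATTACTT")["TTA"])
```

In the toy input, TTA accounts for three of the four leucine codons across a six-codon family, so its RSCU is well above 1, which is the criterion used for the preferred codons in Table 2.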

Genome Comparison
MUMmer [44] was used to perform pairwise cp genomic alignment. The mVISTA [45] program in the Shuffle-LAGAN mode [46] was used to compare the cp genome of L. japonica with the cp genomes of K. amabilis, P. ginseng, D. carota, and E. senticosus (KT966716, AY582139, DQ898156 and NC_016430), with the annotation of L. japonica as the reference. REPuter [47] was used to visualize the forward and inverted repeats.

Phylogenetic Analysis
A total of 15 complete cp genome sequences were downloaded from the NCBI Organelle Genome Resources database. For the phylogenetic analysis, a set of 71 protein-coding genes common to the 16 analyzed genomes was used. Maximum parsimony (MP) analysis was performed with PAUP*4.0b10 [48], using a heuristic search combined with the random addition of 1000 replicates and tree bisection-reconnection (TBR) branch swapping, with the MulTrees option. Bootstrap analysis was also performed with 1000 replicates and TBR branch swapping. Solanum lycopersicum and Nicotiana tabacum were set as outgroups.

Conclusions
High-throughput pyrosequencing technology was used to sequence and describe the complete cp genome of L. japonica, a very important medicinal plant in East Asia. Compared to the cp genomes of three Apiales species, the cp genome of L. japonica has a relatively small size. Several genes were absent in the IR region, including the rps19, rpl2, and ycf1 pseudogenes. This absence may be attributed to the obvious contraction of the IR region in L. japonica. Phylogenetic relationships among 15 angiosperms strongly supported the known classification of L. japonica. The data presented in this study can facilitate the biological identification of this important medicinal plant. Our data reveal that L. japonica cpDNA possesses several unique features that contribute to our current understanding of cpDNA evolution in seed plants. Additionally, other Dipsacales plastomes need to be sequenced to determine whether the atypical characteristics of L. japonica cpDNA are shared by all Dipsacales species, or whether these characteristics are unusual genomic features of a unique plant.

Figure 1. Map of the L. japonica chloroplast genome. Genes drawn inside the circle are transcribed in a clockwise direction, whereas those outside the circle are transcribed in a counterclockwise direction. Genes belonging to the same functional groups have the same colors.

Figure 2. Comparison of four chloroplast genomes using the mVISTA program. Grey arrows and thick black lines above the alignment indicate the orientation of genes and the position of the IR regions, respectively. Purple bars represent exons, blue ones represent UTRs, and pink ones represent non-coding sequences (CNS). A cut-off of 70% identity was used for the plots. The Y-scale axis represents the percent identity within 50%-100%. Genome regions are color-coded as either protein-coding exons, rRNAs, tRNAs, or conserved noncoding sequences (CNS).

Figure 3. Comparison of the borders of the LSC, SSC, and IR regions among five chloroplast genomes. Ψ: pseudogenes, *: the distance from the edge.

Figure 4. MP phylogenetic tree of the Apiales clade based on 71 protein-coding genes. The MP tree has a length of 17,973, with a consistency index of 0.8080 and a retention index of 0.8285. Numbers above each node are the bootstrap support values. Solanum lycopersicum and Nicotiana tabacum were set as the outgroups.

Table 1. Base composition in the L. japonica chloroplast genome.

Table 2. Codon-anticodon recognition patterns and codon usage of the L. japonica chloroplast genome.

Table 3. Genes with introns in the L. japonica chloroplast genome, including the exon and intron length.
* The rps12 gene is divided into 5′-rps12 in the LSC region and 3′-rps12 in the IR region.