Article

The Complete Chloroplast Genome Sequence of the Medicinal Plant Swertia mussotii Using the PacBio RS II Platform

1
School of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Anshan Road 312, Tianjin 300193, China
2
College of Life Science, Nankai University, Weijin Road 94, Tianjin 300071, China
3
Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Malianwa North Road 151, Beijing 100193, China
4
Tianjin State Key Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Anshan Road 312, Tianjin 300193, China
*
Authors to whom correspondence should be addressed.
Molecules 2016, 21(8), 1029; https://doi.org/10.3390/molecules21081029
Submission received: 21 June 2016 / Revised: 21 July 2016 / Accepted: 4 August 2016 / Published: 9 August 2016

Abstract

Swertia mussotii is an important medicinal plant that has great economic and medicinal value and is found on the Qinghai Tibetan Plateau. The complete chloroplast (cp) genome of S. mussotii is 153,431 bp in size, with a pair of inverted repeat (IR) regions of 25,761 bp each that separate a large single-copy (LSC) region of 83,567 bp and a small single-copy (SSC) region of 18,342 bp. The S. mussotii cp genome encodes 84 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. The identity, number, and GC content of S. mussotii cp genes were similar to those in the genomes of other Gentianales species. Via analysis of the repeat structure, 11 forward repeats, eight palindromic repeats, and one reverse repeat were detected in the S. mussotii cp genome. There are 45 SSRs in the S. mussotii cp genome, the majority of which are mononucleotide repeats, a type also found in all other Gentianales species examined. An entire cp genome comparison study of S. mussotii and two other species in Gentianaceae was conducted. The complete cp genome sequence provides intragenic information for the cp genetic engineering of this medicinal plant.

1. Introduction

Swertia mussotii Franch (Zang Yin Chen, in Tibetan medicine) belongs to the family Gentianaceae. This species grows on the Qinghai Tibetan Plateau at an elevation of 3800–5000 m. To date, several pharmaceutically active compounds have been isolated and structurally identified from the whole S. mussotii plant, including oleanolic acid, ursolic acid, mangiferin, swertiamarin, and gentiopicroside [1,2,3,4]. Modern pharmacological research has demonstrated that these compounds have anti-hepatitis activity [5,6,7]. Owing to overexploitation, S. mussotii has become rare as a wild resource. Moreover, S. mussotii seeds germinate poorly when planted at low elevations.
Chloroplasts originated from the interaction of photosynthetic bacteria with non-photosynthetic hosts through endosymbiosis [8]. Chloroplasts are photosynthetic organelles that synthesise starch, amino acids, pigments, and fatty acids [9,10]. The chloroplast has its own genome, and a typical circular cp genome is composed of four parts: a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions. The majority of angiosperm cp genomes are highly conserved in gene content and order [11]. However, large-scale genome rearrangement and gene loss have been identified in several angiosperm lineages [12,13].
The third-generation sequencing platform, PacBio, based on single-molecule, real-time (SMRT) sequencing technology, generates average read lengths of over 10 kb, with half of the reads over 20 kb and a maximum read length reaching up to 60 kb, using the newest P6-C4 chemical reagents on the current PacBio RS II machine. In addition to its extraordinarily long read length, this platform provides uniform coverage across GC-abnormal regions because no PCR amplification is required during the library construction [14,15]. Many concerns have concentrated on the high rates of random error in single-pass reads (approximately 11% to 14%) [15]. However, this can be improved given sufficient sequencing depth [15]. Additionally, the optimisation of the PacBio assembly algorithm [16,17,18] has made this platform widely applied in de novo genome sequencing [19,20], as well as full-length transcriptome sequencing [21,22], for a growing number of species.
Due to the low GC content and the IR regions, it is difficult to use short reads from second-generation sequencing to recover a single contig spanning the whole cp genome [14]. Using PacBio, long reads can greatly reduce the complexity of the assembly, and PacBio has already been successfully applied in many chloroplast genome sequencing projects, including Ananas comosus var. comosus [23], Aconitum barbatum var. puberulum [24], Beta vulgaris [25], and Gentiana straminea [26]. Meanwhile, comparative studies among the three generations of sequencing technologies (Sanger, Illumina and PacBio) have demonstrated the reliability and accuracy of SMRT sequencing [27,28].
Currently, more than 1000 complete cp genome sequences have been deposited in the NCBI Organelle Genome Resources [29]. However, few reports have been published on the genetic diversity of cpDNA from Gentianaceae [26]. The chloroplast genome sequences of two members of the Gentianaceae, Gentiana straminea [26] and Gentiana crassicaulis, have been analysed. Here, we report the complete cp genome sequence of S. mussotii as determined using PacBio technology. Comparative sequence analysis was conducted among published Gentianaceae cp genomes.

2. Results and Discussion

2.1. Features of the S. mussotii Chloroplast Genome

The complete cp genome of S. mussotii is 153,431 bp in size, with a pair of IR regions of 25,761 bp that separate an LSC region of 83,567 bp from an SSC region of 18,342 bp (Table 1 and Figure 1). The overall GC content of the S. mussotii cp genome is 38.2%, with the IR regions possessing higher GC content (43.5%) than the LSC (36.2%) and SSC regions (31.9%) (Table 1). The high GC content of the IR regions is caused by the high GC content of the four ribosomal RNA (rRNA) genes (55.2%) present in this region [30]. The S. mussotii cp genome encodes 84 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight rRNA genes (Table 2). Seven protein-coding, seven tRNA, and all rRNA genes are duplicated in the IR regions. The non-coding regions constitute 41.6% of the genome, including introns, pseudogenes, and intergenic spacers; coding regions constitute 58.4%.
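The region sizes quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it sums the LSC, SSC, and two IR copies and computes a length-weighted GC estimate from the rounded percentages in the text. The dictionary layout and function names are illustrative assumptions.

```python
# Region lengths (bp) and GC percentages as quoted in the text (Table 1).
# The data layout and function names are illustrative, not from the paper.
REGIONS = {
    "LSC": {"length": 83_567, "gc_percent": 36.2, "copies": 1},
    "SSC": {"length": 18_342, "gc_percent": 31.9, "copies": 1},
    "IR": {"length": 25_761, "gc_percent": 43.5, "copies": 2},
}
REPORTED_GENOME_LENGTH = 153_431


def total_length(regions):
    """Sum region lengths, counting each IR copy separately."""
    return sum(r["length"] * r["copies"] for r in regions.values())


def weighted_gc_percent(regions):
    """Length-weighted mean GC content across all regions."""
    gc_bases = sum(
        r["length"] * r["copies"] * r["gc_percent"] / 100
        for r in regions.values()
    )
    return 100 * gc_bases / total_length(regions)


if __name__ == "__main__":
    print(total_length(REGIONS) == REPORTED_GENOME_LENGTH)
    print(round(weighted_gc_percent(REGIONS), 1))
```

The region sum reproduces the reported 153,431 bp. The weighted GC estimate comes out at about 38.1%, close to the reported 38.2%; a small difference is expected because the per-region percentages are rounded.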
There are five pseudogenes, i.e., accD, rps16, infA, rps19, and ycf1. The accD gene in S. mussotii contains internal stop codons. The accD gene also exists as a pseudogene in Jasminum nudiflorum and Trachelium caeruleum, but it is a normal gene in G. straminea. The rps16 gene lacks exon 2, a phenomenon that has been observed in related species. In S. mussotii, rps16 is a pseudogene, whereas in Syzygium cumini, Eucalyptus globulus, and Gossypium barbadense, the rps16 gene encodes ribosomal protein S16 [31]. The absence or incompleteness of this gene has also been reported in other plants [32,33]. The infA gene is 3′ truncated, though it is a normal gene in many other cp genomes [34,35].
The S. mussotii cp genome has 17 intron-containing genes, of which three (clpP, rps12, and ycf3) contain two introns (Table 3). The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ end located in the IR regions. trnK-UUU has the largest intron, which contains the matK gene. Together, all of the genes of S. mussotii are encoded by 25,731 codons. Among these, leucine, with 2769 (10.7%) of the codons, is the most frequent amino acid in the genome, and cysteine, with 292 (1.1%), is the least frequent (Table 4). Within the protein-coding regions (CDS), the percentages of AT content for the first, second, and third codon positions are 54.3%, 61.3%, and 68.8%, respectively. The bias towards a higher AT representation at the third codon position has also been observed in other plant cp genomes [36,37].
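The per-position AT percentages reported above can be illustrated with a short sketch. This is not the MEGA5 procedure used in the study; it is a hypothetical helper that takes one in-frame coding sequence and reports the AT percentage at the first, second, and third codon positions. The toy sequence is invented for illustration.

```python
def at_content_by_codon_position(cds):
    """Return AT percentages at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    percentages = []
    for offset in range(3):
        bases = cds[offset::3]  # every third base, starting at this position
        at_count = sum(base in "AT" for base in bases)
        percentages.append(100 * at_count / len(bases))
    return percentages


if __name__ == "__main__":
    # Invented toy CDS: ATG GCT CCA TTT TAA
    print(at_content_by_codon_position("ATGGCTCCATTTTAA"))
```

For a genome-wide figure such as the one in the text, the same counts would be accumulated over all protein-coding sequences before taking percentages.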

2.2. Repeat Analysis

Repeat structure analysis revealed the presence of 11 forward repeats, eight palindromic repeats, and one reverse repeat in the S. mussotii cp genome (Table 5). The repeats were mostly distributed in the intergenic spacer (IGS) and intron sequences. We analysed the repeats of several other species in Gentianales (Figure 2). Interestingly, this comparison revealed that the longest repeats in the five Gentianales cp genomes were 30–39 bp, and the Oncinotis tenuiloba cp genome contained the greatest total number of repeats (54). Chloroplast simple sequence repeats (SSRs) have been accepted as effective molecular markers [38,39]. There were 45 SSRs in the S. mussotii cp genome (Table 6), the majority of which were mononucleotide repeats (30), a type that we also found in all the other species [40]. Pentanucleotides and hexanucleotides were rarely found in the Gentianales cp genomes (Table 7). Most SSR loci were located in LSC regions. In all species, the majority of the tri- to hexanucleotides were AT-rich. An average of 62% of all SSRs in the Gentianales cp genomes were A/T mononucleotides. These results are consistent with the view that SSRs in cp genomes contribute to AT richness [41,42].
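To make the notion of an SSR concrete, here is a minimal regex-based sketch of perfect SSR detection. It is not MISA, which was used in the study, and the minimum repeat counts in MIN_REPEATS are hypothetical placeholders, since the actual parameters are only given by reference to Ni et al. [26].

```python
import re

# Hypothetical minimum repeat counts per motif length (bp); NOT the paper's settings.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs.

    Starts are 0-based. Shorter motifs are searched first, and any later hit
    that overlaps an already reported SSR is skipped, so a run such as
    AAAAAAAAAA is reported once as (A)10 rather than again as (AA)5.
    """
    sequence = sequence.upper()
    covered = [False] * len(sequence)
    hits = []
    for motif_len in sorted(min_repeats):
        extra_copies = min_repeats[motif_len] - 1
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, extra_copies))
        for match in pattern.finditer(sequence):
            start, end = match.span()
            if any(covered[start:end]):
                continue  # already reported under a shorter motif
            hits.append((start, match.group(1), (end - start) // motif_len))
            covered[start:end] = [True] * (end - start)
    return hits


if __name__ == "__main__":
    demo = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GCC"  # invented test sequence
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

Counting hits by motif length then gives the kind of mono- to hexanucleotide breakdown summarised in the text. Compound or imperfect SSRs are deliberately out of scope for this sketch.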

2.3. Comparative Chloroplast Genomic Analysis

The whole cp genome sequence of S. mussotii was compared to those of G. straminea and G. crassicaulis. The cp genome of S. mussotii is the longest of the three cp genomes, measuring approximately 4.4 kb and 4.7 kb longer than those of G. straminea and G. crassicaulis, respectively. There are no significant differences in sequence length between the SSCs or the IRs, and the variation in sequence length is mainly attributable to the difference in the length of the LSC region (Table S2) [40].
The overall sequence identity of the three Gentianaceae cp genomes was plotted using mVISTA, with the annotation of S. mussotii as a reference (Figure 3). The comparison shows that the two IR regions are less divergent than the LSC and SSC regions. Additionally, the coding regions are more conserved than the non-coding regions [26], and the highly divergent regions among the three cp genomes occur in the non-coding regions, including ndhD-ccsA, ndhI-ndhG, and trnH-psbA. Similar results have been observed in other plant cp genomes [26,43]. In our study, we observed that all four rRNA genes are the most conserved, while the most divergent coding regions are the clpP, rpl22, ycf1, rpl32, ycf15, and matK genes. The divergent portions of non-coding regions of cp genomes have proven useful for phylogenetic analysis [44,45].
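The idea behind an identity plot can be sketched as a windowed percent-identity calculation over a pairwise alignment. The snippet below is a simplified illustration only: mVISTA performs its own alignment and smoothing, whereas this hypothetical helper assumes two already-aligned, equal-length strings and uses non-overlapping windows by default.

```python
def window_identity(aligned_a, aligned_b, window=100, step=None):
    """Return (window_start, percent_identity) pairs for two aligned sequences.

    Both inputs must come from the same pairwise alignment and therefore have
    equal length. Gap characters ('-') are counted as mismatches. Window
    starts are 0-based; the final window may be shorter than `window`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    results = []
    for start in range(0, len(aligned_a), step):
        a = aligned_a[start:start + window].upper()
        b = aligned_b[start:start + window].upper()
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        results.append((start, 100 * matches / len(a)))
    return results


if __name__ == "__main__":
    # Invented toy alignment; real cp genome alignments are about 150 kb long.
    profile = window_identity("ACGTACGTAC", "ACGTACGAAA", window=5)
    divergent = [start for start, identity in profile if identity < 70]
    print(profile, divergent)
```

Windows falling below a chosen cut-off (70% is the cut-off quoted for the plots) mark candidate divergent regions, which in the comparison above fall mostly in non-coding sequence.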

2.4. IR Contraction and Expansion

IR contraction was observed at the junction of the IR and LSC regions of the S. mussotii cp genome. This contraction has also been found in the twelve other species of Gentianales analysed (G. straminea, G. crassicaulis, C. arabica, C. roseus, A. nivea, A. syriaca, R. stricta, E. umbellatus, N. oleander, O. tenuiloba, P. luteum, and G. officinalis) (Figure 4). In all of these species, the IRa/SSC junction is situated in the coding region of the ycf1 gene, resulting in the duplication of the 3′ end of this gene. This duplication produces a pseudogene of variable length at the IRb/SSC border. The lengths of the ycf1 pseudogenes varied from 945 bp to 1426 bp. In addition, the ycf1 pseudogene and the ndhF gene overlapped in S. mussotii, G. straminea, G. crassicaulis, N. oleander, and R. stricta by 54 bp, 54 bp, 54 bp, 62 bp, and 3 bp, respectively. The IRb/LSC border is located in the coding region of rps19 in all the compared plants, except for A. nivea, A. syriaca, and G. officinalis. rps19 pseudogenes of various lengths were also found at the IRa/LSC borders in S. mussotii, G. straminea, G. crassicaulis, C. arabica, C. roseus, G. officinalis, and R. stricta. S. mussotii had the longest rps19 pseudogene, at 199 bp in length. The trnH genes of these thirteen species were all located in the LSC region, 0–82 bp away from the IRa/LSC border. In the cp genome, the IR/LSC boundaries are not static, but are subject to dynamic and random processes that allow conservative expansions and contractions [46].
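The link between a gene straddling an IR border and the length of the resulting pseudogene copy can be shown with interval arithmetic. This is a minimal sketch under stated assumptions: the linear order LSC, IRb, SSC, IRa and 1-based inclusive coordinates are a common convention assumed here, and the gene coordinates in the demo are invented, not taken from the S. mussotii annotation.

```python
from itertools import accumulate

# Region lengths from the text; the linear order is an assumed convention.
REGION_ORDER = [("LSC", 83_567), ("IRb", 25_761), ("SSC", 18_342), ("IRa", 25_761)]


def region_intervals(order=REGION_ORDER):
    """Return {name: (start, end)} in 1-based inclusive coordinates."""
    ends = list(accumulate(length for _, length in order))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return {name: (s, e) for (name, _), s, e in zip(order, starts, ends)}


def bases_per_region(gene_start, gene_end, intervals):
    """Count how many bases of a gene (1-based, inclusive) fall in each region."""
    counts = {}
    for name, (start, end) in intervals.items():
        overlap = min(gene_end, end) - max(gene_start, start) + 1
        if overlap > 0:
            counts[name] = overlap
    return counts


if __name__ == "__main__":
    intervals = region_intervals()
    # Hypothetical gene spanning the SSC/IRa junction (coordinates invented).
    print(bases_per_region(127_000, 128_670, intervals))
```

The portion of the gene that lies inside one IR copy is the part mirrored in the other copy, so its length corresponds to the length of the truncated pseudogene at the opposite border.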

3. Materials and Methods

3.1. DNA Sequencing, Genome Assembly, and Validation

Fresh leaves were collected from S. mussotii in Yushu County, Qinghai Province. Total DNA was extracted using the NuClean PlantGen DNA Kit (CWBIO, Beijing, China) and was used to construct an SMRT sequencing library with an insert size of 10 kb. The genome was sequenced using the PacBio RS II platform (Pacific Biosciences, Menlo Park, CA, USA) at the Institute of Medicinal Plant Development of the Chinese Academy of Medical Sciences. We assembled the cp genome of S. mussotii as follows: first, the PacBio reads were error-corrected and assembled to produce the initial contigs using the hierarchical genome assembly process (HGAP) of SMRT Analysis (Pacific Biosciences); then, the coverage for each contig was calculated by mapping the PacBio reads to these initial contigs using BLASR [47], and contigs either showing similarity to the closely-related cp genome sequences or exhibiting similar coverage were extracted; finally, the complete cp genome was constructed by assembling these contigs. Based on the BLASR results, 3904 PacBio reads were used in the assembly of the complete cp genome, with a total length of 46,037,271 bp, thus yielding a 300× depth of the cp genome. Four junction regions between IRs and LSC/SSC were verified by PCR amplifications and Sanger sequencing. The final cp genome of S. mussotii was submitted to GenBank under the accession number KU641021.
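The depth figure quoted above follows directly from the read statistics. A minimal sketch of the arithmetic, using only the numbers given in the text (the function names are illustrative):

```python
def mean_depth(total_read_bases, genome_length):
    """Average sequencing depth: total bases in mapped reads / genome length."""
    return total_read_bases / genome_length


def mean_read_length(total_read_bases, read_count):
    """Average length of the reads used in the assembly."""
    return total_read_bases / read_count


if __name__ == "__main__":
    total_bases = 46_037_271   # total length of the PacBio reads used
    genome_length = 153_431    # complete cp genome size in bp
    read_count = 3_904         # number of PacBio reads used
    print(round(mean_depth(total_bases, genome_length)))
    print(round(mean_read_length(total_bases, read_count)))
```

The first value rounds to the 300× depth stated in the text, and the implied mean read length is a little under 12 kb, which is in line with the 10 kb insert library.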

3.2. Genome Annotation and Codon Usage

DOGMA [48] was used to annotate the cp genome, followed by manual corrections. The tRNA genes were identified using tRNAscan-SE [49]. The circular genome map was drawn using OGDRAW [50]. Codon usage and GC content were analysed using MEGA5 [51].

3.3. Genome Comparison and Repeat Analyses

mVISTA [52,53] was used to compare the cp genome of S. mussotii with two other cp genomes using the annotation of S. mussotii as a reference.
Repeats (forward, palindromic, reverse, and complement) and simple sequence repeats (SSRs) were identified using REPuter [54] and MISA, respectively, with the same parameters as described in Ni et al. [26].
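The four repeat types named above differ only in how the second copy relates to the first. The sketch below is a hypothetical illustration with exact matching; it is not REPuter, which additionally searches the whole genome and, per the Figure 2 legend, accepts repeats of at least 30 bp with at least 90% identity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify how `second` relates to `first` (exact matches only).

    Returns 'F' (forward), 'R' (reverse), 'C' (complement),
    'P' (palindromic, i.e. reverse complement), or None if unrelated.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        raise ValueError("repeat copies must have equal length")
    if second == first:
        return "F"
    if second == first[::-1]:
        return "R"
    complement = first.translate(_COMPLEMENT)
    if second == complement:
        return "C"
    if second == complement[::-1]:
        return "P"
    return None


if __name__ == "__main__":
    unit = "AACGT"  # invented 5 bp unit, far shorter than real repeats
    for other in ("AACGT", "TGCAA", "TTGCA", "ACGTT", "GGGGG"):
        print(other, classify_repeat(unit, other))
```

The checks are applied in a fixed order, so a unit that is its own reverse or reverse complement is reported under the first type that matches.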

4. Conclusions

This is the first study to analyse the complete cpDNA sequence of S. mussotii. The chloroplast genome structure and composition of S. mussotii are similar to those reported for other Gentianaceae. In addition, the distributions and locations of repeated sequences were determined. All of these repeats, together with the aforementioned SSRs, are informative sources for the exploration of new molecular markers. Studying the cp genome facilitates the identification of the optimal intergenic spacers for transgene integration and the development of site-specific cp transformation vectors in chloroplast genetic engineering. To date, many transgenes have been successfully introduced into the plastid genomes of the tobacco model species and of selected other important crop plants [55,56]. The feasibility of metabolic engineering in transgenic plastids has been demonstrated for several nutritionally important biochemical pathways, including carotenoid biosynthesis [57] and fatty acid biosynthesis [58,59]. With the details of the bioactive compound synthesis pathway in S. mussotii having been described [60], there is no doubt that plastid engineering holds great potential in secondary metabolic engineering to enhance the production of pharmaceutically active compounds.

Supplementary Materials

Supplementary materials can be accessed at: https://www.mdpi.com/1420-3049/21/8/1029/s1.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (No. 81303303) and the Tianjin City High School Science & Technology Fund Planning Project (No. 20130203).

Author Contributions

Y.W. and L.M. conceived and designed the experiments; B.X. and X.L. performed the experiments; B.X. and J.Q. analysed the data; L.W. and X.T. contributed reagents/materials/analysis tools; B.X. wrote the paper. All authors read and approved the final manuscript.

Conflicts of Interest

There is no conflict of interest.

References

  1. Yamahara, J.; Konoshima, T.; Sawada, T.; Fujimura, H. Biologically active principles of crude drugs: Pharmacological actions of Swertia japonica extracts, swertiamarin and gentianine (author's transl). Yakugaku Zasshi: J. Pharm. Soc. Jpn. 1978, 98, 1446–1451.
  2. Kikuzaki, H.; Kawasaki, Y.; Kitamura, S.; Nakatani, N. Secoiridoid glucosides from Swertia mileensis. Planta Med. 1996, 62, 35–38.
  3. Brahmachari, G.; Mondal, S.; Gangopadhyay, A.; Gorai, D.; Mukhopadhyay, B.; Saha, S.; Brahmachari, A.K. Swertia (Gentianaceae): Chemical and pharmacological aspects. Chem. Biodivers. 2004, 1, 1627–1651.
  4. Ma, L.N.; Tian, C.W.; Zhang, T.J.; Zhang, L.J.; Xu, X.H. Advances in study on iridoids in plants of Swertia L. and their pharmacological activity. Chin. Tradit. Herb. Drugs 2008, 39, 790–795.
  5. Kong, L.B.; Li, S.S.; Liao, Q.J.; Zhang, Y.N.; Sun, R.N.; Zhu, X.D.; Zhang, Q.H.; Wang, J.; Wu, X.Y.; Fang, X.N. Oleanolic acid and ursolic acid: Novel hepatitis C virus antivirals that inhibit NS5B activity. Antivir. Res. 2013, 98, 44–53.
  6. Zhang, L.J.; Cheng, Y.; Du, X.H.; Chen, S.; Feng, X.C.; Gao, Y.; Li, S.X.; Liu, L.; Yang, M.; Chen, L.; et al. Swertianlarin, an Herbal Agent Derived from Swertia mussotii Franch, Attenuates Liver Injury, Inflammation, and Cholestasis in Common Bile Duct-Ligated Rats. Evid. Based Complement. Altern. Med. 2015.
  7. Zhang, Y.M. The Effect of Gentiopicroside and Mangiferin, Two Major Ingredients of Tibet Capillary Artemisia, on Expression of Hepatocyte Membrane Transporters MRP2 and MRP3. Master's Thesis, Third Military Medical University, Chongqing, China, 2011.
  8. Howe, C.J.; Barbrook, A.C.; Koumandou, V.L.; Nisbet, R.E.R.; Symington, H.A.; Wightman, T.F. Evolution of the chloroplast genome. Philos. Trans. R. Soc. Lond. B 2003, 358, 99–107.
  9. Neuhaus, H.E.; Emes, M.J. Nonphotosynthetic metabolism in plastids. Annu. Rev. Plant Biol. 2000, 51, 111–140.
  10. Rodríguez-Ezpeleta, N.; Brinkmann, H.; Burey, S.C.; Roure, B.; Burger, G.; Löffelhardt, W.; Bohnert, H.J.; Philippe, H.; Lang, B.F. Monophyly of primary photosynthetic eukaryotes: Green plants, red algae, and glaucophytes. Curr. Biol. 2005, 15, 1325–1330.
  11. Wicke, S.; Schneeweiss, G.M.; Depamphilis, C.W.; Kai, F.M.; Quandt, D. The evolution of the plastid chromosome in land plants: Gene content, gene order, gene function. Plant Mol. Biol. 2011, 76, 273–297.
  12. Wolfe, K.H.; Morden, C.W.; Ems, S.C.; Palmer, J.D. Rapid evolution of the plastid translational apparatus in a nonphotosynthetic plant: Loss or accelerated sequence evolution of tRNA and ribosomal protein genes. J. Mol. Evol. 1992, 35, 304–317.
  13. Lee, H.L.; Jansen, R.K.; Chumley, T.W.; Kim, K.J. Gene relocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions. Mol. Biol. Evol. 2007, 24, 1161–1180.
  14. Li, Q.S.; Li, Y.; Song, J.Y.; Xu, H.B.; Xu, J.; Zhu, Y.J.; Li, X.W.; Gao, H.H.; Dong, L.L.; Qian, J.; et al. High-accuracy de novo assembly and SNP detection of chloroplast genomes using a SMRT circular consensus sequencing strategy. New Phytol. 2014, 204, 1041–1049.
  15. Roberts, R.J.; Carneiro, M.O.; Schatz, M.C. The advantages of SMRT sequencing. Genome Biol. 2013, 14.
  16. English, A.C.; Richards, S.; Han, Y.; Wang, M.; Vee, V.; Qu, J.; Qin, X.; Muzny, D.M.; Reid, J.G.; Worley, K.C. Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology. PLoS ONE 2012, 7, e47768.
  17. Koren, S.; Schatz, M.C.; Walenz, B.P.; Martin, J.; Howard, J.T.; Ganapathy, G.; Wang, Z.; Rasko, D.A.; McCombie, W.R.; Jarvis, E.D.; Phillippy, A.M. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 2012, 30, 693–700.
  18. Chin, C.S.; Alexander, D.H.; Marks, P.; Klammer, A.A.; Drake, J.; Heiner, C.; Clum, A.; Copeland, A.; Huddleston, J.; Eichler, E.E. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 2013, 10, 563–569.
  19. Vanburen, R.; Bryant, D.; Edger, P.P.; Tang, H.; Burgess, D.; Challabathula, D.; Spittle, K.; Hall, R.; Gu, J.; Lyons, E. Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum. Nature 2015, 527, 508–511.
  20. Gordon, D.; Huddleston, J.; Chaisson, M.J.; Hill, C.M.; Kronenberg, Z.N.; Munson, K.M.; Malig, M.; Raja, A.; Fiddes, I.; Hillier, L.W. Long-read sequence assembly of the gorilla genome. Science 2016, 352.
  21. Xu, Z.X.; Peters, R.J.; Weirather, J.; Luo, H.M.; Liao, B.S.; Zhang, X.; Zhu, Y.J.; Ji, A.J.; Zhang, B.; Hu, S.N.; et al. Full-length transcriptome sequences and splice variants obtained by a combination of sequencing platforms applied to different root tissues of Salvia miltiorrhiza and tanshinone biosynthesis. Plant J. 2015, 82, 951–961.
  22. Abdelghany, S.E.; Hamilton, M.; Jacobi, J.L.; Ngam, P.; Devitt, N.; Schilkey, F.; Benhur, A.; Reddy, A.S.N. A survey of the sorghum transcriptome using single-molecule long reads. Nat. Commun. 2016, 7.
  23. Redwan, R.M.; Saidin, A.; Kumar, S.V. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae. BMC Plant Biol. 2014, 15.
  24. Chen, X.C.; Li, Q.S.; Li, Y.; Qian, J.; Han, J.P. Chloroplast genome of Aconitum barbatum var. puberulum (Ranunculaceae) derived from CCS reads using the PacBio RS platform. Front. Plant Sci. 2015, 6, 42.
  25. Stadermann, K.B.; Weisshaar, B.; Holtgräwe, D. SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome. BMC Bioinform. 2015, 16.
  26. Ni, L.H.; Zhao, Z.L.; Xu, H.X.; Chen, S.L.; Dorje, G. The complete chloroplast genome of Gentiana straminea (Gentianaceae), an endemic species to the Sino-Himalayan subregion. Gene 2016, 577, 281–288.
  27. Ferrarini, M.; Moretto, M.; Ward, J.A.; Šurbanovski, N.; Stevanović, V.; Giongo, L.; Viola, R.; Cavalieri, D.; Velasco, R.; Cestaro, A. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics 2013, 14.
  28. Wu, Z.H.; Gui, S.T.; Quan, Z.W.; Pan, L.; Wang, S.Z.; Ke, W.D.; Liang, D.Q.; Ding, Y. A precise chloroplast genome of Nelumbo nucifera (Nelumbonaceae) evaluated with Sanger, Illumina MiSeq, and PacBio RS II sequencing platforms: Insight into the plastid evolution of basal eudicots. BMC Plant Biol. 2014, 14.
  29. Organelle Genome Resources. Available online: http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/organelles.html (accessed on 21 June 2016).
  30. Raveendar, S.; Na, Y.W.; Lee, J.R.; Shim, D.; Ma, K.H.; Lee, S.Y.; Chung, J.W. The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing. Molecules 2015, 20, 13080–13088.
  31. Ibrahim, R.I.; Azuma, J.; Sakamoto, M. Complete nucleotide sequence of the cotton (Gossypium barbadense L.) chloroplast genome with a comparative analysis of sequences among 9 dicot plants. Genes Genetic Syst. 2006, 81, 311–321.
  32. Wu, C.S.; Chaw, S.M. Highly rearranged and size-variable chloroplast genomes in conifers II clade (cupressophytes): Evolution towards shorter intergenic spacers. Plant Biotechnol. J. 2013, 12, 344–353.
  33. Tuskan, G. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 2006, 313, 1596–1604.
  34. Sato, S.; Nakamura, Y.; Kaneko, T.; Asamizu, E.; Tabata, S. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes 1999, 6, 283–290.
  35. Do, H.D.K.; Kim, J.S.; Kim, J.H. Comparative genomics of four Liliales families inferred from the complete chloroplast genome sequence of Veratrum patulum O. Loes. (Melanthiaceae). Gene 2013, 530, 229–235.
  36. Yang, M.; Zhang, X.; Liu, G.; Yin, Y.; Chen, K.; Yun, Q.; Zhao, D.; Al-Mssallem, I.S.; Yu, J. The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.). PLoS ONE 2012, 5, e12762.
  37. Tangphatsornruang, S.; Sangsrakru, D.; Chanprasert, J.; Uthaipaisanwong, P.; Yoocha, T.; Jomchai, N.; Tragoonrung, S. The Chloroplast Genome Sequence of Mungbean (Vigna radiata) Determined by High-throughput Pyrosequencing: Structural Organization and Phylogenetic Relationships. DNA Res. 2010, 17, 11–22.
  38. Powell, W.; Rafalski, J.A. Polymorphic simple sequence repeat regions in chloroplast genomes: Applications to the population genetics of pines. Proc. Natl. Acad. Sci. USA 1995, 92, 7759–7763.
  39. Jiao, Y.; Jia, H.M.; Li, X.W.; Chai, M.L.; Jia, H.J.; Chen, Z.; Wang, G.Y.; Chai, C.Y.; Weg, E.V.D.; Gao, Z.S. Development of simple sequence repeat (SSR) markers from a genome survey of Chinese bayberry (Myrica rubra). BMC Genomics 2012, 13, 151–154.
  40. Qian, J.; Song, J.; Gao, H.; Zhu, Y.; Xu, J.; Pang, X.; Yao, H.; Sun, C.; Li, X.E.; Li, C. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza. PLoS ONE 2013, 8, e57607.
  41. Kuang, D.Y.; Wu, H.; Wang, Y.L.; Gao, L.M.; Zhang, S.Z.; Lu, L. Complete chloroplast genome sequence of Magnolia kwangsiensis (Magnoliaceae): Implication for DNA barcoding and population genetics. Genome 2011, 54, 663–673.
  42. Huotari, T.; Korpelainen, H. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes. Gene 2012, 508, 96–105.
  43. Nie, X.J.; Lv, S.Z.; Zhang, Y.X.; Du, X.H.; Wang, L.; Biradar, S.S.; Tan, X.F.; Wan, F.H.; Song, W.N. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS ONE 2012, 7.
  44. Takano, A.; Okada, H. Phylogenetic relationships among subgenera, species, and varieties of Japanese Salvia L. (Lamiaceae). J. Plant Res. 2011, 124, 245–252.
  45. Schäferhoff, B.; Fleischmann, A.; Fischer, E.; Albach, D.C.; Borsch, T.; Heubl, G.; Kai, F.M. Towards resolving Lamiales relationships: Insights from rapidly evolving chloroplast sequences. BMC Evol. Biol. 2010, 10.
  46. Ma, J.; Yang, B.; Zhu, W.; Sun, L.; Tian, J.; Wang, X. The complete chloroplast genome sequence of Mahonia bealei (Berberidaceae) reveals a significant expansion of the inverted repeat and phylogenetic relationship with other angiosperms. Gene 2013, 528, 120–131.
  47. Chaisson, M.J.; Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): Application and theory. BMC Bioinform. 2012, 13.
  48. Wyman, S.K.; Jansen, R.K.; Boore, J.L. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 2004, 20, 3252–3255.
  49. Schattner, P.; Brooks, A.N.; Lowe, T.M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 2007, 33, W686–W689.
  50. Lohse, M.; Drechsel, O.; Bock, R. OrganellarGenomeDRAW (OGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 2007, 52, 267–274.
  51. Tamura, K.; Peterson, D.; Peterson, N.; Stecher, G.; Nei, M.; Kumar, S. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 2011, 28, 2731–2739.
  52. Kurtz, S.; Phillippy, A.; Delcher, A.L.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004, 5.
  53. Frazer, K.A.; Pachter, L.; Poliakov, A.; Rubin, E.M.; Dubchak, I. VISTA: Computational tools for comparative genomics. Nucleic Acids Res. 2004, 32, W273–W279.
  54. Kurtz, S.; Choudhuri, J.V.; Ohlebusch, E.; Schleiermacher, C.; Stoye, J.; Giegerich, R. REPuter: The Manifold Applications of Repeat Analysis on a Genomic Scale. Nucleic Acids Res. 2001, 29, 4633–4642.
  55. Wani, S.H.; Haider, N.; Kumar, H.; Singh, N.B. Plant Plastid Engineering. Curr. Genomics 2010, 11, 500–512.
  56. Bock, R. Engineering Plastid Genomes: Methods, Tools, and Applications in Basic Research and Biotechnology. Annu. Rev. Plant Biol. 2015, 66, 211–241.
  57. Apel, W.; Bock, R. Enhancement of carotenoid biosynthesis in transplastomic tomatoes by induced lycopene-to-provitamin A conversion. Plant Physiol. 2009, 151, 59–66.
  58. Craig, W.; Lenzi, P.; Scotti, N.; Palma, M.D.; Saggese, P.; Carbone, V.; Curran, M.G.; Magee, A.M.; Medgyesy, P.; Kavanagh, T.A. Transplastomic tobacco plants expressing a fatty acid desaturase gene exhibit altered fatty acid profiles and improved cold tolerance. Transgenic Res. 2008, 17, 769–782.
  59. Madoka, Y.; Tomizawa, K.; Mizoi, J.; Nishida, I.; Nagano, Y.; Sasaki, Y. Chloroplast transformation with modified accD operon increases acetyl-CoA carboxylase and causes extension of leaf longevity and increase in seed yield in tobacco. Plant Cell Physiol. 2002, 43, 1518–1525.
  60. Wang, J.; Liu, Y.; Cai, Y.; Zhang, F.; Xia, G.; Xiang, F. Cloning and functional analysis of geraniol 10-hydroxylase, a cytochrome P450 from Swertia mussotii Franch. Biosci. Biotechnol. Biochem. 2010, 74, 1583–1590.
  • Sample Availability: Samples are not available.
Figure 1. Gene map of the S. mussotii chloroplast genome. Genes drawn inside the circle are transcribed clockwise, and those outside are counterclockwise. Genes are colour-coded based on the functional groups to which they belong. CDS: protein-coding regions.
Figure 2. Repeat sequences in six Gentianales chloroplast genomes. REPuter was used to identify repeat sequences with length ≥ 30 bp and sequence identity ≥ 90% in the chloroplast genomes. F, P, R, and C indicate forward, palindromic, reverse, and complement repeats, respectively. Repeats with different lengths are indicated in different colours.
Figure 3. Comparison of three chloroplast genomes using mVISTA. Grey arrows and thick lines above the alignment indicate genes with their orientation and the position of the IRs, respectively. A cut-off of 70% identity was used for the plots, and the y-axis represents percent identity from 50% to 100%. Genome regions are colour-coded as protein-coding (exon), rRNA, tRNA, and conserved noncoding sequences (CNS).
Figure 4. Comparison of the borders of the LSC, SSC, and IR regions among thirteen chloroplast genomes. Ψ indicates a pseudogene. This figure is not to scale.
Table 1. Base composition in the S. mussotii chloroplast genome.
Region | T (U) (%) | C (%) | A (%) | G (%) | Length (bp)
LSC | 32.6 | 18.5 | 31.2 | 17.7 | 83,567
SSC | 34.1 | 16.3 | 34.0 | 15.6 | 18,342
IRa | 28.3 | 22.5 | 28.2 | 21.0 | 25,761
IRb | 28.2 | 21.0 | 28.3 | 22.5 | 25,761
Total | 31.3 | 19.3 | 30.5 | 18.8 | 153,431
CDS | 31.3 | 18.1 | 30.2 | 20.4 | 77,193
1st position | 23.9 | 19.2 | 30.4 | 26.5 | 25,731
2nd position | 32.6 | 20.6 | 28.7 | 18.1 | 25,731
3rd position | 37.2 | 14.6 | 31.6 | 16.6 | 25,731
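The figures in Table 1 can be cross-checked with a few lines of arithmetic: the four region lengths should add up to the genome size, and the length-weighted GC content of the regions should agree with the "Total" row (C + G = 19.3% + 18.8% = 38.1%). The following Python sketch is only an illustration; the dictionary layout and the names `TABLE1` and `gc_percent` are assumptions introduced here, not part of the original analysis.

```python
# Values copied from Table 1: region -> (T %, C %, A %, G %, length in bp).
TABLE1 = {
    "LSC": (32.6, 18.5, 31.2, 17.7, 83567),
    "SSC": (34.1, 16.3, 34.0, 15.6, 18342),
    "IRa": (28.3, 22.5, 28.2, 21.0, 25761),
    "IRb": (28.2, 21.0, 28.3, 22.5, 25761),
}
GENOME_SIZE = 153431  # bp, "Total" row of Table 1


def gc_percent(region):
    """GC content of a region as the sum of its C and G percentages."""
    _, c, _, g, _ = TABLE1[region]
    return round(c + g, 1)


# The quadripartite structure (LSC + SSC + two IR copies) covers the whole genome.
total_length = sum(row[4] for row in TABLE1.values())

# Length-weighted mean GC content over the four regions.
weighted_gc = sum(gc_percent(r) * TABLE1[r][4] for r in TABLE1) / total_length

for region in TABLE1:
    print(region, gc_percent(region))
print("sum of region lengths:", total_length)
print("weighted GC (%):", round(weighted_gc, 1))
```

With the tabulated values, the region lengths sum to the reported genome size, and the IR regions show a higher GC content than the LSC and SSC regions.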
Table 2. Genes present in the S. mussotii chloroplast genome.
No. | Group of Genes | Gene Names
1 | Photosystem I | psaA, psaB, psaC, psaI, psaJ
2 | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
3 | Cytochrome b/f complex | petA, petB *, petD *, petG, petL, petN
4 | ATP synthase | atpA, atpB, atpE, atpF *, atpH, atpI
5 | NADH dehydrogenase | ndhA *, ndhB * (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
6 | RuBisCO large subunit | rbcL
7 | RNA polymerase | rpoA, rpoB, rpoC1 *, rpoC2
8 | Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12 ** (×2), rps14, rps15, rps18, rps19
9 | Ribosomal proteins (LSU) | rpl2 * (×2), rpl14, rpl16 *, rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36
10 | Other genes | clpP *, matK, ccsA, cemA
11 | Proteins of unknown function | ycf1, ycf2 (×2), ycf3 **, ycf4, ycf15 (×2)
12 | Transfer RNAs | 37 tRNAs (6 contain one intron each, 7 in the IRs)
13 | Ribosomal RNAs | rrn4.5 (×2), rrn5 (×2), rrn16 (×2), rrn23 (×2)
One or two asterisks after a gene name indicate that the gene contains one or two introns, respectively.
Table 3. The genes with introns in the S. mussotii chloroplast genome and the lengths of the exons and introns.
Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp)
atpF | LSC | 161 | 700 | 403 | |
clpP | LSC | 71 | 784 | 292 | 680 | 228
ndhA | SSC | 561 | 1117 | 540 | |
ndhB | IR | 777 | 683 | 756 | |
petB | LSC | 6 | 727 | 642 | |
petD | LSC | 8 | 678 | 475 | |
rpl16 | LSC | 9 | 764 | 399 | |
rpl2 | IR | 393 | 657 | 435 | |
rpoC1 | LSC | 435 | 734 | 1623 | |
rps12 * | LSC | 114 | - | 232 | 535 | 26
trnA-UGC | IR | 38 | 824 | 35 | |
trnG-UCC | LSC | 23 | 689 | 48 | |
trnI-GAU | IR | 37 | 950 | 35 | |
trnK-UUU | LSC | 37 | 2496 | 35 | |
trnL-UAA | LSC | 37 | 374 | 50 | |
trnV-UAC | LSC | 38 | 601 | 37 | |
ycf3 | LSC | 126 | 745 | 228 | 770 | 153
* The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ end in the IR region.
Table 4. The codon-anticodon recognition pattern and codon usage for the S. mussotii chloroplast genome.
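Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA
Phe | UUU | 981 | 1.32 | | Tyr | UAU | 738 | 1.59 |
Phe | UUC | 507 | 0.68 | trnF-GAA | Tyr | UAC | 189 | 0.41 | trnY-GUA
Leu | UUA | 847 | 1.84 | trnL-UAA | Stop | UAA | 49 | 1.75 |
Leu | UUG | 551 | 1.19 | trnL-CAA | Stop | UAG | 21 | 0.75 |
Leu | CUU | 610 | 1.32 | | His | CAU | 467 | 1.5 |
Leu | CUC | 187 | 0.41 | | His | CAC | 157 | 0.5 | trnH-GUG
Leu | CUA | 392 | 0.85 | trnL-UAG | Gln | CAA | 698 | 1.54 | trnQ-UUG
Leu | CUG | 182 | 0.39 | | Gln | CAG | 207 | 0.46 |
Ile | AUU | 1047 | 1.47 | | Asn | AAU | 920 | 1.5 |
Ile | AUC | 435 | 0.61 | trnI-GAU | Asn | AAC | 303 | 0.5 | trnN-GUU
Ile | AUA | 660 | 0.92 | trnI-CAU | Lys | AAA | 988 | 1.45 | trnK-UUU
Met | AUG | 582 | 1 | trn(f)M-CAU | Lys | AAG | 377 | 0.55 |
Val | GUU | 510 | 1.45 | | Asp | GAU | 802 | 1.61 |
Val | GUC | 187 | 0.53 | trnV-GAC | Asp | GAC | 194 | 0.39 | trnD-GUC
Val | GUA | 528 | 1.5 | trnV-UAC | Glu | GAA | 923 | 1.45 | trnE-UUC
Val | GUG | 185 | 0.52 | | Glu | GAG | 350 | 0.55 |
Ser | UCU | 540 | 1.6 | | Cys | UGU | 221 | 1.52 |
Ser | UCC | 352 | 1.04 | trnS-GGA | Cys | UGC | 70 | 0.48 | trnC-GCA
Ser | UCA | 382 | 1.13 | trnS-UGA | Stop | UGA | 14 | 0.5 |
Ser | UCG | 221 | 0.66 | | Trp | UGG | 461 | 1 | trnW-CCA
Pro | CCU | 395 | 1.42 | | Arg | CGU | 339 | 1.28 | trnR-ACG
Pro | CCC | 234 | 0.84 | | Arg | CGC | 102 | 0.39 |
Pro | CCA | 318 | 1.14 | trnP-UGG | Arg | CGA | 356 | 1.35 |
Pro | CCG | 166 | 0.6 | | Arg | CGG | 139 | 0.53 |
Thr | ACU | 485 | 1.46 | | Arg | AGA | 385 | 1.14 | trnR-UCU
Thr | ACC | 272 | 0.82 | trnT-GGU | Arg | AGG | 143 | 0.42 |
Thr | ACA | 413 | 1.24 | trnT-UGU | Ser | AGU | 477 | 1.81 |
Thr | ACG | 157 | 0.47 | | Ser | AGC | 171 | 0.65 | trnS-GCU
Ala | GCU | 614 | 1.8 | | Gly | GGU | 534 | 1.2 |
Ala | GCC | 225 | 0.66 | | Gly | GGC | 198 | 0.45 | trnG-GCC
Ala | GCA | 378 | 1.11 | trnA-UGC | Gly | GGA | 705 | 1.59 | trnG-UCC
Ala | GCG | 148 | 0.43 | | Gly | GGG | 342 | 0.77 |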
RSCU: Relative Synonymous Codon Usage.
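RSCU is conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 indicates a codon used more often than expected under equal usage. The sketch below applies that standard definition to the Phe and Tyr counts from Table 4; the function name `rscu` and the input layout are assumptions made for illustration, and the sketch presumes that every synonymous codon of an amino acid is present in the input.

```python
from collections import defaultdict


def rscu(counts, codon_to_aa):
    """Return RSCU per codon: count / (mean count over the amino acid's codons)."""
    totals = defaultdict(int)      # summed codon counts per amino acid
    degeneracy = defaultdict(int)  # number of synonymous codons per amino acid
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        totals[aa] += n
        degeneracy[aa] += 1
    return {
        codon: n * degeneracy[codon_to_aa[codon]] / totals[codon_to_aa[codon]]
        for codon, n in counts.items()
    }


# Codon counts copied from Table 4 (Phe and Tyr rows only).
counts = {"UUU": 981, "UUC": 507, "UAU": 738, "UAC": 189}
codon_to_aa = {"UUU": "Phe", "UUC": "Phe", "UAU": "Tyr", "UAC": "Tyr"}

values = rscu(counts, codon_to_aa)
for codon in counts:
    print(codon, round(values[codon], 2))
```

For these four codons the rounded results agree with the RSCU column of Table 4 (1.32, 0.68, 1.59, and 0.41).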
Table 5. Repeat sequences and their distribution in the S. mussotii chloroplast genome.
No. | Size (bp) | Type | Repeat 1 Start | Repeat 1 Location | Repeat 2 Start | Repeat 2 Location | Region
1 | 39 | F | 97971 | IGS (rps12, trnV-GAC) | 119586 | ndhA (intron) | IRb, SSC
2 | 38 | F | 44377 | ycf3 (intron 1) | 97971 | IGS (rps12, trnV-GAC) | LSC, IRb
3 | 38 | F | 44377 | ycf3 (intron 1) | 119586 | ndhA (intron) | LSC, SSC
4 | 37 | F | 216 | IGS (trnH-GUG, psbA) | 244 | IGS (trnH-GUG, psbA) | LSC
5 | 38 | F | 39302 | psaB (CDS) | 41526 | psaA (CDS) | LSC
6 | 32 | F | 8154 | trnS-GCU | 36099 | trnS-UGA | LSC
7 | 30 | F | 7704 | IGS (psbK, psbI) | 28958 | IGS (petN, psbM) | LSC
8 | 30 | F | 9536 | trnG-UCC | 37013 | trnG-GCC | LSC
9 | 30 | F | 38751 | psaB (CDS) | 40966 | psaA (CDS) | LSC
10 | 30 | F | 58479 | ΨaccD | 58512 | ΨaccD | LSC
11 | 30 | F | 75545 | petB (intron) | 138996 | IGS (trnV-GAC, rps12) | LSC, IRa
12 | 51 | P | 114672 | IGS (ccsA, ndhD) | 114675 | IGS (ccsA, ndhD) | SSC
13 | 39 | P | 119586 | ndhA (intron) | 138988 | IGS (trnV-GAC, rps12) | SSC, IRa
14 | 38 | P | 44377 | ycf3 (intron 1) | 138989 | IGS (trnV-GAC, rps12) | LSC, IRa
15 | 32 | P | 8154 | trnS-GCU | 45722 | trnS-GGA | LSC
16 | 32 | P | 36096 | trnS-UGA | 45725 | trnS-GGA | LSC
17 | 30 | P | 44378 | ycf3 (intron 1) | 75545 | petB (intron) | LSC
18 | 30 | P | 75545 | petB (intron) | 119587 | ndhA (intron) | LSC, SSC
19 | 30 | P | 75545 | petB (intron) | 97972 | IGS (rps12, trnV-GAC) | LSC, IRb
20 | 31 | R | 42871 | IGS (psaA, ycf3) | 42875 | IGS (psaA, ycf3) | LSC
F = forward, P = palindrome, R = reverse, IGS = intergenic spacer.
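The repeat types in Table 5 differ in how the second copy relates to the first: a forward repeat is the same sequence, a reverse repeat is the sequence read backwards, a complement repeat is the base-wise complement, and a palindromic repeat is the reverse complement. The Python sketch below illustrates these definitions for exact matches only. It is not REPuter's algorithm, which also tolerates mismatches (the analysis here used a sequence identity threshold of 90%); the function name `repeat_type` is an assumption introduced for this example.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_type(first, second):
    """Classify an exact repeat pair as F, P, R, or C; return None otherwise.

    If a pair satisfies more than one definition, the first match in the
    order F, P, R, C is returned.
    """
    if second == first:
        return "F"  # forward: identical sequence
    if second == first[::-1].translate(COMPLEMENT):
        return "P"  # palindromic: reverse complement
    if second == first[::-1]:
        return "R"  # reverse: same bases in reversed order
    if second == first.translate(COMPLEMENT):
        return "C"  # complement: complemented bases in the same order
    return None


for other in ("AACGT", "ACGTT", "TGCAA", "TTGCA", "GGGGG"):
    print(other, repeat_type("AACGT", other))
```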
Table 6. Simple sequence repeats in the S. mussotii chloroplast genome.
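Unit | Length | No. | SSR Start | Region
A | 16 | 1 | 68265 | LSC
 | 13 | 3 | 45315 | LSC
 | | | 80949 | LSC
 | | | 114240 | SSC
 | 11 | 1 | 22183 | LSC
 | 10 | 7 | 8410 | LSC
 | | | 12227 | LSC
 | | | 57572 | LSC
 | | | 63341 | LSC
 | | | 71135 | LSC
 | | | 77632 | LSC
 | | | 122496 | SSC
C | 11 | 1 | 60812 | LSC
T | 14 | 1 | 60823 | LSC
 | 13 | 2 | 118296 | SSC
 | | | 118428 | SSC
 | 12 | 4 | 5757 | LSC
 | | | 32886 | LSC
 | | | 35984 | LSC
 | | | 112141 | SSC
 | 11 | 3 | 1828 | LSC
 | | | 124064 | SSC
 | | | 125507 | SSC
 | 10 | 7 | 92 | LSC
 | | | 7909 | LSC
 | | | 54930 | LSC
 | | | 66001 | LSC
 | | | 120007 | LSC
 | | | 125752 | LSC
 | | | 127189 | SSC
AT | 10 | 1 | 47791 | LSC
TA | 10 | 1 | 47617 | LSC
ATT | 15 | 1 | 119656 | LSC
TTA | 12 | 1 | 127046 | LSC
TTC | 12 | 1 | 35761 | LSC
TTG | 12 | 1 | 111418 | SSC
AATT | 16 | 1 | 29843 | LSC
ATTT | 12 | 1 | 116917 | SSC
CATA | 12 | 1 | 151279 | IRa
TATG | 12 | 1 | 85709 | IRb
TATT | 12 | 1 | 116932 | SSC
TGTC | 12 | 1 | 30554 | LSC
TAATA | 15 | 1 | 116944 | SSC
TATTG | 15 | 1 | 62151 | LSC
CCTTTA | 18 | 1 | 37196 | LSC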
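A simple way to see how entries such as those in Table 6 arise is to scan a sequence for tandem runs of short motifs with a regular expression. The Python sketch below is a minimal illustration, not the tool used for the analysis. The minimum repeat numbers in `MIN_REPEATS` are assumptions inferred from the shortest SSR of each unit size in Table 6 (10 bp mono-, 10 bp di-, 12 bp tri-, 12 bp tetra-, 15 bp penta-, and 18 bp hexanucleotide repeats), and the function name `find_ssrs` is hypothetical.

```python
import re

# Assumed minimum number of tandem copies per motif length (see lead-in).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence):
    """Return (unit, length in bp, 1-based start) for each SSR found."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif,
            # e.g. "AA" or "ATAT", so each run is reported at its true unit size.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((unit, len(match.group(0)), match.start() + 1))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence (not from the S. mussotii genome) containing three SSRs.
toy = "GC" + "A" * 12 + "CG" + "AT" * 5 + "CC" + "TTG" * 4 + "C"
for unit, length, start in find_ssrs(toy):
    print(unit, length, start)
```

Applied to a real chloroplast genome, each hit would additionally be assigned to the LSC, SSC, or IR region from its start coordinate.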
Table 7. Distribution of SSRs present in the Gentianales chloroplast genomes.
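Taxon | Genome Size (bp) | AT (%) | SSR Type: Mono | Di | Tri | Tetra | Penta | Hexa | Total | CDS: % a | No. b | % c
Swertia mussotii | 153,431 | 62 | 30 | 2 | 4 | 6 | 2 | 1 | 45 | 58 | 10 | 22
Gentiana straminea | 148,991 | 62 | 27 | 3 | 2 | 7 | 0 | 0 | 39 | 61 | 10 | 26
Gentiana crassicaulis | 148,776 | 62 | 27 | 4 | 2 | 7 | 0 | 1 | 41 | 61 | 10 | 24
Coffea arabica | 155,189 | 63 | 31 | 5 | 3 | 4 | 0 | 0 | 43 | 59 | 8 | 19
Catharanthus roseus | 154,950 | 62 | 33 | 6 | 7 | 9 | 1 | 0 | 56 | 59 | 5 | 9
Asclepias nivea | 161,592 | 62 | 47 | 15 | 6 | 23 | 3 | 4 | 98 | 56 | 17 | 17
Asclepias syriaca | 158,719 | 62 | 56 | 13 | 7 | 16 | 2 | 7 | 101 | 55 | 17 | 17
Rhazya stricta | 154,841 | 62 | 33 | 5 | 9 | 12 | 3 | 0 | 62 | 58 | 6 | 10
Echites umbellatus | 153,970 | 62 | 47 | 9 | 7 | 7 | 1 | 1 | 72 | 59 | 7 | 10
Nerium oleander | 154,903 | 62 | 42 | 6 | 3 | 8 | 2 | 0 | 61 | 59 | 10 | 16
Oncinotis tenuiloba | 155,011 | 62 | 41 | 7 | 4 | 9 | 2 | 0 | 63 | 58 | 5 | 8
Pentalinon luteum | 154,053 | 62 | 34 | 5 | 2 | 5 | 3 | 0 | 49 | 57 | 6 | 12
Gynochthodes officinalis | 153,398 | 62 | 26 | 4 | 7 | 3 | 4 | 1 | 45 | 60 | 7 | 16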
CDS: coding regions. a Percentages were calculated using the total length of the CDS divided by the genome size. b Total number of SSRs identified in the CDS. c Percentages were calculated using the total number of SSRs in the CDS divided by the total number of SSRs in the genome.
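Footnote c can be reproduced directly from the table: the share of SSRs located in coding regions is the number of SSRs in the CDS divided by the total number of SSRs in the genome. The short Python sketch below does this for the three Gentianaceae species, using the counts from Table 7; the names `ssr_counts` and `cds_ssr_percent` are assumptions introduced for illustration.

```python
# (total SSRs, SSRs in the CDS), copied from Table 7.
ssr_counts = {
    "Swertia mussotii": (45, 10),
    "Gentiana straminea": (39, 10),
    "Gentiana crassicaulis": (41, 10),
}


def cds_ssr_percent(total, in_cds):
    """Percentage of all SSRs that fall within coding regions (footnote c)."""
    return round(100 * in_cds / total)


for taxon, (total, in_cds) in ssr_counts.items():
    print(taxon, cds_ssr_percent(total, in_cds))
```

The rounded values (22, 26, and 24) match the last column of Table 7 for these species.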

Xiang, B.; Li, X.; Qian, J.; Wang, L.; Ma, L.; Tian, X.; Wang, Y. The Complete Chloroplast Genome Sequence of the Medicinal Plant Swertia mussotii Using the PacBio RS II Platform. Molecules 2016, 21, 1029. https://doi.org/10.3390/molecules21081029
