The Chloroplast Genome of Lilium henrici: Genome Structure and Comparative Analysis

Lilium henrici Franchet, which belongs to the family Liliaceae, is an endangered plant native to China. The wild populations of L. henrici have been largely reduced by habitat degradation or loss. In our study, we determined the whole chloroplast genome sequence for L. henrici and compared its structure with other Lilium (including Nomocharis) species. The chloroplast genome of L. henrici is a circular structure and 152,784 bp in length. The large single copy and small single copy is 82,429 bp and 17,533 bp in size, respectively, and the inverted repeats are 26,411 bp in size. The L. henrici chloroplast genome contains 116 different genes, including 78 protein coding genes, 30 tRNA genes, 4 rRNA genes, and 4 pseudogenes. There were 51 SSRs detected in the L. henrici chloroplast genome sequence. Genic comparison among L. henrici with other Lilium (including Nomocharis) chloroplast genomes shows that the sequence lengths and gene contents show little variation, the only differences being in three pseudogenes. Phylogenetic analysis revealed that N. pardanthina was a sister species to L. henrici. Overall, this study, providing L. henrici genomic resources and the comparative analysis of Lilium chloroplast genomes, will be beneficial for the evolutionary study and phylogenetic reconstruction of the genus Lilium, molecular barcoding in population genetics.


Introduction
The genus Lilium L., including about 115 species, is mainly distributed throughout cold and temperate regions of the Northern Hemisphere [1]. Many species of this genus have horticultural and food uses [2], especially their ornamental value. Lilies, the plants in the genus Lilium, has been widely known as one of the most important cut flowers in the world during the last 50 years [3]. Many Lilium species have large and beautiful flowers, with showy color and fragrant smell, and horticulturists produce many garden cultivars by interspecific hybridization according to their characteristics. China is rich in wild Lilium resources and possesses many endemic species. These rare Lilium species are a valuable raw material for hybrid breeding. In the traditional classification, species of Lilium have been divided into seven sections [4] using 15 morphological characters and many researchers have revised this classification. Currently, many molecular phylogenetic and morphological analysis inform our understating of this genus [5][6][7][8][9][10][11][12][13]. However, species-level relationships in Lilium are not yet clear [14] and more effective molecular markers are needed to determine interspecific phylogeny more accurately.
Lilium henrici Franchet, a Lilium species considered to be closely related to the Nomocharis, grows naturally in forests at an elevation of 2800 m in China's Sichuan and Yunnan provinces [1]. It has a unique flower morphology and high horticultural value. However, the enormous economic value of Lilium plants for ornamental purposes, and as foods and medicines has led to excessive exploitation, causing their habitat to be fragmented [15,16], and unfortunately, many Lilium species, including L. henrici. were classified as endangered in the 2013 report and the impact of habitat degradation or loss has resulted in population decline.
The typical chloroplast genomes of angiosperms usually encode 120 to 130 genes and are between 120 and 170 kb in length [17]. They are covalently closed molecules consisting of four parts, which are a large single copy (LSC) region, a small single copy (SSC) region, and a pair of inverted repeats (IRs) regions. As a consequence of conserved gene content and structure [18], the availability of plastome DNA sequences, an ability to resolve relationships at lower taxonomic levels [19], and maternal inheritance, the chloroplast genome has become particularly useful in phylogenetic and population genetic studies [20].
Advances in next generation sequencing technologies have revolutionized whole genome sequencing. The development of NGS methods achieve a breakthrough in sequencing technology and provide researchers with faster and cheaper methods to acquire chloroplast genome data [21]. More and more chloroplast genomic sequences are being analyzed, and phylogenetics have entered a new era at the same time. The chloroplast genome sequence of L. henrici obtained in this study can raise our awareness of this species, and the results will enrich the genetic information of the genus Lilium we already have, providing a theoretical basis for further study on the evolution of Lilium and species identification.

Plant Material, DNA Extraction and Sequencing
We sampled healthy and mature leaves of L. henrici from Danzha County (25 • 27 39 N, 98 • 15 27 E), Yunnan Province, China and preserved them in liquid nitrogen for further study. The corresponding voucher specimens were deposited at Sichuan University Herbarium (SZ Herbarium). Total genomic DNA of L. henrici was extracted from sampled leaves using the Plant Genomic DNA Kit (Tiangen Biotech, Beijing, China) following the manufacturer's instructions. The isolated genomic was manufactured to average 350 bp paired-end(PE) library using Illumina Hiseq platform (Illumina, San Diego, CA, USA), and sequenced by an Illumina genome analyser (Hiseq PE150).

Chloroplast Genomic Assembly and Annotation
We used the software FastQC [22] v0.11.7 to evaluate the quality of sequenced raw reads. After quality assessment, we filtered the chloroplast genome related reads by mapping all the raw reads to the published chloroplast genome sequences in Liliaceae. All related reads were assembled to contigs using SOAPdenovo2 [23], and all contigs were sorted and joined into a single draft sequence with L. taliense as the reference in the software Geneious [24] v11.0.4. The gaps and ambiguous sequences were manually adjusted after Sanger sequencing. Annotations of chloroplast genome were conducted by the software Geneious v11.0.4, using other existent Lilium chloroplast genome sequences as references. Star/stop codons and intron/exon borders were edited manually after comparation with references. In addition, we used tRNAscan-SE [25] v2.0 to verify the identified tRNA genes. The circular map of L. henrici plastid genome was generated utilizing the OGDRAW program [26].

Analysis on Codon Usage
Codon usage of the L. henrici chloroplast genome was analyzed via codonW [28] software. Protein-coding genes (CDS) were filtered from L. henrici chloroplast genome with following conditions to reduce deviation of the results: (1) the length of every CDS must be more than 300 nucleotides [29,30]; (2) Repeat sequences were removed. Finally, 53 CDS in L. henrici were selected for further study (Table S1).

Lilium Phylogenomic Reconstruction Based on Chloroplast Genome
Phylogenomic reconstruction of the genus Lilium was based on twenty-five whole chloroplast genome sequences, which were one species of Cardiocrinum, four species of Fritillaria and twenty species of Lilium (including N. pardanthina). All the sequences were aligned by MAFFT [41] and trimmed by trimAl [42]. Sequences, after alignment and trim, were used to select the best substitution model in the program jModelTest v2.1.7 [43] and the best model was GTR + I + G model. The phylogenomic relationship was inferred by maximum-likelihood method based on the GTR + I + G substitution model in MEGA 7.0 [44] with 1000 bootstrap replicates. Species in Cardiocrinum and Fritillaria were selected as outgroups.

Chloroplast Genome Features of the L. henrici
We determined the L. henrici whole chloroplast genome sequence and deposited it to GenBank under accession number: MH136807. The complete plastid genome of L. henrici is a typical quadripartite structure with 152,784 bp in length, near to other Lilium chloroplast genome levels [14,[31][32][33][34][35][36][37][38][39]. The genome, displaying a conserved structure found in most chloroplast genomes of plants [45,46], possesses an LSC region (82,429 bp), an SSC region (17,533 bp) and a pair of IR regions (26,411 bp) ( Figure 1, Table 1). In the L. henrici plastid genome, the overall GC content is 37.0%. The GC content of the IR regions (42.5%) is higher than that of the overall genome (37.0%), LSC region (34.83%) and SSC region (30.59%). A total of 136 genes are detected, 20 of which are duplicated in the IR regions. There were 84 genes for protein-coding, 38 for tRNA, eight for rRNA and six for other functions among the 136 genes ( Table 2).
By containing several internal stop codons, the infA, ycf1 (the shorter one), ycf15, and ycf68 were interpreted as pseudogenes in our study. The same phenomenon also appears on other plants [47][48][49][50] or infA gene is lost in some taxa [51]. In a previous study, the ycf1 and ycf68 regions were considered to have potential function and required further study [47]. Different from other Lilium species, L. henciri is similar to L. primulinum var. ochraceum in having only has 23 bp duplicated. Other studies revealed only one rps19 gene in Lilium close species, such as species in Amana, Cardiocrinum and Fritillaria [47,[52][53][54].
Eighteen distinct genes (six of which are tRNAs) contain introns (Table S2), 3 (rps12, clpP and ycf3) of which contain two introns and the others only have one intron. The gene rps12 is a trans splicing gene with its three exons located in the LSC and IR regions, respectively.

Codon Usage Analysis
We used 53 protein coding sequences from the L. henrici chloroplast genome for codon usage analysis. All protein coding sequences contain 21324 codons (Table 3). Leucine (2184 codons, approximately 10.24% of the total) and cysteine (241 codons, approximately 1.13% of the total) are the highest and lowest number of amino acids, respectively. Moreover, Met and Trp are encoded by only one codon. Except these two amino acids, others have obvious codon usage bias. For example, synonymous codons GCU, GCC, GCA and GCG encode alanine and the corresponding Relative Synonymous Codon Usage (RSCU) values [59] for these four codons in L. henrici are 1.75, 0.63, 1.16 and 0.47, respectively. There are 32 codons with a RSCU value more than 1. CDS GC content, about 37.4%, is similar with genome-wide GC content (37.0%). Codon usage bias of chloroplast genome may be affected by selection and mutation [60], and research on codon preferences can help us to better understand the exogenous gene expression and molecular evolution mechanisms of L. henrici.

Overall Sequence Variation of the Chloroplast Genomes among Species in Lilium
Preexisting whole plastome data in Lilium provides a basis for comparing genomic variation in the genus [61]. To compare the sequence variation within the genus, alignments among twenty Lilium (including Nomocharis) plastid genome sequences were implemented in the mVISTA program with Shuffle-LAGAN model (Figure 3). Overall, the comparative genomic analysis showed that twenty Lilium (including Nomocharis) chloroplast genomes were relatively conserved. Among the Lilium species, the IR region is more conserved than the LSC and SSC regions, similar with studies in other plants [47,51,[61][62][63], and more variation were detected in the intergenic spacers in the LSC and SSC regions. In addition, fewer variations were found in protein coding regions. In the whole chloroplast genome sequences of Lilium species, some highly divergent regions including matK, rpoC2, rps3, ycf2, ndhF, ndhH, trnK-rps16, trnS-trnG, atpH-atpI, trnT-psbD, trnF-ndhJ, accD-psaI might be regarded as potential molecular markers for Lilium plants. Further work is needed to be implemented to verify whether these regions are suitable as molecular markers for phylogenetic studies of Lilium.
Molecules 2018, 23, x FOR PEER REVIEW 9 of 14 species, the IR region is more conserved than the LSC and SSC regions, similar with studies in other plants [47,51,[61][62][63], and more variation were detected in the intergenic spacers in the LSC and SSC regions. In addition, fewer variations were found in protein coding regions. In the whole chloroplast genome sequences of Lilium species, some highly divergent regions including matK, rpoC2, rps3, ycf2, ndhF, ndhH, trnK-rps16, trnS-trnG, atpH-atpI, trnT-psbD, trnF-ndhJ, accD-psaI might be regarded as potential molecular markers for Lilium plants. Further work is needed to be implemented to verify whether these regions are suitable as molecular markers for phylogenetic studies of Lilium. The main reason of change in the size of the chloroplast genome is the expansion/contraction of IR regions [64], and the location of the boundaries among the four chloroplast regions is useful for evolutionary studies [62]. The chloroplast genome organization is rather conserved within Lilium. The main reason of change in the size of the chloroplast genome is the expansion/contraction of IR regions [64], and the location of the boundaries among the four chloroplast regions is useful for evolutionary studies [62]. The chloroplast genome organization is rather conserved within Lilium.
The twenty Lilium (including Nomocharis) chloroplast genome sequences range from 151,655 bp (L. bakerianum) to 153,235 bp (L. fargesii), including an LSC region of 81,224-82,542 bp, an SSC region of 17,038-17,620 bp, and a pair of IR regions of 26,394-26,990 bp. The overall GC content of the complete genomes is 36.9-37.1%. There generally are the same gene order in all genomic sequences. We compared the borders of LSC, SSC and IR regions among twenty Lilium (including Nomocharis) plastid genomic sequences ( Figure S1) and the IR regions of twenty Lilium (including Nomocharis) chloroplast genomes are conserved.

Phylogenetic Analysis
In the previous reports, the chloroplast genomes are of great significance in the reconstruction of plants phylogenetic relationships and evolutionary history [63][64][65]. In our study, we constructed a phylogenetic tree using the sequences of the whole chloroplast genomes of twenty-five species in the family Liliaceae, including twenty Lilium (including Nomocharis) species and using five species in Cardiocrinum and Fritillaria as outgroups ( Figure 4). All clades were strongly supported, and L. henrici was sister to N. pardanthina. The phylogenetic tree indicated that twenty species in Lilium (including Nomocharis) clustered into two groups. One group comprised section Sinomartagon (L.   Figure S1) and the IR regions of twenty Lilium (including Nomocharis) chloroplast genomes are conserved.

Phylogenetic Analysis
In the previous reports, the chloroplast genomes are of great significance in the reconstruction of plants phylogenetic relationships and evolutionary history [63][64][65]. In our study, we constructed a phylogenetic tree using the sequences of the whole chloroplast genomes of twenty-five species in the family Liliaceae, including twenty Lilium (including Nomocharis) species and using five species in Cardiocrinum and Fritillaria as outgroups ( Figure 4). All clades were strongly supported, and L. henrici was sister to N. pardanthina. The phylogenetic tree indicated that twenty species in Lilium (including Nomocharis) clustered into two groups. One group comprised section Sinomartagon (L. cernuum, L.  The phylogenetic tree acquired in our study is consistent with previous studies [12,14,39]. The clade Lilium-Nomocharis was highly supported, and this conclusion was already proposed in many studies [9,12,66], and Gao et al. accommodated Nomocharis in Lilium [13]. The sister relationship between L. henrici and N. pardanthina is also supported by morphological and cytological evidence: The phylogenetic tree acquired in our study is consistent with previous studies [12,14,39]. The clade Lilium-Nomocharis was highly supported, and this conclusion was already proposed in many studies [9,12,66], and Gao et al. accommodated Nomocharis in Lilium [13]. The sister relationship between L. henrici and N. pardanthina is also supported by morphological and cytological evidence: (1) characteristic of inner perianth: sometimes anthocyanin rich [12]; (2) characteristics of karyotype: same basic chromosomal number (x = 12); karyotype asymmetry (3A type) [67]. However, we had used only a small number of species in Lilium-Nomocharis clade, other chloroplast genome sequences of species that might be included in this clade should be sequenced as soon as possible.

Conclusions
We have sequenced the whole chloroplast genome of L. henrici using Illumina sequencing technology for the first time and compared its structure with other Lilium species. The plastid genome of L. henrici exhibits quadripartite structure. Compared with other chloroplast genomes in Lilium, it has similar size, genomic structure and gene order. And the chloroplast genome sequences in Lilium is relatively conserved at the boundaries of the IR regions and the SC regions. By comparing twenty Lilium (including Nomocharis) plastid genome sequences, we obtained 12 highly divergent regions. 51 SSRs were identified in L. henrici. We detected nine common SSRs in twenty Lilium (including Nomocharis) chloroplast genome sequences and these loci might play an important role in the further researches on the genetic structure of the genus Lilium. The research on codon usage of L. henrici shows that some amino acids have obvious codon usage bias and the codon preferences may help us understand the evolution mechanisms of L. henrici more deeply. Reconstructed molecular phylogenetic relationships using 25 complete chloroplast genome sequences in Liliaceae strongly supports the close sister relationship of N. pardanthina and L. henrici. We try to find morphological and cytological evidence to support their close relationship. Other whole chloroplast genome sequences of Lilium-Nomocharis clade need to be sequenced to understand the position of this clade or Nomocharis in the genus Lilium. The genome data obtained in this study will provide a theoretical basis for determinating the phylogenetic relationship of Lilium.
Supplementary Materials: The following are available online. Table S1: CDS used in the codon usage analysis and corresponding GC content, Table S2: Intron and exon information in the chloroplast genome of L. henrici, Table S3: Simple sequence repeats (SSRs) detected in the twenty chloroplast genome sequences of Lilium (including Nomocharis).