Complete Chloroplast Genome of Michelia Shiluensis and A Comparative Analysis with Four Magnoliaceae Species

: Michelia shiluensis is a rare and endangered magnolia species found in South China. This species produces beautiful flowers and is thus widely used in landscape gardening. Additionally, its timber is also used for furniture production. As a result of low rates of natural reproduction and increasing levels of human impact, wild M. shiluensis populations have become fragmented. This species is now classified as endangered by the IUCN. In the present study, we characterized the complete chloroplast genome of M. shiluensis and found it to be 160,075 bp in length with two inverted repeat regions (26,587 bp each), a large single-copy region (88,105 bp), and a small copy region (18,796 bp). The genome contained 131 genes, including 86 protein-coding genes, 37 tRNAs, and 8 rRNAs. The guanine-cytosine content represented 39.26% of the overall genome. Comparative analysis revealed high similarity between the M. shiluensis chloroplast genome and those of four closely related species: Michelia odora Magnolia laevifolia Magnolia insignis and Magnolia cathcartii . Phylogenetic analysis shows that M. shiluensis is most closely related to M. odora . The genomic information presented in this study is valuable for further classification, phylogenetic studies, and to support ongoing conservation efforts.


Introduction
Michelia shiluensis Chun and Y. F. Wu (Magnoliaceae) is an endangered flowering plant that is sparsely distributed throughout Hainan Province, China [1]. It is characterized by leafy branches and beautiful flowers, and is, therefore, widely used in landscape gardening [2]. This species is also a source of excellent quality wood which is in demand for furniture production [3]. In recent decades, there has been a serious decline in wild populations of this species as a result of the illegal harvesting to supply both the timber and horticultural markets [4]. Moreover, this species naturally has a low seeding rate and its wild populations are declining [5]. Consequently, M. shiluensis is categorized as a Class II National Key Protected Species in China [6] and is considered endangered (EN) by the International Union for Conservation of Nature [7]. Currently, most studies on M. shiluensis have Forests 2020, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/forests 2 focused on its use in landscape gardening and its protection in China [5]; however, there remains a lack of evolutionary and phylogenetic research.
The chloroplast is an important organelle in plants with its own genome (hereafter, cp genome) and participates in photosynthesis and other functions [8]. The cp genome of most land plants has a circular structure, including four segments: A large single-copy (LSC), a small single-copy (SSC), and two invert repeats (IRs) [9]. Although the cp genome is generally conserved, it has undergone intraand inter-species rearrangement during evolution [10,11], including IR expansion and contraction. The information obtained from sequence rearrangements can be applied in phylogenetic analyses to solve taxonomic problems, such as low-level classifications, using genome comparison [12][13][14][15][16][17]. In the section Michelia, complete cp genomes have been reported for only Magnolia alba (NC037005), Magnolia laevifolia (NC035956), and Michelia odora (NC023239). Therefore, analysis of the cp genomes of other Michelia plants is necessary because of the similarity of morphology among Magnoliaceae species [18].
In the present study, we characterized the cp genome of M. shiluensis and compared its sequence features with four closely related species (M. odora, M. laevifolia, Magnolia insignis, and Magnolia cathcartii). The phylogenetic relationships among 28 Magnoliaceae species were constructed based on 79 protein-coding gene (PCG) sequences and show that M. shiluensis is most closely related to M. odora.

Plant Material and DNA Extraction
Fresh leaves of M. shiluensis were collected in the South China Botanical Garden (113°21ʹE, 23°10ʹN), China and transported to the laboratory at the South China Agricultural University. Total genomic DNA was isolated from the leaves using the CTAB method [19].

Genome Sequencing and Annotation
An Illumina shotgun library was established according to the manufacturer's protocol, and highthroughput sequencing was conducted using the HiSeq X TEN platform (Illumina, San Diego, CA, USA). After filtration using SOAPnuke [20], 4.93 GB of clean data were generated. Filtered reads were assembled de novo using SPAdes (version 3.10.1) [21] by referencing them against the cp genome sequence of M. odora (NC037005.1) using BLAST v2.2.30 (National Center for Biotechnology Information, Bethesda, MD, USA). Gene annotation was performed using GeSeq [22]. The cp genome map was generated using Organellar Genome DRAW (version 1.2) [23]. The annotated sequence was submitted to GenBank (accession number MN418056).

Sequence and Repeat Analysis
We used the Editseq v7.1.0 [24] software to calculate the guanine-cytosine (GC) content. MEGA v7.0.26 [25] was used to generate the relative synonymous codon usage (RSCU) values based on 79 PCGs. RNA editing sites in PCGs were predicted using the PREP suite [26] with the default settings.

Genome Comparison and Sequence Divergence
Comparisons between five Magnoliaceae cp genomes were visualized using online mVISTA software [29] with the annotation of M. shiluensis as the reference in Shuffle-LAGAN mode. The borders of four different regions among the five cp genomes of Magnoliaceae were visualized using IRscope [30]. The nucleotide diversity (Pi), the rate of nonsynonymous (dN) substitutions, the rate of Forests 2020, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/forests 3 synonymous (dS) substitutions were determined using DNAsp v6.12.03 [31] to investigate the nucleotide diversity of sequences and genes that are considered to be under selection pressure.

Phylogenetic Analysis
To research the phylogenetic relationships and allow for comparisons among Magnoliaceae species, a maximum likelihood tree was constructed using RAxML [32], with 1000 bootstrap replicates, based on the PCG sequences found in 28 Magnoliaceae cp genomes. All 28 Magnoliaceae cp genome sequences were downloaded from the NCBI nucleotide database.

Structures and Features of M. shiluensis Chloroplast Genome
The complete cp genome of M. shiluensis was 160,075 bp in length and comprised two IR regions of 26,587 bp each, separated by an LSC region of 88,105 bp, and an SSC region of 18,796 bp. The cp genome had the following base proportions: Adenine (A), 29.99%; thymine (T), 30.75%; cytosine (C), 19.98%; and guanine (G), 19.28%. Therefore, the total GC content was 39.26%. The GC content of the LSC, SSC, and IR regions were 37.95%, 34.28%, and 43.20%, respectively ( Table 1).
The cp genome of M. shiluensis contained 113 unique genes, including 79 PCGs, 30 tRNAs, and four rRNAs (Table 1, Figure 1). A total of 58 genes were found to be involved in self-replication, 12 genes encoded small ribosomal subunit proteins, eight genes encoded large ribosomal subunit proteins, 30 genes encoded tRNA, and four genes encoded RNA polymerase subunits. A total of 44 genes were found to be involved in photosynthesis, including six genes for ATP synthase, 11 genes for NADH dehydrogenase, six genes for the cytochrome b/f complex, five genes for photosystem I, 15 genes for photosystem II, and one gene for the large chain of Rubisco. In total, 18 genes were duplicated in the cp genome of M. shiluensis, including seven PCGs, seven tRNA genes, and four rRNA genes, all of which were located in the IR region ( Table 2). None of the genes contained stop codons in coding sequences, therefore, no pseudogenes were detected.   A total of 16 genes were found to have introns, including 10 PCGs and six tRNA genes. Of these genes, clpP, trnA-UGC, trnI-GAU, and ycf3 had two introns, whereas atpF, ndhA, ndhB, petB, rpl2, rpoC1, rps12, rps16, trnG-UCC, trnK-UUU, trnL-UAA, and trnV-UAC had one intron. The rps12 gene which encodes the 40S ribosomal protein S12, was trans-spliced, with one exon located in the LSC region, and the other two exons located in the IR region. The largest intron was located in the trnK gene (2490 bp) with the matK gene inside; trnL-UAA had the smallest intron (491 bp) ( Table 3).  We compared the basic cp genome features of M. shiluensis with four Magnoliaceae species. The cp genome lengths of M. laevifolia and M. insignis were 45 and 42 bp longer than that of M. shiluensis, respectively, while the cp genome lengths of M. odora and M. cathcartii were five and 125 bp shorter, respectively. Compared with M. shiluensis, the variation in the lengths of the LSC, SSC, and IR regions ranged from 7 to 90, 3 to 14, and 1 to 78 bp, respectively. In addition, the GC content of the whole genome and of each region of M. shiluensis were highly similar to those of the other four species. Moreover, there was no variation with respect to the total number of genes, PCGs, tRNA genes, rRNA genes, and genes with introns (Table 1).

Codon Usage and RNA Analysis
Based on the PCGs, 22,791 codons were detected (excluding the stop codons) ( Table 4). The three most abundant amino acids were leucine (2423 codons), isoleucine (2085 codons), and serine (1719 codons), and the three least abundant amino acids were cysteine (314 codons), tryptophan (427 codons), and methionine (602 codons) ( Figure S1). Of the 30 most frequent codons (RSCU > 1), most of them end with A or U, and only the UUG codon ends with G. In contrast, most of the 32 least frequent codons (RSCU < 1) end with C or G. In addition, two codons, AUG and UGG, have no codon bias (RSCU = 1). PREP suite was used to edit predictions in the genome of M. shiluensis by manipulating the first codon position of the first nucleotide (Table S1). A total of 106 RNA editing sites were detected from the PCGs in M. shiluensis; with the majority of the amino acid conversions involving the conversion of serine to leucine. Most of the RNA editing sites were located on the ndhB gene (14 sites), followed by ndhD (11 sites), and matK (nine sites). Most of the conversions changed from a polar group to a nonpolar group, while only two sites changed from a nonpolar group to a polar group (proline to serine); one of these was located on the psbE gene while the other was located on the ccsA gene.

Repeat Sequence Analysis
The REPuter results show that the M. shiluensis cp genome contains a total of 49 repeats: 23 palindromic, 18 forward, and eight reverse repeats ( Figure 2). The repeat size ranged from 18 to 33 bp. The most abundant repeats were 18 bp (12 sites) followed by 20 bp (10 sites) ( Figure S2).
In the first location, 46.9% of repeats were detected in the intergenic spacers (IGSs), while 34.7% were in the PCGs, and 18.4% were in the tRNA genes. Of all the PCGs, the ycf2 gene had five forward repeats and four palindromic repeats and was the gene with the most repeats (Table S2) (Table S3). The majority of them were mononucleotide repeats (118), followed by tetranucleotide repeats (9), and dinucleotide repeats (8) (Figure 3). No pentanucleotide repeats were detected in the cp genome of M. shiluensis. The longest repeat was 18 bp while the shortest was 8 bp. Noncoding regions, including IGSs (97) and introns (19), contained most of the SSRs while only 25 repeats were located in coding regions, including cemA, ndhD, ndhF, psbC, rpoB, rpoC1, rpoC2, rps19, rps3, ycf1, ycf2, and ycf4 ( Table 5). The cpSSRs were mainly distributed in the LSC region (72.34%), followed by the SSC region (17.73%), with just 4.96% in the IR. The cpSSRs in M.
Forests 2020, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/forests 8 shiluensis had base bias towards A-T bases. In total, 113 SSRs had A or T bases, accounting for 80.14% of the total SSRs ( Figure 3). Comparison among the five species of Magnoliaceae show high similarity in SSR type and distribution. The variation in the total amount, mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats among the five species was 5, 4, 0, 2, 0, 1, and 1, respectively ( Figure 3). The number of SSRs in the IR were the same among the five species while the counts in different locations and regions were highly conserved.

Genome Comparison and Sequence Divergence
The mVISTA online software was used to compare the variation in the whole cp genome among the five species (Figure 4). The alignments indicated that the whole cp genome of the five species was highly conserved, especially in the IR region. The noncoding sequences had relatively more divergence than the coding sequences. The noncoding sequences that contained high levels of divergence were rps16-trnQ, atpH-atpI, trnT-psbD, petA-psbJ, and ndhF-trnL. In the coding sequences, only ycf1 show relatively more variation than the other genes. No obvious insertions were found among the five species.  The four junctions in the regions of the cp genomes of the five species were shown using IRscope ( Figure 5). There was a conserved structure on each border, and slight distance differences among the five species. Gene rps19 was fully located in the LSC at a distance of 1-6 bp from the LSC/IRb border, while gene rpl2 was fully located in the IRb. The ndhF gene was found in the SSC region and  In the coding region, the mean Pi in the PCGs was 0.00117 (ranging from 0 to 0.00606); and the mean values of Pi in the LSC, IR, and SSC regions were 0.001192, 0.000186, and 0.001634, respectively ( Figure 7). Meanwhile, in the IGSs, the mean Pi value was 0.00295 (ranging from 0 to 0.02416); and the mean Pi value in the LSC, IR, and SSC regions were 0.0297, 0.00045, and 0.00731, respectively. This result indicates that the Pi value in the coding region is lower than that in the IGSs. The results also demonstrate that the IR region is the most conserved region among the five species, followed by the LSC and SSC regions. In total, 20 mutation sites (Pi > 0.005) were identified, including 19 sites in IGSs and one site in a PCG. The mutation sites in IGSs were as follows: trnH-psbA, psbK-psbI, atpA-atpF, rps2-rpoC2, trnT-psbD, ycf3-trnS, ndhJ-ndhK, ndhK-ndhC, accD-psaI, psbL-psbF, petL-petG, trnW-trnP, trnP-psaJ, rpl18-rpl20, ndhF-trnL, ccsA-ndhD, ndhD-psaC, ndhG-ndhI, and ndhI-ndhA. One gene, psaJ, was unique and had a Pi value greater than 0.005.

Phylogenetic Analysis
To reveal the evolutionary relationships between the investigated species and to enable comparison with traditional phylogenies, a maximum likelihood phylogenetic tree was constructed using RAxML (with 1000 bootstrap replicates) based on the PCG sequences found in 28 Magnoliaceae cp genomes (Figure 8). The phylogenetic tree generated 25 nodes; most of which had 100% bootstrap support. This result strongly supports the notion that M. shiluensis is most closely related to M. odora.

Discussion
In this study, we characterized the complete cp genome of M. shiluensis, an endangered and valuable species (Figure 1). By comparing five closely related species, we found that gene content, order, structure, and other features were highly conserved among them (Figures 3, 4, 5). Michelia shiluensis was shown to be most closely related to M. odora (Figure 8). This finding can help to further our understanding of the characterization of the M. shiluensis cp genome and reveal information concerning the evolution, population genetics, and phylogeny of this species.
Normally, the length of the cp genome in higher plants is in the range of about 120-160 kb, with a stable structure and conserved sequence [8,33]. The cp genome of M. shiluensis displayed a typical quadripartite structure, with an LSC and an SSC which were separated by two IR regions (Figure 1). The whole length of this genome was 160,075 bp, with 39.26% GC content, and containing 113 unique genes and 16 genes with one or two introns (Tables 1, 2, 3). Among the 26 Magnoliaceae species, the length of the cp genome ranged from 158,177 to 160,183 bp, the GC content ranged from 39.15% to 39.30%, and they collectively contained 112 common genes, including 79 PCGs, 29 tRNA, and four rRNA genes; also, one or two introns were found among these 16 genes. The results for the M. shiluensis cp genome were consistent with a previous analysis of 26 Magnoliaceae species [34], except for the number of genes, one additional tRNA gene (trnV-GAC) was detected in M. shiluensis. Similar to other angiosperms, a high GC content was detected in the IR region of M. shiluensis, which may be a result of the existence of high-GC rRNA sequences [9,[35][36][37]. Introns play a vital role in selective gene splicing [38]. However, introns have been lost among some species during their evolution [39,40]. In this study, no introns were lost in the cp genome of M. shiluensis during evolution, which reflects the fact that the cp genome is highly conserved in Magnoliaceae [34].
In total, 22,791 codons were found in the cp genome of M. shiluensis (Table 4), among which, the codons for leucine were the most abundant (10.63%). This result has also been observed in Ailanthus altissima [41] and Justicia flava [42]. Among the preferred codons (RSCU > 1), we found that most of them ended with A or U, except UUG. This is not unique to the M. shiluensis cp genome as similar findings have been observed in Papaver rhoeas and P. orientale [36], Ageratina adenophora [43], and Oryza sativa [44]. Of the PCGs in the M. shiluensis cp genome, 106 possible sites for RNA editing were detected (Table S1). The majority of the amino acid conversion was from serine to leucine, and the ndhB gene accounted for a high number of editable sites (14 of the total 106 sites). Similar results have been obtained for Forsythia suspensa [45] and Sanionia uncinata [46].
Repeat sequences play an important role in genomic structural variation, expansion, and rearrangement [8,40]. Previous research has indicated that most of the repeat sequences are located in the IGS regions followed by the coding regions [47,48]. A similar result was found in this study, with 46.9% of repeats detected in the IGS regions, followed by 34.7% in the coding regions, and the remainder in the tRNAs (Table S2). The cpSSR is an effective marker [49,50] that is widely used in population genetics, biogeographic studies, and phylogenetic evaluation [51,52]. In the cp genome of M. shiluensis, over 80% of the SSRs consisted of A or T bases, and over 80% were mononucleotide repeats. Similar results have been observed in other studies [48,50,53]. The majority of SSRs are found in the SSC and LSC regions [50] and, in this respect, M. shiluensis is no exception ( Table 5). These two regions accounted for 90.07% of the SSRs, and only seven SSRs were found in the IR region.
Although the cp genome of angiosperms is relatively conserved in structure and size [54,55], the expansion and contraction of the IR region, as caused by evolutionary events, has resulted in minor changes in the IR boundary and size of the genome [39,56], thus increasing the chloroplast genetic diversity of angiosperms [57,58]. In this study, comparative analysis of five Magnoliaceae species revealed that the IR lengths were similar in M. shiluensis, M. odora, and M. laevifolia ( Figure 5). However, the IR region of M. insignis had completely lost 11 bp in the rps12-trnV, rrn23-rrn4. 5 IGSs, while M. cathcartii had lost 5 bp in the rpl2 intron, 6 bp in the rps12 intron, 26 bp in ycf1, and 41 bp in the trnN-ndhF IGS. The losses of these bases resulted in differences in the lengths of the IR regions among the five species.
DNA barcoding is a technique that is widely applied in plant identification studies [59,60]. However, only a few regions have been used for the DNA barcoding of Magnoliaceae [61][62][63]. We Forests 2020, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/forests 14 used mVISTA to compare the genomes of five Magnoliaceae species and revealed that the IR region is more highly conserved than the LSC and SSC regions, and that the coding region was more highly conserved than the noncoding region (Figure 4), consistent with other angiosperms [9,38]. Five regions in the M. shiluensis cp genome had high levels of variation (four on IGSs and one on a PCG). The Pi value was also investigated among the 79 PCGs and 125 IGSs (Figure 7), and only 20 regions were found to have a Pi value greater than 0.005; which confirmed the low base substitution rate in Magnoliaceae [64]. Regions with a high degree of variation can be used to develop high resolution DNA barcoding for identification.
Due to the high morphological similarity among Magnoliaceae species [18,65], there have been some difficulties with respect to the classification of the family. Thus, the classification of Magnoliaceae has always been controversial [66][67][68][69][70][71][72]. The cp genome contains sufficient information and has been shown to be more effective than cpDNA fragments for clarifying low level phylogenetic relationships in plants [53,73]. In this study, the phylogenetic results of 28 Magnoliaceae plants based on PCG sequences revealed that M. shiluensis is most closely related to M. odora (Figure 8), which is consistent with phylogenetic results based on the ndhF sequence [70]. According to traditional morphological classification, M. insignis has been placed in the subgenera Manglietia, and M. alba has been placed in the section Michelia [69,74]. However, the phylogenetic relationship results based on the cp genome show that M. insignis is located in the section Michelia clade and M. alba is located in the subgenera Yulania clade. This result differs from that of traditional morphological classification [69,74] and the results of three nuclear gene sequences [75]. These findings confirm that not even a complete cp genome can distinguish species in young evolutionary lineages, and that phylogenetic conclusions may require consideration of certain features in the nuclear genome [76].

Conclusions
The complete cp genome provided by this study can be used for in-depth genetic research on M. shiluensis and Magnoliaceae species in general, and may also play an important role in the development of new conservation and management strategies to ultimately aid species conservation efforts.
Supplementary Materials: Supplementary materials can be found at www.mdpi.com/xxx/s1. Figure S1, Amino acid frequency among 79 protein-coding genes in the Michelia shiluensis chloroplast genome; Table S1, Possible RNA editing sites in the chloroplast genome of Michelia shiluensis; Table S2, Repeats in the chloroplast genome of Michelia shiluensis; Figure S2, Different lengths of repeats in the Michelia shiluensis cp genome; Table S3, Single sequence repeats in the chloroplast genome of Michelia shiluensis.