Complete Chloroplast Genome Sequence and Phylogenetic Analysis of Aster tataricus

We sequenced and analyzed the complete chloroplast genome of Aster tataricus (family Asteraceae), a Chinese herb used medicinally to relieve coughs and reduce sputum. The A. tataricus chloroplast genome was 152,992 bp in size. It contained a pair of inverted repeat regions (IRa and IRb, each 25,022 bp) separated by a large single-copy region (LSC, 84,698 bp) and a small single-copy region (SSC, 18,250 bp). Our annotation revealed that the A. tataricus chloroplast genome contained 115 genes, including 81 protein-coding genes, 4 ribosomal RNA genes, and 30 transfer RNA genes. In addition, 70 simple sequence repeats (SSRs) were detected in the A. tataricus chloroplast genome, including mononucleotides (36), dinucleotides (1), trinucleotides (23), tetranucleotides (1), pentanucleotides (8), and hexanucleotides (1). A comparative analysis of the chloroplast genomes of three Aster species showed that sequence similarity was higher in the IR regions than in the LSC and SSC regions. It also showed that A. tataricus was less divergent from A. altaicus than from A. spathulifolius. Phylogenetic analysis revealed that A. tataricus was more closely related to A. altaicus than to A. spathulifolius. Our findings offer valuable information for future research on Aster species identification and selective breeding.


Introduction
Aster tataricus is a tall perennial herb of the genus Aster (family Asteraceae). It has been used in traditional medicine to eliminate phlegm and relieve cough for thousands of years [1,2], and it is cultivated as a high-value medicinal plant. A number of bioactive compounds have

Features of A. tataricus cpDNA
The complete cp genome sequence of A. tataricus was 152,992 bp (GenBank accession number MH669275). Its structure was similar to that of other Aster species [28]. It consisted of an LSC region (84,698 bp; 55.4% of the genome), an SSC region (18,250 bp; 11.9%), and a pair of inverted repeats (IRA/IRB, each 25,022 bp; 16.4%) (Table 1). The G + C content of the LSC, SSC, and IR regions was 35.2%, 31.3%, and 43%, respectively, and that of the whole genome was 37.3%. G + C content is a useful indicator when evaluating species affinity, and the cpDNA G + C content of A. tataricus is similar to that of other Aster species [28]. The G + C content of the IR regions in A. tataricus was higher than that of the LSC and SSC regions. This pattern is also common in other plants [16,17]. The relatively high G + C content of the IR regions is generally attributed to the rRNA and tRNA genes they contain [29][30][31]. The A. tataricus cp genome contained 115 functional genes, including four rRNA genes, 30 tRNA genes, and 81 protein-coding genes (Table 2). Eighteen genes were duplicated in the IR regions: seven tRNA genes, all four rRNA genes, and seven protein-coding genes (Figure 1). The LSC region contained 62 protein-coding genes and 22 tRNA genes, and the SSC region contained one tRNA gene and 12 protein-coding genes.
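The region sizes above can be checked with simple arithmetic. In the quadripartite layout, one LSC, one SSC, and two IR copies must add up to the full genome length, and each percentage is a region's length divided by the total. The Python sketch below uses the lengths reported in this section. `gc_content()` is a hypothetical helper written for illustration; it is not a function from any tool used in this study. It also assumes that ambiguous bases are left out of the denominator.

```python
# Sketch: consistency check of the quadripartite layout, plus a simple
# G + C content helper. Lengths come from the text; gc_content() is a
# hypothetical helper for illustration, not a tool used in the study.

GENOME_LENGTH = 152_992
REGION_LENGTHS = {"LSC": 84_698, "SSC": 18_250, "IR": 25_022}  # IR: one copy


def region_proportions(lengths: dict, total: int) -> dict:
    """Percentage of the genome occupied by each region (one IR copy each)."""
    return {name: 100.0 * size / total for name, size in lengths.items()}


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases in `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# One LSC + one SSC + two IR copies must add up to the full genome length.
reconstructed = (
    REGION_LENGTHS["LSC"] + REGION_LENGTHS["SSC"] + 2 * REGION_LENGTHS["IR"]
)
assert reconstructed == GENOME_LENGTH

for name, pct in region_proportions(REGION_LENGTHS, GENOME_LENGTH).items():
    print(f"{name}: {pct:.1f}%")
# LSC: 55.4%
# SSC: 11.9%
# IR: 16.4%
```

A helper like `gc_content()`, run on the LSC, SSC, and IR substrings of an assembled sequence, performs the kind of calculation behind the per-region values reported above. How the original analysis handled ambiguous bases is not stated.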
The sequences of the tRNA and protein-coding genes were analyzed, and codon usage frequency was estimated for the A. tataricus chloroplast genome. A total of 23,441 codons represented the coding capacity of the 81 protein-coding and 30 tRNA genes in A. tataricus (Table 3). Of these codons, 4065 (17.34%) encoded leucine, the most frequent amino acid in the A. tataricus chloroplast genome. Tryptophan was the least frequent, encoded by 204 codons (0.87%). Codons ending in A or U were the most common. There were 18 intron-containing genes in total: 12 protein-coding genes and six tRNA genes (Table 4). Fifteen genes (nine protein-coding and six tRNA genes) contained one intron, and two genes (ycf3 and clpP) contained two introns (Table 4). The remaining gene, rps12, was trans-spliced, with its 5′ end located in the LSC region and its duplicated 3′ end located in the IR regions. The intron of the trnK-UUU gene, which contains the matK gene, was 2497 bp long. Earlier studies have reported that ycf3 is required for the stable accumulation of the photosystem I complex [32]. We therefore suggest that the intron gain in ycf3 of A. tataricus may provide useful information for further studies of the evolution of photosynthesis. We compared the lengths of exons and introns in the intron-containing genes of the A. tataricus and A. spathulifolius chloroplast genomes (Table 4). These were broadly similar, but some differences were noted: (1) the rpl16 gene had no intron in the A. spathulifolius chloroplast genome; (2) rps12 and rps16 intron lengths differed markedly between the two species; and (3) the rpl12 gene had no intron in A. spathulifolius. Advances in phylogenetic research have revealed that chloroplast genome evolution involves both structural changes and nucleotide substitutions [33][34][35].
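The leucine and tryptophan shares and the predominance of A/U-ending codons all come from counting in-frame codons. The following Python sketch assumes that the coding sequences have already been extracted in frame. `codon_usage_summary()` is a hypothetical name. Skipping stop codons is also an assumption, because the text does not say how the 23,441 codons were tallied. The table used is the standard genetic code; the plastid code (NCBI translation table 11) has the same amino-acid assignments.

```python
from collections import Counter

# Standard genetic code in TCAG order (NCBI table 1). The plastid code
# (NCBI table 11) has the same amino-acid assignments; it differs only in
# which codons may serve as start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def codon_usage_summary(cds_list):
    """Tally in-frame sense codons across coding sequences.

    Returns (total sense codons, {amino acid: % of codons},
    % of codons ending in A or U/T). Stop codons and codons with
    ambiguous bases are skipped (an assumption of this sketch).
    """
    codon_counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")  # accept RNA or DNA input
        for i in range(0, len(cds) - 2, 3):  # ignore a trailing partial codon
            codon = cds[i:i + 3]
            aa = CODON_TABLE.get(codon)
            if aa is not None and aa != "*":
                codon_counts[codon] += 1
    total = sum(codon_counts.values())
    if total == 0:
        raise ValueError("no sense codons found")
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[CODON_TABLE[codon]] += n
    aa_share = {aa: 100.0 * n / total for aa, n in aa_counts.items()}
    au_ending = 100.0 * sum(
        n for codon, n in codon_counts.items() if codon[2] in "AT"
    ) / total
    return total, aa_share, au_ending


# Toy CDS: ATG CTT CTA TTA TAA (the stop codon is not counted).
total, aa_share, au_ending = codon_usage_summary(["ATGCTTCTATTATAA"])
print(total, aa_share, au_ending)  # 4 {'M': 25.0, 'L': 75.0} 75.0

# Leucine share reported in the text: 4065 of 23,441 codons.
print(f"{100 * 4065 / 23441:.2f}%")  # 17.34%
```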
Examples of such structural changes, including intron and gene losses [21,36], have been found in chloroplast genomes. Introns play an important role in regulating gene expression and can enhance expression at particular locations and at specific times [37]. Intron-mediated regulatory mechanisms have been reported in other species. Further experimental work is required to clarify the relationship between intron loss and gene expression in A. tataricus.
Table 4. Exon and intron lengths in intron-containing genes in the A. tataricus chloroplast genome (gray background) and in the A. spathulifolius chloroplast genome (normal background). * The rps12 gene is trans-spliced, with the 5′ end located in the LSC region and the duplicated 3′ ends located in the IR regions.

Simple Sequence Repeat (SSR) Analysis
Simple sequence repeats (SSRs) of 10 bp or longer are prone to slipped-strand mispairing, which is considered the main mutational mechanism underlying SSR polymorphism. Chloroplast SSRs can be highly variable within species and are often used as genetic markers in population genetics and evolutionary studies [38][39][40][41]. In this study, we identified SSRs in the chloroplast genome of A. tataricus and in those of two other Aster species (Figure 2). The cp genomes of A. tataricus, A. altaicus, and A. spathulifolius contained 70, 58, and 36 SSRs, respectively. Mononucleotide repeats made up a large share of the SSRs in all three species (A. tataricus, 51.4%; A. altaicus, 62.1%; A. spathulifolius, 77.8%). These SSRs provide candidate chloroplast markers for genetic studies, for germplasm selection in breeding, and for molecular species identification.

Comparative Chloroplast Genomic Analysis
Comparative genome analysis is an important step in genomics [42,43]. A structural comparison showed that the A. spathulifolius chloroplast genome was the smallest of the three complete Aster chloroplast genomes (Table 1). A. spathulifolius also had the shortest IR regions (17,973 bp) of the three. We suggest that differences in IR length are the principal cause of the variation in genome size. To assess the extent of sequence divergence, we computed sequence identity among the Aster chloroplast genomes with mVISTA, using A. tataricus as the reference (Figure 3). The IR regions (IRA/IRB) were less divergent than the LSC and SSC regions. Non-coding regions were more variable than coding regions, and the most marked differences among the three chloroplast genomes occurred in intergenic spacers. Of the three Aster chloroplast genomes, those of A. tataricus and A. altaicus differed the least.
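mVISTA reports sequence identity in sliding windows along the reference genome, which is what makes the contrast between the IRs and the LSC/SSC visible. The Python sketch below applies the same idea to a single pre-computed pairwise alignment (equal-length strings, with `-` for gaps). It is a simplified illustration, not a reimplementation of mVISTA or Shuffle-LAGAN. The function name and the default window and step sizes are hypothetical choices.

```python
def windowed_identity(ref_aln: str, qry_aln: str, window: int = 100, step: int = 25):
    """Percent identity in sliding windows along a pairwise alignment.

    Windows are laid out over reference bases only (columns where the
    reference has a gap are dropped), so each window start is a 1-based
    position on the ungapped reference. A column counts as identical when
    the query base equals the reference base (case-insensitive).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = [
        r.upper() == q.upper()
        for r, q in zip(ref_aln, qry_aln)
        if r != "-"
    ]
    results = []
    for start in range(0, len(matches) - window + 1, step):
        block = matches[start:start + window]
        results.append((start + 1, 100.0 * sum(block) / window))
    return results


# Toy alignment: one mismatch among the first five reference positions.
print(windowed_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5))
# [(1, 80.0), (6, 100.0)]
```

Along a whole-genome alignment, windows falling in the IRs can then be compared with windows in the LSC and SSC. Windows in intergenic spacers can likewise be compared with windows in coding regions.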

Inverted Repeat (IR) Contraction and Expansion in the A. tataricus Chloroplast Genome
Contractions and expansions of the IR regions at their borders are common evolutionary events. They are the main cause of size variation among chloroplast genomes and play a significant role in evolution [44][45][46]. We compared in detail the four junctions between the two IRs (IRA and IRB) and the two single-copy regions (LSC and SSC) in A. altaicus, A. spathulifolius, and A. tataricus (Figure 4). These junctions are LSC-IRA (JLA), LSC-IRB (JLB), SSC-IRA (JSA), and SSC-IRB (JSB). In all three Aster chloroplast genomes, the JSA junction lay within the ycf1 pseudogene region. This region extended into IRA by different lengths: 563 bp in A. altaicus, 608 bp in A. spathulifolius, and 563 bp in A. tataricus. IRB contained 563, 567, and 565 bp of the ycf1 gene, respectively. It has recently been reported that ycf1 is required for plant viability and encodes Tic214, an important component of the Arabidopsis Tic complex [47,48]. The trnH gene was located in the LSC region, 3, 22, and 3 bp from the IRB/LSC border in the three Aster chloroplast genomes, respectively. In all three species, the JLA junction was overlapped by rps19. The ndhF gene was 22, 5, and 22 bp from the IRA/SSC border in the three species, respectively.
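Statements such as "n bp from the border" or "extending n bp into IRA" are interval arithmetic on the region boundaries. The Python sketch below assumes the conventional order (LSC, then IRB, SSC, and IRA) with 1-based inclusive coordinates. It derives the boundaries from the A. tataricus region lengths reported earlier. The function names are hypothetical. The gene interval in the example is invented so that it extends 563 bp into IRA; it is not taken from the MH669275 annotation.

```python
def region_layout(lsc: int, ir: int, ssc: int) -> dict:
    """1-based inclusive (start, end) of each region, in LSC, IRB, SSC, IRA order."""
    layout, pos = {}, 1
    for name, size in (("LSC", lsc), ("IRB", ir), ("SSC", ssc), ("IRA", ir)):
        layout[name] = (pos, pos + size - 1)
        pos += size
    return layout


def junction_positions(layout: dict) -> dict:
    """Last base before each junction; the boundary lies just after it."""
    return {
        "JLB": layout["LSC"][1],  # LSC | IRB
        "JSB": layout["IRB"][1],  # IRB | SSC
        "JSA": layout["SSC"][1],  # SSC | IRA
        "JLA": layout["IRA"][1],  # IRA | LSC (wraps to position 1 on the circle)
    }


def relation_to_junction(gene_start: int, gene_end: int, boundary: int):
    """Relate a gene interval [gene_start, gene_end] to a boundary.

    Returns ("spans", bp extending past the boundary) if the gene crosses it,
    otherwise ("distance", bp between the gene and the boundary). Genes that
    cross the genome origin are not handled in this sketch.
    """
    if gene_start <= boundary < gene_end:
        return ("spans", gene_end - boundary)
    if gene_end <= boundary:
        return ("distance", boundary - gene_end)
    return ("distance", gene_start - boundary - 1)


layout = region_layout(lsc=84_698, ir=25_022, ssc=18_250)
junctions = junction_positions(layout)
print(layout["IRA"], junctions["JSA"])  # (127971, 152992) 127970
# Hypothetical gene interval chosen to reach 563 bp into IRA.
print(relation_to_junction(123_000, 128_533, junctions["JSA"]))  # ('spans', 563)
```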
Gene order is generally conserved in the chloroplast genomes of most green plants. Nevertheless, rearrangements have been reported in the chloroplast genomes of a wide variety of plant species, including inversions in the LSC region, IR contractions or expansions accompanied by inversions, and re-inversions in the SSC region [49][50][51][52][53]. Sequence rearrangements that alter chloroplast genome structure among related species may also reveal genetic diversity that is useful for molecular classification and evolutionary studies.

Phylogenetic Analysis
The complete A. tataricus cp genome provides sequence information for studying the phylogenetic position of A. tataricus within Asteraceae. We performed multiple sequence alignment of the whole cp genome sequences of 16 Asteraceae species. One additional cp genome, that of Paeonia ostii (Paeoniaceae), was included as an outgroup (Figure 5). A phylogenetic tree was constructed using the maximum likelihood (ML) method. The tree strongly supported A. altaicus and A. tataricus as sister species, with A. tataricus more closely related to A. altaicus than to A. spathulifolius.

DNA Sequencing, Chloroplast Genome Assembly, and Validation
A. tataricus plants were grown at the China Academy of Chinese Medical Sciences (N 39°56′, E 116°25′, Beijing, China). Fresh leaves were collected, wrapped in tin foil, frozen in liquid nitrogen, and stored at −80 °C. Total genomic DNA of A. tataricus was extracted using an improved cetyltrimethylammonium bromide (CTAB) method [54]. DNA concentration was measured with an ND-2000 spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA) [55]. A 250-bp shotgun library was constructed according to the manufacturer's instructions (Vazyme Biotech Co. Ltd., Nanjing, China). The library was sequenced on an Illumina X Ten platform (Illumina, San Diego, CA, USA) using paired-end sequencing (2 × 150 bp). Approximately 5 Gb of raw data were obtained, comprising over 34 million paired-end reads (SRA accession: SRP154896).

Gene Annotation and Sequence Analyses
CpGAVAS [60] was used to annotate the sequences, and DOGMA [61] and BLAST were used to check the annotation. All tRNA genes were identified with tRNAscan-SE v1.21 [62] using default settings. OGDRAW v1.2 [63] was used to visualize the structural features of the chloroplast genome. Relative synonymous codon usage (RSCU) values were calculated with MEGA5.2 [64].
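RSCU compares each codon's observed count with the count expected if all synonymous codons for the same amino acid were used equally. In other words, RSCU = observed count / (family total / number of codons in the family), and values above 1 indicate preferred codons. The Python sketch below illustrates that definition using codon counts that have already been tallied. It is not MEGA's implementation, and the toy counts are invented. It uses the standard genetic code, which has the same amino-acid assignments as the plastid code.

```python
from collections import defaultdict

# Standard genetic code in TCAG order (NCBI table 1); plastid table 11
# has the same amino-acid assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((x + y + z for x in BASES for y in BASES for z in BASES), AMINO_ACIDS)
)

# Synonymous codon families; stop codons are grouped under "*".
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rscu(codon_counts: dict) -> dict:
    """RSCU = observed count / mean count of the synonymous codon family.

    Values > 1 mark codons used more often than expected under uniform
    synonymous usage. Families with no observations are left out, since
    RSCU is undefined for them.
    """
    values = {}
    for aa, family in SYNONYMS.items():
        total = sum(codon_counts.get(c, 0) for c in family)
        if total == 0:
            continue
        expected = total / len(family)
        for c in family:
            values[c] = codon_counts.get(c, 0) / expected
    return values


# Invented counts: leucine family of six codons (12 total) plus methionine.
counts = {"TTA": 6, "TTG": 1, "CTT": 2, "CTC": 1, "CTA": 1, "CTG": 1, "ATG": 5}
values = rscu(counts)
print(values["TTA"], values["CTC"], values["ATG"])  # 3.0 0.5 1.0
```

Single-codon families such as Met (ATG) and Trp (TGG) always have an RSCU of 1 when observed. For this reason, codon-preference discussions usually focus on the multi-codon families.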

Genome Comparison
mVISTA [65] (Shuffle-LAGAN mode) was used to compare the whole chloroplast genomes of A. tataricus, A. altaicus (KX352465), and A. spathulifolius (KF279514), with the A. tataricus annotation as the reference. Phobos version 3.3.12 [66] was used to detect SSRs in the cp genome. The minimum search thresholds were ≥10 repeat units for mononucleotides, ≥8 for dinucleotides, ≥4 for trinucleotides and tetranucleotides, and ≥3 for pentanucleotide and hexanucleotide SSRs.
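The repeat thresholds above can be illustrated with a scan for perfect tandem repeats. This Python sketch is not Phobos, which also searches for imperfect repeats under its own scoring scheme. The sketch only reports exact SSRs whose unit counts meet the minimums of 10, 8, 4, 4, 3, and 3 units for mono- to hexanucleotides. The function names and output format are hypothetical. Units that are themselves repeats (such as `AA` or `ATAT`) are skipped, so each SSR is reported once, under its shortest unit.

```python
# Minimum repeat-unit counts by unit length, as listed in the Methods text.
MIN_REPEATS = {1: 10, 2: 8, 3: 4, 4: 4, 5: 3, 6: 3}


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return unit not in (unit + unit)[1:-1]


def find_perfect_ssrs(seq: str, min_repeats: dict = MIN_REPEATS):
    """Return (1-based start, unit, n_units) for perfect SSRs meeting the thresholds.

    Only exact tandem copies are detected; imperfect or compound repeats,
    which dedicated tools such as Phobos can report, are ignored here.
    """
    seq = seq.upper()
    hits = []
    for k, min_units in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            if set(unit) - set("ACGT") or not is_primitive(unit):
                i += 1
                continue
            j = i + k
            while seq[j:j + k] == unit:  # extend over exact tandem copies
                j += k
            n_units = (j - i) // k
            if n_units >= min_units:
                hits.append((i + 1, unit, n_units))
                i = j  # resume after the repeat
            else:
                i += 1
    return sorted(hits)


toy = "G" + "A" * 12 + "G" + "AT" * 8 + "C" + "AAG" * 4 + "C"
print(find_perfect_ssrs(toy))
# [(2, 'A', 12), (15, 'AT', 8), (32, 'AAG', 4)]
```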

Phylogenetic Analysis
We downloaded 16 whole chloroplast genome sequences of Asteraceae species from the National Center for Biotechnology Information (NCBI) Organelle Genome and Nucleotide Resources database and used them for phylogenetic analysis. Sequences were aligned with clustalw2 (The Conway Institute of Biomolecular and Biomedical Research, Dublin, Ireland). The ML (maximum likelihood) phylogenetic tree was constructed and drawn with MEGA5.2. Bootstrap analysis was performed with 1000 replicates and TBR (tree bisection and reconnection) branch swapping. Paeonia ostii was used as the outgroup.
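Bootstrap analysis builds pseudo-replicate alignments by sampling alignment columns with replacement. Each replicate is then analyzed with the same tree-building procedure, and a clade's support is the percentage of replicate trees that contain it. The Python sketch below covers only the resampling and the support tally. It assumes that the alignment fits in memory as a dict of equal-length strings and that each replicate tree is summarized as a set of clades. Tree inference itself (ML with TBR branch swapping in MEGA5.2 in this study) is not reproduced, and all names are hypothetical.

```python
import random


def bootstrap_replicates(alignment: dict, n_replicates: int = 1000, seed=None):
    """Yield pseudo-replicate alignments built by resampling columns with replacement.

    `alignment` maps taxon names to aligned sequences of equal length. The same
    sampled column indices are applied to every taxon, so each replicate keeps
    whole alignment columns together.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_replicates):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def clade_support(replicate_trees: list, clade: frozenset) -> float:
    """Percentage of replicate trees (each a set of clades) that contain `clade`."""
    hits = sum(1 for clades in replicate_trees if clade in clades)
    return 100.0 * hits / len(replicate_trees)


toy = {"taxon1": "ACGTACGT", "taxon2": "ACGTACGA", "outgroup": "TCGAACTT"}
replicates = list(bootstrap_replicates(toy, n_replicates=5, seed=1))
print(len(replicates))  # 5

trees = [{frozenset({"taxon1", "taxon2"})}] * 3 + [{frozenset({"taxon1", "outgroup"})}]
print(clade_support(trees, frozenset({"taxon1", "taxon2"})))  # 75.0
```

With 1000 replicates, as in this study, the tally would run over 1000 replicate trees, one inferred from each resampled alignment.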

Conclusions
To our knowledge, this is the first study to sequence and analyze the complete chloroplast genome of A. tataricus. Its quadripartite structure, gene order, G + C content, and codon usage were similar to those of the other Aster chloroplast genomes studied. Of the three Aster species compared, A. tataricus had the largest chloroplast genome, although genome structure and composition were similar. The A. tataricus chloroplast genome differed least from that of A. altaicus. Phylogenetic analysis of the three Aster species also showed that A. tataricus was more closely related to A. altaicus than to A. spathulifolius. The complete chloroplast genome assembly of A. tataricus reported here should be valuable for molecular identification, breeding, and further biological research.