Complete Chloroplast Genome Sequence of Triosteum sinuatum, Insights into Comparative Chloroplast Genomics, Divergence Time Estimation and Phylogenetic Relationships among Dipsacales

Triosteum himalayanum, Triosteum pinnatifidum (Triosteum L., Caprifoliaceae, Dipsacales) are widely distributed in China while Triosteum sinuatum mainly occurrs in northeast China. Few reports have been determined on the genus Triosteum. In the present research, we sequenced 2 chloroplast genomes of Triosteum and analyzed 18 chloroplast genomes, trying to explore the sequence variations and phylogeny of genus Triosteum in the order Dipsacales. The chloroplast genomes of the genus Triosteum ranged from 154,579 bp to 157,178 bp, consisting of 132 genes (86 protein-coding genes, 38 transfer RNA genes, and 8 ribosomal RNA genes). Comparative analyses and phylogenetic analysis supported the division of Dipsacales into two clades, Adoxaceae and six other families. Among the six families, a clade of Valerianaceae+Dipsacaceae was recovered as a sister to a clade of Morinaceae+Linnaeaceae. A closer relationship of T. himalayanum and T. pinnatifidum among three species was revealed. Our research supported that Lonicera ferdinandi and Triosteum was closely related. Zabelia had a closer relationship with Linnaea borealis and Dipelta than Morinaceae. The divergence between T. sinuatum and two other species in Triosteum was dated to 13.4 mya.


Introduction
Chloroplasts (cp) are involved in photosynthesis and provide energy for green plants [1,2]. The double-stranded and circular chloroplast genome is composed of two short inverted repeat regions (IRa and IRb) which are separated by a large single copy (LSC) region and a small single copy (SSC) region [3][4][5]. The sequence of chloroplast genome is highly conserved and similar in the order of Dipsacales. Due to its short length, conserved structure and type of genes, together with few repeat regions, the complete chloroplast genome can provide abundant information for species phylogeny and taxonomy. Hence, the complete chloroplast genome is widely used in plant phylogeny and evolution study [6][7][8][9][10].
However, the phylogentic evolution of Triosteum within Dipsacales was poorly invested. In this study, we focused on three species of the genus Triosteum L., including T. himalayanum, T. pinnatifidum, T. sinuatum. These perennial herbs are widely distributed in China as well in Japan and Russia (T. pinnatifidum, T. sinuatum) [15,16]. Studies of chemical constituents have been conducted on T. himalayanum and T. pinnatifidum. Additionally, few evolutionary relationships between Triosteum and other genera have been revealed [17][18][19][20]. Gould and Donoghue (2000) used sequences of ITS and GBSSI to reveal the biogeographic pattern of T. himalayanum and T. pinnatifidum in China and T. sinuatum in Japan. According to the results of the previous studies, Triosteum was close to Lonicera in order Dipsacales. Based on the phylogeographic structure of T. himalayanum localities, the phylogeny of haplotypes and palaeodistributional reconstruction, both westwards and northwards expansion of this species was postulated [15]. For T. pinnatifidum, the genetic structure, unimodal mismatch distribution, star-like network, together with demographic history analysis supported the "dispersal into the Qinghai-Tibetan Plateau" hypothesis [21].
Here, the whole chloroplast (cp) genome sequences of T. pinnatifidum, T. sinuatum and Linnaea borealis were assembled and annotated. The cp genome of T. sinuatum was utilized for the first time for the phylogenetic relationship and tracing of evolutionary lineage in Triosteum. A total of 20 chloroplast genome sequences of Dipsacales were generated, and six sequences from Apiales and Lamiales were used as outgroup taxa. Seventeen cp genome sequences in Dipsacales were downloaded from NCBI database. The sequence variations and detailed interrelationship of 20 species in the order Dipsacales were explored. The present study was carried out to identify the variation of hotspot regions in chloroplast genomes, analyze the simple sequence repeat (SSR) diversity, and calculate relative synonymous codon usage (RSCU) frequency. The phylogenetic relationship between Triosteum and other lineages in the order Dipsacales was reconstructed, and divergence time was estimated based on cp genome.

Sampling, DNA Isolation, Sequencing and Assembly
Fresh leaves of T. pinnatifidum were collected in Tibet. Leaves of Linnaea borealis and T. sinuatum were collected in Jilin Province. All the specimens were tagged, dried, poisoned and pasted on standard herbarium sheets. The voucher specimens are deposited in the Herbarium of Northwest Institute of Plateau Biology (HNWP), Xining, Qinghai, China for future reference. The genomic DNA of T. pinnatifidum was extracted from fresh leaves using the cetyltrimethyl ammonium bromide (CTAB) method (Doyle, 1987) with some modifications [22]. The Illumina sequencing of T. sinuatum and L. borealis was carried out by the Genepioneer Biotechnologies (Nanjing, China). Assembly was performed with SPAdes v3.10.1, SPACE v2.0 and Gapfiller v2.1.1 [23]. We rearrange the corrected pseudo genome in order and obtained a complete circular sequence. In addition, 23 complete sequences of chloroplast genomes were selected and downloaded from GeneBank after sequences check, comprising 17 of Dipsacales, 3 of Apiales and 3 of Lamiales (Table 1). Abbreviations: GC, GC content.

Genome Structure and Comparative Analysis
Structure and gene contents among the plastomes were manually compared with Notepad++ v7.5.9. Alignments of 20 complete chloroplast genome sequences were visually compared using the Shuffle-LAGAN Model of mVISTA [27], which uses T. sinuatum as a reference. Boundaries between IR and SC regions of these species were also visualized with IRscope (https://irscope.shinyapps.io/irapp/ (accessed on 14 April 2021)) [28].

Distribution of Simple Sequence Repeats (SSRs)
Chloroplast simple sequence repeat (cpSSR) was widely used to study the phylogeny of plants and genetic structure for its high variability. MISA_web (https://webblast. ipk-gatersleben.de/misa/ (accessed on 24 January 2021)) was used to detect the SSRs in T. sinuatum with the following parameters: 10 repeat units for mononucleotide SSRs, 5 repeat units for dinucleotide SSRs, 4 repeat units each for trinucleotide, and 3 repeat units each for tetra-, penta-, and hexanucleotide SSRs [29].

Codon Usage Pattern
We used the CodonW software (John Peden, http://www.molbiol.ox.ac.uk/cu (accessed on 24 May 2021), version 1.4.2) to calculate the number of amino acid codon in chloroplast genomes of these twenty species [30]. The FASTA files containing of 20 chloroplast genome sequences were downloaded from NCBI before RSCU analysis. The results were recorded by column graph with R language after running program.

Phylogenetic Analysis
To reveal the phylogenetic position of the genus Triosteum, a total of 26 species were selected. Out of 26 species, the ingroup contains 20 species that belong to the order Dipsacales. Clustalw v2.1 was used for genome alignment of all species. Iqtree v1.6.12 was used to find the best substitution model and constructed maximum likelihood (ML) tree with the use of GTR+F+G4 model [31,32]. The visualization of the ML tree was completed with Figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/fgtree (accessed on 23 March 2022)).

Divergence Time Estimation
BEAST v1.8.0 was used to estimate the divergence time of the genus Triosteum, the and GTP+G+I model was chosen as a substitution model [33]. The analysis was calculated on the basis of two fossils of the genus Dipelta and the genus Viburnum. Tree prior was determined as the Yule process and the clock model was a lognormal relaxed clock model. The three calibration points were the genus Dipelta with lognormal prior (mean = 36 Ma, SD = 1.0), the genus Viburnum with lognormal prior (mean = 89.3 Ma, SD = 1.0) and the order Dipsacales with normal prior (mean = 103 Ma, SD = 1.0). A total of 10,000,000 generations were run, and the Tracer was used to determine the correctness. Finally, with 25% burn-in, a maximum clade credibility tree was conducted using TreeAnnotator [33].

Plastome Features and Gene Content
Genome maps of T. sinuatum in Dipsacales were drawn with ODDRAW ( Figure 1). The chloroplast genome showed a typical double-stranded circular structure, length of T. himalayanum, T. pinnatifidum and T. sinuatum was 154,579 bp, 154,896 bp and 157,178 bp, respectively. Each chloroplast genome was composed of a large single copy region (89,007-90,758 bp) and a small single copy region (18,656-120,543 bp) separated by two inverted repeat regions (22,673-23,882 bp). Protein-coding genes orf42, orf188 and tRNA gene trnP-GGG were lost in T. pinnatifidum. In particular, trnP-UGG could not be found in T. himalayanum.
Twenty complete chloroplast genomes were different in lengths, ranging from 151,267 bp (Patrinia scabra) to 161,576 bp (Linnaea borealis) ( Table 1). Except for Linnaea borealis, the lengths of the other 19 sequences were less than 160 kb, which mainly reflected in the LSC and IR regions. The LSC length in Dipsacales ranged from 86,340 bp (Adoxa moschatellina) to 90,750 bp (Acanthocalyx alba), and the IR region ranged from 22,673 bp (T. pinnatifidum) to 29,210 bp (Linnaea borealis). GC contents of the complete chloroplast genomes are similar, which are approximately 38%. The structure characteristics are similar to the chloroplast genomes of most angiosperms [34][35][36]. Most of plastomes encoded about 132 genes, including 86 protein coding genes, 38 tRNA genes, and 8 rRNA genes (Table 2). lhbA only existed as a protein coding gene in Adoxaceae.

Comparative Analyses
The sequence of T. sinuatum was used as a reference to assess similarity and differences among 20 species of Dipsacales. The alignment of sequences was visualized with an mVISTA plot ( Figure S1). Alignments of 20 species showed that the whole chloroplast genome is highly conserved. The sequences in the single copy regions were more divergent than those in inverted repeat regions. In addition, more variation was found in non-coding regions than in coding regions. The results of the present study are in accordance with previous research by Cheng et al., 2020, Alzahrani et al., 2021and Wang et al., 2021. The sequences of T. sinuatum, T. himalayanum and T. pinnatifidum showed high similarity but exhibited marked differences in the 133k region and the region of accD. There was greater differentiation where five plastid genomes lost the gene ndhF (Adoxa moschatellina, Sinadoxa corydalifolia, Sambucus williamsii, Viburnum odoratissimum and Viburnum dilatatum). This may considered be as evidence that the genera Sambucus and Viburnum should be placed in Adoxaceae from Caprifoliaceae.
The sizes of chloroplast genomes in Dipsacales species are similar but different for the main factors, the expansion and contraction of inverted repeat regions [39][40][41]. The border structure of 19 plastid genomes in the order Dipsacales was compared to analyse how expansion and contraction in IR regions affect the length of chloroplast genome. Detailed comparisons of LSC, SSC and inverted repeat regions are shown in Figure 2. The rpl23 gene of 12 species was located in IRb, extending into the LSC region by about 43-179 bp. By contrast, for Weigela florida and Triplostegia glandulifera, the rpl23 gene was found in the LSC region. Additionally, the IR/SC boundary in three plastomes (Sambucus williamsii, Viburnum dilatatum and Viburnum odoratissimum) was located on rps19. The trnN gene of 18 species excluding T. pinnatifidum and Weigela florida resided in the IRb and IRa region. In addition, the ndhF gene was positioned within the SSC region similarly. The nad5 gene appeared only on the SSC region of Sambucus williamsii and Viburnum odoratissimum, which was 36 bp and 95 bp from the boundary between the IRb/SSC. On the other hand, ycf2 was mainly found in the IRa/LSC region, while the trnH gene was inserted into the fragment distributed in the LSC region. The ycf1 genes of 14 species are mainly located in the SSC region. However, in Linnaea borealis and the family Adoxaceae (except Adoxa moschatellina), the ycf1 gene expanded to IRa region for 5033 bp to 5730 bp.

Codon Usage Pattern
Codon usage preference is a common phenomenon in nature, mainly determined by the dynamic balance of gene mutation and natural selection [47]. In addition, it is also related to gene coding structure and function, gene expression and other factors in the

Codon Usage Pattern
Codon usage preference is a common phenomenon in nature, mainly determined by the dynamic balance of gene mutation and natural selection [47]. In addition, it is also related to gene coding structure and function, gene expression and other factors in the evolution process [37]. Natural selection usually causes organisms to prefer to use optimal codons, and mutation will lead to the existence of some non-optional codons. Different species or different genes of the same species have different preferences for codon use after long-term evolution [48,49]. The codon W was used to calculate the relative synonymous codon usage (RSCU) of 20 species. Additionally, the frequency of used codons for amino acids is presented in Figure S3. A codon with an RSCU value more than 1 (RSCU > 1) was considered as a preferred codon because its investigated usage frequency is more than the expected one [50]. AGA is the most frequently used codon (RSCU > 1) in the chloroplast genomes among 20 species. Meanwhile, the lowest frequency codon is CGC (RSCU < 0.51), which encodes the same amino acid (Arginine) as AGA. There is difference among 20 species, but high conservatism was showed in codon usage. In addition, the analysis of the preference of target gene and receptor genomic codon by bioinformatics is an important supplement to phylogenetic analysis.

Phylogenetic Relationships
The phylogenetic relationship between Triosteum and other genera in the order Dipsacales was constructed on the base of cp genomes of 20 ingroup and 6 outgroup species (Apiales and Lamiales) using Maximum likelihood method ( Figure 3). The current study presents highly resolved phylogenies of Dipsacales based on complete plastome sampling of all families in Dipsacales. The present study strongly supports the previous research which also divided 20 Dipsacales species into two caldes [4,7,[51][52][53]. One clade is Adoxaceae, and the other clade consisted of six major lineages with a highly supported topology of (Diervillaceae, (Caprifoliaceae, ((Valerianaceae, Dipsacaceae), (Morinaceae, Linnaeaceae)))). Moreover, our results indicate that the family Dipsacaceae (Triplostegia glandulifera, Dipsacus japonicus) is a sister to the family Valerianaceae. The family Morinaceae was placed as a sister to the family Linnaeaceae. However, in our study, Zabelia dielsii showed close relationships with Linnaea and Dipelta with high support rather than with Morinaceae, which is in conflict with a previous study [4,51]. According to the ML tree, three species of Triosteum clustered in one clade with a 100% bootstrap value. Among the three species of  [11,54]. Here, complete chloroplast genome provided a stronger support. Lonicera ferdinandi and the three species in the genus Triosteum clustered into a clade, suggesting a very close phylogenetic relationship between the genus Lonicera and the genus Triosteum. Phylogenetic analysis in our study served as a baseline for further phylogeny studies on Triosteum and Dipsacales.

Divergence Time Estimation
The result of divergence time estimation of 26 species based on chloroplast sequences is shown in Figure 4. The divergence times estimated in our study are more recent than those  [4,7]. The age of the crown node of Dipsacales was estimated to be upper Cretaceous, approximately 95.6 million years ago (mya). Morinaceae, Dipsacaceae, Valerianaceae, Linnaeaceae and Caprifoliaceae separated from Diervillaceae at about 51.6 mya. The separation between Caprifoliaceae and the clade Morinaceae+Dipsacaceae+Valerianaceae+Linnaeaceae occurred at 51.1 mya. The divergence between Morinaceae and Linnaeaceae was dated to 41.9 mya in the Eocene, which was later than that of the previous study by Wang et al. (2019, in the upper Cretaceous) [4]. These differences between current and previous studies are likely due to the influence of the amount of data and species sampling. The speciation of most of the species investigated here occurred after the beginning of Miocene. Triosteum and Lonicera separated at the time 20.1 mya. The divergence between T. sinuatum and other two Triosteum species was dated to 13.4 mya. The divergence between T. himalayanum and T. pinnatifidum was dated back to 7.6 mya, suggesting that these two species experienced divergence for a long time.
1 Figure 4. Divergence times analysis based on the chloroplast plastome.

Conclusions
In this study, we reported the complete chloroplast genome of Triosteum of the order Dipsacales based on the Illumina platform. The full length of three species in Triosteum ranged from 154,513 to 157,178 bp. For 20 species in the order Dipsacales, our study investigated the variation of sequences and SSRs, compared genome structure, evaluated Codon usage bias, reconstructed phylogenetic relationships and estimated divergence times. A high similarity of sequences was shown within 20 Dipsacales species, but sequences were significantly different in 133k region and region of accD. Within Dipsacales, two major clades were clearly defined: Adoxaceae and another six families with high support. T. pinnatifidum and T. himalayanum are supposed to have a closer relationship than T. sinuatum in the genus. The divergence between T. sinuatum and other two Triosteum species was dated to 13.4 mya. The diversification of Dipsacales occurred at about 95.6 mya in the Cretaceous. The present research is the first documented report in China that assembled the choloplast sequence of T. sinuatum, contributing to further study. We provided stronger support for future studies on molecular identification and the better understanding of phylogeny in Triosteum and Dipsacales.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes13050933/s1, Figure S1: The chloroplast genome of 26 species were compared using the mVISTA program; Figure S2: Analysis of simple sequence repeat (SSR) in twenty-six cp genomes; Figure S3: Codon content of 20 amino acid and stop codons in all proteincoding genes of the twenty cp genomes. Data Availability Statement: The chloroplast genome sequences of Triosteum sinuatum were submitted to GenBank (MW526077).

Conflicts of Interest:
The authors declare no conflict of interest.