Characterization of the Complete Chloroplast Genomes of Buddleja colvilei and B. sessilifolia: Implications for the Taxonomy of Buddleja L.

Buddleja colvilei Hook.f. & Thomson (Scrophulariaceae) is a threatened alpine plant with a distribution throughout the Himalayas, also used as an ornamental plant. The name Buddleja sessilifolia B.S. Sun ex S.Y. Pao was assigned in 1983 to a plant distributed throughout the Gaoligong Mountains, but the name was later placed in synonymy with B. colvilei in the Flora of China. In this study we sequenced the complete chloroplast (cp) genomes of two individuals of B. colvilei and three individuals of B. sessilifolia from across the range. Both molecular and morphological analysis support the revision of B. sessilifolia. The phylogenetic analysis constructed with the whole cp genomes, the large single-copy regions (LSC), small single-copy regions (SSC), inverted repeat (IR) and the nuclear genes 18S/ITS1/5.8S/ITS2/28S all supported B. sessilifolia as a distinct species. Additionally, coalescence-based species delimitation methods (bGMYC, bPTP) using the whole chloroplast datasets also supported B. sessilifolia as a distinct species. The results suggest that the B. sessilifolia lineage was early diverging among the Asian Buddleja species. Overall gene contents were similar and gene arrangements were found to be highly conserved in the two species, however, fixed differences were found between the two species. A total of 474 single nucleotide polymorphisms (SNPs) were identified between the two species. The Principal Coordinate Analysis of the morphological characters resolved two groups and supported B. sessilifolia as a distinct species. Discrimination of B. colvilei and B. sessilifolia using morphological characters and the redescription of B. sessilifolia are detailed here.


Introduction
The Himalayan region is a center of diversity for the genus Buddleja L. (Scrophulariaceae) [1], harboring 75% of the Asian Buddleja species (18 of 24 species).Many species in the genus Buddleja are famous as ornamentals [2,3].Indeed, Buddleja colvilei Hook.f.& Thomson, when discovered by Hooker in 1849, was described by him as "the handsomest of all Himalayan shrubs" [4,5].Buddleja colvilei is a shrub or small tree [6,7], and as an ornamental plant, it was awarded the Royal Horticultural Society (RHS) First Class Certificate and the outstanding excellence for exhibition for the high ornamental value [4].It is endemic to the eastern Himalayas, and has a distribution across altitudes of 1600-4200 m, in Nepal, India (Sikkim), Bhutan, and China (Tibet, Yunnan) [6].In China, it is only found in Yadong, Tibet and the Gaoligong Mountains in Yunnan.The China Biodiversity Red List has evaluated B. colvilei as Vulnerable (VU) [8], although the species has not yet been assessed for the IUCN Red List.
Molecules 2018, 23, x FOR PEER REVIEW 2 of 20 Horticultural Society (RHS) First Class Certificate and the outstanding excellence for exhibition for the high ornamental value [4].It is endemic to the eastern Himalayas, and has a distribution across altitudes of 1600-4200 m, in Nepal, India (Sikkim), Bhutan, and China (Tibet, Yunnan) [6].In China, it is only found in Yadong, Tibet and the Gaoligong Mountains in Yunnan.The China Biodiversity Red List has evaluated B. colvilei as Vulnerable (VU) [8], although the species has not yet been assessed for the IUCN Red List.Species taxonomic designations are important to conservation, as species delimitations are the basis upon which conservation priorities are determined [9].The taxonomy of this Buddleja is interesting, as our previous investigations found that B. colvilei plants from the western part of the range (East Nepal and Tibet) and the eastern part (Gaoligong Mountains) are distinct from each other in several ways: they are morphologically distinct from each other (Figure 1) in the sizes of the plants (small trees vs. subshrubs, Figure 1b); the color and size of the corolla (Figure 1c); the indumentum of the leaves, fruits and outside of the corolla (Figure 1d-g); the appearance and presence of the petiole on the leaves (Figure 1g); and the plants occupy different habitats (dry open areas and river banks (Figure 1a).The type specimen of B. sessilifolia was collected by T.T. Yu in 1938 [10] from the populations in the Gaoligong Mountains.In 1982 Li examined the same specimen and identified it as a new distribution record of B. colvilei in China [11].However, ignoring the identification of Li, the Flora Yunnanica reported a new species based on this specimen in 1983, Buddleja sessilifolia B.S. Sun ex S.Y.Pao.After B. sessilifolia was published as a new species, the authors of the Flora of China (FOC) considered it morphologically identical with B. colvilei and published an article in 1988 sinking the name [12], and B. sessilifolia was synonymized with B. colvilei in FOC in 1992 [13].These taxonomic designations were based on only a single specimen (the type of B. sessilifolia), which was in fruit, and had not been tested with molecular data.Moreover, although the fruits of these species are of a similar size, on closer examination we found that the fruits of "B.sessilifolia" are in fact glabrous, while the fruits of B. colvilei are covered with densely stellate tomentose hairs.Furthermore, because the species "Buddleja sessilifolia" published in the Flora Yunnanica was based on examination of the type specimen only, there are several mistakes or deficiencies in the description: the sizes of leaves and inflorescences were not accurate, the length and width of the corolla tube were not detailed (which are typically shorter than B. colvilei), the color of the corolla was not described, and the phenology description was not accurate.
Cytological research suggests that there are several ploidy levels occurring in this species, including 8x, 16x and in some instances extremely high ploidy levels of 24x [14,15].The flowers and fruits of B. colvilei are larger than the typical butterfly bush (B.davidii) and other Buddlejas, which may be a result of the high ploidy.Polyploidy is thought to indirectly promote adaptation in alpine plants to higher elevations, enabling more rapid adaptation for niche shifts in the allopatric ranges [16][17][18].The different morphological types from different habitats maybe a result of different ploidy levels adapted to the high altitude in the Himalaya, or to vicariance of different species during the uplift of the Himalayas.Thus, the taxonomy of B. colvilei is worth studying.Additionally, B. colvilei is an ideal species in which to conduct studies of alpine reproduction, adaptation to alpine environments, the study of speciation and evolution of Buddleja, and providing breeding stock for the development of new cultivars.
Owing to the lack of recombination, the low mutation rates, and uniparental inheritance on most occasions, the chloroplast genome is excellent for the study of plant speciation and evolution, especially when studying polyploids, as homeologous recombination, paralogy, and aneuploidy can make the use of nuclear genes problematic [19][20][21].The chloroplast (cp) genome has been widely used for investigating phylogenetic relationships and discovering molecular markers for use in DNA barcoding to identify and discriminate between plant species, and in the discovering of new species [22][23][24].
To date, this study represents the first complete chloroplast genome study for species delimitation in Buddleja.Five individuals were collected from across the Himalayas, from five different populations, including three populations from the Gaoligong Mountains that were once recognized as B. sessilifolia.We sought to determine the complete chloroplast genome sequence of plants from each of these five populations, to clarify the taxonomic status of "B.sessilifolia" (three populations of the five) through phylogenetic analysis and the coalescence-based species delimitation methods (bGMYC, bPTP), to describe the structure of the complete chloroplast genome, and give a detailed taxonomic treatment of B. sessilifolia based on morphological and molecular characters.The results will provide taxonomic clarification of these Buddleja species and improve our understanding of Buddleja speciation and evolution.

Phylogenetic Analysis and Species Delimitation
Previous phylogenetic study suggests that B. colvilei forms a sister group to B. asiatica + B. bhutanica, and that these three species together form a sister group to the rest of Asian Buddleja species [25].We therefore chose B. asiatica as reference to determine the taxonomic status of B. sessilifolia.To study the phylogenetic position of B. colvilei within the Scrophulariaceae family, whole cp genomes of B. asiatica, a further four cp genomes from three species of Scrophularia and 75 individuals of 72 species from related families and subfamilies were selected for analysis (Tables S1 and S2).Phylogenetic analysis revealed that all individuals from the Scrophulariaceae formed a monophyletic clade, as did the Buddleja species (Figure 2).

Phylogenetic Analysis and Species Delimitation
Previous phylogenetic study suggests that B. colvilei forms a sister group to B. asiatica + B. bhutanica, and that these three species together form a sister group to the rest of Asian Buddleja species [25].We therefore chose B. asiatica as reference to determine the taxonomic status of B. sessilifolia.To study the phylogenetic position of B. colvilei within the Scrophulariaceae family, whole cp genomes of B. asiatica, a further four cp genomes from three species of Scrophularia and 75 individuals of 72 species from related families and subfamilies were selected for analysis (Tables S1 and S2).Phylogenetic analysis revealed that all individuals from the Scrophulariaceae formed a monophyletic clade, as did the Buddleja species (Figure 2).The analysis strongly supported B. sessilifolia as a distinct species, with the individuals from the three Buddleja sessilifolia populations resolved in a clade that was sister to Buddleja asiatica + B. colvilei (100% bootstrap support), rather than with B. colvilei.Moreover, the individuals from all the B. colvilei populations resolved as sisters to B. asiatica, with a bootstrap value of 100%.The evolutionary relationships resolved in this study are consistent with previous research [25][26][27] but not with previous taxonomic descriptions [6,[11][12][13].We used the large single-copy region (LSC), small single-copy region (SSC), inverted repeat (IR) and the nuclear genes 18S/ITS1/5.8S/ITS2/28Sseparately to study the phylogenetic position of B. colvilei and B. sessilifolia with respect to B. asiatica and three Scrophularia species.All the phylogenetic data from our study showed that the three individuals of B. sessilifolia resolved in a clade that was a sister group to the clade of B. colvilei + B. asiatica, and their phylogenetic positions remained stable.The highly conserved chloroplast genome structures therefore indicated that B. sessilifolia and B. colvilei are distinct species (Figure 3).The analysis strongly supported B. sessilifolia as a distinct species, with the individuals from the three Buddleja sessilifolia populations resolved in a clade that was sister to Buddleja asiatica + B. colvilei (100% bootstrap support), rather than with B. colvilei.Moreover, the individuals from all the B. colvilei populations resolved as sisters to B. asiatica, with a bootstrap value of 100%.The evolutionary relationships resolved in this study are consistent with previous research [25][26][27] but not with previous taxonomic descriptions [6,[11][12][13].We used the large single-copy region (LSC), small single-copy region (SSC), inverted repeat (IR) and the nuclear genes 18S/ITS1/5.8S/ITS2/28Sseparately to study the phylogenetic position of B. colvilei and B. sessilifolia with respect to B. asiatica and three Scrophularia species.All the phylogenetic data from our study showed that the three individuals of B. sessilifolia resolved in a clade that was a sister group to the clade of B. colvilei + B. asiatica, and their phylogenetic positions remained stable.The highly conserved chloroplast genome structures therefore indicated that B. sessilifolia and B. colvilei are distinct species (Figure 3).The coalescence-based species delimitation methods (bGMYC, bPTP) implemented with the whole chloroplast datasets resolved six species (a-f).They were confirmed as segregated entities with the probability threshold >0.5, and the B. sessilifolia was also supported as a distinct species with a divergence time around 14.67 Ma (95% Highest Posterior Density, HPD: 12.19-17.15Ma), earlier than those of B. asiatica and B. colvilei (Figure 4).Our results indicated that the B. sessilifolia lineage is early diverging among the rest of the Asian Buddleja species, with a divergence time earlier than those of B. The coalescence-based species delimitation methods (bGMYC, bPTP) implemented with the whole chloroplast datasets resolved six species (a-f).They were confirmed as segregated entities with the probability threshold >0.5, and the B. sessilifolia was also supported as a distinct species with a divergence time around 14.67 Ma (95% Highest Posterior Density, HPD: 12.19-17.15Ma), earlier than those of B. asiatica and B. colvilei (Figure 4).Our results indicated that the B. sessilifolia lineage is early diverging among the rest of the Asian Buddleja species, with a divergence time earlier than those of B. colvilei and B. asiatica, which were previously considered as the basal groups of Asian Buddleja [25].The divergence of the three Buddleja species may be related to the vicariance during the uplift of the Himalayas during the orogeny in the Miocene [28].Furthermore, the divergent of populations of B. sessilifolia may be related to the uplift of the Gaoligong Mountains, which occurred during or after the Late Pliocene [29].
The conservation status of Buddleja colvilei has been assessed as VU [8], and based on the results of our study and information from herbarium specimens, including specimens from the Kunming Institute of Botany, CAS (KUN) and Chinese Virtual Herbarium (http://www.cvh.ac.cn)B. sessilifolia should be regarded as endangered.As most of the range of both B. colvilei and B. sessilifolia is located at or near the boundaries of several countries, trans-boundary conservation of these two species should be conducted in the future.Furthermore, the mechanism by which B. colvilei became polyploid, its adaption to the alpine environments, and observed pollinator shifts from birds to bumblebees in Buddleja are all worth studying in greater detail.
colvilei and B. asiatica, which were previously considered as the basal groups of Asian Buddleja [25].The divergence of the three Buddleja species may be related to the vicariance during the uplift of the Himalayas during the orogeny in the Miocene [28].Furthermore, the divergent of populations of B. sessilifolia may be related to the uplift of the Gaoligong Mountains, which occurred during or after the Late Pliocene [29].
The conservation status of Buddleja colvilei has been assessed as VU [8], and based on the results of our study and information from herbarium specimens, including specimens from the Kunming Institute of Botany, CAS (KUN) and Chinese Virtual Herbarium (http://www.cvh.ac.cn)B. sessilifolia should be regarded as endangered.As most of the range of both B. colvilei and B. sessilifolia is located at or near the boundaries of several countries, trans-boundary conservation of these two species should be conducted in the future.Furthermore, the mechanism by which B. colvilei became polyploid, its adaption to the alpine environments, and observed pollinator shifts from birds to bumblebees in Buddleja are all worth studying in greater detail.

Genome Organization, Features and the Comparisons between B. colvilei and B. sessilifolia cp Genomes
The complete chloroplast genomes of both individuals of B. colvilei and the three individuals of B. sessilifolia were found to have a total length ranging from 154,202 bp to 154,710 bp.Both B. colvilei and B. sessilifolia had a quadripartite structure like most land plants, consisting of two inverted repeats (IRs) separating the LSC region and the SSC region (Table 1 and Figure 5).In total, the LSC regions accounted for 85,179 to 85,687 bp (55.24 to 55.39%), SSC regions accounted for 17,905 to 17,920 bp (11.57 to 11.62%) and IR regions accounted for 25,553 to 25,559 bp (16.52 to 16.57%).Total GC content was found to be from 38.07 to 38.11%.Like most plants [30,31], the GC content is unevenly distributed throughout the cp genomes of the Buddleja genus.In B. colvilei, the IRs had the highest GC content (43.23%), the LSC region had 36.19% and the SSC region had 32.20% GC content.Similar results were found in B. sessilifolia (Table 1).The previous studies indicated that the high GC content in the IR regions is likely to have been caused by the high percentages of GC nucleotides found in the four rRNA genes rrn4.5, rrn5, rrn16, and rrn23 [31,32].This phenomenon is very commonly found in other plants [24,[31][32][33].

Genome Organization, Features and the Comparisons between B. colvilei and B. sessilifolia cp Genomes
The complete chloroplast genomes of both individuals of B. colvilei and the three individuals of B. sessilifolia were found to have a total length ranging from 154,202 bp to 154,710 bp.Both B. colvilei and B. sessilifolia had a quadripartite structure like most land plants, consisting of two inverted repeats (IRs) separating the LSC region and the SSC region (Table 1 and Figure 5).In total, the LSC regions accounted for 85,179 to 85,687 bp (55.24 to 55.39%), SSC regions accounted for 17,905 to 17,920 bp (11.57 to 11.62%) and IR regions accounted for 25,553 to 25,559 bp (16.52 to 16.57%).Total GC content was found to be from 38.07 to 38.11%.Like most plants [30,31], the GC content is unevenly distributed throughout the cp genomes of the Buddleja genus.In B. colvilei, the IRs had the highest GC content (43.23%), the LSC region had 36.19% and the SSC region had 32.20% GC content.Similar results were found in B. sessilifolia (Table 1).The previous studies indicated that the high GC content in the IR regions is likely to have been caused by the high percentages of GC nucleotides found in the four rRNA genes rrn4.5, rrn5, rrn16, and rrn23 [31,32].This phenomenon is very commonly found in other plants [24,[31][32][33].The chloroplast genomes of B. colvilei and B. sessilifolia contained a total of 135 genes, with a 136th gene in GJ11 identified as a relocated segment of rpl2.All unique genes were arranged in the same order in both species, eighteen of which are duplicated in the IR regions.Among these unique genes, there were 80 protein-coding genes, 37 transfer genes and four rRNA genes (Table 2).Thirteen genes contained a single intron (comprising eight protein-coding and five tRNA genes) and three encoded two introns (ycf3, clpP and rps12).The ycf1 gene was found to be located between the IRa and the SSC regions.The ycf1 gene had premature stop codons in the coding sequence, and has been annotated as a pseudogene here and in other angiosperm chloroplast genomes [34,35].In addition, the pseudogene ycf 15 was also identified.The overall sequence variation in the cp genomes of the five individuals was plotted using the mVISTA program.The results suggest that the organization of the cp genomes of the two Buddleja species is highly conserved (Figure 6).However, certain dissimilarities were identified between species, occurring more frequently in the noncoding regions than the coding regions.Most divergence was found between species, with GJ 2 and GJ 3 (the B. colvilei individuals) being more similar to each other than to GJ 9, GJ 10 and GJ 11 (the B. sessilifolia individuals).
In the cp genome, the IR/LSC boundaries are not static, but are subject to dynamic and random processes that allow expansions and contractions [32], which are important evolutionary events and have measurable influence on the size of the chloroplast genomes [36].The IR junctions of most plastid genomes have been similarly described using the map points JLB (IRb /LSC), JSB (IRb/SSC), JSA (SSC/IRa) and JLA (IRa/LSC) [37].A detailed comparison of the junctions of the IR/SSC boundaries within the five Buddleja plastid genomes is presented in Figure 7.The length of the IR regions was consistent within species, and in B. sessilifolia this region was 5 bp longer than in B. colvilei.The IRb/SSC boundary was located between the ycf 1 pseudogene and trnN-GUU in all the sampled cp genomes studied here.The gene trnN-GUU is found at the IR/SSC boundary in most species, and it is considered as the ancestral IR/SSC endpoint [38].There was expansion of the IR regions in the sample GJ11 into the extra rpl2 gene segment and rpl2 at the IRa/LSC, while in other individuals the IR regions expanded into the rps19 and rpl2 at the IRa/LSC.The relocation of an extra segment of rpl2 at the IRa/LSC boundary of GJ11, made the cp genome in this individual approximately 500 bp longer than those of other individuals.Previous studies have reported that the expansion of IR ranges from several bp to several kb in plants [38,39], results from the relocations of single genes or multiple genes [38][39][40][41] which are assumed to be an important mechanism of molecular evolution in chloroplasts [39,41].The expansion and contraction of rpl2 has been detected numerous in land plants [38,[42][43][44][45].The rpl2 gene is located in the IR at the IR/LSC boundary and present in two copies in most plants [43].It was confirmed that precise excision of rpl2 genes were resulted from the lack of intron, which has led to a model of loss that invokes RNA-mediated recombination as a causation mechanism [43,44].However, the mechanism of the expansion of rpl2 remains unclear.In our analyses (whole cp genomes, three Cp data partitions (SSC, LSC and IR), and the nuclear genes 18S/ITS1/5.8S/ITS2/28S), the expansion at the IRa/LSC boundary of GJ11 did not lead to any change in the phylogenetic position of GJ11.This is probably because the sequence of the remaining cp genomes of the three B. sessilifolia individuals is highly conserved.The pseudogene ycf 1 was completely contained within the IR region, and both JSB and JSA junctions share synapomorphic structural features; a ycf 1 pseudogene of 4,472 bp size spans JSB, and a truncated ycf 1 pseudogene of 912 or 933 bp spans JSA.
A total of 474 single nucleotide polymorphisms (SNPs) were identified in our data as being informative in distinguishing between B. colvilei and B. sessilifolia, suggesting that B. sessilifolia should be regarded as a distinct species from B. colvilei.Most SNPs detected were in the LSC region, accounting for 71.73% of the total SNPs, and of these, 129 SNPs were located in coding regions, and 211 were located in non-coding regions.A further 23.84% were located in the SSC region, with 41 SNPs occurring in coding regions, and 72 in non-coding regions.The IR regions were more conserved than other regions, containing the fewest SNPs (4.43%), with nine occurring in coding regions and 12 in non-coding regions (Figure 8).Similar results from other land plants have been obtained by other studies [45], and there is evidence that the conserved or reduced substitution rate in the IR is due to the copy-dependent repair mechanism of the duplication [46].Moreover, we found that, of all the genes, the ycf 1 gene contains the most SNPs, indicating high divergence in this gene.The ccsA gene and the protein-coding gene psbA also contain more than ten SNPs each.The regions from 67.2 to 67.8 kb, 6.1 to 6.9 kb, and the ndhI-ndhG intergenic spacer region also contain numerous SNPs (Figure 8).These genes and regions could be used as potential barcoding sites for the further identification of and discrimination between Buddleja species.

Morphological Study and Taxonomic Treatment
A previous phylogenetic study indicated that B. asiatica and B. bhutanica were the most closely related species to B. colvilei [25].In contrast to the large flowers of B. colvilei, in which the corolla tube is normally about 20 mm, B. asiatica bears almost the shortest flowers of all the Asian Buddleja species, with a corolla tube of 3-6 mm.Furthermore, the length of the capsule of B. colvilei is triple that of the 3-5 mm long capsules from B. asiatica [11].Compared to other Asian Buddleja species, B. asiatica has narrower leaves, narrower inflorescences and narrower inflructescences (Figure S1), and is thus easy to identify.Buddleja bhutanica is very similar to B. asiatica but distinguished by its connate-perfoliate leaves [7].As B. asiatica and B. bhutanica are easy to distinguish from B. colvilei and B. sessilifolia, in this study we concentrated on the morphological differences between B. colvilei and B. sessilifolia, and our taxonomic treatment is focused on these two species.
We used five morphological characters (length of styles, length and width of fruits, and hairs on both ovary and capsules) measured from plants from two populations of each species, with the sixth morphological character length of corolla tube measured from specimens of B.

Morphological Study and Taxonomic Treatment
A previous phylogenetic study indicated that B. asiatica and B. bhutanica were the most closely related species to B. colvilei [25].In contrast to the large flowers of B. colvilei, in which the corolla tube is normally about 20 mm, B. asiatica bears almost the shortest flowers of all the Asian Buddleja species, with a corolla tube of 3-6 mm.Furthermore, the length of the capsule of B. colvilei is triple that of the 3-5 mm long capsules from B. asiatica [11].Compared to other Asian Buddleja species, B. asiatica has narrower leaves, narrower inflorescences and narrower inflructescences (Figure S1), and is thus easy to identify.Buddleja bhutanica is very similar to B. asiatica but distinguished by its connate-perfoliate leaves [7].As B. asiatica and B. bhutanica are easy to distinguish from B. colvilei and B. sessilifolia, in this study we concentrated on the morphological differences between B. colvilei and B. sessilifolia, and our taxonomic treatment is focused on these two species.
Phenology: -Flowering from June to August, fruiting from September to October.Taxonomic affinities: -Buddleja sessilifolia can be easily differentiated from B. colvilei as they are procumbent subshrubs normally below two meters in size, with the quadrangular glabrous branches, the absence of a petiole on the leaves, the white to pinkish-purple corolla that is yellow inside and less than 2 cm in length, as well as the glabrous calyx, ovary and capsules.Comparison of the characteristics of these closely related species is detailed in Table 3. Phenology: -Flowering from June to August, fruiting from September to October.Taxonomic affinities: -Buddleja sessilifolia can be easily differentiated from B. colvilei as they are procumbent subshrubs normally below two meters in size, with the quadrangular glabrous branches, the absence of a petiole on the leaves, the white to pinkish-purple corolla that is yellow inside and less than 2 cm in length, as well as the glabrous calyx, ovary and capsules.Comparison of the characteristics of these closely related species is detailed in Table 3.The inside of the corolla of B. sessilifolia is yellow, which has hypothesized to be a nectar guide.Interestingly, unpublished data shows that the main pigments of the floral nectar guide of B. sessilifolia contained several diterpenes (non-cyclic crocetin gentiobiose esters), instead of flavonoids [47].These terpenoids have a conjugated double bond which can reflect UV light, whereas flavonoids can absorb UV light.Bees, flies, and butterflies are known to be very sensitive to UV light [48,49], and we suggest that the yellow nectar guide of B. sessilifolia may be a significant attractant for bumblebee pollinators.However, the purple to wine red floral color without nectar guides in B. colvilei could attract bird pollinators, a hypothesis supported by oral reports from Nepali residents: Kanxi and Tamang, from Goruwale, Mechi, Nepal.

DNA Extraction and Sequencing
The B. colvilei (GJ2, GJ3) and B. sessilifolia (GJ9, GJ10, GJ11) plant samples were collected from Goruwale, Mechi, Nepal and Yadong, Tibet, China, B. sessilifolia plant samples were collected from the Gaoligong mountain, Yunnan, China, and the Gaoligong Mountain, Myanmar.Table S2 gives details of the collections.Total genomic DNA was extracted from 100 mg fresh leaves using a modified CTAB method [50].The complete cp genome sequencing was performed on the Illumina MiSeq 2000 (Illumina Inc, San Diego, CA, USA), at the Laboratory of Molecular Biology of Germplasm Bank of Wild Species in Southwest China, following the method of Yang [51].The Illumina raw sequence reads were edited using the NGS QC Tool Kit v2.3.3 [52], with a cut-off value of 80 and 30 respectively for percentage of read length and PHRED quality score respectively.High-quality reads were assembled into contigs using the de novo assembler SPAdes 3.9.0[53], using a k-mer set of 93, 105, 117, 121.The de novo contigs were assembled into complete chloroplast genomes by further connection using NOVOPlasty version 2.6.2 [54].

Complete cp Genome Based Phylogenetic Analysis
Total cp genomes from 85 individuals (Tables S1 and S2) of 78 related higher plants species and Buddleja asiatica (GJ1) were included in our analyses to resolve topology within the Scrophulariaceae (data not shown).The complete genomes reported for other species were downloaded from the NCBI GenBank database (Table S1) and were aligned using HomBlocks [55] with the Gblocks method [56], resulting in 47,802 aligned characters.Phylogenetic analyses were carried out using NJ (MEGA6 [57]), Bayesian (MrBayes v3.2.5 [58]) and maximum-likelihood (RA × ML version 8.1.12[59]) methods.Ten independent ML searches were conducted, and the branch support was determined by computing 1000 non-parametric bootstrap replicates.

Cp Data Partition
The cp genomes were divided into the large single copy (LSC) regions and small single copy (SSC) regions, and the invert repeat regions (IR).We used these partitioned data sets in our neighbor-joining (NJ), Bayesian and maximum-likelihood (ML) analyses respectively for the ten Scrophulariaceae cp genomes.Both maximum likelihood and Bayesian analyses were used to verify the topology indicated in the tree derived from 85 whole cp genomes.These three-part alignments were constructed in MAFFT v7 [60] and trimmed using trimAl 1.2rev59 [61].The length of aligned LSC, SSC and IR regions were 81,378 bp, 17,160 bp and 25,361 bp, respectively.

Maximum Likelihood
ML analyses were performed with RA × ML version 8.1.12[59] coupled with raxml-ng (https://github.com/amkozlov/raxml-ng)for its application of other substitution model, because RA × ML version 8.1.12only uses the invariable (general time reversible) GTR model.The ML trees were inferred with the combined rapid bootstrap (1000 replicates).

Bayesian Analyses
Bayesian inference analyses (BI) were performed in MrBayes v3.2.5 [58].The following settings were applied: 10,000,000 number of Markov chain Monte Carlo (MCMC) generations, including four chains each (three heated, heat parameter = default) with a sampling frequency of 1000 generations, each chain starting with a random tree.The first 25% of trees from all runs were discarded as burn-in and the remaining trees were used to construct a majority-rule consensus tree.The robustness of the resultant BI tree was evaluated using bootstrap probabilities.In the partitioned dataset and 18S-ITS1-5.8S-ITS2-28Soperon data, genus Scrophularia was set as the outgroup.

NJ Method
For the NJ analyses, all trees were built under Maximum Composite Likelihood method with Gamma distribution rates (parameter = 4) using MEGA6 [57].Robustness of nodes was assessed with 1000 NJ-bootstrap replicates using the Maximum Composite Likelihood method under Gamma Parameter 4. Trees were visualized in Figtree v1.4.2 [63].
For the bGMYC analysis, 10,000 random ultrametric trees of the whole chloroplast dataset obtained from the posterior distribution of the BEAST analysis outputs were sampled as an input to integrate over the uncertainty of tree topology.BEAST v2.2.0 [67] was used to generate the chronogram tree, assuming a Bayesian relaxed-clock model.Molecular rates were allowed to vary around an average value among lineages, by imposing an uncorrelated lognormal clock of evolutionary rates.The Yule process prior was applied in this analysis.Fossil-derived timescales calibration assigned to Scrophulariaceae was obtained from TIMETREE [68,69], with the uppermost limit of the time interval set as a minimum hard bound (mean = 44) that includes the entire geological interval (24-64 MYA).The MCMC chain was run in two separate analyses for 10-million posterior iterations sampling every 1000 posterior iterations.The initial 20% iterations of each analysis were discarded as burn-in before combining.The effective sample sizes (ESS) was assessed by using the Tracer v1.6 (http://beast.bio.ed.ac.uk/Tracer) with values greater than 200 considered as indicating optimal convergence and tree likelihood stationarity.A maximum clade credibility (MCC) tree was constructed in TreeAnnotator v1.8.0 [67] depicting the maximum sum of posterior clade probabilities.The MCC tree was visualized in FigTree v1.4.2 [63].In addition, the R package "bGMYC" [65] was used to conduct the bGMYC analysis, and 10,000,000 MCMC generations, a thinning of 1000 generations, and 20% burn-in.Starting parameters were set according to the default settings.For the bPTP algorithm, 10,000 random post-burn-in trees from the posterior distribution of Bayesian analysis were used to shed light on species boundaries.

Chloroplast Genome Annotation and Comparisons
The complete cp genomes were annotated with the identification of introns and exons using the online annotation tool DOGMA [70] and plann 1.1 [71].The positions of start and stop codons and boundaries between introns and exons were investigated according to the published cp genome of S. takesimensis (KM590983.1).The annotated GenBank files were used to draw the circular chloroplast genome maps using OrganellarGenomeDRAW 1.2 [72].The online program mVISTA program [73] was employed in the LAGAN mode to detect the variation within the chloroplast genomes.The cp genome of B. colvilei GJ2 was used as a reference.The boundaries between IR and SC regions of these species were also compared and analyzed using the self-by-self comparison from the online visualization tool Circoletto [74] and the sequence visualization modules from UGENE version 1.29.0 [75].We used Mauve version 2.4.0 [76] to compare the genomes, to filter out the IRb across the five Buddleja individuals and to identify the single nucleotide polymorphisms (SNPs).Divergent frequencies of SNPs between species were calculated manually.

Morphological Study and Taxonomy Treatment
Measurements and morphological character assessments of B. sessilifolia and B. colvilei were taken from both herbarium specimens (obtained from JSTOR Global Plants http://plants.jstor.org,Chinese Virtual Herbarium http://www.cvh.ac.cn,Herbarium, Kunming Institute of Botany, CAS and National Herbarium & Plant Laboratories Government of Nepal, for details of specimens see Table S3), and from field observations of living plants.Morphological measurements were made from two populations for each species (Table S2), from at least 30 individuals (or all the individuals in populations with fewer than 30 individuals) in each population.Principal coordinate analysis (PCoA) was carried out using PAST (Paleontological Statistics) software, version 3.04 [77] to study the correlation of measurements between two populations for each of the two species, as well as the relationships among the four populations.Six morphological characters were used in the PCoA, including length of styles, length and width of fruits, length of corolla tube, and hairs on both ovary and capsules (assigned the value 0 or 1 to represent the absence or presence of hairs).A total of 100 measurements from 15-20 individuals in each population were made, plant heights were made from all the individuals (from 20 to 160, see Table S2), 100 measurements of the length of corolla tube of B. colvilei were made from the specimens.

Figure 2 .
Figure 2. Molecular phylogenetic tree based on the complete cp genome of several representative species in the Scrophulariaceae family and related families and subfamilies.The tree was constructed using Bayesian and maximum likelihood (ML) analyses.Bootstrap values/Bayesian posterior

Figure 2 .
Figure 2. Molecular phylogenetic tree based on the complete cp genome of several representative species in the Scrophulariaceae family and related families and subfamilies.The tree was constructed using Bayesian and maximum likelihood (ML) analyses.Bootstrap values/Bayesian posterior probabilities (≥95%) are shown above branches.Purple dots mark nodes where both bootstrap values and Bayesian posterior probabilities are 100%.

Figure 3 .
Figure 3. Molecular phylogenetic tree based on four partitioned datasets of the Scrophulariaceae family.(a) Phylogenetic tree built using LSC; (b) Phylogenetic tree built using SSC; (c) Phylogenetic tree built using IR; (d) Phylogenetic tree built using 18S/ITS1/5.8S/ITS2/28S.Trees were inferred using neighbor-joining (NJ), Bayesian and maximum likelihood (ML) analyses respectively.Bootstrap values of NJ/ML and Bayesian posterior probabilities (≥95%) are shown above branches.Red dots indicated nodes where bootstrap values of NJ, MLand Bayesian posterior probabilities are all 100%.

Figure 3 .
Figure 3. Molecular phylogenetic tree based on four partitioned datasets of the Scrophulariaceae family.(a) Phylogenetic tree built using LSC; (b) Phylogenetic tree built using SSC; (c) Phylogenetic tree built using IR; (d) Phylogenetic tree built using 18S/ITS1/5.8S/ITS2/28S.Trees were inferred using neighbor-joining (NJ), Bayesian and maximum likelihood (ML) analyses respectively.Bootstrap values of NJ/ML and Bayesian posterior probabilities (≥95%) are shown above branches.Red dots indicated nodes where bootstrap values of NJ, MLand Bayesian posterior probabilities are all 100%.

Figure 4 .
Figure 4. BEAST chronogram of the Scrophulariaceae family based on the complete cp genomes.Columns on the right represent the results of species delimitation from two different tests.Geologic timescale was obtained from TIMETREE, time is shown in millions of years.

Figure 4 .
Figure 4. BEAST chronogram of the Scrophulariaceae family based on the complete cp genomes.Columns on the right represent the results of species delimitation from two different tests.Geologic timescale was obtained from TIMETREE, time is shown in millions of years.

Figure 5 .
Figure 5. Gene map of Buddleja colvilei and B. sessilifolia with Buddleja colvilei (GJ2) as reference.Genes drawn inside the circle are transcribed clockwise, and outside of the outer layer circle are transcribed counterclockwise.The colored bars indicate known protein-coding genes, tRNA, and rRNA.The dashed darker gray area in the inner circle denotes GC content, and the lighter gray area indicates AT content.LSC (Large Single Copy) region; SSC (Small-Single Copy) region, and IR (Inverted Repeat).

Figure 5 .
Figure 5. Gene map of Buddleja colvilei and B. sessilifolia with Buddleja colvilei (GJ2) as reference.Genes drawn inside the circle are transcribed clockwise, and outside of the outer layer circle are transcribed counterclockwise.The colored bars indicate known protein-coding genes, tRNA, and rRNA.The dashed darker gray area in the inner circle denotes GC content, and the lighter gray area indicates AT content.LSC (Large Single Copy) region; SSC (Small-Single Copy) region, and IR (Inverted Repeat).

Figure 6 .
Figure 6.Sequence alignment of chloroplast genomes from five individuals of two Buddleja species, Buddleja colvilei (GJ2) used as reference, by using a 50% identity cutoff.Gray arrows show direction and position of each gene.The colored areas indicate exons, rRNA, Non-coding regions, and tRNA.The Y-axis represents the percentage identity between 50-100%.

Figure 7 .
Figure 7.Comparison of the borders of the LSC, SSC, and IR regions from the cp genomes of five individuals from two Buddleja species.* Indicates a gene segment.Ψ Indicates a pseudogene.The figure is not drawn to scale.

Figure 6 . 20 Figure 6 .
Figure 6.Sequence alignment of chloroplast genomes from five individuals of two Buddleja species, Buddleja colvilei (GJ2) used as reference, by using a 50% identity cutoff.Gray arrows show direction and position of each gene.The colored areas indicate exons, rRNA, Non-coding regions, and tRNA.The Y-axis represents the percentage identity between 50-100%.

Figure 7 .
Figure 7.Comparison of the borders of the LSC, SSC, and IR regions from the cp genomes of five individuals from two Buddleja species.* Indicates a gene segment.Ψ Indicates a pseudogene.The figure is not drawn to scale.

Figure 7 .
Figure 7.Comparison of the borders of the LSC, SSC, and IR regions from the cp genomes of five individuals from two Buddleja species.* Indicates a gene segment.Ψ Indicates a pseudogene.The figure is not drawn to scale.

Figure 9 .
Figure 9. Principal Coordinate Analysis using six morphological characters measured in two species.The symbols represent the four populations: filled squares (K), rings (DZ), empty triangles (YD), filled triangles (GO), and star (Specimens).

Figure 9 .
Figure 9. Principal Coordinate Analysis using six morphological characters measured in two species.The symbols represent the four populations: filled squares (K), rings (DZ), empty triangles (YD), filled triangles (GO), and star (Specimens).

Table 2 .
List of genes present in the cp genomes of Buddleja colvilei and B. sessilifolia.