Complete Chloroplast Genomes and Phylogenetic Relationships of Bougainvillea spectabilis and Bougainvillea glabra (Nyctaginaceae)

Bougainvillea L. (Nyctaginaceae) is a South American native woody flowering shrub of high ornamental, economic, and medicinal value which is susceptible to cold damage. We sequenced the complete chloroplast (cp) genome of B. glabra and B. spectabilis, two morphologically similar Bougainvillea species differing in cold resistance. Both genomes showed a typical quadripartite structure consisting of one large single-copy region, one small single-copy region, and two inverted repeat regions. The cp genome size of B. glabra and B. spectabilis was 154,520 and 154,542 bp, respectively, with 131 genes, including 86 protein-coding, 37 transfer RNA, and 8 ribosomal RNA genes. In addition, the genomes contained 270 and 271 simple sequence repeats, respectively, with mononucleotide repeats being the most abundant. Eight highly variable sites (psbN, psbJ, rpoA, rpl22, psaI, trnG-UCC, ndhF, and ycf1) with high nucleotide diversity were identified as potential molecular markers. Phylogenetic analysis revealed a close relationship between B. glabra and B. spectabilis. These findings not only contribute to understanding the mechanism by which the cp genome responds to low-temperature stress in Bougainvillea and elucidating the evolutionary characteristics and phylogenetic relationships among Bougainvillea species, but also provide important evidence for the accurate identification and breeding of superior cold-tolerant Bougainvillea cultivars.


Introduction
Species of the genus Bougainvillea L., belonging to the family Nyctaginaceae, are of high horticultural ornamental value. They are tropical and subtropical woody vines characterized by vibrant bracts, a long flowering period, and high stress tolerance, making them ideal ornamental horticultural plants [1]. Recent studies have discovered that Bougainvillea potentially has anti-inflammatory, anticancer, antioxidant, antimicrobial, and antihyperglycemic properties [2][3][4][5][6]. This plant group has attracted widespread attention in horticulture, the pharmaceutical industry, and environmental research [7]. Bougainvillea is native to Peru, southern Argentina, and Brazil in South America, but is widely cultivated as a landscape plant in other warm climate regions such as the Pacific Islands, Southeast Asia, the Mediterranean, Australia, and the Caribbean Islands [8]. The genus comprises approximately 18 species, among which Bougainvillea spectabilis Willdenow, Bougainvillea glabra Choisy, and Bougainvillea peruviana Humboldt and Bonpland are native species and serve as breeding materials for major horticultural cultivars [9]. By hybridizing and mutating these three native species and one hybrid species, Bougainvillea × buttiana Holttum & Standley, many modern horticultural cultivars with different colors, shapes, and bract sizes have been developed. Currently, there are more than 400 Bougainvillea cultivars worldwide. However, the frequent hybridization of Bougainvillea species because of commercial demands has led
The cp genomes of B. glabra and B. spectabilis were 154,520 and 154,542 base pairs (bp) in length, respectively (Figure 1).
These cp genomes, similar to those of other Bougainvillea species, were covalently closed double-stranded circular molecules with a typical quadripartite structure comprising (i) an LSC with a length of 85,688 and 85,695 bp, respectively, accounting for 55.5% of the total genome length in both species; (ii) an SSC with a length of 18,078 and 18,077 bp, respectively, accounting for 11.7% of the total genome length in both species; and (iii) a pair of IRs separating the SSC and LSC regions, with a size of 25,377 and 25,385 bp, respectively, covering 16.4% of the total genome in both species.
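The region lengths reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers given in the text; the names `REGIONS` and `region_summary` are illustrative and not part of the original analysis. It confirms that LSC + SSC + 2 × IR reproduces the reported genome size and that each region's share matches the stated percentages (the 16.4% figure applies to each IR copy individually).

```python
# Region lengths (bp) exactly as reported in the text above.
REGIONS = {
    "B. glabra": {"LSC": 85_688, "SSC": 18_078, "IR": 25_377, "total": 154_520},
    "B. spectabilis": {"LSC": 85_695, "SSC": 18_077, "IR": 25_385, "total": 154_542},
}


def region_summary(lengths):
    """Return (reconstructed genome size, percentage share of each region).

    The quadripartite genome is one LSC, one SSC, and two IR copies, so the
    IR length is counted twice in the total but reported once in the shares.
    """
    total = lengths["LSC"] + lengths["SSC"] + 2 * lengths["IR"]
    shares = {
        name: round(100 * lengths[name] / total, 1)
        for name in ("LSC", "SSC", "IR")
    }
    return total, shares


for species, lengths in REGIONS.items():
    total, shares = region_summary(lengths)
    # The reconstructed size should equal the reported genome size.
    assert total == lengths["total"]
    print(species, total, shares)
```

Both species give 55.5% (LSC), 11.7% (SSC), and 16.4% (each IR), in agreement with the values stated in the text.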
A comparative analysis of the cp genomes of B. glabra and B. spectabilis and four related Bougainvillea species revealed that the cp genome size ranged from 153,966 bp (B. peruviana) to 154,872 bp (B. spinosa) (Table 1). Their gene structure, GC content, gene number, mRNA, tRNA, and rRNA were similar, indicating a slow evolution of species within Bougainvillea. The GC content of the cp genomes of B. glabra and B. spectabilis was identical (36.46%). Co-linearity analysis using Mauve software (http://darlinglab.org/mauve, accessed on 24 March 2023) revealed that the structure and gene arrangement of the cp genomes among the six species of Bougainvillea were largely similar, with no evident gene rearrangements or inversions. This indicated a high conservation of cp genome sequences in Bougainvillea species (Figure 2).
Genes placed outside the circle are transcribed clockwise, whereas genes inside the circle are transcribed counterclockwise. Gene colors differentiate protein-coding genes based on their respective functions. LSC, large single-copy region; SSC, small single-copy region; IRA and IRB, two inverted repeats; GC content, dark grey area in inner circle; AT content, light grey area in inner circle.

Gene Composition of the cp Genomes
Gene annotation revealed that both the B. glabra and B. spectabilis cp genomes contained 131 genes, including 86 protein-coding, 37 transfer (t)RNA, and 8 ribosomal (r)RNA genes (Tables 1 and 2). These genes could be categorized into four groups: photosynthesis-related genes, self-expression-related genes, other genes, and unknown genes. The types and number of genes in these four categories were identical between the B. glabra and B. spectabilis cp genomes. There were 45 photosynthesis-related genes, including five (ndhA, ndhB, petB, petD, and atpF) with introns. ndhB was present in two copies in the IR regions. There were 74 self-expression-related genes, with one intron each in rpl16, rps16, rpoC1, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC and two introns in rps12. Fifteen genes (rpl2, rpl23, rps7, rps12, rrn4.5, rrn5, rrn16, rrn23, trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG, and trnV-GAC) were present in two copies in the IR regions. clpP in the 'other genes' category contained two introns. The unknown genes ycf1 and ycf2 were located in the IR regions and existed in two copies, whereas ycf3 contained two introns (Figure 1, Tables 2, S1 and S2). Except for three introns with different lengths, the remaining introns had the same length in both species. ndhB had two introns of the same length in each species: 660 bp in B. glabra and 668 bp in B. spectabilis. Additionally, the introns in rps16 and petB of B. glabra were 887 and 777 bp in size, respectively, each one base pair longer than the corresponding intron in B. spectabilis (Tables S1 and S2).
Table 2. Annotated genes and their classification in the cp genomes of B. glabra and B. spectabilis (columns: Category, Group, Genes).

Codon Usage
Relative synonymous codon usage (RSCU) was used to assess the usage of synonymous codons in the coding sequences, with a higher RSCU value indicating stronger preference [27]. Statistical analysis of the codon numbers and RSCU in the cp DNA of B. glabra and B. spectabilis revealed that they shared the same number of codons (26,599) and the same number of encoded amino acid types (21). The codon numbers for each amino acid were identical between the two species, except for lysine (Lys) and asparagine (Asn) (1477 and 1296 in B. glabra, and 1476 and 1297 in B. spectabilis, respectively). Leucine (Leu) was the most abundant amino acid, with 2800 codons (accounting for 10.53% of the total codons), followed by isoleucine (Ile) with 2317 codons (8.71% of the total). Cysteine (Cys) was the least abundant, with 297 codons (1.12% of the total) (Table S3). As shown in Figure 3, except for tryptophan (Trp), all amino acids were encoded by two or more synonymous codons, and methionine (Met) was encoded by seven synonymous codons. The preferred synonymous codons (RSCU > 1) mainly ended with A or U, i.e., A/T bases.
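For readers unfamiliar with the measure, RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates that definition only; it is not the software used in this study. The function name `rscu` and the deliberately tiny codon table `CODON_TO_AA` (three amino acids from the standard genetic code) are assumptions made for illustration, and a real analysis would use the full 64-codon table over all coding sequences.

```python
from collections import Counter, defaultdict

# Illustrative excerpt of the standard genetic code (DNA alphabet).
# Assumption: only three amino acids are included to keep the example short.
CODON_TO_AA = {
    "TTA": "Leu", "TTG": "Leu", "CTT": "Leu",
    "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "AAA": "Lys", "AAG": "Lys",
    "TGG": "Trp",
}


def rscu(cds):
    """Compute RSCU for each codon of every amino acid observed in `cds`.

    RSCU = observed codon count / (amino acid total / number of synonymous
    codons). A value above 1 means the codon is used more often than expected
    under equal usage.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Toy coding sequence: TTA TTA TTA CTG AAA AAG AAA TGG
example = rscu("TTATTATTACTGAAAAAGAAATGG")
print(example["TTA"], example["AAA"], example["TGG"])
```

In this toy sequence, TTA is strongly preferred among the Leu codons (RSCU = 4.5), and TGG has an RSCU of 1 because Trp has a single codon in the standard code, which is why single-codon amino acids carry no usage bias.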

Simple Sequence Repeat Analysis
Three types of repetitive sequences were detected in the cp genomes of B. glabra and B. spectabilis: forward repeats, reverse repeats, and palindromic repeats. There were 19 forward repeats, 27 palindromic repeats, and 2 reverse repeats in both species (

Nucleotide Diversity of Genes
Nucleotide diversity (Pi) values for B. glabra and B. spectabilis were calculated using DnaSP software v5.10.1. The results showed that the Pi values in the two cp genomes ranged from 0 to 0.20282, with an average of 0.00401. Eight highly variable regions with Pi > 0.007 were detected. Among them, six were located in the LSC region (psbN, psbJ, rpoA, rpl22, psaI, and trnG-UCC) and two were located in the SSC region (ndhF and ycf1) (Figure 6).
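To make the Pi calculation concrete: for a pair of aligned sequences, nucleotide diversity within a window reduces to the proportion of compared sites that differ. The sketch below illustrates that idea with a sliding window and is not a reimplementation of DnaSP. The function name `sliding_pi`, the default `window` and `step` values, and the choice to skip alignment columns containing a gap are all assumptions, since the text does not state the settings used.

```python
def sliding_pi(seq_a, seq_b, window=600, step=200):
    """Pairwise nucleotide diversity in sliding windows over an alignment.

    Returns a list of (window midpoint, Pi) tuples. Columns with a gap ('-')
    in either sequence are excluded from the comparison (an assumption).
    The window and step defaults are hypothetical, not taken from the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")

    results = []
    last_start = max(len(seq_a) - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        pairs = [
            (a, b)
            for a, b in zip(seq_a[start:end], seq_b[start:end])
            if a != "-" and b != "-"
        ]
        diffs = sum(a != b for a, b in pairs)
        pi = diffs / len(pairs) if pairs else 0.0
        results.append((start + window // 2, pi))
    return results


# Toy alignment with a single substitution at position 4.
windows = sliding_pi("ACGTACGTAC", "ACGTTCGTAC", window=4, step=2)
print(windows)

# Windows exceeding a threshold (the text uses Pi > 0.007) can then be flagged.
hotspots = [(mid, pi) for mid, pi in windows if pi > 0.007]
```

With only two sequences, a window's Pi is zero wherever the genomes are identical, which explains why the genome-wide average reported in the text is low while a few windows stand out sharply.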

Analysis of IR Boundary Changes
As shown in Figure 7, the IR boundaries exhibited a high degree of conservation between B. glabra and B. spectabilis. The gene content and expansion extent of the boundary regions were identical. When the cp genomes were compared with those of four other Bougainvillea species, all six species were found to have the same genes at the boundaries, but with slightly different lengths.
The LSC/IRb (JLB) boundaries of all six species were located in the ycf1-coding region. Except for B. spinosa, whose ycf1 gene crossed the JLB boundary by 152 bp, the ycf1 genes of the other five species crossed the JLB boundary by 114 bp. The expansion range of the SSC/IRb (JSB) boundary showed that in all six Bougainvillea species, the JSB boundary was located between ycf1 and ndhF, with slight differences in the extent of expansion. The ycf1 genes of all six species extended 3 bp beyond the boundary into the SSC region. The ycf1 segment within the IRb region was 1371 bp in length in B. glabra and B. spectabilis; 1374 bp in B. praecox, B. pachyphylla, and B. spinosa; and 1335 bp in B. peruviana. ndhF was located near the boundary on the SSC side, and the ndhF genes of all species extended 21 bp beyond the boundary into the IRb region. The expansion range of the SSC/IRa (JSA) boundary showed that the JSA boundary in all six Bougainvillea species was located within ycf1, with trnN on the right side, but with slight differences in the extent of expansion. The expansion range of the LSC/IRa (JLA) boundary showed that the JLA boundary had the same genes, with rpl2 on the left side and trnH on the right side, but with slight differences in the extent of expansion. rpl2 of B. glabra, B. spectabilis, and B. praecox was located 176 bp from the boundary, whereas trnH was located 22 bp from the boundary. rpl2 of B. peruviana and B. pachyphylla was located 177 bp from the boundary, and trnH was located 17 bp from the boundary. rpl2 of B. spinosa was located 219 bp from the boundary, and trnH was located 10 bp from the boundary.
Figure 6. Nucleotide polymorphism analysis of the cp genomes of B. glabra and B. spectabilis. Names of protein-coding genes and genes of the intergenic region are along the X-axis, and the nucleotide diversity (Pi) value in each window is along the Y-axis.

Phylogenetic Relationships
To determine the phylogenetic positions and relationships of the cp genomes of B. glabra and B. spectabilis, the two reassembled Bougainvillea cp genomes were compared with the published cp genomes of 15 Caryophyllales species, and a phylogenetic tree was constructed (Figure 8). The results showed high support (>90%) for all branch nodes except one. The outgroup species Buxus microphylla and Pachysandra terminalis formed one branch, whereas the 15 Caryophyllales species (representing Caryophyllaceae, Amaranthaceae, and Nyctaginaceae) formed a larger branch, clearly distinct from the outgroup. The branch formed by the two Caryophyllaceae species Silene wilfordii and Silene latifolia (branch A) was sister to the branch formed by the Amaranthaceae species Amaranthus hypochondriacus and Amaranthus caudatus (branch B). The branch consisting of branch A and branch B formed a sister group with the larger branch C formed by the 11 Nyctaginaceae species. Within branch C, Nyctaginia capitata, Mirabilis jalapa, and Acleisanthes obtusa formed a sister group with 100% support. Guapira discolor and Pisonia aculeata were separated on another branch, and the remaining six Bougainvillea species formed a sister group. Bougainvillea pachyphylla and B. peruviana were basal within the Bougainvillea clade. Bougainvillea glabra and B. spectabilis formed a sister group, which was sister to B. praecox with 100% support. The branch formed by B. glabra, B. spectabilis, and B. praecox was sister to B. spinosa with 89% support.

Discussion
For the first time, this study sequenced, assembled, and analyzed the cp genomes of B. glabra 'Brasiliensis' and B. spectabilis 'Auratus', two morphologically similar Bougainvillea cultivars differing in cold resistance. The results revealed that the cp genomes of these two species possess a typical quadripartite structure, with one LSC, one SSC, and two IR regions, consistent with the cp genome structures of other Bougainvillea species and the most common structure in plant cp genomes [7,18,19]. The cp genome size of B. glabra and B. spectabilis was 154,520 and 154,542 bp, respectively, which was relatively large for Bougainvillea. The size of the sequenced cp genomes was found to be similar to the size of the earlier reported cp genomes of B. glabra (154,536 bp; 154,763 bp) and B. spectabilis (154,541 bp) [18,20]. Bougainvillea glabra and B. spectabilis exhibit the same total GC content in their cp genomes, and the GC content is significantly higher in the IR region than in the LSC and SSC regions. GC content plays an important role in genome variation, and its uneven distribution may contribute to the conservation of the LSC, SSC, and IR regions. Additionally, B. glabra and B. spectabilis show consistency in the number of total, protein-coding, rRNA and tRNA genes and introns, indicating a high similarity in their cp genome sequences, which partially explains the similarity in their morphological characteristics.
During biological evolution, codon usage bias is commonly observed among species and can be used to infer phylogenetic relationships among different species or within the same genus [28]. The cp genomes of B. glabra and B. spectabilis consist of 26,599 codons, with Leu being the most frequently encoded amino acid. The RSCU values showed that the majority of optimal synonymous codons end with A or U, leading to an increased AT content in the genes, supporting the widespread occurrence of A/T codon bias in the cp genomes of higher plants [29]. These results are consistent with those from previous studies on codon usage bias in the cp genomes of Bougainvillea species [7,18,19] and suggest that it may be a result of natural selection and gene mutation [30].
SSRs in plant cp genomes are characterized by their abundance, high conservation, and rich genetic information, and variations in their copy numbers can serve as important molecular markers for studying plant polymorphisms, population structure, and population genetic evolution [31]. SSR analysis of the cp genomes of B. glabra and B. spectabilis revealed 270 and 271 SSR loci, respectively, with mononucleotide repeats, mainly composed of poly-A and poly-T sequences, being the most abundant. This may explain the differences in the base composition of the cp genomes in these two species. Previous studies have also revealed that plastid SSRs are generally composed of poly-A and poly-T repeats and rarely contain guanine (G) and cytosine (C) repeats [32,33]. No hexanucleotide or longer repeats were detected in either species, which is consistent with the findings of Bautista et al. [7], but differs from the results of Yang et al. [19], who detected no tetranucleotide or longer repeats. This difference may be due to different parameter settings in SSR analysis, as Yang et al. [19] set the minimum repeat number for tetranucleotide to hexanucleotide repeats to five. In addition to SSRs, we identified eight highly variable regions in B. glabra and B. spectabilis using a sliding window approach, six of which were located in the LSC region and the remaining in the SSC region. This indicates that the IR regions of the two genomes are relatively conserved, which is consistent with the research results of Bautista et al. [7]. This is possibly due to the corrective effect of repeated genes in the IR regions on variations [34]. Among these highly variable sites, protein-coding regions ycf1 and ndhF are also highly variable in Bougainvillea plants and other plant species, making them recommended candidate regions for DNA barcoding [35,36]. 
Other fragments, particularly trnG-UCC, also exhibit high variability, suggesting their potential as DNA barcoding regions in Bougainvillea and suggesting directions for future research.
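Because the SSR counts discussed above depend on the minimum repeat number chosen for each motif length, a small sketch may help show how such thresholds enter the search. The code below is an illustrative scanner, not the tool used in this study or in the cited works. The names `find_ssrs`, `is_primitive`, and `MIN_REPEATS`, and the threshold values themselves, are hypothetical. Overlaps between repeats of different motif lengths are not resolved, which dedicated SSR software typically handles.

```python
# Hypothetical minimum repeat numbers per motif length (1 = mononucleotide,
# ..., 6 = hexanucleotide). These are NOT the settings used in the study.
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit.

    For example, 'AT' is primitive, whereas 'AA' and 'ATAT' are not.
    """
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + unit_len <= len(seq):
            motif = seq[i:i + unit_len]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count consecutive tandem copies of the motif.
                count = 1
                while seq.startswith(motif, i + count * unit_len):
                    count += 1
                if count >= min_rep:
                    hits.append((i, motif, count))
                    i += count * unit_len  # jump past the whole repeat
                    continue
            i += 1
    return sorted(hits)


# Toy sequence containing a poly-A run and an (AT)6 repeat.
toy = "GC" + "A" * 10 + "GCGC" + "AT" * 6 + "G"
print(find_ssrs(toy))
```

Raising the threshold for longer motifs, as in the setting of five repeats attributed to Yang et al. [19] above, removes short tetra- to hexanucleotide repeats from the output, which is consistent with the explanation given for the differing results between studies.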
The contraction and expansion of the IR regions are major factors causing variation in the size of angiosperm cp genomes, as well as gene variation and loss, and pseudogene formation. Therefore, the cp genome size varies among species [37][38][39]. The IR region in angiosperm cp genomes is typically between 20,000 and 30,000 bp, and longer IR regions result in less impact from structural rearrangements on the cp genome [40]. The IR regions of B. glabra and B. spectabilis were 25,377 and 25,385 bp, respectively, falling within the longer range, indicating higher conservation in this region. Species with minor differences in cp genome junctions are generally closely related [41]. The JLB, JSB, JSA, and JLA boundaries of B. glabra and B. spectabilis share identical flanking genes, and the expansion lengths of each boundary gene sequence are also consistent. This suggests a high conservation of the IR boundaries between B. glabra and B. spectabilis and indicates a close phylogenetic relationship. Bougainvillea glabra, B. spectabilis, and four other Bougainvillea species share the same genes at the boundaries, but there are slight differences in the contraction and expansion lengths of the genes, indicating that the contraction and expansion of the IR boundaries in Bougainvillea cp genomes are relatively conserved.
To determine the phylogenetic relationship between B. glabra and B. spectabilis and their systematic positions within Bougainvillea, a phylogenetic tree was constructed based on the complete cp genomes of B. glabra, B. spectabilis, and 15 Caryophyllales species. The results showed that all Bougainvillea species formed a major clade, with B. pachyphylla and B. peruviana as the basal groups of the genus, which is consistent with the findings of Bautista et al. [7] and Bautista et al. [18]. Bougainvillea glabra and B. spectabilis formed a sister group, indicating a close relationship between them, and this branch was sister to B. praecox with 100% support, suggesting a relatively close phylogenetic relationship.

Sampling, DNA Extraction, and Sequencing
Bougainvillea glabra 'Brasiliensis' (voucher specimen: NJFU220918) and B. spectabilis 'Auratus' (voucher specimen: NJFU220919) plants were obtained from Zhangzhou Shengxiang Landscape and Greening Co., Ltd. (Zhangzhou, China) and planted at Nanjing Forestry University (118°81′ E, 32°07′ N) (Nanjing, China). Voucher specimens were deposited in the VR Laboratory, College of Landscape Architecture, Nanjing Forestry University. Healthy mature leaves were collected from a single plant of each species, rapidly frozen in liquid nitrogen, and stored at −80 °C until use. Total DNA was extracted from the leaves using a modified cetyltrimethylammonium bromide method [42]. The volume and concentration of the DNA extract were 40 µL and 54.59 ng/µL, respectively, for B. glabra 'Brasiliensis' and 40 µL and 43.86 ng/µL, respectively, for B. spectabilis 'Auratus'. After the DNA passed quality assessment, it was mechanically fragmented using an ultrasonic homogenizer. Fragment purification, end repair, A-tailing of the 3′ ends, and adapter ligation were performed to generate sequencing libraries. The libraries were sequenced using the Illumina NovaSeq PE150 platform (Genepioneer Biotechnologies, Nanjing, China).

Conclusions
In this study, the complete cp genomes of 'Brasiliensis' and 'Auratus', cultivars of B. glabra and B. spectabilis, respectively, which are important horticultural species, were sequenced and analyzed. The results indicated that the cp genomes of these two species were highly conserved in terms of structure and gene content. A total of 270 and 271 SSR loci were identified in the cp genomes of B. glabra and B. spectabilis, respectively, alongside eight highly variable regions (psbN, psbJ, rpoA, rpl22, psaI, trnG-UCC, ndhF, and ycf1), which can serve as potential molecular markers. Phylogenetic analysis showed a close relationship between B. glabra and B. spectabilis. The findings of this study not only provide important evidence for the further genetic improvement and breeding of cold tolerance in Bougainvillea plants and the selection of superior varieties, but also contribute to elucidating the evolutionary and systematic relationships among species in Bougainvillea.