Comparative Analysis of Chloroplast Genomes for the Genus Manglietia Blume (Magnoliaceae): Molecular Structure and Phylogenetic Evolution

Manglietia Blume, belonging to the Magnoliaceae family and mainly distributed in tropical and subtropical regions of Asia, has great scientific and economic value. In this study, we employed next-generation sequencing followed by de novo assembly to investigate the adaptive evolution of Manglietia using plastid genetic information. We newly sequenced the complete or nearly complete plastomes of four Manglietia species (Manglietia aromatica, Manglietia calcarea, Manglietia kwangtungensis, and Manglietia glauca) and conducted comparative analysis with seventeen published plastomes to examine the evolutionary pattern within this genus. The five newly sequenced plastomes range from 157,093 bp (M. calcarea2) to 160,493 bp (M. kwangtungensis), all exhibiting circular structures when mapped. Nucleotide diversity was observed across the plastomes, leading us to identify 13 mutational hotspot regions, comprising eight intergenic spacer regions and five gene regions. Our phylogenetic analyses based on 77 protein-coding genes generated phylogenetic relationships with high support and resolution for Manglietia. This genus can be divided into three clades, and the previously proposed infrageneric classifications are not supported by our studies. Furthermore, the close affinity between M. aromatica and M. calcarea is supported by the present work, and further studies are necessary to settle the taxonomic treatment of the latter. These results provide resources for comparative plastome analysis, breeding, and plastid genetic engineering of Magnoliaceae and flowering plants.


Introduction
Magnoliaceae Juss. comprises evergreen and deciduous trees or shrubs with showy flowers, boasting a rich diversity of over 300 species primarily found in Southeast Asia and the Americas. It is recognized as one of the most primitive and endangered groups of angiosperms [1,2], holding significant value for investigating the origin and phylogeny of flowering plants. Due to the reticulate evolution of morphological characteristics [3], there has been a long debate over the classification of Magnoliaceae regarding the delimitation of tribes, genera, and sections. Currently, the most prominent classification systems for Magnoliaceae are those proposed by Dandy [4][5][6], Law [7,8], Nooteboom [9], and Figlar and Nooteboom [10]. Dandy [4][5][6] recognized twelve genera in two tribes, Liriodendron Linn. in the tribe Liriodendreae and the others in the tribe Magnolieae. Law [7,8] further divided Magnolioideae into two tribes and five subtribes, recognizing 16 genera. Nooteboom [9] accepted Law's infrafamilial classification but merged some genera into others. However, Figlar and Nooteboom [10] proposed a quite different classification by merging all genera except Liriodendron into a single expanded genus, Magnolia Linn., which was further divided into three subgenera and twelve sections. In the Flora of China [11], Xia classified the Chinese species of Magnoliaceae into thirteen genera, including three new genera, while Nooteboom, another co-author of the account of the family for the flora, accepted only two genera, Liriodendron and Magnolia, following the treatment of Figlar and Nooteboom [10]. Furthermore, many molecular studies of the phylogeny of the family Magnoliaceae have been performed using chloroplast [12][13][14][15][16][17][18][19][20][21][22][23] and nuclear datasets [24]. The phylogenetic results based on cpDNA [17,18,21] or nuclear data [24] indicated that 12-16 major clades were monophyletic within Magnoliaceae. Accordingly, Nie et al. [24] proposed an updated classification of the genus Magnolia into 15 sections, and each section was treated as an independent genus by Xia [11] and Sima and Lu [3]. Although many phylogenetic studies have been conducted to enhance our understanding of phylogenetic evolution in the family, the deeper relationships within each clade, such as Manglietia Blume, remain poorly resolved due to limited taxon sampling. The classification of Xia is widely accepted in Asia, especially in China [25][26][27][28]. Here, we accept the generic delimitation, in a narrow sense, in the classification of the family Magnoliaceae by Xia [11] and Sima and Lu [3], which treats all sections of Nie et al. [24] as independent genera.
Manglietia Blume, established by Blume [29], comprises approximately 40 species primarily distributed in tropical and subtropical regions of Asia, with 27 or 29 species concentrated in Southern China [11]. This genus is characterized by its evergreen habit (except Manglietia decidua Q. Y. Zheng), stipules adnate to the petioles, and the presence of four or more ovules per carpel [30]. It was later treated as a section of Magnolia by some scholars [10,21,31]. It is generally considered closely allied to the genera Magnolia and Manglietiastrum Law based on its morphological characteristics [32][33][34][35]. Furthermore, the infrageneric classification of Manglietia is still unclear. Tiep [36] was the first to establish an infrageneric classification of Manglietia, recognizing two sections, i.e., sect. Olivera, with its style shorter than half of the carpel length, and sect. Manglietia, with its style longer than half of the carpel length. However, this classification has never been adopted by later authors. Zheng [37] merged the monotypic genus Sinomanglietia Z.X.Yu, which contains a single deciduous species (Sinomanglietia glauca Z.X.Yu), into Manglietia and treated it as sect. Deciduae Q. Y. Zheng [38,39], while placing all other, evergreen species into sect. Manglietia. However, all previous molecular studies strongly supported Manglietia as monophyletic [17,18,20,22], and it has long been recognized as an independent genus in Asian floras and checklists [8,11]. Several phylogenetic analyses of Manglietia have been carried out, but none has addressed the infrageneric classification of the genus. Thus, the phylogenetic relationships within Manglietia remain unresolved, and further studies are necessary.
Since the sequencing of the first chloroplast genome (plastome) in 1986 [40], there has been rapid progress in high-throughput sequencing technology, resulting in the continuous sequencing of numerous plant plastomes [41,42]. The National Center for Biotechnology Information (NCBI) organelle genome database currently holds more than 8600 plant plastomes, with the majority of them sequenced in the past four years [43]. In contrast to nuclear genomes, plastomes typically exhibit a conserved structure and a relatively lower rate of nucleotide substitution [44,45]. They have been extensively utilized for resolving phylogenetic relationships among plant lineages and investigating chloroplast genome evolution, such as in Dennstaedtiaceae Lotsy [46] and Magnolia [47]. These investigations significantly advanced our understanding of plant evolutionary relationships. However, the plastomes of many Manglietia species have been published only as data, and scientific questions such as structural variation and evolutionary relationships among lineages have not been discussed [23,48]. Given their significant scientific and economic value, further exploration employing chloroplast genomes would provide crucial insights into the systematics and evolution of the family Magnoliaceae [11].
Traditionally based on morphological characteristics alone, species delimitation often fails to distinguish recently diverged species, resulting in cryptic species complexes [49]. These species can arise from differentiation in isolation [50], hybridization, and polyploidization [51]. The availability of high-throughput sequencing technology has made obtaining plastome sequences more feasible. Compared to nuclear genomes and mitochondrial genomes, plastomes possess a small size, a low rate of nucleotide substitution, single-parental inheritance, and a haploid nature. These characteristics make plastomes an excellent choice for analyzing nucleotide diversity and reconstructing phylogenies among closely related species, particularly among polyploid taxa [52,53]. Numerous studies have utilized plastome data to resolve species classification, elucidate phylogenetic relationships among land plants, and conduct comparative analyses of chloroplast genomes [54][55][56]. Notably, comparative analysis based on plastome data provides a more comprehensive understanding of species evolution and phylogenetic relationships than limited DNA fragments do [57]. Among Manglietia species, Manglietia calcarea X. H. Song is a rare and endangered species with extremely small populations, distributed in the neighboring region of northern Guangxi and southern Guizhou [58,59]. M. calcarea has a high calcium requirement, a narrow ecological range, and a small population; its distribution areas are relatively remote, and it is endemic to Guizhou and Guangxi [59]. M. calcarea was described by Song [60] from the limestone areas of Libo County, Guizhou Province, and its taxonomic status has been controversial. Chen and Nooteboom [35] considered M. calcarea to be similar to Manglietia fordiana Oliv. and treated it as M. fordiana var. calcarea. M. calcarea was neglected in the Flora Reipublicae Popularis Sinica [30] and reinstated in Flora of China [11]. Sima et al. [59] compared M. calcarea with M. aromatica Dandy and M. fordiana based on 13 morphological characteristics and concluded that it is very similar to M. aromatica; they treated it as a variety, M. aromatica var. calcarea. However, this result was based only on morphological characteristics, and the classification status of M. calcarea needs to be evaluated. To better understand the classification status of M. calcarea and its relationship with Manglietia, it is imperative to identify genetic discrepancies within the major clade of Manglietia.
In this research, we assembled and annotated five plastomes from four Manglietia species. The present work aims to (1) investigate the genetic variation within the Manglietia plastome; (2) characterize plastome structure, sequence divergence, mutation hotspot regions, and repeat regions; (3) evaluate the phylogenetic relationships within Manglietia; and (4) clarify the classification relationships of related genera and of the particular species M. calcarea.

Plant Materials and DNA Extraction and Sequencing
The fresh leaves of five samples representing four species of Manglietia (M. aromatica, M. calcarea, M. glauca Blume, and M. kwangtungensis Dandy) were collected from the provinces of Guangxi, Guizhou, and Guangdong. Among them, two samples of M. calcarea from different locations were included for a population-level comparative analysis of chloroplast genomes, to better understand the classification relationships among the closely related species of M. calcarea. The voucher specimens were deposited at the Herbarium of South China Agricultural University (CANT) and the Herbarium of South China Botanical Garden, Chinese Academy of Sciences (IBSC) (Table S1). Total genomic DNA was extracted from fresh, young leaves using the Plant Genomic DNA Kit (Tiangen, Beijing, China) following the manufacturer's protocol. Once the genomic DNA passed quality assessment, short-insert (500 bp) paired-end (PE) libraries were sequenced by the Beijing Genomics Institute (Shenzhen, China) using the Illumina HiSeq 2500 platform with a read length of 150 bp. For each species, a minimum of 5 Gb of clean data was generated.

Plastid Genome Assembly, Annotation, and Comparison
The paired-end reads from the clean data were filtered and assembled into contigs using GetOrganelle [61]. Subsequently, the assembled plastomes were visually inspected in Bandage [62] and manually edited where necessary in Geneious v11.1.5; complete plastomes were then obtained for each sample.

Plastome Comparison and Sequence Divergence Analysis
Seventeen whole plastomes of Manglietia (M. aromatica2, M. calcarea3, M. conifera Dandy [66]. The online tool IRscope (https://irscope.shinyapps.io/irapp/ (accessed on 24 February 2023)) was utilized to compare the junction regions connecting the IR, SSC, and LSC among the 22 sequences [67]. Additionally, the sequence divergence of the 22 Manglietia plastomes was investigated using the program mVISTA [68] in LAGAN mode, with M. aromatica2 as the reference, to demonstrate inter- and intraspecific variations. Nucleotide diversity was also assessed with DnaSP v6.12.03 (DNA Sequence Polymorphism) using a sliding window strategy [69], with a step size of 200 bp and a window length of 600 bp [70].
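The sliding-window calculation of nucleotide diversity can be illustrated with a minimal Python sketch. This is not DnaSP's implementation: it assumes a gap-free alignment of equal-length sequences, takes nucleotide diversity (pi) as the mean number of pairwise differences per site, and the function names nucleotide_diversity and sliding_pi are hypothetical. The defaults mirror the 600 bp window and 200 bp step stated above.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) for aligned sequences.

    Assumes all sequences have the same length and contain no gaps.
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, pi) for every full window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results

# Toy alignment with a single variable site; small window for illustration only.
toy = ["AAAAAAAA", "AAAATAAA", "AAAAAAAA"]
profile = sliding_pi(toy, window=4, step=2)
```

Windows whose pi values stand out from the rest of such a profile would be the candidates for mutational hotspots.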
The codon usage pattern of protein-coding genes in the 22 Manglietia plastomes was estimated using CodonW v.1.4.2 (https://sourceforge.net/projects/codonw/ (accessed on 1 March 2023)). The relative synonymous codon usage (RSCU) values and the effective number of codons (ENc) were determined to quantify the extent of codon usage bias for each genome, applying published equations for RSCU calculation [71]. Subsequently, the RSCU values across the 22 Manglietia plastomes were visualized with the TBtools HeatMap function [72], while the ENc values, which range from 20 to 61, indicated the codon bias of individual genes, with lower ENc values denoting stronger codon bias [73]. The overall GC content and the GC content at the first, second, and third codon positions (GC1, GC2, and GC3, respectively) were calculated using the EMBOSS [74] online software (http://emboss.toulouse.inra.fr/cgi-bin/emboss/ (accessed on 1 March 2023)).
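As a rough illustration of the quantities described above, the following Python sketch computes RSCU and position-specific GC content for a single coding sequence. It is not the CodonW or EMBOSS implementation. It assumes in-frame sequences and uses the standard amino acid assignments, which the bacterial and plant plastid code shares; the helper names (codons, rscu, gc_by_position) are hypothetical. RSCU is taken as the observed count of a codon divided by the mean count across its synonymous family.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codon -> amino acid map built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codons(cds):
    """Split an in-frame CDS into codons, dropping any trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def rscu(cds):
    """RSCU per codon: observed count divided by the mean count of its family."""
    counts = Counter(c for c in codons(cds) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:  # skip stop codons and unused amino acids
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values

def gc_by_position(cds):
    """Percent GC at codon positions 1, 2, and 3 (GC1, GC2, GC3)."""
    cods = [c for c in codons(cds) if c in CODON_TABLE]
    if not cods:
        return (0.0, 0.0, 0.0)
    return tuple(100.0 * sum(c[p] in "GC" for c in cods) / len(cods)
                 for p in range(3))

toy_cds = "GCTGCTGCTGCC"  # four alanine codons: three GCT and one GCC
toy_rscu = rscu(toy_cds)
toy_gc = gc_by_position(toy_cds)
```

With this definition an RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.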

Repeat Sequence Analysis
The REPuter [75] online software (https://bibiserv.cebitec.uni-bielefeld.de/reputer/ (accessed on 26 February 2023)) was used to identify repeat sequences, including palindromic, direct, and reverse repeats, under the following parameters: the maximum size of the computed repeat sequences was limited to 50, while the minimum size and Hamming distance were set at 30 and 3, respectively. Tandem repeat sequences were identified with Tandem Repeats Finder [76] using the alignment parameters match = 2, mismatch = 7, and indels = 7. Repeats with a minimum alignment score of 80, a maximum period size of 500 bp, and a maximum TR array size of 2 million were considered. Furthermore, simple sequence repeats (SSRs) were detected with the Perl script MISA (MIcroSAtellite identification tool), with a threshold of mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively [77].
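The idea behind threshold-based SSR detection can be sketched in Python with regular expressions. This is not MISA itself and does not detect compound SSRs. The minimum repeat counts in MIN_REPEATS are hypothetical placeholders chosen for illustration and are not taken from the text, and find_ssrs is a hypothetical name.

```python
import re

# Hypothetical minimum repeat counts per motif length (illustrative only,
# NOT the thresholds used in the study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, ATAT),
            # since those runs are reported under the shorter motif length.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start() + 1, motif, len(m.group(1)) // unit_len))
    return sorted(hits)

toy_seq = "GG" + "A" * 12 + "CC" + "AT" * 6 + "CC" + "AAG" * 4 + "C"
toy_hits = find_ssrs(toy_seq)
```

Because the scan runs left to right, the reported motif of a repeat can depend on the flanking bases; a production tool normalizes such cases.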

Analysis of Substitution Rate
In this study, we employed KaKs_Calculator [78] to calculate the nonsynonymous (Ka) and synonymous (Ks) substitution rates, as well as the Ka/Ks ratio, in order to identify gene divergence within the 22 Manglietia plastomes. To minimize errors, we screened the protein-coding (CDS) sequences of these plastomes using specific criteria: each CDS should have a total number of bases that is a multiple of 3 and a length > 300 bp. We retained 51 CDS sequences for further analysis. Pairwise comparisons among the 22 plastomes resulted in a total of 231 sequence pairs. The genetic code was set to the "Bacterial and Plant Plastid Code" with the calculation method "YN". When there were no substitutions (i.e., perfect matches) in the alignment, the Ks value was 0; in such cases, the Ka/Ks value was reported as "NA" and replaced with 0 in the results.
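The screening criteria and the pair count above can be checked with a short Python sketch. It does not compute Ka or Ks, which KaKs_Calculator does; it only illustrates the CDS filter, the number of pairwise comparisons, and the rule for undefined ratios. The function names and the placeholder plastome labels are hypothetical.

```python
from itertools import combinations

def passes_filter(cds, min_len=300):
    """Keep a CDS only if its length is a multiple of 3 and exceeds min_len."""
    return len(cds) % 3 == 0 and len(cds) > min_len

def pairwise_pairs(names):
    """All unordered pairs of plastomes: n * (n - 1) / 2 comparisons."""
    return list(combinations(names, 2))

def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, reporting 0 where Ks == 0 (the 'NA' replaced by 0 rule)."""
    return 0.0 if ks == 0 else ka / ks

plastomes = ["plastome_%d" % i for i in range(1, 23)]  # 22 placeholder labels
pairs = pairwise_pairs(plastomes)
```

For 22 plastomes this gives 22 × 21 / 2 = 231 pairs, matching the number reported.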

Dataset Generation and Phylogenetic Analyses
A total of 57 complete Magnoliaceae plastomes belonging to all sections recognized by Wang et al. [21] were obtained from the NCBI GenBank database. Liriodendron tulipifera Linn. and Liriodendron chinense (Hemsl.) Sargent were used as outgroups. For the phylogenetic analysis, the distribution of the 57 sequences among different genera in Magnolioideae was as follows: 2), and Talauma Juss. (1).
All the annotated files of the nucleotide sequences of the protein-coding genes (CDS) in GenBank format were extracted using Geneious v11.1.5 and manually corrected if necessary. CDS alignment was performed using Muscle v3.8.31 [79] and manually adjusted when required. Loci covering less than 55% of species were removed to minimize reliance on loci with limited information or present in relatively few species, resulting in a final set of 77 CDS sequences from 59 plastomes for subsequent analysis. The script "concatenate_fasta.py" (available at https://github.com/Kinggerm/PersonalUtilities/ (accessed on 31 October 2023)) was utilized to merge locus alignments and generate the CDS datasets. Furthermore, Gblocks v0.91b was used with strict exclusion criteria (-b5 = n) to generate the CDS_GB datasets [80].
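The coverage filter and the concatenation step can be sketched in Python as follows. This is an illustrative stand-in, not the concatenate_fasta.py script cited above. It assumes each locus alignment is a dictionary from taxon name to aligned sequence, that every taxon in an alignment appears in the taxon list, and that taxa missing from a retained locus are padded with gaps. The function name filter_and_concatenate is hypothetical.

```python
def filter_and_concatenate(alignments, taxa, min_coverage=0.55):
    """Drop loci present in fewer than min_coverage of taxa, then concatenate.

    alignments: {locus: {taxon: aligned_sequence}}
    Returns (kept_loci, {taxon: concatenated_sequence}).
    """
    kept = [locus for locus, aln in sorted(alignments.items())
            if len(aln) / len(taxa) >= min_coverage]
    matrix = {t: "" for t in taxa}
    for locus in kept:
        aln = alignments[locus]
        width = len(next(iter(aln.values())))  # alignment length of this locus
        for t in taxa:
            # Taxa lacking this locus are filled with gap characters.
            matrix[t] += aln.get(t, "-" * width)
    return kept, matrix

toy_taxa = ["sp1", "sp2", "sp3", "sp4"]
toy_alignments = {
    "rbcL": {"sp1": "ATG", "sp2": "ATG", "sp3": "ATA"},                 # 75%
    "matK": {"sp1": "CC", "sp2": "CT"},                                 # 50%
    "ycf1": {"sp1": "GGGG", "sp2": "GGGA", "sp3": "GGGG", "sp4": "GAGG"},
}
kept_loci, supermatrix = filter_and_concatenate(toy_alignments, toy_taxa)
```

In this toy case the locus present in only half of the taxa is dropped, and the remaining two loci are joined into one matrix.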

Features of Manglietia Plastome
All chloroplast genomes exhibited a double-stranded circular quadripartite structure (Figure 1). The 22 plastomes ranged in size from 157,093 bp in M. calcarea2 to 160,493 bp in M. kwangtungensis (Table 1). All complete or nearly complete plastomes comprised a large single-copy region (LSC, 87,959-88,791 bp), a small single-copy region (SSC, 18,741-19,030 bp), and a pair of inverted repeat regions (IR, 24,991-26,782 bp). For each assembled chloroplast genome, 110-113 genes were annotated, including 79 protein-coding genes, 28-30 tRNA genes, and 3-4 rRNA genes (Tables 1 and 2). The overall GC content of these plastomes was similar, 39.30% in all except M. calcarea2 (39.00%), and varied among the LSC, SSC, and IR regions. The GC content in the IR region (42.50%-43.20%) was higher than that in the LSC (37.90%-38.00%) and SSC (34.20%-34.30%) regions (Table 1). In all plastomes examined, the ycf1 gene extended from the IRa into the SSC region while leaving a truncated copy at the IRb/SSC junction. The gene rps12 underwent trans-splicing across two regions: its 5′ end exon resided in the LSC region, whereas its intron and 3′ end exon were located within the IR region. The plastid genome sequences obtained have been submitted to GenBank (Accession Nos. PP386157-PP386161).
We examined the expansion and contraction of the IR regions at their borders with the single-copy regions for the twenty-two plastomes (Figure S1). The gene positions at the four borders, JLB (LSC/IRb), JSB (IRb/SSC), JSA (SSC/IRa), and JLA (IRa/LSC), were almost identical, except that there were two situations at the JLB border. First, in M. grandis (NC_058271), the rpl2 gene spanned the LSC/IRb border. Second, in the other species, the rpl2 gene was located in the IRb region, 56-60 bp away from the border. At the JSB junction, the ndhF gene of all species was 61-105 bp away from the border. JSA and JLA were both highly conserved among the twenty-two plastomes. The ycf1 gene straddled the JSA boundary, with 5540-5648 bp in the IRa region. The distance between trnH-GUG and the JLA junction ranged from 1 to 19 bp.
Long repeats within the chloroplast genomes of Manglietia species were analyzed using REPuter and Tandem Repeats Finder across the twenty-two plastomes. A total of 1155 long repeats were identified in the genomes, of which 459 (39.74%) were tandem repeats, 342 (29.61%) forward repeats, and 354 (30.65%) palindromic repeats; complement and reverse repeats were not found in Manglietia species (Table S5). The number of tandem repeats varied from 19 to 24, palindromic repeats from 13 to 19, and forward (dispersed) repeats from 11 to 19 (Figure 5C; Tables S2 and S3).
SSRs were analyzed in all twenty-two plastomes, and a total of 51-56 SSRs were identified in each plastome, including six types (mono-, di-, tri-, tetra-, hexa-, and compound nucleotides); pentanucleotide SSRs were not present. The details of all plastome SSRs are given in Table S4. In all twenty-two chloroplast genomes, mononucleotides accounted for more than half of the SSRs (50.90%-59.26%), except in M. glauca. Among the different regions analyzed, the IGS region contained the largest number of SSRs (36-40), followed by 6-9 SSRs in both the CDS and the coding sequence introns. Notably, A or T types dominated among mononucleotide SSRs and were also highly abundant within dinucleotide, trinucleotide, tetranucleotide, and hexanucleotide SSRs. In our results, fifty-four SSRs and sixty long repeat sequences were identified in the newly sequenced M. aromatica1, whereas fifty-three SSRs and forty-nine long repeat sequences were identified in the published plastome of M. aromatica2. In the three samples of M. calcarea, 50, 59, and 59 long repeat sequences were identified, respectively (Figure 5A,B,D; Tables S3 and S5).
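The shares of the long-repeat classes reported for the twenty-two plastomes follow directly from the counts, as this small Python check illustrates. The variable names are hypothetical; the counts are the ones given in the text.

```python
# Long-repeat counts summed over the twenty-two plastomes (from the text).
counts = {"tandem": 459, "forward": 342, "palindromic": 354}

total = sum(counts.values())
# Percentage of each repeat class, rounded to two decimals.
shares = {kind: round(100 * n / total, 2) for kind, n in counts.items()}
```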

Codon Usage Bias Analysis
Codon usage bias is a fundamental genomic feature that provides crucial insights into species evolution. In total, 51 protein-coding genes were identified across the 22 chloroplast genomes analyzed. The GC content of these protein-coding genes ranged from 38.84% to 38.92%, with minimal variation observed in GC1, GC2, and GC3, all of which were below 42%. These findings also indicate a high abundance of A/T bases in the 22 Manglietia plastomes, particularly at the third codon position (Table 3). When examining codon usage bias through multi-species analysis, the effective number of codons (ENc) is commonly employed to quantify deviations from random selection and assess genome- or gene-specific biases. ENc values typically range between 20 and 61 [83]. Notably, lower ENc values suggest significant codon usage bias within a species' genome or gene [84]. According to a previous study, when the ENc value is less than or equal to 35, the codon usage bias of the genome or gene can be considered significant. In our study, the ENc values for all 22 Manglietia plastomes ranged from 50.34 to 50.97 (Table 3), well above the threshold value of 35, indicating weak codon usage bias within these plastomes. There were slight differences in the ENc, GC, GC1, and GC2 values between the two plastomes of M. aromatica. The three samples of M. calcarea also differed little.

The RSCU value denotes the ratio between the observed frequency of a codon's usage and its theoretical expected frequency, serving as a crucial parameter for quantifying codon usage bias. The termination codons, ATG, and TGG were removed in this analysis, because termination codons do not encode amino acids, and ATG and TGG are the only codons for methionine and tryptophan, respectively, and therefore show no usage bias. Overall, the 51 protein-coding genes were encoded by 64 kinds of codons. Among them, 30 codons exhibited an RSCU value greater than 1, with 28 of these codons having A/T as their third base. This indicates that high-frequency codons are more inclined to use A/T endings. There were 29 low-frequency codons with RSCU < 1, of which 27 (93.10%) ended in G/C at the third base, indicating that the frequency of codons ending in G/C in the chloroplast genome is low. Among the synonymous codons of the plastomes, GCY encoding alanine has the highest RSCU value, followed by UUA encoding leucine (Figure 6).

Note: ENc, effective number of codons; GC, content of guanine-cytosine; GC3s, probability that the third base of the codon is G/C; GC1/2/3, GC content of the first, second, and third codon bases.

Phylogenetic Analysis
To ascertain the phylogenetic positions of Manglietia species and elucidate their evolutionary relationships, we constructed phylogenetic trees utilizing 77 protein-coding genes through the maximum likelihood (ML) and Bayesian inference (BI) methods. We combined 54 published chloroplast genomes of Magnoliaceae with five newly sequenced genomes, and trimmed poorly aligned regions using Gblocks v0.91b. The aligned matrix of the CDS_GB dataset extracted with Geneious v11.1.5 had a length of 68,520 bp across the 77 protein-coding regions. For the ML analysis, the best-fit model for the CDS_GB dataset was K3Pu + F + I, and for the BI analysis it was GTR + F + I, as estimated by IQ-TREE.
A phylogram of the maximum likelihood (ML) tree, displaying the support values at the nodes, is depicted in Figure 7. With Liriodendron as the outgroup, the members of Magnolioideae were divided into three major, strongly supported groups (Figure 7A-C), with ML bootstrap (BS) values of 94 and 100. Among them, the genera Manglietia, Houpoea, Oyama, and Magnolia were clustered into group A with strong support, the genera Michelia, Aromadendron, Yulania, Pachylarnax, Kmeria, Metamagnolia, and Paramagnolia were clustered into group B, and group C contained the other genera. Our phylogenetic analysis strongly supports the result that there are 14 major clades within the subfamily Magnolioideae that are monophyletic.

Plastome Variation
In this study, we report the plastomes of five individuals from four Manglietia species and compare them with 17 previously published Manglietia plastomes to enhance our understanding of genome organization and molecular evolution in the Magnoliaceae family. The plastomes of most terrestrial plants exhibit highly conserved characteristics, including similar gene content and organization across different plant lineages [85]. All newly assembled Manglietia plastomes display the typical quadripartite structure observed in photosynthetic angiosperm plastomes, showing no notable distinctions compared to previously published chloroplast genomes within Magnoliaceae. The GC content of the twenty-two Manglietia plastomes analyzed in our study is consistent at 39.3%, with the exception of M. calcarea2, which has a slightly lower GC content of 39.0%. The higher GC content detected in the IR regions can be attributed to the presence of the four GC-rich rRNA genes (rrn16, rrn23, rrn4.5, and rrn5) that are clustered in these regions, a common phenomenon in various plant species [86-88]. Compared with the plastomes of other angiosperms, those of Manglietia have a higher GC content. The GC content is anticipated to exert a substantial influence on genome functioning and species ecology. It is postulated that a higher GC content may be associated with enhanced adaptability to seasonally arid environments or cold regions, which are characteristic of a temperate climate for angiosperm species [89]. The five newly sequenced Manglietia plastomes range in size from 157,093 bp (M. calcarea2) to 160,493 bp (M. kwangtungensis), similar to the plastomes of previously published Magnoliaceae species [21,90]. The plastome of M. calcarea2 is approximately 3000 bp shorter than those of the other two M. calcarea samples, primarily due to a loss in the IR region encompassing the rrn16-trnI-GAU sequence. In contrast, the complete assembled sequence of another M. calcarea sample (M. calcarea1, from Dongduo) is 160,446 bp long, and the published M. calcarea sequence (M. calcarea3) is 160,027 bp long. Comparing the two M. aromatica plastomes, the new sequence is longer than the published one, mainly due to a difference of 213 bp in the LSC. The deletion in the M. calcarea2 chloroplast genome also leads to differences in the results of the repeat sequence and codon usage bias analyses.
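The GC percentages discussed above, both for whole plastomes and for individual regions such as the IR, follow from a simple base count. The following is a minimal sketch of that calculation, not the pipeline used in this study; the function names, the toy sequence, and the region coordinates are hypothetical and serve only as illustration.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Only A, C, G and T are counted, so ambiguous symbols such as N
    do not affect the percentage.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def regional_gc(seq: str, regions: dict) -> dict:
    """Compute GC content per region.

    `regions` maps a region name to (start, end) coordinates,
    0-based and end-exclusive (an assumed convention).
    """
    return {name: gc_content(seq[start:end])
            for name, (start, end) in regions.items()}


if __name__ == "__main__":
    # Toy sequence and made-up coordinates, not real plastome data.
    toy = "ATATTTAAGCGCGGCCATAT"
    print(gc_content(toy))
    print(regional_gc(toy, {"LSC": (0, 8), "IR": (8, 16)}))
```

In practice, the region coordinates would be taken from the annotation of each plastome (LSC, SSC, IRa, and IRb boundaries).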
The 22 Manglietia plastomes were compared using the mVISTA and Mauve software. They exhibit a high degree of similarity, with substantially higher sequence variation observed in the non-coding and single-copy regions than in the coding and IR regions. The JLB borders are relatively conserved among angiosperm plastomes and are primarily located within the rps19 and rpl2 genes. Our analysis reveals that M. grandis has the most pronounced contraction of the IR region, accompanied by an expansion of the rpl2 gene at the JLB boundary by 308 bp, while the other 21 sequences have their rpl2 genes situated within the IR region. This finding is consistent with a previous study on other Manglietia species [91]. It should be noted that modifications in IR boundaries can result in alterations in plastome size [92]. Notably, significant expansions and contractions have been reported in other plants, such as Pelargonium transvaalense, which possesses a plastome of 242,575 bp with an IR region spanning 87,724 bp [93]. However, the underlying mechanism governing expansion and contraction events within the IR region of the plastid genome remains unclear.
The coding regions and conserved sequences of the plastome have been extensively utilized for phylogenetic inferences at higher taxonomic levels, such as family or genus [94]. Plastomes also serve as valuable resources for identifying mutational hotspots across different lineages and are employed in intraspecies discrimination and species-level phylogenetic studies [95,96]. However, certain plastid DNA fragments currently used in Magnoliaceae, including the trnK intron containing matK, trnH-psbA, atpB-rbcL, rbcL, the trnL intron, trnL-trnF, and ndhF, fail to provide the phylogenetic signal required for establishing high-resolution relationships among related taxa. This limitation is particularly evident when classifying infrageneric taxa with uncertain taxonomic status [17,18]. To address this issue for species-level phylogenetic studies within Manglietia, we used our alignment to identify the 13 regions that exhibit the highest degree of genetic variability.
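Mutational hotspots of this kind are typically located by computing nucleotide diversity (π) in sliding windows along the alignment, as plotted in Figure 4. The following is a minimal sketch of that calculation under stated assumptions: the window and step sizes are illustrative defaults rather than the settings used in this study, alignment columns containing gaps or ambiguous bases are excluded entirely, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences.

    Columns holding anything other than A, C, G or T in any sequence
    are skipped (complete deletion, an assumed convention).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    columns = [col for col in zip(*(s.upper() for s in seqs))
               if set(col) <= valid]
    if not columns:
        return 0.0
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = sum(sum(a != b for a, b in combinations(col, 2))
                for col in columns)
    return diffs / (n_pairs * len(columns))


def sliding_pi(seqs, window=600, step=200):
    """Return (window_start, pi) for each window along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Toy alignment, not real plastome data.
    toy = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(toy))
    print(sliding_pi(toy, window=2, step=2))
```

Windows with the highest π values are the candidates for hotspot regions; the cut-off that defines a hotspot is a separate choice not shown here.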
Molecular markers based on DNA polymorphisms, such as simple sequence repeats (SSRs), have emerged as valuable genetic resources widely employed for assessing genetic diversity and inferring molecular phylogenies [97]. SSRs are short (1-6 bp) motifs that are tandemly repeated a varying number of times [98]. SSRs can provide interspecific polymorphisms, making them effective markers in population genetic analysis. In the present study, a total of 1252 SSRs were identified across the 22 plastomes (Table S4); however, further experiments are required to verify their effectiveness. Mononucleotide SSRs are the most prevalent motifs and account for the largest proportion of all SSR types. Tri- and hexanucleotide SSRs occur at much lower frequencies (Figure 5A). Pentanucleotide SSRs were not found in any of the twenty-two plastomes examined. Additionally, most of the detected SSRs consist of A/T repeat units, which may contribute to the high AT richness observed in these chloroplast genomes. Further statistical analysis reveals that these polymorphic variations are not evenly distributed throughout the plastomes. Compared with the small single-copy and inverted repeat regions, the large single-copy region contains a higher number of SSRs. As expected, sequence variations primarily occur within the LSC region and in non-coding regions such as intergenic spacers (Figure 5B,D). Similar findings have been reported in other angiosperm plastid genomes [99,100].
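The definition of an SSR given above (a 1-6 bp motif repeated in tandem) can be illustrated with a short regular-expression scan. The following is a minimal sketch for perfect SSRs only; it is not the tool used in this study, the minimum repeat thresholds in MIN_REPEATS are assumed values chosen for illustration, and compound SSRs are not handled.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs.

    `start` is 0-based. Motifs that are themselves repeats of a shorter
    unit (e.g. "AA" or "ATAT") are skipped, so each run is reported once
    under its shortest motif.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # Lookahead lets every start position be tested, including
        # positions inside a run that was rejected as non-primitive.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, n - 1))
        last_end = 0
        for m in pattern.finditer(seq):
            start = m.start()
            if start < last_end:
                continue  # shifted copy of an SSR already recorded
            motif = m.group(2)
            if motif in (motif + motif)[1:-1]:
                continue  # motif is a repeat of a shorter unit
            run = m.group(1)
            last_end = start + len(run)
            hits.append((start, motif, len(run) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not real plastome data.
    toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

Counting the hits by motif length, or by the region in which `start` falls, gives summaries of the kind shown in Figure 5.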

Phylogenetic Analysis
Chloroplast genomes have been widely used in phylogenetic analyses because they are non-recombinant and uniparentally inherited and have slower evolutionary rates than nuclear and mitochondrial genomes [101,102]. The plastome regions matK and ndhF have been remarkably successful as genetic markers within the Magnoliaceae family [17]. However, a limited number of loci may lack sufficient power for phylogenetic inference when closely related species are being considered. Consequently, there is a growing preference for conducting phylogenetic analyses based on comprehensive plastome datasets, which requires comparative genomic studies involving a larger number of plastome sequences.
The phylogenetic analysis based on the protein-coding regions of the chloroplast genomes also strongly supports the classification into two subfamilies, Liriodendroideae and Magnolioideae, as proposed by Law [7,8]. Liriodendroideae includes a single genus, Liriodendron, with only two species disjunctly distributed in eastern Asia and eastern North America [103]. However, the classification of Magnolioideae has long been debated. Some authors [10,21,24,31] argued that a single expanded genus, Magnolia, should be accepted and divided into 9-15 sections based on morphological or molecular evidence, while many botanists [3-6,9,11] preferred to maintain a narrow generic delimitation and recognize up to 16 genera within Magnolioideae. Our phylogenetic analysis strongly supports 14 major clades within Magnolioideae, which correspond to the 14 genera proposed by Sima and Lu [3], indicating that a narrow generic concept better reflects the evolutionary tendency within Magnolioideae. Manglietia has long been widely accepted as an independent genus [3,6-8,11]. Meanwhile, the monophyly of Manglietia has been supported by all previous phylogenetic studies [14,15,17,18,21,22,24,104]. A previous phylogenetic analysis using nuclear data [24] indicated that Manglietia and Houpoea (sect. Rytidospermum) were sister groups, and that together they were sister to Oyama (sect. Oyama). However, in our study, Houpoea and Oyama are sister groups with strong support (BS = 100), and together they are sister to Manglietia (BS = 100). Discordance between nuclear and organellar phylogenies is commonly observed across the plant tree [105,106]. This incongruence in tree topology may arise from various biological phenomena, including gene duplication, horizontal gene transfer, incomplete lineage sorting, or gene flow [107,108]. In our study, due to insufficient data and under-sampling, we were unable to analyze this phylogenetic conflict further.
The infrageneric classification of Manglietia is still unclear. Tiep [36] and Zheng [37] proposed infrageneric classifications (Figure 8), but their classifications have never been adopted by other authors. Based on the scale of the style and carpel length, Tiep [36] divided the genus Manglietia into two sections, sect. Manglietia and sect. Olivera. However, our results indicate that neither sect. Manglietia nor sect. Olivera is monophyletic. When they merged the genus Sinomanglietia with Manglietia, Zheng et al. [37] divided the genus into two sections, sect. Decidua and sect. Manglietia. Sect. Decidua, originally described as the genus Sinomanglietia, includes a single species, M. decidua, with a deciduous habit, while all other species, with evergreen habits, were placed in sect. Manglietia. However, in our study, M. decidua, the representative of sect. Decidua, is nested within sect. Manglietia. The results of Wang et al. [21], based on 86 whole chloroplast genomes, suggest that M. decidua is sister to all other sampled Manglietia.
In the present study, the CDS_GB dataset divided the twenty-two plastomes from Manglietia into three clades (Figures 7 and S2) with bootstrap values (BS) of 73 and 100 and BI posterior probabilities (PP) of 0.67 and 1.00. Clade 1 consists of M. aromatica, M. calcarea, M. conifera, M. patungensis, M. glaucifolia, M. glauca, M. dandyi, M. zhengyiana, M. kwangtungensis, and M. lucida. Clade 2 includes a single species, M. decidua, with a deciduous habit. Clade 3 comprises M. crassipes, M. ventii, M. fordiana, M. insignis, M. duclouxii, M. hookeri, M. grandis, and M. obovalifolia. This implies that the genus may be divided into three sections, but further studies are necessary. Most nodes among Manglietia species have strong support values (bootstrap support value > 70 and Bayesian posterior probability > 0.90), while a few nodes are weakly supported (bootstrap support value < 50), and the phylogenetic relationships of some species are still unclear. As for the five newly sequenced plastomes in this study, M. kwangtungensis is sister to M. lucida and located at the base of clade 1 with weak support (BS = 28). The sister relationship between M. glauca and M. patungensis also has weak support (BS = 69, PP = 0.94); this may be due to incomplete taxon sampling.
The treatment of M. calcarea has been controversial in the past. Chen and Nooteboom [35] and Kumar [109] treated it as a variety (M. fordiana var. calcarea and M. fordiana var. calcarea, respectively). Xia et al. [11] reinstated it as an independent species. Later, Sima et al. [59] considered it to be closely related to M. aromatica and treated it as a variety, M. aromatica var. calcarea. Our results show that M. calcarea is separated from M. fordiana, with the two located in different clades, clade 1 and clade 3, respectively. M. fordiana is sister to M. crassipes and M. ventii, with strong support (BS = 98, PP = 1.00). This implies that it is not appropriate to treat M. calcarea as a variety of M. fordiana. Morphologically, M. calcarea differs from M. fordiana in that it has 12-16 carpels and mature carpels dehiscing along the dorsal sutures. However, M. calcarea and M. aromatica are clustered into one group: the previously published M. aromatica2 is nested within M. calcarea, and together they are sister to the currently sequenced M. aromatica1. The sequence difference between the two species is only 305 bp, and the main difference is the loss of bases in the intergenic spacer (Figure 2). Therefore, the affinity between M. calcarea and M. aromatica is supported by our study. However, this topology also suggests that M. calcarea and M. aromatica are not monophyletic. Morphologically, M. calcarea may be considered different from M. aromatica, as its leaves are coriaceous, rigid, and rounded with an acumen 6-14 mm long at the apex, and it has 9 tepals and 12-16 carpels, while in the latter, the leaves are thinly coriaceous to coriaceous and mucronate to acuminate at the apex, and it has 11-12 tepals and 29-39 carpels [59]. M. calcarea is recorded from the neighboring areas between N Guangxi (Huanjiang) and S Guizhou (Libo and Dushan), while M. aromatica is distributed in W Guangxi, SW Guizhou, and Yunnan Province. Sometimes, M. calcarea is identified as M. aromatica in the literature [110,111]. At the moment, we cannot examine the voucher of the previously sampled M. aromatica2, and its identification needs to be confirmed. Our results appear to suggest merging M. calcarea with M. aromatica. However, a final decision on the status of M. calcarea and M. aromatica requires further study that includes more samples from across their distribution ranges.


Conclusions
In the present study, we assembled and analyzed five new complete or nearly complete plastome sequences of Manglietia species and conducted the first comparative analysis with other Manglietia species. The annotation and comparison within Manglietia species revealed conservation of the gene sequence, GC content, and genomic composition. Additionally, we identified repeated sequences, 51-56 microsatellites, and 13 highly mutational hotspot regions in the Manglietia plastome. These findings contribute to our understanding of the adaptations of Manglietia species to limestone environments. Furthermore, they provide valuable genomic resources and potential markers for future studies on species identification and speciation within this genus. This study sheds light on the phylogenetic relationships and adaptive evolution of Manglietia. The genus may be divided into three sections, and the previously proposed infrageneric classifications are not supported by our study. The close affinity between M. calcarea and M. aromatica is supported, but their taxonomic treatment needs further study.


Figure 1. Gene maps of four newly sequenced Manglietia plastomes, generated using the OGDRAW online platform. Genes inside the circle are transcribed clockwise, while those outside are transcribed counterclockwise. Colors distinguish genes with different functions. The large single-copy region (LSC), small single-copy region (SSC), and inverted repeat regions (IR) are indicated. In the inner circle, GC content is depicted by the darker shade and AT content by the lighter gray shade.

Figure 2. Comparison of twenty-two plastomes from Manglietia using the mVISTA program, with the annotation of M. aromatica2 as a reference. Pink regions indicate conserved non-coding areas, purple represents conserved exons, and white regions denote more variable sites. The vertical scale illustrates percent identity ranging from 50% to 100%.


Figure 3. Mauve alignment of twenty-two plastomes from Manglietia. The reference genome used in this study was the chloroplast genome of M. aromatica1.


Figure 4. Comparison of the variation in nucleotide diversity (π) values across the twenty-two Manglietia plastomes. The vertical axis indicates the nucleotide diversity (π) value and the horizontal axis represents sequence length.


Figure 5. Analysis of repeat sequences in twenty-two plastomes from Manglietia species. (A) Classification of SSRs based on repeat type, including mono-, di-, tri-, tetra-, penta-, hexa-, and compound nucleotides. (B) Classification of SSRs in the twenty-two plastomes by location: IGS, CDS, and CDS-IGS. (C) Numbers of the four repeat types, F, P, R, and C. (D) Distribution of SSR loci among three different regions.


Figure 6. The RSCU of 22 plastomes from Manglietia species. Color variation reflects the size of the RSCU value.


Figure 7. Maximum likelihood analysis based on a combined data matrix of 77 plastid genes for 59 species. The "*" indicates our new sequences. The numbers indicate the phylogenetic support values from maximum likelihood with 100% bootstrap support (BS).


Figure 8. The different infrageneric classifications of Manglietia. The three phylogenetic trees represent the result of this study, the result of Tiep based on morphological characteristics, and the result of Zheng based on deciduous habits [36,37].


Table 1. The basic characteristics of twenty-two plastomes from Manglietia species.

Table 2. List of genes encoded by twenty-two plastomes of Manglietia species.

Table 3. Codon features of the 22 plastomes from Manglietia species.