Population Variation and Phylogeography of Cherry Blossom (Prunus conradinae) in China

Prunus conradinae (subgenus Cerasus, Rosaceae) is a significant germplasm resource of wild cherry blossom in China. To ensure the comprehensiveness of this study, we used a large sample size (12 populations comprising 244 individuals) which involved the fresh leaves of P. conradinae in Eastern, Central, and Southwestern China. We combined morphological and molecular evidence (three chloroplast DNA (cpDNA) sequences and one nuclear DNA (nr DNA) sequence) to examine the population of P. conradinae variation and differentiation. Our results revealed that Central, East, and Southwest China are important regions for the conservation of P. conradinae to ensure adequate germplasm resources in the future. We also found support for a new variant, P. conradinae var. rubrum. We observed high genetic diversity within P. conradinae (haplotype diversity [Hd] = 0.830; ribotype diversity [Rd] = 0.798), with novel genetic variation and a distinct genealogical structure among populations. There was genetic variation among populations and phylogeographic structure among populations and three geographical groups (Central, East, and Southwest China). The genetic differentiation coefficient was the lowest in the Southwest region and the gene exchange was obvious, while the differentiation was obvious in Central China. In the three geographic groups, we identified two distinct lineages: an East China lineage (Central China and East China) and a Southwest China lineage ((Central China and Southwest China) and East China). These two lineages originated approximately 4.38 million years ago (Mya) in the early Pliocene due to geographic isolation. P. conradinae expanded from Central China to East China at 3.32 Mya (95% HPD: 1.12–5.17 Mya) in the Pliocene. The population of P. conradinae spread from East China to Southwest China, and the differentiation time was 2.17 Mya (95% (HPD: 0.47–4.54 Mya), suggesting that the population of P. conradinae differentiated first in Central and East China. The population of P. conradinae experienced differentiation from Central China to Southwest China around 1.10 Mya (95% HPD: 0.11–2.85 Mya) during the early Pleistocene of the Quaternary period. The southeastern region of East China, near Mount Wuyi, likely serves as a refuge for P. conradinae. This study establishes a theoretical foundation for the classification, identification, conservation, and exploitation of germplasm resources of P. conradinae.


Introduction
Prunus conradinae (Koehne) Yu et Li, a member of the subgenus Cerasus (Mill.) in the Rosaceae family [1][2][3], is a deciduous species of wild cherry blossom native to China.Mature P. conradinae trees range from 3 to 10 m in height.P. conradinae umbels usually contain three to five large flowers, with white or pink petals that bloom prior to leaf emergence [4].The leaves of P. conradinae are blunt and serrated, with small glands at the

Ecological Niche Model
Longitudinal and latitudinal data on the geographic distribution points of P. conradinae were obtained from field investigations and species specimen databases.In total, 554 specimens were retrieved from 24 herbariums.The database specimen data were compared to ensure that the geographic habitat distributions were correct.Within each 2.5 × 2.5 grid, the distribution point nearest to the center was selected.This approach resulted in 201 P. conradinae geographic distribution points [24].Environmental data were downloaded from the World Climate Database WorldClim, version 2.1, January 2020 (http://www.worldclim.org/(accessed on 20 February 2023)).The database contains information on 19 climatic factors for given periods: current day , 2050s (2041-2060), and 2070s (2061-2080), with a spatial resolution of 2.5 min.Projected climate data (2050s and 2070s) were modelled using a general atmospheric circulation model, CCSM4.Climate information and key factors influencing the geographic distribution of P. conradinae were obtained using DIVA-GIS, version 7.5, and the data were cross-referenced with the altitude, longitude, and latitude of each distribution point [25].We conducted Pearson's correlation analysis [26].Based on the recorded geographic distribution of P. conradinae and climate data, we used MaxEnt, version 3.4.1 to construct a map of the current and future (2050s and 2070s) geographic distribution of P. conradinae.

Plant Materials
A plant sampling strategy was developed based on the statistical analysis and verification of the specimen data, along with forecasts of the contemporary suitable area and data on the contemporary distribution of P. conradinae.Between 2019 and 2022, field investigations were conducted, with 244 samples from 12 populations (fresh leaves of P. conradinae in Eastern, Central, and Southwestern China) collected across eight provinces (Table 1).During the sampling of P. conradinae in Hubei Province, two populations (Phoenix Pool, Yichang City, FHC; Gexian Mountain, Xianning City, GXS) exhibited a stable and continuous red-pink coloration.In addition, leaves and hypanthiums of a population in the Wangcheng Slope area of Enshi Autonomous Prefecture in Hubei Province displayed pubescence.In addition to field observations, samples from each of these populations were subjected to a detailed morphological analysis and comparison with samples from populations located there.Traditional phenotypic characteristics showed a stable and continuous variation amongst different populations in the same area (Table 1).Based on our observations, two new variations were identified: P. conradinae var.ruburm and P. conradinae var.pubescens.This classification was then corroborated by molecular validation of samples sourced from each population.Genomic DNA was extracted from 244 fresh leaves of P. conradinae (The number and location of each population in leaf collection are shown in Table 1) using a commercial kit (Tiangen Biotechnology Co., Ltd., Shanghai, China) following the manufacturer's instructions.The concentration and purity of the extracted DNA were assessed via 1% agarose gel electrophoresis.DNA samples that met the required standards were stored in a refrigerator at -80 • C.These samples were then shipped to Shanghai, China Shanghai Shenggong Co., Ltd. for sequencing to obtain haplotypes for subsequent phy-Plants 2024, 13, 974 4 of 19 logeographic analysis.Universal primers for different sequences of cpDNA and nrDNA in cherry blossom were collated by reviewing the relevant literature and accessing the NCBI website.Three pairs of cpDNA universal primers-a gene maturation enzyme gene fragment (MatK-F: CGATCTATTCATTCAATATTTC; R: TCTAGCACACGAAAGTC-GAAGT) [27], a non-coding gene spacer fragment (TrnL-F-F: CGAAATCGGTAGACGC-TACG; R: ATTTGAACTGGTGGTGACACGAG) [28], and (TrnD-E-F: ACCAATTGAACTA-CAATCCC; R: AGGACATCTCTCTTTCAAGGAG) [29] and a pair of nrDNA sequence fragments (ITS-F: GGAAGTAAAAGTCGTAACAAGG; R: TCCTCCTCC-GCTTATTGATATGC)were employed to determine the genetic diversity and population structure of P. conradinae in different regions.A 25 µL PCR amplification reaction system was constructed using 2.0 µL of DNA template, 0.5 µL (10 µmol/L) each of upstream and downstream primers, 12.5 µL of 2×PCR Master Mix, and 9.5 µL of ddH 2 O.The PCR amplification protocol was as follows: initial denaturation at 94 • C for 5 min, followed by 30 cycles of denaturation at 94 • C for 1 min, annealing at 54-56 • C for 1 min, extension at 72 • C for 1 min, and a final extension at 72 • C for 5 min.After the end of the PCR experimental reaction procedure, the samples were sent to China Shanghai BioEngineering for purification and sequencing after the 1% agarose gel electrophoresis test met the requirements of sending amplified products.

Data Analysis
All cpDNA sequences were aligned using MAFFT, version 7 [30] and then manually reviewed and edited using PhyloSuite, version 1.2.2.After trimming, the sequences of MatK, TrnL-F, and TrnD-E were concatenated.DnaSP, version 6 was used to estimate the genetic diversity of each population.[31].The primary parameters considered included the number of haplotypes (h), haplotype diversity (H d ), nucleotide diversity (Pi/π), and number of polymorphic sites (S) [32].An AMOVA analysis of molecular variance was performed using Arlequin, version 3.1 [33], with significance tested via 1000 nonparametric permutations.Factors, such as the degree of freedom (d.f.), total variance, variation component, variation variance distribution, and genetic differentiation index (F ST ), were estimated within and among P. conradinae populations.Parameters were set to acquire the numerical values of Nst, Gst, and the significance level of P, assisting in the exploration of factors influencing genetic differentiation in the P. conradinae populations and the presence of a marked phylogeographic structure.Software, including PopArt, version 1.7 and Notepad++, version 7.7, and the TCS Network, were used to construct a haplotype network diagram and investigate haplotype kinship [34].Based on TCS haplotype relationships, a geographic distribution map of haplotypes was created using ArcMap, version 10.2, which also corresponded to the species of haplotype and distribution proportion.Neutrality tests (Tajima's and Fu's FS tests) and pairwise mismatch distribution analyses were carried out using Arlequin, version 3.1 [35].Expected and observed value curves, sourced from DnaSP version 6.1, were compared to test the hypothesis of sudden population expansion if they coincided.The study also sought to ascertain if the P. conradinae population had undergone bottleneck or expansion events [36].

Molecular Dating and Demographic Analyses
BEAST, version 6 software was employed to construct a temporal phylogenetic tree for the Rosaceae family and estimate the divergence time of the P. conradinae lineage [37].Using the most recent phylogenetic tree of family Rosaceae [38], four representative groups from different subfamilies were selected, along with their corresponding chloroplast genome sequences obtained from NCBI.Multiple sequence alignment of all chloroplast genomes was conducted using MAFFT, version 7 [30].A best fit nucleotide substitution model was constructed using the maximum likelihood method and phylogeny.IQ-Tree, version 1.6.12software was employed to derive the model [39].The best fit nucleotide substitution model was calculated using the Bayesian information criterion (BIC) [40].A random starting tree was used for 10,000 generations, with a sampling frequency of one every 1000 generations, facilitating the selection of the best fit nucleotide replacement model.A phylogenetic tree of the Rosaceae intergenus (encompassing four subfamilies) was constructed with the aid of five molecular clocks or fossil calibration points.The full chloroplast genome was utilized to determine the differentiation time of P. conradinae versus that of other cherry blossom species within the same clade.From two fossil calibration points, phylogenetic trees reflecting the divergence time of 10 nrDNA (ITS) ribotypes of P. conradinae were constructed, enabling predictions of the differentiation times and migration routes of P. conradinae.The software parameters were eventually set to a GTR substitution model and an exponential uncorrelated relaxed model using the Yule process.Two independent MCMC runs were carried out, each comprising 300,000,000 generations and sampling every 1000 generations.The first 12,000,000 generations in each run were discarded as burn-in.Based on the fossil divergence time, the crown group of Rosaceae (N1) was constrained with a log-normal distribution, setting the mean at 90.18 Mya and the standard deviation at 0.05 (Table 2) [38].The group (tribe Maleae and tribe Spiraeeae) and tribe Amygdaleae (N2) was constrained with a log-normal distribution, setting the mean value at 72.62 Mya and the standard deviation at 0.05 [10,41].The tribe Amygdaleae (N3) was constrained with a log-normal distribution, setting the mean at 68.58 Mya and the standard deviation at 0.01 [41][42][43].Based on the fossil divergence time, the differentiation time of Prunus in a strict sense ranged from 60.7 to 62.4 Mya.The mean divergence time of Prunus was set at 55 Mya (N4), and the standard deviation was set at 0.09 [44,45].The node subgenus Cerasus (N5) was constrained by a log-normal distribution, setting the mean value at 28.21 Mya and the standard deviation at 0.05 [41].The log files and tree files from the two separate runs were merged using LogCombiner, version 2.6.6 (part of the BEAST package).All effective sample size values were above 200.TreeAnnotator, version 1.7.3 (part of the BEAST, version 1.7.3 package) was used, with 25% removed as burn-in to estimate the mean divergence time and 95% highest posterior density (HPD) interval.Finally, FigTree, version 1.3.1 [46] was utilized to display the age of each node and its 95% HPD interval [44,47].

Ecological Niche Model
Based on the model evaluation criteria of the receiver operating curve, the detection outcomes for P. conradinae attained an excellent standard (training setting: 0.945, test setting: 0.949) [38].Based on current potential suitable areas data, suitable habitat regions for P. conradinae include Hubei, Zhejiang, Fujian, Jiangxi, Anhui, Henan, Yunnan, Chongqing, Guizhou, Hunan, and Sichuan, as well as possibly Jiangsu and Shandong.The core distribution areas of P. conradinae are located mainly in Central and Eastern China.In future scenarios, with gradual climate warming, the total suitable area in the 2050s will be reduced compared to that in the current period but will increase in the 2070s.Under the future climate (CCSM4) scenario change in the 2050s, the total suitable area decreased compared with the current situation, but under the future climate scenario change in the 2070s, the total suitable area increased, and the extremely suitable area spread to the high latitude and northeast direction, and the high suitable area spread to the low latitude direction.The core suitable areas, however, remain consistent in Central and Eastern China (Figure 1).Utilizing DIVIA-GIS, 10 limiting environmental factors were identified within Plants 2024, 13, 974 6 of 19 the habitats of the potential distribution areas.The primary factors limiting the geographic distribution of P. conradinae at present were the amount of seasonal (bio16) and annual precipitation (bio12).The annual minimum temperature (bio7) was shown to not be a significant climate factor.Water, as the dominant climate-limiting factor, proved to have a stronger impact on the geographic distribution of P. conradinae than temperature.This is postulated to be associated with the climatic zone of the P. conradinae distribution area being affected by north subtropical and subtropical humid monsoon regions.By the 2070s, rainfall is predicted to increase in China, particularly in Central and Southwestern regions, and temperature changes may become significantly noticeable [26].The anticipated growth of the total suitable area of P. conradinae in the 2070s is linked to the evolving trend of these comprehensive climate factors.
scenarios, with gradual climate warming, the total suitable area in the 2050s will be reduced compared to that in the current period but will increase in the 2070s.Under the future climate (CCSM4) scenario change in the 2050s, the total suitable area decreased compared with the current situation, but under the future climate scenario change in the 2070s, the total suitable area increased, and the extremely suitable area spread to the high latitude and northeast direction, and the high suitable area spread to the low latitude direction.The core suitable areas, however, remain consistent in Central and Eastern China (Figure 1).Utilizing DIVIA-GIS, 10 limiting environmental factors were identified within the habitats of the potential distribution areas.The primary factors limiting the geographic distribution of P. conradinae at present were the amount of seasonal (bio16) and annual precipitation (bio12).The annual minimum temperature (bio7) was shown to not be a significant climate factor.Water, as the dominant climate-limiting factor, proved to have a stronger impact on the geographic distribution of P. conradinae than temperature.This is postulated to be associated with the climatic zone of the P. conradinae distribution area being affected by north subtropical and subtropical humid monsoon regions.By the 2070s, rainfall is predicted to increase in China, particularly in Central and Southwestern regions, and temperature changes may become significantly noticeable [26].The anticipated growth of the total suitable area of P. conradinae in the 2070s is linked to the evolving trend of these comprehensive climate factors.

Sequence Variation, Haplotype Frequency, and Distribution
Three cpDNA fragments, namely MatK, TrnL-F, and TrnD-E, had a combined length of 2238 bp, with fragment lengths of 740 bp, 868 bp, and 630 bp, respectively.Analysis of these fragments detected 22 mutation sites: two in the MatK fragment, twelve in the TrnL-F fragment, and eight in the TrnD-E fragment.Among the 12 populations studied, 18 distinct chloroplast haplotypes (H1-H18) were identified.Among these, the most frequently observed were H1, H7, and H15, accounting for 30.33% (74), 24.59% (60), and 9.43% (23), respectively, of haplotypes in the populations (Table 3).Haplotype H1, with the highest frequency, was found in populations in Central China (Gexianshan Haplotypes H10, H11, and H18 exhibited the lowest frequencies, each detected in only two populations.H1-H9, H11-H13, and H15 were the most commonly detected haplotypes.Haplotypes H10, H14, H16, H17, and H18 were endemic, detected only in populations in East China (MYS), Central China (GXS and FHC), and Southwest China (WCP and HFG).Among these populations, the Gexian Mountain (GXS) population displayed the greatest H d , with seven haplotypes, and the Dabie Mountain (DBS) population displayed the least, with two haplotypes (Tables 3 and 4).The length of the nrDNA sequencing fragment was 578 bp, and 19 mutation sites were identified.From the 12 populations analyzed, 11 ribotypes (R1-R11) were identified.The most common ribotypes were R4, R1, and R3, with frequencies of 30.3% (85), 24.5% (47), and 9.4% (28), respectively (Table 3).R4 had the highest frequency and was found in Central China (GXS and DWS) and East China (MYS, WYS, QLF, ZXTC, and DYH).R1 was detected in populations in Southwest China (EMS and HFG) and Central China (WCP and FHC).R3 was detected in populations in Central China (WCP and FHC).R6, a unique ribotype, was detected in populations in the Gexian Mountain (GXS).The Gexian Mountain (GXS) population had the most diverse ribotypes (N = 6), including rare ones, and the overall highest count of ribotypes (Tables 3 and 4).

Haplotype Network
Based on the TCS haplotype network diagram, Hap1 and Hap7 were located in the central region, while H1, H7, and H15 had a higher number of individuals.Consequently, Hap1 and Hap7 are considered to be ancient haplotypes, whereas the remaining haplotypes are considered to be derived.The 18 haplotypes identified formed two distinct groups: an East China lineage (Central China and East China) and a Southwest China lineage ((Central China and Southwest China) and East China).Different haplotypes under each branch communicate in the same area.Seventeen of the eighteen haplotypes were detected in populations in Central China.Notably, H10, H14, H16, H17, and H18 were unique to specific locations, namely MYS, GXS, WCP, FHC, and HFG, respectively (Figure 2, Table 3).
central part of the diagram.Thus, it can be inferred that R4 is an ancient ribotype.The 11 ribotypes detected in populations in Central China were divided into two lineages: an East China lineage (Central China and East China) and a Southwest China lineage (Central China and Southwest China).Ten of the 11 ribotypes detected were found in populations in Central China.R6 was identified as an endemic ribotype in Gexian Mountain (GXS) (Figure 3, Table 3).According to the network diagram of TCS ribotypes (Figure 3), R4 was located in the central part of the diagram, while individuals R1, R4, and R5 were highly prevalent in the central part of the diagram.Thus, it can be inferred that R4 is an ancient ribotype.The 11 ribotypes detected in populations in Central China were divided into two lineages: an East China lineage (Central China and East China) and a Southwest China lineage (Central China and Southwest China).Ten of the 11 ribotypes detected were found in populations in Central China.R6 was identified as an endemic ribotype in Gexian Mountain (GXS) (Figure 3, Table 3).

Genetic Diversity and Population Genetic Structure
In terms of H d , the Wuyi Mountain (WYS) population in Fujian had the highest diversity index, and the Dabie Mountain (DBS) population in Anhui had the lowest genetic diversity index.The overall population haplotype diversity was H d = 0.830, the overall population nucleotide diversity was P i × 10 −3 = 0.878, and the average nucleotide difference count was K = 1.916.The variation in H d among the different populations ranged from 0.257 to 0.836, the variation in nucleotide diversity (P i × 10 −3 ) ranged from 0.230 to 1.113, and the variation in the mean nucleotide difference count (K) ranged from 0.524 to 2.191 (Table 4).In Southwest China (Chongqing, Sichuan), Central China (Hubei, Hunan), and East China (Jiangxi, Anhui, Fujian, Zhejiang), H d varied from 0.627 to 0.774, and nucleotide diversity (P i × 10 −3 ) varied from 1.050 to 1.240.

Genetic Diversity and Population Genetic Structure
In terms of Hd, the Wuyi Mountain (WYS) population in Fujian had the highest diversity index, and the Dabie Mountain (DBS) population in Anhui had the lowest genetic diversity index.The overall population haplotype diversity was Hd = 0.830, the overall population nucleotide diversity was Pi × 10 −3 = 0.878, and the average nucleotide difference count was K = 1.916.The variation in Hd among the different populations ranged from 0.257 to 0.836, the variation in nucleotide diversity (Pi × 10 −3 ) ranged from 0.230 to 1.113, and the variation in the mean nucleotide difference count (K) ranged from 0.524 to 2.191 (Table 4).In Southwest China (Chongqing, Sichuan), Central China (Hubei, Hunan), and East China (Jiangxi, Anhui, Fujian, Zhejiang), Hd varied from 0.627 to 0.774, and nucleotide diversity (Pi × 10 −3 ) varied from 1.050 to 1.240.
The range of the mean nucleotide difference (K) was 1.895-2.771.The genetic diversity of the populations in Central China was higher than that in the other regions.The population differentiation index (FST) for cpDNA in P. conradinae was 0.48886, signifying a significant level of differentiation.Among populations, genetic variation accounted for 48.89%, while within populations it was 51.11%.Genetic variation within populations slightly exceeded the variation between populations, although the values were similar [27].As shown by the AMOVA results, genetic variation between populations accounted for 3.06% of three geographical groups and genetic variation within populations accounted for 46.32% of three geographical groups.Within populations, the genetic variation within three geographical groups was 50.62%, slightly higher compared to the genetic variation between populations in three geographical groups.The genetic differentiation parameters The range of the mean nucleotide difference (K) was 1.895-2.771.The genetic diversity of the populations in Central China was higher than that in the other regions.The population differentiation index (F ST ) for cpDNA in P. conradinae was 0.48886, signifying a significant level of differentiation.Among populations, genetic variation accounted for 48.89%, while within populations it was 51.11%.Genetic variation within populations slightly exceeded the variation between populations, although the values were similar [27].As shown by the AMOVA results, genetic variation between populations accounted for 3.06% of three geographical groups and genetic variation within populations accounted for 46.32% of three geographical groups.Within populations, the genetic variation within three geographical groups was 50.62%, slightly higher compared to the genetic variation between populations in three geographical groups.The genetic differentiation parameters of the P. conradinae population (N ST = 0.29843, N ST = 0.28176, p < 0.05) indicated population substructuring.Genetic differentiation was detected in all three geographic regions: Southwest China (N ST = 0.081, N ST = 0.072, p < 0.05), Central China (N ST = 0.22810, G ST = 0.18051, p < 0.05), and East China (N ST = 0.33970, G ST = 0.30473, p < 0.05) (Tables 4 and 5).The total ribotype diversity (R d ) was 0.798, the nucleotide diversity (P i × 10 −3 ) was 0.886, and the average nucleotide difference (K) was 3.799.The R d value of the different populations ranged from 0.000 to 0.798, the nucleotide diversity (P i × 10 −2 ) ranged from 0.000 to 0.886, and the average nucleotide difference (K) ranged from 0.000 to 3.799 (Table 4).The R d , nucleotide diversity (P i × 10 −2 ) and average nucleotide difference (K) values in Southwest, Central, and East China ranged from 0.486 to 0.745, 0.169 to 0.756, and 0.972 to 4.341, respectively.
The populations in Central China exhibited higher genetic diversity compared to those in the other regions.The population differentiation index indicated high horizontal differentiation at the population level (F ST = 0.82511).The genetic variation was 82.51% among populations and 17.49% within populations.The results of the AMOVA indicated that the genetic variation among three geographical groups was 16.51%, whereas the genetic variation among populations within three geographical groups was 56.25%, with the lowest coefficient of differentiation observed among populations in Southwest China.Within the three geographical groups, the genetic variation among populations exceeded that within populations (Tables 4 and 5).6).Alternatively, examining the similarity between the expected value and the observed value curves can provide insights.Greater The estimated coancestor time for the 10 ribotypes of P. conradinae was 4.38 Mya (95% HPD: 2.38-6.51Mya) in the Pliocene epoch.We identified two distinct lineages: an East China lineage (Central China and East China) and a Southwest China lineage ((Central China and Southwest China) and East China).These two lineages originated approximately 4.38 million years ago (Mya) in the early Pliocene due to geographic isolation.The first ribotype to differentiate from the population of P. conradinae in Central China + East China lineage was R10, belonging to the Central-China-specific ribotype, which indicated that the population of P. conradinae in Central China and East China differentiated first.The estimated differentiation time for this event was approximately 3.32 Mya (95% HPD: 1.12-5.17Mya) during the Pliocene epoch.R11 was the earliest haplotype to diverge from a Southwest China lineage ((Central China and Southwest China) and East China).R11 belongs to a ribotype specific to East China and originated approximately 2.17 Mya (95% (HPD: 0.47-4.54Mya).This suggests that P. conradinae may have spread to the southwest from East China.On the other hand, R3 was the first ribotype to differentiate from the lineage of Central China + Southwest China.R3 is exclusive to Central China and emerged around 1.10 Mya (95% HPD: 0.11-2.85Mya) during the Pleistocene.This indicates that the differentiation time of P. conradinae from Central China to Southwest China was about 1.10 Mya (Figure 4B).
If a mismatch distribution analysis curve exhibits bimodal or multi-modal patterns, and the values of SSD and Hrag are low (p value of <0.05 under a transient expansion model), it suggests that the population of P. conradinae is relatively stable or gradually declining over time.Conversely, if a mismatch distribution analysis curve displays a single peak, and the values of SSD and Hrag are high (p value of >0.05), it indicates a recent population expansion event (Figure 5, Table 6).Alternatively, examining the similarity between the expected value and the observed value curves can provide insights.Greater similarity signifies a past expansion event, whereas greater dissimilarity suggests no expansion event (Figure 5).The analysis of mismatch distribution at the species level revealed that all 12 populations in the eight provinces had a double p value greater than 0.05, indicating population expansion (Table 6).At the population and geographic grouping levels, the population of P. conradinae peak was insignificant in Central China and East China but significant in Southwest China, with all p-values exceeding 0.05.These results demonstrate that both the population and the three geographic groups of P. conradinae align with expansion at the P. conradinae population and geographic levels.
Plants 2024, 13, x FOR PEER REVIEW 14 of 20 similarity signifies a past expansion event, whereas greater dissimilarity suggests no expansion event (Figure 5).The analysis of mismatch distribution at the species level revealed that all 12 populations in the eight provinces had a double p value greater than 0.05, indicating population expansion (Table 6).At the population and geographic grouping levels, the population of P. conradinae peak was insignificant in Central China and East China but significant in Southwest China, with all p-values exceeding 0.05.These results demonstrate that both the population and the three geographic groups of P. conradinae align with expansion at the P. conradinae population and geographic levels.

Population Variation and Taxonomic Status
An optimal nucleotide replacement model, GTR + F + I + G4, was constructed using IQ-Tree, version 1.6.12,based on the BIC.Among the 18 haplotypes, H10, H14, H16, H17, and H18 were specific to MYS, GXS, WCP, FHC, and HFG, respectively.The phylogenetic analysis of nrDNA (ITS) molecular markers revealed that R6 was a specific ribotype for GXS, among 11 ribotypes examined (Table 3, Figure 6).The cpDNA sequence-binding phenotype supported the existence of two varieties (P.conradinae var.ruburm and P. conradinae var.pubescens).However, the combination of nrDNA (ITS) sequence and phenotype provided evidence in favor of P. conradinae var.ruburm as a distinct variety.Insufficient molecular evidence was found for P. conradinae var.ruburm, and no pronounced variation was observed between P. conradinae and other populations.Therefore, the results from morphological and sequence markers support a distinction between P. conradinae and P. conradinae var.pubescens [48] (Table 3, Figure 6).
Plants 2024, 13, x FOR PEER REVIEW 15 of 20 phenotype provided evidence in favor of P. conradinae var.ruburm as a distinct variety.Insufficient molecular evidence was found for P. conradinae var.ruburm, and no pronounced variation was observed between P. conradinae and other populations.Therefore, the results from morphological and sequence markers support a distinction between P. conradinae and P. conradinae var.pubescens [48] (Table 3, Figure 6).

Genetic Diversity and Population Genetic Structure
P. conradinae is widely distributed throughout China and exhibits significant morphological diversity and phenotypic variation.Genetic differentiation among populations was observed based on cpDNA and nrDNA markers, indicating high levels of population genetic diversity in P. conradinae (Hd = 0.830; Rd = 0.798).This level of diversity surpasses the average value found in 170 other seed plant species (mean value: Hd = 0.67) [49].High Hd of cpDNA is typically associated with a long evolutionary history and restricted gene flow between populations [50,51].
In the present study, between-population variation was more common than within-

Discussion
4.1.Genetic Diversity and Population Genetic Structure P. conradinae is widely distributed throughout China and exhibits significant morphological diversity and phenotypic variation.Genetic differentiation among populations was observed based on cpDNA and nrDNA markers, indicating high levels of population genetic diversity in P. conradinae (H d = 0.830; R d = 0.798).This level of diversity surpasses the average value found in 170 other seed plant species (mean value: H d = 0.67) [49].High H d of cpDNA is typically associated with a long evolutionary history and restricted gene flow between populations [50,51].
In the present study, between-population variation was more common than withinpopulation variation, and genealogical structure was detected at the population level in all three geographic regions, albeit with low genetic differentiation coefficients.The southwestern region of China exhibited the lowest differentiation.Gene exchange was also apparent in this region, likely due to the abundance of wild cherry blossom resources in the region.In contrast, significant differentiation was detected in Central China, possibly linked to the strong adaptability of P. conradinae and seed dispersal facilitated by birds, animals, water, and wind [52].

Geographical Structure of Pedigree
The P. conradinae population was divided into two distinct lineages consisting of three geographic groups: an East China lineage (Central China and East China) and a Southwest China lineage ((Central China and Southwest China) and East China), as determined by the cpDNA and nrDNA haplotype phylogenetic tree and the TCS network map [44,45].The phylogeographic group differentiation occurred approximately 4.38 Mya (95% HPD: 2.38-6.51Mya) during the early Pliocene, resulting in the divergence of the two lineages due to geographic isolation [23].Climate change in the Pliocene had a marked influence on the population expansion of P. conradinae [12].East China lineage (Central China and East China), and R10, a ribotype specific to populations in Central China, was the first ribotype to differentiate from this lineage [21].P. conradinae expanded from Central China to Eastern China at 3.32 Mya (95% HPD: 1.12-5.17Mya) in the Pliocene.Another lineage, referred to as Southwest China lineage ((Central China and Southwest China) and East China), was identified.R11 was the first ribotype to differentiate from the P. conradinae population in the Southwest China lineage (East China and (Central China and Southwest China)) lineage and falls into the East-China-specific ribotype category.R11 belongs to a ribotype specific to East China and originated approximately 2.17 Mya (95% (HPD: 0.774.54Mya).This suggests that P. conradinae may have spread to the southwest from East China and significant gene exchange between East China and Central China.This gene exchange has led to noticeable morphological variation and intraspecific genetic differentiation.The majority of the other subclades underwent divergence in the Pliocene period.Ribotype R3 marks the initial differentiation within the Central China + Southwest China lineage of the population of P. conradinae.This ribotype was exclusively found in Central China of the population of P. conradinae, with ribotype differentiation occurring round 1.10 Mya (95% HPD: 0.11-2.85Mya) during the Pleistocene epoch.The population of P. conradinae experienced differentiation from Central China to Southwest China around 1.10 Mya (95% HPD: 0.11-2.85Mya) during the early Pleistocene of the Quaternary period.Consequently, the formation of the distribution center and overall pattern of P. conradinae occurred during the transition from the Pliocene to the Pleistocene, aligning with predictions that the distribution center of cherry blossom was established prior to the commencement of the Quaternary glacial period.
The mountains in the Northern Hemisphere are located at middle and lower latitudes.Glacial activity during the Quaternary resulted in alternating periods of cold glaciation and warm interglacial periods, causing significant fluctuations in sea levels [53].P. conradinae is believed to have differentiated in Southwest China.The results of the mismatch distribution analysis indicated that the population of P. conradinae and the three geographic groups did not reject the expected expansion model [54].The haplotype distribution, diversity analysis, and TCS results pointed to the highest genetic diversity among the populations in Central China.The species of haplotype were most abundant in the southeastern region near Wuyi Mountain in Eastern China and were the most likely sanctuaries for P. conradinae.Taking into account the haplotype distribution range and diversity index analysis, it can

Figure 2 .
Figure 2. (A) The distribution range of P. conradinae in China and the potential habitat for P. conradinae in China under current climatic conditions.(B) The locations of the 12 natural populations and the geographic distribution of the 18 cpDNA haplotypes (H1-H18) are depicted in pie charts, where the size of each chart corresponds to the number of individuals sampled.(C) The TCS network illustrates the interrelationships among haplotypes, with circles representing each haplotype and colors corresponding to their respective populations.The size of each pie chart is proportionate to the frequency of its associated haplotype.

Figure 2 .
Figure 2. (A) The distribution range of P. conradinae in China and the potential habitat for P. conradinae in China under current climatic conditions.(B) The locations of the 12 natural populations and the geographic distribution of the 18 cpDNA haplotypes (H1-H18) are depicted in pie charts, where the size of each chart corresponds to the number of individuals sampled.(C) The TCS network illustrates the interrelationships among haplotypes, with circles representing each haplotype and colors corresponding to their respective populations.The size of each pie chart is proportionate to the frequency of its associated haplotype.

Figure 3 .
Figure 3. (A) The distribution range of P. conradinae in China and the potential habitat for P. conradinae in China under current climatic conditions.(B) The locations of the 12 natural populations and the geographic distribution of 11 nrDNA Ribotypes (R1-R11) are depicted in pie charts, with the size of each chart corresponding to the number of individuals sampled.(C) The TCS network visually depicts the interrelationships among Ribotypes, which are represented by circles.The colors of these circles correspond to their representation across all populations.Additionally, the size of each pie chart accurately reflects the frequency of its respective Ribotype.(D) The Bayesian phylogenetic tree of Ribotypes size is constructed based on nrDNA (ITS) sequences.

Figure 3 .
Figure 3. (A) The distribution range of P. conradinae in China and the potential habitat for P. conradinae in China under current climatic conditions.(B) The locations of the 12 natural populations and the geographic distribution of 11 nrDNA Ribotypes (R1-R11) are depicted in pie charts, with the size of each chart corresponding to the number of individuals sampled.(C) The TCS network visually depicts the interrelationships among Ribotypes, which are represented by circles.The colors of these circles correspond to their representation across all populations.Additionally, the size of each pie chart accurately reflects the frequency of its respective Ribotype.(D) The Bayesian phylogenetic tree of Ribotypes size is constructed based on nrDNA (ITS) sequences.

Figure 4 .
Figure 4. (A) The complete chloroplast genome was utilized to determine the divergence time of various Rosaceae species based on five divergence time nodes.(B) The construction of nrDNA (ITS) ribosomal divergence time of P. conradinae is based on two nodes representing differentiation events.If a mismatch distribution analysis curve exhibits bimodal or multi-modal patterns, and the values of SSD and Hrag are low (p value of <0.05 under a transient expansion model), it suggests that the population of P. conradinae is relatively stable or gradually declining over time.Conversely, if a mismatch distribution analysis curve displays a single peak, and the values of SSD and Hrag are high (p value of >0.05), it indicates a recent population expansion event (Figure 5, Table6).Alternatively, examining the similarity between the expected value and the observed value curves can provide insights.Greater

Figure 4 .
Figure 4. (A) The complete chloroplast genome was utilized to determine the divergence time of various Rosaceae species based on five divergence time nodes.(B) The construction of nrDNA (ITS) ribosomal divergence time of P. conradinae is based on two nodes representing differentiation events.

Figure 5 .Table 6 .
Figure 5. (A-D) Mismatch distribution map of distinct geographical lineages (groups) of P. conradinae based on nrDNA fragment.(E-H) The mismatch distribution map of distinct geographical groups (lineages) of P. conradinae is generated based on cpDNA fragment analysis.Table 6. Neutrality and population expansion tests for P. conradinae.

Figure 5 .
Figure 5. (A-D) Mismatch distribution map of distinct geographical lineages (groups) of P. conradinae based on nrDNA fragment.(E-H) The mismatch distribution map of distinct geographical groups (lineages) of P. conradinae is generated based on cpDNA fragment analysis.

Table 1 .
Voucher information and geographic characteristics of 12 populations of P. conradinae.

Table 2 .
Fossils and molecular estimation used as calibration points for molecular dating.

Table 5 .
Analyses of molecular variance (AMOVAs) based on cpDNA and nrDNA data for populations of P. conradinae.
CT : Proportion of genetic variation among groups.F SC : Proportion of genetic variation between populations within groups.F ST : Proportion of genetic variation between populations and groups overall.

Table 6 .
Neutrality and population expansion tests for P. conradinae.