Spatial Genetic Structure of Prunus mongolica in Arid Northwestern China Based on RAD Sequencing Data

: The extensive range of sand deserts, gravel deserts, and recent human activities have shaped habitat fragmentation of relict and endangered plants in arid northwestern China. Prunus mongolica is a relict and endangered shrub that is mainly distributed in the study area. In the present study, population genomics was integrated with a species distribution model (SDM) to investigate the spatial genetic diversity and structure of P. mongolica populations in response to habitat fragmentation and create a proposal for the conservation of this endangered species. The results showed that the northern marginal populations were the ﬁrst isolated from other populations. The SDM suggested that these marginal populations had low levels of habitat suitability during the glacial period. They could not obtain migration corridors, and thus possessed low levels of gene ﬂow connection with other populations. Additionally, several populations underwent secondarily geographical isolation from other central populations, which preserved particular genetic lineages. Genetic diversity was higher in southern populations than in northern ones. It was concluded that long-term geographical isolation after historical habitat fragmentation promoted the divergence of marginal populations and refugial populations along mountains from other populations. The southern populations could have persisted in their distribution ranges and harbored higher levels of genetic diversity than the northern populations, whose distribution ranges ﬂuctuated in response to paleoclimatic changes. We propose that the marginal populations of P. mongolica should be well considered in conservation management. ,


Introduction
Plant species are especially crucial to the maintenance and stability of arid land ecosystems. In arid northwestern China, shrubs are edifiers that adapt to the habitats of sand and gravel deserts. The effect of long-term aridification processes has accommodated a few relict shrubs, such as Prunus mongolica, Tetraena mongolica, Ammopiptanthus mongolicus, Gymnocarpos przewalskii, and Helianthemum songaricum. At present, most of them are also categorized as endangered plants. The genetic structures of these relict plants were investigated to improve conservation management [1][2][3]. Habitat fragmentation is considered to be the main driving force in shaping the spatial genetic structure of these relict and endangered shrubs in arid northwestern China [2][3][4][5].
Two types of habitat fragmentation affected the relict and endangered plants in arid northwestern China. The first was promoted by the ancient development of extensive ranges of sand and gravel deserts following the enhancing aridification of the Asian interior. In response to the retreat of Tethys and the uplift of the Qinghai-Tibet Plateau Diversity 2021, 13, 397 2 of 12 and its surrounding high mountains, the initiation of aridity began in the study area [6]. This region underwent increasing aridity and experienced the onset of desert conditions during the late Miocene, but extensive deserts in the study area also occurred during the Pleistocene [6,7]. The second fragmentation type was caused by recent human activities such as urban sprawl, industrial development, coal mining, and livestock grazing [8], which may have resulted in habitat loss and the fragmentation of these endangered plants. These relict and endangered plants usually survive and are distributed in narrow ranges. Thus far, the effect of fragmented habitats on the genetic structures of relict and endangered plants in arid northwestern China remains unclear. This basic information is significant if we are to understand the evolutionary history of-and make effective conservation managements for-these endangered shrubs.
Endangered P. mongolica is a shrub of 1-2 meters in height and is mainly distributed in arid northwestern China (Figure 1). The habitats of this plant are sand deserts, gravel deserts, desert grasslands, and dry river beds. At present, it has fragmented distributions surrounding several deserts in the study area, such as the Tengger Desert, Maowusu Desert, Ulan Buh Desert, and Kubuqi Desert ( Figure 1). This arid region is characterized by annual rainfalls of 150-180 mm and the mean annual temperature is 8-9 • C [9]. It is also one of the most important livestock production areas in China. The breeding system of P. mongolica was determined to be entomophily and outcrossing [10]. Previous study indicated P. mongolica shared a close phylogenetic relationship with P. tangutica and P. davidiana based on complete chloroplast genome sequences [11]. However, these species are significantly different at distribution range and habitat characteristics. P. mongolica distributes in arid desert or desert grasslands, while these two closely related species grow in humid forests and shrub slopes. Due to its narrow distribution and recent habitat loss, P. mongolica is recorded as the second conservation level in the China Plant Red Data Book [12]. The spatial genetic pattern of P. mongolica was also previously investigated using cpDNA sequences. The cpDNA markers revealed the genetic character from the maternal inheritance of seed-mediated gene flows. It suggested an east-west split of haplotypes for this species [13,14]. However, the effects of habitat fragmentation on population genetic structure remain mostly unclear.
In this study, we employed RAD sequencing technology [15] and generated genomewide SNP data for P. mongolica populations across nearly its full range. The genome-wide SNPs were proven to have a strong ability to distinguish between population structures and interpret evolutionary history for organisms [16,17]. This method has been frequently used in molecular phylogeography and conservation genetics [18]. Based on the sampling strategy and molecular approaches, the aims of our study were to: (1) clarify the spatial genetic diversity and structure of P. mongolica populations in response to habitat fragmentation, (2) make a proposal to maintain the genetic integrity of this endangered species for conservation. The present study provides important guidelines for the conservation management of P. mongolica germplasm and conservation implications for endangered plants in arid northwestern China. Similar to the ADMIXTURE clustering, the PCA analysis also showed that the populations of P1, P4, and P5 were independent and isolated from other populations in the three-dimensional coordinate axis (Figure 2a). For the remaining seven populations, the PCA analysis showed that several individuals from the populations of P7, P8, and P10 were isolated from other individuals (Figure 2b). Based on the pairwise genetic distances among these 97 individuals using the high-quality unlinked genomic SNPs dataset, an NJ phylogenetic tree was constructed ( Figure 3). It showed that each of the ten populations was well-grouped and clustered in an independent clade. These ten clades were supported by high bootstrap values (Figure 3).

Plant Collection and DNA Extraction
A total of 97 individuals from 10 populations across the full distribution range of P. mongolica were sampled and sequenced to generate genome-wide SNPs ( Figure 1, Table S1). This species does not reproduce vegetatively by root suckers. We sampled neighboring individuals at least 5 m apart within each population. Sampling was carried out according to the size of the populations, with 10 to 15 individuals sampled for large populations. For several small populations, we tried to collect as many individuals as possible. We collected leaf materials for each of the 97 sampled individuals from the field. Then, they were dried and stored with silica gel in the laboratory. Total genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions.

RAD Sequencing and SNP Calling
Libraries from genomic DNA of each of the 97 individuals were constructed through RAD-seq methodology [19] at Novogene Co. Ltd. in Beijing, China. First, the genomic DNA was digested with high-fidelity restriction enzymes. Then, libraries were created by uniquely barcoding each of the individuals from each respective site and pooling barcoded individuals. The libraries were then pooled for polymerase chain reactions (PCRs) and the PCR products were purified. Finally, the sequencing was performed with an Illumina Hiseq PE150 platform.
Raw reads were filtered for quality control under the following criteria: adapter reads were initially filtered out, paired reads were discarded if the number of Ns in either read exceeded 10%, and the number of low quality (Q ≤ 5) bases in a single read was restricted to less than 50%. Clean reads from the sequenced samples were mapped to the reference genome of Prunus dulcis (https://www.rosaceae.org/organism/Prunus/dulcis accessed on 22 March 2021) using BWA software with the parameters "mem -t 4 -k 32 -M" [20]. After alignment, SNPs were called using SAMtools [21]. The output of SNPs was further filtered under the parameters "dp7-miss0.1-maf0.05". These thresholds were used to clean the low-quality SNPs with complete degree < 0.1, coverage depth < 7, and rare SNPs filter of MAF < 0.05. To reduce the bias of linkage disequilibrium (LD), SNPs in strong LD (pairwise genotype correlation r 2 > 0.2) were excluded using PLINK [22] in a window of 50 SNPs (sliding window overlap: 10 SNPs at a time). The final dataset contained 8328 SNPs. Next, the dataset of high-quality unlinked SNPs was used for the following analysis.

Genetic Structure Analyses
First, the population structure of sampled individuals was clustered using the program ADMIXTURE ver. 1.3.0 [23]. This program infers the ancestry of the sampled individuals based on large SNP genotype datasets using a block relaxation algorithm. While using ADMIXTURE, a defined K value was needed that indicated the number of ancestral populations. For each K value, a cross-validation (CV) error was exhibited during the calculation. The K value was set ranging from 1 to 8 when running the program. Here, we employed two iterations of ADMIXTURE. The first was conducted for all individuals from ten populations. Three populations (P1, P4, and P5) were independent from other populations (see results). The second was conducted for the remaining seven populations to investigate the deep-level population structure.
Next, the PCA was calculated for the 97 individuals using the GCTA ver. 1.93.2 program [24]. Finally, a neighbor-joining (NJ) phylogenetic tree of 97 individuals from 10 populations was constructed using the program MEGA-X ver. 10.1.8 [25] based on the genetic distance matrix. The node supports were assessed through bootstrap resampling (BP) with 1000 repetitions. Meanwhile, nucleotide diversity (π) was computed using the software ARLEQUIN ver. 3.5.2.2 [26] for the ten populations. Two other indexes of genetic diversity, including observed heterozygosity (H o ) and expected heterozygosity (H e ), were estimated using the software PLINK [22]. To describe the spatial pattern of genetic diversity across the distribution range, a landscape map of genetic variation was generated based on the values of nucleotide diversity (π) for each population using the Genetic Landscapes GIS Toolbox [27] in ArcGIS 10.2 (ESRI, Redlands, CA, USA).

Species Distribution Model and Potential Migration Corridors
Present-day occurrence locations of P. mongolica were acquired from our field surveys and previous publications [13,14]. A total of 19 P. mongolica occurrence points were used in the species distribution modeling (SDM) analyses. To establish the SDMs, we initially downloaded 19 bioclimatic variables at a resolution of 2.5 arcmin from the PaleoClim Database (http://www.paleoclim.org/, accessed on 22 March 2021; [28]). To avoid model overfitting, we calculated the correlation coefficients of these 19 bioclimatic variables and retained the least correlated variables (Spearman's < 0.8). At last, seven remaining bioclimatic variables were used in the SDMs to model the potential distributions of this species. The Diversity 2021, 13, 397 5 of 12 last glacial maximum (LGM) distribution of this species was also modeled with the seven corresponding LGM bioclimatic variables under the Community Climate System Model version 4 (CCSM4 model), at a resolution of 2.5 arcmin, which were downloaded from the PaleoClim Database (http://www.paleoclim.org/ accessed on 22 March 2021; [28]). We generated present-day and LGM potential distributions of P. mongolica using the maximum entropy approach as implemented in MAXENT, version 3.3.3k [29]. The location data were randomly set to 75% for training and 25% for model testing, respectively. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC) of a receiver operating characteristic plot.
Based on the generated layers of potential distributions, we inferred the potential migration corridors of P. mongolica in the present and LGM periods by the least-cost path (LCP) analysis using SDMtoolbox 2.0 (reference). First, the resistance layers of P. mongolica were created through inverting the generated layers of potential distributions (1-SDM habitat suitability value) during the present and LGM periods. Second, a cost distance raster for each sample locality was generated according to dispersal cost layer (resistance layer). Then, the corridor layers were established between every pair of localities based on the cost distance raster. Finally, all of the pairwise corridor layers were summed as the eventual dispersal corridor for P. mongolica [30]. To assess the effect of habitat suitability on P. mongolica genetic diversity at the population level, we computed correlation coefficients between indexes of population genetic diversity (π, H o and H e ) and current/LGM suitability values using SPSS 17.0 (IBM, Almond, NY, USA).

Results
According to the generated sequences of 97 P. mongolica individuals, the GC content of these individuals was about 37%. The numbers of mapped reads for each individual ranged from 5,895,000 to 18,359,186. The mapping rate ranged from 36.17% to 93.47%. The average depth of sequencing was from 13.36× to 31.06×. At last, 8328 high-integrity unlinked SNPs were used in the following analyses.
When employing ADMIXTURE for the dataset of all 10 populations, decreasing trends of the CV error values were shown when the K values changed from 1 to 8. According to genetic clustering of these 97 individuals, the P1, P4 and P5 populations were independent and isolated from other populations when the K value was 4 ( Figure 1b). Clear population structure was not obtained when the K value was set larger than 4. When investigating the population structure for the remaining seven populations, the lowest CV error values were shown when the K value was 5. According to the obtained clustering, four independent clusters were shown in five individuals of the P7 population, four individuals of the P10 population, five individuals of the P8 population, and four individuals of the P8 population, respectively (Figure 1c). The remaining individuals were grouped in the fourth cluster.
Similar to the ADMIXTURE clustering, the PCA analysis also showed that the populations of P1, P4, and P5 were independent and isolated from other populations in the three-dimensional coordinate axis (Figure 2a). For the remaining seven populations, the PCA analysis showed that several individuals from the populations of P7, P8, and P10 were isolated from other individuals (Figure 2b). Based on the pairwise genetic distances among these 97 individuals using the high-quality unlinked genomic SNPs dataset, an NJ phylogenetic tree was constructed (Figure 3). It showed that each of the ten populations was well-grouped and clustered in an independent clade. These ten clades were supported by high bootstrap values (Figure 3).  The values of nucleotide diversity (π), the average observed heterozygosity (Ho) and expected heterozygosity (He) for this species were 0.339 (±0.160), 0.378 (±0.187) and 0.350 (±0.128) ( Table 1)    Nind, the number of individuals analyzed; π, the average nucleotide diversity; SD, standard deviation; Ho, the average observed heterozygosity; He, the average expected heterozygosity.  Table 1). The parallelogram was generated based on the convex hull of eight geographical points.
Present and LGM SDMs of P. mongolica were well performed, which was supported by the high AUC values of 0.998/0.997 for the training and test data. In comparison with the present distribution (Figure 5a,b), the LGM distribution shrunk significantly in marginal populations such as P4, P5, and P10. Based on the present and LGM modeling distributions, the potential migration corridors of P. mongolica existed among the central populations (Figure 5c,d). Especially during the LGM period, the northern marginal populations displayed a weak connection with the southern central populations. The nucleotide diversity (π) and expected heterozygosity (He) were not significant correlated with both current and LGM habitat suitability values for P. mongolica (p-value > 0.1). The observed heterozygosity (Ho) was significant correlated with current habitat suitability value (pvalue = 0.043), but it was not significant correlated with LGM habitat suitability values (pvalue > 0.1).  Table 1). The parallelogram was generated based on the convex hull of eight geographical points.
Present and LGM SDMs of P. mongolica were well performed, which was supported by the high AUC values of 0.998/0.997 for the training and test data. In comparison with the present distribution (Figure 5a,b), the LGM distribution shrunk significantly in marginal populations such as P4, P5, and P10. Based on the present and LGM modeling distributions, the potential migration corridors of P. mongolica existed among the central populations (Figure 5c,d). Especially during the LGM period, the northern marginal populations displayed a weak connection with the southern central populations. The nucleotide diversity (π) and expected heterozygosity (H e ) were not significant correlated with both current and LGM habitat suitability values for P. mongolica (p-value > 0.1). The observed heterozygosity (H o ) was significant correlated with current habitat suitability value (p-value = 0.043), but it was not significant correlated with LGM habitat suitability values (p-value > 0.1).

Effect of Habitat Fragmentation on Population Structure of P. mongolica
With respect to population genetic diversity, we found that it was higher in southern

Effect of Habitat Fragmentation on Population Structure of P. mongolica
With respect to population genetic diversity, we found that it was higher in southern populations than northern populations, based on genome-wide SNPs (Figure 4). Our results were consistent with the report on P. mongolica population genetic diversity based on cpDNA sequences [13]. SDM showed that these southern populations exhibited extensive suitable habitats for this species in both the present and glacial periods (Figure 5a,b). In contrast, the northern populations experienced fluctuations in distribution ranges. Thus, the southern populations could have persisted in their distribution ranges and harbored high levels of genetic diversity during the process of paleoclimatic change. This hypothesis was also supported by a few of species [31,32] that habitat stability was positively associated with population genetic diversity. However, the low levels of genetic diversity for northern populations were not the result of founder effects following interglacial recolonizations [33]. Our population genetic structure showed that the northern populations of P1, P4, and P5 had distinctly different genotypes from the southern populations. The low levels of genetic diversity for northern populations could have been caused by small effective population sizes in response to fluctuations in distribution ranges following paleoclimatic change.
In the present study, we found that the northern marginal populations of P1, P4, and P5 were the first isolated from other populations (Figures 1b and 2a). Historically, northwestern China experienced long-term aridification in response to Cenozoic global cooling and the uplift of the Qinghai-Tibetan Plateau [34]. During the process of stepwise climatic drying, extensive ranges of sand and gravel deserts developed in the study area (Figure 1a) [6]. Due to the effects of geographical barriers, gene flow was blocked for these northern populations along the desert margins and due to long geographical distance from other populations. The SDM suggested that these three populations had low levels of habitat suitability during the glacial period (Figure 5b). They could easily have been isolated from other populations following the process of paleoclimatic change. From the modeling of migration corridors, P. mongolica was well-connected among the central populations along the southern margin ( Figure 5). These northern marginal populations had low levels of opportunity for migration and gene flow connection with other populations. After long-term geographical isolation, they harbored particular genetic clades and persisted in independent locations. This is a different pattern of population structure for P. mongolica generated based on genome-wide SNPs. This pattern is consistent with the spatial genetic structure of Panzerina lanata in the same study area [35].
When the deep-level genetic structure of remaining P. mongolica populations was investigated, we found that several individuals within the populations of P7, P8, and P10 preserved particular genetic clades (Figures 1c and 2b). The data suggested these three populations underwent secondarily geographical isolation from other central populations.
Here, the populations of P7 and P8 were located at the foot of the Helan Mountains, which had served as refugia for a number of species in the study area [35][36][37]. The population of P10 was distributed in the west marginal area of these deserts, a long distance from other populations (Figure 1a). Additionally, this population was not supported by clear migration corridors connecting with central populations ( Figure 5). The population of P10 could have been limited with respect to gene flow by geographical isolation. From the phylogenetic tree of all sampled individuals (Figure 3), each of ten populations formed a solid clade under high support values. In general, this genetic structure indicated a fragmented pattern among these P. mongolica populations.

Conservation Implications
P. mongolica is listed as a relict and endangered arid plant species in China [9]. This species is used as forage, in traditional medicine, and for sand fixation. P. mongolica struggles with natural regeneration due to the harsh habitat. In addition, this species experienced extensive habitat loss and decrease in population size in recent years [38,39]. Artificial reproduction was undertaken for afforestation and it was protected in several natural reserves. However, these natural reserves were established to protect special ecosystems in arid lands, such as mountainous forests or desert ecosystems. To maintain the evolutionary potential to overcome future environmental changes, it is crucial to preserve the integrity of the genetic structure of this species. Based on our obtained results, we propose several strategies for future protective actions.
On the one hand, the marginal populations of P. mongolica should be well-considered. The marginal populations of P1, P4, P5, and P10 harbor particular genetic lineages. These marginal populations preserve low levels of genetic diversity. SDM also suggested that the habitat suitability of these marginal populations could be sensitive to climatic changes. Thus, these marginal populations should be given conservation priority as they face the challenges of environmental changes. Additionally, ex situ conservation measures will be needed to preserve the gene pools of these marginal populations. On the other hand, genomic data and cpDNA data should be combined to fully understand the population genetic structure of P. mongolica. Here, we found different patterns of genetic structure among two sets of genetic data. However, the populations near the mountains (such as P4, P5, P7, and P8) preserve unique genetic resources for this species, based on both genomic data and cpDNA data [13].

Conclusions
The present study investigated population genetic structures and conservation strategies for a relict and endangered plant, P. mongolica, in arid northwestern China. Based on the obtained genome-wide SNPs, we found that long-term geographical isolation after historical habitat fragmentation had promoted the divergence of marginal populations and refugial populations along mountains from other populations. They harbored particular genetic lineages. We also found that the northern populations had experienced fluctuations in distribution ranges in response to paleoclimatic changes. The southern populations could have persisted in their distribution ranges and harbored higher levels of genetic diversity than the northern populations. Based on these results, we propose that the marginal populations of P. mongolica should be well considered in conservation management. Future studies should highlight the possible genetic mechanism of endangerment and the genetic basis of adaptability to environmental changes.