Genetic Diversity of Rhanterium eppaposum Oliv. Populations in Kuwait as Revealed by GBS

Natural populations of Rhanterium eppaposum Oliv. (Arfaj), a perennial forage shrub, have depleted due to unethical human interventions and climate change in Kuwait. Therefore, there is an urgent need to conserve this native plant through the assessment of its genetic diversity and population structure. Genotyping by sequencing (GBS) has recently emerged as a powerful tool for the molecular diversity analysis of higher plants without prior knowledge of their genome. This study represents the first effort in using GBS to discover genome-wide single nucleotide polymorphisms (SNPs) of local Rhanterium plants to assess the genetic diversity present in landraces collected from six different locations in Kuwait. The study generated a novel set of 11,231 single nucleotide polymorphisms (SNPs) and indels (insertions and deletions) in 98 genotypes of Rhanterium. The analysis of molecular variance (AMOVA) revealed ~1.5% variation residing among the six populations, ~5% among the individuals within the population and 93% variation present within the populations (FST = 0.029; p = 0.0). Bayesian and UPGMA analyses identified two admixed clusters of the tested samples; however, the principal coordinates analysis returned the complete population as a single group. Mantel’s test returned a very weak correlation coefficient of r2 = 0.101 (p = 0.00) between the geographic and genetic distance. These findings are useful for the native species to formulate conservation strategies for its sustainable management and desert rehabilitation.


Introduction
Land degradation is an issue of global concern and poses a serious threat to flora biodiversity, soil fertility and food security in arid lands [1]. Anthropogenic activities, climate change and human population expansion have fueled it further and increased the vulnerability of desert ecosystems [2]. Global simulations predict that current landscapes will store 24% and 10% less carbon in plants and soil, respectively, if covered by their natural vegetation [3]. A substantial amount of carbon is accumulated by native vegetation, thus reducing the impacts of climate change. Indigenous flora is not only important to a region's cultural identity but is also critical for environmental health concerning soil fertility, agricultural productivity and sustenance of biodiversity [4]. Besides this, it provides a unique niche for the dying flora and fauna [4]. Therefore, plants of genetically diverse wild species must be reintroduced on a large scale to achieve global terrestrial restoration targets [5].
Kuwait harbors an important genetic resource that contains a valuable gene pool of drought and salt tolerance. Nonetheless, Kuwaiti foliage has been severely degraded, and significant changes in vegetation communities have been observed since 1974 [6]. Further, the Gulf War of 1991 brought in enormous levels of oil pollution, badly affecting plantations in the entire country [7]. The natural plant genetic resources of Kuwait are at risk of extinction, so they must be conserved. With the signing of the Convention on Biological Diversity (CBD) in 1993, the nation developed a national biodiversity strategy [8]. This convention was intended to counteract the extinction of species and the loss of biodiversity. Not only does this refer to propagating dying species, but also to ensuring the survival of the maximum number of native species and maintaining their genetic diversity.
Rhanterium epapposum, Oliv. (Asteraceae), locally known as "Arfaj", is the national plant of Kuwait [9] and grows in deep, sandy soils and shallow substrates. The plant reproduces sexually and exhibits cross-pollination. The seeds are blown short distances by winds or transported over long distances by zoochory [10]. The diploid set of chromosome numbers of this species was identified as twelve [11]. The plant is also valued for its essential oils that have been used to treat skin ailments and gastrointestinal disorders in folklore [12]. Camels and sheep graze on it as forage. The plant's aesthetic appeal, flower color and potential to adapt to urban landscaping makes it an ideal candidate for desert rehabilitation [6]. Human interventions such as off-road driving, camping and expansion in urban areas [13] have affected the natural population of Rhanterium, thereby reducing it from 30.6% to 2.1% [14,15]. Previously common in southern Kuwait, the species now occurs only in protected areas such as Sulaibiya Field Station, military airbases, military camps and some restricted oil fields [16,17]. It is therefore necessary to understand the extent and distribution of the genetic variation within these populations, to devise sampling strategies that are efficient in capturing the genetic diversity for selection trials and to use the material in revegetation and rehabilitation projects that are intended for maintaining the biological diversity of this native plant in Kuwait.
Technological advancement has brought about a paradigm shift in the way of assessing genetic diversity. The conventional morphological markers have been replaced by the use of first-generation molecular markers such as simple sequence repeats (SSRs), amplified fragment length polymorphism (AFLP) and randomly amplified polymorphic DNA (RAPDs) [18][19][20]. Due to a further reduction in the costs of next-generation technologies, the second generation of molecular markers based on single nucleotide polymorphisms (SNPs) have now become more common [21]. Various experimental methods can be used to observe SNP markers in plants, but genotyping by sequencing is currently the most popular way to study SNPs in plants [22,23]. The GBS approach does not require prior knowledge of the genome, as it is based on the principle of genome reduction with restriction enzymes. It combines the process of marker discovery and genotyping, generating a large dataset of genome-wide SNPs, making it an ideal choice for plant genetic diversity analysis [23,24].
There are very few published studies on the genetic diversity and population structure of Rhanterium eppaposum in Kuwait, the Gulf Cooperation Council (GCC) and other tropical countries [9,[13][14][15]25]. Our study aimed to identify genome-wide and high-quality SNPs in R. eppaposum populations to estimate genetic diversity and to identify genetic structures within the wild landraces growing in Kuwait. The findings provide a basis to formulate strategies for its conservation and rehabilitation.

Results
To the best of our knowledge, in this study, SNP markers were generated and used for the first time in the non-model species of R. eppaposum through the technique of GBS.

Genotyping and SNP Density
For the 99 samples, sequencing of the GBS library yielded 40 million raw reads with an average base quality of Q 30 ( Figure S1). The sequencing coverage was estimated to be 10x. Combined variant calling files (.vcf) were generated, and a total of 38,199 genomic variants (34,109 SNPs and 2207 indels) were obtained. A high density of low-frequency variants was recorded in the samples tested in the present study ( Figure 1a). The genomic missing rates were also recorded to be high (Figure 1b), and therefore, the locus missing rate of 0.6 was set as a cut-off to filter out low-quality variants. Samples with >5% missing genotypes, variants with missing calls in more than 5% of samples and variants with minimum allele frequency (MAF) < 0.01 were filtered. This removed 15,781 (41%) low-quality SNPs/indels and a single genotype. The total number of variants remained at 11,231 (10,568 SNPs and 753 indels). The single genotype was also excluded from further analysis. genotypes, variants with missing calls in more than 5% of samples and variants with minimum allele frequency (MAF) < 0.01 were filtered. This removed 15,781 (41%) low-quality SNPs/indels and a single genotype. The total number of variants remained at 11,231 (10568 SNPs and 753 indels). The single genotype was also excluded from further analysis.

Molecular Diversity
genotypes, variants with missing calls in more than 5% of samples and variants with minimum allele frequency (MAF) < 0.01 were filtered. This removed 15,781 (41%) low-quality SNPs/indels and a single genotype. The total number of variants remained at 11,231 (10568 SNPs and 753 indels). The single genotype was also excluded from further analysis.

Molecular Diversity
A cut-off of 0.05 for the allowed level of variants for molecular diversity analysis further filtered the usable number of SNPs at each location. Maximum numbers were 8841 at Al Kabd, followed by 8838 at Al Salmi, 8583 at Om Qaser, 8547 at Al Maqwa, 8078 at SANR and 7536 at Mina Abdulla. Among these, approximately 26% were polymorphic at Al Salmi and Al Kabd, 21 % at SANR and Mina Abdulla, 16.5% at Om Qaser and 14% at Al Maqwa (Table 1).
There were more transition-type SNPs (28.40%) than transversion-type (19.5%), and the ratio of Ts/Tv was 1.5 (2351/1618). Substitutions were higher than the transitions and transversions. The observed theta (pi) was higher than the expected theta (S) at all the locations yielding a negative Tajima's D (D < 0). From this, we interpret that there is an abundance of rare alleles in all the Rhanterium populations. In addition to this, the inbreeding coefficient (FS) ranged from 0.129 to 2.83 (mean −1.095 p = 0.00).

Genetic Differentiation
An AMOVA partitioning on this dataset revealed 93% of variance distributed within the populations, 4.8% among the individuals among the populations and a minimum of 1.5% among the populations. An average F ST of 0.02, F IT of 0.06 and a F IS of 0.05 at a confidence interval of 95% was obtained. The values of fixation indices were in agreement with molecular diversity indices ( Table 2).

Population Structure
To establish phylogenetic relationships among the 98 genotypes, a cladogram was constructed using the UPGMA method and 10,568 SNPs ( Figure 3). Cluster A with four sub-clusters, cluster B with five sub-clusters and cluster C with six sub-groups each were seen. All the clusters had overlapping populations from the six locations. Only Mina Abdulla populations existed as one combined group in cluster A.  The Bayesian approach of clustering by Evanno's method demonstrated a clear peak at 3 (Figure 4a), indicating that three groups were distributed across all the Rhanterium populations. The PCA analysis plotted all the Rhanterium of Kuwait as a single collection (Figure 4b). The first three principal components explained 5.1% (PC1 = 1.8, PC2 = 1.7 and PC3 = 1.6) of the variance for R. eppaposum. A complete admixture of populations was observed in the STRUCTURE plot (Figure 4c).  The clustering analysis was further supported by Mantel's test. An r 2 of 0.101 at p = 0.05 was obtained between the genetic and geographic distance, indicating that the populations were sparingly isolated by distance ( Figure 5).

Discussion
This study reports the genome-wide characterization of Rhanterium for the first time in Kuwait. We were able to generate 10,568 SNPs from 98 genotypes and studied their genetic diversity. The model-based Bayesian clustering revealed the occurrence of three admixed clusters with only 1.5% variation residing among the populations. The high intra-population diversity is encouraging in terms of the presence of rare alleles rendering The clustering analysis was further supported by Mantel's test. An r 2 of 0.101 at p = 0.05 was obtained between the genetic and geographic distance, indicating that the populations were sparingly isolated by distance ( Figure 5).  The clustering analysis was further supported by Mantel's test. An r 2 of 0.101 at p = 0.05 was obtained between the genetic and geographic distance, indicating that the populations were sparingly isolated by distance ( Figure 5).

Discussion
This study reports the genome-wide characterization of Rhanterium for the first time in Kuwait. We were able to generate 10,568 SNPs from 98 genotypes and studied their genetic diversity. The model-based Bayesian clustering revealed the occurrence of three admixed clusters with only 1.5% variation residing among the populations. The high intra-population diversity is encouraging in terms of the presence of rare alleles rendering

Discussion
This study reports the genome-wide characterization of Rhanterium for the first time in Kuwait. We were able to generate 10,568 SNPs from 98 genotypes and studied their genetic diversity. The model-based Bayesian clustering revealed the occurrence of three admixed clusters with only 1.5% variation residing among the populations. The high intrapopulation diversity is encouraging in terms of the presence of rare alleles rendering scope for further population expansion. The technique established will also help in studying the genetic diversity of other non-model species in Kuwait and abroad.

Genotyping by Sequencing
When a reference genome is lacking, de novo assembly of a rough genome at a lower coverage is desirable. In the present study, rough genomes based on the whole genome sequencing for one representative sample were used as a reference genome for subsequent alignments. Previous reports on the use of de novo contigs as a reference for SNP discovery and genotyping on non-model species are available [23,24,26]. Genome survey sequencing has been employed previously for a native tree of Kuwait to filter microsatellite motifs [19]. However, transcriptomes have also been used as a reference for genotyping non-model species avoiding the need for de novo DNA assembly with the advantage of finding SNP sites near selected targets [27]. Some other non-reference-based methods for GBS analysis include Stacks [28], TASSEL [29] and UNEAK [30]. Haplotag Software for Haplotype-Based GBS analysis is also optimized for use in an out-crossing and self-pollinating nonmodel plant species [24,31]. The in silico prediction of optimum restriction enzymes (PstI + BtgI) was also performed from the reference genome, setting a baseline to obtain at least 1000 filtered markers. To increase the genome coverage and reduce the levels of missing data, we employed two restriction enzymes to fragment genomic DNA. Some of the previous studies have demonstrated that double-enzyme digestion generates more consistent results among different individuals than single-enzyme digestion [24,32,33].
In the current study, haplotype-based FreeBayes was employed to call putatively polymorphic variants (specifically SNPs, indels, MNPs and composite insertion and substitution events) from the short-read aligned BAM files (Phred + 33) into a variant call format [34]. Using the double-digest approach and haplotype-based variant calling, we were able to order 11,231 good quality markers against the predicted 1000 markers deemed to be sufficient for downstream genetic diversity analysis. Freebayes [34] was applied to the 1000 Genomes Exon Targeted samples, using default parameters [35]. GBS has been attempted previously to study the genetic diversity of crested wheatgrass [36], forage grasses [37] and Rimth, native to Kuwait [15,22].

Molecular Diversity
In this study, the number of polymorphic SNPs in Al Kabd populations was the highest. This confirmed our previous predictions based on RAPD, SRAP and ISSR markers [9,13,14]. Higher diversity in these populations may provide new elite and desirable alleles facilitating genotypes for desert rehabilitation and revegetation [38]. Although a high number of variants were generated, only 59% of them fell above the cut-off of 0.6 (locus missing rates). This could be attributed to the naturally high levels of sequence diversity in wild accessions [29]. Haloxylon salicornicum growing in similar environments also depicted higher locus missing rates [15].
The average transition/transversion (Ts/Tv) ratio was found to be 1.5. The mutation of methylcytosine to uracil and eventually to thymine might result in transition abundance within a species [39]. In Eukaryotes, cytosine methylation in DNA is one of the most significant changes and is crucial for gene activity regulation and genomic integrity preservation. Plant development, differentiation and response to biotic and abiotic stresses are all influenced by DNA methylation and other epigenetic mechanisms [40]. Rhanterium populations naturally grow under stressed conditions; consequently, methylation events are expected to be high. The voluminous transitions and exceedingly repetitive sequences can be con-sidered an evolutionary footprint of methylation events [40][41][42]. Analogous observations were recorded in Iranian wheat landraces [43] and wild accessions of H. salicornicum populations [15]. A highly repetitive genome was also recorded in Acacia pachyceras growing in Kuwait [19].
The negative Tajima's D values are encouraging and suggestive of excessive lowfrequency polymorphisms or rare alleles [44]. This could be attributed to a selective sweep event that might have resulted in the population expansion of the Rhanterium population in Kuwait. This was supported by the inbreeding coefficients, which indicate a higher number of observed heterozygotes within the populations [45].

Mixed Ancestry Is a Potential Cause for Low Genetic Diversity among Populations
The findings from a population structure analysis suggest that the large, genetically homogenous group of 98 genotypes of Rhanterium collected all over Kuwait is likely to originate from a common ancestor. The minor genetic differences are believed to be due to recombination events occurring during sexual reproduction [46]. These findings show that the majority of Rhanterium landraces are genetically similar, justifying the need to extend the diversity of future rehabilitation projects by integrating alternative genotypes from neighboring countries such as Saudi Arabia, UAE, Oman, Qatar, etc. In a similar study conducted with ISSR markers [9] and SRAP and RAPD markers [13], low genetic diversity among the populations was demonstrated. Likewise, the populations of Haloxylon also demonstrated a weak population structure in Kuwait [45]. Our analysis showed a high within-line genetic variation of tested Rhanterium populations, which agrees with studies on highly outcrossing species [47]. Overall, our genetic diversity results are following diversity studies of Rhanterium [9,13].

Geographic Origin Showed No Congruence with Genetic Structure
There was no clear congruence between genetic structure (clusters) and geographic origin in our analyses, i.e., no grouping was based on spatial origin. This was confirmed through the results of Mantel's correlation test. No isolation by distance pattern was observed. Regular exchange of pollens, critters and reproductive material taking place between these spatially apart but outcrossing species is the only explanation that can be given for this in the present circumstances. Kuwait is a country with a small land area of 17,000 km 2 . The topography is flat with slightly uneven desert dunes. Frequent sandstorms and winds (Toz) are quite common. These features facilitate the easy dissemination of seeds to otherwise far areas [48]. The complete admixture of populations is thus justified, as stated by Sewell Wright [48]. These results were in congruence with other desert species of Kuwait [45].
We speculated differences in the population structure owing to the various soil types in which these populations are growing. The soil types of OmQaser, SANR, KABD, Al Maqwa, Mina Abdulla and Al Salmi were calcigypsids, haplocalcids, torripsaments, acquisalids and Petrogypsids, respectively [49]. It appears from our findings that the soil types are not influencing the Rhanterium community much in Kuwait. The phenotypic characterization of these populations, however, needs to be undertaken.

GBS and SNP Calling
The GBS was performed at the University of Minnesota Genomic Centre (UMGC USA. Genomic DNA was digested using the restriction enzymes PstI + BtgI (New England BioLabs, Inc., Ipswich, MA, USA), and barcoded adapters were ligated to each DNA sam ple using T4 ligase (New England BioLabs, Inc.). Dual indexed SBG libraries for R. ep paposum were pooled and loaded across 4 lanes of a 150 bp single read sequencing ru on a NextSeq 550 (Illumina, San Deigo, CA, USA) instrument with a coverage of 10x gen erating ~40 M raw reads.
In-house-developed Gopher-pipelines by the University of Minnesota Informatic Institute in collaboration with the University of Minnesota Genomics Center and the Re search Informatics Solutions (RIS) group at the University of Minnesota Supercomputin Institute were used for sequence alignment and variant calling (Supplementary File). Per base and per-read quality score statistics were calculated for each fastq file throug FastQC [50]. Mean quality scores for all libraries were ≥Q30. Sequence adapters were re moved, and low-quality bases were trimmed using Trimmomatic [51]. Each sample wa aligned to a reference genome using BWA, and the bam file was sorted and indexed. A representative sample (RH_KB_01) was chosen for paired-end, whole-genome sequenc ing at a coverage of 50x on an Illumina HiSeq 2500 [52]. Approximately 180-190 M raw reads with a GC content of 35% were assembled with Abyss 2.0.2, applying the "abyss pe" command using a kmer size of 64 [53]. This was used as a reference to align the 9 GBS libraries. The Bayesian genetic variant detector FreeBayes v1.1.0 was used for join SNP calling on all the samples [34]. Mantel's test was run on geographic and genetic dis tances using GenAlEx6.5 software [54].

Data Analysis
The analysis of molecular variance (AMOVA) was used to partition the genetic vari ation into inter-and intra-gene pool diversities using Arlequin V3.5 software [55]. For th

GBS and SNP Calling
The GBS was performed at the University of Minnesota Genomic Centre (UMGC), USA. Genomic DNA was digested using the restriction enzymes PstI + BtgI (New England BioLabs, Inc., Ipswich, MA, USA), and barcoded adapters were ligated to each DNA sample using T4 ligase (New England BioLabs, Inc.). Dual indexed SBG libraries for R. eppaposum were pooled and loaded across 4 lanes of a 150 bp single read sequencing run on a NextSeq 550 (Illumina, San Deigo, CA, USA) instrument with a coverage of 10x generating~40 M raw reads.
In-house-developed Gopher-pipelines by the University of Minnesota Informatics Institute in collaboration with the University of Minnesota Genomics Center and the Research Informatics Solutions (RIS) group at the University of Minnesota Supercomputing Institute were used for sequence alignment and variant calling (Supplementary File). Per-base and per-read quality score statistics were calculated for each fastq file through FastQC [50]. Mean quality scores for all libraries were ≥Q30. Sequence adapters were removed, and low-quality bases were trimmed using Trimmomatic [51]. Each sample was aligned to a reference genome using BWA, and the bam file was sorted and indexed. A representative sample (RH_KB_01) was chosen for paired-end, whole-genome sequencing at a coverage of 50x on an Illumina HiSeq 2500 [52]. Approximately 180-190 M raw reads with a GC content of 35% were assembled with Abyss 2.0.2, applying the "abyss-pe" command using a kmer size of 64 [53]. This was used as a reference to align the 99 GBS libraries. The Bayesian genetic variant detector FreeBayes v1.1.0 was used for joint SNP calling on all the samples [34]. Mantel's test was run on geographic and genetic distances using GenAlEx6.5 software [54].

Data Analysis
The analysis of molecular variance (AMOVA) was used to partition the genetic variation into inter-and intra-gene pool diversities using Arlequin V3.5 software [55]. For the analysis of population structure, a model-based Bayesian cluster analysis was performed using STRUCTURE version 2.3.4 [56]. The structure analysis was run 10 times for each K value (K = 1 to 7) using a burn-in period of 1000 steps and 10,000 MC steps and an admixture model. All parameters were set to default values recommended by the manufacturer. The probability of best fit into each number of assumed clusters (K) was estimated by an ad hoc statistic ∆K based on the rate of change in the log probability of data between consecutive K values [57]. A cladogram based on the similarity between the SNPs was constructed employing the UPGMA algorithm [13].

Conclusions
With the application of GBS, it has been possible to generate 10,568 SNP markers for diversity analysis of Rhanterium in Kuwait. The variation residing among these populations was found to be 1.5%. Further analysis grouped the assayed samples into three genetic clusters with admixed populations. These results can enhance genotype selection for increased genetic variation and for desert rehabilitation. The findings in this study can also aid in the application of GBS in the characterization of non-model plants with complex genomes.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/plants11111435/s1. Figure S1: Phred Quality of 99 samples of Rhanterium eppaposum sequenced on an Illumina platform; Table S1: SNP/Indel counts per scaffolds; Table S2: Position of SNPs and Indels on each scaffold and Table S3: GPS coordinates and soil types of Rhanterium populations growing in Kuwait.  Data Availability Statement: All data generated or analyzed during this study are included in this published article. The raw DNA sequencing data are available on the NCBI portal under the accession number PRJNA669035 (SAMN16430163 to SAMN16430261).