Genome-Wide Diversity, Population Structure and Demographic History of Dromedaries in the Central Desert of Iran

The development of camel husbandry for good production in a desert climate is very important, thus we need to understand the genetic basis of camels and give attention to genomic analysis. We assessed genome-wide diversity, linkage disequilibrium (LD), effective population size (Ne) and relatedness in 96 dromedaries originating from five different regions of the central desert of Iran using genotyping-by-sequencing (GBS). A total of 14,522 Single Nucleotide Polymorphisms (SNPs) with an average minor allele frequency (MAF) of 0.19 passed quality control and filtering steps. The average observed heterozygosity in the population was estimated at 0.25 ± 0.03. The mean of LD at distances shorter than 40 kb was low (r2 = 0.089 ± 0.234). The camels sampled from the central desert of Iran exhibited higher relatedness than Sudanese and lower than Arabian Peninsula dromedaries. Recent Ne of Iran’s camels was estimated to be 89. Predicted Tajima’s D (1.28) suggested a bottleneck or balancing selection in dromedary camels in the central desert of Iran. A general decrease in effective and census population size poses a threat for Iran’s dromedaries. This report is the first SNP calling report on nearly the chromosome level and a first step towards understanding genomic diversity, population structure and demography in Iranian dromedaries.


Introduction
Camels have a unique morphology and physiology as they are capable of providing vital products in a desert climate, even under harsh conditions. Camelini (one-and two-humped camels) and Lamini (New World camels) are two tribes of the Camelidae family [1]. Camelids influence human endurance and thriving in the peripheral agro-natural zones of (semi-) deserts [2]. Today, camels have gained significance as sustainable livestock species with certain important properties (e.g., immunogenic and milk composition) [3]. Out of around 35 million camel heads of the world (FAO, 2019), the more significant part (95%) are dromedary [4]. There are 138,659 camels in Iran (FAO, 2019), and most of them are dromedaries producing 0.5% of total red meat in Iran [5]. Overuse of land and water resources has expanded the danger of desertification [6]. Over 84% of Iran is arid or semi-arid [7]. Thus, camel husbandry in Iran has a high potential considering changing climatic conditions as well as religious and cultural values [6]. For sustainable development of camel breeding to facilitate protein production in desert climate, we need to give special attention to the conservation and genetic improvement of the species. The main limitations that camel breeding is facing in Iran are the lack of phenotypic records and pedigrees, small herd size and missing connectedness, and genetic evaluations. Previous investigations about genetic diversity and population structure in dromedaries were confined to microsatellite markers, mitochondrial DNA (mtDNA) [8], geographical regions, e.g., Kenya [9], and Pakistan [10], and candidate gene sequencing [11]. Next-generation sequencing platforms have prepared suitable methods for population genetic analyses in local or regional populations/ breeds to investigate effective population size, breed formation, demographic history, selection and genetic drift (interpreted by linkage disequilibrium (LD)) at the whole-genome level [12]. Whole-genome sequencing revealed past demographic events and the evolutionary history of Old World camels [13]. Recently, dromedaries' genome sequences have been assembled into chromosomes [14,15] and the whole genomes of two Iranian dromedary camels were analyzed. This genomic information could prompt genome-wide association studies (GWAS) and genomic selection in Iranian camel breeding [16], similar to populations from the Arabian Peninsula and Sudan, which showed signatures of selection in regions associated with adaptation to arid environment, dairy traits, energy homeostasis, and chondrogenesis [11]. To understand the genetic makeup and relatedness of Iranian dromedaries and to create the basis for future breeding programs and genomic selection, we assessed the genomic diversity, linkage disequilibrium and population structure of dromedary camels in different regions of the central desert of Iran.

Genotyping-by-Sequencing (GBS) Data
The blood samples were gathered with EDTA tubes, and genomic DNA was extracted using the modified salt-out method. In this method, optimization includes utilization of separate buffers instead of buffy coat isolation, chloroform for DNA phase isolation and extraction of purified DNA, and sodium acetate for more concentrated DNA [17]. The extracted DNA was evaluated with a Nanodrop Spectrophotometer and 96 samples were genotyped-by-sequencing using two restriction enzymes, EcoR1 and HinF1, and paired-end (150 bp) sequencing on the Illumina HiSeq 2000 platform by Persian Bayangene Research and Training Center (Shiraz, Iran). The adapters were trimmed with bcl2fastq, and the read pairs with low quality were omitted (base Qphred ≤20) by using fastQC.

SNP Calling
The sequence reads were mapped to the dromedary reference genome assembly on chromosome level (GCA_000803125.3; [14]) by using the BWA-MEM algorithm of the Burrows-Wheeler Aligner (BWA; [18]). The PCR duplicates were detected by utilizing Picard tools and disregarded in downstream analyses both by GATK [19] and SAMtools [20]. SNPs were called across the GBS data using GATK.

Population Structure, Linkage Disequilibrium and Genome-Wide Diversity
Quality control (QC) steps, linkage disequilibrium (LD), genome-wide diversity (observed and expected heterozygosity), as well as admixture analyses were performed using TASSEL V5.0 [21]. Variants with MAF < 0.05 and Call Rate < 0.95 were removed. Of the 41,897 SNPs, 256 markers were monomorphic, and 27,375 markers were deleted because of low minor allele frequency (MAF < 0.05). Nei's distances and pairwise F ST among populations were calculated by using the StAMPP package in R [22]. To investigate population structure, R packages including vcfR, poppr, ape, and RColorBrewer were used for K-means clustering and discriminant analysis of principal components (DAPC). The median joining network was created in SplitsTree by using the genetic distance matrix. Relatedness among genome-wide SNPs based on the unadjusted Ajk statistic was evaluated by using TASSEL. The LD between two SNPs was evaluated using the statistics r 2 (Equation (1)) [23]: where P AB , P ab , P Ab , and P aB are the haplotype frequencies and P A , P a , P B , and P b are the allele frequencies, respectively. The LD decay plot was created in popLDecay [24].

Effective Population Size and Tajima's D
Effective population size (Ne) was predicted using the pairwise LD [3]. The SNeP tool was used for predicting effective population size [25] based on the relatedness among r 2 , Ne, and c (Equation (1) [26]). The corrected r 2 was estimated following Equation (3) [27]: where c is the recombination rate, n is the number of individuals sampled; β = 2 when the gametic phase is known and β = 1 if instead the phase is not known. Ne during t generations ago was predicted (Equation (4)): where N T is the effective size t generations ago, computed as t = (2ƒ(c t )) − 1 [28]; c t is the recombination rate defined for a specific physical distance between markers and optimally adjusted with the mapping functions mentioned above; r 2 adj is the LD value adjusted for sample size; and α = {1, 2, 2.2} is a correction for the mutation rates [29].
where k ij is the difference between sequence i and j.

Genomic Diversity and Linkage Disequilibrium
A total of 14,522 SNPs resulted after filtering and were used for final analysis. The largest gap between SNPs (21,088.280 kb) was located on chromosome 11. We note that this is the first SNPs-based report at chromosome-level (Table 1). Before quality control (QC), the average MAF of all SNPs was 0.08, and after filtering, it increased to 0.19 ( Figure 2). Average observed heterozygosity in the Iranian central desert dromedaries was estimated at 0.25 ± 0.03. Linkage disequilibrium (LD) was estimated for each pairwise combination of SNPs (a total of 242,265 SNP pairs). Average LD (r 2 = 0.089 ± 0.234) was observed at distances shorter than 40 kb, which decreased to 0.019 ± 0.034 in 40-60 kb. The LD decay trend is shown in Figure 3. Ninety-eight percent of total pairwise combinations with r 2 > 0.20 related to 0-40 kb intervals (Table 2).

Cluster Analysis and Genomic Relationship
Cluster analysis of the 96 dromedary camels for five regions in the central desert of Iran has been shown in Figure 4. Because of low genetic distance, there were no determined clusters and high admixture among regions ( Figure 5). Only 1.6% and 1.4% of the genetic variance was illustrated with PC1 and PC2, respectively, which demonstrated that individuals are relatively homogeneous. Nonetheless, five samples that originate from outside of Iran (Pakistan: GBS046, GBS047, GBS042, GBS057, and GBS058) diverged from the rest (Figures 4 and 5). The pairwise F ST statistic among five populations of Iran's central desert camels was very low (0.002-0.011) ( Table 3).   Because of a limited number of bull camels and half-sib calves within herds, there were node clusters visible, although the median of relatedness among camels was very low (A jk = −0.011 ± 0.113).
There is no obvious population structure visible in the Iranian central desert dromedaries, although, in the zoom, it looks as if there was a very low separation into two groups, which, however, do not correspond to any geographical origins nor to any functional separation of the investigated dromedaries in terms of usage for milk or meat production ( Figure 6).

Demographic History (Effective Population Size)
Recent Ne (up to 10 generations ago) was estimated as 89 to 487, respectively ( Table 4). The trend of Ne shows a great decrease of up to 90% until the last glacial period (LGP; 100,000-20,000 years ago and another bottleneck around 2000-6000 years ago (Figure 7)).

Discussion
Guo et al. identified 2,433,732 variants in 366 camels, including the seven domestic Camelus bactrianus breeds in East Asia by using the GBS technique [31]. Bahbahani et al. reported 39,843 SNPs after QC in Sudanese dromedary camels [11]. Al Abri et al. reported~80,000 SNPs by using WGS of 9 camels and GBS of 244 dromedary camels [32]. The different numbers of SNP S across different studies can be due to different techniques or the heterozygosity levels in the other camel groups.
Heterozygosity of the Iranian central desert dromedaries was predicted as low (0.25 ± 0.30). Arabian Peninsula and Sudanese dromedaries' H O where 0.560 ± 0.003 and 0.347 ± 0.003, respectively [11]. The camels sampled from the central desert of Iran have higher relationships than dromedaries from Sudan (median A jk = − 0.007 ± 0.018) and lower ones than individuals from the Arabian Peninsula (median A jk = 0.347 ± 0.093) [11]. Predicted Tajima's D (1.28) suggested a bottleneck or balancing selection in dromedary camels in the central desert of Iran. The sudden decrease of camels in Iran could be the cause of a positive Tajima's D. The number of camels in Iran declined 46% during 1960-1970, however the heterozygosity has been still maintained, most probably by balancing selection resulting in positive Tajima's D. Balancing selection occurs when the heterozygotes have a higher fitness than the homozygote [33] and conserve genetic polymorphism.
Until recently, LD patterns of many livestock species have been reported. For example, the amounts of LD at 0-30 kb distances in Angus, Charolais, and C beef cattle were 0.29, 0.22, and 0.21, respectively [34]. The mean of r 2 (LD) was 0.155, 0.156, and 0.128 across all chromosomes for three sheep breeds [35]. Corbin et al. [36] reported r 2 = 0.6 in 5 kb and up to 20 kb in Thoroughbred horses. The average r 2 > 0.2 at 1.0-1.5 Mb intervals was predicted in two pig populations [37]. The mean of LD in Iranian central desert dromedary camels was predicted as very low compared to livestock animals because of high generation interval, lack of artificial selection, and natural reproduction.
Principal component analysis suggested that Sudanese camel populations are almost homogeneous [11]. On the contrary, PCA results distinguished the Chinese Bactrian camels from the Mongolian Bactrian camels very well [31]. The results of PCA, Admixture, and SplitTree tests show that dromedary camels in the central desert of Iran are homogeneous and have a separate genetic background from outside of Iran.
The trend of Ne shows a bottleneck around 2000-6000 years ago in Iranian central desert dromedary camels. This is consistent with a decline in Ne in Old World camels around 4000-5000 years ago and might be related to the domestication scenarios of dromedary and Bactrian camels [2].

Conclusions
Commercial SNP arrays have been developed for many livestock animals but no SNPchip is available for dromedaries. GBS based on whole-genome sequencing has solved the problem of ascertainment bias [38] as the efficiency of GBS markers is comparable to SNP arrays [39]. GBS could prove highly rewarding for predictions of the relatedness and genetic diversity of dromedaries. This research, the first report of GBS-SNPs assembled by near-chromosomes, suggested dromedary camels in the central desert of Iran are homogeneous and different from outside camels. The decrease of effective population size is a threat for Iran's dromedary camels.