A Genome-Wide Association Study Identifies Quantitative Trait Loci Affecting Hematological Traits in Camelus bactrianus

Simple Summary Bactrian camels can adapt to harsh natural environments. This unique tolerance of camels is tightly linked to their hematological traits, which are related to their immune, metabolic, and disease status. Therefore, mapping genomic regions that affect blood cell traits can help identify genomic characteristics that can be used as biomarkers of immune, metabolic, and disease states. This knowledge will further our understanding of the camel’s tolerance mechanisms. Abstract Bactrian camels (Camelus bactrianus) are one of the few large livestock species that can survive in the Gobi Desert. Animal immunity and disease resistance are related to hematological traits, which are also associated with tolerance observed in Bactrian camels. However, no genome-wide association studies have examined the genetic mechanism of the immune capability of Bactrian camels. In the present study, we used genotyping-by-sequencing data generated from 366 Bactrian camel accessions to perform a genome-wide association study for 17 hematological traits. Of the 256,616 single-nucleotide polymorphisms (SNPs) obtained, 1,635 trait–SNP associations were among the top quantitative trait locus candidates. Lastly, 664 candidate genes associated with 13 blood traits were identified. The most significant were ZNF772, MTX2, ESRRG, MEI4, IL11, FRMPD4, GABPA, NTF4, CRYBG3, ENPP5, COL16A1, and CD207. The results of our genome-wide association study provide a list of significant SNPs and candidate genes, which offer valuable information for further dissection of the molecular mechanisms that regulate the camel’s hematological traits to ultimately reveal their tolerance mechanisms.


Introduction
Animal metabolism and immune status can be directly reflected by their hematological traits [1]. Therefore, screening for genomic regions that are associated with blood cell characteristics can identify genomic features that have a potential use as biomarkers to detect metabolic, immune, and disease status [2].
Camel species include the two-humped Bactrian camel (Camelus bactrianus) and the one-humped dromedary (Camelus dromedarius) [3,4]. China and Mongolia are the main countries that harbor Bactrian camels, and the absence of dromedaries in these locations [5,6] prevents hybridization between the two species. In addition, extant wild camels (Camelus ferus) are only distributed in these two countries [7,8]. Camelus bactrianus is one of the few large livestock animals that can survive in the Gobi Desert because

Ethics Statement
The procedures and protocols for animal use were carried out according to the national codes of practice for the care and handling of farm animals and were approved by the animal care committee of the Camel Protection Association of Inner Mongolia.

Animals and Sample Collection
This study included 366 domestic Bactrian camels. The sample population represented the seven main domestic Camelus bactrianus breeds in East Asia, including four Chinese domestic breeds (the Alxa Bactrian camel, the Junggar Bactrian camel, the Sunite Bactrian camel, and the Tarim Bactrian camel) and three Mongolian domestic breeds (the Galbiin Gobiin Ulaan Bactrian camel, the Haniin Hetsiin Huren Bactrian camel, and the Tokhom Tongalag Bactrian camel (see Supplementary Table S1)). Whole blood was collected from the jugular vein into vacutainers containing ethylenediaminetetraacetic acid (EDTA, for complete blood count tests, CBC) or no anticoagulant (for clinical chemistry panels, CCPs). All blood samples were analyzed within 24 h of sample collection. For CBC tests, the blood samples with the EDTA anticoagulant were analyzed using an automated hematology analyzer (ADVIA 2120, Siemens Healthcare Diagnostics Inc., Tarrytown, NJ, USA) with the manufacturer's reagents and camel-specific settings. Analysis was carried out for total white blood cells (WBC), total red blood cells (RBC), average red blood cell volume (MCV), hemoglobin (HGB), mean hemoglobin content (MCH), hematocrit (HCT), total number of platelets (PLT), average platelet volume (MPV), mean hemoglobin concentration (MCHC), red blood cell distribution width-standard deviation (RDW_SD), red blood cell distribution width-coefficient of variation (RDW_CV), total platelet volume (PCT), and platelet distribution width (PDW). For the CCPs, the blood without the anticoagulant was centrifuged at 3000× g at 4 • C for 15 min. An automated chemistry analyzer (Modular P Chemistry Analyzer; Roche Diagnostics, Indianapolis, IN, USA) was used to analyze the separated serum using the manufacturer's reagents. The analyzer calculated the levels of alanine aminotransferase (ALT), aspartate aminotransferase (AST), cholesterol (CHO), and glutamyl transpeptidase (GGT) ( Table 1).

Genotyping
The sequence data were generated using genotyping-by-sequencing (GBS) [20,21]. The DNA was first digested using the methylation-sensitive restriction enzyme ApeKI (R0643L, New England Biolabs, Ipswich, MA, USA), and then the restriction fragments were ligated to a unique barcode adapter and universal adapter. The QIAquick PCR Purification Kit (28104, QIAGEN, Valencia, CA, USA) was used to purify an equal volume of pooled ligation products for PCR amplification. The amplicons were re-purified after amplification to generate clean PCR products. The generated library represented all 366 individual genotypes and was sequenced in each of the four lanes of an Illumina HiSequation PE150 instrument (Illumina, San Diego, CA, USA) to generate single-ended 100 bp reads.
The raw sequencing data were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) in the GenBank database under accession number PRJNA565837.

Estimating the Linkage Disequilibrium (LD) and Deducing the Population Structure
A phylogenetic tree and principal component analysis (PCA) were used to investigate the population structure. The TreeBeST software (http://treesoft.sourceforge.net/treebest.shtml) was used to calculate the distance matrix, which was the basis for constructing the phylogenetic tree using the neighbor-joining method. The distance between two bodies i and j was calculated using the following formula.
where L is the length of the SNP region, and the allele at position 1 is A/C. If the genotypes of the two individuals are AA and AA, then d ij = 0. If they are AA and AC or AC and AC, then d ij = 0.5. If they are AA and CC, then d ij = 1. The GCTA software (http://cnsgenomics.com/software/gcta/pca.html) was used to calculate the feature vectors and eigenvalues. The R software was used to draw the PCA distribution map. The SNP at position k in individual i is represented by d ik . If individual i is homozygous with the reference allele, then d ik = 0. If it is heterozygous, then d ik = 1. If individual i is homozygous with the non-reference allele, then d ik = 2. M is a matrix of n × S containing the standard genotype.
The SNP data of the 366 camels were analyzed using the Frappe software to establish the population structure. First, we created an input file for software tool, PLINK, (http://pngu.mgh.harvard. edu/~purcell/plink/), a Ped file, and then used the Frappe software to construct the population genetic structure and group lineage information.

Statistical Analyses
The Bonferroni calibrated multiple test was used to determine the significance threshold, with a genomic significant level threshold of 0.05 per effective SNP locus and a chromosomal significant horizontal threshold of 1 per effective SNP locus. For the purpose of this study, the corresponding thresholds were set as 6.71 × 10 −6 (0.05/256616).

Genome-Wide Association Study
GWAS analysis was performed using the GEMMA software (http://www.xzlab.org/software.html), and the population structure and individual kinship were corrected using the mixed linear model (MLM).
where y is the phenotypic trait, X is the indicator matrix of the fixed effect, α is the estimated parameter of the fixed effect, Z is the indicator matrix of the SNP, β is the effect of the SNP, W is the indicator matrix of the random effect, µ is the predicted random individual, and e is a random residual and obeys e~(0, δ e 2 ).

Quality Assessment and Statistics of the Sequencing Data
Sequencing produced 548.54 Gb of data, with an average of 1.4987 Gb per sample, of which 548.52 Gb was high-quality clean data, with an average of 1.4986 Gb per sample (Supplementary  Table S2).

Mapping to the Reference Genome
The sample rate was used to visualize the similarity between the sample sequencing data and the reference genome. The homology between the sequencing data and the reference sequence was evaluated by the coverage depth and coverage. The genome size was 2,009,177,929 bp, the average population ratio of the sample was 97.22-98.30%, the average genome sequencing depth ranged from 7.18× to 18.86×, and the 1× coverage rate (at least one base coverage) was above 6.46% (Supplementary  Table S3).

SNP Calling and Annotation
At the genomic level, DNA sequence variations can be caused by an SNP, including single-base conversions and transversions. A total of 2,433,732 SNPs were identified using the SAMtools software, and the obtained SNPs were filtered (dp2, Miss0.5, Maf0.01). Lastly, 256,616 SNPs were obtained for subsequent analysis. The results of the ANNOVAR software analysis for SNP annotations are shown  Table 2. Among the SNPs, 1,193 were annotated as exonic and 70,370 as intronic. The intergenic area was annotated with the most SNPs (182,276).

GWAS of Phenotypic Traits
We tested the association between each trait and the 256,616 SNPs. Quantile-quantile plot analysis showed that the model could reasonably control the false positive associations that potential population structures might cause. For the selected p-value of 10 −6 , the analysis identified 1,635 trait-SNP associations as the top quantitative trait loci (QTLs) ( Supplementary Table S4). However, Animals 2020, 10, 96 8 of 13 because of the multiple tests carried out, some of the reported associations might be false positives. The inter-chromosomal association patterns in the Manhattan plot show that trait-related SNPs are distributed throughout the genome (Figure 4, Supplementary Figures S1-S12).
To further evaluate the SNPs associated with the chosen traits, we aligned the tag sequence in which the SNP is located with all known sequences to infer its location and identify potential candidate genes that carry true etiological polymorphisms. The upstream and downstream 20 kb of 1635 SNP sites, which are known to be candidate genes in the camel genome, were searched randomly. Lastly, we identified 664 candidate genes associated with 13 blood traits (Supplementary  Table S5, p-value < 10 −6 ). WBC traits and RBC traits (MCV RBC, HCT, HGB, MCH, RDW_SD, RDW_CV), platelet traits (MPV, PCT, PDW), GGT, and CHO were identified as associated with 5, 226, 41, 50, and 342 genes, respectively. Table 3 lists the most significant candidate genes for the 13 blood traits according to gene function. None of the SNP markers showed associations with PLT, MCHC, AST, or ALT.  Figure 4 is the Manhattan map, which is the genetic marker effect value, i.e., the F-tested whole-genome p-value is sorted according to the physical position on the chromosome. The abscissa is the genomic coordinates, and the ordinate is −log10 (p-value). The smaller the p-value, the stronger the correlation, and the larger the ordinate. The horizontal dashed line in the Manhattan chart indicates the level of significance. When −log10 (p-value) > 6.26, the SNP is considered to be significantly associated with the trait.  Figure 4 is the Manhattan map, which is the genetic marker effect value, i.e., the F-tested whole-genome p-value is sorted according to the physical position on the chromosome. The abscissa is the genomic coordinates, and the ordinate is −log10 (p-value). The smaller the p-value, the stronger the correlation, and the larger the ordinate. The horizontal dashed line in the Manhattan chart indicates the level of significance. When −log10 (p-value) > 6.26, the SNP is considered to be significantly associated with the trait.
To further evaluate the SNPs associated with the chosen traits, we aligned the tag sequence in which the SNP is located with all known sequences to infer its location and identify potential candidate genes that carry true etiological polymorphisms. The upstream and downstream 20 kb of 1635 SNP sites, which are known to be candidate genes in the camel genome, were searched randomly. Lastly, we identified 664 candidate genes associated with 13 blood traits (Supplementary Table S5, p-value < 10 −6 ). WBC traits and RBC traits (MCV RBC, HCT, HGB, MCH, RDW_SD, RDW_CV), platelet traits (MPV, PCT, PDW), GGT, and CHO were identified as associated with 5, 226, 41, 50, and 342 genes, respectively. Table 3 lists the most significant candidate genes for the 13 blood traits according to gene function. None of the SNP markers showed associations with PLT, MCHC, AST, or ALT.
Animals 2020, 10, 96 9 of 13 Table 3. Potential candidate genes identified by the genome-wide association study (GWAS) and based on genome annotations for the wild Bactrian camel.

Discussion
In this study, we performed a GWAS for camel hematological traits using the genotypes of 366 domestic Bactrian camels from China and Mongolia. To the best of our knowledge, this is the first GWAS for camel hematological traits using GBS.
Population stratification affects the proportion of false positives in genome-wide association analysis. Therefore, corrections are needed to ensure the reliability of the statistical analysis [25][26][27]. Various strategies have been proposed to solve these GWAS-related problems [28,29], and we used three analyses to accurately determine the relationships among the analyzed genotypes. In the phylogenetic tree, the 366 genotypes are divided into seven groups, which is consistent with the number of breeds of the species in the study [7,30]. However, it is worth noting that a small number of Tarim Bactrian camel samples are mixed in with the Junggar Bactrian camel group, possibly because they all live in the Xinjiang region of China and hybridization events are inevitable. The PCA results distinguish the Chinese Bactrian camels from the Mongolian Bactrian camels very well. This clear distinction between breeds from different regions is likely the result of artificial constraints. Herders in both countries strictly restrict their camels' active territory, so the animals are prevented from crossing national borders. The results of the structural analysis show that the Chinese Bactrian camels have more ancestral groups than the Mongolian Bactrian camels, which is likely to be related to the history of the Silk Road by which camels from other regions came to China [31]. Lastly, the quantile-quantile (Q-Q) plots show no evidence of the group stratification phenomenon, which demonstrates that the GWAS results based on the mixed linear model are relatively reliable.
The identification of markers that are tightly linked to phenotypic characteristics is the ultimate goal of genetic mapping [32,33]. The use of GBS generated a large number of genome-wide markers, which facilitated the GWAS in Bactrian camels. We identified 664 candidate genes associated with 13 blood traits. The most significant candidate genes according to gene function for the 13 blood traits are ZNF772, MTX2, ESRRG, MEI4, IL11, FRMPD4, GABPA, NTF4, CRYBG3, ENPP5, COL16A1, and CD207 (for their full names, see Table 3).
WBCs are involved in the phagocytosis of bacteria and disease prevention. RBCs are essential for oxygen transport within the organism and the removal of the byproducts of respiration. The platelets play an important role in blood coagulation, wound healing, inflammatory response, and organ transplant rejection [34]. The white blood cell has emerged as a marker of inflammation that is widely available in clinical practice [35]. Studies have shown that zinc finger proteins are important regulators of white blood cell development, and the candidate gene ZNF772 is likely to be involved in the regulation of early white blood cell differentiation [36]. Red blood cells are the most important components of oxygen transport through the blood in vertebrates, and they also have immune functions. Mtx2 is a protein in the mitochondrial sorting and assembly machinery (SAM) that has been implicated in TNF-induced apoptosis [37]. Gene ESRRG has been shown to be a negative regulator of anaerobic glycolysis [38] and might be related to energy metabolism in red blood cells, which affects the production of hemoglobin. The interaction of platelets with immune cells (neutrophils, monocytes, lymphocytes, and dendritic cells) can effectively enhance the function of immune cells. The candidate genes CRYBG3 and ENPP5, as important protein structure regulators, are likely to be involved in the regulation of platelet volume and size.
Although we identified 12 candidate genes, their mechanism of action remains unclear and requires further study and confirmation. These candidate genes lay the foundation for further exploration of the characteristics of blood traits in Bactrian camels to reveal their tolerance mechanisms.

Conclusions
In conclusion, because blood traits are effective indicators of the camel's immune and metabolic status, we attempted to identify genomic regions that control hematological traits in Camelus bactrianus. As a result, we identified 12 candidate genes and associated genetic markers, which provide a reference for further research on the camel's special immune metabolism and disease resistance. Therefore, this study contributes to the understanding of the molecular regulation of blood traits in domestic animals.  Table S4: Genome-wide association study (GWAS)-related gene function annotation. Table S5