Genome-Wide Signatures of Selection Detection in Three South China Indigenous Pigs

South China indigenous pigs are famous for their superior meat quality and crude feed tolerance. Saba and Baoshan pigs without saddleback were located in the high-altitude area of Yunnan Province, while Tunchang and Ding’an pigs with saddleback were located in the low-altitude area of Hainan Province. Although these pigs are different in appearance, the underlying genetic differences have not been investigated. In this study, based on the single-nucleotide polymorphism (SNP) genotypes of 124 samples, both the cross-population extended haplotype homozygosity (XP-EHH) and the fixation index (FST) statistic were used to identify potential signatures of selection in these pig breeds. We found nine potential signatures of selection detected simultaneously by two methods, annotated 22 genes in Hainan pigs, when Baoshan pigs were used as the reference group. In addition, eleven potential signatures of selection detected simultaneously by two methods, annotated 24 genes in Hainan pigs compared with Saba pigs. These candidate genes were most enriched in GO: 0048015~phosphatidylinositol-mediated signaling and ssc00604: Glycosphingolipid biosynthesis—ganglio series. These selection signatures were likely to overlap with quantitative trait loci associated with meat quality traits. Furthermore, one potential selection signature, which was associated with different coat color, was detected in Hainan pigs. These results contribute to a better understanding of the underlying genetic architecture of South China indigenous pigs.


Introduction
Pigs have been domesticated for 9000 years [1]. During their long history of evolution and breeding, pigs have been selected naturally or artificially for specific traits, such as adaption to high temperature and humidity in South China, coat color, body length, meat quality, and so forth. Many genetic footprints, i.e., signatures of selection, remain in the genome [2,3] and have been of interest for evolutionary biologists and breeders. The study of signatures of selection may provide some information about selection mechanisms and benefit future pig breeding.
The regions of the genome where the signatures of selection can be detected usually show long-range linkage disequilibrium (LD) accompanied by a high population frequency [2][3][4] and these regions can be detected based on genomic data with the population statistical method. Currently,

Single-Nucleotide Polymorphism (SNP) Genotyping and Data Quality Control
Genomic DNA samples from all three breeds of Chinese indigenous pig were extracted from ear tissue using the E.Z.N.A. ® Tissue DNA Kit (D3396-02, Omega Bio-tek, Norcross, GA, USA). The Illumina PorcineSNP60 BeadChip [27], which contains 61,565 SNPs, was used for the SNP genotyping of Baoshan pigs and Saba pigs, while the GGP Porcine Chip (https://genomics.neogen.com/en/ggp-porcine), which contains 68,516 SNPs, was used for the SNP genotyping of Ding'an pigs and Tunchang pigs. The SNP data of three South China pig breeds are available in the figshare database (https: //doi.org/10.6084/m9.figshare.7588235.v1) The quality control criteria for genotypic data were as follows: (1) Retaining the mutual SNPs between two SNP chips (the alleles of each SNP on two chips were unified according their SNP chip annotation file while merging the genotype from different chips); (2) removing SNP loci with a call rate of less than 0.90 and unknown position or located on sex chromosomes; (3) filtering out individuals with call rates less than 0.90; and (4) removing SNP loci with minor allele frequency (MAF) less than 0.05. PLINK software [28] was used to perform data quality control. Following quality control, fastPHASE [29] was used to infer haplotypes for the haplotype-based method (XP-EHH) with the parameters -KL10, -KU30, and -Ki5.

Principal Component Analysis
To investigate the pattern of genetic differentiation among breeds, principal component analysis (PCA) was conducted with GCTA software (Version 1.91.1) [30]. Then, the figure of PCA was plotted using R base package with plot function [31]. The SNPs used in this analysis were filtered for pairwise LD (r 2 < 0.5) with PLINK software [28] using the command indep-pairwise 50 5 0.5.

Phylogenetic Tree
In order to better understand the relationship between the three breeds investigated in this study, a phylogenetic tree based on the pairwise identical by state (IBS) was constructed. The average proportion of alleles shared among all individuals (denoted as Dst) was calculated as follows: Dst = (IBS2 + 0.5 × IBS1)/N, where IBS1 and IBS2 are the number of loci which share either one or two alleles IBS of two individuals, respectively, and N is the total number of SNPs. Then, 1-Dst is the genetic distance between all pairwise combinations of individuals, as in Ai et al. [19]. The Dst was calculated by PLINK software [28]. A neighbor-joining (N-J) tree [32] based on genetic distance was constructed by MEGA software (Version 7.0.14) [33].

Identification of Signatures of Selection
Both the XP-EHH and the F ST were used for the detection of signatures of selection in this study. The XP-EHH was needed to define test groups and reference groups. In this study, XP-EHH [34] was used to calculate the XP-EHH scores. A chromosome segment of 1 Mb was directly converted as 1 centiMorgan (cM) in Ma et al. [23]. The Genepop R package [35] was used to calculate F ST statistics. The F ST statistic indicated the population differentiation; however, it was unable to indicate which population experienced selection.
In previous research, XP-EHH scores were reported to approximately follow a normal distribution [6]. In this study, the unstandardized XP-EHH scores were transformed into a normal distribution. Then, the p-values of standardized XP-EHH scores which were lower than 0.01 were treated as significant SNPs detected by XP-EHH, as in Li et al. [13]. The distribution of F ST approximately followed a normal distribution after the normalization of the square root of F ST , as in Gianola et al. [12]. In this study, the p-values of standardized F ST below 0.01 were treated as significant SNPs. F ST may produce high rates of false positives compared with XP-EHH, as suggested by Ma et al. [36]. Therefore, the significant SNPs detected either both methods or at least one method, which were treated as potential signatures of selection in this study. In addition, this study focused on the significant SNPs detected simultaneously by two methods.

Genome Annotation and Quantitative Trait Loci (QTL) Overlapping with Potential Signatures of Selection
The potential selection regions were defined by extending 200 kb both upstream and downstream of the potential signatures of selection in Liu et al. [37]. Genome annotation was based on the Sus scrofa 10.2 (https://www.animalgenome.org/blast/). Genes harbored in these potential selection regions were treated as candidate genes and RNAs and unconfirmed genes were filtered out. Additionally, the Animal Quantitative Trait Loci (QTL) Database [38] was used to annotate potential traits related to the potential selection regions based on QTL physical position intervals downloaded from the Animal QTL database [38].

Gene Ontology (GO) Terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Enrichment Analysis
To further explore the function of these candidate genes, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway [39] and Gene Ontology (GO) [40] were used for enrichment analyses through the Database for Annotation, Visualization, and Integrated Discovery (DAVID) Version 6.8 (https://david.ncifcrf.gov/) [41,42]. The GO terms and KEGG pathways with p-values > 0.05 were filtered out.

Genotypes and Population Structure
A total of 34,815 SNPs located on autosomes were common in the two SNP chips. In quality control, 1833 and 9444 SNPs were filtered out for the SNP call rate and MAF, respectively. The average individual call rate was 0.9781 and no individual was removed. After quality control, 124 individuals and 23,538 SNPs were retained for further study.
A subset of 23,538 SNPs (18,994 LD-pruned SNPs) were retained to conduct PCA. PCA1 and PCA2 explained 13.52% and 5.05% of the total variation, respectively ( Figure 1). Individuals of two Hainan pig subpopulations were clustered together, as also shown by the N-J tree ( Figure 2). Baoshan and Saba appeared as two separate groups. The average genetic distance between Baoshan pigs (0.2775 ± 0.0042) was the highest among the four pig populations (Saba: 0.2213 ± 0.0056; Ding'an: 0.2209 ± 0.0067; Tunchang: 0.2345 ± 0.0062). More details are shown in Table S1.

Identification of Signatures of Selection
According to the result of the PCA ( Figure 1) and the N-J tree (Figure 2), the two subpopulations of Hainan pigs (Ding'an pigs and Tunchang pigs) were treated as test groups, while the Baoshan and Saba populations were used as reference groups, respectively.
The distribution of unstandardized XP-EHH scores and standardized XP-EHH scores are shown in Figures 3b and 4b. The distribution of F ST statistics and standardized F ST statistics are shown in Figures 5b and 6b. In the Ding'an and Tunchang/Baoshan groups, 174 SNPs and 71 SNPs were treated as significant through XP-EHH in Hainan pigs and Baoshan pigs (Table S2). Meanwhile, 445 SNPs were treated as significant using F ST (Table S2). Similarly, in the Ding'an and Tunchang/Saba groups, 110 SNPs and 125 SNPs were treated as significant through XP-EHH in Hainan pigs and Baoshan pigs (Table S3). Meanwhile, 445 SNPs were treated as significant using F ST (Table S3).
On one hand, nine and eleven potential signatures of selection were detected simultaneously by two methods in Hainan pigs using Baoshan and Saba pigs as the reference groups, respectively. Moreover, compared to the reference group, three significant SNPs were detected simultaneously by two methods in both of the two comparisons (Ding'an and Tunchang/Baoshan, Ding'an and Tunchang/Saba), which were located on Sus scrofa chromosome 2 (SSC2) (rs81360002) and SSC14 (rs81223780 and rs80838751), respectively. Furthermore, rs81280567 was the only significant SNP detected simultaneously by two methods in both Baoshan and Saba pigs (Tables 1 and 2). On the other hand, a total of 680 potential signatures of selection were detected by at least one method in Hainan pigs and Baoshan pigs (Table S2). In addition, the mean LD degree between pairs of significant SNPs detected by at least one method was 0.1626; furthermore, a total of 35 pairs of significant SNPs detected by at least one method (a majority of SNPs located in SSC14) were in high LD (r 2 > 0.8) (Table S2 and Figure S1). A total of 668 potential signatures of selection were detected by at least one method in Hainan pigs and Saba pigs (Table S3). In addition, the mean LD degree between pairs of significant SNPs detected by at least one method was 0.1665; furthermore, a total of 31 pairs of significant SNPs detected by at least one method (a majority of SNPs located in SSC14) were in high LD (r 2 > 0.8) (Table S3 and Figure S2).

Identification of Signatures of Selection
According to the result of the PCA (Figure 1) and the N-J tree (Figure 2), the two subpopulations of Hainan pigs (Ding'an pigs and Tunchang pigs) were treated as test groups, while the Baoshan and Saba populations were used as reference groups, respectively.
The distribution of unstandardized XP-EHH scores and standardized XP-EHH scores are shown in Figure 3b and Figure 4b. The distribution of FST statistics and standardized FST statistics are shown in Figure 5b and Figure 6b. In the Ding'an and Tunchang/Baoshan groups, 174 SNPs and 71 SNPs were treated as significant through XP-EHH in Hainan pigs and Baoshan pigs (Table S2). Meanwhile, 445 SNPs were treated as significant using FST (Table S2). Similarly, in the Ding'an and Tunchang/Saba groups, 110 SNPs and 125 SNPs were treated as significant through XP-EHH in Hainan pigs and Baoshan pigs (Table S3). Meanwhile, 445 SNPs were treated as significant using FST (Table S3).
On one hand, nine and eleven potential signatures of selection were detected simultaneously by two methods in Hainan pigs using Baoshan and Saba pigs as the reference groups, respectively. Moreover, compared to the reference group, three significant SNPs were detected simultaneously by two methods in both of the two comparisons (Ding'an and Tunchang/Baoshan, Ding'an and Tunchang/Saba), which were located on Sus scrofa chromosome 2 (SSC2) (rs81360002) and SSC14 (rs81223780 and rs80838751), respectively. Furthermore, rs81280567 was the only significant SNP detected simultaneously by two methods in both Baoshan and Saba pigs ( Table 1 and Table 2). On the other hand, a total of 680 potential signatures of selection were detected by at least one method in Hainan pigs and Baoshan pigs (Table S2). In addition, the mean LD degree between pairs of significant SNPs detected by at least one method was 0.1626; furthermore, a total of 35 pairs of significant SNPs detected by at least one method (a majority of SNPs located in SSC14) were in high LD ( 2 > 0.8) (Table S2 and Figure S1). A total of 668 potential signatures of selection were detected by at least one method in Hainan pigs and Saba pigs (Table S3). In addition, the mean LD degree between pairs of significant SNPs detected by at least one method was 0.1665; furthermore, a total of 31 pairs of significant SNPs detected by at least one method (a majority of SNPs located in SSC14) were in high LD ( 2 > 0.8) (Table S3 and Figure S2).

Genome Annotation and QTL Overlapping with Potential Signatures of Selection
Within the potential selection regions detected simultaneously by two methods in the Ding'an and Tunchang groups, 22 and 24 candidate genes were annotated in the National Coalition Building Institute database, respectively. QTL overlapping with the potential selection regions detected simultaneously by two methods was associated with backfat at last rib, average backfat thickness, drip loss, and so on, as shown in Tables 1 and 2 and Tables S4 and S6. More details for each QTL trait are described in the animal QTL database (https://www.animalgenome.org/QTLdb/). Two candidate genes were annotated in the two reference groups (Baoshan and Saba). The one potential selection region detected simultaneously by two methods overlapped with QTLs related to average daily gain, dressing percentage, percentage type I fibers, and so on (see Tables 1 and 2 and Tables S5 and S7).
In addition, a total of 1349 candidate genes annotated in 680 significant SNPs detected by at least one method in Hainan pigs and Baoshan pigs and the QTLs overlapping with these significant SNPs were associated with meat quality, average backfat thickness, and so on (Table S2), while 1267 candidate genes annotated in 668 significant SNPs detected by at least one method in Hainan pigs and Saba pigs totally and the QTLs overlapping with these significant SNPs were associated with meat quality, average daily gain, and so on (Table S3).

GO Terms and KEGG Pathway Enrichment Analysis
Within the potential selection regions detected simultaneously by two methods in the Ding'an and Tunchang groups, 41 candidate genes were annotated in total (Tables 1 and 2). One GO term and one KEGG pathway were enriched and targeted, both of which involved two candidate genes ( Table 3). The enriched KEGG pathway was ssc00604 (glycosphingolipid biosynthesis-ganglio series) and the enriched GO term was GO: 0048015 (phosphatidylinositol-mediated signaling). However, no GO term or KEGG pathway was enriched in the reference group (Baoshan, Saba).
The 1349 candidate genes annotated in 680 significant SNPs detected by at least one method in Hainan pigs and Baoshan pigs involved 22 GO terms and 12 KEGG pathways (Table S2) and 1267 candidate genes annotated in 668 significant SNPs detected by at least one method in Hainan pigs and Saba pigs involved 13 GO terms and 17 KEGG pathways (Table S3).

Discussion
In this study, the potential signatures of selection in South China indigenous pig populations were identified using two approaches. Nine and eleven potential signatures of selection were detected simultaneously by two methods in Hainan pigs with Baoshan and Saba pigs as reference groups, respectively. Moreover, 22 and 24 candidate genes were found to be enriched in Hainan pigs with Baoshan and Saba pigs as reference groups, respectively. These selection regions were overlapping with QTLs associated with meat quality, disease resistance, and growth. In Baoshan and Saba pigs, only one potential signature of selection was detected simultaneously by two methods was identified, which overlapped with growth and meat quality traits. These results together suggest the potential utility of the findings from the present study.
In general, the signatures of selection revealed by the methods based on population differentiation were associated with phenotypic changes in morphology and behavior. Interestingly, there was a significant difference in coat color between Hainan pigs and the reference populations. Furthermore, a potential signature of selection on SSC14 (rs81223780) was detected in Hainan pigs with either Baoshan or Saba pigs as the reference population. This region was reported by Wilkinson et al. [43] when black and partially black coat breeds (Large Black, Berkshire, Hampshire, British Saddleback) were compared against red coat breeds (Duroc). The results from the present study, together with those of Wilkinson et al. [43], suggest that a promising functional candidate region for pig coat color has been identified and that this region could be of research interest in the future. Additionally, one candidate gene associated with coat color (melanocortin 5 receptor, MC5R) was detected in this study. The MC5R gene, a member of the melanocortin receptor gene family, has been reported to create a ligand-dependent signal modulation with MC1R, which may participate in physiological color change in flounder [44]. Moreover, another study reported that the MC5R gene polymorphism (A303G) may affect the feed intake, feed conversion, and other physicochemical characteristics in Large White x Landrace crossbred pigs [45].
The phenotypes of each individual were not included in most of the signatures of selection detection analysis, hence the functional explanation of significant signals was usually less conclusive. Although a new methodology for the detection of signature of selection for specific complex traits was recently proposed by Beissinger et al. [46], the phenotypic values were not always available in such research. The reported QTLs could serve as a reference or potential clue to understand the identified signatures of selection. In this study, the Animal QTL database [38] and enrichment analysis were used to enhance our understanding of the detected signature of selection. The QTLs overlapping with potential selection regions were mainly related to traits of meat quality, disease resistance, and growth. It is known that the meat quality of most Chinese indigenous pigs is superior, especially for Ding'an and Tunchang pigs [25,26]. Traditionally, the priorities of pig domestication in China were fat deposit and reproduction, which was confirmed by Wang et al. [24]. From this perspective, the signatures of selection detected in this study would be related to those traits annotated from the above analysis. However, we should be sufficiently cautious to conduct further specific functional research based solely on the findings from signature detection. Pavlidis et al. [47] reported that annotation term enrichment is known to not perform well when applied to selective sweeps.
Although some interesting findings were reported here, the limitations of the present study should not be neglected. These include: (1) The low density of markers. The average distance between adjacent SNPs is 100 kb and the average LD degree between pairs of SNPs with a distance within 200 kb of each population is 0.201, 0.168, 0.151, and 0.188 in Ding'an, Tunchang, Baoshan, and Saba pigs, respectively. This indicates that the SNPs were not dense enough in the present study, although similar SNP chips were used in other studies [23]; (2) the effectiveness of the two detection methods used in this study. The F ST method may bring a higher false positive rate compared with XP-EHH, as suggested by Ma et al. [23]. Furthermore, the F ST is suitable for the detection of genome regions that are differentially fixed in different breeds, while XP-EHH is used in detecting variants which are still segregating in populations and are a subject of ongoing selection. In addition, focusing on SNPs detected by both methods, some potential signatures of selection among breeds might be neglected and the combination of significant SNPs detected by one method might cause false positive results. To provide comprehensive and balanced results, the significant SNPs detected by either two methods or at least one method were provided and analyzed simultaneously in this study; (3) the small size of the effective population of the three breeds might affect the F ST statistic; and (4) the contrast between Yunnan and Hainan pigs was insufficient. Although geographic isolation exists, the direction of Chinese pig domestication was similar in different regions. These limitations together might impact the observations of this study and should be overcome in further investigations.
In conclusion, some potential signatures of selection that might be functionally associated with meat quality, disease resistance, and growth were detected in Hainan pig genomes. Moreover, potential signatures of selection and two candidate genes were detected in Saba and Baoshan pig populations.
This study may provide knowledge for the genetic foundation of adaptive evolution in three breeds of South China indigenous pigs.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/5/346/s1, Figure S1: LD plot of significant SNPs in Hainan pigs and Baoshan pigs, Figure S2: LD plot of significant SNPs in Hainan pigs and Saba pigs, Table S1: Genetic distance within populations, Table S2: Significant SNPs detected by single method in Hainan pigs and Baoshan pigs, Table S3: Significant SNPs detected by single method in Hainan pigs and Saba pigs, Table S4: Summary of potential selection regions in Hainan pigs (Ding'an and Tunchang/Baoshan group), Table S5: Summary of potential selection regions in Baoshan pigs (Ding'an and Tunchang/Baoshan group); Table S6: Summary of potential selection regions in Hainan pigs (Ding'an and Tunchang/Saba group), Table S7: Summary of potential selection regions in Saba pigs (Ding'an and Tunchang/Saba group).