Whole Genome Sequencing Reveals the Effects of Recent Artificial Selection on Litter Size of Bamei Mutton Sheep

Simple Summary Bamei mutton sheep is a Chinese domestic sheep breed developed by crossing German Mutton Merino sheep and indigenous Mongolian sheep for meat production. There is large variation in the reproductive abilities of Bamei mutton sheep. After recent artificial selection, the average lambing rate of the Bamei mutton nucleus group was over 150%. We used the FST (Fixation Index) and XP-EHH (The Cross-Population Extended Haplotype Homozygosity) statistical approach to detect the selective sweeps between high- and low-fecundity Bamei mutton sheep groups. JUN (JUN proto-oncogene, AP-1 transcription factor subunit), ITPR3 (inositol 1,4,5-trisphosphate receptor type 3, PLCB2 (phospholipase C beta 2), HERC5 (HECT and RLD domain containing E3 ubiquitin protein ligase 5), and KDM4B (lysine demethylase 4B) were detected that are potential responsible for litter size. These observations provide a new opportunity to research the genetic variation influencing fecundity traits within a population evolving under artificial selection. Abstract Bamei mutton sheep is a Chinese domestic sheep breed developed by crossing German Mutton Merino sheep and indigenous Mongolian sheep for meat production. Here, we focused on detecting candidate genes associated with the increasing of the litter size in this breeds under recent artificial selection to improve the efficiency of mutton production. We selected five high- and five low-fecundity Bamei mutton sheep for whole-genome resequencing to identify candidate genes for sheep prolificacy. We used the FST and XP-EHH statistical approach to detect the selective sweeps between these two groups. Combining the two selective sweep methods, the reproduction-related genes JUN, ITPR3, PLCB2, HERC5, and KDM4B were detected. JUN, ITPR3, and PLCB2 play vital roles in GnRH (gonadotropin-releasing hormone), oxytocin, and estrogen signaling pathway. Moreover, KDM4B, which had the highest FST value, exhibits demethylase activity. It can affect reproduction by binding the promoters of estrogen-regulated genes, such as FOXA1 (forkhead box A1) and ESR1 (estrogen receptor 1). Notably, one nonsynonymous mutation (p.S936A) specific to the high-prolificacy group was identified at the TUDOR domain of KDM4B. These observations provide a new opportunity to research the genetic variation influencing fecundity traits within a population evolving under artificial selection. The identified genomic regions that are responsible for litter size can in turn be used for further selection.

blood samples were placed in EDTA vacutainer tubes for storage. These ewes were about 3 years old and selected from among 500 sheep. Only ewes with litter size data showing that they had given birth more than three times were sampled. The selected ewes were grouped into two categories based on the phenotype of litter size (monotocous sheep giving birth to only one lamb in three consecutive parities and polytocous sheep giving birth to more than two lambs in two consecutive parities). Then, genomic DNA was extracted from 200 µL of sheep blood using a QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), in accordance with the manufacturer's instructions. DNA quality and integrity were assessed by spectrophotometry (OD260/280) and 1.0% gel electrophoresis.

Genome Sequencing
High-quality DNA for Illumina sequencing library construction was randomly sheared into small pieces (300-400 bp). After end-repair, "A"-tailing and ligating to Illumina sequencing adapters, 400-500 bp ligated products were amplified by ligation-mediated PCR (LM-PCR). Then, 2 × 100 bp paired-end sequencing was carried out on an Illumina HiSeq 2500 sequencer and the original data were analyzed by Illumina HiSeq Control Software (Illumina, San Diego, CA, USA).

Population Genetics Analysis
The samples were separated into two groups (five monotocous individuals, five polytocous individuals). The SNP densities, minor allele frequencies, and Tajima's D of each group were calculated by VCFtools v0.1.12b (https://vcftools.github.io/index.html) [28]. Only SNPs in autosomes were preserved for the phylogenetic analysis. A phylogenetic tree of all samples was generated using SNPhylo [29] version 20160204 (http://chibba.pgml.uga. edu/snphylo/), based on the maximum likelihood method. A total of 500,000 SNPs were randomly selected for calculating the linkage disequilibrium (LD) r 2 using Haploview [30] with parameters set as follows: "-missingCutoff 0.2 -dprime -minMAF 0.1." The SNP pairs were clustered based on the physical distances of these genes. The average LD (e.g., 0-1 kb) of each group was represented by the mean r 2 .

Selective Sweep Analysis
The pooled heterozygosity Hp was calculated over 10 kb windows using the formula: where ∑ n MAJ denotes the sum of major allele frequencies in a selected window and ∑ n MI N denotes the sum of minor allele frequencies [31]. F ST values were calculated between monotocous individuals and polytocous individuals for single SNPs using a method that adjusts for a small sample size [32]. We averaged F ST values over 50 kb sliding windows along the genome with the Bio::PopGen::PopStats Package in BioPerl [33] and Z-transformed the resultant distribution. Putative selection targets were extracted from the extreme tail of the distribution by applying a Z(F ST ) > 5 cut-off [34]. fastPHASE v1.4.0 was used to phase the genotypes of all samples with the parameters "-T10 -K8" [35]. Then, the phased data were used to calculate the cross-population extended haplotype homozygosity (XP-EHH) value by XP-EHH [36]. XP-EHH values were averaged over 50 kb sliding windows. These scores approximately followed a normal distribution; the threshold to locate putatively selected regions was two times the XP-EHH distribution standard deviation (|XP-EHH| > 2). Manhattan plots for F ST and XP-EHH were generated with the R package gap [37].
A phylogenetic tree was generated using all variants located in these regions. Candidate genes targeted by positive selection were defined as genes overlapping with sweep regions (ZF ST > 5 and |XP-EHH| > 2). GO (gene ontology) and KEGG (kyoto encyclopedia of genes and genomes) enrichment analyses for candidate genes were performed by DAVID 6.8 [38], and p values were corrected using the Benjamini-Hochberg method. Protein-altering mutations in these genes were listed and ranked by their single-site F ST value. The protein-altering mutations of the candidate sweep gene KDM4B were localized to regions that are evolutionarily conserved among mammalian species.

Sanger Sequencing Validation
To confirm the SNPs detected in exons of the genes selected by sweep analysis, we selected eight SNPs from six genes and designed primers for their Sanger sequencing (Supplementary Table S1). Then, we outsourced the amplification and screening of SNPs in 15 monotocous and 14 polytocous sheep of this group to GENENODE (Wuhan, China).

Sequencing and Mapping of the Sheep
We sequenced five monotocous and five polytocous sheep using an Illumina HiSeq2500 sequencer, generating a total of 627.16 million raw read pairs, comprising 180.16 Gb of raw data. After filtering out the low-quality reads, each sheep was mapped to the sheep reference genome, with an average alignment rate of 93.76% (92.79-94.34%). The percentages of reads sequenced at least once per bp varied from 87.38% to 94.55%. The percentage of reads sequenced at least four times per bp was >61.82% (Supplementary Table S3).

Identification of the Variation of the Sheep
We obtained about 21.39 million SNPs and 2.04 million insertions and deletions (indels) for all sheep in the two groups. Most SNPs (~71.7%) were located within intergenic regions, while only a few (~0.6%) were located within coding regions. The exonic SNPs were identified (Supplementary Table S4). There were 45,775 vs. 48,265 SNPs involving nonsynonymous mutations, 742 vs. 785 involving stop-gain variation, and 45 vs. 52 involving stop-loss nonsense variation in the polytocous and monotocous sheep, respectively. The rates of heterozygosity of the polytocous and monotocous groups were 30.02% and 30.07%, respectively. The transition-to-transversion ratios (ts/tv) were almost identical between the two groups (2.4346 for the polytocous population versus 2.4352 for the monotocous population) (Supplementary Table S5).

Population Structure Analysis
The levels of genome-wide genetic diversity, Tajima's D, and the minor allele frequency (MAF) distribution indicated that high-frequency minor alleles constitute a small proportion of the total but are slightly more abundant in polytocous sheep than in monotocous ones (Supplemental Figure S1). Additionally, haplotype analysis indicated that Animals 2021, 11, 157 5 of 13 the polytocous sheep have slower decay of pairwise correlation coefficient (r 2 ) and higher integrated haplotype homozygosity (iHH) than the monotocous sheep (Supplemental Figure S2).
To determine the phylogenetic relationship between monotocous and polytocous sheep, a neighbor-joining tree was constructed using high-quality SNPs. When the tree was generated using the SNPs for the whole genome, monotocous and polytocous sheep formed a mixed clade ( Figure 1A). This indicated that the pairwise distances within each group were larger than those between the groups and there was no significant distinction between the two groups. However, in the phylogenetic tree constructed using SNP data selected based on F ST score, the monotocous and polytocous sheep were classified into two genetically different groups ( Figure 1B). The results of the two trees show that the population genetic structure was not associated with the litter size and that the breeding time of high-fecundity Bamei mutton sheep was relatively short. monotocous population) (Supplementary Table S5).

Population Structure Analysis
The levels of genome-wide genetic diversity, Tajima's D, and the minor allele frequency (MAF) distribution indicated that high-frequency minor alleles constitute a small proportion of the total but are slightly more abundant in polytocous sheep than in monotocous ones (Supplemental Figure S1). Additionally, haplotype analysis indicated that the polytocous sheep have slower decay of pairwise correlation coefficient (r 2 ) and higher integrated haplotype homozygosity (iHH) than the monotocous sheep (Supplemental Figure S2).
To determine the phylogenetic relationship between monotocous and polytocous sheep, a neighbor-joining tree was constructed using high-quality SNPs. When the tree was generated using the SNPs for the whole genome, monotocous and polytocous sheep formed a mixed clade ( Figure 1A). This indicated that the pairwise distances within each group were larger than those between the groups and there was no significant distinction between the two groups. However, in the phylogenetic tree constructed using SNP data selected based on FST score, the monotocous and polytocous sheep were classified into two genetically different groups ( Figure 1B). The results of the two trees show that the population genetic structure was not associated with the litter size and that the breeding time of high-fecundity Bamei mutton sheep was relatively short.

Analysis of the Selected Loci and Candidate Genes
To find selection signals associated with high prolificacy, the average F ST values were calculated for nonoverlapping 50 kb windows on the autosomes and X chromosome. After Z-transforming the values, we selected the windows with Z(F ST ) > 5 across the genome, as in previous studies [34]. In total, we identified 85 unique autosomal regions and two Xchromosome regions containing 81 candidate genes. The region with the strongest selective signal (F ST = 0.6639187, ZF ST = 10.65708746) between the monotocous and polytocous sheep was located on chromosome 5 (16750000-16800000 bp), which contains KDM4B (lysine demethylase 4B) (Figure 2A, Supplementary Table S6).
neighbor-joining phylogenetic tree constructed using whole-genome SNP data. The scale bar represents the level of similarity; Monotocous (red) and polytocous (blue) samples are indicated. (B) A neighbor-joining phylogenetic tree constructed using SNP data selected based on FST score.

Analysis of the Selected loci and Candidate Genes
To find selection signals associated with high prolificacy, the average FST values were calculated for nonoverlapping 50 kb windows on the autosomes and X chromosome. After Z-transforming the values, we selected the windows with Z(FST) > 5 across the genome, as in previous studies [34]. In total, we identified 85 unique autosomal regions and two Xchromosome regions containing 81 candidate genes. The region with the strongest selective signal (FST = 0.6639187, ZFST = 10.65708746) between the monotocous and polytocous sheep was located on chromosome 5 (16750000-16800000 bp), which contains KDM4B (lysine demethylase 4B) (Figure 2A, Supplementary Table S6). We also estimated the XP-EHH statistic for the monotocous and polytocous groups using monotocous sheep as a control. An XP-EHH value greater than zero indicates that these sites have been selected in the monotocous population, while a value less than zero indicates that selection has occurred in the polytocous population. We scanned the regions using the threshold |XP-EHH| > 2 as candidate regions. A total of 198 regions including 162 genes were found to have undergone positive selection in the XP-EHH analysis, with 155 regions having undergone selection in the polytocous population. A region of chromosome 6 (36,200,000-36,250,000 bp) associated with strong selection was found to have the largest |XP-EHH| value (XP-EHH = −3.963) ( Figure 2B, Supplementary Table S7). We also estimated the XP-EHH statistic for the monotocous and polytocous groups using monotocous sheep as a control. An XP-EHH value greater than zero indicates that these sites have been selected in the monotocous population, while a value less than zero indicates that selection has occurred in the polytocous population. We scanned the regions using the threshold |XP-EHH| > 2 as candidate regions. A total of 198 regions including 162 genes were found to have undergone positive selection in the XP-EHH analysis, with 155 regions having undergone selection in the polytocous population. A region of chromosome 6 (36,200,000-36,250,000 bp) associated with strong selection was found to have the largest |XP-EHH| value (XP-EHH = −3.963) ( Figure 2B, Supplementary Table S7).
We combined the genes obtained by the two methods described above. A total of 221 genes were found to have been selected in total. Overall, 14 genes were positively selected across both of the two methods (Supplementary Table S8).
Gene Ontology and KEGG pathway analyses were performed to further study the functions of the selected genes identified by the two methods. The enriched GO terms (p-value < 0.1) and KEGG pathways (p-value < 0.5) are shown in Supplementary Tables S9 and S10. By the Functional Annotation Clustering tool of DAVID 6.8, we identified two annotation clusters (classification stringency: Medium). Annotation cluster 2 contained the GnRH signaling pathway, estrogen signaling pathway, and oxytocin signaling pathway. The reproductive hormones in these pathways are involved in regulating sheep estrus, Animals 2021, 11, 157 7 of 13 follicle development, and ovulation. JUN (JUN proto-oncogene, AP-1 transcription factor subunit), ITPR3 (inositol 1,4,5-trisphosphate receptor type 3), and PLCB2 (phospholipase C beta 2) were enriched in all three pathways (Table 1).

Mutations in the KDM4B Gene
KDM4B is located in the highly differentiated region with the highest F ST value between the monotocous and polytocous groups. A selected mutation that goes to fixation tends to reduce variation in linked sites in the process of a selective sweep [39]. Therefore, we determined the Hp value of the window (chr 5: 16,750,000-16,800,000 bp) around KDM4B. The Hp value of the polytocous group (Hp = 0.0818) decreased in the KDM4B region and was the lower than in the monotocous group (Hp = 0.22923) ( Figure 3A).
To identify SNVs subjected to selection, we screened the exonic mutations of the KDM4B gene in both monotocous and polytocous groups. SNVs that can alter protein translation, structure, and even function may contribute to rapid evolution in domestic animals [40]. Here, 13 synonymous SNVs, 3 nonsynonymous SNVs, and 1 frameshift deletion were identified. All of the three nonsynonymous SNVs cause amino acid sequence changes p.S570G (F ST = 0.11), p.S924L (F ST = 0), and p.S936A (F ST = 0.45) in the translated protein (in accordance with Ensembl gene annotation) (Supplementary Table S11).
We compared the F ST values of the three nonsynonymous SNVs; only mutation p.S936A had high divergence of allele frequency within our population: monotocous sheep 50% (n = 5) and polytocous sheep 90% (n = 5). To confirm these frequencies based on Illumina sequencing, alleles of additional samples were genotyped by Sanger sequencing. The frequency of mutant allele of KDM4B gene genotyped by Sanger sequencing showed a slight decrease in polytocous group 60% (n = 14), compared with the result in whole genome sequencing. The mean of individual litters variants in polytocous sheep may affect the distribution of the genotype when increasing number of samples. The frequency of the mutant homozygote in polytocous sheep was higher than that in monotocous sheep. Moreover, the distribution of KDM4B p.S936A genotype contained significant differeces To identify SNVs subjected to selection, we screened the exonic mutations of the KDM4B gene in both monotocous and polytocous groups. SNVs that can alter protein translation, structure, and even function may contribute to rapid evolution in domestic animals [40]. Here, 13 synonymous SNVs, 3 nonsynonymous SNVs, and 1 frameshift deletion were identified. All of the three nonsynonymous SNVs cause amino acid sequence changes p.S570G (FST = 0.11), p.S924L (FST = 0), and p.S936A (FST = 0.45) in the translated protein (in accordance with Ensembl gene annotation) (Supplementary Table S11).
We compared the FST values of the three nonsynonymous SNVs; only mutation p.S936A had high divergence of allele frequency within our population: monotocous sheep 50% (n = 5) and polytocous sheep 90% (n = 5). To confirm these frequencies based on Illumina sequencing, alleles of additional samples were genotyped by Sanger sequencing. The frequency of mutant allele of KDM4B gene genotyped by Sanger sequencing p.S936A affected the TUDOR domain ( Figure 3C). This domain can bind to specific lysine methylation marks on histone proteins (H3-K4me3, H3-K23me3, and H4-K20me3). It plays a vital role in chromatin localization and the regulation of enzymatic function [41]. Thus, we aligned the KDM4B protein mutant with its ortholog in diverse vertebrates to evaluate the functional effects of the variants. The results reveal that p.S936A is quite well conserved, being invariant among all of the other mammals that we used ( Figure 3C). All of the results indicate that p.S936A is an important mutation for the reproduction-related KDM4B sweep.

Discussion
Continuous artificial selection in production-oriented breeding has left selective signatures and genomic variability in domesticated sheep. In this study, we performed whole-genome sequencing of 10 Bamei mutton sheep with different litter sizes from the same population. In this population, reproductive traits have undergone intensive selection via the breeding strategy. In this research, numerous mutations were selected in the population after screening, but the rates of heterozygosity did not differ between the two populations. When a neighbor-joining tree was constructed using SNPs selected based on F ST score, these two groups were differentiated into two genetic clusters. Most quantitative traits are known to respond quickly to artificial selection, and this population subjected to selection of litter size might have evolved in opposite directions with soft sweeps [42,43].
Selective sweeps can identify important genomic regions that have been swept in the recent past. Some genes associated with complex, economically important traits have been identified with the help of selected sweep analysis by whole-genome sequencing. We used the XP-EHH test to detect alleles near fixation within a Bamei mutton sheep population. Upon undergoing recent selection, the selected allele generally reaches a high frequency or fixation in one group, but remains polymorphic in the whole population [44]. In this research, we discovered 155 regions selected for in the polytocous population, which was greater than the 43 regions screened in the monotocous population. Using fixed window selection of F ST and XP-EHH, 221 candidate genes were found to be associated with the prolificacy trait. In a previous study using a segregated flock based on QTL and GWAS mapping, some mutations (FecB, FecX, and FecG) were identified to affect ovulation in sheep [11]. However, here we did not identify any mutations previously reported to increase the number of ovulations in sheep. The litter size per breeding ewe is not only influenced by ovulation, but also affected by a number of factors including fertilization rate and pregnancy loss (in the embryonic and fetal development period). Recently, the LEPR gene, estrogen receptor 1 (ESR1) gene, and prolactin (PRL) gene were found to be associated with fecundity, as revealed by a selective sweep analysis in European commercial and semi-feral breeds and a Chinese indigenous breed distributed in different ecoregions [23,45,46]. JUN, ITPR3, and PLCB2 in the selected region were enriched in gonadotrophin releasing hormone (GnRH), oxytocin, and estrogen signaling pathway associated with the complex process from estrus to lambing. This process is organized by complex communication among the hypothalamus, pituitary, ovary, and uterus. GnRH, secreted by the hypothalamus, regulates the synthesis and release of gonadotrophins, follicle-stimulating hormone (FSH), and luteinizing hormone (LH) from the pituitary [47]. FSH and LH support the growth and maturation of follicles. Estradiol and progesterone secreted by the corpus luteum inhibit GnRH release by feedback modulation. During the follicular phase, following luteolysis, estradiol reaches a critical threshold and stimulates the preovulatory gonadotrophin surge, appearing to help the ovulation of mature follicles [11].
PLCB is involved in a wide range of signals of reproductive processes, being regulated by many hormones (FSH, LH, GnRH, oxytocin). PLCB2 as a member of the PLCB subfamily is activated by G-protein-linked receptors and can hydrolyze phosphatidylinositol 4,5bisphosphate to form inositol-1,3,4-trisphosphate (IP3) and diacylglycerol (DAG), which stimulate Ca 2+ release and protein kinase C activity, respectively [48]. ITPR3 is a member of the IP3 receptor family. IP3, as an intracellular secondary messenger, mobilizes Ca 2+ from endoplasmic reticulum stores, which transduces several calcium-dependent cascades. IP3 receptor downregulation induced by GnRH can suppress the secretion of LH/FSH [49,50].
JUN is one of three JUN members [JUN (c-JUN), JUNB, and JUND] constituting the AP-1 family of heterodimeric transcription factors by combining with four FOS members [FOS (c-Fos), FOSB, FRA1, and FRA2]. GnRH can stimulate the expression of JUN by rapidly and transiently binding to the AP1 site in the FSHB promoter and then stimulating FSHB transcription [50]. Conditional knockout of JUN in mice resulted in subfertility in both sexes, such as impaired spermatogenesis in males, diminished corpora lutea in females, and lower gonadal steroid (GnRH and LH) levels [51].
The nonsynonymous mutation p.S936A at KDM4B was discovered and confirmed in monotocous and polytocous sheep; it may be the reason for the signal of a selective sweep at the KDM4B locus. KDM4B belonging to the KDM4/JMJD2 family of histone demethylases contains a JmjN domain, JmjC domain, tandem plant homeodomains (PHD), and tandem Tudor domains. KDM4B catalyzes the demethylation of H3K9me3 and H3K9me2 at or near regulated promoters to promote expression of the downstream pathway induced by multiple different extracellular stimuli [52]. KDM4B plays a central role in regulating the Estrogen Receptor (ER) signaling cascade by controlling expression of the ER and FOXA1 genes. These two important genes can maintain the estrogen-dependent phenotype [53]. In the developing ovarian follicle, granulosa cells are the main producers of estrogen. KDM4B is expressed in granulosa cells at early stages of folliculogenesis and its level is correlated with pregnancy failure in IVF patients [54]. KDM4B expression in the uterus is also associated with recurrent pregnancy loss in women [55]. The identified intersection between steroid hormones and KDM4B in the ovary and uterus sheds new light on the regulation of reproduction.

Conclusions
Following high-throughput sequencing, SNPs and indels were identified from two sheep populations. The sequencing data revealed the genetic diversity and population differentiation of the Bamei mutton sheep population experiencing selection for litter size. The genomic selection scan detected some interesting candidate genes and pathways under artificial selection, which might have increased litter size. This genome-wide research provides valuable information for future whole-genome selection for fecundity in this sheep breed.
Supplementary Materials: The following are available online at https://www.mdpi.com/2076-2 615/11/1/157/s1, Figure S1: Distributions of (A) minor allele frequency and (B) Tajima's D across the genome of Monotocous (Blue) and polytocous (Red), Figure S2: Levels of linkage disequilibrium across the genome of Monotocous (Blue) and polytocous (Red), Table S1: Primer information of Sanger sequencing, Table S2: Sample information and sequencing quality, Table S3: Reads mapping statistics and coverage of depth for Monotocous and Polytocous sheep, Table S4: Number of SNPs and Indels in different chromosomes, Table S5: Number of annotated SNPs in different gene regions, Table S6: Genomic regions and genes highly differentiated (Z(FST) > 5) between Monotocous and Polytocous sheep, Table S7: Candidate regions and genes of 50-kb windows with value of |XPEHH| > 2 in Monotocous and Polytocous sheep, Table S8: Overlapping selected windows selected by FST and XP-EHH, Table S9: Gene Ontology enrichment analysis of the 221 candidate genes with GO terms (p < 0.1), Table S10: Ranking of KEGG pathway with Pvalue (p < 0.5) enriched from 221 genes, Table S11: Mutations in exons of KDM4B gene.