Whole Genome Resequencing Reveals Selection Signals Related to Wool Color in Sheep

Simple Summary The color of wool is an essential trait in sheep which plays a significant role in the textile industry. The color of wool is determined by the presence of various pigments, which can range from white to various shades of brown, gray, black, etc. Understanding the genetics behind wool color is crucial for selective breeding and producing desirable colors for different textile products. By studying the genetic basis of wool color, researchers can identify genes related to pigmentation and develop strategies to enhance or modify wool color. This knowledge contributes to the improvement of wool quality, diversification of textile options, and economic development in the wool industry. Abstract Wool color is controlled by a variety of genes. Although the gene regulation of some wool colors has been studied in relative depth, there may still be unknown genetic variants and control genes for some colors or different breeds of wool that need to be identified and recognized by whole genome resequencing. Therefore, we used whole genome resequencing data to compare and analyze sheep populations of different breeds by population differentiation index and nucleotide diversity ratios (Fst and θπ ratio) as well as extended haplotype purity between populations (XP-EHH) to reveal selection signals related to wool coloration in sheep. Screening in the non-white wool color group (G1 vs. G2) yielded 365 candidate genes, among which PDE4B, GMDS, GATA1, RCOR1, MAPK4, SLC36A1, and PPP3CA were associated with the formation of non-white wool; an enrichment analysis of the candidate genes yielded 21 significant GO terms and 49 significant KEGG pathways (p < 0.05), among which 17 GO terms and 21 KEGG pathways were associated with the formation of non-white wool. Screening in the white wool color group (G2 vs. G1) yielded 214 candidate genes, including ABCD4, VSX2, ITCH, NNT, POLA1, IGF1R, HOXA10, and DAO, which were associated with the formation of white wool; an enrichment analysis of the candidate genes revealed 9 significant GO-enriched pathways and 19 significant KEGG pathways (p < 0.05), including 5 GO terms and 12 KEGG pathways associated with the formation of white wool. In addition to furthering our understanding of wool color genetics, this research is important for breeding purposes.


Introduction
The influence of wool color on the textile industry dates back a long time [1][2][3], and the colors of wool include white, brown, gray, black, tan, and yellow [2,4,5].White wool meets the demand for rich colors [1,3,6,7] due to its excellent dyeing ability.Breeding for white wool has always been a priority in sheep farming, due to the high level of pursuit by the textile industry [1,3,5,8,9].And, with the rise of green concept [3,10,11], natural colored wool is a way to replace traditional printing and dyeing [12].A comparison of modern and ancient wool products reveals a reduction in the diversity of modern wool colors [5,13], making it imperative to conserve colored wool breeding resources.Research on wool color-related genes can help improve varieties [7,14], increase the economic value and competitiveness of natural wool [14], and provide more options and innovations for textile production of different colored wools [15].At the same time, understanding wool color characteristics of different genotypes and breeds can help maintain genetic diversity [16], prevent gene loss [16], and promote ecosystem management and conservation [7,[17][18][19].Therefore, continued research on candidate genes for various wool colors remains necessary [14].
Regarding the mining of sheep wool color genes, the initial studies mainly explored the effect of genes on sheep wool color through knockout or mutation of single genes [20].With the improvement in sequencing technology and the development of in vitro breeding techniques, more and more genes with their variant forms have been discovered and their roles have been determined [9,21].Currently, technologies such as whole genome sequencing, RNA sequencing, and gene editing techniques are widely used to mine and study wool color genes in sheep [16,17,22,23].The wool color of sheep is controlled by a series of genes.In the past decades, scientists have successfully identified multiple genes related to wool color in sheep.The MC1R, ASIP, TYRP1, KIT and MITF loci are important in biology and genetics, and they play key roles in biological processes such as formation and distribution of coat color, pigment production and distribution, and cell migration.The study of these genes not only contributes to our in-depth understanding of the genetic mechanism of coat color but also provides a basis for the improvement and selection of sheep breeds with specific coat color characteristics.The study of key genes for coat color not only is important for animal husbandry but also provides valuable information for biological and medical research, as well as deepening the understanding of ecology and evolution.Under the action of the MC1R gene, melanocytes produce melanin and deposit it into the hair follicle, which results in a black or brown coat color in sheep [24,25].In contrast, under the action of the ASIP gene, melanin production and deposition is inhibited, leading to light pigmentation [24].The ratio of the expression of these two genes allows the wool color to be presented in the black to reddish-brown range [24].TYR is a key enzyme in the regulation of melanogenesis, and mutations in TYR lead to the production of white wool [24,26,27].TYRP1 is an important enzyme in the synthesis of true melanin [24,26,27].Mutations in TYRP1 result in the inability to convert the brownish 5,6-dihydroxyindol into the blackish eumelanin, which affects the shade of brown color of wool [24,28].KIT [19], MLPH [24], and KIF5A [29] are commonly recognized as genes that mediate the formation and distribution of pigment granules in melanocytes which are associated with the translocation of melanosomes.In contrast, the PMEL [27] gene is involved in melanosome structure, and its variation can inhibit melanosome formation, resulting in melanin dilution [24,25].In addition, the regulatory mechanisms of transcription factors are also closely related to sheep wool color.Transcription factors are a class of proteins that can bind to gene DNA and regulate gene expression, such as SOX10 [24] and MITF [30], which have been shown to be essential for melanocyte differentiation and maturation.MITF is a key regulator of pigmentation, and variations in MITF have been associated with the formation of lightcolored wool [24,30].Variations in IRF4 lead to lighter coloration [31,32].Mutations in DCT result in increased production of eumelanin and decreased production of pheomelanin in melanocytes [25,27,33,34].The MSG1 (CITED1) gene enhances melanin production in B16 cells [35].In previous studies, black and white wools were mainly used, followed by brown and tan, while other colors of wools were less studied.
Wool color is complex and affected by the interaction of multiple genes [5,24], and although some relevant genes have been identified, research is still ongoing [14].Different breeds of sheep may be caused by different genes even if they have the same color wool [19,24], and they have different genetic variants [7,19], requiring in-depth study of the mechanism of wool color [24].In addition, gene interactions and environmental factors also affect wool color [19], which is an issue that may require further research in the future.
We studied the coat color of different breeds of sheep by whole genome resequencing to obtain more comprehensive and detailed data [9,19,21].Employing three signal analysis methods [36] (population differentiation index (Fst [36]), nucleotide diversity ratio (θπ ratio [37][38][39]), and cross-population extended haplotype homozygosity (XP-EHH [40])) allowed us to obtain comprehensive insights into selection signals, resulting in improved reliability and understanding of the evolution and adaptations of sheep breeds, as well as the effects of natural and artificial selection.

Ethics Statement
All experimental work on sheep was approved by the Animal Ethics Committee of the Institute of Animal Science, China Academy of Agricultural Science (protocol code IAS 2022-7 and 25 February 2022).

Sample Collection and Sequencing
Jugular vein bloods were collected from fifteen sheep breeds (Table 1)   We carefully selected healthy, breed-typical individuals at the age of 1 year to ensure a representative and comparable cohort of animals.Whenever possible, we utilized full/half siblings to minimize individual variations.The farm provided high-quality feed and ensured access to clean water.Moreover, it offered a suitable and comfortable feeding environment with appropriate space allocation, dry bedding, and a controlled temperature range.Regular health checks and vaccination programs were implemented along with strict hygiene practices.Reasonable exercise opportunities were provided while ensuring adequate rest for the animals.Additionally, effective breeding management was carried out alongside regular inspection and maintenance of equipment.

Alignments and Quality Control
The raw reads of fastq format were first processed through a series of quality control procedures using FastQC to ensure reliable reads in the subsequent analyses.The standards of quality control were followed, including (1) removing reads with ≥10% unidentified nucleotides (N); (2) removing reads with >20% bases having phred quality less than 5; (3) removing reads with >10 nt aligned to the adapter, allowing ≤ 10% mismatches; and (4) removing putative PCR duplicates generated by PCR amplification in the library construction process (reads 1 and 2 of 2 paired-end reads that were completely identical).
Valid high-quality sequencing data were aligned to the reference genome (GCF_01677 2045.1_ARS-UI_Ramb_v2.0) using BWA software (v 0.7.17) [41] with the following parameters: mem -t 4 -k 32 -M.The resulting alignments were processed using SAMTOOLS [42] to remove duplicates, employing the parameter rmdup.In order to enhance the accuracy of data analysis, high-quality SNPs [43] meeting the following criteria were selected: (1) SNPs with a depth of coverage greater than 2; (2) SNPs with a proportion of MIS (deletions) less than 10%; (3) SNPs with a minimum allele frequency (MAF) greater than 5% [44].

Population Structure Analysis
Before conducting the analysis, all single nucleotide polymorphisms (SNPs) underwent trimming using the indep-pairwise [45] function of PLINK 1.09 software [46].The trimming process involved applying specific parameters, including a non-overlapping window of 25 SNPs, a step size of 5 SNPs, and a threshold of 0.05 for r2, in order to obtain a set of independent SNP markers.To examine the clustering patterns within the population, we conducted a principal component analysis (PCA) using PLINK 1.09 [46].Additionally, to assess the genetic relatedness among individuals, we constructed neighbor-joining (N-J) trees [47] using MEGA (v 7.0) software [48] and visualized them using ITOL (v 6) software [49] (https://itol.embl.de/upload.cgi,(accessed on 9 June 2023)).Furthermore, to evaluate the extent of population stratification and to validate the findings from PCA and N-J trees, we employed ADMIXTURE (v 1.3) software [50] to construct the population genetic structure, with k values ranging from 2 to 9.

Analysis of Selection Signals
Resequenced data are rich in variation information yet are fraught with noise and false alarms.By first using Fst [36] and θπ ratio [37][38][39] screening, an initial set of candidate selection signals can be quickly obtained, reducing the time and resources required for subsequent analysis.The true selection signal can be more accurately identified and potential false positives can be eliminated by subsequent verification and validation using XP-EHH [40].As a starting point for analysis, BAS, DUL, ALT, QIB, TUB, GBF, NLB, and SPG were categorized into G1 and GME, DOP, LTH, GLT, HUS, TON, and LLT were categorized into G2.The Fst and θπ values were computed by employing VCFtools (version 0.1.15)software [51].The analysis incorporated specific parameters: -fst-windowsize 50,000 and -fst-window-step 50,000.Subsequently, the obtained values were utilized to derive the θπ ratio [37][38][39].The selected genomic intervals of the G1 and G2 populations associated with wool color traits were screened by comparing the Fst and pi values of the G1 and G2 populations.Then, we use population marker information to estimate the haplotype of each chromosome by fastphase 1.4 [52] with the options set to −Ku40 −Kl10 −Ki10.XP-EHH scores were calculated using haplotype information from the XP-EHH program at http://hgdp.uchicago.edu/Software/,(accessed on 9 June 2023) to determine whether selection had occurred in the experimental (G1 or G2) population [40].Using the sliding window method, XP-EHH values were then calculated with a window size of 50 kb and a step size of 20 kb.Next, the mean was computed for each SNP in the sliding window.A negative XP-EHH score means that selection has taken place in the reference population, in contrast to a positive XP-EHH score, which represents that selection has occurred in the experimental population.
The problem of small sample sizes in varieties can be addressed to some extent by combining multiple methods for analysis.By combining multiple methods, the ability to detect genetic variation can be improved and the reliance on large sample sizes can be reduced [40,53,54].Each method has its own unique information and limitations, so joint analysis can combine the strengths of different methods and increase the sensitivity and accuracy of detection of genetic signals [55][56][57].In addition to increasing detection power, the combined analysis of three methods can be independently validated, providing complementary information [40,53,58,59].Consistent results from multiple methods can increase confidence in the signal.When multiple methods indicate the presence of the same genetic variant signal, the authenticity of the signal can be confirmed with greater confidence.Different methods have different characteristics and preferences for detecting genetic variation.By utilizing multiple methods in combination, a comprehensive understanding of the characteristics and patterns of genetic variation can be obtained from different perspectives, better revealing the potential biological significance.

Detection and Annotation of Candidate Genes
Different breeds of wool colors have different heritabilities, mostly low to medium heritabilities, but some have high heritability [60][61][62][63][64].It is possible that genes associated with wool color are located in regions with low genetic diversity.Therefore, in this study, the threshold for selection signal analysis has been adjusted to the top 5% to avoid overlooking candidate genes that may be involved in wool color [65].Then, the loci within the window where Fst [36], θπ ratio [37,38], and XP-EHH [40] were top5% were extracted as significant SNP loci, namely the candidate loci for the selection signal.Areas 50 kb upstream and downstream of the candidate loci were regarded as selection signaling regions.We used ANNOVAR software (https://annovar.openbioinformatics.org/en/latest/, accessed on 9 June 2023) [66] to annotate the genes with the sheep reference genome.Finally, the Venn diagram was created on the basis of the candidate genes derived from Fst, θπ ratio, and XP-EHH.

Candidate Gene Enrichment Analysis
To uncover the function and the mechanism of expression regulation of the genes, a functional enrichment analysis was performed.Candidate gene functional enrichment was performed using DAVID 6.8 [67] (https://david.ncifcrf.gov/,(accessed on 15 June 2023)), with gene symbol as the input parameter and Ovis_aries selected as the background organism.We tallied the number of genes that were enriched in these GO [68] terms and evaluated the significance of their enrichment by means of the hypergeometric distribution test.These genes were analyzed for KEGG [69] enrichment by means of Kobas 3.0 [70] (http: //kobas.cbi.pku.edu.cn/kobas3/genelist/,(accessed on 15 June 2023)), with Ovis_aries selected for background organism, using the Hypergeometric test/Fisher's exact test as the statistical method.The terms and pathways with p-value < 0.05 were judged to be significant.

Genetic Variation and Population Genetic Analysis
Whole genome resequencing with an average coverage of 7.6× was performed on 48 sheep individuals in this study.A total of 9,581,315,830 reads were obtained after alignment to the sheep reference genome (ARS-UI_Ramb_v2.0),covering 98.03% of the reference sequence.A coverage of 98.03% signifies that the likelihood of missing important information or encountering errors is minimized, thereby ensuring more comprehensive and accurate genomic insights.Furthermore, high coverage facilitates enhanced confidence and interpretability, particularly when examining variant annotations, identifying mutation sites, and exploring genomic structure and function.Consequently, this dataset proves valuable for subsequent studies involving population structure analysis and identification of selection signals.After variant calling and quality control, a total of 22,133,207 SNPs were identified.Statistical results of SNPs showed that variants mainly occurred in intergenic interval, followed by intronic interval, exonic interval, etc.Among the exonic variants, there were 82,379 non-synonymous SNPs and 68,110 synonymous SNPs (Table 2).The TS/TV ratio was determined to be 1.9, closely approximating 2. This observation implies a relatively balanced distribution of SNPs across the population, indicative of a normalized genomic population structure.These findings establish a solid foundation of reliable data for further investigations into population structure and the identification of potential selection signals.Firstly, a map of the worldwide distribution of sheep breeds was created (Figure S1).Then PCA, phylogenetic tree construction, and population structure analysis were executed on the fifteen sheep populations using the received SNP datasets to understand the genetic relationships and differences between different wool color sheep populations from a genome-wide perspective.According to the PCA results (Figure 1a), PC1 and PC2 explained 5.33% and 4.04% of the genetic variation, respectively; the 15 breeds clustered into two groups, European sheep breeds and East Asian sheep breeds; Yunnan sheep breeds were significantly separated from other East Asian sheep breeds; and Tibetan sheep breeds were slightly separated from Kazakh and Mongolian sheep breeds.The non-white wool sheep breeds (GBF, SPG, and NLB) and white wool sheep breeds (DOP and GME) were separated from the population via PC1 and PC2 (Figure S2).The non-white wool sheep breeds (BAS, DUL, ALT, QIB, and TUB) could be separated from the remaining sheep population following PC3 (Figures S3 and S4).The population genetic structure (Figure 1b,d) was constructed using ADMIXTURE software to confirm the accuracy of the results obtained from PCA.With K = 2, the blue background was dominant, and there was a clear transition from European sheep breeds to East Asian Kazakh, Tibetan, and Mongolian sheep breeds to Yunnan sheep breeds; Yunnan sheep breeds and European sheep breeds were clearly separated from other breeds; and when K = 3, Tibetan sheep (GBF) were separated from other breeds.The population genetical structure results confirmed the results of PCA.The results of the N-J tree (Figure 1c) are somewhat different from those of PCA and STRUCTURE.The PCA results are the same as those of STRUCTURE, which suggests that the position of individuals in genetic space is consistent with their genetic components among different genetic groups.The N-J tree is inconsistent with the PCA and STRUCTURE results, which suggests that there are differences in the phylogenetic relationships between species and that further study and consideration of other possible factors and explanations are needed.

Analysis of Selection Signals
Within the non-white wool group (G1 vs. G2), 3944 top 5% selection signals were screened by the joint Fst&θπ ratio (Figure 2a); 8223 top 5% selection signals were screened by XP-EHH (Figure 2b).Upon ANNOVAR annotation of the screened candidate SNPs, 544 and 1061 candidate genes associated with colored wool color were detected, respectively.After constructing the Venn diagram, 365 overlapping candidate genes were obtained (Figure 2d), 2431 and 4250 selection signals in the top 5% of Fst&θπ ratio (Figure 2a) and XP-EHH (Figure 2c) were screened in the white wool group (G2 vs. G1), and 388 and 625 candidate genes were derived by annotation.Ultimately, 214 overlapping candidate genes were screened using the Venn diagram [71] (Figure 2e).

Sample Control and Population Genetic Analysis
In this study, we performed whole genome resequencing on 48 sheep samples.Individual variations may also potentially affect the analysis of selection signals.To mitigate these differences, this research focuses on selecting representative samples to minimize individual disparities.Additionally, a joint analysis method is employed to address the limitations posed by the small sample size, compensating for individual variations and reducing potential errors.This approach aims to improve the screening efficiency and reliability of selection signals by covering a broader and more comprehensive genomic region using the three combined methods [40,53,54].
According to the PCA and ADMIXTURE results, it was found that there were high genetic similarities and consistent genetic components among the four populations of European, Yunnan, Kazakh, and Mongolian sheep, which were consistent with their breeding history.The Tibetan sheep (GBF) is very close to the Kazakh and Mongolian sheep in PCA and has the genetic components of Yunnan sheep in the population structure analysis; GBF is a local breed, which has been selected and bred for a long time, and according to its geographic location, it is assumed that there is a genetic exchange with Yunnan sheep, Kazakh, and Mongolian sheep.The QIB in the N-J tree is very inconsistent with PCA and ADMIXTURE.According to the investigation, QIB was bred in the late 19th century by merchants and pilgrims who brought back lambskin sheep and other black lambskin sheep from overseas and crossed them with local ewes.The breeding time is relatively short, so it is closer to the European sheep in terms of phylogenetic relationship.

Selective Signal Analysis
Wool color formation is a complex process regulated by a variety of factors and mechanisms, mainly involving the development of pigment cells [72,73], pigment synthesis [72,73], pigment transport and release [72][73][74][75], pigment particle distribution [24,74], and other processes.The overall appearance of wool color in different animals also depends on the distribution pattern of pigment granules.Due to the specific origins of melanoblasts in certain regions of the neural crest during embryonic development, there are specific time frames within which migrating cells must reach their designated positions in the skin [72,73].Failure to do so can lead to areas of the skin lacking pigment cells, resulting in patches of white coloration known as leucism [24].Impaired melanogenesis ultimately leads to a complete absence of pigment.This phenomenon is most commonly observed on the legs, abdomen, and forehead since these areas are farthest from where melanoblasts originate and therefore require more time for cell migration [19].Overall, the formation of wool color is regulated by gene expression, cell-cell interactions, and hormonal regulation [74].

GO Terms and Pathways Associated with White Wool
Arginine and proline metabolism (oas00330) affects and regulates the function of pigment cells, which leads to the lightening or whitening of the animal's wool color [149].Myelin and melanocytes share common progenitors, and thus, myelination (GO:0042552) may affect melanocyte formation, which in turn leads to leucism [72,73].The structure and modification of chromatin (GO:0000785) plays an important role in the regulation of gene expression and phenotypic features, and it may be involved in the synthesis and formation of skin and hair pigmentation [150][151][152][153].The protein kinase complex (GO:1902911) plays an important role in cell signaling and regulation and is associated with pigment cell differentiation and pigment synthesis [30,154,155].The mitochondrion (GO:0005739) is involved in many cellular functions, including energy production and intracellular signaling, and its dysfunction is associated with hypopigmentation [156,157].RNA polymerase II transcription factor activity and sequence-specific DNA binding (GO:0000981) are associated with melanocyte development and differentiation [158].Valine, leucine, and isoleucine degradation (oas00280) is associated with hair color, and complexes of branched-chain amino acids which are potential depigmenting agents inhibit melanin synthesis [159,160].Carbon metabolism (oas01200) provides the carbon framework for melanin synthesis, but low carbon conditions are not favorable for melanin synthesis [161].Terpenoid backbone biosynthesis (oas00900) is associated with the synthesis of many natural products, some of which (carotenoids) may be related to coat color, skin color, and melanin [162][163][164].Selenocompound metabolism (oas00450) may be related to melanin because selenium is a trace element that has an effect on melanin synthesis [165,166].Lysine degradation (oas00310) prevents melanin pigment formation by inhibiting tyrosinase activity, which in turn leads to depigmentation [167].Nicotinate and nicotinamide metabolism (oas00760) decreases melanin synthesis [168].Pyruvate metabolism (oas00620) inhibits melanin biogenesis [169].Glycerophospholipid metabolism (oas00564) and purine metabolism (oas00230) are associated with melanin deficiency [170].The dysregulation of DNA replication (oas03030) may induce melanocyte decay [171].Transcriptional misregulation in cancer (oas05202) is involved in the regulation of melanin deposition [172].Glycine, serine, and threonine metabolism (oas00260) may also play a role in the white phenotype [139,143].

Genes Associated with Wool Color
Based on the significantly different genes obtained using the multiple selection signal analysis method and the cross-pathway genes in the Sankey diagram of wool color-related pathways, we have obtained some genes reported to be related to wool color formation after checking the related research literature to confirm previous studies and to better reflect the accuracy of this paper.However, we must point out that genetic differences between breeds may also affect the association between genes and coat color, since the two sets of samples in this study cover different breeds.In addition to this, candidate genes may not produce consistent effects in coat color phenotypes across breeds.PDE4B is associated with melanin deposition [173].Nie et al. found that GMDS may be associated with skin color regulation from a genome-wide association analysis [174].GATA1 and RCOR1 are major transcription factors for melanin formation [175], and GATA1 affects the formation of the red phenotype [176,177].The MAPK4-mediated MAPK4/MAPK6 pathway affects melanocyte proliferation and differentiation [113].TECRL is associated with hyperpigmentation of the head [112].MAPK10 is an important gene involved in melanin synthesis [112].SLC36A2 may be associated with melanogenesis [178], and it also is a candidate gene for cream, pearl, and champagne dilution phenotypes in horsehair [179,180].Mutations in exon 2 of the SLC36A1 gene are a key locus responsible for the champagne coat color in horses [179].MAOA is associated with the leucism phenotype [94,181].NOS3 may theoretically reduce melanogenesis [182].PDE4B is involved in melanin synthesis [183,184] and were exclusively associated with tanning ability [184].The overexpression of GABRR1 inhibits melanin stem cell regeneration [185].The GRM5 gene is associated with skin and hair pigmentation [186,187].PPP3CA may be associated with color variation [188].PPP1CB was shown to regulate actin filament polymerization and/or reorganization [189], regulating the distribution of pigment granules.Mutations in ABCD4 cause hyperpigmentation of the skin, leading to lighter hair color [190,191].VSX2 affects human skin pigmentation [192] and is also a candidate gene for vertebrate retinal pigmentation [193,194].The ITCH gene regulates the expression of the ASIP gene, which leads to the white wool phenotype [195].NNT can inhibit melanogenesis by suppressing MITF gene expression [196].However, another study showed that NNT can inhibit redox-dependent hyperpigmentation by a mechanism independent of UVB and MITF [197,198].POLA1 has been associated with impaired pigmentation [199].PDE3A regulates the production of cAMP and cGMP, which in turn may affect pigment aggregation and dispersion through a number of termes and pathways [200,201].MCM6 is associated with pigmentation around the eyes of cattle [202] and may be related to the formation of orange and blue skin color in lizards [138].In addition to this, MCM6 influences the normal development of melanocytes through DNA replication [203][204][205].POLA1 has been shown to be associated with reticulate skin hyperpigmentation in humans [206,207] and may be related to hyperpigmentation in threespine stickleback [206].IGF1R belongs to a family of tyrosine kinase receptors on cell membranes, and its aberrant expression hinders melanocyte pigmentation, proliferation, and migration [208][209][210].HMGA2 can greatly strength melanocyte stem cell activation and translocation [211,212].MEIS1 leads to the formation of ectopic pigment cell clumps [213].HOXA10 gene upregulates the DKK1 gene to regulate skin pigmentation [214,215] and may influence the black and white hair follicle phenotype in goats [215].Abnormal function of DAO enzymes may be associated with a number of skin pigmentation disorders [216][217][218].
Based on the screened pathways associated with wool color, we constructed Sankey diagrams and queried the cross-pathway genes.For some cross-pathway genes that were unknown to be associated with wool color traits, it was hypothesized that CALML4, GRIN1, MYLK, FGF18, and FGFR2 were associated with non-white wool formation and that ACAT2, PCCB, ALDH6A1, and ACSS2 were associated with white wool formation.

Conclusions
We have identified a range of genes that play pivotal roles in the formation and regulation of wool color.Among them, PDE4B, GMDS, RCOR1, TECRL, MAPK10, SLC36A2, SLC36A1, MAOA, GABRR1, GRM5, PPP3CA, and PPP1CB were associated with melanin stem cell regeneration, melanocyte proliferation and differentiation, melanin synthesis and distribution, as well as color variation, affecting the formation of the non-white wool phenotype; ABCD4, VSX2, ITCH, NNT, POLA1, PDE3A, MCM6, POLA1, IGF1R, HMGA2, MEIS1, HOXA10, and DAO were involved in impediments to melanocyte pigmentation, proliferation, and migration, which influence the formation of the white wool phenotype.In addition, we found that some genes (CALML4, GRIN1, MYLK, FGF18, FGF2, ACAT2, PCCB, ALDH6A1, and ACSS2) may be involved in wool color formation, which needs to be further verified.Our results will help to better promote sheep wool improvement breeding, which is crucial for the development of the white wool industry in China.

Supplementary Materials:
The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/ani13203265/s1, Figure S1 S1: Gene annotation of top 5% outlier genomic regions from Fst and θπ ratio in non-white wool vs. white wool group; Table S2: Gene annotation of top 5% outlier genomic regions from Fst & θπ ratio in white wool vs. non-white wool group; Table S3: Gene annotation of top 5% outlier genomic regions from XP-EHH in non-white wool vs. white wool group; Table S4: Gene annotation of top 5% outlier genomic regions from XP-EHH in white wool vs. non-white wool group; Table S5: GO enrichment analysis in non-white wool vs. white wool group; Table S6: GO enrichment analysis in white wool vs. non-white wool group; Table S7: KEGG enrichment analysis in non-white wool vs. white wool group; Table S8: KEGG enrichment analysis in white wool vs. non-white wool group; Table S9: GO terms associated with non-white trait; Table S10: GO terms associated with white trait; Table S11: KEGG pathways associated with non-white trait; Table S12: KEGG pathways associated with white trait.Informed Consent Statement: Not applicable.

Figure 1 .
Figure 1.World distribution map of sheep breeds and population genetic structure analysis: (a) principal component analysis (PCA); (b) population structure analysis (Different colors represent different components of ancestry); (c) neighbor-joining tree; (d) cross-validation error.

Author Contributions:
Conceptualization, C.W.; methodology, C.W.; formal analysis, W.Z. and M.J.; investigation, W.Z., M.J., T.L., Z.L., H.W. and Z.Y.; resources, C.W. and Z.L.; writing-original draft preparation, W.Z.; writing-review and editing, M.J., H.W. and Z.L.; project administration, C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.Funding: This research was funded by the National Natural Science Foundation of China, grant number No. 32272851.Institutional Review Board Statement: The animal study protocol was approved by the Institutional Ethics Committee of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (IAS-CAAS) (protocol code IAS 2022-7 and 25 February 2022).

Table 1 .
Information on the sheep populations in this study.

Table 2 .
The distribution of SNP variants in the genome region.

Table 3 .
Genes associated with wool color inferred from Sankey diagrams.