Genome-Wide Runs of Homozygosity, Effective Population Size, and Detection of Positive Selection Signatures in Six Chinese Goat Breeds

Detection of selection footprints provides insight into the evolution process and the underlying mechanisms controlling the phenotypic diversity of traits that have been exposed to selection. Selection focused on certain characters, mapping certain genomic regions often shows a loss of genetic diversity with an increased level of homozygosity. Therefore, the runs of homozygosity (ROHs), homozygosity by descent (HBD), and effective population size (Ne) are effective tools for exploring the genetic diversity, understanding the demographic history, foretelling the signature of directional selection, and improving the breeding strategies to use and conserve genetic resources. We characterized the ROH, HBD, Ne, and signature of selection of six Chinese goat populations using single nucleotide polymorphism (SNP) 50K Illumina beadchips. Our results show an inverse relationship between the length and frequency of ROH. A long ROH length, higher level of inbreeding, long HBD segment, and smaller Ne in Guangfeng (GF) goats suggested intensive selection pressure and recent inbreeding in this breed. We identified six reproduction-related genes within the genomic regions with a high ROH frequency, of which two genes overlapped with a putative selection signature. The estimated pair-wise genetic differentiation (FST) among the populations is 9.60% and the inter- and intra-population molecular variations are 9.68% and 89.6%, respectively, indicating low to moderate genetic differentiation. Our selection signatures analysis revealed 54 loci harboring 86 putative candidate genes, with a strong signature of selection. Further analysis showed that several candidate genes, including MARF1, SYCP2, TMEM200C, SF1, ADCY1, and BMP5, are involved in goat fecundity. We identified 11 candidate genes by using cross-population extended haplotype homozygosity (XP-EHH) estimates, of which MARF1 and SF1 are under strong positive selection, as they are differentiated in high and low reproduction groups according to the three approaches used. Gene ontology enrichment analysis revealed that different biological pathways could be involved in the variation of fecundity in female goats. This study provides a new insight into the ROHs patterns for maintenance of within breed diversity and suggests a role of positive selection for genetic variation influencing fecundity in Chinese goat.


Introduction
Chinese goats have extensive genetic resources and an outstretched gene pool. Natural selection and artificial selection at different intensities over time, imposed by environmental changes and animal husbandry practices, have resulted in a considerable number of desirable traits such as extensive adaptability, outstanding prolificacy, and powerful disease and cold resistance [1]. The traceable unique genetic pattern remaining in the genomic regions of an individual under selection is termed their selection signature [2]. Selection focused on certain characters, mapping certain genomic regions, often shows reduced genetic diversity and stretches of homozygosity. Detection of this selection footmark in the genomic regions can provide information regarding the underlying genetic mechanisms of specific phenotypic traits to better guide animal breeding. The genetic diversity of animals is vital for promoting their current production potential in a diverse environment, changing breeding strategies, and sustainable genetic improvement. The lack of genetic variation resulting from breeding closely related individuals often leads to the expression of genes that are detrimental to reproduction or even survival. This is why intensive selection strategies have drawn the attention of the scientific community; interest exists in preserving, characterizing, and monitoring the autozygosity of important animals for sustainable livestock production [3,4].
Inbreeding is a concerning practice in the livestock industry, considerably influencing genomic patterns, hence designing a tactful breeding scheme is essential. Practically, inbreeding represents the level of homozygosity of a population or within the genome of an individual, which leads to offspring with loss of fitness. The inbreeding coefficient (F) is generally used to estimate the extent of individual's inbreeding, which is traditionally computed from pedigree information. In recent times, there is an increasing interest in estimating the inbreeding coefficient from runs of homozygosity (ROH) as it is likely to be the most powerful method in detecting inbreeding effects from among several alternative estimates. Molecular genetics has facilitated the estimation of the level of autozygosity at individual and population levels resulting from demography, natural and artificial selection, and inbreeding by checking the number of contiguous segments of the genome, referred to as runs of homozygosity (ROH) [5,6]. ROH is used to estimate the extent of identical haplotypes in the genome of an individual, transmitted from their parents [2]. A longer ROH indicates recent inbreeding, whereas a shorter ROH indicates a loss of genetic diversity resulting from the founder effect or a genetic bottleneck in a population. The selection pressure increase the homozygosity in the targeted genomic region leads to the occurrence of ROH [7]. Therefore, the genomic regions with high ROH frequency can be used to detect the association between the genes and traits of interest [8]. Characterization of the homozygosity by descent (HBD) segments associated with ROH, and estimation of the autozygosity in an individual's genome, have become popular techniques as these measures provide insights into the recent history of a population as well as trait architecture [9]. However, until now, the extent of the ROH across the genome of various goat breeds was poorly understood.
The effective population size (N e ) is the size of a hypothetical ideal population that has the same divergence of gene frequency under random genetic drift or an equal amount of inbreeding as in the real population under consideration [10]. The effective population size is an important parameter for evaluating population genetic diversity and characterizing and understanding the underlying genetic architecture of an animal genome [11]. The N e is crucial in conservation biology as it is used to estimate the rate of genetic drift and inbreeding and affects the systematic evolutionary forces such as selection, mutation, and migration [12]. The N e also provides information on population demographic processes such as migration and admixture, genetic variations, and linkage disequilibrium in a population [11,13].
Reproductive performance is characterized by fecundity, which is crucial to the goat industry. Improvement of reproductive traits, especially the increase in litter size, has attracted widespread interest because small improvements could lead to large gains in profit [14]. With the advancement of molecular genetic technologies, a series of techniques have been developed to identify evidence of selection. Numerous candidate genes and genetic loci have been identified in years using single nucleotide polymorphism (SNP) chips [15]. Due to the availability of the Illumina Goat SNP 50K

Data Quality Control
Quality control of the SNP data was implemented using PLINK v.1.07 [20,21]. The noninformative SNPs that were incompatible with the following criteria were removed from the panel: (1) SNP call rate greater than 95%, (2) an SNP minor allele frequency greater than 0.05, (3) an SNP with a genotyping rate greater than 95%, (4) a maximum individual missing genotype rate of more than 10%, or (5) Hardy-Weinberg equilibrium >1 × 10 −5 .

Distribution of ROH
ROHs were computed and characterized for each breed using detectRUNS packages in R software [20]. The following criteria were used to define ROHs: (1) a minimum number of SNPs = 15, (2) minimum length of ROH = 1 Mb, and (3) one possible heterozygote and one missing genotype were allowed for each ROH [22]. The number of ROHs per chromosome was estimated by counting the ROH number for each chromosome. The percentage of the chromosome covered by ROHs was calculated by the formula suggested by Al-Mamun et al. [23]: summing all the ROHs on a chromosome divided by the number of individuals containing ROHs on that chromosome produces the mean ROH length; the mean ROHs length of that chromosome is then divided by the respective chromosome length and multiplied by 100 to convert into a percentage. The total number, length of ROH (in Megabases), and the sum of all ROH segments (in Megabases) were calculated for each breed. We categorized the ROH lengths into six length classes (0-3, 3-5, 5-10, 10-20, 20-30, and >30 Mb) to compare the distribution of ROH length in each breed. We calculated the frequency and average ROH length for each length category. The average ROH length for each class of each breed was estimated by summing all ROH segments of each ROH length class per breed and dividing by the total number of individuals of that respective breed.

Inbreeding Coefficient
The genomic inbreeding coefficient (FROH) for each breed was computed using the following method [24]:

Data Quality Control
Quality control of the SNP data was implemented using PLINK v.1.07 [20,21]. The non-informative SNPs that were incompatible with the following criteria were removed from the panel: (1) SNP call rate greater than 95%, (2) an SNP minor allele frequency greater than 0.05, (3) an SNP with a genotyping rate greater than 95%, (4) a maximum individual missing genotype rate of more than 10%, or (5) Hardy-Weinberg equilibrium >1 × 10 −5 .

Distribution of ROH
ROHs were computed and characterized for each breed using detectRUNS packages in R software [20]. The following criteria were used to define ROHs: (1) a minimum number of SNPs = 15, (2) minimum length of ROH = 1 Mb, and (3) one possible heterozygote and one missing genotype were allowed for each ROH [22]. The number of ROHs per chromosome was estimated by counting the ROH number for each chromosome. The percentage of the chromosome covered by ROHs was calculated by the formula suggested by Al-Mamun et al. [23]: summing all the ROHs on a chromosome divided by the number of individuals containing ROHs on that chromosome produces the mean ROH length; the mean ROHs length of that chromosome is then divided by the respective chromosome length and multiplied by 100 to convert into a percentage. The total number, length of ROH (in Megabases), and the sum of all ROH segments (in Megabases) were calculated for each breed. We categorized the ROH lengths into six length classes (0-3, 3-5, 5-10, 10-20, 20-30, and >30 Mb) to compare the distribution of ROH length in each breed. We calculated the frequency and average ROH length for each length category. The average ROH length for each class of each breed was estimated by summing all ROH segments of each ROH length class per breed and dividing by the total number of individuals of that respective breed.

Inbreeding Coefficient
The genomic inbreeding coefficient (F ROH ) for each breed was computed using the following method [24]: where L ROH is the total length of ROH of each individual in the genome and L AUTO is the length of the autosomal genome of the goat (set to 2399.4 Mb [22]). To verify the accuracy of F ROH , we also calculated the inbreeding coefficient based on the difference between the expected and observed numbers of homozygous genotypes (F HOM ) with the command-het using PLINK v.1.07. The correlation between F ROH and F HOM was calculated across the breeds.

Detection of Common ROHs
To identify the genomic regions with a high ROH frequency, the percentage of occurrences of an SNP in an ROH was calculated by counting the number of times the SNP was detected in an ROH across the population. The SNPs showing a percentage higher than 45% were selected as genomic regions with a high frequency of ROHs for further analysis. We used the caprine reference genome annotation file from the NCBI website to annotate the genes identified at particular genome coordinates for all the selected regions [25].

Homozygosity by Descent (HBD)
A hidden Markov model (HMM)-based approach was implemented in R CRAN: (https://CRAN. Rproject.org/package=RZooRoH) to scan the individual genome for the HBD segments, as described in Solé and Gori [26] and Bertrand and Kadri [9]. A MixKR model was applicable, where nine HBD states with respective rates (R k ) of (2 1 , 2 2 , 2 3 , . . . , 2 9 ) and one non-HBD with an R k rate of 2 9 were used. The number of classes, K, was fixed for each run. Only one non-HBD class and K-1 HBD class for a total of K = 10 class were considered. The predefined rate of K-1 HBD classes always ranged 2 to 2 K-1 , whereas the rate of the non-HBD class was fixed as the most ancient class. Each K has its own rate parameter, R K , which indicates the lengths of the segments for its respective class. The length of HBD classes is exponentially distributed with rate R k , which is double the number of generations to the common ancestor of the respective class, and this is called a K.R model [26]. The length of the HBD segment is 1/R Morgans, indicating high rates associated with shorter segments. To estimate the inbreeding coefficient, we considered the ancestors with an R k rate higher than threshold T as unrelated. The corresponding genomic inbreeding coefficient (F G−T ) was then estimated, with R K ≤ T averaged over the whole genome.

Effective Population Size (N e )
Effective population size (N e ) was calculated for each breed separately using SNeP v1.1 [27]. N e estimates at different time points are based on linkage disequilibrium, using the formula proposed by Sved [28]: where N e is the effective population size, c is the genetic distance in Morgans, and E(r 2 ) is the expected average coefficient of determination (r 2 ) value for distance c. Each time point corresponding to a genetic distance was calculated as: T = 1 2c . Estimated effective population sizes (N e ) against their past generations were plotted in R over the last 100 and 1000 generations.

Population Differentiation, Analysis of Molecular Variance, and Structure
Reynolds's genetic distance and pairwise difference were computed using the integrated software for a different hierarchical level of population genetic data analysis, using ARLEQUIN v.3.5.2 [29] with Genes 2019, 10, 938 6 of 24 100 permutations and a significance level of 0.05. The analyses of molecular variance (AMOVA) to test the partition of genetic diversity were performed using ARLEQUIN v.3.5.2.

Principal Component Analysis
To investigate the genetic relationship between individuals, we used the principal component analysis (PCA). A pruned set of SNPs that were in approximate linkage equilibrium with each other were used to avert clustering [30]. The PLINK indep-pairwise option (indep-pairwise 50, 10, 0.1), where the command considers each window of 50 SNPs, was used to find the pruned set of SNPs [20]. The criterion of a pairwise linkage disequilibrium (r 2 ) value of <0.1 was used to obtain the SNP panel that approximates linkage equilibrium. In this study, PCA was performed using PLINK software (parameter: pca) and the output was visualized using R software.

Neighbor-Joining Tree
To further confirm the phylogenetic relationships among the breeds, we constructed a neighbor-joining (NJ) tree using MEGA v.5.0 [31]. We calculated the genetic distance matrix using PLINK v.1.09 [20,21] (parameter: distance-matrix) and then, based on this distance matrix, we constructed the Neighbor-joining tree using MEGA v.5.0.

Admixture Analysis
To examine the pattern of genetic variation among the breeds, a model-based clustering approach was applied using the ADMIXTURE program v.1.2 [32]. The ancestral source K refers to the number of populations used in this dataset. A five-fold cross validation (CV) error for each K was used to select the best K.

Selective Sweep Analysis
We used the population differentiation index (F ST ), nucleotide diversity (π), and cross-population extended haplotype homozygosity (XP-EHH) approach to identify the genomic regions that appear to be the target of selection. We categorized the goat breeds into high and low groups according to their reproductive performance, and compared the high fecundity (~200%) goat breeds group (JG, GF, NJ, and LP) with the low fecundity (~100%) goat breeds (LN and QG) to explore positive selection signatures.

Population Differentiation Index (F ST ) Approach
The population differentiation index F ST is an important indicator for testing the footprint of positive selection and elicit whether genetic differentiation exists between populations. The unbiased metric of pairwise F ST values were computed, as described by Weir et al. [33]. We then transformed Z into F ST values [34].

Nucleotide Diversity (π)
Another popular approach to calculate the degree of polymorphism within a population is nucleotide diversity (π). We calculated the site nucleotide diversity (π) as the proportion of pairwise differences between two populations, (π (low fecundity goat)/π (high fecundity goat)) to investigate the genetic variation between the groups.

Estimation of Cross-population extended haplotype homozygosity (XP-EHH)
The XP-EHH estimate is often used to detect a positive selection event, mainly to compare the haplotype homozygosity (EHH) and integrated haplotype score (iHS) between two populations. The basic principle of XP-EHH estimation involves scanning the distinct SNPs between populations that are homozygous for one and polymorphic for others through the comparison of the EHH score of two populations. A positive XP-EHH value indicates selection occurs in the test population; a negative value indicates selection in the control population. The calculation formula is where I A is the integrated value of the test population EHH and I B is the integrated value of the reference population EHH. The codes from the following website were used to calculate the XP-EHH: http://hgdp.uchicago.edu/software/xpehh.tar. The calculated raw XP-EHH statistics were then standardized to a distribution with zero mean and unit variance. The regions showing extremely high F ST values and elevated nucleotide diversity (top 1% of both) were considered to be potentially selected candidate outliers under selection. We also scanned the top 0.1% of XP-EHH estimates to improve the confidence in the selected outliers under selection.

Gene Annotation and Functional Enrichment Analysis
Based on the above findings, the cross top 1% values of F ST and π and 0.1% of XP-EHH were considered as selective signals and were extended up to 100 kb upstream and downstream to cover the candidate region. To annotate the candidate region, we downloaded the caprine gene from Ensemble (http://www.ensemble.org/) and annotated it using R v.3.5.1. The overlapping genes retrieved from the two analytical approaches were used for enrichment analysis. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the target genes were performed using a web-based toolset: g: profiler (https://biit.cs.ut.ee/gprofiler/gost) and KOBAS v.3.0 (http://kobas.cbi.pku.edu.cn/anno_iden.php), respectively. As only limited genes are annotated in the goat genomes, we used the human genome as the background for enrichment analysis.

Sample and SNP Filtration
From the 206 individuals tested, only two individuals (one from JG and another from LN) were removed from further analysis due to their low genotyping rate (Maximum individual missingness rate >0.1). The total genotyping rate in the remaining individuals was 0.98147. A total of 1338 SNPs with a call rate <0.95, 2296 SNPs with a minor allele frequency <0.05, and 1814 SNPs with a Hardy-Weinberg equilibrium (p < 1 × 10 −5 ) were excluded from the panel (Table 1). After the quality filtration and exclusion of SNPs from the sex chromosome, 45,311 SNPs from 204 individuals were used for downstream analyses. ROHs are the result of demography, natural and artificial selection, and inbreeding [6,35]. According to the parameters set, 18,066 ROHs were found in total, with a mean of 88.55 per individual, ranging from 11 (QG) to 244 (GF). LN presented the largest number of ROHs (n = 7425), followed by GF (n = 3415). GF displayed the longest ROHs on average (mean = 3.29 ± 0.001 Mb) and the shortest average ROH length was found in LP (1.95 ± 0.001 Mb). Among all the ROHs, the longest ROH was detected on chromosome 20 of the LP goat breed (67.54 Mb), consisting of 1304 SNPs (Table 2). A total of 812 ROHs were longer than 10 Mb (225 for GF, 78 for JG, 375 for LN, 80 for LP, 33 for NJ, and 21 for QG). Chromosome 1 exhibited the maximum number of ROHs (n = 1027), followed by chromosome 2 (n = 920) ( Figure 2). The total number of ROHs per chromosome tended to decrease with decreasing chromosome length. The highest and lowest percentages of ROHs per chromosome were calculated for chromosome 28 (6.23%) and chromosome 1 (1.72%), respectively ( Figure 2). According to the parameters set, 18,066 ROHs were found in total, with a mean of 88.55 per individual, ranging from 11 (QG) to 244 (GF). LN presented the largest number of ROHs (n = 7425), followed by GF (n = 3415). GF displayed the longest ROHs on average (mean = 3.29 ± 0.001 Mb) and the shortest average ROH length was found in LP (1.95 ± 0.001 Mb). Among all the ROHs, the longest ROH was detected on chromosome 20 of the LP goat breed (67.54 Mb), consisting of 1304 SNPs (Table 2). A total of 812 ROHs were longer than 10 Mb (225 for GF, 78 for JG, 375 for LN, 80 for LP, 33 for NJ, and 21 for QG). Chromosome 1 exhibited the maximum number of ROHs (n = 1027), followed by chromosome 2 (n = 920) ( Figure 2). The total number of ROHs per chromosome tended to decrease with decreasing chromosome length. The highest and lowest percentages of ROHs per chromosome were calculated for chromosome 28 (6.23%) and chromosome 1 (1.72%), respectively ( Figure 2).   Figure 3 shows the relationship between the number of ROHs and the total genomic length covered by ROHs per individual, which varies considerably among breeds. The QG breed presented with a large number of short ROHs relative to the other breeds, and the GF goat displayed some extreme individuals with ROH coverage of more than 600 Mb (Figure 3). Figure 3 shows the relationship between the number of ROHs and the total genomic length covered by ROHs per individual, which varies considerably among breeds. The QG breed presented with a large number of short ROHs relative to the other breeds, and the GF goat displayed some extreme individuals with ROH coverage of more than 600 Mb (Figure 3). The frequencies of ROH numbers in different length classes are illustrated in Figure 4A. Our results show an inverse relationship between ROH length and frequency. The frequencies of shorter ROHs (0-3) predominated; the number of these ROHs accounted for 82.72% of the total number of ROHs. However, the frequency of ROHs in this major class varied between breeds. The majority, more than 90% of the ROHs in this class, were obtained from LP breeds, whereas the percentage was at least 80% for GF, JG, NJ, and QG, and less than 80% for the LN breed. GF and LN had the highest frequencies of ROHs among all breeds in the length category greater than three. Figure 4B shows the distribution of ROH within breeds. Among the six ROH categories under consideration, the short ROHs (<3) are most prevalent across the populations, wherein the average ROH mean spanned from 31 (QG) to 175 (GF). The longest ROH category (>30) was the rarest, wherein the GF breed displayed the highest mean ROH compared with the other breeds. The frequencies of ROH numbers in different length classes are illustrated in Figure 4A. Our results show an inverse relationship between ROH length and frequency. The frequencies of shorter ROHs (0-3) predominated; the number of these ROHs accounted for 82.72% of the total number of ROHs. However, the frequency of ROHs in this major class varied between breeds. The majority, more than 90% of the ROHs in this class, were obtained from LP breeds, whereas the percentage was at least 80% for GF, JG, NJ, and QG, and less than 80% for the LN breed. GF and LN had the highest frequencies of ROHs among all breeds in the length category greater than three. Figure 4B shows the distribution of ROH within breeds. Among the six ROH categories under consideration, the short ROHs (<3) are most prevalent across the populations, wherein the average ROH mean spanned from 31 (QG) to 175 (GF). The longest ROH category (>30) was the rarest, wherein the GF breed displayed the highest mean ROH compared with the other breeds.  To investigate the ROH content in each breed, we also calculated the sum of all ROH lengths for each individual within the breeds and found that the LN had more ROHs than others ( Figure 5). Individuals of the LN breed generally exhibited the highest average sum of ROH content (256.03 Mb) on their genome, followed by GF (117.75 Mb) and LP (106.75 Mb) ( Figure 5, Table S2). In To investigate the ROH content in each breed, we also calculated the sum of all ROH lengths for each individual within the breeds and found that the LN had more ROHs than others ( Figure 5). Individuals of the LN breed generally exhibited the highest average sum of ROH content (256.03 Mb) on their genome, followed by GF (117.75 Mb) and LP (106.75 Mb) ( Figure 5, Table S2). In contrast, the QG individuals displayed the lowest average sum of ROH (22.20 Mb). To investigate the ROH content in each breed, we also calculated the sum of all ROH lengths for each individual within the breeds and found that the LN had more ROHs than others ( Figure 5). Individuals of the LN breed generally exhibited the highest average sum of ROH content (256.03 Mb) on their genome, followed by GF (117.75 Mb) and LP (106.75 Mb) ( Figure 5, Table S2). In contrast, the QG individuals displayed the lowest average sum of ROH (22.20 Mb).

Inbreeding Coefficient of Runs of Homozygosity (FROH)
The mean inbreeding coefficient estimates, with their range in variation and distribution in the studied goat breeds, are reported in Table 2 and Figure 6. The mean FROH for all breeds ranged from 0.026 (QG) to 0.19 (GF) ( Table 2). The GF goat is the most inbred goat, followed by LN (0.162) and LP (0.105). At the individual level, the individuals of the GF breed displayed the highest FROH (0.48), including some individuals with extreme values compared with other breeds.

Inbreeding Coefficient of Runs of Homozygosity (F ROH )
The mean inbreeding coefficient estimates, with their range in variation and distribution in the studied goat breeds, are reported in Table 2 and Figure 6. The mean F ROH for all breeds ranged from 0.026 (QG) to 0.19 (GF) ( Table 2). The GF goat is the most inbred goat, followed by LN (0. The FHOM results indicated positive values only for the GF and LP breeds. The FROH was higher in GF in both the short-and long-length classes, whereas LN showed a higher FROH in the mediumlength class ( Table 3). The correlations between FROH and FHOM were found to be highest for QG (0.99), followed by NJ (0.98) and GF (0.97) ( Table 2). The FROH was higher in GF in most of the ROH length classes, with a higher mean FROH across the classes (Table 3).   The F HOM results indicated positive values only for the GF and LP breeds. The F ROH was higher in GF in both the short-and long-length classes, whereas LN showed a higher F ROH in the medium-length class ( Table 3). The correlations between F ROH and F HOM were found to be highest for QG (0.99), followed by NJ (0.98) and GF (0.97) ( Table 2). The F ROH was higher in GF in most of the ROH length classes, with a higher mean FROH across the classes (Table 3).

Genomic Regions with a High ROH Frequency
The most common genomic regions associated with ROHs in the six goat populations were detected, and the percentage of SNPs in ROHs were assessed by calculating the frequency of SNPs occurring in those ROHs across individuals. The output was plotted against the position of the SNP along the chromosome (Figure 7). A total of 97 genomic regions were identified (Table S3). Here, we focused on the selected regions and investigated if the identified regions coincided with the regions harboring the genes involved in reproduction. We identified six genes in the selected ROH regions that are associated with reproduction. We found that the ROHs on chromosomes 25 and 13 overlapped with regions detected by the selection signature, which spanned MARF1 and SYCP2, respectively. Another two regions on chromosomes 4 and 24 were near to the region detected by the selection signature, which contain ADCY1 and TMEM200C, respectively; these genes are also associated with reproductive processes.

Homozygosity by Descent (HBD) Classes
The inheritance of two copies of the same chromosomal segments in an individual's genome from an ancestor tracing back to different periods is referred to as homozygosity by descent (HBD) segments. The variable sizes of these segments are associated with the different ancestors that can be traced back to different generations in the past. The rate of the HBD class is inversely associated with the size of the segments. Long HBD segments correspond to recent common ancestors, whereas short segments indicate autozygosity inherited from ancient common ancestors. Figure 8 presents the partitioning of individual genomes in seven different HBD classes. A difference was observed in terms of partitioning. The individuals from JG and QG showed a limited amount of autozygosity associated with small and ancient HBD segments (indicated in blue). The GF breed displayed more variation in autozygosity and length of HBD segments. The most inbred GF population exhibited an autozygosity level of more than 0.4, mainly associated with a distant ancestor, with indications that the ancestors contributed traces back 128 generations. Some individuals of GF exhibited HBD segments from recent common ancestors. LN individuals showed a higher level of autozygosity compare with the LP and NJ breeds, but this is associated with more distant ancestors (with the HBD classes with ratings of 16 to 128).

Homozygosity by Descent (HBD) Classes
The inheritance of two copies of the same chromosomal segments in an individual's genome from an ancestor tracing back to different periods is referred to as homozygosity by descent (HBD) segments. The variable sizes of these segments are associated with the different ancestors that can be traced back to different generations in the past. The rate of the HBD class is inversely associated with the size of the segments. Long HBD segments correspond to recent common ancestors, whereas short segments indicate autozygosity inherited from ancient common ancestors. Figure 8 presents the partitioning of individual genomes in seven different HBD classes. A difference was observed in terms of partitioning. The individuals from JG and QG showed a limited amount of autozygosity associated with small and ancient HBD segments (indicated in blue). The GF breed displayed more variation in autozygosity and length of HBD segments. The most inbred GF population exhibited an autozygosity level of more than 0.4, mainly associated with a distant ancestor, with indications that the ancestors contributed traces back 128 generations. Some individuals of GF exhibited HBD segments from recent common ancestors. LN individuals showed a higher level of autozygosity compare with the LP and NJ breeds, but this is associated with more distant ancestors (with the HBD classes with ratings of 16 to 128).
be traced back to different generations in the past. The rate of the HBD class is inversely associated with the size of the segments. Long HBD segments correspond to recent common ancestors, whereas short segments indicate autozygosity inherited from ancient common ancestors. Figure 8 presents the partitioning of individual genomes in seven different HBD classes. A difference was observed in terms of partitioning. The individuals from JG and QG showed a limited amount of autozygosity associated with small and ancient HBD segments (indicated in blue). The GF breed displayed more variation in autozygosity and length of HBD segments. The most inbred GF population exhibited an autozygosity level of more than 0.4, mainly associated with a distant ancestor, with indications that the ancestors contributed traces back 128 generations. Some individuals of GF exhibited HBD segments from recent common ancestors. LN individuals showed a higher level of autozygosity compare with the LP and NJ breeds, but this is associated with more distant ancestors (with the HBD classes with ratings of 16 to 128).  The average inbreeding coefficient was estimated using the HBD class. GF goats displayed the highest average inbreeding coefficient, followed by the LN breed. An increasing trend of inbreeding coefficient with ancient classes was observed across the breeds, whereas the GF goats showed a rapid increase in the inbreeding coefficient from 0.00 for an estimate based on the first class (R K = 2) to 0.09 with R K ≤ 8, and then increasing marginally up to R K ≤ 64. The highest value was observed at R K = 512 (0.169) (Figure 9). A similar pattern for the inbreeding coefficient was found for LN: rapidly increasing up to R K ≤ 64, and the highest inbreeding coefficient was found at R K ≤ 128 (0.134). The average inbreeding coefficient was estimated using the HBD class. GF goats displayed the highest average inbreeding coefficient, followed by the LN breed. An increasing trend of inbreeding coefficient with ancient classes was observed across the breeds, whereas the GF goats showed a rapid increase in the inbreeding coefficient from 0.00 for an estimate based on the first class (RK = 2) to 0.09 with RK ≤ 8, and then increasing marginally up to RK ≤ 64. The highest value was observed at RK = 512 (0.169) (Figure 9). A similar pattern for the inbreeding coefficient was found for LN: rapidly increasing up to RK ≤ 64, and the highest inbreeding coefficient was found at RK ≤ 128 (0.134).

Effective Population Size (Ne)
Ancestral and recent effective population sizes (Ne) for six goat populations are presented in Figure 10. Estimated Ne showed a downward trend with the increase in generations across the populations. The most rapidly declining recent Ne was found in the GF and LN breeds, whereas JG and QG showed a slowly declining trend in Ne (Figure 10b).

Effective Population Size (N e )
Ancestral and recent effective population sizes (N e ) for six goat populations are presented in Figure 10. Estimated N e showed a downward trend with the increase in generations across the populations. The most rapidly declining recent N e was found in the GF and LN breeds, whereas JG and QG showed a slowly declining trend in N e (Figure 10b). Figure 9. Value of the threshold T used to estimate FG-T (using HBD classes with RK ≤ T).

Effective Population Size (Ne)
Ancestral and recent effective population sizes (Ne) for six goat populations are presented in Figure 10. Estimated Ne showed a downward trend with the increase in generations across the populations. The most rapidly declining recent Ne was found in the GF and LN breeds, whereas JG and QG showed a slowly declining trend in Ne (Figure 10b).

Population Differentiation, AMOVA, and Structure
Population genetic differentiation was evaluated by calculating Reynolds' genetic distance and pairwise differences. Our result showed that the Reynolds' genetic distance (DR) values vary from 0.02116 (NJ-QG) to 0.20055 (LN-LP), whereas the highest and lowest pairwise differences were found for LN-LP (0.18172) and NJ-QG (0.02099), respectively (Table 4).

Population Differentiation, AMOVA, and Structure
Population genetic differentiation was evaluated by calculating Reynolds' genetic distance and pairwise differences. Our result showed that the Reynolds' genetic distance (D R ) values vary from 0.02116 (NJ-QG) to 0.20055 (LN-LP), whereas the highest and lowest pairwise differences were found for LN-LP (0.18172) and NJ-QG (0.02099), respectively (Table 4). Analysis of molecular variance (AMOVA) illustrated that intra-population variation accounted for 89.61% of the total variation, with 9.68% (p < 0.001) of the variation being inter-population (Table S4).

Distinct Population Structure Pattern
The PCA results showed that the first principal component (PC1) accounted for 18.09% of the genetic variation, resulting in the clear segregation of LN and a clear separation of the LP breed in the positive direction of the second principal component (PC2), which explained 12.50% of the total variance ( Figure 11A). The NJ breed is clustered together with QG and a degree of overlap was detected between JG and GF, which is in accordance with their geographic distribution in China (Figure 1). This finding is consistent with the result of the NJ tree, wherein the LN and LP breeds formed genetically distinct groups ( Figure 11B). The NJ and QG goat breeds are clustered together and the degree of differentiation was low between JG and GF in the NJ tree analysis ( Figure 11B). To further verify the PCA results, we performed population admixture analysis, where the least amount of cross-validation error occurred when K = 5 ( Figure 11C), which indicates that K = 5 was the optimal modeling choice. Our admixture result showed that the LN and LP goat breeds displayed a separate group when K = 3. However, the QG admixed with NJ and JG, and GF goat breeds are mixed with each other ( Figure 11D) when K = 4 or 5. The results of the admixture were consistent with the results of PCA and the NJ tree.

Genome-Wide Selective Sweep Analysis in High-Fecundity Goat Breeds (High vs. Low)
The regions falling within the upper 1% of the empirical distribution of F ST (F ST > 0.261025) and nucleotide diversity (π) were scanned as candidate regions under selection. Based on the reference genome annotation, a gene that overlapped with an outlier of the top 1% of both the F ST and π estimates was deemed to be candidate gene under positive selection. We identified 460 candidate loci, of which a subset of 54 loci harbored 86 putative candidate genes with the signature of positive selection (Table S5). To improve confidence, we also scanned the top 0.1% of XP-EHH estimations. We found two candidate genes were highly differentiated between the high and low-reproduction groups according to three approaches used (Figure 12, Figure 13) that might have experienced strong positive selection. Based on their biological functions, associated pathways, and information from published studies, several genes were found possibly responsible for the reproduction trait in goats and are thus annotated in the Manhattan plot of F ST , log 2 (θπ ratio) and XP-EHH ( Figure 13). Table 5 shows the potential candidate genes with their loci.

GO Annotation and KEGG Pathway of the Target Genes
Functional enrichment of the target genes revealed that the genes were significantly enriched in 20 GO terms in molecular functions, 103 terms in biological processes, and 49 terms in cellular components (Table S6). The most significant GO terms were found in metabolic processes, regulation of biological processes, developmental processes, and reproductive processes in BP, binding (GO: 0005488) in molecular functions, and intracellular organelles (GO: 0043229) in cellular components ( Figure 14A).

GO Annotation and KEGG Pathway of the Target Genes
Functional enrichment of the target genes revealed that the genes were significantly enriched in 20 GO terms in molecular functions, 103 terms in biological processes, and 49 terms in cellular components (Table S6). The most significant GO terms were found in metabolic processes, regulation of biological processes, developmental processes, and reproductive processes in BP, binding (GO: 0005488) in molecular functions, and intracellular organelles (GO: 0043229) in cellular components ( Figure 14A).

Discussion
The detection of selection footprints in the genomic region has the potential to be used to identify genes and mutations associated with economically important phenotypic traits of livestock species. As the level of genetic diversity represents the raw materials for breed improvement, elucidating the genetic diversity of a population can provide insights for improving breeding strategies. ROH, HBD, and Ne are useful tools for exploring genetic diversity, providing information about population demographics evolution over time, and predicting underlying genome architecture. The developed genome-wide goat SNP arrays have become the marker of choice for investigating underlying genetic diversity, inferring population demography, and mapping genomic regions subject to selection [36]. In this study, we examined the genome-wide

Discussion
The detection of selection footprints in the genomic region has the potential to be used to identify genes and mutations associated with economically important phenotypic traits of livestock species. As the level of genetic diversity represents the raw materials for breed improvement, elucidating the genetic diversity of a population can provide insights for improving breeding strategies. ROH, HBD, and N e are useful tools for exploring genetic diversity, providing information about population demographics evolution over time, and predicting underlying genome architecture. The developed genome-wide goat SNP arrays have become the marker of choice for investigating underlying genetic diversity, inferring population demography, and mapping genomic regions subject to selection [36]. In this study, we examined the genome-wide runs of homozygosity, effective population size, and signature of positive selection in six Chinese goat populations using goat SNP 50 K chips.
Our findings showed that the ROHs are frequent across the populations. The distribution of ROHs per chromosome displayed a specific pattern: the greatest number was found in the first three chromosomes, the ROH number tended to decrease with decreasing chromosome length, and the smallest number was found on chromosome 28, consisting of 284 segments. Our results are consistent with those reported for Valle del Belice sheep [37]. The percentage of coverage per chromosome was quite different in the different populations, which suggests it may be breed-specific.
The higher sum of ROH length across the GF goats indicated their lower level of genetic diversity. In general, all breeds showed their majority of average ROHs in the 0-3 Mb length class, which is in agreement with the results obtained for Spanish goats [34]. A different ROH distribution pattern was noted for GF goats, which displayed a high mean ROH in the long length category (>30 Mb), which is indicative of demographic decline and recent inbreeding [38]. Thus, the accumulation of long ROHs in the genome of the GF breed enables them to carry deleterious mutations in homozygous form [39]. The highest value of F ROH in GF, suggesting high inbreeding in this goat breed, is the signature of the extensive use of few bucks within herds. Consequently, widespread mating between relatives occurred, which might have contributed to the high proportion of fixed alleles, resulting in low genetic diversity in the GF population. Attention should be paid to this goat breed to prevent a loss of goat genetic resources. A large contribution to autozygosity was observed in GF goats in the HBD class, tracing back to different generations, which indicates a reduced effective population size, possibly due to a bottleneck or the founder effect. A similar trend in autozygosity was observed for sheep [40]. The limited amount of autozygosity in JG and QG goat populations suggested a large N e in the past. We found a declining trend in N e from 1000 to 100 generations ago across the studied populations. This decreasing trend in N e might be associated with human migration with goats that subsequently led to breed formation. The most rapid decline in N e was observed over the last 100 generations in all breeds, suggesting that significant bottlenecks occurred at the time of domestication, breed formation, and selection within these breeds [41,42]. The elevated N e estimates in JG, LP, NJ, and QG in all generations compared with GF and LN might be due to population admixture within these breeds. Brito and Jafarikia [43] estimated the N e in Australian and Canadian goats and found a similar trend of declining N e over the past generations, and an accelerated decline in N e over the past 100 generations. Comparatively, a lower N e was found in GF and LN, with estimated values were 42 and 85, respectively, 13 generations ago, which could be due to intensive selection pressure or artificial insemination used to develop these breeds. N e values for Argentinian, French, and South African goat breeds were reported to be 57, 67, and 93, respectively, at 10 generations ago [44], which is consistent with our findings. The N e slope for GF and LN in Figure 10b indicates constantly decreasing population sizes for these breeds, suggesting that action is needed to maintain a sufficiently large N e .
All the pairwise F ST values between populations were statistically significant (p < 0.05). The pair-wise F ST values ranged from 0.02094 (NJ-QG) to 0.18172 (LN-LP), revealing the least differentiation between NJ and QG, whereas the highest differentiation was between LN and LP. In terms of genetic distance, NJ and QG are most closely related (D R = 0.02116), whereas LN and LP appear most distinct (D R = 0.20055), which is possibly due to shared common ancestry between NJ and QG and the geographical isolation of LP goats from other breeds. The LN is the most popular goat breed due to its cashmere trait and has experienced strong selection pressure. The LP breed is traditionally reared in the Yunnan province, in the southwest part of China. The Hengduan Mountains in between the Qinghai-Tibet Plateau and Yunnan-Guizhou Plateau have restricted livestock gene flow between Southwestern and Northern China. The results are similar to those reported for Ethiopian sheep breeds [45]. The AMOVA revealed that 89.61% of the genetic variation was within populations and 9.68% (p < 0.001) was between populations. This is a little higher than the within-population variation and slightly lower than between-population variation (87.86% and 11.86%) observed for South African, French, and Argentinian goat populations [44], and similar (89.83% and 7.49%, respectively) to South African and Italian goat breeds in terms of within breed variation, as reported by Nicoloso and Bomba [13].

Candidate Genes in the ROH Regions
In this paper, we do not discuss all the genomic regions associated with a high ROH frequency; we focused on some selected regions that showed associations with the reproductive traits in goats. We identified eight genes reported to be associated with the reproductive traits of goats (Table 6) of which two genes (MARF1 and SYCP2) overlapped with the genes identified by the selection signature. The GDF9 gene was identified on chromosome 7 of GF goats, which plays a crucial role in early folliculogenesis in female mammals [46]. GDF7, which is the member of the BMP family and is required for seminal vesicle development, was detected on chromosome 11 [47]. INHA, located on chromosome 2, was reported as a candidate gene for litter size in goats [48]. We identified another gene, MTHFSD, on chromosome 18 of LP goats, which was reported to be involved in the variation of litter size [49]. MARF1, which is essential for the development of competent oocytes for successful fertilization, was detected on chromosome 25 [50]. On chromosome 13 of GF goats, we identified the SYCP2 gene, which is required for maintaining normal male and female fertility [51]. Another two genes (TMEM200C and ADCY1), which we identified by selection signature detection, were also close to the ROH region and involved in reproductive functions.

Signature of Selection
For selection signature scanning, the six goat populations were divided into two groups based on their genetic pattern and fecundity rate collected from the records and literature. Phylogenic analyses supported the relatedness among the breeds of the high fecundity group and clearly separated the low from the high group. A strong genetic relationship was observed between NJ and QG as they are grouped closely in the neighbor-joining tree, indicating a high degree of gene flow due to being near to each other in geographic location. The tree also showed that the GF and JG breeds form a group in the same clade, suggesting that these goat breeds share common ancestry. This result is supported by the PCA, where JG and GF are clustered together. Population admixture analysis generated some signals of admixture and underlying genetic relationships among the populations. The high fecundity group included JG, GF, LP, and NJ (~200%), which have a distinct pattern, although the NJ goat was admixed with QG goat in the admixture result. The low fecundity group contained LN and QG (~100%), and the LN goat was separated from the others in all the phylogenetic analyses.
Selective sweep based on population differentiation (F ST ) and nucleotide diversity (π) for the high and low reproduction groups of goat breeds was detected. To minimize the false positive, we selected only the overlapping genes on the top 1% of F ST and Pi approaches. To find the high-quality candidate genes, we also scanned 0.1% empirical distribution of XP-EHH estimates and selected the overlapping genes under these three approaches as strong candidate genes. Candidate genes associated with reproduction were identified in different genomic loci under selective sweep. Locus 13,838,569 of chromosome 25 had the highest F ST value (F ST = 0.419, ZF ST = 6.129) and an elevated π ratio and XP-EHH (θπ ratio = 5.00, XP-EHH = 3.486; Table 5). Based on the genome annotation, this locus encompasses MARF1 (Meiosis arrest female 1), which is required for the process of controlling meiosis and the development of healthy offspring. Accurate completion of the meiotic process, cytoplasmic maturational events, and genomic integrity is essential for the production of oocytes suitable for fertilization and embryogenesis. Genetic control of these events is vital for successful reproduction. A previous study showed that MARF1 is essential for the development of fertilization competent oocyte, and mutations lead to female infertility in mammals [50]. TMEM200C (Transmembrane Protein 200C) was identified at locus 39,657,370 in chromosome 24 (F ST = 0.306, ZF ST = 4.281, θπ ratio = 2.409). TMEM200C is commonly involved in the communication of a cell with its external environment. A Genome-Wide Association Studies reported that an SNP in the proximity of the TMEM200C gene is significantly associated with the twinning rate in Maremmana cattle [52].The candidate gene SF1 (Steroidogenic factor-1), located at locus 43,352,393 on chromosome 29 (F ST = 0.305, ZF ST = 4.262, θπ ratio = 3.450, XP-EHH = 2.903), is required for the formation of gonads [53] and to maintain normal reproduction [54]. The distant locus 55,939,729 of chromosome 13 showed high differentiation (F ST = 0.329, ZF ST = 4.647, θπ ratio = 2.346) and was detected as a putative selection signal. This locus contains SYCP2 (Synaptonemal Complex Protein 2), which is a proteinaceous structure required for the normal meiotic fusion of oocytes and spermatocyte formation and to maintain normal male and female fertility. A study suggested that SYCP2L, which is the paralog of SYCP2, plays a significant role in the survival of oocytes and regulates reproductive aging in females [51]. We detected ADCY1 (Adenylate cyclase 1), which showed evidence for positive selection for reproduction and is located at locus 43,874,883 on chromosome 4 (F ST = 0.301, ZF ST = 4.193, θπ ratio = 2.829). The candidate gene ADCY1 encodes adenylate cyclase 1, which is involved in the formation of cyclic adenosine monophosphate (cAMP) cyclizing AMP, which subsequently partakes in oocyte meiotic arrest and suppresses its resumption [55]. BMP5 (Bone Morphogenetic Protein 5) was identified in the proximity of locus 44,074,688 (F ST = 0.286, ZF ST = 3.946, θπ ratio = 2.635) of chromosome 23 and is involved in the hippo signaling pathway, which plays a significant role in controlling follicular growth and ovulation in livestock [56]. BMP5 belongs to the BMP subfamily and plays a crucial role in ovarian folliculogenesis by enhancing proliferation of granulosa cells, which is associated with the growing female gamete in the mammalian ovary [57], indicating that this gene is more likely to be under selection. MARF1 and SF1 polymorphisms are noticeable as they are highly differentiated between the high-and low-reproduction groups under our three approaches used.
Functional enrichment analysis is used to explore a controlled vocabulary to extensively describe the characteristics of genes and gene products. According to our enrichment analysis results, several GO and pathways were directly or indirectly involved with the reproduction of animals, although some of the pathways could not pass the significance threshold (p < 0.05). The selected genes were mostly enriched in biological processes, including metabolic processes, regulation of biological processes, developmental processes, and reproductive processes. The most significant GO terms were found binding (GO: 0005488) in molecular functions and intracellular organelles (GO: 0043229) in cellular components. The pathway analysis revealed the involvement of selected genes in reproduction. The mitogen-activated protein kinase (MAPK) pathway has important roles in oocyte maturation and eventually in ovulation by regulating cyclic guanosine monophosphate production and its transportation to the oocyte [58,59]. Other enriched pathways, such as bile secretion (hsa04976), retrograde endocannabinoid signaling (hsa04723), the hippo signaling pathway (hsa04390), the oxytocin signaling pathway (hsa04921), and ovarian steroidogenesis (hsa04913), play significant roles in the processes of oocyte maturation, regulation of follicular growth, and ovulation in livestock [56]. Although some of the candidate genes were enriched in none of the significant pathways, these genes were deemed to be candidate genes for reproduction due to their biological functions and information from published studies.

Conclusions
In this study, we investigated the patterns of homozygosity and the signature of positive selection in six Chinese goat populations. Demographic decline, the highest level of inbreeding and severe founder effects are the signatures of an increased level of homozygosity in Guangfeng goat while population admixture is associated with the opposite effect in other goat breeds. The existence of two genes in the ROH regions coincided with the regions detected by the selection signal demonstrated that the homozygosity is not solely the result of demography instead of positive selection. Several genes were identified as candidate genes by F ST , π and XP-EHH algorithm of which MARF1 and SF1 genes are highly differentiated between high and low reproduction groups according to the three approaches used. The information about the genetic status and the identified genes under positive selection may be useful for further genetic improvement and setting conservation program for these important goat breeds. Further studies relying on high-density markers are necessary to evaluate the level of genetic autozygosity and signature of selection of the goat breed.