Suitability of Pedigree Information and Genomic Methods for Analyzing Inbreeding of Polish Cold-Blooded Horses Covered by Conservation Programs

Traditionally, pedigree-based relationship coefficients were used to manage inbreeding and control inbreeding depression that occurs within populations. The extensive incorporation of genomic data in livestock breeding creates the opportunity to develop and implement methods to manage populations at the genomic level. Consequently, the realized proportion of the genome that two individuals share can be more accurately estimated instead of using pedigree information to estimate the expected proportion of shared alleles. To make use of this improvement, in this study we evaluated the genomic inbreeding measures in the Polish conserved cold-blooded horse population and compared the data with the traditional measures of inbreeding. Additionally, an ancestry fractions/proportions from Admixture software were tested as an estimate of lineage (ancestry coefficient) used for horses qualifying for the conservation program. The highest correlation of pedigree-based (FPED) and genomic inbreeding estimates was found for FROH (runs of homozygosity-based F coefficient) and FUNI (F coefficient based on the correlation between uniting gametes). FROH correlation with FPED tended to increase as the number of generations registered as pedigree increased. While lineage and gene contributions (Q) from Admixture software correlated, they showed poor direct compliance; hence, Q-value cannot be recommended as the estimate of pedigree-based lineage. All these findings suggest that the methods of genomics should be considered as an alternative or support in the analysis of population structure in conservative breeding that can help control inbreeding in rare horse populations.


Introduction
The accuracy of inbreeding estimations based on pedigree information depends on the precision of breeding documentation. However, mistakes often appear due to imprecise records and limited knowledge of the origin. Ancestry information is difficult to obtain even for some registered animals because (1) one or more paths of their pedigrees may trace to or through foreign herd books, and (2) early ancestry information may not be recorded electronically [1]. Research has often ignored these problems and assumed that animals are not inbred and are unrelated if ancestry is unavailable [1]. Therefore, there is a need to develop a method that will allow a more accurate determination of the degree of inbreeding, and thus the risk status. This is especially important in small local populations possessing unique and valuable features.
In recent years, the results of genomic and pedigree-based studies have been compared to estimate inbreeding, especially for populations in danger of extinction. Kardos et al. [2] used computer simulations to test whether the realized proportion of the genome that is identical by descent is predicted better by the pedigree inbreeding coefficient or by genomic measures of inbreeding. Results show that marker-based measures of IBDG are substantially more precise and often less biased than FP. Ablondi et al. [3] evaluated the loss of genetic variability in the Bardigiano breed, based on linkage disequilibrium and provided the first genome-wide scan of genetic diversity and selection signatures in an Italian native horse breed, using the average inbreeding based on runs of homozygosity (ROH). Other work, completed by Mancin et al. [4], investigated the genetic diversity in the Italian Heavy Horse breed by using pedigree and genomic data. Pedigree information allowed reliable estimations of inbreeding values, resulting in medium to high correlations with genomic inbreeding. Sciavo et al. [5] analyzed the distribution of ROH in pig breeds, and found that ROH better captured inbreeding information in the analyzed breeds and could complement pedigree-based inbreeding coefficients for the management of these genetic resources.
Cold-blooded horses appeared in Poland in the second half of the 19th century. It was related to the economic exigence and associated with the import of sires from Western Europe. The main breeds included Ardennes, Belgian and Breton, while the less used breeds included North Swedish, Russian, Døle and Mur-insulan, as well as single documented cases of Fjord and Canadian horses and Boulonnais and Jutland stallions [6,7]. In the first years of the 20th century the Sokólski Center was established in the north-east region of Poland, where Breton and Norfolk-Breton stallions were allowed to mate [8]. Another center was created for Sztumski horses in East Prussia. There, cold-blooded mares were mated with German (Rhine-Belgian), French and Belgian stallions, contributing to the creation of the heaviest type of cold-blooded horses in Poland. In the first volume of the Polish stud book, edited in 1964 [9], there were 167 mares and 147 stallions of the Sokólski type, as well as 268 mares and 204 stallions of Sztumski type, which are breeds that are still present in horse pedigrees. According to Chrzanowski [10], in the 1970s there were about 600 lines established by Ardennes, Belgian and Breton stallions located in north-east Poland.
The pedigree information of these populations can be incomplete as the divisions into local breeds were removed in the second volume of the stud book [11]. All individuals were identified as Polish draft horses, which meant that local breeds of cold-blooded horses did not exist officially for almost 50 years. Currently, in the conservation programs, introduced in 2008 [12,13] the main objectives have been to maintain the genetic variability.
In the genetic resources conservation program, only horses that have been entered in the stud book of the Polish draft horse, have an appropriate pedigree and have a conformation that conforms to the breed standard can participate. In the gene pool of Sztumski and Sokólski horses the most important contribution has come from Ardennes, Polish cold-blooded (z) and unknown (NN) horses, followed by Belgian horses. This influenced a relatively high number of founders, namely 1139 for Sztumskis and 1118 for the Sokólski population, with the effective number of founders of 156.9 and 111.4, respectively. Additionally, the data indicates that only one third (35.8% Sztumski; 38.1% Sokólskie) of the gene pool in both populations comes from local horses, and as much as two thirds from imports [14]. It is known that there was a relationship in the group of imported stallions, especially Ardennes, but it was not displayed in the database, and thus their inbreeding was considered equal to zero. This means that the volume of inbreeding could be underestimated. The analysis of the mean coefficient of inbreeding indicated 1.54 for Sztumski and 1.56 for Sokólski horses [14], with a slight upward trend, not exceeding 0.29-0.31%/year and ranging, for single animals, between 0% and 32.7%. The main group of Sztumski (71%) and Sokólski (77%) horses accounted for the lowest range of inbreeding, from ≥0 to 3.13%. Only 2% of the horses with an inbreeding coefficient exceeding 10% were found. The high level of inbreeding often resulted from the incestuous mating between parents and progeny or from the mating of half-siblings [14].
The generation intervals for both populations were similar-7.46 years for Sztumski and 7.24 for Sokólski horses. The pedigree completeness for five generations was respectively 98.79 and 98.24 and for all known generations, namely 16, was 51.11% and 45% [14]. From the fourth generation back they appeared as imported horses, the pedigrees of which, according to the assumptions, were not entered into the database and these foreign horses were considered as founders.
The aim of this study is to assess the concordance and mutual relationships between pedigree-based and single nucleotide polymorphism (SNP)-based inbreeding coefficients as well as between ancestry coefficients calculated using pedigree and genomic data. The results of this analysis will give a new toolset for the horse resources conservation program in Poland and will allow the assessment of the reliability and usefulness of genomic data for the maintenance of conserved horse populations.

Materials and Methods
In this study we compared and analyzed in detail the inbreeding estimates calculated based on pedigree data (F PED ) and so called genomic inbreeding (realized inbreeding) measures expressed by four different coefficients.
The data for the pedigree analysis came from the reference population, which consisted of 1694 Sokólski horses and 2042 Sztumski horses participating in genetic resources conservation programs in 2020 (Table 1). The reference population is the one for which we defined the parameters, i.e., inbreeding coefficient. In this case they are the population covered by the conservation programs in 2020, according to the information contained in Table 1. The population of horses born between 1991 and 2018 is also the entire population of Sztumski and Sokólski horses participating in the protection programs. They are all simultaneously existing horses while also belonging to these populations. The pedigree data used in this study was provided by the Polish Horse Breeders Association. All the available pedigree information was entered into the Bio_konie horse database of the National Research Institute of Animal Production. The data set consists of 6531 pedigree horses born between 1991 and 2018. In total, in the database there was collected information on 30,331 horses that were (a) participating in conservation programs from 2008 to 2020 and (b) all their available ancestors. The pedigrees of 3736 horses of the reference population were traced back to the earliest recorded ancestors. It was assumed that the foreign animals (imported stallions and mares), which were found in the pedigrees, were considered as the founders, and their ancestors were not included in the database. Ancestors born in Poland of unknown origin (NN) were also considered as founders. The inbreeding coefficient was computed using a model developed by Meuwissen and Lou [15].
The study of genomic inbreeding was based on 175 cold-blooded horses, belonging to two types, namely Sokólski (n = 106) and Sztumski (n = 69), which were previously analyzed for genetic differentiation [15] and selection signatures [16].
Genomic inbreeding was evaluated based on Neogen Equine Community array (Illumina, San Diego, CA, USA) data, including genotypes for 65,157 SNPs with an average inter-marker distance in EquCab2.0 genome of 36.3 kb. SNP data were filtered as previously described [16]. In brief, the filters included a MAF (minor allele frequency) threshold of 5% and <20% of missing genotypes in the whole studied population. Additionally, SNPs with critical p-values for Hardy-Weinberg equilibrium (HWE) <1.0 × 10 −6 in each breed separately were excluded. The final SNP panel of 52,023 markers was scattered across the horse genome (EquCab2.0; https://www.ncbi.nlm.nih.gov/assembly/GCF_000002305.2/) with an average inter-marker distance of 43.0 kb. For ROH detection, the filtered SNPs set was remapped to the EquCab3.0 (https://www.ncbi.nlm.nih.gov/assembly/GCF_002863 925.1/) genome assembly using the UCSC Genome Browser and Lift Genome Annotations tool. This removed another 109 SNPs that did not map to the newer assembly. Based on these filtered data, four inbreeding measures were calculated, similarly as in our previous study [16,17]: (i) the usual variance-standardized relationship minus 1 (F GRM ), (ii) methodof-moments F coefficient estimate (similar to F IS ), (iii) F coefficient based on the correlation between uniting gametes (F UNI ) [18], and (iv) runs of homozygosity (ROH)-based coefficient (F ROH , including ROH segments with lengths above 1 Mb) [19] using Plink v1.90b4 software [20]. The first three coefficients were calculated using Plink -ibc command, while the ROH-based coefficient was calculated by identification of ROH, covering a minimum of 30 SNPs, as in our previous study in cattle [21] and evaluation of a genome portion covered by ROH as proposed by McQuillan et al. [19].
Comparison among various measures of inbreeding was made using Spearman's rank correlation (rho) coefficients following data distribution evaluation with the Shapiro-Wilk test. Statistical significance of differences in F coefficient between two analyzed horse types was assessed using an ANOVA test. Statistical testing was done using JASP software [22].
Moreover, a Q-value (ancestry fractions/proportions) from Admixture software [23], calculated as described in [15], were compared (using correlation analysis) with linage data used for horses qualifying for the conservation program. Lineage data were calculated from the proportions of the desirable ancestors, (e.g., in the Sokólski type, originating from ancestors born in a historical region and factors of the type-Breton and Ardennes horses), in relation to undesirable in the pedigree (Sztumski horses, Belgian and their derivatives, Fjords, Mur-insulan and other breeds not involved in the creation of Sokólski horses). This was done to evaluate whether Q-value can be used instead of lineage data in the conservation program. Similarly, for the Sztumski type, the percentage of the pedigree was estimated as the proportion of ancestors born in the historical region and the factors of the type (Ardennes, Belgian horses and their derivatives) to the undesirable ancestors, Breton, Sokólski horses, Fjords, Mur-insulins and other breeds not involved in the creation of Sztumski horses.

Pedigree Data Analysis
The distribution of the inbreeding coefficient in the reference population of the Sztumski and Sokólski cold-blooded horses, estimated on the basis of the pedigree information (F PED ), is presented in Figure 1 and in the analyzed population in Figure 2.
The mean inbreeding coefficient was 0.0159 for the Sokólski population and 0.0155 for the Sztumski population.
The inbreeding distribution of the sample from the population (175 horses were included in the genomic analysis) is very similar to the distribution in the entire population of Sztumski and Sokólski cold-blooded horses ( Figure 1). Nevertheless, in the range of values from zero to <0.0313% the proportions are reversed-there is a greater share of Sztumski individuals and there are no non-inbred individuals. , x FOR PEER REVIEW 5 of 12  The pedigree completeness index per each generation show that in the first two generations, 100% of the ancestors are present. In the third and next generation there is an increasing loss of ancestors, most often due to the fact that they are individuals of foreign breeds whose pedigrees, with the assumptions, are not entered into the database.   The pedigree completeness index per each generation show that in the first two generations, 100% of the ancestors are present. In the third and next generation there is an increasing loss of ancestors, most often due to the fact that they are individuals of foreign breeds whose pedigrees, with the assumptions, are not entered into the database. The pedigree completeness index per each generation show that in the first two generations, 100% of the ancestors are present. In the third and next generation there is an increasing loss of ancestors, most often due to the fact that they are individuals of foreign breeds whose pedigrees, with the assumptions, are not entered into the database.
The number of available ancestors' data in our pedigree information is presented in Figure 3 and the increase of inbreeding in subsequent generations depending on the availability of pedigree information is given in Figure 4.
The visible decrease in the number of pedigree information from the sixth generation results from the appearance of numerous imported ancestors-founders, i.e., whose data was not entered into the database (Figure 4). It is also indicated by a decrease in the inbreeding coefficient going back to earlier generations ( Figure 5). The number of available ancestors' data in our pedigree information is presented in Figure 3 and the increase of inbreeding in subsequent generations depending on the availability of pedigree information is given in Figure 4.  The visible decrease in the number of pedigree information from the sixth generation results from the appearance of numerous imported ancestors-founders, i.e., whose data was not entered into the database (Figure 4). It is also indicated by a decrease in the inbreeding coefficient going back to earlier generations ( Figure 5).

Genomic Analysis and Its Comparison within Pedigree Data
The analyzed horse population included 175 animals born between 2008 and 2018 and with a high number of generations (between 11 and 15) registered in the pedigree data ( Supplementary File 1, Figures S1 and S2). While analyzing inbreeding coefficients changes across birth years, we found that only FPED data showed (expected in breeding populations) an increasing trend, while genomic measures rather showed a declining tendency (Supplementary File 1, Figure S3). availability of pedigree information is given in Figure 4.  The visible decrease in the number of pedigree information from the sixth generation results from the appearance of numerous imported ancestors-founders, i.e., whose data was not entered into the database (Figure 4). It is also indicated by a decrease in the inbreeding coefficient going back to earlier generations ( Figure 5).

Genomic Analysis and Its Comparison within Pedigree Data
The analyzed horse population included 175 animals born between 2008 and 2018 and with a high number of generations (between 11 and 15) registered in the pedigree  While taking into account the number of generations registered in the horse pedigrees (pedigree depth), correlations of FPED with genomic data tended to increase along with the number of registered generations ( Figure 6). This correlation was the highest for animals with 15 registered generations and their FROH estimates (0.673; p = 0.028) (Supplementary File 3). Another analyzed genetic parameter was lineage (Table 5) which we correlated with Q-value (ancestry fractions/proportions) obtained from Admixture software. These data correlated with the medium value of rho = 0.385 (p < 0.001) (Supplementary File 4) and showed rather poor concordance for horses with outlying or extreme values (Figure 7).

Genomic Analysis and Its Comparison within Pedigree Data
The analyzed horse population included 175 animals born between 2008 and 2018 and with a high number of generations (between 11 and 15) registered in the pedigree data ( Supplementary File 1, Figures S1 and S2). While analyzing inbreeding coefficients changes across birth years, we found that only F PED data showed (expected in breeding populations) an increasing trend, while genomic measures rather showed a declining tendency (Supplementary File 1, Figure S3).
When compared to F PED , genomic measures of inbreeding either under (F GRM ) or overestimated the population inbreeding ( Table 2, Supplementary File 1-Figure S3). F PED values ranged from 0.2 to 13%. This interval was similar to that observed for F ROH (2-14.9%) and F UNI (2-5.1%) coefficients. Negative inbreeding values (outbred) were observed exclusively for F GRM , suggesting the presence of some statistical artifacts. Comparison of all F coefficients among the analyzed horse types (SOK and SZTUM) did not show any statistically significant differences (Table 3). Similarly, no significant differences were observed between males (n = 21) and females (n = 154) within both breeds. The analysis of correlation coefficients among all studied F measures in the whole population showed the strongest relationship between F ROH and F UNI (rho = 0.919; p < 0.001) and the weakest between F PED and F GRM (0.351; p < 0.001). F PED showed the strongest correlation with F ROH and F UNI with Spearman's rank of 0.443 and 0.430 (p < 0.001), respectively (  While taking into account the number of generations registered in the horse pedigrees (pedigree depth), correlations of F PED with genomic data tended to increase along with the number of registered generations ( Figure 6). This correlation was the highest for animals with 15 registered generations and their F ROH estimates (0.673; p = 0.028) (Supplementary File 3).   Another analyzed genetic parameter was lineage (Table 5) which we correlated with Q-value (ancestry fractions/proportions) obtained from Admixture software. These data correlated with the medium value of rho = 0.385 (p < 0.001) (Supplementary File 4) and showed rather poor concordance for horses with outlying or extreme values (Figure 7).

Discussion
In this study we analyzed and performed various comparisons between different measures of inbreeding, including pedigree-based coefficient and genomic measures of inbreeding. It is well known that the mating of related individuals results in the inbreeding of an offspring [24]. In closed, selected populations, increase in inbreeding is

Discussion
In this study we analyzed and performed various comparisons between different measures of inbreeding, including pedigree-based coefficient and genomic measures of inbreeding. It is well known that the mating of related individuals results in the inbreeding of an offspring [24]. In closed, selected populations, increase in inbreeding is inevitable and increasing inbreeding reduces genetic variation that can lead to so called "inbreeding depression" [25]. The individual inbreeding coefficient (F) is defined as the proportion of an individual's genome that is autozygous, that has homozygous "identical by descent" (IBD) status, or equivalently the probability of a randomly sampled locus in the genome that could be autozygous. Traditionally, inbreeding coefficients are calculated from pedigree data (F PED ) using methodology proposed by Wright [26] or more recently by the method of [27]. When pedigrees are not available or their depth or quality are poor, inbreeding coefficients can be derived from genotypic data, exploring, for example, the difference between observed and expected multilocus heterozygosity [28]. Inbreeding measures can for example be estimated by the maximum likelihood approaches [29], by methods-ofmoment [30], from the diagonal elements of a genomic relationship matrix (GRM) [27], from simple heterozygosity or homozygosity measures [31], based on genotypic correlations [28] or from the proportion of the genome within ROH [19,28]. A higher level of inbreeding, that is, the proportion of genome that is IBD, brings more chance for expression of homozygous recessive deleterious alleles. These are considered to be the main cause of inbreeding depression, which reduces the fitness of animals and results in inter alia deterioration of fertility [31]. To avoid inbreeding depression, accurate and reliable estimation of inbreeding is important, especially in native and conserved populations in which a limited number of individuals are being currently used for mating or for populations that were reconstructed with a limited number of founders.
Recently, an inbreeding coefficient derived from ROH was developed and shown to be optimal for the estimation of genome-wide autozygosity and for detecting inbreeding effects [19,28,32,33]. This was confirmed in our data in which F ROH showed the strongest correlation with F PED data. In this analysis we used F calculated based on ROH with a length >1 Mb. This was because we wanted to capture a total animal autozygosity, including relatively distant (ancient) incidents of inbreeding which often cannot be shown in the pedigree analysis due to the lack of information. ROH are defined as contiguous homozygous regions in the genome where the two haplotypes inherited from the ancestors are identical by descent [31]. Previous studies [33], reported that the ROH segments of 2-4 Mb represent the inbreeding of distant generations of the ancestors (13-25 generations ago), which cannot usually be captured using pedigree information. The ROH segments > 8 Mb represent the proportion of autozygosity originated from ancestors that were born 6-7 generations ago and the ROH longer than 16 Mb reflect ancestors that were born 3-6 generations ago [34]. This was also visible in our data in which the correlation coefficient between F PED and F ROH tended to increase along with the increase of the number of generations registered in pedigree data. Similar observations were also made by other authors [33] as well as in our previous study [35]. The F ROH values presented in this study are visibly lower than ones presented in our previous work [11]. This was raised as a result of SNPs re-mapping to the newest currently available horse genome assembly (EquCab3.0.). The number of ROH detected after re-mapping was clearly reduced mainly in the shortest ROH length categories, which are the most commonly identified as false positives [32,36]. However, a detailed description of this observation will be presented in our upcoming study. Similarly, moderate correlation was found in our data between F UNI and F PED . F UNI is defined as a correlation between genetic effects that gives more weight to homozygosity at rare alleles [37]. F UNI is directly related to the definition of Wright [26]. It was found that in scenarios of large population sizes, such as in human populations, F UNI can be an appropriate inbreeding measure to estimate inbreeding depression, whereas in scenarios of small population sizes, F ROH may be more appropriate [28,38]. While F UNI appears to be simpler to calculate than F ROH , it can be also considered as a useful measure of inbreeding in conserved horse populations. Unexpectedly low correlations were found between F GRM and F ROH , especially when F GRM belongs to the same group of methods as F UNI , and is based on covariances between genetic effects [28].

Conclusions
In this study, based on a conserved horse population, we showed the usefulness of genomic data to assess population inbreeding with relatively high conformance to pedigreebased estimates. We confirmed previous observations that F ROH is the most similar to the pedigree-base estimates measure of inbreeding and we demonstrated its ability to reflect an ancient inbreeding. We also presented F UNI as a congruent measure of inbreeding, with a similar range and values as F PED and F ROH . While lineage and gene contributions from Admixture software correlated, it showed poor direct compliance, hence Q-value cannot be recommended as the estimate of pedigree-based lineage. All these findings suggest that the methods of genomics should be considered as an alternative or support in analysis of population structure in conservative breeding and can help in the control of inbreeding in rare horse populations.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-442 5/12/3/429/s1, Table S1. Mean inbreeding coefficients by the year of birth in the whole population, Table S2. Spearman's rank correlation coefficients among different measures of inbreeding depending on the number of generations registered in the pedigree, Figure S1  Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare that they have no conflict of interest.