1. Introduction
Inbreeding is the mating between (close) relatives and is unavoidable in genetically small populations. The degree of inbreeding is typically measured with pedigree-based inbreeding coefficients, as introduced by Wright [
1]. Individuals with higher inbreeding coefficients show a lower phenotypic performance on average, a phenomenon known as inbreeding depression [
2,
3,
4]. Inbreeding depression occurs because part of the genetic load in populations, known as inbreeding load, is only expressed in homozygotes [
2]. Inbreeding depression is expected to be largely due to partial dominance, i.e., the existence of (partially) deleterious recessive alleles, although overdominance and epistasis may also play a role [
2,
3,
5].
Inbreeding load in a population is not constant, but rather dynamic over time. New deleterious recessive alleles arise continuously by mutation and these alleles are eroded over time by (natural and/or artificial) selection and genetic drift [
2]. Inbreeding increases the efficiency of selection against deleterious recessive alleles in a process called purging [
2,
6].
To test for the existence of purging in populations, various pedigree-based methods have been proposed [
7,
8,
9]. To test for purging in captive wildlife populations, Ballou [
7] introduced the ancestral inbreeding coefficient, which is the probability that a random allele in an individual has been previously exposed to inbreeding, i.e., that this allele has been identical-by-descent (IBD) in at least one ancestor. While investigating purging in the captive breeding program of the Speke’s gazelle (
Gazella Spekei), Kalinowski et al. [
8] extended Ballou’s concept by considering the IBD-status of the individual as well. In Kalinowski’s approach the total pedigree-based inbreeding coefficient is decomposed into an ancestral (
FANC) and a new (
FNEW) inbreeding coefficient. The
FANC is the probability that alleles are IBD in the individual while they were already IBD in at least one ancestor, whereas
FNEW is the probability that alleles are IBD for the first time in the individual’s pedigree [
8].
To calculate
FANC and
FNEW (and other inbreeding coefficients), a gene dropping based algorithm has been developed and implemented in GRain software [
10]. The GRain algorithm has also been incorporated in the PEDIG package [
11], in versions 2007 and later. Various studies have used the GRain algorithm, either in GRain itself [
12,
13,
14,
15,
16] or in PEDIG [
17,
18,
19], to calculate
FANC and
FNEW.
The objective of this study was to demonstrate that the previous version of GRain (v 2.1) produced biased estimates of FANC and FNEW. For several simple pedigrees, we show how FANC and can be calculated by hand. We also investigate the magnitude of the bias for two example data sets of Hungarian Pannon White rabbits and Dutch Holstein Friesian dairy cattle. A revised version of GRain software (v 2.2), which provides unbiased FANC and FNEW estimates, is now available online.
2. Calculation of Ancestral and New Inbreeding Coefficients by Hand
For simple pedigrees, Kalinowski’s ancestral inbreeding (
FANC,X) and new inbreeding (
FNEW,X) coefficients of an individual X can be calculated by hand. To do so, Mendelian inheritance principles are followed, meaning that each allele has a probability of 0.5 to be passed on from parent to offspring. First, Wright’s classical inbreeding coefficient (
FX) is determined. The
FX is defined as the probability that the two alleles at a random locus in individual X are IBD, and is calculated as [
1]:
where
n is the number of paths connecting the sire of X with the dam of X through the
ith common ancestor,
Fi is the inbreeding coefficient of the
ith common ancestor, and
ks and
kd are the number of generations from, respectively, sire and dam (included) to the
ith common ancestor (excluded). Then,
FANC,X is calculated as the probability that X is IBD for an allele, given that this allele was also IBD in at least one of the ancestors of X. Finally,
FNEW,X is obtained by subtracting
FANC,X from
FX, since the ancestral and new inbreeding sum up to the total inbreeding.
In
Figure 1, four example pedigrees are shown. The corresponding inbreeding coefficients are provided in
Table 1. In example (1), the
FX equals 0.0078 (0.5
7), because there is a single path that connects parents F and G through common ancestor A, which is of length 7 (
ks +
kd + 1 = 7), and ancestor A is non-inbred (
FA = 0). The
FANC,X for this example is 0, because none of the ancestors of X are inbred. Consequently,
FNEW,X is equal to
FX (so 0.0078).
In example (2), the FX equals 0.0703, because it is the inbreeding on ancestor D (0.54) multiplied with (1 + FD), where FD is the inbreeding coefficient of ancestor D (0.53). The FANC,X is calculated as the probability that X is IBD for an allele that was IBD in D as well. Since D is the only inbred ancestor, we do not need to consider the IBD status of any other ancestors. The probability that D is IBD for an allele from its grandparent A, is the inbreeding coefficient of D on A and equals 0.125 (0.53). To obtain FANC,X, this probability has to be multiplied with the probabilities that the allele is passed on to X, through both the paths D-E-F-X and D-G-X. The probability that E inherits the allele from D is simply 1, because D is IBD. The probability that F inherits the allele from E is 0.5 and that X inherits it from F is also 0.5, so the total probability for the path D-E-F-X is 0.25 (0.52). Similarly, the probability for path D-G-X is 0.5. This gives a total probability of 0.125 × 0.25 × 0.5 = 0.0156 for FANC,X. Consequently, FNEW,X = FX − FANC,X = 0.0703 − 0.0156 = 0.0547. Note that, in this example, the FANC,X can also be calculated as two times the inbreeding coefficient of X on D (0.54), multiplied with the inbreeding coefficient of D on A (0.53). However, it is important to realize that this reasoning only holds for scenarios in which one inbreeding loop is “on top of the other”, and not when there is an overlap in inbreeding loops, such as in examples (3) and (4).
In example (3), the FX equals 0.0390 and is the sum of inbreeding on ancestor A (0.57) and on ancestor B (0.55). The FANC,X is calculated as the probability that X is IBD for an allele that was IBD in ancestor E as well. Since ancestor E is the only inbred ancestor, we do not need to consider the IBD status of any other ancestors. The probability that E is IBD for an allele from its grandparent A, is the inbreeding coefficient of E on A and equals 0.125 (0.53). This probability has to be multiplied by the probability that this allele is passed on to X through both the path E-G-X and B-D-F-X. The probability that G inherits the allele from E is 1, because E is IBD. The probability that X inherits the allele from G is 0.5, so the total probability for the path E-G-X is 0.5. The probability that B carries the allele is 1, otherwise E could not have been IBD. The probability that the allele is passed on from B to D to F and to X is 0.125 (0.53). This gives a total probability of 0.125 × 0.125 × 0.5 = 0.0078 for FANC,X. Consequently, FNEW,X = FX − FANC,X = 0.0390 − 0.0078 = 0.0312.
In example (4), the FX equals 0.1641 and is the sum of inbreeding on ancestor A (0.57 + 0.55), on ancestor B (0.54) and on ancestor C (0.54). The FANC,X in this example is the probability that X is IBD for an allele that was also IBD in F and/or G (since F and G are inbred ancestors). The FANC,X is the sum of the probabilities for three scenarios: (i) X is IBD for an allele that was IBD in both F and G, (ii) X is IBD for an allele that was IBD in F, but not in G, and (iii) X is IBD for an allele that was IBD in G, but not in F. The probability that F is IBD for an allele from A is the inbreeding coefficient of F on A and equals 0.0625 (0.54). If F is IBD for an allele from A, then both B and C must be carriers of that allele, and the probability that G is also IBD for that same allele is 0.125 (0.53), since this is the probability that G inherits that allele through B-G (0.5) multiplied with the probability that G inherits that allele through C-E-G (0.52). When F and G are IBD for the same allele, X has to be IBD for that allele as well. Therefore, the probability that scenario (i) happens is 0.0078 (i.e., 0.0625 × 0.125 × 1). If F is IBD for an allele from A, the probability that G carries two other “unknown” alleles is 0.375 (i.e., 0.5 × (1 − 0.52)), leaving 1 − 0.125 − 0.375 = 0.5, for the probability that G carries one copy of the allele and one copy of an unknown allele (scenario ii). In that case, the probability that the allele is inherited by X from G is 0.5. The total probability for scenario (ii) is therefore 0.0156 (i.e., 0.0625 × 0.5 × 0.5). Due to the symmetry in the pedigree, the probability for scenario (iii) is equal to that of scenario (ii), so 0.0156. Thus, the total probability that X is IBD for an allele that was also IBD in F and/or G, i.e., the FANC,X, equals 0.0078 + 0.0156 + 0.0156 = 0.0391. Consequently, FNEW,X = FX − FANC,X = 0.1641 − 0.0391 = 0.1250.
3. Underestimation of Ancestral Inbreeding by Previous Version of GRain
In GRain, a stochastic approach known as gene dropping [
20] is implemented to calculate inbreeding coefficients. In this approach, many independent simulations are run. In each simulation, alleles are dropped through the pedigree following Mendelian inheritance rules, and the IBD-status of individuals is stored. After all simulations are completed, the
FX is estimated as the fraction of simulations in which the alleles of individual X were IBD. Similarly, the
FANC,X is calculated as the fraction of simulations in which X was IBD for an allele that was already IBD in one of the ancestors of X. The accuracy of the estimated inbreeding coefficients is higher when more simulations are run. As shown by Baumung et al. [
10], using 10
6 simulations provides estimates of inbreeding coefficients that show a correlation of >0.999 with inbreeding coefficients calculated using a deterministic approach (with only minor differences at the fourth decimal). A more detailed explanation of the GRain program and its computational demands is given by Baumung et al. [
10].
When
FANC,X was computed using the previous version of GRain (v 2.1), the
FANC,X for examples (1), (2), (3) and (4) from
Figure 1 equaled 0, 0.0156, 0.0039 and 0.0234, respectively (
Table 1). Although the coefficients for examples (1) and (2) were correct, the
FANC,X coefficients for examples (3) and (4) were underestimated. Note that example (3) is equivalent to the example used by McParland et al. [
18], in
Figure 1 in their paper, for which they reported the incorrect
FANC,X estimate of 0.0039.
The underestimation of FANC,X was occasionally caused by an incorrect tracking of IBD-status of ancestors throughout the pedigree. In the previous version of GRain (v 2.1), every individual was given a flag that indicated whether one of their ancestors had been IBD (1 if true, 0 if false). This flag was calculated as the sum of the flags of the parents, divided by two. Thus, when both parents had a flag of 1, the flag of the offspring would also be 1, which is correct. However, when only one of the parents had a flag of 1 (and the other 0), the offspring would get a value of 0.5, which is incorrect (since it should be 1). In the revised version of GRain (v 2.2), this issue was solved by obtaining the flag of an offspring as the maximum of the flags of its parents.
To clarify, in example (2) in
Figure 1, whenever ancestor D was IBD, both parents F and G had a flag of 1 and X also got a flag of 1. Therefore, the
FANC,X was estimated correctly. In example (3), however, whenever ancestor E was IBD, parent G had a flag of 1 and parent F had a flag of 0 and, as a result, X got a flag of 0.5. Consequently, for simulations in which individual X was IBD for an allele that was also IBD in E, a value of 0.5 was stored (instead of 1) for the
FANC,X calculation. After simulations were completed, the stored values were summed across simulations and divided by the total number of simulations. Since stored values were underestimated by a factor two, the
FANC,X for example (3) was also underestimated by a factor two. In example (4), whenever both F and G were IBD, X got a flag of 1. This happened in 0.0078 of the simulations (see explanation in the previous section for calculation by hand, scenario (i)). When only parent F or parent G were IBD, while the other parent was not, X got a flag of 0.5. This happened in 0.0156 + 0.0156 = 0.0312 of the simulations (see explanation in the previous section for calculation by hand, scenarios (ii) and (iii)). Therefore, the
FANC,X for example (4) was underestimated by some factor between one and two. More specifically, the underestimated
FANC,X was equal to 0.0078 + (0.5 × 0.0312) = 0.0234.
4. Examples for Pannon White Rabbits and Holstein Friesian Cattle
To investigate the impact of the incorrect estimation, we computed
FANC and
FNEW for two example data sets, using both the previous and revised version of GRain, and 10
6 simulations. The first data set was a pedigree of 22,781 rabbits of the Hungarian Pannon White (PW) breed. This pedigree included 6760 rabbits (1421 bucks and 5339 does) with offspring and 16,021 rabbits without offspring. All rabbits were born between 1992 and 2016. To assess pedigree completeness, the number of complete generations (NCG) and the complete generation equivalent (CGE) were computed for each rabbit. The CGE was computed as the sum of (1/2)
n of all known ancestors of an individual, with
n being the number of generations between the individual and a given ancestor. The mean NCG in the PW pedigree was 4.0 (ranging from 0 to 10) and the mean CGE was 8.6 (ranging from 0 to 22.1). The second data set contained 37,061 Dutch Holstein Friesian (HF) cows, which were part of a larger pedigree of 167,924 individuals (19,363 bulls and 148,561 cows) and were used by Doekes et al. [
21]. These HF cows were born between 2012 and 2016 and were filtered to have a high pedigree completeness (NCG ≥ 3 and CGE ≥ 10), and have phenotypic information on 305-day milk, fat and protein yields. The mean NCG in these HF cows was 6.5 generations (ranging from 3 to 9) and the mean CGE was 12.5 generation equivalents (ranging from 10.0 to 14.7). More details on the HF data set can be found in Doekes et al. [
21].
For both the PW and HF data set, the total inbreeding coefficients (
) were identical across the previous and revised version of GRain. The
FANC in the previous version however, was generally underestimated and the
FNEW was overestimated (
Figure 2). For the PW data set and for inbreeding coefficients above zero, the
FANC from the previous version was on average 0.65 times the revised
FANC (and the
FNEW was 1.27 times the revised
FNEW). For the HF data set and inbreeding coefficients above zero, the
FANC from the previous version was on average 0.71 times the revised
FANC (and the
FNEW was 1.36 times the revised
FNEW). Pearson correlation coefficients between coefficients estimated with the previous and revised version were high. For the PW data set, the correlations between the previous and revised version equaled 0.997 and 0.968 for
FANC and
FNEW, respectively. For the HF data set, these correlations equaled 0.993 and 0.987, respectively. This indicates that the underestimation of
FANC (and overestimation of
FNEW) did not strongly affect the ranking of animals.
For the Holstein Friesian data set, we also investigated the potential differences in inbreeding depression estimates for
FANC and
FNEW, calculated with the previous and revised version of GRain. A linear mixed model was run in ASReml 4.1 [
22], in which
FANC and
FNEW were fitted as fixed effects and the regression coefficients on
FANC and
FNEW were used as estimates of inbreeding depression (see Doekes et al. [
21] for a detailed explanation). In general, differences between inbreeding depression estimates based on the previous and revised version of GRain were small (
Figure 3). For example, the effect of a 1% increase in
FNEW on 305-day milk yield was −46.4 kg (SE = 12.4 kg) for the previous version and −47.3 kg (SE = 11.2 kg) for the revised version. Standard errors for the inbreeding depression effects appeared smaller when the revised version was used to estimate
FANC and
FNEW, compared to when the previous version was used. For example, the mean standard error of inbreeding depression estimates for fat and protein yields was 0.51 kg for the revised version, and 0.67 kg for the previous version. The overall conclusion, that
FNEW was associated with significant inbreeding depression, while
FANC was not, was the same for both versions. Based on these findings, we expect that conclusions from other studies using
FANC and
FNEW estimates from GRain v 2.1 (e.g., [
17,
18]) will also largely hold. However, they should be interpreted with caution.