How Depressing Is Inbreeding? A Meta-Analysis of 30 Years of Research on the Effects of Inbreeding in Livestock

Inbreeding depression has been widely documented for livestock and other animal and plant populations. Inbreeding is generally expected to have a stronger unfavorable effect on fitness traits than on other traits. Traditionally, the degree of inbreeding depression in livestock has been estimated as the slope of the linear regression of phenotypic values on pedigree-based inbreeding coefficients. With the increasing availability of SNP-data, pedigree inbreeding can now be replaced by SNP-based measures. We performed a meta-analysis of 154 studies, published from 1990 to 2020 on seven livestock species, and compared the degree of inbreeding depression (1) across different trait groups, and (2) across different pedigree-based and SNP-based measures of inbreeding. Across all studies and traits, a 1% increase in pedigree inbreeding was associated with a median decrease in phenotypic value of 0.13% of a trait’s mean, or 0.59% of a trait’s standard deviation. Inbreeding had an unfavorable effect on all sorts of traits and there was no evidence for a stronger effect on primary fitness traits (e.g., reproduction/survival traits) than on other traits (e.g., production traits or morphological traits). p-values of inbreeding depression estimates were smaller for SNP-based inbreeding measures than for pedigree inbreeding, suggesting more power for SNP-based measures. There were no consistent differences in p-values for percentage of homozygous SNPs, inbreeding based on runs of homozygosity (ROH) or inbreeding based on a genomic relationship matrix. The number of studies that directly compares these different measures, however, is limited and comparisons are furthermore complicated by differences in scale and arbitrary definitions of particularly ROH-based inbreeding. To facilitate comparisons across studies in future, we provide the dataset with inbreeding depression estimates of 154 studies and stress the importance of always reporting detailed information (on traits, inbreeding coefficients, and models used) along with inbreeding depression estimates.


Introduction
Inbreeding depression refers to the decrease in mean phenotypic value with increased levels of inbreeding [1,2]. The phenomenon of inbreeding depression was already documented in the 19th century by Charles Darwin, who studied 57 plant species and observed that the offspring of self-fertilized plants were shorter, weighed less, flowered later, and produced fewer seeds than the offspring of unrelated plants [3,4]. Since Darwin's time, inbreeding depression has been documented for a wide range of plant and animal species and for both wild and domestic populations [5][6][7][8][9][10].
Inbreeding depression is caused by an increase in homozygosity associated with inbreeding, which reduces the expression of dominance effects [1,2]. When dominance effects are on average favorable (i.e., when there is directional dominance in the favorable direction), the reduced expression of dominance effects results in a decrease in mean phenotypic value. In the absence of epistasis, the expected decrease in mean phenotypic

Comparison across Traits and Trait Groups
To enable comparison across traits and trait groups, b-values were scaled by dividing them by the trait mean (to obtain b m ) or trait standard deviation (SD; to obtain b s ). Not every study reported trait means and SDs. Consequently, the number of available estimates for b m and b s equaled 2094 and 1519, respectively. For the comparison across traits and trait groups, SNP-based estimates were excluded (n = 257 for b m and n = 255 for b s ). In addition, outliers that were more than 3 SDs away from the mean were excluded (n = 19 for b m and n = 5 for b s ). After these edits, 1818 and 1259 pedigree-based estimates remained for b m and b s , respectively. Descriptive statistics for b m and b s were calculated in R [184]. Skewness and kurtosis were determined with the package "moments" [185]. The following model was then applied: where Y ijk was the inbreeding depression estimate (either b m or b s ); POPULATION i was the effect of the ith population; TRAIT_GROUP j was the effect of the jth trait group; and E ijk was the error term. Population was defined as the combination of study and breed. Study and breed were combined into a single effect, because many studies focused on a single breed, and breeds were often investigated in a single study. The model was run with the "glm" function in R [184]. Estimated marginal means (EMMs), also known as least square means, were obtained for the different trait groups with the function "emmeans" from the package "emmeans" [186]. Pairwise comparisons between EMMs were performed with the function "pwpm" in emmeans [186], which applies Tukey-Kramer's procedure to account for multiple testing [187,188]. To study potential differences between individual traits, the same model as above was used, but with TRAIT j instead of TRAIT_GROUP j . In the latter analysis, traits with less than 10 records were excluded.

Comparison across Inbreeding Measures
For the comparison across inbreeding measures, the following measures were considered: pedigree inbreeding (F PED ), inbreeding based on ROH (F ROH ), inbreeding derived from the diagonal of a GRM (F GRM ), inbreeding derived from the diagonal of a GRM computed with allele frequencies of 0.5 (F GRM0.5 ), and the percentage of homozygous SNPs (HOM). Note that F GRM0. 5 and HOM are equivalent, except for a difference in scale, with F GRM0.5 = 2HOM − 1 (Appendix A). Measures based on GRMs were combined into F GRM , no matter which GRM was used. Measures based on ROH were considered as F ROH , regardless of the criteria used to identify ROH (which can vary substantially, e.g., [189]).
Comparisons were made within studies and within traits. There were 12 studies that reported b-values for at least two of the above-mentioned inbreeding measures [129,134,138,146,155,157,160,168,173,177,181,183]. These studies all reported trait means and SDs, which were used to calculate b m and b s . The dataset used for the comparisons is provided in the Supplementary Materials (Table S2). A direct comparison of b m and b s is inappropriate, because of scale differences between F PED , F ROH , F GRM , F GRM0.5 , and HOM [26]. For example, HOM measures the probability of alleles being "identical by state" (IBS) and typically has a mean of 60 to 70% and a SD of 1 to 2%, whereas F PED measures the probability of alleles being "identical by descent" (IBD) with reference to the founder generation and typically has a lower mean and larger SD than HOM (Table S2). To account for such differences, b m and b s were standardized by multiplying them by the SD of the inbreeding measure in the corresponding population, so that they were expressed per 1 SD increase in inbreeding (rather than per 1%). For one of the twelve studies [134], scaling was not possible because SDs of inbreeding measures were not reported. This study was therefore excluded from the comparison across inbreeding measures.
Significance of b-values was also compared across inbreeding measures. Test statistics were calculated as b se(b) 2 and these test-statistics were compared to a chi-square distribution with one degree of freedom to obtain approximate p-values (following the Wald test). Smaller p-values indicate a more significant association between inbreeding and phenotypic value and suggest more predictive power to detect inbreeding depression.

Inbreeding Depression Estimates for Different Traits and Trait Groups
Across all studies and traits, the median (mean) pedigree-based b m and b s equaled −0.13 (−0.22) and −0.59 (−0.71), respectively. In other words, a 1% increase in F PED was associated with a median decrease in phenotypic value of 0.13% of a trait's mean, or 0.59% of a trait's SD. The distributions of b m and b s showed substantial kurtosis (i.e., were heavily tailed) and were somewhat negatively skewed ( Figure 1). with . = 2 − 1 (Appendix A). Measures based on GRMs were combined into , no matter which GRM was used. Measures based on ROH were considered as , regardless of the criteria used to identify ROH (which can vary substantially, e.g., [189]).
Comparisons were made within studies and within traits. There were 12 studies that reported -values for at least two of the above-mentioned inbreeding measures [129,134,138,146,155,157,160,168,173,177,181,183]. These studies all reported trait means and SDs, which were used to calculate and . The dataset used for the comparisons is provided in the Supplementary Materials (Table S2). A direct comparison of and is inappropriate, because of scale differences between , , , . , and [26]. For example, measures the probability of alleles being "identical by state" (IBS) and typically has a mean of 60 to 70% and a SD of 1 to 2%, whereas measures the probability of alleles being "identical by descent" (IBD) with reference to the founder generation and typically has a lower mean and larger SD than (Table S2). To account for such differences, and were standardized by multiplying them by the SD of the inbreeding measure in the corresponding population, so that they were expressed per 1 SD increase in inbreeding (rather than per 1%). For one of the twelve studies [134], scaling was not possible because SDs of inbreeding measures were not reported. This study was therefore excluded from the comparison across inbreeding measures.
Significance of -values was also compared across inbreeding measures. Test statistics were calculated as ( ) and these test-statistics were compared to a chi-square distribution with one degree of freedom to obtain approximate p-values (following the Wald test). Smaller p-values indicate a more significant association between inbreeding and phenotypic value and suggest more predictive power to detect inbreeding depression.

Inbreeding Depression Estimates for Different Traits and Trait Groups
Across all studies and traits, the median (mean) pedigree-based and equaled −0.13 (−0.22) and −0.59 (−0.71), respectively. In other words, a 1% increase in was associated with a median decrease in phenotypic value of 0.13% of a trait's mean, or 0.59% of a trait's SD. The distributions of and showed substantial kurtosis (i.e., were heavily tailed) and were somewhat negatively skewed ( Figure 1).

Figure 1.
Histograms of estimates of (n = 1818) and (n = 1259) across all studies and traits, after removal of extreme outliers. Descriptive statistics and a normal distribution (dashed red lines; based on mean and standard deviation (SD)) are also shown.
For each trait group, except for the group "other traits", the mean and median and were negative (Figure 2). After correcting for the effect of population, the EMMs of and for all groups, except "other traits", were all negative and most were significantly below zero ( Table 1). The EMMs for and for "other traits" were  For each trait group, except for the group "other traits", the mean and median b m and b s were negative ( Figure 2). After correcting for the effect of population, the EMMs of b m and b s for all groups, except "other traits", were all negative and most were significantly below zero ( Table 1). The EMMs for b m and b s for "other traits" were positive but not significantly different from zero (P > 0.05). When ignoring the "other traits" and comparing groups based on b m , production traits and reproduction/survival traits showed the most depression (EMMs of −0.308 and −0.302, respectively) and conformation traits showed the least depression (EMM of −0.142). The difference between production and reproduction/survival traits on one hand, and conformation traits on the other hand, was also significant (P < 0.05), whereas other pairwise comparisons were not (Table S3). When comparing trait groups based on b s , weight/growth traits showed the most depression (EMM of −1.071) and reproduction/survival traits showed the least depression (EMM of −0.410). For b s , there were no significant pairwise comparisons between EMMs of trait groups (except with the group "other traits"; Table S4).
Genes 2021, 12, x FOR PEER REVIEW 6 of 22 positive but not significantly different from zero (P > 0.05). When ignoring the "other traits" and comparing groups based on , production traits and reproduction/survival traits showed the most depression (EMMs of −0.308 and −0.302, respectively) and conformation traits showed the least depression (EMM of −0.142). The difference between production and reproduction/survival traits on one hand, and conformation traits on the other hand, was also significant (P < 0.05), whereas other pairwise comparisons were not (Table S3). When comparing trait groups based on , weight/growth traits showed the most depression (EMM of −1.071) and reproduction/survival traits showed the least depression (EMM of −0.410). For , there were no significant pairwise comparisons between EMMs of trait groups (except with the group "other traits"; Table S4).
When running a model with individual traits instead of trait groups, the EMMs for individual traits showed substantial variation (Table S5). For , the EMMs ranged from −0.938 to 0.321, with the vast majority below zero (39 out of 44, of which 19 with P < 0.05). The EMMs above zero were not significant (P > 0.05). For , the EMMs ranged from −1.938 to 1.522, with the majority being below zero (26 out of 29, of which there were eight with P < 0.05) and only one EMM was significantly larger than zero (P < 0.05).   When running a model with individual traits instead of trait groups, the EMMs for individual traits showed substantial variation (Table S5). For b m , the EMMs ranged from −0.938 to 0.321, with the vast majority below zero (39 out of 44, of which 19 with P < 0.05). The EMMs above zero were not significant (P > 0.05). For b s , the EMMs ranged from −1.938 to 1.522, with the majority being below zero (26 out of 29, of which there were eight with P < 0.05) and only one EMM was significantly larger than zero (P < 0.05). Table 1. Estimated marginal means of inbreeding depression estimates per trait group with standard errors (SEs). Inbreeding depression estimates are expressed as percentage of a trait's mean (b m ) or as percentage of a trait's SD (b s ). The number of estimates (N) and the p-value for testing the mean against zero are also shown.

Inbreeding Depression Estimates for Pedigree-Based and SNP-Based Measures of Inbreeding
Standardized b m and b s generally correlated well across different inbreeding measures (Figures 3 and 4). The highest correlations were found between F GRM0.5 and HO (0.97 for b m and 0.94 for b s ). Correlations between F ROH and HOM, and correlations between F GRM and F GRM0.5 , were also high (i.e., ≥0.9). The lowest correlations were found between F ROH and F GRM (0.27 for b m and 0.52 for b s ) and between F GRM and HOM (0.24 for b m and 0.47 for b s ), although it should be noted that these correlations were low due to a single study (shown in purple in Figures 3 and 4).

Inbreeding Depression Estimates for Pedigree-Based and SNP-Based Measures of Inbreeding
Standardized and generally correlated well across different inbreeding measures (Figures 3 and 4). The highest correlations were found between . and (0.97 for and 0.94 for ). Correlations between and , and correlations between and . , were also high (i.e., ≥0.9). The lowest correlations were found between and (0.27 for and 0.52 for ) and between and (0.24 for and 0.47 for ), although it should be noted that these correlations were low due to a single study (shown in purple in Figures 3 and 4). The data points (colored per study) and linear trendline are shown (lower triangle) as well as the density curve for each inbreeding measure (diagonal) and the correlation and regression equation (upper triangle). Note that slopes of the linear trendline differ from 1, which is also expected when correlations between inbreeding measures themselves are not equal to 1.
= pedigree inbreeding; = inbreeding based on runs of homozygosity; = inbreeding from genomic relationship matrix (studies in pink and purple used VanRaden's method 2, and light blue Yang's method); . = inbreeding from genomic relationship matrix with allele frequencies fixed to 0.5; = percentage of homozygous SNPs.  . Note that slopes of the linear trendline differ from 1, which is also expected when correlations between inbreeding measures themselves are not equal to 1. = pedigree inbreeding; = inbreeding based on runs of homozygosity; = inbreeding from genomic relationship matrix (studies in pink and purple used VanRaden's method 2, and light blue used Yang's method); . = inbreeding from genomic relationship matrix with allele frequencies fixed to 0.5; = percentage of homozygous SNPs.
When comparing the significance of inbreeding depression estimates, SNP-based inbreeding measures generally had lower p-values than pedigree inbreeding (Table 2). For example, had a lower p-value than in 30 out of 38 comparisons (79%). In these comparisons, the median p-value was 0.026 for and 0.131 for . Similarly, the pvalues for , . and were lower than those for in the majority of comparisons (7 out of 7 for , 21 out of 26 for . , and 15 out of 18 for ). Among the SNP-based measures, no consistent differences in p-values were observed (Table 2). For example, the percentage of comparisons in which the second measure had a lower p-value than the first measure was 57% for the comparisons between and , 56% between and , and 50% between . and . The latter was expected because of the equivalence of . and (Appendix A). In the comparison between and . , had a lower p-value in 25 out of 40 comparisons (62%), but the median p-value was very similar for the two measures (0.037 vs. 0.036). In the comparison between and , had a lower p-value in 10 out of 14 comparisons (71%), whereas the median p-value was smaller for than for (0.009 vs. 0.071). In the comparison between and . , . had a lower p-value in 100% of the comparisons, but this was based on only three comparisons in a single study. The data points (colored per study) and linear trendline are shown (lower triangle), as well as the density curve for each inbreeding measure (diagonal) and the correlation and regression equation (upper triangle). Note that slopes of the linear trendline differ from 1, which is also expected when correlations between inbreeding measures themselves are not equal to 1. F PED = pedigree inbreeding; F ROH = inbreeding based on runs of homozygosity; F GRM = inbreeding from genomic relationship matrix (studies in pink and purple used VanRaden's method 2, and light blue used Yang's method); F GRM0.5 = inbreeding from genomic relationship matrix with allele frequencies fixed to 0.5; HOM = percentage of homozygous SNPs.
When comparing the significance of inbreeding depression estimates, SNP-based inbreeding measures generally had lower p-values than pedigree inbreeding (Table 2). For example, F ROH had a lower p-value than F PED in 30 out of 38 comparisons (79%). In these comparisons, the median p-value was 0.026 for F ROH and 0.131 for F PED . Similarly, the p-values for F GRM , F GRM0.5 and HOM were lower than those for F PED in the majority of comparisons (7 out of 7 for F GRM , 21 out of 26 for F GRM0.5 , and 15 out of 18 for HOM).
Among the SNP-based measures, no consistent differences in p-values were observed (Table 2). For example, the percentage of comparisons in which the second measure had a lower p-value than the first measure was 57% for the comparisons between F ROH and F GRM , 56% between F ROH and HOM, and 50% between F GRM0.5 and HOM. The latter was expected because of the equivalence of F GRM0.5 and HOM (Appendix A). In the comparison between F ROH and F GRM0.5 , F ROH had a lower p-value in 25 out of 40 comparisons (62%), but the median p-value was very similar for the two measures (0.037 vs. 0.036). In the comparison between F GRM and HOM, F GRM had a lower p-value in 10 out of 14 comparisons (71%), whereas the median p-value was smaller for HOM than for F GRM (0.009 vs. 0.071). In the comparison between F GRM and F GRM0.5 , F GRM0.5 had a lower p-value in 100% of the comparisons, but this was based on only three comparisons in a single study.

Discussion
In this meta-analysis, 154 studies were evaluated. The objective was to assess and compare the degree of inbreeding depression across different trait groups and across different pedigree-based and SNP-based measures of inbreeding.
Across all studies and traits, a 1% increase in pedigree inbreeding was associated with a median (mean) decrease in phenotypic value of 0.13% (0.22%) of a trait's mean, or 0.59% (0.71%) of a trait's standard deviation. These effects are similar to the mean b m of 0.14% and mean b s of 0.56% reported by Leroy [10]. Distributions of b m and b s showed substantial kurtosis and were negatively skewed (Figure 1). The observed kurtosis might be the result of the final distribution being a mixture of underlying distributions with the same mean, but different SD (where studies with small sample size have a larger SD of estimates). Such a mixture can have a higher kurtosis than the separate distributions, as illustrated in Figure 5. The observed negative skewness could be due to publication bias [7,8,190]. Multiple studies explicitly stated that non-significant estimates were not reported (e.g., [93,130,147,178]). Omitting non-significant results does not necessarily introduce bias, as long as results in both directions are equally likely to be omitted. To further investigate the presence of publication bias, we retrieved the number of records per study (when reported) and made a funnel plot with b m on the x-axis and the number of records on the y-axis ( Figure 6). As expected, the funnel plot showed more variation in b m -estimates for studies with few records compared to studies with many records. In addition, for studies with relatively few records, positive inbreeding effects were relatively scarce compared to negative outliers, suggesting indeed some degree of publication bias.
Across trait groups, there were some differences in mean b m and b s ( Table 1, Tables S3 and S4). These differences, however, were not consistent for b m and b s and did not support the hypothesis that primary fitness traits such as survival and reproduction exhibit more inbreeding depression than other traits. In fact, when comparing the EMMs of b s across trait groups, reproduction/survival traits showed the least inbreeding depression (except for the group "other traits"). This is similar to Leroy [10], who reported EMMs for b m and b s of −0.222 and −0.336 for reproduction/survival, of −0.092 and −0.473 for conformation, of −0.24 and −0.563 for weight/growth, of −0.351 and −0.817 for production, and of −0.093 and −0.488 for other traits. The relatively mild EMM of b s for reproduction/survival traits could be the result of such traits generally showing more phenotypic variation due to environmental sources, which is in analogy with the observation that reproduction/survival traits generally have lower heritabilities than other traits, whereas coefficients of genetic variation (also known as "evolvabilities") tend to be more similar across trait groups [191].
further investigate the presence of publication bias, we retrieved the number of records per study (when reported) and made a funnel plot with on the x-axis and the number of records on the y-axis ( Figure 6). As expected, the funnel plot showed more variation in -estimates for studies with few records compared to studies with many records. In addition, for studies with relatively few records, positive inbreeding effects were relatively scarce compared to negative outliers, suggesting indeed some degree of publication bias.  Across trait groups, there were some differences in mean and (Tables 1, S3 and S4). These differences, however, were not consistent for and and did not support the hypothesis that primary fitness traits such as survival and reproduction exhibit more inbreeding depression than other traits. In fact, when comparing the EMMs of across trait groups, reproduction/survival traits showed the least inbreeding depression (except for the group "other traits"). This is similar to Leroy [10], who reported EMMs for and of −0.222 and −0.336 for reproduction/survival, of −0.092 and −0.473 for conformation, of −0.24 and −0.563 for weight/growth, of −0.351 and −0.817 for production, and of −0.093 and −0.488 for other traits. The relatively mild EMM of for reproduction/survival traits could be the result of such traits generally showing more phenotypic variation due to environmental sources, which is in analogy with the observation that reproduction/survival traits generally have lower heritabilities than other traits, whereas coefficients of genetic variation (also known as "evolvabilities") tend to be more similar across trait groups [191].
The hypothesis that fitness traits exhibit more inbreeding depression is largely based on results from wildlife and laboratory populations. In a survey among laboratory populations of Drosophila melanogaster, Lynch and Walsh [2] observed a high degree of inbreeding depression for primary fitness traits such as viability, fertility and egg production, and a low degree of inbreeding depression for morphological traits. In a metaanalysis of non-domestic animal populations, DeRose and Roff [6] also reported more inbreeding depression for life history traits (fecundity, survival and development) than for morphological traits (adult body size). They reported a median and of −0.47 and −1.45 for life history traits and of −0.09 and −0.59 for morphological traits. Coltman and Slate [7] performed a meta-analysis on correlations between phenotypes and two The hypothesis that fitness traits exhibit more inbreeding depression is largely based on results from wildlife and laboratory populations. In a survey among laboratory populations of Drosophila melanogaster, Lynch and Walsh [2] observed a high degree of inbreeding depression for primary fitness traits such as viability, fertility and egg production, and a low degree of inbreeding depression for morphological traits. In a meta-analysis of non-domestic animal populations, DeRose and Roff [6] also reported more inbreeding depression for life history traits (fecundity, survival and development) than for morphological traits (adult body size). They reported a median b m and b s of −0.47 and −1.45 for life history traits and of −0.09 and −0.59 for morphological traits. Coltman and Slate [7] performed a meta-analysis on correlations between phenotypes and two measures of genetic variation at microsatellite loci, multilocus heterozygosity (MLH), and mean squared allele size differences (d 2 ). Using data from domestic and non-domestic populations, they found significant correlations for life history traits (0.0856 for MLH and 0.0479 for d 2 ) and smaller non-significant correlations for morphometric traits (0.0052 for MLH and 0.0038 for d 2 ) and physiological traits (0.0075 for MLH and 0.0055 for d 2 ). It should be noted that these estimates were obtained while not accounting for dependence between estimates from the same studies and the same populations within studies. When they analyzed the average per trait group within study units ("study unit average" approach), the difference between correlations for the different trait groups decreased and confidence intervals overlapped [7]. Chapman et al. [8] found similar results. In their meta-analysis of heterozygosity-fitness correlations, they also found that the study unit average approach resulted in smaller differences in correlations across trait groups than when they treated all records independently. They additionally used a linear mixed model to account for a population effect, which resulted in even more similar correlations across trait groups than the study unit average approach, with confidence intervals that strongly overlapped. With the mixed model, they found a mean correlation (and confidence interval) between phenotype and MLH of 0.098 (0.0674-0.1293) for life history traits, of 0.0611 (0.0302-0.0919) for morphometric traits, and of 0.0809 (0.0048-0.1560) for physiological traits [8]. Thus, although the empirical results are equivocal, there are indications from wild and laboratory populations that fitness traits might exhibit more inbreeding depression.
There are also theoretical arguments why primary fitness traits would show more inbreeding depression than traits less related to fitness. Inbreeding depression depends on directional dominance, as indicated by the expected inbreeding depression that (in absence of epistasis) equals −F ∑ i 2p i q i d i [1,2]. To exhibit inbreeding depression, a trait should be influenced by dominance effects (resulting in dominance variance) and, more importantly, these dominance effects should be favorable on average. For traits strongly under directional selection such as a primary fitness trait like survival, the average dominance effect is expected to be favorable, because fixation occurs more quickly for loci with an unfavorable dominance effect [2]. For a trait that is less related to fitness as well as for traits under stabilizing selection, directional dominance is expected to be less pronounced because of the lower directional selection pressure.
Given that directional dominance is a function of how much a trait has been selected upon, the results of this meta-analysis (and that of Leroy [10]) are in line with expectation. Livestock populations are typically under directional selection for a combination of production, conformation, growth, reproduction, survival, behavioral and health traits (in addition to natural selection on primary fitness traits). Hence, all of these trait groups may show a similar degree of directional dominance and, consequently, a similar degree of inbreeding depression. This is also in line with the relative dominance variance (i.e., the proportion of dominance variance over phenotypic variance), which appears to be similar across trait groups in livestock. For example, Doekes et al. [192] compared estimates of relative dominance variance across five studies in cattle and found no clear differences between yield, fertility, and health traits. Thus, any breeding goal trait in livestock can be considered under selection just as a fitness trait and may exhibit considerable inbreeding depression.
The efficiency of selection against (partially) deleterious alleles is increased by inbreeding, a process called purging [193,194]. Since livestock populations typically have relatively small effective population sizes (although they may have very large census sizes), purging can be efficient to reduce the inbreeding load in these populations [194]. Since purging acts on all traits under selection, it is not expected to cause differences in the degree of inbreeding depression between traits under selection. Nevertheless, the effective population size influences purging [193,194] and it would be interesting to study the association between effective population size and observed inbreeding depression.
In this meta-analysis, we compared inbreeding depression estimates across different pedigree-based and SNP-based measures by standardizing b m and b s (expressing them per 1 SD increase in the inbreeding measure) and by comparing p-values based on reported bvalues and corresponding standard errors. Alternatively, it has been suggested to compare correlations between inbreeding measures and phenotype [26]. The use of correlations, however, was unfeasible in this meta-analysis, because most livestock studies report inbreeding depression as b-values obtained from animal models.
Standardized b m -and b s -estimates correlated well across the different pedigree-based and SNP-based inbreeding measures (Figures 3 and 4). These correlations are expected to be largely driven by the correlations between the underlying inbreeding estimators and, with that in mind, they also followed expectation. For example, the high correlations between inbreeding depression estimates for HOM and F ROH (0.96 for b m and 0.92 for b s ) are in line with the high reported correlations between the coefficients of HOM and F ROH in the underlying studies (e.g., 0.81 [129], 0.94 [157], and 0.86 [181]). Similarly, the moderate correlations between inbreeding depression estimates for F PED and F ROH (0.52 for b m and 0.65 for b s ) are in line with the moderate reported correlations between the coefficients of F PED and F ROH in the underlying studies (of e.g., 0.66 [168], 0.63 [177], and 0.60 [181]). pvalues of inbreeding depression estimates were smaller for SNP-based inbreeding measures than for pedigree inbreeding (Table 2). SNP-based measures may be more accurate than pedigree inbreeding because the former account for Mendelian sampling (e.g., [195]) and do not depend on pedigree completeness and quality (e.g., [196]). Since measurement errors in the independent variable lead to downward bias in the estimated slope (a statistical phenomenon called "regression dilution"), regression on less accurate pedigree-based coefficients may result in smaller b-values than regression on more accurate SNP-based coefficients. The benefit of using SNP-based measures will, among others, depend on the number of SNPs in relation to the genome length and effective population size (e.g., [23]). Having more SNPs available is expected to allow for a better estimation of the realized inbreeding and, therefore, of the realized inbreeding depression. In addition, when SNPbased measures depend on allele frequencies in the population (e.g., F GRM ), a sufficient number of individuals is required to accurately estimate allele frequencies.
Since the scale of inbreeding measures strongly influences b-values, only pedigreebased estimates were used for the comparison between trait groups. One may argue that the pedigree-based comparison of b-values across studies is also inappropriate because the scale of F PED differs across populations (due to differences in pedigree depth). To account for differences in scale of F PED , we initially considered the SD of F PED as an explanatory variable in our model. However, since this SD was available for less than a third of the estimates, and since its effect was not significant in the preliminary analyses, it was removed from the final model (as presented in the Materials and Methods). A population effect, defined as breed within study, was included in the model, which indirectly may have accounted for the population-specific SD of inbreeding coefficients.
No clear differences in effect sizes and p-values were found between inbreeding depression estimates for F ROH , F GRM , F GRM0.5 , and HOM (Figures 3 and 4 and Table 2). However, it is difficult to draw firm conclusions, because of (1) the limited number of empirical studies directly comparing the different measures; (2) the arbitrary definitions of especially F ROH , with many different criteria that are often not fully reported [189]; and (3) not knowing the true inbreeding depression. These limitations can be partly overcome in simulation studies. Various simulations studies have been performed to investigate which SNP-based measure might be most appropriate to estimate inbreeding depression. Keller et al. [22] investigated the correlation between different inbreeding measures and the homozygous mutation load (HML), which they defined as the number of homozygous loci for rare alleles (with MAF < 0.5) in an individual. They found that F ROH had more power to detect HML than F PED or excess of SNP-by-SNP homozygosity. Kardos et al. [24] found that SNP-based measures (using 35k SNPs) better explained the variation in realized genomic IBD than F PED (with 20 generations known). They also found that F ROH and the excess of SNP-by-SNP homozygosity explained very similar amounts of variation in realized IBD. Yengo et al. [25] used the SNP data of humans and simulated inbreeding depression by assigning phenotypic effects to samples of SNPs. They reported that F GRM from Yang's method [20] performed best. This is a SNP-by-SNP measure with high weight on rare alleles that is also known as F UN I ("inbreeding based on the correlation of uniting gametes"). In their study, F ROH resulted in the overestimation of inbreeding depression. In the study of Nietlisbach et al. [27], in contrast, F ROH provided unbiased results, whereas F GRM from Yang's method [20] resulted in upwardly biased inbreeding depression estimates. Recently, Caballero et al. [28] showed that these (seemingly) contradictory results might be explained by population characteristics. They found that inbreeding depression estimates obtained from F ROH were appropriate for populations with small effective sizes (e.g., N e = 100), but were downwardly biased for populations with large effective sizes (e.g., N e = 5000), unless sufficiently long ROHs (>5 Mb) were used. Inbreeding depression estimates based on F GRM from Yang's method [20], on the other hand, were upwardly biased for populations with small effective sizes, but were nearly unbiased for populations with large effective sizes. They also studied F GRM from VanRaden's method 2 [19] and F HOM , the latter being a measure of the deviation from Hardy-Weinberg frequencies that has a correlation of 1 with HOM and F GRM0.5 when base allele frequencies are known. These measures always underestimated inbreeding depression. Overall, Caballero et al. [28] concluded that it depends on the population as to which SNP-based inbreeding measure is most appropriate to estimate inbreeding depression.
Aside from providing more powerful measures to detect inbreeding depression, genomic data offer additional opportunities to study the genetic background of inbreeding depression. First, the length of ROHs can be used to study the effect of the age of inbreeding on inbreeding depression, in addition to already existing pedigree-based methods [168,177]. Recent inbreeding may be more harmful than ancient inbreeding, as a result of purging. Using genomic time series, the process of purging could also be studied in more detail. Second, SNP-data can be used to search for genomic regions associated with inbreeding depression (e.g., through ROH scans [192,197,198]). It should be noted, however, that such methods are prone to statistical issues such as multiple testing and tend to go against the infinitesimal model (i.e., the idea that inbreeding depression is predominantly caused through many loci with small effects). Third, with genomic data, it has become possible to study the role of regulatory mechanisms such as methylation, in explaining inbreeding depression [199]. Last, genomic data may help to shed more light on the role of partial dominance, overdominance, and epistasis in explaining inbreeding depression.
Various studies were excluded from this meta-analysis, because they lacked detailed information on the inbreeding depression estimates. Here, we therefore list recommendations on how to report inbreeding depression estimates. First, it is important to report descriptive statistics of the traits that are investigated (number of animals, mean, SD, etc.). This allows future studies, among others, to scale b-values (e.g., to b m and b s ) and to investigate the effect of population size on the results (as e.g., in Figure 6). The descriptive statistics should be provided for the individuals used in the analysis and, when multiple populations or subgroups (e.g., breeds, sexes, or age classes) are studied, statistics should be reported for each population separately. Second, it is important to provide details on how inbreeding measures were calculated. For F PED , for example, this includes information on pedigree completeness such as the complete generation equivalent or number of complete generations. Again, such statistics should be reported for the individuals used in the analysis and not (only) for the entire pedigree. For F ROH , the criteria and approach to identify ROHs should be fully explained. Third, descriptive statistics on the inbreeding measures (distribution, mean, SD) should be provided to enable scaling (as e.g., in Figures 3 and 4). Fourth, when estimating b-values, it is important to correct for appropriate fixed effects as well as for additive genetic effects [200] (e.g., with an animal model). Fifth, it should be clearly stated whether b-values are expressed per 1%, 10%, or 100% increase in inbreeding. Sixth, for traits such as litter size, calving ease, and success of insemination, it should be clearly stated whether b-values correspond to a regression of phenotypes on inbreeding of the offspring/litter, on inbreeding of the dam ('maternal'), or on inbreeding of the sire ('paternal'). Seventh, b-values should be reported with standard errors and sufficient decimal places to facilitate calculation of test statistics. Eighth, it is important to also report non-significant and/or favorable effects to prevent publication bias. Last, it would be valuable to provide estimates of the effective population size, when available, because the effective population size influences the amount of genetic purging in the population and, thus, the degree of inbreeding depression. Overall, these recommendations should contribute to enable comparisons of inbreeding depression estimates across studies and facilitate meta-analyses in future.

Conclusions
Inbreeding has an unfavorable effect on livestock traits. Based on this meta-analysis, a 1% increase in pedigree inbreeding is associated with a median decrease in phenotypic value of 0.13% of a trait's mean, and 0.59% of a trait's SD. Various trait groups (i.e., reproduction/survival, weight/growth, conformation, production, and health) show a similar degree of inbreeding depression. p-values of inbreeding depression estimates for SNP-based inbreeding measures were smaller than those for pedigree inbreeding, suggesting more power for SNP-based measures. There were no consistent differences in pvalues for percentage of homozygous SNPs, inbreeding based on ROH or inbreeding based on a genomic relationship matrix. Comparisons between measures, however, are difficult because of the limited number of studies that directly compares them, the different scales of measures, and arbitrary definitions for particularly ROH-based inbreeding. To facilitate comparisons across studies in future, we highly recommend always reporting detailed information about inbreeding depression estimates (on traits, inbreeding coefficients, and models used).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10 .3390/genes12060926/s1, Table S1: Dataset with inbreeding depression estimates of 154 studies, Table  S2: Dataset with inbreeding depression estimates for pedigree-based and SNP-based inbreeding measures of 12 studies, Table S3: Pairwise comparisons of estimated marginal means of b m between trait groups, Table S4: Pairwise comparisons of estimated marginal means of b s between trait groups, Table S5: Estimated marginal means of inbreeding depression estimates per trait, with standard errors.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this appendix, it is explained why inbreeding based on the diagonal of a GRM computed with allele frequencies of 0.5 (F GRM0.5 ) is equivalent to the percentage of homozygous SNPs (HOM), except for a difference in scale, with F GRM0.5 = 2HOM − 1.
When using a GRM, inbreeding for animal j is typically calculated as the diagonal element of the GRM for animal j minus one (i.e., G jj − 1). For VanRaden's method 1 [19], the diagonal element of the GRM for animal j is: where, at the i th SNP, x ij is the count of allele A (coded as 0, 1 or 2) in animal j; p i is the allele frequency of allele A and q i is the allele frequency of allele B; and N is the number of SNPs. When allele frequencies of 0.5 are used, this simplifies to: x 2 ij − 2x ij + 1 0.5 * N Since x 2 ij − 2x ij + 1 equals 1 when a SNP is homozygous (x ij = 0 or x ij = 2), or 0 if a SNP is heterozygous (x ij = 1), the numerator is the number of homozygous SNPs (N hom ): G jj = N hom 0.5 * N and, thus, G jj equals two times the fraction of homozygous SNPs.
For VanRaden's method 2 [19], which was initially proposed by Leutenegger et al. [201] and Amin et al. [202], the diagonal element of the GRM for animal j is: which is the same as VanRaden's method 1, except that the scaling occurs per SNP. When allele frequencies of 0.5 are used, this equation simplifies to: equals 2 when a SNP is homozygous (x ij = 0 or x ij = 2), or 0 if a SNP is heterozygous (x ij = 1), this equation becomes: and, thus, again G jj equals two times the fraction of homozygous SNPs. For Yang's method [20], the diagonal element of the GRM for animal j is: When allele frequencies of 0.5 are used, this equation simplifies to: Since x 2 ij −2x ij +0.5 0.5 equals 1 when a SNP is homozygous (x ij = 0 or x ij = 2), or −1 if a SNP is heterozygous (x ij = 1), this equation becomes: where N het is the number of heterozygous SNPs. Thus, again, G jj equals two times the fraction of homozygous SNPs. Since inbreeding for animal j is calculated as G jj − 1, and G jj is two times the fraction of homozygous SNPs, F GRM0.5 = 2HOM − 1, for all three GRMs considered above.