1. Introduction
Runs of homozygosity (ROH) are contiguous homozygous segments of the genome, which can arise from the mating of two related individuals that transmit identical haplotypes to their offspring [
1]. Long ROH segments are often associated with recent inbreeding, while short ROH are linked to ancient inbreeding, owing to the higher probability of recombination events occurring as the number of generations increases [
2]. Thus, ROH analyses are paramount for estimating genetic diversity metrics such as the ROH-based inbreeding coefficient (FROH), i.e., the ratio of the total length of an individual’s autosomal genome in ROH to the total length of the autosomal genome covered by single nucleotide polymorphisms (SNPs) [
2]. FROH tends to be more accurate than pedigree-based inbreeding coefficients and enables the identification of specific genomic regions with greater inbreeding [
3]. The identification of ROH regions also contributes to the characterization of population history, structure, and demographic events [
4], and further reveals the selection signatures that are characterized by fixation of alleles under high selection pressure on a population [
5,
6], with a subsequent increase in homozygosity in regions around these alleles [
7,
8].
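The FROH definition given above can be illustrated with a minimal sketch. The function name, the segment coordinates, and the genome length below are hypothetical values chosen for illustration and are not taken from the study.

```python
def froh(roh_segments, autosome_length_bp):
    """Return F_ROH: total ROH length divided by the SNP-covered autosomal length."""
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    total_roh_bp = sum(end - start for start, end in roh_segments)
    return total_roh_bp / autosome_length_bp


# Hypothetical individual with three ROH segments given as (start_bp, end_bp).
example_segments = [
    (1_000_000, 3_000_000),
    (10_000_000, 11_500_000),
    (50_000_000, 56_500_000),
]
# Hypothetical SNP-covered autosomal length in bp (an assumption, not a study value).
example_genome_bp = 2_000_000_000

example_froh = froh(example_segments, example_genome_bp)
print(f"F_ROH = {example_froh:.4f}")
```

The sketch assumes non-overlapping segments; overlapping calls would need to be merged before summing.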
Runs of homozygosity have been extensively studied across many species for the quantification of inbreeding [
2,
3,
9,
10,
11,
12], detection of selection signatures [
13,
14,
15], and comparison of statistical methods and identification parameters [
4,
16,
17,
18,
19]. Runs of heterozygosity, most appropriately defined as heterozygosity-rich regions (HRRs) [
20], represent a more recent concept [
21], and are not as well described in the literature as ROH [
20,
22,
23,
24]. HRRs can also provide insights about population structure and demographic history [
24], and these HRRs may harbor important loci for key functional traits such as immune response, survival rate, fertility, and other fitness traits [
25]. To the best of our knowledge, there are no reported studies characterizing HRRs in sheep populations.
The domestication of sheep (
Ovis aries) started in the Fertile Crescent approximately 11,000 years ago [
26]. Nowadays, sheep are raised across the globe under divergent environmental conditions. While some breeds have been artificially selected for certain purposes (e.g., meat, milk, or wool), other populations have evolved without direct human intervention. The characterization of ROH patterns in populations selected for specific or divergent purposes could reveal genomic regions of predominant homozygosity related to the fixation of certain alleles associated with the traits under selection. Furthermore, HRRs in these populations may indicate regions associated with important fitness traits [
25]. Purfield et al. [
27] analyzed the genome of sheep from six meat breeds to identify selection signatures using ROH and two complementary methods, Fixation Index and hapFLK [
28], and observed regions under putative selection that frequently overlapped with high ROH regions. Dzomba et al. [
29] also characterized the distribution of ROH islands in 13 South African sheep breeds and 31 worldwide sheep populations, which enabled the identification of common and unique ROH islands across populations.
Data visualization is a critical step in genomic data analytics for proper interpretation of the findings. Plotting results instead of looking at tabular data frequently provides additional insights into the patterns and trends of the results. Several tools have been developed to visualize genomic data, which can be challenging owing to the structure and complexity of the data [
30]. Furthermore, the volume of data that usually results from independent analyses may present an additional challenge for the integration and comparison of such results, which are usually projected in distinct static images. Business intelligence (BI) is a concept used in the corporate environment to support managers in the decision-making process by enabling informed decisions based on data [
31]. Distinguishing features of BI tools include the possibility of creating dynamic data visualizations and integrating distinct data sources. Consequently, BI tools provide a great opportunity for combining results from different analyses and navigating through parameters more quickly, such as by changing the exhibited chromosome or breed in a click.
In this study, we chose 17 worldwide sheep populations of eight breeds, comprising breeds intensively selected for different purposes (meat, milk, or wool) and locally adapted breeds. Our main objectives were as follows: (1) to identify and characterize factors impacting the detection of ROH and HRR in sheep breeds selected for different breeding goals; (2) to evaluate the feasibility of using a BI tool to visualize and filter the observed results, integrating multiple types of information in a single visualization, such as ROH islands and previously identified quantitative trait loci (QTL) or linkage disequilibrium (LD) patterns; and (3) to compare different parameters for the identification of HRR, with the aim of providing some basis for future studies of this nature in sheep and other livestock populations.
4. Discussion
In this study, we evaluated the use of BI software to integrate data obtained from different databases and analyses regarding ROH and HRR detected in worldwide sheep populations. The use of the BI concept allowed us to dynamically visualize outputs from different analyses, as well as apply filters to efficiently select specific populations, chromosomes, and parameters and focus on the interaction between the studied phenomena. Furthermore, we would like to emphasize that, although the genotypic data used in this study were collected from multiple flocks [
32], and sample sizes were taken into consideration when selecting the populations to be included herein, any conclusions drawn from the present study should be considered carefully alongside other studies that used different data sources and considerable sample sizes, to avoid misrepresenting the populations. Moreover, the visualization method implemented in this study could also be applied to future studies.
All sheep populations included in this study presented more than 45% of their detected ROH between 1 and 2 Mb, the shortest ROH length class defined. Many studies also reported the prevalence of ROH in the shortest length category for several sheep breeds [
27,
44,
45,
46,
47]. It has been reported that modern populations of sheep usually present higher effective population sizes (Ne) and SNP diversity than cattle populations [
11,
27,
32,
48], which could be related to the prevalence of short over long ROH in sheep. Moreover, Ferenčaković et al. [
17] reported that the use of low-density SNP chips for the detection of ROH may lead to an overestimation of the number of ROH shorter than 4 Mb.
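As a rough illustration of how the share of ROH per length class can be tabulated, the sketch below bins ROH lengths and reports percentages. Only the 1 to 2 Mb class is stated in the text; the remaining class boundaries, the labels, and the example lengths are assumptions made for illustration.

```python
from bisect import bisect_right

# Lower bounds of each class in Mb. Only the 1-2 Mb class comes from the text;
# the other boundaries are assumed here for illustration.
CLASS_EDGES_MB = [1, 2, 4, 8, 16]
CLASS_LABELS = ["1-2 Mb", "2-4 Mb", "4-8 Mb", "8-16 Mb", ">=16 Mb"]


def class_percentages(roh_lengths_mb):
    """Return the percentage of ROH falling in each half-open length class."""
    counts = dict.fromkeys(CLASS_LABELS, 0)
    for length in roh_lengths_mb:
        if length < CLASS_EDGES_MB[0]:
            continue  # shorter than the minimum ROH length, so not counted
        idx = bisect_right(CLASS_EDGES_MB, length) - 1
        counts[CLASS_LABELS[idx]] += 1
    total = sum(counts.values())
    return {label: (100.0 * n / total if total else 0.0) for label, n in counts.items()}


# Hypothetical ROH lengths (Mb) for one population; not data from the study.
example_lengths = [1.2, 1.8, 1.5, 3.0, 5.0, 9.0, 1.1, 20.0, 2.5, 1.9]
example_pct = class_percentages(example_lengths)
print(example_pct)
```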
Nosrati et al. [
48] detected on average 50.38 ROH in individuals from the same Soay population used in the present study, which corresponds to roughly one-quarter of the runs detected herein (188.4). This divergence in the results could be attributed to the differences in the detection parameters, such as higher values of minimal number of SNPs in an ROH (40) and maximal gap between adjacent SNPs (1 Mb), as well as lower SNP density (100 kb/SNP). Our results suggest that setting a low minimal number of SNPs (20) and maximal gap (250 kb), and higher SNP density (70 kb/SNP) when using a low-density SNP chip may lead to the break of runs in regions of lower SNP density, as illustrated in
Figure 3, creating an overestimation of the number of runs and an underestimation of the percentage of long runs. On the other hand, Dzomba et al. [
29] applied parameters similar to those in the present study, with a higher minimum number of SNPs per run (30) and a lower density (100 kb/SNP), and used the Consecutive Runs method. The authors reported higher averages of the number of ROH per animal per population (considering the same populations used in the present study). We also tested the effect of applying a 0.01 MAF filter, which had almost no effect on the overall results but broke some runs. Therefore, we decided not to prune the data for MAF. Besides the Sliding Windows and Consecutive Runs approaches implemented by detectRUNS, other software and methods could also be used for the detection of ROH and might lead to different results.
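The effect of the minimal number of SNPs and the maximal gap on run counts can be seen in a simplified consecutive-run sketch. This is not the detectRUNS implementation: it ignores allowances for heterozygous and missing SNPs, minimum run length, and SNP density, and the toy marker positions are hypothetical.

```python
def consecutive_roh(positions, is_hom, min_snps, max_gap_bp):
    """Return (start_bp, end_bp, n_snps) for runs of consecutive homozygous SNPs.

    A run is closed when a heterozygous SNP is met or when the distance to the
    previous SNP exceeds max_gap_bp; it is kept only if it has >= min_snps SNPs.
    """
    runs, current = [], []
    for pos, hom in zip(positions, is_hom):
        gap_too_large = bool(current) and pos - current[-1] > max_gap_bp
        if current and (not hom or gap_too_large):
            if len(current) >= min_snps:
                runs.append((current[0], current[-1], len(current)))
            current = []
        if hom:
            current.append(pos)
    if len(current) >= min_snps:
        runs.append((current[0], current[-1], len(current)))
    return runs


# Toy chromosome (hypothetical): 60 homozygous SNPs spaced 50 kb apart,
# with one 400 kb stretch of low SNP density in the middle.
first_block = [i * 50_000 for i in range(30)]
second_block = [first_block[-1] + 400_000 + i * 50_000 for i in range(30)]
toy_positions = first_block + second_block
toy_is_hom = [True] * len(toy_positions)

strict_runs = consecutive_roh(toy_positions, toy_is_hom, min_snps=20, max_gap_bp=250_000)
relaxed_runs = consecutive_roh(toy_positions, toy_is_hom, min_snps=40, max_gap_bp=1_000_000)
print(len(strict_runs), len(relaxed_runs))
```

With the 250 kb gap limit, the low-density stretch splits one homozygous segment into two shorter runs, whereas the 1 Mb limit reports a single longer run, which mirrors the overestimation of run counts and underestimation of long runs discussed above.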
The distribution of ROH in length classes (
Figure 1a), chromosomes (
Figure 2), and positions (
Figure 3) showed an obvious differentiation between populations. ROH have been shown to be non-randomly distributed across the genome; instead, they reflect the occurrence of demographic events and selection pressure for different objectives [
4]. The East Friesian Brown and Soay populations showed a similar total ROH length, which leads to similar inbreeding levels. However, the percentage of long ROH was much higher in the East Friesian Brown population, indicating recent inbreeding events. The Soay population was raised in isolation on the Soay Island for hundreds of years [
49], and inbreeding was probably frequent when the first individuals arrived on the island, hence the high number of short runs. The three Texel and the two Lacaune populations presented similar averages of total length and number of ROH within each breed, while the two Suffolk and the six Merino populations showed a significant divergence in these metrics (
Figure 1), which might indicate that the processes of selection in different countries can be more differentiated for some breeds than for others.
Few studies have been conducted with the aim of characterizing HRR in livestock, and only one has attempted to identify factors impacting HRR detection, using a low-density SNP chip [
23]. Furthermore, most of the studies on HRRs used high-density SNP chips [
21,
22,
24], which have been shown to require different parameters from low-density SNP chips for ROH detection [
17,
41]. The same is most likely true for the identification of HRRs. In this study, we set the minimum number of SNPs within an HRR at 5 or 10, which is lower than the number used for ROH (15) because HRRs are usually reported as being shorter than ROH [
23]. The same difference in the parameters was observed in other studies [
20,
21,
24]. We observed that changing the minimal number of SNPs and window size from 10 to 5 did not increase the number of HRRs detected; in fact, the number and length of HRRs detected decreased. This could be related to the fact that we used the sliding window approach, and the reduction in the window size may have interacted with the other parameters, such as the number of missing and homozygous SNPs allowed, causing HRRs to break into even shorter segments. We also tested allowing different numbers of homozygous (1 to 3) and missing (1 or 2) SNPs within an HRR. Biscarini et al. [
23] reported that allowing only one homozygous SNP reduced the number of detected HRRs and increased their average size when compared with allowing two homozygous SNPs, while increasing this number to up to five caused both metrics to increase. We observed a similar effect in our data: when reducing the number of homozygous SNPs allowed from three to two, the number of HRRs detected was reduced and the length increased, and reducing it to one caused both metrics to decrease.
Scenario 2 was chosen as the best scenario for the detection of HRR islands because it presented a high number of HRRs and a satisfactory maximum HRR length when compared with the other scenarios. The average number of HRRs detected per animal (139.59) was higher than that detected by other authors in turkey (57.80), cattle (9.87), and horse (52.17) populations [
20,
23,
24], and similar to the number detected by Ferenčaković et al. [
22] in a cattle population (122.52). Most of these studies reported the detection of higher numbers of ROH than HRR; however, our results showed the opposite. We hypothesized two reasons: (1) misadjustment of parameters for the detection of HRR, or (2) the genomes of the sheep populations analyzed present small and frequent HRRs. Therefore, further research is needed to test these hypotheses, using different parameters and methods for the detection of HRRs. The use of a higher density SNP chip could also provide further insights.
Kijas et al. [
32] reported the inbreeding coefficient (F) calculated for each of the populations used in this study, and the populations with the lowest F, such as Chinese Merino (0.08), Australian Suffolk (0.08), and Australian Poll Merino (0.09), presented higher average numbers of HRRs and total HRR length per individual (
Table 3), while populations with the highest F, such as Soay (0.33), East Friesian Brown (0.26), and Irish Suffolk (0.22), presented the lowest HRR metrics (
Table 3). When comparing the average numbers and total length of ROH (
Figure 1) and HRR (
Table 3) for the populations, a negative correlation between them was also observed.
To investigate the occurrence of LD within ROH and HRR islands, we plotted the pairwise SNP r² values against the islands, applying filters in Tableau for minimal LD length and r² values. This approach was shown to be effective because, unlike in other studies, the LD values could be presented directly and not through summarizations such as average r² per distance bin (e.g., Mastrangelo et al. [
50]). Moreover, when observing LD within the islands, we could identify the minimum amount of LD present and visualize the location of the LD blocks within the islands, instead of calculating the r² between the first and last SNPs (e.g., Mastrangelo et al. [
51] and Purfield et al. [
27]), which could overshadow the presence of stronger LD between closer SNPs within the island.
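The filtering idea described above, keeping individual SNP pairs inside an island that pass thresholds on r² and on pair distance, can be sketched as follows. The sketch assumes r² is estimated as the squared Pearson correlation of genotypes coded 0/1/2, which may differ from the estimator used in the study; the function names and toy genotypes are hypothetical.

```python
from itertools import combinations
from statistics import mean


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as no LD here
    return cov * cov / (var1 * var2)


def ld_pairs_in_island(snps, island_start, island_end, min_r2, min_dist_bp):
    """Return (pos1, pos2, r2) for SNP pairs inside the island passing both filters."""
    inside = sorted(p for p in snps if island_start <= p <= island_end)
    kept = []
    for p1, p2 in combinations(inside, 2):
        r2 = genotype_r2(snps[p1], snps[p2])
        if r2 >= min_r2 and (p2 - p1) >= min_dist_bp:
            kept.append((p1, p2, r2))
    return kept


# Toy genotypes (hypothetical) keyed by SNP position in bp.
toy_snps = {
    100: [0, 1, 2, 0, 1, 2],
    200: [0, 1, 2, 0, 1, 2],
    300: [2, 0, 1, 1, 2, 0],
    900: [0, 1, 2, 0, 1, 2],  # lies outside the island used below
}
weak_pairs = ld_pairs_in_island(toy_snps, 0, 500, min_r2=0.2, min_dist_bp=0)
strong_pairs = ld_pairs_in_island(toy_snps, 0, 500, min_r2=0.9, min_dist_bp=0)
print(weak_pairs, strong_pairs)
```

Keeping every pair, rather than a single r² between the first and last SNPs, is what allows stronger LD between closer SNPs to remain visible.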
Using the approach described above, we observed that most of the regions in ROH islands identified in more than one population (
Table 5) were located in regions with some extent of LD (r² > 0.2), with few exceptions where no LD was detected in some portion of the islands, even when allowing the minimal LD length (0 bp) and r² (0.2). The presence of stronger LD within the islands varied depending on the chromosome and the population, and some populations showed more overall LD than others. Interestingly, some regions showed a strong LD (r² > 0.9) in blocks over 250 Mb long across many populations, such as the region around 112 Mb in OAR2, and no islands were identified in such regions. New Zealand Texel presented LD over 0.9 in blocks within the region of 118,497–121,331 kb, and no island within the region (
File S1). These findings could indicate poor identification of ROH islands, but also that the presence of strong LD in certain regions does not always result in an increase in homozygosity.
The regions detected as ROH islands for two or more sheep populations in the present study spanned across populations selected for different purposes. Abied et al. [
52], using data from the OARv4.0 assembly, detected candidate regions on OAR2, OAR6, and OAR10 for five Chinese sheep breeds. Gorssen et al. [
53], analyzing 100 populations from the same public database used in this study, identified islands in the same region of OAR6 (around 38 Mb) identified herein, for 15 populations. This region was a common island for four of the six Merino populations we analyzed, including the Chinese Merino, and for the two Lacaune populations (meat and milk). He et al. [
46] also identified an ROH hotspot on this region in a Chinese Merino population, and reported the influence of
NCAPG/LCORL, genes associated with calving ease and fetal growth in cattle [
54,
55], body size in mammals [
56,
57,
58], and reduced subcutaneous fat thickness in cattle [
58]. A few QTLs within or very close to the region were associated with body weight (7), bone area (2), and milk fat yield. Taken together, these results suggest that this region on OAR6 is important for multiple traits, which could be beneficial for meat, wool, and milk production.
The region from 109 Mb to 119 Mb on OAR2 harbored ROH islands from six different populations, including breeds selected for meat, milk, and wool. Moreover, a great number of genes with distinct functions were observed within this region, such as
CLCN3, a gene involved in several basic cellular functions, and that was shown to reduce the inflammatory response induced by a high-fat diet in mice [
59];
HPF1, associated with early embryonic development in zebrafish [
60];
PMS1 and
ERCC3, identified as candidate genes in a genomic footprint for dryland stress adaptation in Egyptian fat-tail sheep [
61]. Purfield et al. [
27] reported the region between 115.48 and 126.34 Mb on OAR2 as the ROH hotspot with the most occurrences and as under putative selection in breeds selected for meat (i.e., Texel), but not for Suffolk. In our study, the Texel and the Suffolk populations did not share common islands, in agreement with Purfield et al. [
27], who reported a significant differentiation between these breeds. The QTLs observed between 109 Mb and 119 Mb on OAR2 were mostly related to horn type (21); meat color (1) and texture (1); and health traits, such as fecal egg count, platelet count, mean corpuscular volume, and hemoglobin level. These results also indicate that a variety of traits are impacted by this region, which thus harbors ROH islands for different selection groups.
We identified three genes (
BIN1,
MYO7B, and
GAS7) in common ROH islands that were associated with terms related to muscle development and enriched in the wool group: Actin Cytoskeleton (GO:0015629), Actin Binding (GO:0003779), Contractile Fiber (GO:0043292), and Motor Activity (GO:0003774).
BIN1 and
MYO7B were detected in a region in OAR2 shared by Chinese Merino, Merino Landschaf, and Scottish Texel.
BIN1 is involved in muscle cell differentiation [
62]. It was reported by Purfield et al. [
27] as a candidate gene in Texel, and by Al Kalaldeh et al. [
63] as a candidate gene in a GWAS for parasite resistance in Australian sheep.
GAS7 was identified in a different region, located on OAR11 and shared by Australian Industry Merino and Australian Suffolk. This gene is expressed in the central nervous system and associated with motor activity and muscle fiber composition [
64].
Furthermore, Australian Industry Merino and Australian Suffolk shared a region on OAR11 where two genes (
PIK3R5 and
STX8) were previously detected in a putative selection region in Swiss sheep [
45], and are associated with body size [
65,
66].
DHRS7C and
NTN1, also detected within this region, were reported as being related to enhanced muscle performance [
67] and body size [
65,
66], respectively. QTLs detected within this region are associated with body height, average daily gain, milk yield, and milk fat yield. According to Safari et al. [
68], there are moderate positive correlations between live weight at various ages and wool traits. They suggested that a greater need for both wool and meat products led sheep breeders to combine these two traits, as well as quality and disease resistance, into their breeding objectives. Other authors also endorsed the selection of Merino flocks for meat and carcass traits [
69,
70] and disease resistance [
71]. Therefore, we suggest that the need to improve a variety of traits led breeds with distinct selection purposes to present a higher homozygosity in certain common regions, described herein as well as in other studies, where these distinct traits would be improved.
Neither genes nor QTLs were detected within the region shared by Australian Industry Merino, Australian Merino, Chinese Merino, and Tibetan populations in OAR11 (41,526–42,049 kb), which may indicate the need for better annotation of the sheep genome, or that this region contains distal regulatory elements, such as silencers or enhancers. Fewer common genomic regions were identified in HRR islands than in ROH islands. Of those, two regions contained identified genes. Australian Merino, Australian Poll Merino, and Chinese Merino shared a region in OAR8 (89,939–90,351 kb), which contains
TCTE3, a gene previously described as a candidate influencing congenital diaphragmatic hernia [
72] and sperm motility and morphology [
73]. Three protein-coding genes (
ERMARD, PHF10, and
WDR27) detected within this region were previously reported in a study about structural brain abnormalities in humans, and only
ERMARD and
PHF10 were considered as plausible candidates [
74]. Furthermore, it was reported that heterozygous variants in
ERMARD (C6orf70) are associated with brain anomalies and syndromic dominant forms of periventricular nodular heterotopia in humans [
75,
76].
WDR27 was also detected as a candidate for insomnia [
77].
The other common region in HRR with detected genes was identified on OAR21 (400–926 kb) and was shared by Australian Industry Merino, Australian Suffolk, German Texel, Lacaune (meat), New Zealand Texel, and Scottish Texel.
CEP295 and
MED17, genes identified within this region, are responsible for building centrioles [
78,
79] and for the transcriptional activation of lipogenic genes in response to insulin [
80], respectively.
VSTM5, also identified within this region, codes a protein responsible for the regulation of neuronal morphogenesis and migration during cortical development in the brain [
81].
A common region in HRR was observed in OAR13 (34,513–34,530 kb) for Merino de Rambouillet and New Zealand Texel. Although no annotated genes were detected within this region, two QTLs were identified nearby. A QTL for milk fat yield was detected within the region of the HRR island exclusive to New Zealand Texel (34,254.2–34,530.07 kb), and a QTL for average daily gain was detected outside, but near, the HRR island detected in the Merino de Rambouillet (34,513.4–34,887.99 kb). A QTL for milk fat yield was also identified near an HRR island detected in Australian Suffolk, Churra, and Lacaune (milk) in OAR26 (43,609–44,004 kb).