Assessment of Genetic Diversity, Runs of Homozygosity, and Signatures of Selection in Tropical Milking Criollo Cattle Using Pedigree and Genomic Data

The objective of this study was to evaluate the genetic diversity of the Tropical Milking Criollo cattle (TMC) breed in Mexico through parameters derived from pedigree and genomic information assessment. The pedigree file consisted of 3780 animals. Seventy-nine bovines were genotyped with the medium-density single nucleotide polymorphism chip and considered a reference population for pedigree analysis. The effective population size and the probability of gene origin used to assess the evolution of genetic diversity were calculated with pedigree information. Inbreeding coefficients were evaluated based on pedigree (FPed), the genomic relationship matrix (FGRM), and runs of homozygosity (FROH) of different length classes. The average inbreeding was 2.82 ± 2.66%, −0.7 ± 3.8%, and 10.9 ± 3.0% for FPED, FGRM, and FROH, respectively. Correlation between FPED and FROH was significant only for runs of homozygosity > 4 Mb, indicating the FPED of a population with an average equivalent complete generation of five only recovers the most recent inbreeding. The parameters of the probability of gene origin indicated the existence of genetic bottlenecks and the loss of genetic diversity in the history of the TMC cattle population; however, pedigree and genomic information revealed the existence of current sufficient genetic diversity to design a sustainable breeding program.


Introduction
Genetic diversity refers to the heritable variation observed between and within populations, and it can indicate the degree of differentiation between or within any species, breed, or livestock population. Whether natural or artificial, population evolution relies primarily on the existing genetic diversity. A long history of evolutionary processes such as migration, mutation, selection, genetic drift, and adaptation, together with domestication, has created an enormous variety of breeds [1]. Locally adapted breeds have passed through these changes in harsh environments and probably developed specific adaptive features.
Cattle were introduced to America five centuries ago; these original introductions provided the genetic background of the American Criollo cattle, with influences from Spanish, Portuguese, and African breeds [2]. Tropical Milking Criollo (TMC) is a tropically adapted breed developed due to the geographic isolation of the Criollo cattle and farmers' selection of milk production traits [3]. In Mexico, a nucleus herd of TMC cattle was formed in the mid-20th century. The base of this nucleus herd were cows imported from Nicaragua, creole cows from Mexico, and some bulls from the Centro Agronómico Tropical de Investigación y Enseñanza (CATIE), Turrialba, Costa Rica [4].
The knowledge of an animal population's genetic diversity can help decide the breeding management for genetic improvement or conservation programs. Furthermore, the maintenance of genetic diversity in small populations is vital because heterozygosity and allelic diversity can be lost at an accelerated rate when these populations are closed and under artificial selection [5]. Additionally, the assessment of genetic diversity is a crucial step in understanding the evolutionary history of a breed since it provides essential information for the conservation and management of its biodiversity [6].
Population structure and genetic diversity can be assessed using pedigree information. Pedigree analysis is an important tool to describe genetic variability and its evolution across generations; however, its results heavily depend on the integrity of the pedigree [7]. SNP genotyping is one of the best ways to evaluate genetic diversity because of its availability, accuracy, and cost-effectiveness in livestock species [8].
The genetic variability of TMC cattle in Mexico has only been assessed by pedigree analysis [9]. Therefore, the objective of this study was to evaluate the genetic diversity of the TMC cattle breed population in Mexico through parameters derived from pedigree analysis and the assessment of the genome-wide single nucleotide polymorphism (SNP) genotyping array.

Materials and Methods
Genealogical and genotyping data were provided by the Asociación Mexicana de Criadores de Ganado Romosinuano y Lechero Tropical (AMCROLET). The pedigree file consisted of 3780 records (546 males and 3234 females) of animals born between 1956 and 2021. Seventynine bovines (5 males and 74 females) born between 2003 and 2018 were genotyped with the medium-density Affymetrix chip (63K SNP markers). These animals were selected for this study because they belong to a nucleus herd of the breed in Mexico. Quality control included a call rate ≥ 0.90 for SNP and animals. SNP and animals that did not satisfy this quality criterion were excluded, as well as those mapped on sexual chromosomes or not mapped on the Bos Taurus Autosome (BTA) 3.1.1 release. SNP were not filtered out for low minor allele frequency since this is a single-breed study, and we wanted to detect homozygosity correctly [10].

Pedigree Analyses
The pedigree analyses were performed using the ENDOG 4.8 program [11]. All analyses were performed for the total individuals in the pedigree and reference population, which consisted of the genotyped animals.
The inbreeding coefficient for each animal was estimated using the algorithm of Meuwissen and Luo [12]. The inbreeding coefficient determines the probability that two alleles at any locus are identical by descent. The average inbreeding coefficient among all animals (F PED ), percentage of inbred animals, and average inbreeding coefficient for inbred animals were calculated. The average relatedness coefficient (AR) was also estimated. This coefficient is computed as the probability that an allele randomly selected from the entire population (included in the pedigree) belonged to a particular animal [11].
The average equivalent complete generations, average maximum generations, and pedigree completeness index (PCI) by generation were used as indicators of the pedigree's integrity. The equivalent complete generations of an individual were calculated as follows [13]: where n is the number of generations separating the individual i from his known ancestor j. The PCI is the mean proportion of ancestors known in each ancestral generation, and it was computed as the proportion of known ancestors in each ascending generation.
The interval generations (parent's average age when their progeny, kept for future reproduction, were born) were calculated for the four following genetic pathways: sire of sire (L SS ), sire of dam (L SD ), dam of sire (L DS ), and dam of dam (L DD ). The average generation interval was computed as follows: The evolution of the genetic diversity in the population was assessed based on the probability of gene origin. The effective numbers of founders and ancestors were computed. The effective number of founders is defined as the number of founders contributing equally to produce the same genetic diversity in the evaluated population, and it was calculated according to Lacy [14]: where q i is the expected proportional genetic contribution of the founder i computed as the average relationship of that founder to each animal in the population; and n is the number of founders. The effective number of ancestors is the minimum number of ancestors (founders or not) necessary to explain the genetic diversity of the population, and it was calculated according to Boichard et al. [15]: where p i is the marginal contribution of the ancestor j computed as the genetic contribution not explained by other ancestors, and m is the number of ancestors. The effective population size was estimated based on the individual increase of the inbreeding coefficient (∆F i ), calculated according to Gutierrez et al. [16]: where Geq is the number of equivalent generations and F is the inbreeding coefficient for the individual i. The effective population size was estimated as:

Genomic Analyses
Individual inbreeding coefficients were calculated from the genomic relationship matrix (F GRM ) and runs of homozygosity (F ROH ). The F GRM were obtained by subtracting one from the diagonal elements of the genomic relationship matrix (G), built according to Yang et al. [17] using CTA v1.93.2 (Genome-wide Complex Trait Analysis) software. Runs of homozygosity (ROH) are long stretches of homozygous genotypes in an individual's genome [18]. ROH were calculated separately for every animal by applying a sliding window of 20 SNP computed using the R package "DetectRuns" [19]. The minimum density considered and the maximum gap between consecutive SNP were 1 SNP every 1000 kb and 1 Mb, respectively. The F ROH was computed following the method proposed by McQuillan et al. [20]: where L ROH is the total length of all the ROH detected in the animal's autosomes, L AUT is the entire length of the autosomal genome. The F ROH by chromosome (F ROH CHR ) also were calculated as the total length of all the ROH of the chromosome divided by the total length of the chromosome. Additionally, the ROH in the population were categorized into five length classes (1-2 Mb, 2-4 Mb, 4-8 Mb, 8-16 Mb, and >16 Mb) to analyze the distribution of the F ROH across these. Selection signatures studies aim to identify genomic regions with a systematically reduced variation compared to the average across the genome. This idea has been implemented with ROH, searching for continuous parts of the genome without heterozygosity in the diploid state, and used on a genome-wide scale to detect signals of past selection [21]. The percentage of SNP existing within an ROH was estimated by counting the number of times each SNP appeared in an ROH and dividing that number by the number of animals [22]. This percentage had to be higher than 50% to signify a potential hotspot of ROH in the genome [23]. Genomic regions detected were interrogated for genes annotated to the Bos taurus genome assembly ASR-UCD1.2 using Genome Data Viewer (NCBI; https://www.ncbi.nlm.nih.gov/assembly/GCF_002263795.1/ accessed on 17 October 2022).

Pedigree Analyses
The average F PED was 1.73% in the total population and 2.82% in the reference population (i.e., genotyped animals only). Inbred animals were 37.3% of the total population and 86.84% of the reference population. The average F PED for the inbred animals was 4.62% in the total population and 3.25% in the reference population. The F PED estimations in the total and reference populations suggest a low inbreeding level in the TMC cattle population. The mean AR estimated in the total and reference population were 2.75% and 4.67%, respectively.
The realized effective population size based on the individual increase of F PED was 94.24, and the individual increase of F PED was 0.53%. The evolution of F and AR across complete generations traced and by the maximum number of generations traced are illustrated in Figure 1.
where is the total length of all the ROH detected in the animal's autosomes, is the entire length of the autosomal genome. The by chromosome ( ) also were calculated as the total length of all the ROH of the chromosome divided by the total length of the chromosome. Additionally, the ROH in the population were categorized into five length classes (1-2 Mb, 2-4 Mb, 4-8 Mb, 8-16 Mb, and >16 Mb) to analyze the distribution of the across these. Selection signatures studies aim to identify genomic regions with a systematically reduced variation compared to the average across the genome. This idea has been implemented with ROH, searching for continuous parts of the genome without heterozygosity in the diploid state, and used on a genome-wide scale to detect signals of past selection [21]. The percentage of SNP existing within an ROH was estimated by counting the number of times each SNP appeared in an ROH and dividing that number by the number of animals [22]. This percentage had to be higher than 50% to signify a potential hotspot of ROH in the genome [23]. Genomic regions detected were interrogated for genes annotated to the Bos taurus genome assembly ASR-UCD1.2 using Genome Data Viewer (NCBI; https://www.ncbi.nlm.nih.gov/assembly/GCF_002263795.1/ accessed on 17 October 2022).

Pedigree Analyses
The average FPED was 1.73% in the total population and 2.82% in the reference population (i.e., genotyped animals only). Inbred animals were 37.3% of the total population and 86.84% of the reference population. The average FPED for the inbred animals was 4.62% in the total population and 3.25% in the reference population. The FPED estimations in the total and reference populations suggest a low inbreeding level in the TMC cattle population. The mean AR estimated in the total and reference population were 2.75% and 4.67%, respectively.
The realized effective population size based on the individual increase of FPED was 94.24, and the individual increase of FPED was 0.53%. The evolution of F and AR across complete generations traced and by the maximum number of generations traced are illustrated in Figure 1. The average equivalent complete generations were 3.0 in the total population and 5.01 in the reference population. The average maximum generations were 7.3 in the total population and 10.4 in the reference population. The PCI one generation ago was 74.1% and 96.7% in the total and reference populations, respectively.
The PCI is an essential indicator of the F PED quality because it represents the harmonic mean of the parental genetic contributions. It is zero if one of the parents is unknown regardless of how deep and complete the pedigree of the other parent is [7].
Five generations back, the PCI declined to 30.5% in the total population and 54.3% in the reference population ( Figure 2). mean of the parental genetic contributions. It is zero if one of the parents is unknown regardless of how deep and complete the pedigree of the other parent is [7].
Five generations back, the PCI declined to 30.5% in the total population and 54.3% in the reference population ( Figure 2). The average generation interval for the total population was shorter than seven years ( Table 1). The largest generation interval was for the path sire-sire, which means breeders take longer to select sires than dams. The Fe was 78 and 45, whereas the Fa was 48 and 24 in the total and reference populations, respectively. A total of 19 and 9 ancestors explained 50% of the genetic diversity in the total and reference populations, respectively. The Fa/Fe ratio was 0.61 in the total population, while it was 0.53 in the reference population, indicating the existence of genetic bottlenecks and loss of genetic diversity in the TMC cattle population. The average generation interval for the total population was shorter than seven years ( Table 1). The largest generation interval was for the path sire-sire, which means breeders take longer to select sires than dams. The Fe was 78 and 45, whereas the Fa was 48 and 24 in the total and reference populations, respectively. A total of 19 and 9 ancestors explained 50% of the genetic diversity in the total and reference populations, respectively. The Fa/Fe ratio was 0.61 in the total population, while it was 0.53 in the reference population, indicating the existence of genetic bottlenecks and loss of genetic diversity in the TMC cattle population.

Discussion
The differences between F ROH and F PED estimates suggest underestimation of F PED due to the shallow and missing pedigrees. The reference population, a sample of the actual TMC herds in Mexico, had a greater F PED and AR than the total population, meaning a loss of genetic diversity. When the AR is greater than F PED , the frequency of mating between related animals is greater than between unrelated individuals [7]. In the last generation, considering the whole population, F PED was greater than AR, suggesting the frequency of mating between unrelated individuals was greater than the frequency of mating between related animals, as an effort to minimize the increase of the inbreeding level in this population.
Franklin [24] proposed the classic 50/500 recommendation for the minimum effective population size required to preserve adequate levels of genetic diversity in a population in the short/long term. The value Ne = 50 was derived for animal breeding programs, suggesting a maximum tolerable increase of inbreeding of 1% per generation to maintain genetic diversity at an acceptable level [25]. In our study, the estimated Ne (94.24) based on F PED is above the minimum tolerable in the short term; however, it is lower than the minimum Ne suggested for survival in the long term; thus, strategies to increase genetic diversity in the TMC cattle population are pertinent.
Hidalgo et al. [7] investigated the genetic diversity using pedigree and genomic analysis in Mexican Romosinuano cattle. They reported similar parameters of the probability of gene origin (Fe = 71; Fa = 31; Fa/Fe = 0.44) with a pedigree file of 4875 animals. Larger estimates related with the probability of gene origin (Fe = 113-541; Fa = 33-254; Fa/Fe = 29.2-61.5) were reported for six cosmopolitan beef cattle raised in Mexico [26].
The decrease in the effective population size and the loss in genetic diversity within a breed are usually due to increased mating between relatives because of a limited number of breeding animals. This is generally accompanied by an increase in the average inbreeding level of offspring and, the whole population. This scenario is common in populations under strong artificial selection, where the number of breeding animals is limited by human choices, or in small close populations, where natural constraints limit the number of breeding animals.
F GRM is based on the variance of additive genetic values and provides a measure relative to frequencies of the reference alleles in the base population. The F GRM estimated is sensitive to allele frequencies. In fact, F GRM by Yang's method ranges from −1 to ∞ [27]. This means that F GRM can indicate some variability has been gained, but this gain can never be greater than 100% of the initial variability. It also means that F GRM can indicate a loss of variability higher than 100%. Zhang et al. [28] reported a mean F GRM of three dairy cattle breeds: in that study, Jersey cattle also had a negative value of F GRM .
On the contrary, the F ROH does not depend on the allele frequencies. Therefore, estimators based on ROH from genotypes reflect homozygosity at the genome level and have the advantage of not being affected by allele frequency estimates or the pedigree's incompleteness [28]. The length of an ROH can also be a valuable indicator of the time of the inbreeding event with which it is associated. Long ROH are related to recent events of inbreeding in the history of a breed or of a single individual, whereas short ROH indicate a more ancient event [29]. Most of the homozygosity in the TMC cattle population was found in ROH < 2 Mb length. This indicates that most of the inbreeding found in the TCM cattle population is old and might not have significant adverse effects (inbreeding depression) as deleterious alleles could be purged. However, some of the short-length ROH are likely to be false positives when detected using a medium-density SNP BeadChip, which leads to an overestimation of short-length ROH (<5 Mb) because of the high distance between adjacent SNP [30,31].
The longer ROH segments are, the more likely that recent inbreeding occurred within a population [29]. A pedigree with an average equivalent complete generation of five in the genotyped animals could be a limitation for detecting the presence of more ancient relatedness. The genes found within the ROH islands involve many molecular functions, biological processes, and cellular components. The ROH island in BTA22 was reported to be associated with age at first calving [32]. The DCAF1 gene has been related to the hydroxymethylation of genomic DNA to maintain oocyte survival. The HEMK1 plays a role in the development of the immune system, depending on environmental stressors [33]. Genes in the ROH island on BTA21 (SNUPN, PTPN9, and CSPG4) have been associated with the marbling score in Korean cattle [34]. The PIK3CD and SPSB1 genes (both mapped on BTA16) have been related to immune response and regulation, respectively [35]. The PEX14 gene on BTA16 is a candidate gene related to dairy production, which is under selection in dairy Holstein cattle [36]. The ROH island on BTA11 harbors important genes previously reported to be involved in reproductive traits, such as the RBKS gene associated with oocyte maturation [37], SLC30A3 [38], and CAD [39] described as estrogen receptors, and EMILIN1 associated with placentation and trophoblast invasion in the uterine wall [40].

Conclusions
The inbreeding coefficient estimates based on homozygosity runs in the Mexican Tropical Milking Criollo cattle population were larger than the estimates based on the pedigree and genomic relationship matrix. Most ROH had a length of less than 4 Mb, indicating ancient inbreeding. The correlation between F PED and F ROH was significant when the ROH were greater than 4 Mb, meaning that only long ROH, associated with recent inbreeding events in the breed's history, are captured by the available pedigree records. The completeness of the pedigree was not sufficient to show more ancient inbreeding. Selection signatures were detected through highly homozygous regions related to reproductive and dairy production traits. Although there is evidence of past genetic bottlenecks and loss of genetic diversity, the Tropical Milking Criollo cattle population exhibits a reasonable amount of genetic diversity to design a sustainable breeding program.

Data Availability Statement:
The dataset is not publicly available since it belongs to AMCROLET and is treated as confidential information.