Lentivirus Susceptibility in Brazilian and US Sheep with TMEM154 Mutations

Small ruminant lentiviruses (SRLVs) affect sheep and goats worldwide. The major gene related to SRLV infections is the Transmembrane Protein Gene 154 (TMEM154). We estimated the haplotype frequencies of TMEM154 in the USA (USDA-ARS) and Brazil (Embrapa) Gene Banks by using two different SNP genotyping methodologies, Fluidigm™ and KASP™. We also genotyped the ZNF389_ss748775100 deletion variant in Brazilian flocks. A total of 1040 blood samples and 112 semen samples from 15 Brazilian breeds were genotyped with Fluidigm for the SNP ZNF389_ss748775100 and 12 TMEM154 SNPs. A total of 484 blood samples from the Santa Inês breed and 188 semen samples from 14 North American sheep breeds were genotyped with KASP for 6 TMEM154 SNPs. All the Brazilian samples had the “I/I” genotype for the ZNF389_ss748775100 mutation. There were 25 TMEM154 haplotypes distributed across the Brazilian breeds, and 4 haplotypes in the US breeds. Haplotypes associated with susceptibility were present in almost all breeds, which suggests that genetic testing can help to improve herd health and productivity by selecting non-susceptible animals as founders of the next generations. Fluidigm and KASP are reliable assays when compared with Beadchip arrays. Further studies are necessary to understand the unknown role of TMEM154 mutations, host–pathogen interaction and new genes associated with the clinical condition.


Introduction
Maedi-visna virus (MVV) and caprine arthritis encephalitis virus (CAEV) are small ruminant lentiviruses (SRLVs) that belong to the genus Lentivirus in the Retroviridae family. They are distributed worldwide, causing significant economic losses in many countries [1]. Infections in sheep include interstitial pneumonia with dyspnea, indurative mastitis and cachexia, persisting for the lifetime of the host and causing chronic inflammation [2]. Seroconversion in MVV-infected sheep occurs over weeks to months, and, as with other diseases caused by lentiviruses, no effective treatment or vaccine is available [3]. Infected animals show progressive weight loss, low milk yield and a reduced production rate that can lead to premature culling [4].
Iceland was successful in ridding local sheep breeds of maedi-visna (MV) or Ovine Progressive Pneumonia (OPP) after an effort that lasted more than three decades [5]. There are no reports of MVV in Australia or New Zealand, despite problems with CAEV in goats from both countries [6,7]. Within the UK, MV has reduced lamb production by up to 40% in commercial flocks, whereas in the United States, studies have shown that nearly a quarter of the sheep herd is infected with the disease [8,9]. In Brazil, the condition is widespread in

Animal Samples
The present study used 1040 blood samples from sheep of 15 Brazilian breeds, kept in conservation centers of the Brazilian Agricultural Research Corporation (Embrapa) distributed across the country. The main criteria for sample selection were the biological material and breed availability in our National Gene Bank and the Embrapa Conservation Nucleus. Within breeds, the criterion was regional sampling. The breeds used in the study were: Santa Inês, Morada Nova, Crioula Lanada, Rabo Largo, Somali, Bergamasca, Corriedale, Ile de France, Pantaneiro, Dorper, Damara, Suffolk, Hampshire, Texel and Barriga Negra. Semen samples (N = 112) from these breeds kept at the Brazilian Animal Germplasm Biobank (BBGA) were also included in the study (Table 1). Additionally, we used 484 blood samples spotted on FTA (Flinders Technology Associates) cards from the Santa Inês breed, belonging to a herd kept by a Conservation Nucleus at Embrapa Tabuleiros Costeiros (CPATC) and, finally, 188 animal samples of 14 North American breeds: Barbados Blackbelly, Black Welsh Mountain, Bluefaced Leicester, Hampshire, Hog Island, Katahdin, Leicester Longwool, Lincoln, Navajo Churro, Polypay, Rambouillet, Romanov, St. Croix and Suffolk, kept at the National Center for Genetic Resources Preservation (NCGRP) (Table 1). Semen samples were acquired from artificial insemination centers either by the American National Animal Germplasm Program or the Brazilian Animal Germplasm Network as an effort to conserve genetic resources.
DNA extraction was performed using the Puregene purification protocol (Gentra Puregene® Kit, QIAGEN, USA) for blood and semen samples. The DNA of the blood-spotted FTA™ cards was extracted using the GenSolve™ DNA Recovery Kit (GenTegra, USA) according to the manufacturer's instructions. A DNA quality check of the Brazilian sheep samples was performed in two ways. Initially, 1% agarose gel was used, stained with ethidium bromide, using lambda standards of 200 ng/µg, 100 ng/µg and 50 ng/µg for comparison. In addition, the samples were analyzed in a NanoDrop Thermo Scientific spectrophotometer (NanoDrop™ 8000, Thermo Fisher Scientific, 2010; https://www.thermofisher.com/, accessed on 12 July 2015). The DNA obtained from the North American sheep was quantified by spectrophotometry only.

Genotyping Methodologies
We used two genotyping methodologies in the present study: KASP™ (LGC Genomics, Hoddesdon, UK) and the Fluidigm EP1™ system (Fluidigm, San Francisco, CA, USA). After a quality and quantity check, 90 µg of DNA was used for genotyping the Brazilian breeds, except for the Santa Inês samples from the CPATC. Genotyping data were generated according to standard protocols provided by Fluidigm for use with an EP1 platform. Each sample underwent pre-amplification with a set of Locus-Specific Primer (LSP) and Specific Target Amplification (STA) oligonucleotides for the assayed SNPs. Diluted amplified product was loaded into twelve 96.96 Dynamic Array™ IFCs (Fluidigm, San Francisco, CA, USA) with a ROX reference dye, amplification mixture and assays containing the Allele-Specific Primers (ASP) and LSP oligonucleotides for each SNP, according to the manufacturer's instructions, at Embrapa Genetic Resources and Biotechnology (Brasilia, Brazil). Briefly, Fluidigm® SNP Type™ assays were designed for 12 selected SNPs with the Fluidigm D3™ web-based assay design tool, according to the manufacturer's instructions and rules described at [https://d3.fluidigm.com/account/login accessed on 12 July 2015]. Endpoint fluorescence image data were acquired on the EP1 Fluidigm imager, and the genotype calls were obtained with the Fluidigm SNP Genotyping Analysis Software.
The North American breeds and the Santa Inês samples from the CPATC were genotyped by KASP, using 10 ng of DNA per sample in a 96-well plate format. The reaction and the components used in the KASP methodology are described at [http://www.lgcgenomics.com/genotyping/kasp-genotyping-reagents/ accessed on 12 July 2015]. Oligonucleotides were designed with the Kraken™ software system according to the manufacturer's instructions (LGC Genomics, Hoddesdon, UK).
Genotypes from the locus OAR17_5388531 within the Illumina® Ovine SNP50 and Illumina® Ovine 600K panels were used to determine genotype reproducibility among methodologies. We were able to access 14 and 7 animals in common between Fluidigm and the Illumina® Ovine SNP50 and Illumina® Ovine 600K panels, respectively (Table S2). Furthermore, 7 and 14 samples were used to compare KASP with Illumina® Ovine SNP50 and Illumina® Ovine 600K, respectively (Table S3). Additionally, the locus TMEM154_E35K was assessed in 10 animals in common between Fluidigm and the Illumina® Ovine 600K BeadChip (Table S4). Finally, to compare both methodologies used in this study, we used 3 SNPs (TMEM154_E35K, TMEM154_N70I and TMEM154_I102T) in 27 samples (Table S5). The Friedman test [33] was applied to determine whether the differences in the genotypes across the methodologies were statistically significant.

Data Analysis
Clustering was used to define the genotype classes for each SNP for both methods and was processed using GenomeStudio (Illumina Inc., San Diego, CA, USA). The confidence threshold for each genotype was >0.90 for both methodologies. Samples, in batches of 96, underwent quality control using SNP & Variation Suite v8.9.1 (SVS; Golden Helix, Bozeman, MT, USA; www.goldenhelix.com, accessed on 15 November 2019) [34], eliminating samples with a call rate <0.80 and markers with a call rate <0.75. The linkage disequilibrium (LD) between the TMEM154_E35K and OAR17_5388531 SNPs, measured with the r² statistic [34], was also estimated using SVS v8.9.1 (Golden Helix, Bozeman, MT, USA) [26]. Allele and haplotype frequencies were estimated with GenAlEx 6.5 [35] and Arlequin 3.5.2.2 [36]. Monomorphic SNPs were not included in the haplotype analysis. A chi-square test (p < 0.05) [37] was performed to determine whether the differences in allele frequencies across populations of the Santa Inês breed were statistically significant (Table S6).
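The two call-rate filters described above can be illustrated with a minimal Python sketch. It is not the SVS implementation: the data layout, the function names and the choice to compute both call rates on the unfiltered genotype matrix are assumptions made for illustration; only the two thresholds (0.80 for samples, 0.75 for markers) come from the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: sample ID -> SNP name -> genotype call (None = no call).
Genotypes = Dict[str, Dict[str, Optional[str]]]

SAMPLE_MIN_CALL_RATE = 0.80  # threshold stated in the text
MARKER_MIN_CALL_RATE = 0.75  # threshold stated in the text


def call_rate(calls: List[Optional[str]]) -> float:
    """Fraction of non-missing genotype calls in a list."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(genotypes: Genotypes) -> Tuple[List[str], List[str]]:
    """Return (kept_samples, kept_markers).

    Assumption: both call rates are computed on the unfiltered matrix;
    the order in which the original software applies them is not stated.
    """
    samples = list(genotypes)
    markers = sorted({snp for calls in genotypes.values() for snp in calls})
    kept_samples = [
        s for s in samples
        if call_rate([genotypes[s].get(m) for m in markers]) >= SAMPLE_MIN_CALL_RATE
    ]
    kept_markers = [
        m for m in markers
        if call_rate([genotypes[s].get(m) for s in samples]) >= MARKER_MIN_CALL_RATE
    ]
    return kept_samples, kept_markers
```

A sample or marker sitting exactly on a threshold is retained here, because the text describes eliminating only those with call rates below the cut-offs.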

Quality Control and Linkage Disequilibrium (LD)
All SNPs and samples passed the quality control filters for the KASP methodology. Of the 6 SNPs genotyped, 3 were monomorphic in all breeds (TMEM154_I102T, TMEM154_A13V and TMEM154_T25I). After the quality control of the Fluidigm methodology, 845 samples with a call rate >0.80 remained. The SNPs OAR17_5388531, TMEM154_A13V-Fl2, TMEM154_D33N, TMEM154_E31Q_v2, TMEM154_E35K, TMEM154_I74F, TMEM154_N70I, TMEM154_T44M and TMEM154_E84Y successfully passed quality control. The SNPs TMEM154_L14H and TMEM154_T25I, with a call rate <0.75, were excluded from the analysis. The SNP TMEM154_I102T was monomorphic, presenting the allele "T" in all breeds genotyped, and was not included in the haplotype analysis. The locus ZNF389_ss748775100 was also monomorphic in all populations genotyped in this study, with all breeds homozygous for the "I/I" genotype. The r² values between the TMEM154_E35K and OAR17_5388531 alleles across all populations were 0.96 and 0.93 on Fluidigm and KASP, respectively, indicating a strong LD between the allele "C" of the locus OAR17_5388531 and the allele "G" of the SNP TMEM154_E35K.
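As a rough illustration of what an r² value between two SNPs measures, the sketch below computes the squared Pearson correlation between allele dosages (0, 1 or 2 copies of a chosen allele) at two loci. This is a common approximation for unphased genotypes and is an assumption here: the estimator actually used by SVS is not described in the text and may differ. The function name and the dosage encoding are hypothetical.

```python
from math import fsum
from typing import Optional, Sequence


def dosage_r2(x: Sequence[Optional[int]], y: Sequence[Optional[int]]) -> float:
    """Squared Pearson correlation between allele dosages at two SNPs.

    x and y hold the per-animal dosage (0, 1 or 2) at each SNP, in the same
    animal order. Animals with a missing call (None) at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two animals genotyped at both SNPs")
    mean_x = fsum(a for a, _ in pairs) / n
    mean_y = fsum(b for _, b in pairs) / n
    sxx = fsum((a - mean_x) ** 2 for a, _ in pairs)
    syy = fsum((b - mean_y) ** 2 for _, b in pairs)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined when a SNP is monomorphic")
    return (sxy * sxy) / (sxx * syy)
```

Values close to 1, such as the 0.96 and 0.93 reported above, mean that the genotype at one locus predicts the genotype at the other almost perfectly.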

Allele Frequency of Locus OAR17_5388531 in Brazilian and North American Sheep
The frequency of the "C" allele of the SNP OAR17_5388531 ranged from 0.0% in the Hog Island breed to 100% in the Katahdin and Damara breeds (Figure 1). Breeds with a higher frequency of the "C" allele have higher frequencies of TMEM154 E35 (highly susceptible haplotypes 2 and 3) and, consequently, are animals that are more prone to develop the disease. Conversely, breeds with a low frequency of the "C" allele have a higher frequency of TMEM154 K35 (less-susceptible haplotype 1) and, therefore, are animals that are less susceptible to OPP.
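The per-breed allele frequencies reported above amount to simple allele counting over diploid genotypes. The sketch below shows the calculation; the function name and the toy genotypes are hypothetical and are not data from this study, and the dedicated software used in the study (GenAlEx, Arlequin) is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotypes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Allele frequencies from diploid genotypes given as (allele1, allele2)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


# Toy example (hypothetical animals): two C/C, one C/T and one T/T.
example = [("C", "C"), ("C", "T"), ("T", "T"), ("C", "C")]
frequencies = allele_frequencies(example)  # C: 5 of 8 alleles, T: 3 of 8
```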

Data Comparison among Genotyping Platforms
The locus OAR17_5388531 was genotyped in 14 animals in common between Fluidigm and Illumina® Ovine SNP50. The reproducibility of the genotypes between methodologies was 100% (Table S2), with all 28 alleles identical. The same result was obtained when we compared seven animals genotyped with the Illumina® Ovine 600K BeadChip and Fluidigm (Table S2). A second locus (TMEM154_E35K) was used to compare the repeatability of the genotypes between the Illumina® Ovine 600K BeadChip and Fluidigm in 10 common animals, and, once again, all 20 alleles were the same between the methodologies (Table S4).
Regarding KASP and Illumina® Ovine SNP50, we used seven samples and the locus OAR17_5388531 as a reference. The results were the same as above, with a repeatability of 100% for the 14 alleles. For the same locus, using the Illumina® Ovine 600K BeadChip and 14 samples (Table S3), all 28 alleles were identical.
Finally, comparing the two genotyping methodologies used here, KASP and Fluidigm, 27 common animals and 3 loci were analyzed: TMEM154_E35K, TMEM154_N70I and TMEM154_I102T (Table S5). Following the pattern found in the previous analysis, the comparison between Fluidigm and KASP points to a repeatability of 100% at all 3 loci genotyped, as all 54 alleles per locus (162 alleles in total) were identical in the 27 common animals genotyped by both methodologies.
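The allele totals quoted in this section follow from counting two alleles per animal per locus. A minimal sketch of such a concordance count is given below; the function name, the dictionary layout and the toy calls are assumptions for illustration and do not reproduce the supplementary tables.

```python
from collections import Counter
from typing import Dict, Tuple

Calls = Dict[str, Tuple[str, str]]  # hypothetical layout: animal ID -> (allele1, allele2)


def allele_concordance(platform_a: Calls, platform_b: Calls) -> Tuple[int, int]:
    """Return (matching alleles, compared alleles) for one locus.

    Only animals genotyped on both platforms are compared. Genotypes are
    treated as unordered, so ("A", "G") and ("G", "A") agree fully.
    """
    matched = 0
    compared = 0
    for animal, genotype_a in platform_a.items():
        genotype_b = platform_b.get(animal)
        if genotype_b is None:
            continue
        compared += 2
        # Multiset intersection counts the alleles shared by both calls.
        matched += sum((Counter(genotype_a) & Counter(genotype_b)).values())
    return matched, compared


# Toy data: 27 hypothetical animals with identical calls on both platforms.
fluidigm = {f"animal_{i}": ("A", "G") for i in range(27)}
kasp = {f"animal_{i}": ("G", "A") for i in range(27)}
matched, compared = allele_concordance(fluidigm, kasp)
```

With 27 animals and full agreement, this gives 54 matching alleles out of 54 compared at a single locus.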

Differences in Allele Frequencies among Santa Inês Populations
For some loci, a difference in the allele frequency was observed according to the geographic region of origin of the Santa Inês flock. The TMEM154_N70I locus has, in general, the allele "A" as the most frequent allele. However, the Empresa Baiana de Desenvolvimento Agricola (EBDA-BA) and the Embrapa Meio Norte-PI (CPAMN) populations have "T" as their most frequent allele, with frequencies of 0.75 and 0.63, respectively (χ², p > 0.05) (Table S6). A similar pattern occurs for the locus TMEM154_I74F, where the BBGA, CNPC and CPATC have the allele "A" as the most frequent. On the other hand, the UnB-DF and CPAMN presented the allele "T" as the most frequent (χ², p < 0.05). Furthermore, the EBDA-BA has a frequency of 0.5 for each allele. For some loci, e.g., TMEM154_A13V-Fl2 and TMEM154_E31Q_v2, we observed rare alleles with low frequencies in single populations: the CNPC (0.02, allele "T") and CPAMN (0.04, allele "C"), respectively (Table S6). Finally, the locus TMEM154_D33N has frequencies around 0.15 for the allele "G" in most Santa Inês populations (Table S6). For the UnB-DF and CNPC, however, the frequencies were 0.44 and 0.33, respectively (χ², p < 0.05).
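To show how a chi-square comparison of allele frequencies between two populations works, the sketch below runs a Pearson chi-square test on a 2 x 2 table of allele counts (one degree of freedom, no continuity correction). The exact test configuration used in the study is not stated, so this layout is an assumption, and the allele counts in the example are hypothetical rather than values from Table S6.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(pop1: Tuple[int, int], pop2: Tuple[int, int]) -> Tuple[float, float]:
    """Pearson chi-square test on allele counts from two populations.

    Each argument is (count of allele 1, count of allele 2).
    Returns (chi-square statistic, p-value) for one degree of freedom.
    """
    a, b = pop1
    c, d = pop2
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every population and every allele needs a non-zero count")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 30 "A" and 10 "T" alleles versus 10 "A" and 30 "T".
statistic, p_value = chi_square_2x2((30, 10), (10, 30))
```

Comparing more than two populations at once would need a larger contingency table and more degrees of freedom, which this sketch does not cover.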

Discussion
The KASP method was 100% successful, as all designed SNPs were converted into primers and genotyped, and none of the SNPs or samples were eliminated by call rate in this methodology. For the Fluidigm methodology, we designed 13 assays and performed 12 different runs with 96.96 plates, in which each run has to be considered a separate event. Analysis of the entire dataset resulted in the elimination of 2 of the 13 SNPs (TMEM154_L14H and TMEM154_T25I) as they showed call rates <0.75 in 11 of the 12 plates genotyped. The quality filter for samples eliminated 307 samples with a call rate <0.80 in this methodology. This may be due to the high confidence threshold set up for each genotype, >0.90, increasing the accuracy of the results. It is important to mention that the SNP TMEM154_I102T was monomorphic in all breeds genotyped across both methodologies.
The data obtained from Fluidigm and KASP matched 100% with the results from the 50k and HD panels. Additionally, we did not find any differences in the genotypes between the two methodologies, indicating that both Fluidigm and KASP can be reliable alternatives for fast, customized and cost-effective genotyping. Furthermore, it is important to reinforce that the KASP methodology was 100% successful, with no samples or loci eliminated by call rate. In contrast, the samples (call rate <0.80) and markers (call rate <0.75) eliminated in the Fluidigm methodology can be explained by the high confidence threshold and by the fact that each 96.96 plate genotyped represents a different PCR reaction, considered an independent genotyping event.
Haplotypes encoding glutamate (E) at position 35 of TMEM154 are considered highly susceptible to OPP. In contrast, haplotypes encoding lysine (K) at the same position are considered less susceptible [25]. The high LD we found between the loci OAR17_5388531 and TMEM154_E35K was expected and is in accordance with the results obtained by a previous study [26]. In general, we observed a higher frequency of the allele "C" in the breeds genotyped in this study. The highest frequency (100%) was found in the Damara, and the lowest (0.19%) in the Crioula Lanada. The Santa Inês breed is raised across Brazil. Nevertheless, we did not observe a difference in the allele frequencies for the locus OAR17_5388531 among the populations, with a high frequency of the "C" allele in all populations analyzed. A high value of r² indicates that the SNP OAR17_5388531 can be used to infer highly susceptible TMEM154 alleles in all the studied populations. OAR17_5388531 can therefore serve as a surrogate that relates the "C" allele to susceptibility, easing the genotyping process for breeders that have already genotyped their animals with the Illumina® Ovine SNP50 BeadChip or Illumina® Ovine 600K BeadChip. Breed populations with a higher occurrence of the "C" allele are predicted to have a higher frequency of E35 animals, which is associated with a higher susceptibility to the disease [26].
It is expected that distinct Santa Inês populations within the country present differences in allele frequencies, as demonstrated by previous authors with neutral markers [38]. The hypothesis can be raised that geographic distances affect the gene flow and level of admixture of the Santa Inês breed within certain populations [39,40]. For example, Santa Inês animals belonging to the UnB-DF herd seem to be crossbred with Bergamasca. This was proposed previously, where it was observed that Santa Inês animals from the center-west and southwest are genetically closer to Bergamasca than those from other regions [38–42]. We did not find any significant allelic frequency differences among the BBGA, CNPC and CPATC Santa Inês populations. Two facts may explain this: (i) the BBGA is composed mainly of samples that come from the CNPC and CPATC flocks and (ii) both flocks are geographically close, so there is an exchange of germplasm material between them. The same result could be expected for the Morada Nova breed. Despite the large number of animals of this breed, they are concentrated in the northeast region of the country [43], close to the CNPC conservation nucleus.
The selection of animals with favorable haplotypes is crucial to ensure the quality and the genetic variability of the germplasm stored in biobanks. Therefore, results such as those found here, where a single population of a breed has a rare allele or differences in allele frequencies, can aid in the long-term conservation of breed diversity and function as an important resource for breeders.
The TMEM154_I102T variant was observed for the first time in the Santa Inês breed, during resequencing by a previous study [26]. The authors identified the polymorphism in a single animal as a compound heterozygote for this locus. The animal was also heterozygous for the TMEM154_N70I variant. However, in our study, none of the populations presented heterozygous animals for the TMEM154_I102T locus. The TMEM154_N70I locus in the Santa Inês breed has a frequency of 0.6 for the allele "A" and 0.4 for the allele "T". For the Santa Inês from the CPATC, the frequencies were 0.7 for the allele "A" and 0.3 for the allele "T", indicating heterozygous animals in the flocks. Overall, the allele "T" of the TMEM154_N70I locus has a total frequency of 5.9% in Brazilian sheep and 23.7% in North American sheep. Heaton et al. [26] called the variant TMEM154_I102T based on two reads of nine with high-quality scores, but, here, we could not reproduce the rare genotype found by the authors in any of the Brazilian breeds. We used this variant to test the reproducibility and to check whether this was an exclusive mutation of the Santa Inês breed.
Heaton et al. [26] defined haplotypes 1 (less susceptible), 2 (susceptible) and 3 (highly susceptible) in the Santa Inês breed. These authors also identified haplotypes 1 and 3 in the Crioula Lanada and Morada Nova breeds, in agreement with the data obtained in the present study. Additionally, haplotype 6, considered rare, was previously observed in the Suffolk breed but not observed in the present study [26]. This might be due to the relatively low sample size used here. This reservation also applies to the following breeds: Bluefaced Leicester, Hog Island, Katahdin and Leicester Longwool, since their sample size is ≤2. Murphy et al. [44] confirmed that animals with diplotype "1,1" of the K35 variant of TMEM154 had a reduced incidence of OPP infection, which leads to an improvement in productivity. Furthermore, the authors suggest that the selection of sheep with the TMEM154 haplotype "1" in flocks with a high frequency of haplotype "3" can be a cost-effective alternative to reduce the economic damage caused by OPP. However, further investigation is required to understand the effect of other TMEM154 mutations on the susceptibility to the disease and whether these effects can be extended to multiple breeds.
SRLVs present high genetic variability, contributing to the evolution of multiple viral strains worldwide [45]. MVV-like and CAEV-like strains of SRLVs were initially considered strictly host-specific for sheep and goats, respectively. However, recent investigations indicated that cross-species transmission events are possible in sheep and goats from Brazilian mixed flocks [46,47]. Ramirez et al. [30] suggest that selection based on TMEM154 is suitable for specific SRLV strains and ovine breeds. Nonetheless, the same authors propose that generalization to the whole genetic spectrum of lentiviruses, ovine breeds and epidemiological situations across the globe may need further validation.
Molaee et al. [29] indicate that the association of the SNP TMEM154_E35K with a susceptibility to the disease must be undertaken carefully as, in the Merinoland breed, a high number of KK (less-susceptible genotype) sheep was positive for SRLV. Therefore, follow-up breed-specific studies can be useful to detect new TMEM154 variants associated with the development of MV, as well as to understand the unknown role of already identified markers. Furthermore, other genes such as Ovine-DRB1 can be associated with a susceptibility to MV, implying that studies aimed at understanding other genetic factors involved in the occurrence of SRLVs are crucial [48].
White et al. [18] established an association between the deletion variant ZNF389_ss748775100 and higher proviral concentrations in Rambouillet, Polypay and Columbia sheep from Idaho, US. Conversely, in our study, no breeds genotyped presented the deletion allele, as all populations were homozygous for the insertion "I/I". According to White et al. [18], animals with this genotype had less than half the adjusted mean proviral concentrations, which could possibly indicate an association with susceptibility to the virus.
Although ZNF389 has been previously associated with MV, further investigation is required to understand the functional importance of this genomic region. Zinc finger proteins (ZFPs) are characteristic DNA binding domains that can be found in a variety of transcription factors [49]. White et al. [18] suggest that ZNF389 plays a biological role related to ovine lentivirus proviral concentration in sheep. The same authors propose that one or more zinc finger genes located in this region can act as a transcriptional regulator of host genes, such as TRIM5α, limiting the proviral replication of the virus [18,50].
The results obtained in this study can be of paramount importance for the Brazilian and American gene banks. For instance, investigation of the ZNF389_ss748775100 deletion variant genotypes, the TMEM154 haplotypes and their frequencies in the breeds can help to select animals with favorable alleles and haplotypes. Nevertheless, gene banks should store all genotypes to facilitate future work on susceptibility and resistance. Consequently, these methodologies can be reliable resources for breeders if MV becomes a threat, thus aiding in the conservation of breed diversity.

Conclusions
Overall, haplotypes associated with a lower susceptibility risk to OPP are common in both Brazilian and North American sheep. This suggests an opportunity to reduce Lentivirus susceptibility in multiple sheep breeds using genomic selection. There is a significant difference in the allele frequencies of the TMEM154 mutations among different Santa Inês populations, resulting from the genetic subdivision previously observed in the breed. Contrary to the insertion allele, the ZNF389_ss748775100 deletion variant is associated with higher proviral concentrations in sheep. However, we did not detect the deletion variant in any of the breeds here.
Genetic variability is crucial to ensure the quality of the material stored in conservation centers and gene banks. The TMEM154 haplotypes and frequencies found here suggest that a range of genetic variability has been captured by the gene banks. A few breeds showed a lack of variability, which was likely due to a small sample size. Finally, the comparison of the results obtained from the Fluidigm and KASP assays with more robust technologies, such as the 50k and the HD panels, indicates that these methodologies are reliable. Therefore, they can be useful, cost-effective tools to improve genomic selection programs. Further validation is still necessary to understand the unknown role of some of the TMEM154 mutations and the role of other genetic factors associated with the disease.