Tracing Viral Transmission and Evolution of Bovine Leukemia Virus through Long Read Oxford Nanopore Sequencing of the Proviral Genome

Bovine leukemia virus (BLV) causes Enzootic Bovine Leukosis (EBL), a persistent life-long disease resulting in immune dysfunction and shortened lifespan in infected cattle, severely impacting the profitability of the US dairy industry. Our group has found that 94% of dairy farms in the United States are infected with BLV with an average in-herd prevalence of 46%. This is partly due to the lack of clinical presentation during the early stages of primary infection and the elusive nature of BLV transmission. This study sought to validate a near-complete genomic sequencing approach for reliability and accuracy before determining its efficacy in characterizing the sequence identity of BLV proviral genomes collected from a pilot study made up of 14 animals from one commercial dairy herd. These BLV-infected animals were comprised of seven adult dam/daughter pairs that tested positive by ELISA and qPCR. The results demonstrate sequence identity or divergence of the BLV genome from the same samples tested in two independent laboratories, suggesting both vertical and horizontal transmission in this dairy herd. This study supports the use of Oxford Nanopore sequencing for the identification of viral SNPs that can be used for retrospective genetic contact tracing of BLV transmission.


Introduction
Enzootic bovine leukosis (EBL) is a disease caused by the retrovirus bovine leukemia virus (BVL) and is a major economic detriment to the U.S. dairy industry. Most infected cattle remain asymptomatic but approximately 30% develop persistent lymphocytosis and elevated number of B-cells, while 5% of them develop B-cell leukemia/lymphosarcoma [1]. BLV pathogenesis compromises the host's immune system, leaving the cows susceptible to several other opportunistic pathogens, all of which might serve as a potential disease reservoir within the herd and ultimately shorten the lifespan and productivity of the cows.
Many European countries have successfully eradicated BLV using bulk tank surveillance. However, our group has found that 46% of dairy cows in 94% of dairy farms in the United States are infected with BLV [2], which is consistent with the reports from USDA APHIS (https://www.aphis.usda.gov/animal_health/nahms/dairy/downloads/dairy0 7/Dairy07_is_BLV.pdf accessed on 6 January 2021). In addition to affecting the longevity and overall health of the cows, BLV has also been found to affect milk production, which accounts to a loss of approximately 380 USD per cow per year [3]. Detection of BLV DNA in human blood or tumor samples has been linked to potential human health concerns, including breast cancer [4,5].
BLV primarily infects B-cell lymphocytes, and following proviral integration, is propagated through mitosis of immature B-cell progenitors. Transmission of BLV is thought to occur through the introduction of an infected lymphocyte to the bloodstream of the uninfected animal through milk/colostrum and blood-borne routes, including biting flies and iatrogenic means such as shared needles and rectal palpation sleeves [6]. However, the use of best management practices has proven to reduce within-herd prevalence of BLV significantly [7]. Vertical transmission, in utero, during parturition, or postnatally via colostrum administration, has also been found to contribute to 4-18% of new BLV infections [8,9]. Feeding BLV-infected unpasteurized colostrum [10] and contact with bodily fluids in the birth canal during the delivery [11] have all been found as potential routes of BLV transmission in neonatal calves. Lastly, viral transmission through the placenta could result in a positive calf at birth [11].
The proviral load of infected animals may be directly correlated with the animal's ability to infect others, as with other retroviral diseases such as HIV [12]. In fact, a multiherd BLV control field trial [13] showed a reduction in the within-herd incidence of BLV infection associated with the removal of high proviral load cattle from the herd. Antibody detection via Agar Gel Immunodiffusion (AGID) and Enzyme-linked Immunosorbent Assay (ELISA) are currently accepted methods used to diagnose cows but are poorly correlated to an animal's proviral load.
Detection of BLV proviral DNA can be accomplished through endpoint polymerase chain reaction (PCR) with varying levels of specificity and sensitivity, depending on the design of primer sequences. Nested end-point PCRs are commonly used as a highly sensitive method of to detect BLV proviral DNA, despite their poor specificity [12,[14][15][16]. Quantitative PCR (qPCR) allows for the relative and simultaneous quantification of provirus and host DNA with a high degree of specificity and sensitivity when used with TaqMan ® fluorophores. Our group developed the SS1 BLV proviral load qPCR multiplex assay (Cen-tralStar Cooperative, Lansing, MI, USA) consisting of three TaqMan™ primer/probe assays that detect the BLV pol gene, bovine β-Actin, and an internal amplification control [17][18][19]. Through qPCR calibration using digital droplet PCR (ddPCR) normalized DNA standards, the SS1 assay has 100% specificity and sensitivity down to a limit of detection of 10 BLV copies. We therefore can precisely compare the relative infectiousness between animals over time, using the SS1 BLV proviral load qPCR multiplex assay.
To date, genetic diversity of the BLV proviral DNA has been characterized mostly based on sequencing the envelope gene (env) either by Sanger or short-read next-generation sequencing (NGS, Illumina) methods. Ten different genotypes have been identified so far based on the variations within the env gene and whole genome analysis via tiled PCR based NGS sequencing [20,21]. Previous studies agree that BLV genotypes associate with the global geographical origin [21,22]. However, it is precarious to draw on international BLV genetic relationships based on short and conserved genomic regions, like the env gene. Sanger sequencing and short-read NGS have established important geographical and molecular relationships between different viral variants, but the utility of these approaches for elucidating the genetic structure and high-resolution genetic relationships of BLV variants within confined populations is limited.
Single-molecule long-read sequencing methods, such as Oxford Nanopore Technologies (ONT), have enabled the sequencing of large genetic elements spanning across repetitive regions and eliminate common sequencing issues encountered with other technologies, such as high G/C content and palindromic regions. The real-time nature of ONT sig-nificantly reduces the long turnaround times involved in Illumina sequencing and is considerably less expensive [23]. This methodology would also enable accurate and timely identification of novel genotypes, which can be critical to effectively identify emerging pathogens. This pilot study aimed to develop a molecular sequencing tool that can be used to accurately identify new BLV variants and discern routes of transmission while simultaneously assessing the origin, evolution, and genetic diversity of BLV within herds.

ONT Sequencing of Long-Range PCR Amplicons from TOPO ® Vectors Containing Bacterial Inserts
Prior to drawing conclusions on genetic variability among BLV sequences from related animals, it was important to determine the feasibility and accuracy of the experimental workflow and bioinformatics pipeline. PCR amplicons derived from mastitis-causing microorganisms, ligated into TOPO ® plasmids, and verified by Sanger sequencing, were used to evaluate the fidelity of the developed workflow. Approximately 4 kb PCR products were generated using primers with 5 overhangs containing the BLV_CS forward and reverse primer sequences (Table 1, Figure S1). Secondary TOPO ® PCR products produced from BLV_CS primers were sequenced on the ONT GridION. The resulting sequences were compared to their corresponding Sanger sequence-verified vectors to confirm sequencing accuracy. Both sets of vectors with their associated inserts, either a 16s rDNA sequence from Corynebacterium bovis or Thioredoxin (TrxA) from Mycoplasma bovis, were found to be 100% identical ( Figure 1) with an average 2900 × coverage ( Figure S1C).

ONT Sequencing of BLV Whole Genome Sequences
Blood samples were collected on a single date from sixteen BLV ELISA-positive adult dam and daughter Holstein pairs from a commercial dairy farm. BLV proviral load was determined by the SS1 BLV proviral qPCR multiplex assay (CentralStar Cooperative, Lansing, MI, USA; Table 2) [14]. Following organic phase separation and a modified high molecular weight (HMW) DNA extraction [24], 15 DNA samples were split into two aliquots for PCR amplification of the BLV proviral genome. Approximately 7.5 kb PCR products were amplified using two different sets of primer pairs developed by two independent laboratories ( Figure 2). The negative BLV SS1 sample (15.1) was consequently removed from the study. Electrophoretograms of the amplicons generated from the two laboratories were compared to evaluate the PCR efficacy of two different primer sets ( Figure 3). The number of trimmed reads per corresponding sequencing adaptor and the coverage of each amplicon were evaluated to determine amplicon quality and infer the accuracy of subsequent analyses. The differences seen with a cross-laboratory comparison of bands could be due to read depth variability caused by amplicon quality which can confound

ONT Sequencing of BLV Whole Genome Sequences
Blood samples were collected on a single date from sixteen BLV ELISA-positive adult dam and daughter Holstein pairs from a commercial dairy farm. BLV proviral load was determined by the SS1 BLV proviral qPCR multiplex assay (CentralStar Cooperative, Lansing, MI, USA; Table 2) [14]. Following organic phase separation and a modified high molecular weight (HMW) DNA extraction [24], 15 DNA samples were split into two aliquots for PCR amplification of the BLV proviral genome. Approximately 7.5 kb PCR products were amplified using two different sets of primer pairs developed by two independent laboratories ( Figure 2). The negative BLV SS1 sample (15.1) was consequently removed from the study. Electrophoretograms of the amplicons generated from the two laboratories were compared to evaluate the PCR efficacy of two different primer sets ( Figure 3). The number of trimmed reads per corresponding sequencing adaptor and the coverage of each amplicon were evaluated to determine amplicon quality and infer the accuracy of subsequent analyses. The differences seen with a cross-laboratory comparison of bands could be due to read depth variability caused by amplicon quality which can confound phylogenetic interpretation. The minimal number of reads generated from the NTC (Non-Template Control) are likely due to barcode bleed-through resulting from the use of single barcodes as opposed to dual barcodes [25].

ONT Sequencing of BLV Whole Genome Sequences
Blood samples were collected on a single date from sixteen BLV ELISA-positive adult dam and daughter Holstein pairs from a commercial dairy farm. BLV proviral load was determined by the SS1 BLV proviral qPCR multiplex assay (CentralStar Cooperative, Lansing, MI, USA; Table 2) [14]. Following organic phase separation and a modified high molecular weight (HMW) DNA extraction [24], 15 DNA samples were split into two aliquots for PCR amplification of the BLV proviral genome. Approximately 7.5 kb PCR products were amplified using two different sets of primer pairs developed by two independent laboratories ( Figure 2). The negative BLV SS1 sample (15.1) was consequently removed from the study. Electrophoretograms of the amplicons generated from the two laboratories were compared to evaluate the PCR efficacy of two different primer sets ( Figure 3). The number of trimmed reads per corresponding sequencing adaptor and the coverage of each amplicon were evaluated to determine amplicon quality and infer the accuracy of subsequent analyses. The differences seen with a cross-laboratory comparison of bands could be due to read depth variability caused by amplicon quality which can confound phylogenetic interpretation. The minimal number of reads generated from the NTC (Non-Template Control) are likely due to barcode bleed-through resulting from the use of single barcodes as opposed to dual barcodes [25].

Phylogenetic Analysis of BLV Whole Genome Amplicons
Phylogenetic analysis of the ONT consensus sequences confirmed the fidelity of the independent PCR approaches and served to assess the genetic diversity of the BLV found within a herd. Ten out of the 14 pairs, dams 1-7 and daughters 5-8 of technical replicates from the two laboratories matched on taxa location on the tree, solidifying the accuracy of both PCR techniques along with the sequencing of generated amplicons ( Figure 4). The outlying samples, daughters 1, 2 and 3_UMN, were concluded as resulting from poor coverage of those amplicons ( Figure S3).   When analyzing samples based on familial relationships, the distance between proviral sequences from dams and daughters varies. Pair five, for example, shows sequence identity matching based on clade location. This phylogenetic analysis also displayed viral divergence of BLV strains in one generation. In dam/daughter pair two, the strain found in the daughter branches off of the node shared by the dam, representing viral evolution. The fidelity of the sequencing was confirmed via the bacterial TOPO ® plasmids, confirming viral evolution rather than sequencing error. These results indicate that this tool is capable of discerning the route of infection while capturing viral diversity and evolution with herds over time.
Historically, the majority of BLV phylogenetic analyses is based solely on the env gene sequences alone (18)(19)(20) [20][21][22]. Analysis of the env gene sequences extracted from the ONT reads within this study ( Figure 5) identified only six different nucleotide substitutions which were synonymous at the amino acid level. ( Figure S2). Env polyprotein was completely conserved between the animals in the herd studied, which would significantly limit the understanding of genetic variability between the animals in this herd if only this BLV gene was sequenced.

Amino Acid Substitution Among BLV Positive Animals
To further compare the dam and daughter pair relationship, we performed a comparative amino acid analysis of six well defined proteins that might play a role in disease dynamics (Table 3. Only the sequences generated from the BLV_CS primers were used due to the variability in read depth and coverage seen in the BLV_UMN amplicons. Amino acid profiles from five dams, 71%, (1, 3, 4, 6 and 7) diverged from their daughters. On the other hand, only 2/7 (29%) of the dams and daughters shared identical amino acid profiles. The highest number of amino acid changes was observed between dam/ daughter pair 1 (4 AA substitutions), followed by the pair 7 (4 AA substitutions). At the individual protein level, no functional SNPs could be observed in the R3 protein. Gag Pro Pol protein was the most polymorphic with the highest number of functional SNPs (4 AA substitutions) followed by the env polyprotein (3 AA substitutions). Despite the variation observed with the env polyprotein, it is very interesting to note that the env protein was

Amino Acid Substitution among BLV Positive Animals
To further compare the dam and daughter pair relationship, we performed a comparative amino acid analysis of six well defined proteins that might play a role in disease dynamics (Table 3. Only the sequences generated from the BLV_CS primers were used due to the variability in read depth and coverage seen in the BLV_UMN amplicons. Amino acid profiles from five dams, 71%, (1, 3, 4, 6 and 7) diverged from their daughters. On the other hand, only 2/7 (29%) of the dams and daughters shared identical amino acid profiles. The highest number of amino acid changes was observed between dam/ daughter pair 1 (4 AA substitutions), followed by the pair 7 (4 AA substitutions). At the individual protein level, no functional SNPs could be observed in the R3 protein. Gag Pro Pol protein was the most polymorphic with the highest number of functional SNPs (4 AA substitutions) followed by the env polyprotein (3 AA substitutions). Despite the variation observed with the env polyprotein, it is very interesting to note that the env protein was absolutely conserved among the animals investigated from this herd. These results clearly show a significant level of functional variations/modulation between the proteins and the dam daughter pairs investigated, even within a small set of animals investigated from a single herd.  (1-515)   31  I  I  I  I  I  I  I  I  I  I  I  T  I  I  I   475  H  H  H  P  P  H  H  H  P  P  P  H  H  H  H

Discussion
This study successfully utilized a bioinformatics pipeline, which included Longshot, to call single nucleotide variants with high accuracy using whole-genome ONT data and assemble near complete BLV proviral consensus genomes from field-collected samples. The use of vectors with inserts verified by the Sanger sequence was able to support the integrity of the sequencing reads developed by the ONT workflow. The BLV genome from the DNA of whole blood samples were independently amplified twice, in two different laboratories, following proviral load testing. Phylogenetic analysis of all resulting consensus genomes highlighted the accuracy of the independent amplification of the proviral genomes due to the location of the duplicated samples. Using this study data, we also defined differing transmission routes of BLV within the herd, demonstrating the value of this tool to characterize primary transmission pathways in tested populations. The utilization of this tool also enables further research into the tracing of genetic elements, such as the proviral genomes of retroviruses infecting cattle.

Fidelity of Approach
Development of long-read BLV genome PCR occurred independently between the two participating laboratories with similar objectives, to assess BLV viral transmission. Therefore, primer design criteria, including BLV reference sequences, and PCR condition considerations were individually optimized respective to the polymerase used. This approach allowed for the evaluation of the robustness of the sequencing workflow and downstream interpretation. The efficiency of PCR reactions differed resulting in variation seen in Figure 3, corresponding to coverage achieved. However, the sequencing workflow was able to overcome PCR inefficiencies and generate BLV proviral genome sequences.
Due to the fact that a novel library preparation and bioinformatics pipeline were used, the fidelity of the generated reads needed to be evaluated. Oxford Nanopore Technology sequencing has been previously criticized for its high error rates [26,27] so it was crucial to confirm the accuracy of the resulting sequences. When comparing the sequences of the TOPO ® vectors with the known bacterial inserts generated by both Sanger sequencing and ONT, 100% sequence identity was found (Figure 1). Along with proven identical sequences, the read depth generated by our developed workflow helps overcome the high error rate, as seen in other studies as well [28]. This is a significant obstacle to have overcome and allows for greater insights into nucleotide substitution of large cis-acting genetic elements, such as viral genomes.
A stepwise validation was used as a quality control check at each point along the pipeline. A time intensive HMW DNA extraction was used to ensure high-quality DNA and was confirmed via Nanodrop quantification and gel electrophoresis (Data not shown). Success of the BLV PCR was evaluated with gel electrophoresis and the use of the bacterial amplicons as positive sequencing controls. Following sequencing, reads were trimmed and filtered with our developed bioinformatics protocol. After visualizing the generated genomes on a phylogenetic tree and sequencing coverage plots, unreliable samples with poor PCR product quality and low coverage were removed from future analysis to increase the confidence of generated results ( Figure S3).

Reflection of Findings and Implications for the BLV Field
Despite the high estimated prevalence of BLV within the United States and the asymptomatic nature of the retrovirus infection, disease control is challenging for dairy cattle producers. Without availability of effective treatment or vaccination programs, effective herd management practices to break the chain of transmission is critical. Various transmission routes of BLV have been reported [19][20][21][22][23], but it has been challenging to find direct evidence to prove vertical transmission of the virus. We sought to develop a tool to accurately analyze the majority of the proviral BLV genome to aid in the detection of transmission among United States dairy farms. This method is likely to be important for comparisons among a larger scale population of samples from varying geographical locations. To test this approach, we selected dam and daughter pairs that were both BLV-positive to assess the genetic identity of the proviral genomes.
On a local level, this phylogenetic analysis tool can help dairy farmers identify patterns of BLV transmission in their herds. Due to the assortment of viral variants, dam-daughter pairs with identical genetic sequences were concluded to likely be a result of vertical transmission ( Figure 4). However, it could be possible that another animal passed on that exact sequence to both dam and daughter. In this study herd, only milk replacer is used for dairy replacement calves and not colostrum. This increased the likelihood of vertical transmission from prenatal or perinatal infection and not transmission post-calving. Contrastingly, daughter 6, for example, was likely infected horizontally due to the greater genetic distance between her and her mother's strain (Figure 4). Increased rates of vertical transmission could be decreased with do not breed orders implemented on BLV positive dams or screening practices established before artificial insemination to ensure all bred cows are BLV negative. Separately, increased rates of horizontal transmission could be reduced with improved bloodborne biosecurity and better isolation practices used for BLV-positive cattle.
Globally, BLV sequences have been mostly categorized into 10 different genotypes based on the analysis of the env gene. The use of small proviral fragments heightens the difficulty in drawing phylogenetic and evolutionary conclusions. In our study, an amino acid alignment of just the env gene sequences resulted in 100% identical peptide sequences among all samples ( Figure S2). The utilization of only the env gene would not have provided the level of resolution required to discern different routes of transmission gained when using whole-genome sequence data. The use of entire proviral genomes would aid in the comparison of international strains, compared to conclusions drawn from analysis of a singular gene. Amino acid analysis of these genomes could increase awareness of BLV pathogenesis and virulence differences among genotypes. Functional changes resulting from mutations in Tax, Rex, and G4-along with multiple other proteins have been previously studied, but do not align with the mutations detected in our sample set [29][30][31][32][33]. More research is needed to see if the amino acid changes listed in Table 3 impact BLV pathogenicity and transmissibility.

Application
We have developed a tool to accurately determine full-length BLV proviral genome sequences. Within the field of BLV research in cattle, Oxford Nanopore sequencing of whole targeted BLV whole-genome amplicons will aid in tracing spatial and temporal modeling of infections within different herds and elucidate how cattle movement and farm management practices affect transmission. This approach can also be used to help resolve genetic relationships of dynamic viral populations of other pathogens such as SARS-CoV-2. Li et al. used a similar ONT workflow with SARS-CoV-2 samples to identify various sources and transmission patterns in China [34]. The developed workflow could also be a valuable tool for resolving transmission patterns of Human T Lymphotropic Virus type 1 (HTLV-1) in the central regions of Australia where disease prevalence has been estimated at 33% within indigenous populations [35] as this virus most similarity related to BLV. Furthermore, this tool could be used to resolve structural variation within several genomic loci at once to extend GWAS-based approaches that identify highly associated SNPs with a given phenotype within humans and livestock [36,37].

Limitations to This Pilot Study
The sample set used in this study is relatively small and does not represent all dairy herds within the United States. The data collected on selected animals only displays one snapshot within their lifetimes. A longitudinal study, including BLV-positive neonate samples is needed to evaluate direct and temporally relevant transmission patterns. This sampling strategy could also identify changes in the genetic diversity of the viruses within a herd and draw more concrete conclusions on the routes of transmission.
The proviral DNA was the main sample source used, not RNA. Future research could analyze BLV RNA transcripts to analyze strain functionality differences. The proviral DNA was extracted using a time-intensive high molecular weight protocol. The efficacy of this tool with extracts derived from other extraction protocols, such as column-based, needs to be evaluated in the future. The use of less time-consuming DNA extractions would lead to the better universal application of this tool. However, the use of the HMW protocol allows for future direct DNA sequencing, without PCR amplification, to potentially evaluate the differential states of DNA methylation of the BLV proviral genome, its transcription and resulting disease dynamics [38].

Sample Collection
Whole blood was collected in K2 EDTA Vacutainer ® tubes (BD Vacutainer, Franklin Lakes, NJ, USA) via coccygeal vein of eight BLV ELISA-positive dam-daughter pairs of Holstein cows on a single date (Table 1). Samples were collected from these adult cattle ranging from 25 to 91 months of age. Procedures for this study were reviewed and approved by the Institutional Animal Care and Use Committee (A-3955-01, PROTO201900271) at Michigan State University.

BLV Antibodies
Individual DHI milk samples were tested on the same date, with the Leukosis Serum × 2 Ab test (IDEXX, Westbrook, ME, USA) to identify the BLV serostatus of the animals at the time of bleeding. In short, milk samples were diluted in sample buffer and pipetted into 96-well plates coated with BLV-GP51 antigen per the manufacturer's instructions. Horseradish peroxidase-labeled bovine anti-immunoglobulin was added followed by incubation at room temperature for 5 min. Plates were washed after each incubation and before adding the enzyme-substrate. Reaction times were standardized using color development of positive controls and stopped by adding 0.5 mL Sulfoamino oxidanide (H 2 NO 4 S). Results were reported as corrected 450 nm optical density (OD) measurements with a corrected OD > 0.5 being considered antibody positive.

Proviral Load Diagnosis of BLV-Infected Animals
Qiagen DNeasy blood and tissue kit (Qiagen, Hilden, Germany) was used to extract genomic DNA from whole blood collected from previously ELISA-positive cows one week following milk ELISA screening. The SS1 qPCR assay, developed by CentralStar Cooperative Inc., is a multiplex probe-based quantitative PCR assay that targets the BLV proviral polymerase gene, bovine β-Actin gene, and an internal amplification spike-in control ultramer to quantify proviral load. Briefly, 3 µL of extracted DNA, 12.5 µL of 2 × PrimeTime Gene Expression Master Mix (ThermoFisher, Austin, TX, USA), 1.25 µL of a 20 × primer mix, 1 µL of internal spike-in control (10,000 copies/µL), and 7.25 µL of DNA-free water were combined for each qPCR reaction. All SS1 qPCR was performed on Applied Biosystems 7500 Fast Real-Time PCR system (FAST Real-Time PCR, Foster City, IA, USA) with qPCR conditions as follows: 95 • C for 10 min, 40 × (95 • C for 15 sec, 60 • C for 1 min). BLV and bovine β-Actin (a measure of host DNA) copy numbers were derived using a standard curve consisting of linearized plasmids containing respective target sequences previously quantified and normalized by digital droplet PCR. Amplification efficiency and manual thresholds were established from initial qPCR machine calibration. Proviral Load was calculated and expressed as the ratio between proviral BLV copies and bovine β-Actin copies.

High Molecular Weight DNA Isolation
Frozen aliquots containing 500 µL of whole blood were thawed to room temperature and added to 1 mL of 1 × ChemCruz red cell lysis buffer (RBC: Santa Cruz Biotechnology, Dallas, TX, USA), mixed via inverting, and incubated at room temperature on a rotisserie tube rotator for 10 min. Samples were centrifuged at 4700× g for 5 min and the supernatant was removed. This was repeated once more with 500 µL of RBC lysis buffer. White blood cell pellets were resuspended in 1.5 mL of modified mammalian cell lysis buffer [26] and incubated at 37 • C for 1 h prior to adding 100 µg/mL of Proteinase K (Invitrogen, Waltham, MA, USA) and mixing with a wide bore pipette. The samples were incubated at 55 • C for an hour with occasional agitation. 500 µL of buffer-saturated phenol (Invitrogen, Waltham, MA, USA) was added at room temperature and mixed on the tube rotator for 20 min. Organic phase separation was accomplished via centrifugation at 5000× g for 15 min. The aqueous phase was transferred to a fresh tube and 500 µL of phenol-chloroform-isoamyl alcohol (Acros Organics, Fair Lawn, NJ, USA) was added. After an additional phase separation step, 0.266 volumes of 7.5 mM ammonium acetate were added to the aqueous phase. Following a 5 min incubation at room temperature, 100% ethanol was added followed by thorough mixing. The precipitate was collected via centrifugation at 8000× g for 8 min at room temperature. Pellets were then dehydrated in a succession of 90% and 70% ethanol washes followed by centrifugation at 8000× g for 8 min. DNA pellets were air-dried, eluted in 50 µL of 7.5 pH Tris-EDTA (10 mM Tris, 0.1 mM EDTA), and incubated overnight at 4 • C. DNA was quantified using a Nanodrop spectrophotometer and stored at −20 • C. Amplicons were run out on a 1% agarose gel at 100V for an hour to confirm amplification before library preparation.

University of Minnesota Veterinary Diagnostic Laboratory (MVDL) BLV Whole Genome PCR
High molecular weight DNA extracted (as described above) was used as a template to amplify a 7604 bp amplicon corresponding to the region, 550-8154 bp of BLV whole genome. A 50 µL PCR reaction mixture containing 10 µL of 5 × Primestar GXL buffer, 4 µL of dNTPs (2.5 mM each), 1.5 µL (15 pm) of each primer (Table 1: BLV_UMN), 1 µL of GXL Primestar Polymerase (Takara Bio USA Inc., Mountain view, CA, USA), 28 µL of nuclease-free water (Ambion), along with 4 µL of the template was prepared. PCR mixture was initially incubated for 10 s at 98 • C for denaturation, followed by 30 cycles of 60 • C for 15 s and 68 • C for 2.5 min in a MiniAmp Plus Thermal cycler (Applied Biosystems, Waltham, MA, USA)

Use of Vectors Containing Bacterial Inserts as Positive Sequencing Controls
To measure the fidelity of the sequencing approach, previously generated linearized TOPO2.1 ® (Thermo Fisher Scientific, Austin, TX, USA) vectors containing bacterial genes (Mycoplasma bovis -TrxA, Corynebacterium bovis-16 s) were used as PCR templates. Briefly, the TOPO2.1 ® vector containing 123 bp insertion of M. bovis thioredoxin DNA was linearized using the Sca I restriction enzyme. Topo-BLV primers (Table 1) were used to target the 5 and 3 ends of the linearized vector producing a 3945 bp amplicon. The linearized vector containing the 143 bp insert of C. bovis 16s rDNA resulted in a 3967 bp amplicon ( Figure S1B). BLV_CS primers were used to amplify the primary TOPO ® amplicons prior to Oxford Nanopore sequencing.

Bead Clean Up and Library Preparation for Oxford Nanopores Sequencing
One µL of the BLV or positive control PCR amplicon/s were electrophoretically separated and visualized on a genomic DNA tape screen using a 4200 TapeStation (Agilent Technologies, Santa Clara, CA, USA) for size and integrity. All amplicons, irrespective of size, served as templates for ONT GridION sequencing. The remaining BLV and positive control PCR amplicons were subjected to a 2 × bead clean-up using Sera-Mag Select beads (GE HealthCare Life Sciences, Chicago, IL, USA) and eluted in a final volume of 12 µL of nuclease-free water. The purified PCR product was finally quantified using the Qubit 1X dsDNA HS assay kit (Thermo Fisher Scientific, Waltham, MA, USA) using a Qubit 4 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and then used for library preparation.
Library preparation was performed using the Rapid Barcoding Sequencing Kit (SQK-RBK004) (Oxford Nanopore Technologies, New York, NY, USA). Briefly, 7.5 µL of the purified BLV amplicons were mixed with 2.5 µL of a fragmentation mix barcode individually (1 barcode for each sample; 12 samples were barcoded and used on a single flow cell, wherever possible), incubated at 30 • C for 1 min and 80 • C for 1 min, followed by rapid cooling on ice. One µL each of the 12 barcoded libraries were pooled, mixed gently with 1µL of RAP (Rapid Adapter), and incubated for 5 min at room temperature. This final library pool (13 µL) was combined with 34 µL of sequencing buffer, 25.5 µL of loading buffer, 2.5 µL nuclease-free water (total of 75 µL) and loaded onto a primed Flow Cell R9.4 (FLO-MIN106D) on a GridION device and run with the SQK-RBK004_plus_Basecaller script of MinKNOW1.5.12 software. The run was stopped after 4 h, and the flow cell was washed with a Wash Kit (EXP-WSH002) (Oxford Nanopore Technologies, New York, NY, USA) according to the manufacturer's recommendation and stored at 4 • C for later use.

Phylogenetic and Amino Acid Analysis of BLV Whole-Genome Consensus Sequences
A comparative phylogenetic analysis was performed to investigate the genetic relatedness of BLV genomes within related Holstein dams and adult daughters within a Michigan dairy herd All the sequences were aligned with high accuracy and high throughput using MUSCLE (MUltiple Sequence Comparison by Log-Expectation) with 64 iterations [42] (National Center for Biotechnology Information, Bethesda, Maryland, USA). Phylogenetic trees were constructed using maximum-likelihood (ML) methods [43] for each segment with 1000-bootstrap replications and GTR Gamma nucleotide substitution in RAXML v7.6.8, utilizing the resources available at the Minnesota SuperComputing Institute. Trees were rooted to the midpoint, with the increasing order of nodes. Tree topology was supported by >70 bootstrap values. Bootstrap values below 70 were manually removed from the trees using Adobe Illustrator CC 2018 (version 22.0.1 accessed on 7 March 2021). Trees were visualized and depicted using FigTree (version 1.4.3 accessed on 7 March 2021). BLV nucleotide sequences, translated to amino acids using Geneious Prime11.0.6 + 10, were used to identify nonsynonymous mutations, if any, among the major proteins of BLV using AP019598.1 as a reference genome.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/pathogens10091191/s1, Figure S1A: Map of the TOPO ® vector used with Corynebacterium bovis and Mycoplasma bovis inserts, Figure S1B: Primary amplicon of the TOPO ® plasmids, generated from Topo-BLV primers (Table 1) used as a template for controls. Amplicon contains BLV seed sequence for use as a positive control in the BLV PCR, Figure S1C: Electrophoretogram showing the amplification, size, and integrity of TOPO ® plasmids encoding the 16s rDNA gene and Thioredoxin gene from two different pathogens, respectively, used for Oxford Nanopore Sequencing. Figure S2: Amino acid alignment of the ENV protein showing 100% identity between the animals studied. Only sequences derived from BLV amplicons generated by CentralStar Laboratories were used for this analysis. AP019598 is the Genbank accession number of the reference genome used for comparison. Figure S3: Coverage plots generated on samples from Daughter 1 used to filter samples for further data analysis. A: UMN generated amplicon. B: CentralStar group generated amplicon. Yellow areas signify low coverage, and the colored vertical lines are SNPs when compared to the AP019598.1 reference genome.