Phylogenetic Analysis of South African Bovine Leukaemia Virus (BLV) Isolates

Bovine leukaemia virus (BLV) causes chronic lymphoproliferative disorder and fatal lymphosarcoma in cattle, leading to significant economic losses in the beef and dairy industries. BLV is endemic globally and eleven genotypes have been identified. To date, only Zambian isolates have been genotyped from Africa. Although high BLV prevalence has been reported in South Africa, there has been no molecular characterisation of South African BLV isolates. To characterise BLV isolates in South Africa for the first time, we investigated the phylogenetic relationships and compared the genetic variability of eight South African BLV isolates with BLV isolates representing the eleven known genotypes from different geographical regions worldwide. Phylogenetic analyses based on full-length and partial env sequences as well as full-length gag sequences revealed that at least two genotypes, genotypes 1 (G1) and 4 (G4), are present in cattle in South Africa, which is consistent with studies from Zambia. However, our analysis revealed that the G1 South African isolate is more similar to other G1 isolates than the G1 Zambian isolates whereas, the G4 South African isolates are more divergent from other G4 isolates but closely related to the G4 Zambian isolate. Lastly, amino acid sequence alignment identified genotype-specific as well as novel amino acid substitutions in the South African isolates. The detection of two genotypes (G1 and G4) in southern Africa highlights the urgent need for disease management and the development of an efficacious vaccine against local strains.


Introduction
Bovine leukosis is the most common neoplastic disease in cattle and consists of two types of leukosis: enzootic bovine leukosis (EBL) and sporadic bovine leukosis (SBL). The occurrence of EBL is more frequent than that of SBL [1,2]. SBL is not transmissible and the causative agent of SBL remains unknown. The aetiological agent of EBL is bovine leukaemia virus (BLV), which is a B-lymphotropic oncovirus, belonging to the Deltaretrovirus genus of the Retroviridae family [3]. BLV naturally infects cattle, water buffalo and zebu [4]. The majority of transmission is mediated by direct transfer of infected body fluids during iatrogenic procedures such as ear tattooing, dehorning, re-use of needles gloves and plastic sleeves for veterinary procedures, including rectal palpation for the detection of In addition to the global distribution of eleven genotypes, multiple genotypes are circulating in Western Europe, Asia and South America. Some countries in these regions with multiple detected genotypes such as America, Bulgaria, Japan and Argentina have a long history of BLV prevalence or high BLV prevalence rates [40,[49][50][51][52][53][54][55], indicating that not only may the prolonged or delayed eradication process have resulted in the persistence of BLV but also allowed the introduction of new and multiple genotypic strains. This is also a serious concern with respect to the development of an efficacious BLV vaccine.
To date, only Zambian BLV isolates have been genotyped from Africa. Although high BLV prevalence has been reported in South Africa (Afrivet, 2018), there has been no molecular characterisation of South African BLV isolates. Therefore, this study was aimed to sequence BLV env and gag genes from South African isolates in order to investigate phylogenetic relationships and compare genetic variability of the South African isolates with other BLV isolates representing the eleven known genotypes from different geographical regions worldwide. Our analyses revealed that at least two genotypes, G1 and G4, are currently circulating in South Africa. Pairwise comparison and intragenotype evolutionary distances of full-length Env nucleotide and amino acid sequences showed that the G4 South African isolates are closely related to each other and their sequences are highly conserved. Furthermore, our analyses using full-length Gag nucleotide and amino acid sequences indicated that Gag sequences are generally more variable than Env sequences. Lastly, the alignment of BLV Env and Gag amino acid sequences identified genotype-specific as well as novel substitutions in the South African isolates.

Blood Samples and Extraction of Total Bovine Genomic DNA
Whole blood samples were obtained from a single herd of dairy cows in Port Elizabeth in South Africa in 2018. The blood samples were delivered to the University of Cape Town, South Africa for further analyses. Genomic DNA was extracted from whole blood samples (n = 8) using New MagNA Pure Compact Nucleic Acid Isolation Kits on the MagNA Pure Compact Instrument (Roche, Switzerland) as per the manufacturer's instructions. The extracted DNA was stored at −20 • C until needed. The eight blood samples were designated sample IDs: K1170, K1194, M1878, M2746, P591, P2152, P2677 and L3401.

Amplification of BLV env Gene by Nested Polymerase Chain Reaction (nPCR) and Sanger Sequencing of Full-Length Env Gene
A 1782 bp fragment containing the full-length BLV env gene was amplified from total bovine DNA extracted from the eight blood samples by nPCR using the following primers ( Figure 1): Env OF (Outer Forward) (5 -CCTCCTACCAATTCTAAAGACC-3 ), Env OR (Outer Reverse) (5 -CACGCAGA AGCGACAATCTC-3 ), Env if (Inner Forward) (5 -GGGCGGAGAAACACCYAAGG-3 ) and Env ir (Inner Reverse) (5 -CACTGACTATTCCACTAAGCC-3 ). Both rounds of nPCR were performed in a 20 µL reaction mixture containing 1 µL of total genomic DNA or PCR product from the first round of nPCR, 1 µL of 2x KAPA 2G Robust HotStart ReadyMix (KAPA BIOSYSTEMS, Wilmington, MA, USA), 1 µL of each primer (10 µM each) and high-performance liquid chromatography (HPLC) water (Sigma-Aldrich, St Louis, MO, USA). The following thermocycling parameters were used: initial denaturation at 95 • C for 3 min followed by 25 cycles (30 cycles for the second round) of denaturation at 95 • C for 15 s, annealing at 55 • C (61 • C for the second round) for 15 s, extension at 72 • C for 2 min and final extension at 72 • C for 4 min. genome and gag and env regions amplified using nested PCR (nPCR). Triangles denote primers and dotted lines denote amplified regions. Gag if, Gag inner forward primer; Gag ir, Gag inner reverse primer; Gag OF, Gag outer forward primer; Gag OR, Gag outer reverse primer; Env if, Env inner forward primer; Env ir, Env inner reverse primer; Env OF, Env outer forward primer; Env OR, Env outer reverse primer; Pro, protease; Pol, polymerase; Env, envelope; miRNAs, microRNAs; p15, matrix protein; p12, nucleocapsid protein; p24, capsid protein; gp51, surface protein; gp30, transmembrane protein. Adapted from   [56].

Phylogenetic Analysis of BLV Full-Length and Partial env and Full-Length gag Nucleotide Sequences
The BLV env and gag nucleotide sequences of the eight South African isolates were assembled and aligned using CLC Main Workbench 7.9.3 (QIAGEN Bioinformatics, Copenhagen, Denmark). To select reference BLV nucleotide sequences for downstream analyses, 203 BLV full-length (1548 bp) env, 564 partial (444 bp) env and 113 full-length (1182 bp) gag nucleotide sequences with known genotypes were obtained from the Basic Local Alignment Search Tool (BLAST) database [57] using eight South African env and gag nucleotide sequences as query sequences; identical and highly similar sequences were removed using 98.5% sequence similarity as a threshold value in VSEARCH [58]. The selected sequences were further adjusted so that they represented isolates from a wide range of countries (Supplementary Table S2). The final sequences were aligned with the eight South African isolates using MUSCLE [59]. Model testing was performed to select the best evolutionary model based on the Bayesian information criterion (BIC) [60]. Gag OF, Gag outer forward primer; Gag OR, Gag outer reverse primer; Env if, Env inner forward primer; Env ir, Env inner reverse primer; Env OF, Env outer forward primer; Env OR, Env outer reverse primer; Pro, protease; Pol, polymerase; Env, envelope; miRNAs, microRNAs; p15, matrix protein; p12, nucleocapsid protein; p24, capsid protein; gp51, surface protein; gp30, transmembrane protein.

Phylogenetic Analysis of BLV Full-Length and Partial Env and Full-Length Gag Nucleotide Sequences
The BLV env and gag nucleotide sequences of the eight South African isolates were assembled and aligned using CLC Main Workbench 7.9.3 (QIAGEN Bioinformatics, Copenhagen, Denmark). To select reference BLV nucleotide sequences for downstream analyses, 203 BLV full-length (1548 bp) env, 564 partial (444 bp) env and 113 full-length (1182 bp) gag nucleotide sequences with known genotypes were obtained from the Basic Local Alignment Search Tool (BLAST) database [57] using eight South African env and gag nucleotide sequences as query sequences; identical and highly similar sequences were removed using 98.5% sequence similarity as a threshold value in VSEARCH [58]. The selected sequences were further adjusted so that they represented isolates from a wide range of countries (Supplementary Table S2). The final sequences were aligned with the eight South African isolates using MUSCLE [59]. Model testing was performed to select the best evolutionary model based on the Bayesian information criterion (BIC) [60].
Neighbour-joining (NJ) and Maximum-likelihood (ML) trees based on the BLV full-length and 444 bp partial env nucleotide sequences were constructed using the Tamura-Nei substitution model with the Gamma distribution (TN83 + G) [61] and the Kimura-2-parameter model with the Gamma distribution (K2P +G) [62], respectively. NJ and ML trees based on the BLV full-length gag nucleotide sequences were constructed using the TN83 + G [61] model with the Gamma distribution (HKY + G) [63]. Nonparametric bootstrap analysis with 1000 iterations [64] was used to evaluate the robustness of evolutionary relationships. Alignment, model testing and tree building were performed using MEGA 7 [59].

Pairwise Comparison of BLV Env and Gag Nucleotide and Amino Acid Sequence
Pairwise evolutionary distances were estimated using the TN83 + G model for the BLV full-length env and gag nucleotide sequences and the K2P + G model for the partial env nucleotide sequences using MEGA7 [59]. Nonparametric bootstrap analysis with 1000 iterations [64] was used to evaluate the robustness of evolutionary relationships. For the full-length Env and partial Env sequences, the analysis was conducted only between South African isolates and selected G1 and G4 isolates from other geographic regions worldwide.
To evaluate the nucleotide and amino acid sequence percent identity and differences, pairwise comparison of the BLV full-length Env, partial Env and full-length Gag sequences between the South African isolates and selected isolates from other geographic regions (46, 20 and 29 sequences, respectively) was performed using CLC Main Workbench 7.9.3 (QIAGEN Bioinformatics, Copenhagen, Denmark). For the partial Env sequences, sequence comparison was made only between South African isolates and selected G1 and G4 isolates from other geographic regions worldwide.

Intragenotype and Intergenotype Evolutionary Distances of BLV Env Nucleotide and Amino Acid Sequences
Two hundred and three full-length env nucleotide sequences of BLV isolates within nine known genotypes that were obtained from the BLAST database [57] in Section 2.4 were used for this analysis. Translated amino acid sequences were aligned using MUSCLE in MEGA7 [59] Mean intragenotype and intergenotype evolutionary distances of BLV Env nucleotide and amino acid sequences were estimated using the TN83 + G model [45] and the Jones-Taylor-Thornton model with the Gamma distribution (JTT + G) [49], respectively, using MEGA7 [59]. Nonparametric bootstrap analysis with 1000 iterations [64] was used to evaluate the robustness of evolutionary relationships.

Alignment of BLV Env and Gag Amino Acid Sequences
To identify genotype-specific amino acid substitutions as well as those unique to the South African isolates, the full-length Env and Gag amino acid sequences of the South African isolates were aligned with 851 Env amino acid sequences of BLV isolates with eleven known genotypes and with 113 Gag amino acid sequences with six known genotypes (G1, G2, G4, G6, G9 and G10) obtained from the GenBank sequence database using MUSCLE in MEGA7 [59].

Phylogenetic Analysis of BLV Full-Length Env Nucleotide Sequences
The full-length env and gag sequences were successfully amplified and sequenced from total bovine DNA extracted from whole blood of eight BLV-infected cattle. To determine evolutionary relationships between the South African and BLV isolates from other geographic regions worldwide, neighbour-joining (NJ) (Supplementary Figure S1) and maximum-likelihood (ML) (Figure 2) phylogenetic trees based on the alignment of full-length (1548 bp) env nucleotide sequences were constructed using 70 BLV isolates representing G1 to G7, G9 and G10 from 18 countries. NJ and ML phylogenetic methods were used to validate these two results and both phylogenetic trees showed congruent topologies with high bootstrap values. The NJ and ML trees revealed that seven of the South African isolates (K1170, K1194, M1878, P591, P2152, P2677 and L3401) are closely related to each other with 99.8% bootstrap value and all belong to G4. One South African isolate (M2746) did not cluster with the seven other South African isolates and grouped with G1. This phylogenetic analysis also indicated that the G4 South African isolates are more divergent from other G4 isolates, whereas the G1 South African isolate is less divergent from other G1 isolates. The G4 South African isolates appeared to be moderately related to a Russian isolate (JN695878.1) with 74.5% (NJ) and 75   The NJ and ML trees revealed that seven of the South African isolates (K1170, K1194, M1878, P591, P2152, P2677 and L3401) are closely related to each other with 99.8% bootstrap value and all belong to G4. One South African isolate (M2746) did not cluster with the seven other South African isolates and grouped with G1. This phylogenetic analysis also indicated that the G4 South African isolates are more divergent from other G4 isolates, whereas the G1 South African isolate is less divergent from other G1 isolates. The G4 South African isolates appeared to be moderately related to a Russian isolate (JN695878.1) with 74.5% (NJ) and 75

Phylogenetic Analysis of BLV Partial Env Nucleotide Sequences (444 bp)
The only African BLV sequences that have been sequenced are partial (444 bp) sequences of the gp51 gene from Zambian isolates [65,66]. Furthermore, since the partial env sequences have been used for BLV genotyping in various studies [45,[67][68][69][70][71][72], they are available from 10 genotypes from various geographic regions. To confirm the results obtained from the phylogenetic analyses based on the full-length env nucleotide sequences and to further investigate the phylogenetic relationships between the South African isolates with more diverse BLV isolates from different geographic regions, NJ (Supplementary Figure S3) and ML ( Figure 3) trees based on the alignment of partial env nucleotide sequences were constructed using 112 BLV isolates from 32 countries.
Despite the low to moderate bootstrap values, this second analysis was consistent with the first analysis using the full-length env nucleotide sequences, showing that seven of the South African isolates belong to G4 and one (M2746) belongs to G1. G2 sequences in the NJ tree clustered differently from the ML tree due to the difference in tree-building methods (distance-based vs. character-based methods) but the ML tree based on the partial env nucleotide sequences was congruent with the phylogenetic tree based on the full-length env nucleotide sequences. The seven G4 South African isolates clustered within the G4 group with 44% (NJ) and 44.2% (ML) bootstrap values whilst the South African isolate (M2746) clustered with other G1 isolates with 70.6% (NJ) and 66% (ML) bootstrap value. The G1 South African isolate did not cluster with three selected G1 Zambian isolates (LC440653.1, LC440663.1 and LC440666.1), which form a distinct monophyletic group as in Phiri's study [66]. In contrast, despite low bootstrap values (32.5% for NJ and 36.4% for ML), the G4 South African isolates are more closely related to a G4 Zambian isolate (LC193462.1) than other G4 isolates. The estimated evolutionary distance between the seven G4 South African isolates and the G4 Zambian isolate was 0.0023 (K1170, K1194, P2677 and P591) and 0.0045 (L3401, M1878 and P2152) (Supplementary Figure S4).

Phylogenetic Analysis of BLV Full-Length Gag Nucleotide Sequences
Although the BLV gag sequence has rarely been used for phylogenetic analyses, this was included here to add more insight into the genetic variabilities of the BLV isolates and to compare the results with the phylogenetic analyses of the BLV env nucleotide sequences. NJ (Supplementary Figure S5) and ML ( Figure 4) trees based on the alignment of BLV full-length gag nucleotide sequences were constructed using 40 BLV isolates representing G1, G2, G4, G6, G9 and G10 from 11 countries.  analyses of the BLV full-length env nucleotide sequences supported by high bootstrap values. Similar to the BLV env phylogenetic trees, the BLV gag trees showed that seven of the South African isolates belong to G4 with 90.4% (NJ) and 95.4% (ML) bootstrap values, whereas one South African isolate (M2746) belongs to G1 with 100% bootstrap value (NJ and ML). Interestingly, there appeared to be more genetic variability in the BLV gag nucleotide sequences amongst different genotypes as evident from the tree topology, as well as the pairwise evolutionary distance (Supplementary Figure S6).  Despite the limited number of full-length gag nucleotide sequences currently available on the GenBank sequence database, NJ and ML trees showed congruent topologies with the phylogenetic analyses of the BLV full-length env nucleotide sequences supported by high bootstrap values. Similar to the BLV env phylogenetic trees, the BLV gag trees showed that seven of the South African isolates belong to G4 with 90.4% (NJ) and 95.4% (ML) bootstrap values, whereas one South African isolate (M2746) belongs to G1 with 100% bootstrap value (NJ and ML). Interestingly, there appeared to be more genetic variability in the BLV gag nucleotide sequences amongst different genotypes as evident from the tree topology, as well as the pairwise evolutionary distance (Supplementary Figure S6).

Pairwise Comparison of the BLV Full-Length Env Sequences
Pairwise comparison of the BLV full-length Env nucleotide and amino acid sequences between the South African isolates and 46 selected isolates (Supplementary Figures S7 and S8) were performed. Figure 5 shows pairwise percent identity of the BLV full-length Env sequences between the South African isolates and 17 selected G1 and G4 isolates. The analysis revealed that the G4 South African isolates shared significantly high sequence identity with each other: ≥99.74% nucleotide sequence identity (0-4 differences in 1548 nucleotides) and ≥99.81% amino acid sequence identity (0-2 differences in 515 amino acids) ( Figure 5 and Supplementary Figure S8). In particular, the nucleotide sequences of the P2677 and P591isolates are identical. Furthermore, this analysis also showed that despite some nucleotide sequence variabilities (99.74-99.94% identity or 1-4 differences in 1548 nucleotides), five South African isolates K1170, P591, P2152, P2677 and L3401 shared 100% amino acid sequence identity, indicating that their nucleotide substitutions were synonymous mutations.
in 515 amino acids) ( Figure 5 and Supplementary Figure S8). In particular, the nucleotide sequences of the P2677 and P591isolates are identical. Furthermore, this analysis also showed that despite some nucleotide sequence variabilities (99.74-99.94% identity or 1-4 differences in 1548 nucleotides), five South African isolates K1170, P591, P2152, P2677 and L3401 shared 100% amino acid sequence identity, indicating that their nucleotide substitutions were synonymous mutations.
Consistent with the phylogenetic analyses based on the NJ and ML trees, the full-length Env nucleotide and amino acid sequences of the G1 South African isolate showed high sequence identity with the majority of the G1 isolates ( Figure 5 and Supplementary Figure S8). The G1 South African fulllength Env sequence shared 98.77-99.61% nucleotide sequence identity (6-19 differences in 1548 nucleotides) and 98.64-99.81% amino acid sequence identity (1-7 differences in 515 amino acids) with all G1 isolates except the Australian isolate (D00647.1). In contrast, there is slightly more sequence variability between the G4 South African isolates and other G4 isolates. The G4 South African isolates shared 98.19-98.90% nucleotide sequence identity (18-28 differences in 1548 nucleotides) and 98.25-99.03% amino acid sequence identity (5-9 differences in 515 amino acids) with all other G4 isolates. Figure 5. Pairwise percent identity of BLV full-length Env nucleotide (1548 bp) and amino acid (515 amino acids) sequences between G1 and G4 isolates from South Africa and other geographic regions worldwide. The lower matrix shows the percent identity of nucleotide sequences and the upper matrix shows that of amino acid sequences. The country of origin is indicated by 2-letter country codes (see Supplementary Table S2) Figure 5. Pairwise percent identity of BLV full-length Env nucleotide (1548 bp) and amino acid (515 amino acids) sequences between G1 and G4 isolates from South Africa and other geographic regions worldwide. The lower matrix shows the percent identity of nucleotide sequences and the upper matrix shows that of amino acid sequences. The country of origin is indicated by 2-letter country codes (see Supplementary Table S2). SA, South Africa.
Consistent with the phylogenetic analyses based on the NJ and ML trees, the full-length Env nucleotide and amino acid sequences of the G1 South African isolate showed high sequence identity with the majority of the G1 isolates ( Figure 5 and Supplementary Figure S8). The G1 South African full-length Env sequence shared 98.77-99.61% nucleotide sequence identity (6-19 differences in 1548 nucleotides) and 98.64-99.81% amino acid sequence identity (1-7 differences in 515 amino acids) with all G1 isolates except the Australian isolate (D00647.1). In contrast, there is slightly more sequence variability between the G4 South African isolates and other G4 isolates. The G4 South African isolates shared 98.19-98.90% nucleotide sequence identity (18-28 differences in 1548 nucleotides) and 98.25-99.03% amino acid sequence identity (5-9 differences in 515 amino acids) with all other G4 isolates.

Pairwise Comparison of BLV Partial Env Sequences
Pairwise comparison of the BLV partial Env nucleotide and amino acid sequences between the South African isolates and 20 selected G1 and G4 isolates ( Supplementary Figures S9 and S10). Furthermore, in order to compare South African and Zambian isolates, all fourteen G1 Zambian sequences currently available on the GenBank sequence database were used in this analysis (Supplementary Tables S3 and S4). Supplementary Figure S11 shows alignment of the deduced amino acid sequences of the 148 amino acid region in the Env gp51 protein between the South African isolates and G1 and G4 isolates.
As opposed to the analyses based on the full-length Env sequences, sequences of the 444 bp region in the gp51 gene appear to be slightly more variable as this region contains highly polymorphic epitope sequences and neutralising domains. The G1 South African sequence shared moderate sequence identity with the G1 isolates 97.30-99.55% nucleotide sequence identity (2-12 differences in 444 nucleotides) and 95.95-99.32% amino acid sequence identity (1-6 differences in 148 amino acids) (Supplementary Figures S9 and S10). The G4 South African isolates and G4 isolates showed 97.75-99.55% nucleotide sequence identity (2-10 differences in 444 nucleotides) and their amino acid sequences showed 95.95-99.32% (1-6 differences in 148 amino acids).
Comparison of the South African isolates with the Zambian isolates indicated that the G1 South African isolate is more similar to G1 isolates from other parts of the world than to the G1 Zambian isolates. The G1 South African nucleotide sequence is only moderately similar to the fourteen G1 Zambian isolates (Supplementary Table S3), showing 98.20-98.87% sequence identity or 5-8 differences in 444 nucleotides. However, their amino acid sequences shared relatively low sequence identity (96.62-98.65% identity or 2-5 differences in 148 amino acids). In contrast, the G4 South African isolates are more divergent from the other G4 isolates but highly similar to the G4 Zambian isolate. Three G4 South African isolates (M1878, P2677 and L3401) and four other G4 South African isolates (K1170, K1194, P591 and P2152) shared 99.55% (two nucleotide differences) and 99.77% nucleotide sequence identity (one nucleotide differences) with the Zambian isolate (LC193462), respectively (Supplementary Table S4). Furthermore, all G4 South African isolates except the M1878 isolate shared 100% amino acid sequence identity with the G4 Zambian isolate. The M1878 isolate shared 99.32% amino acid sequence identity with the Zambian isolate, which is only one amino acid difference between these isolates.

Pairwise Comparison of BLV Gag Sequences
Pairwise comparison of the BLV full-length Gag nucleotide and amino acid sequences between the South African isolates and 29 selected isolates from other geographic regions ( Figure 6 and Supplementary Figures S12 and S13) showed that Gag sequences from different genotypes appeared to be more variable than Env sequences. Pairwise sequence identities of the full-length env nucleotide sequences were ≥94.44% ( Figure 5 and Supplementary Figure S7) whereas those of the full-length gag nucleotide sequences were ≥92.58% (Figure 6 and Supplementary Figure S12).
Similarly, the Gag sequences amongst the G4 South African isolates appeared to be slightly more variable than their Env sequences. Whilst the full-length Env sequences of the G4 South African isolates shared high nucleotide and amino acid sequence identity (≥99.74% and ≥99.61%, respectively) with each other and Env amino acid sequences of the five G4 South African isolates are identical, their Gag sequences shared ≥99.24% nucleotide sequence identity (1-9 differences in 1182 nucleotides) and ≥98.98 amino acid sequence identity (0-4 differences in 393 amino acid) with each other (Supplementary Figures S12 and S13). The K1170 and P2677 isolates shared 100% Gag amino acid sequence identity, as did P2152 and L3401 isolates.
In contrast to the G4 isolates, the G1 South African isolate shared relatively high sequence identity, 99.24-99.66% nucleotide sequence identity (4-9 differences in 1182 nucleotides) and 99.24-99.75% amino acid sequence identity (1-4 differences in 393 amino acids), with all G1 isolates except an Australian isolate (D000647) and a Japanese isolate (K02120.1). It should be noted that the Australian and Japanese isolates have a three-nucleotide deletion, equivalent to one amino acid deletion (which appears to result in an in-frame deletion) and they are more divergent from other G1 isolates. Figure 6. Pairwise percent identity of BLV full-length Gag nucleotide (1182 bp) and amino acid (393 amino acids) sequences between G1 and G4 isolates from South Africa and other geographic regions worldwide. The lower matrix shows the percent identity of nucleotide sequences and the upper matrix shows that of amino acid sequences. The country of origin is indicated by 2-letter country codes (see Supplementary Table S2). SA, South Africa.

Genetic Variabilities of BLV Env Sequences
Using 203 full-length Env sequences with nine known genotypes obtained from the GenBank sequence database and the eight South African Env sequences, mean intragenotype and intergenotype nucleotide and amino acid distances were estimated (Supplementary Table S5). The analysis confirmed that the seven G4 South African isolates and one G1 South African isolate belong to G4 and G1, respectively, showing the lowest mean intergenotype distances. The mean nucleotide and amino acid distances between the G1 South Africa isolate and the G1 isolates were 0.0082 and  Figure 6. Pairwise percent identity of BLV full-length Gag nucleotide (1182 bp) and amino acid (393 amino acids) sequences between G1 and G4 isolates from South Africa and other geographic regions worldwide. The lower matrix shows the percent identity of nucleotide sequences and the upper matrix shows that of amino acid sequences. The country of origin is indicated by 2-letter country codes (see Supplementary Table S2). SA, South Africa.
Only two G4 gag sequences have been published and these are from Belgium and America. Consistent with the phylogenetic analyses, this analysis showed that the G4 South African isolates are more divergent from the G4 Belgian (KT122858.1) and American (AF033818) isolates. The G4 South African isolates showed only 97.29-98.14% nucleotide identity and 97.71-98.47% amino acid identity with these two G4 isolates, which corresponds to 22-32 nucleotide differences and 6-9 amino acid differences.
Viruses 2020, 12, 898 13 of 25 In contrast to the G4 isolates, the G1 South African isolate shared relatively high sequence identity, 99.24-99.66% nucleotide sequence identity (4-9 differences in 1182 nucleotides) and 99.24-99.75% amino acid sequence identity (1-4 differences in 393 amino acids), with all G1 isolates except an Australian isolate (D000647) and a Japanese isolate (K02120.1). It should be noted that the Australian and Japanese isolates have a three-nucleotide deletion, equivalent to one amino acid deletion (which appears to result in an in-frame deletion) and they are more divergent from other G1 isolates.

Genetic Variabilities of BLV Env Sequences
Using 203 full-length Env sequences with nine known genotypes obtained from the GenBank sequence database and the eight South African Env sequences, mean intragenotype and intergenotype nucleotide and amino acid distances were estimated (Supplementary Table S5). The analysis confirmed that the seven G4 South African isolates and one G1 South African isolate belong to G4 and G1, respectively, showing the lowest mean intergenotype distances. The mean nucleotide and amino acid distances between the G1 South Africa isolate and the G1 isolates were 0.0082 and 0.0063, respectively and those between the G4 South African isolates and the G4 isolates were 0.0138 and 0.0141, respectively. Furthermore, comparison of these mean intergenotype evolutionary distances indicated that the G1 South African isolate is less divergent from other G1 isolates whereas the G4 South African isolates are slightly more divergent from other G4 isolates, which also confirmed the results of the phylogenetic analyses based on the full-length env nucleotide sequences.
This analysis was also consistent with the results obtained from pairwise comparison of the full-length Env sequences, showing that the Env sequences of the G4 South African isolates themselves are significantly conserved. The mean intragenotype distances of the G4 South African Env sequences were 0.0133 and 0.0011 for nucleotide and amino acid sequences, respectively.
Furthermore, this analysis revealed that the Env sequences of G6 and G10 isolates are more variable than other genotypes. The G6 and G10 Env sequences showed mean intragenotype nucleotide distances of 0.0267 and 0.0273, respectively, and mean intragenotype amino acid distances of 0.0249 and 0.0320, respectively.

Alignment of BLV Env Amino Acid Sequences
To assess whether the nucleotide substitutions altered amino acid sequences in the South African isolates and to identify the genotype-specific amino acid substitutions previously described [41,44,45,73] as well as amino acid substitutions unique to the South African isolates, the full-length Env and Gag deduced amino acid sequences of the South African isolates were aligned with a total of the 851 Env amino sequences and 113 Gag amino acid sequences available from the GenBank sequence database. Table 1 shows the alignment of Env amino acid sequences of the South African isolates with those of the 43 selected isolates representing eleven genotypes. M2746 a, amino acid substitutions described by Rodriguez et al. (2009) [41]; b, genotype-specific amino acid substitution described by Zhao and Buehring (2007) [73]; c, genotype-specific amino acid substitution previously described by Lee et al. (2016) [45]; d, genotype-specific amino acid substitutions described by Polat et al. (2016) [44]. SU, surface protein; TM, transmembrane protein; ND 2, neutralising domain 2; PXXP motif, proline-rich motif for the cell signalling; Dash (-) indicates amino acid residues are not available due to the partial sequences. Amino acid substitutions that were exclusively found in the South African isolates are shown in red font. Each amino acid has been assigned a unique background color. The country of origin is indicated by 2-letter codes (see Supplementary Table S2).
Viruses 2020, 12, 898 16 of 25 Alignment of the BLV Env amino acid sequences confirmed the presence of the G1-and G4-specific amino acid substitutions in the South African isolates at fourteen residues (at positions 29,48,56,73,74,82,121,132,134,144,254,479,480 and 504) in their Env sequences. As some exceptions to these genotype-specific substitutions, a proline-to-alanine substitution at position 73 (P73A) in the L3401 isolate and a D134N substitution in the M1878 isolate were found. Both of these amino acid substitutions appeared to be G1-like and could be random mutations.
Furthermore, three unique amino acid substitutions in the Env sequences (at positions 59, 153 and 476) were found in some South African sequences. An isoleucine-to-leucine substitution at position 59 (I59L) and a histidine-to-glutamine substitution at position 153 (H153Q) were only present in the K1194 and M2746 isolates, respectively and thus, this could be a random mutation. The E476D substitution was consistently found in the seven G4 South African isolates. This substitution was also found in G1 Mexican sequences (KY548770.1, KY548786.1 and KY548787.1; data not shown) and a G9 Bolivian sequence (LC080675.1; data not shown). However, none of the (non-South African) G4 isolates had this substitution, suggesting that this amino acid substitution is unique to the G4 South African isolates.

Alignment of BLV Gag Amino Acid Sequences
Alignment of the Gag amino acid sequences of the South African isolates with 113 Gag amino acid sequences available on the GenBank sequence data also confirmed the presence of G1-and G4-specific amino acid substitutions in the South African isolates. Table 2 shows the alignment of Gag amino acid sequences of the South African isolates with those of the 21 selected isolates representing six genotypes. The analysis showed that the South African isolates contained genotype specific amino acid substitutions at four residues (at positions 63, 69, 88 and 365) in their Gag sequences. Furthermore, seven amino acid substitutions (positions 29, 107, 108, 278, 318, 341 and 343) in the Gag sequences that had not been previously described were detected in the South African isolates and these were mainly found in the G4 South African isolates.
a, genotype-specific amino acid substitutions previously described by Polat et al. (2016) [56]. Amino acid substitutions that are previously not described are highlighted in blue. G1 and G4 sequences are highlighted in yellow for comparison. MA, matrix; CA, capsid; NC, nucleocapsid. The country of origin is indicated by 2-letter codes Dash (-) indicates amino acid residues are not available due to the partial sequences. Amino acid substitutions that were exclusively found in the South African isolates are shown in red font. Each amino acid has been assigned a unique background color. The country of origin is indicated by 2-letter codes (see Supplementary Table S2).
Four of these seven amino acid substitutions were particularly unique to the South African sequences. The amino acid alignment showed that N29D, I278V and P343S substitutions were only found in the G4 South African sequences, though the I278V substitution was also detected in an Iranian isolate (LC193727.1; data not shown), which has not been genotyped. Although previously not described, our analysis indicates that the presence of methionine at position 318 in the M2746 sequence and isoleucine at the same position in the seven G4 South African sequences could be genotype-specific.
A K341Q substitution was only found in the South African P591 isolate amongst 113 isolates studied. In addition, a G1-like K69R substitution was found in the G4 South African M1878 isolate and a V108I substitution was only found in the M1878 isolate but not in other six G4 South African isolates. It is uncertain whether these amino acid substitutions are spontaneous mutations.
A genotype-specific glycine-to-glutamic acid substitution at position 88 (G88E) was previously described [44] and the presence of the glycine residue at this position was identified as G1-specific and the presence of the glutamic acid residue as G4-specific. However, in our study, the G4 South African isolates M1878, P2152 and L3401 had the glycine residue at this position. The A107V substitution in the K1194 isolate and an A107S substitution in the M1878, P2152 and L3401 isolates could be random mutations as these mutations did not appear to be genotype-specific.

Discussion
This study is the first known phylogenetic analysis of BLV isolates from South Africa. Full-length env and gag genes from eight South African BLV isolates were sequenced. Phylogenetic analyses based on these genes revealed that at least two genotypes, genotype 1 and genotype 4, are present in South Africa, which is consistent with the Zambian studies [65,66]. Phylogenetic analyses and pairwise comparison of Gag sequences further provided two findings. Firstly, the Env and Gag sequences of the G4 South African isolates are highly conserved and some of these G4 South African isolates encode identical amino acid sequences. Secondly, Env sequences are highly conserved within as well as between genotypes whereas Gag sequences are more variable between genotypes. Lastly, alignment of Env and Gag amino acid sequences identified genotype-specific as well as novel amino acid substitutions in the South African isolates.
In 2016, Pandey et al. identified a G4 isolate from a dairy cow in Zambia by phylogenetic analysis based on the 444 bp region of the gp51 gene [65]. In 2018, using the same 444 bp region, they identified fourteen G1 isolates from beef cattle [66]. These Zambian studies enabled comparison of the South African and Zambian BLV partial Env sequences. This comparison revealed that the G1 South African isolate is more closely related to the majority of the G1 isolates (not from Zambia) than to the G1 Zambian isolates. This result was not surprising as Phiri et al. found in their 2018 study that the G1 Zambian isolates form a distinct group within G1 [66]. In contrast, the G4 South African isolates appeared to be highly similar to the G4 Zambian isolate. The G4 South African isolates differed from the G4 Zambian isolate only at one or two nucleotide positions; the amino acid sequence identity was 100% for six of the seven G4 South African isolates (K1170, K1194, P591, P2152, P2677 and L3401) and 99.32% for the M1878 isolate. However, we cannot entirely exclude the possibility that there are G1 South African isolates that are closely related to the G1 Zambian isolate as this was a very small study with samples only taken from a single herd. Similarly, only one G4 isolate was sequenced from Zambia and thus, there might be G4 Zambian isolates that are less similar to South African isolates. Furthermore, if future studies could sequence full-length env (1548 bp) and gag (1182 bp) genes from Zambian isolates as well as South African isolates, it would enable a more comprehensive analysis of the G1 and G4 isolates circulating in these two countries, as these comparisons are only based on a 444 bp region of env.
It should be noted that the collection dates are unknown for the majority of the available BLV sequences. Therefore, we could not perform the analysis and interpretation of the BLV evolutionary history for eleven genotypes as well as the G4 South African isolates.
Although gag sequences are rarely used for phylogenetic analysis and the number of gag sequences available on the GenBank sequence database is limited, we sequenced the full-length gag gene to compare the results obtained from phylogenetic analyses as well as sequence comparison and alignment of full-length and partial env sequences. Phylogenetic analyses based on the full-length gag nucleotide sequences were in agreement with the phylogenetic analyses based on the full-length and partial env nucleotide sequences, validating that one South African isolate belongs to G1 and the seven other South African isolates belong to G4. Consistent with the analyses based on the full-length Env sequences, Gag sequences between the G1 South African isolate and G1 isolates from other parts of the world appeared to be more conserved than those between the G4 South African isolates and G4 Belgian and American isolates.
Furthermore, full-length Gag sequences are slightly more variable between different genotypes than full-length Env sequences as evident from the phylogenetic analysis and pairwise sequence comparison. Similarly, whilst five out of the seven G4 South African Env amino acid sequences are identical, only two of their Gag sequences are identical. Since only 113 Gag sequences with known genotypes are available from a limited number of countries, obtaining more Gag sequences is desirable to validate these conclusions and will provide a more comprehensive understanding of evolutionary changes in the Gag sequences.
Amino acid sequence alignment between the South African isolates and global isolates representing all eleven genotypes detected seventeen and eleven amino acid substitutions in the South African Env and Gag sequences, respectively. These substitutions include genotype-specific substitutions as well as those unique to the South African isolates. The amino acid substitution at position E476D in the Env sequences as well as the amino acid substitutions at positions N29D, I278V and P343S in the Gag sequences were consistently detected in all G4 South African isolates. These substitutions are absent in other G4 isolates from other countries as well as any isolates from other genotypes. Importantly, these amino acid substitutions which are unique to the G4 South African isolates appear to be fairly conservative and thus, they are unlikely to cause functional changes to the proteins. No insertions and deletions were detected in the South African sequences.
The genotype-specific amino acid substitutions in the Env sequences detected in this study include  [51,[73][74][75]. Homology modelling of the BLV Env protein has suggested that the conformational epitopes and the neutralising domains are located on the surface of the virus [14,73]. Therefore, it is not surprising that these epitopes and domains are targets of neutralising antibodies (NAbs) [16,18,19,24] and under strong selection pressure [73], leading to the antigenic variations in these regions [19].
There are three types of conformational epitopes (F, G and H) [17,76,77] on the Env gp51 surface protein which are important for cell fusion and syncytium formation during viral dissemination [16]. In our study, four point mutations in the G epitopes (positions 48, 73, 74 and 82) and one point mutation in the H epitope (position 56) were detected from the South African isolates. Since functional studies on the Env protein and Env mutants are scarce, functional consequences of the Env mutations found in our studies are unknown. However, non-conservative mutations in the conformational epitopes and the neutralising domain 2 (ND2) would affect the recognition of NAbs to these epitopes. A previous study which identified the S56F substitution in the H epitope showed that this substitution altered the recognition of a monoclonal antibody to this epitope owing to the drastic change from the small serine residue to the large hydrophobic phenylalanine residue [78]. This effect would also apply to the S82F substitution in the G epitope. Mutations in the conformational epitopes appear to benefit viral persistence through antigenic changes and immune escape. Previous studies demonstrated that BLV natural variants lacking one or two of the conformational epitopes were viable and appear to have evolved these mutations to evade host immune responses by altered antibody recognition [19,78]. However, BLV mutants lacking all conformational epitopes have not been detected and these deletion mutants retained at least one of the three conformational epitopes [19,78]. These observations indicate that the simultaneous loss of these epitopes would be deleterious and possibly cause loss of infectivity to the virus [16].
The BLV Env protein contains three neutralising domains (ND1 to ND3) [24] and these domains also play a role in viral infectivity (cell fusion and syncytium) as well as immunogenicity [15,24,79]. Previous studies demonstrated that the ND2 is particularly prone to multiple point mutations [41,46] but it was also shown that ND2 is under purifying selection [73,75], reflecting its functional significance. Rodríguez et al. [41] identified eleven point mutations within the ND2 from BLV strains from a variety of geographic regions. In contrast, the ND1 and ND3 appear to be less prone to mutations [41,80]. Using homology modelling, Moratorio et al. [79] demonstrated that the D134N substitution in the ND2 changed net charge in a loop of the Env protein, possibly disturbing the immunogenicity and/or fusogenic properties of the ND2. One G4 South African isolate (M1878) had the G1-like D134N substitution in this ND2. It is unclear whether this is a spontaneous mutation.
Eleven amino acid substitutions were found in the South African Gag sequences and the majority of the substitutions (six out of eleven substitutions) occurred in the p15 matrix (MA) protein (positions 26, 63, 69, 88, 107 and 108) whereas only two amino acid substitutions were found in the p24 capsid (CA) protein (positions 278 and 318). Since the CA protein is the second major target for the NAbs next to the gp51 surface protein [32], this protein could be under selection pressure. Yet, it is possible that mutations in the CA could render a fitness cost to the virus and cause deleterious effects to viral replication. A genetic footprinting analysis on the Moloney murine leukaemia virus demonstrated that the majority of the MA regions were tolerant of insertions whereas the N-terminal region of the CA was susceptible to insertions [81]. They further demonstrated that although Gag mutants harbouring mutations in the MA and N-terminal region of the CA all produced virions, CA mutants produced an abundance of immature virions [81]. Furthermore, whilst the MA mutants showed reverse transcriptase (RT) and nuclear transport activities and appeared infectious, the RT and nuclear transport activities were not detected in the CA mutants, suggesting the mutations affect uncoating or viral entry [81].
Three amino acid substitutions were detected in the p12 nucleocapsid (NC) protein (positions 341, 343 and 365). The retroviral NC protein contains conserved zinc-finger motifs [82][83][84][85] and basic residues [86][87][88] for RNA packaging and virion assembly [89,90]. The K341Q substitution in the NC protein was only detected from the South African P591 isolate, not from any other South African or global isolates. This mutation might be a spontaneous mutation. However, this amino acid change may cause functional changes in the virion assembly. Wang et al. [91] performed alanine-scanning mutagenesis study for the basic residues in the BLV NC and demonstrated that although the K341A substitution in the NC did not affect RNA packaging, the K341A mutant had two-to three-fold reduction in virion production. Although lysine can be substituted with other polar amino acids including glutamine, it is unknown whether the loss of charge may also affect virion production.
This study served as the first phylogenetic analysis of the BLV full-length env and gag sequences from eight South African isolates. Since the sample population was small, and was only collected from a single herd, and there was only one G1 isolate in our study, future studies will be required to obtain more samples from different herds and geographic regions to investigate whether other BLV genotypes are circulating in cattle in South Africa. Needless to say, if more BLV sequences from Africa become available, it would provide greater insight into BLV genotypes currently circulating in Africa. Lastly, the detection of two genotypes (G1 and G4) in southern Africa highlights the urgent need for disease management and the development of an efficacious vaccine against local strains.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/12/8/898/s1, Figure S1: Neighbour-joining phylogenetic tree based on the alignment of BLV full-length env nucleotide sequences (1548 bp) from South Africa and other geographic regions worldwide. Figure S2: Pairwise evolutionary distance of BLV full-length env nucleotide sequences (1548 bp) from the South African isolates and other G1 and G4 isolates from other geographic regions worldwide. Figure S3: Neighbour-joining phylogenetic tree based on the alignment of BLV partial env nucleotide sequences (444 bp) from South Africa and other geographic regions worldwide. Figure S4: Pairwise evolutionary distance of BLV partial env nucleotide sequences (444 bp) from the South African isolates and other G1 and G4 isolates from other geographic regions worldwide. Figure S5: Neighbour-joining phylogenetic tree based on the alignment of BLV full-length gag nucleotide sequences (1182 bp) from South Africa and other geographic regions worldwide. Figure S6: Pairwise evolutionary distance of BLV full-length gag nucleotide sequences (1182 bp) from the South African isolates and 29 isolates from other geographic regions worldwide. Figure S7: Pairwise percent identity of BLV full-length Env nucleotide (1548 bp) and amino acid sequences (515 amino acids) between the South African isolates and 46 isolates from other geographic regions worldwide. Figure S8: Pairwise comparison of BLV full-length Env nucleotide (1548 bp) and amino acid (515 amino acids) differences between the South Africa isolates and 46 isolates from other geographic regions worldwide. Figure S9: Pairwise percent identity of the BLV partial Env nucleotide (444 bp) and amino acid (148 amino acids) sequences between the South African isolates and G1 and G4 isolates from other geographic regions worldwide. Figure S10: Pairwise comparison of BLV partial Env nucleotide (444 bp) and amino acid (148 amino acids) differences between the South Africa isolates and G1 and G4 isolates from other geographic regions worldwide. Figure S11: Comparison of 148 amino acid region in the Env gp51 protein between the G1 and G4 South African and Zambian isolates. Figure S12: Pairwise percent identity of BLV full-length Gag nucleotide (1182 bp) and amino acid (393 amino acids) sequences between the South African isolates and 29 isolates from other geographic regions worldwide. Figure S13: Pairwise comparison of BLV full-length Gag nucleotide (1182 bp) and amino acid (393 amino acids) differences between the South Africa isolates and 29 isolates from other geographic regions worldwide. Table S1: GenBank accession numbers, sample IDs and genotypes of the BLV full-length env and gag nucleotide sequences of the South African isolates used in this study. Table S2: GenBank accession numbers, the country of origin and genotypes of the BLV full-length env, partial env and full length gag nucleotide sequences used in this study. Table S3: Pairwise comparison of BLV partial Env nucleotide and amino acid sequences between G1 South African and Zambian sequences. Table S4: Pairwise comparison of BLV partial Env nucleotide and amino acid sequences between G4 South African and Zambian sequences. Table S5: Mean nucleotide and amino acid distances in the BLV full-length Env sequences within (intragenotype) and between (intergenotype) BLV strains from South Africa and other geographic regions worldwide.