A Novel Subtype of Bovine Hepacivirus Identified in Ticks Reveals the Genetic Diversity and Evolution of Bovine Hepacivirus

Hepaciviruses represent a group of viruses that pose a significant threat to the health of humans and animals. New members of the genus Hepacivirus in the family Flaviviridae have recently been identified in a wide variety of host species worldwide. Similar to the Hepatitis C virus (HCV), bovine hepacivirus (BovHepV) is hepatotropic and causes acute or persistent infections in cattle. BovHepVs are distributed worldwide and classified into two genotypes with seven subtypes in genotype 1. In this study, three BovHepV strains were identified in the samples of ticks sucking blood on cattle in the Guangdong province of China, through unbiased high-throughput sequencing. Genetic analysis revealed the polyprotein-coding gene of these viral sequences herein shared 67.7–84.8% nt identity and 76.1–95.6% aa identity with other BovHepVs identified worldwide. As per the demarcation criteria adopted for the genotyping and subtyping of HCV, these three BovHepV strains belonged to a novel subtype within the genotype 1. Additionally, purifying selection was the dominant evolutionary pressure acting on the genomes of BovHepV, and genetic recombination was not common among BovHepVs. These results expand the knowledge about the genetic diversity and evolution of BovHepV distributed globally, and also indicate genetically divergent BovHepV strains were co-circulating in cattle populations in China.


Introduction
The genus Hepacivirus, belonging to the family Flaviviridae, comprises a genetically diverse group of human and animal pathogens. The genome of hepaciviruses is an unsegmented, single-stranded, and positive-sense RNA molecule about 10 kb in length, which contains a 5 untranslated region (UTR) and a 3 UTR, and a single long open reading frame (ORF) encoding a single polyprotein. This polyprotein is further cleaved by cellular and viral proteases into three structural proteins (Core, E1, and E2) and seven nonstructural proteins (p7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B) [1,2]. The Hepatitis C virus (HCV), the prototypical member of the genus Hepacivirus, is one of the leading causes of acute and chronic hepatitis, liver failure, and hepatocellular carcinoma in humans, and the infection rate of HCV worldwide is approximately 3%, with an estimated 58 million people suffering from chronic HCV infection (https://www.who.int/en/news-room/fact-sheets/detail/ hepatitis-c, accessed on 27 July 2021). HCV exhibits a restricted host range pattern, and humans are the only natural host, although experimental infection of HCV in chimpanzees is possible [3]. Since 2011, some HCV-like viruses have been identified from a wide variety of mammalian hosts, including dogs [4], horses [5], monkeys [6], bats [7], rodents [8,9], cattle [10,11], and donkeys [12]. Some HCV-like viruses have been assigned to additional Hepacivirus species, and designated as Hepacivirus A-N based on their phylogenetic relationships and host range [13]. HCV-like viruses have been described in non-mammalian

Tick Samples Collection
From June to July in 2020, 300 blood-sucking adult ticks were collected from cattle in Zhanjiang, Guangdong, Southern China. The tick species were identified following morphological criteria and further confirmed by sequencing and analyzing the 16S ribosomal RNA (rrs) gene of ticks [33]. Ticks were pooled, ten ticks per pool, and stored at −80 • C for further use.

RNA Extraction and Meta-Transcriptome Sequencing
Each pooled tick sample was soaked in 70% ethanol for 30 min, and washed with double distilled water three times. The samples were homogenized in 500 µL sterile phosphatebuffered saline (PBS), and their total RNA was extracted from 200 µL homogenates using the TRIzol LS reagent (Invitrogen, Carlsbad, CA, USA) and subsequently purified using the RNeasy Plus Mini Kit (Qiagen, Hilden, Germany). The quantity and quality of extracted RNA was evaluated with a NanoDrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA). The extracted RNA was aliquoted and stored at −80 • C. One aliquot of each extracted RNA was then merged as two pools in equal quantity, and the quality of the pooled RNA was evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) before library construction and sequencing.
For library preparation, ribosomal RNA (rRNA) in the pooled RNA was removed using a Ribo-Zero-Gold (Epidemiology) kit (Illumina Inc., San Diego, CA, USA) following the manufacturer's instructions, and the remaining RNA was fragmented, reverse-transcribed, adaptored, purified, and examined by the Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System. Paired-end (150-bp) sequencing was performed on the Illumina Hiseq2500 platform. All library preparation and sequencing were performed by Novogene (Tianjin, China).

Bioinformatics Analyses and Genome Sequence Determination
Sequencing reads were adaptor-and quality-trimmed using the FASTP program [34] before de novo assembly using the Megahit program [35], with default parameter settings. The assembled contigs were compared against the database comprising all reference virus proteins using the diamond BLASTX program [36] with an e-value threshold of 1e-4, and the putative viral contigs were further compared to the non-redundant nucleotide and protein database to eliminate false positives. The confirmed viral contigs with unassembled overlaps or from the same scaffold were merged using the SeqMan program implemented in the Lasergene software package (version 7.1, DNAstar, Madison, WI, USA). To verify the assembly results, reads were mapped back to the target contigs with Bowtie2 [37], and inspected using integrated genomics viewer (IGV) [38] for any assembly errors. Gaps between these contigs were filled by RT-PCR and Sanger sequencing. The genome terminal of virus was determined by using 5 /3 RACE kits (TaKaRa, Dalian, China) as described previously [39]. The complete viral genome was confirmed by Sanger sequencing with overlapping primers that covered the entire genome, which were designed based on the assembled sequences.

Sequence Comparison and Phylogenetic Analyses
The prediction of potential open reading frames (ORFs) was performed by ORFfinder (https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/ORFfinder/linux-i64/, accessed on 4 May 2021), and annotated based on comparisons against the non-redundant protein database. The cleavage sites for the viral polyprotein processing were extrapolated by manually comparing the polyprotein sequence with previously described BovHepV B1 strain identified in Germany [10]. N-glycosylation sites were predicted using NetNGlyc 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc, accessed on 18 March 2012). The sequences of viruses were retrieved from the GenBank database, and sequences identities of nucleotide (nt) and amino acid (aa) were calculated by MegAlign program available within the Lasergene software package (version 7.1, DNAstar).
The phylogenetic relationships were estimated using the maximum-likelihood method (ML) implemented in PhyML version 3.0 [40], employing the general time reversible (GTR) nucleotide substitution model with a gamma (Γ)-distribution model of among-site rate variation and a proportion of invariable sites (i.e., GTR + Γ + I) and a subtree pruning and regrafting (SPR) branch-swapping algorithm. The support values on the phylogenetic trees were calculated from 100 bootstrap replicate.

Recombination Analyses
To find potential recombination events involved in the evolutionary history of BovHepV, we used the RDP, GENECONV, bootscan, maximum chi-square, Chimera, SISCAN, and distance plot methods available within RDP4 program [41]. The analyses were performed with default settings for the different test methods and a Bonferroni corrected p value cutoff of 0.05, and only sequences with significant evidence (p < 0.05) of recombination, namely, (i) detected by at least two methods and (ii) confirmed by phylogenetic analysis, were considered to represent strong evidence for recombination. Additionally, sequence alignment was also analyzed using similarity plot and bootscan analysis methods as implemented in Simplot version 3.5.1 [42].

Selection Pressures Analyses
The numbers of synonymous nt substitutions per synonymous site (dS) and the numbers of nonsynonymous substitutions per nonsynonymous site (dN) for each coding region of hepaciviruses identified in bovine, humans, and equine were calculated using the Nei-Gojobori model [43] implemented in MEGA version 7.0 [44]. To assess the selection pressure involved in the evolution of the hepacivirus genome, the site-specific selection pressures were estimated across the sequence of the entire polyprotein-coding gene using single-likelihood ancestor counting (SLAC) method as implemented in the Datamonkey web server (http://www.datamonkey.org, accessed 29 July 2010), and plotting the difference between dN and dS rates (dN − dS) for each codon. Additionally, the codon-specific analyses of the entire polyprotein-coding region were also assessed using three methods performed in the Datamonkey web server: fixed effects likelihood (FEL), fast unconstrained Bayesian approximation (FUBAR), and mixed effects model of evolution (MEME). Only codons with significance of p < 0.05 or a posterior probability > 0.95 identified by at least three methods were considered to be subject to positive selection [45].

Identification of Three BovHepV in Ticks
A total of 300 adult ticks, which were identified as Rhipicephalus microplus, were collected from cattle in Zhanjiang city, Guangdong province, from June to July in 2020. After default quality control (QC) and de-barcoding steps, a total of 10,627,704 and 7,924,932 paired-end clean reads were generated in the libraries of GDZJ-02 and GDZJ-0103, respectively. Through de novo assembly and compared against nr database using diamond BLASTX program, 10 contigs with 388 to 1112 nt in length were annotated as sequences of BovHepV, with 92.5% to 98.2% amino acid sequence identity.
We then identified BovHepV from the 30 original RNA samples (each was pooled from 10 ticks) using RT-PCR. The complete genomic sequences of the BovHepV in these three samples were determined using RT-PCR and the RACE method, and were designated as BovHepV/GDZJ02, BovHepV/GDZJ02-2, and BovHepV/GDZJ02-3. These sequences have been deposited in GenBank under accession numbers MZ221927, MZ540979, and MZ540980, respectively. Using the complete genome sequence of BovHepV/GDZJ02 as the reference sequence, 224 reads were remapped to this reference sequence and provided 92.1% genome coverage (8115 nt/8808 nt) with 99.7% pairwise identity at a mean depth of 3.8×, and the percentage virus reads of the total number of nonribosomal reads is 0.0038% (224 reads/5,883,446 reads) ( Figure 1).

Genomic Features of the Newly Identified BovHepV
The genomes of the above three BovHepV strains consisted of 8808 nt, with the same G + C contents of 52.7%, which was similar to those of other BovHepV strains (51.5-53.8%). These three genomes contain a large ORF with 8340 nt in size, flanked by 5 UTR (263 nt) and 3 UTR (205 nt). The large ORF encoded a predicted polyprotein of 2779 aa, which was further cleaved into 10 typical hepacivirus proteins in the order of Core-E1-E2-p7-NS2-NS3-NS4A-NS4B-NS5A-NS5B (Figure 2A). The putative cleavage sites specific for the processing of polyprotein were shown to be well conserved among the BovHepV strains (Table 1). Similar to HCVs and other hepaciviruses, two and seven N-glycosylation sites were also predicted in the pupative E1 and E2 proteins, respectively (Figure 2A).    Table 1.

Sequences Comparison of BovHepV
The complete polyprotein-coding gene of these three virus strains identified herein are highly homogenous to each other, with 99.6−99.8% nt identity and 99.6−99.8% aa identity ( Table 2). In addition, they shared 67.7−84.8% nt identity and 76.1-95.6% aa identity to other BovHepV strains identified worldwide, while sharing the highest identities (84.4−84.8% nt identity and 94.8−95.6% aa identity) with the strains identified from Brazil (accession numbers MG781019 and MG781018). Meanwhile, these three newly identified strains shared 67.7−81.7% nt identity and 76.1−94.6% aa identity with other strains identified in China (Table 2). According to the cut-off values for HCV subtyping that the nucleotide sequence identity < 85% [46], all known BovHepV strains globally can be classified into two genotypes, and genotype 1 has strains identified worldwide, while genotype 2 was only identified in China and Brazil. BovHepV in different genotypes showed 75.9−76.8% aa identity and BovHepV strains in the same genotypes showed 91.1−100.0% aa identity ( Table 2).

Phylogenetic Relationships of BovHepV
Phylogenetic relationships inferred based on the sequences of the complete polyprotein, NS3 and NS5 coding regions all suggested that BovHepV strains can be classified into two genotypes, and genotype 1 strains were clearly segregated into eight well-separated subtypes (A−H) including the novel one (subtype H) corresponding to the three BovHepV strains identified in this study, which was supported by >90% bootstrap value at the key nodes ( Figure 3). The three viruses identified in this study presented the closest relationship with that identified in Brazil (subtype D), but divergent from other sequences identified in China, suggesting the presence of multiple subtypes in China. Similar patterns were observed for Germany, Ghana, and Brazil ( Figure 3A-C). Remarkably, all sequences of BovHepV strains identified in China were divided into two genotypes and three subtypes in genotype 1 also indicate the high genetic diversity of BovHepV strains circulating in China.

Recombination and Mutation Analysis of BovHepV
No statistically supported recombination event was detected within BovHepV strains after systematic analyses. All subtypes of genotype 1 group clearly showed the high degree of similarity with each other ( Figure 2B). Among the genotype 1 strains, the sequence of subtype H exhibited more than 90% amino acid sequence identity within the nearly complete polyprotein sequences of subtypes A−G, except for a small decrease in the partial NS5A protein of BovHepV subtype F. In addition, the sequences of subtype H showed high amino acid sequence divergence with genotype 2 strains in core, E1, E2, p7, partial NS2 and NS5B proteins, that shared < 90% amino acid sequence identity.
No frameshift mutations were identified in the genomes of HCV, BovHepV, and equine hepaciviruses (EqHV). In contrast, nucleotide substitution and insertion or deletion likely dominate the evolution of these hepaciviruses ( Figure S1).

Selection Pressures on the Hepacivirus Genome
The dN and dS values of each site in the genome sequences of HCV, BovHepV, and EqHepV strains were calculated. Negatively selected sites (dN − dS < 0) were observed predominantly across the whole polyprotein-coding region of hepaciviruses (Figure 4), and the mean dN − dS of hepaciviruses identified in bovine, human, and equine were −1.55, −0.90, and −1.98, respectively. Meanwhile, all dN/dS ratio estimated in the coding sequences of ten proteins of hepaciviruses were lower than 1 (Table 3). Additionally, 1, 1, and 3 putatively positive selection sites in the polyprotein coding region of bovine, human, and equine hepaciviruses were predicted by three methods (FEL, FUBAR, and MEME) ( Table 4).    The GenBank accession number of viruses used in this analysis were shown in Table S1.

Discussion
As a worldwide distributed human pathogen, HCV has posed a great threat to human health that causes liver failure, hepatitis, and hepatocellular carcinoma. Likewise, BovHepV causes acute or persistent infections in cattle [10,23]. BovHepV infections in cattle have been identified in five continents, and two genotypes and several subtypes of BoVHepV have been described (Figure 3), suggesting the ubiquitous presence of this highly variable virus. In this study, the complete genomes of three novel BovHepV strains were determined and parsed. Sequences comparison and phylogenetic analysis suggested that these three BovHepV strains constituted a novel subtype group of BovHepV, and hence expanded the known diversity of BovHepV.
Previous studies based on the limited BovHepV sequences provided evidence that subtypes of BovHepV are associated with their geographic origins [10,11,25,26]. However, this study demonstrated that two or more genotypes or subtypes co-circulated in countries such as China, Germany, Ghana, and Brazil (Figure 3), suggesting a complex geographic distribution of BovHepV genotypes and subtypes worldwide. For example, this study suggested that at least four subtypes (three in genotype 1 and one in genotype 2) cocirculate in China. This could have resulted from frequent international trade of live cattle, which can facilitate transboundary transmission of BovHepV. In addition, BovHepV has only been detected in cattle from limited areas of China, i.e., Guangdong, Jiangsu, Yunnan, and Sichuan provinces [28][29][30][31]. Epidemiological surveys in broader areas may better reveal the genetic diversity and epidemiological characteristics of BovHepV circulating in China.
In the present study, both dN/dS ratio analyses in the coding sequences of individual proteins, and the site-specific selection pressures analyses across the entire polyproteincoding sequence confirmed the predominance of purifying selection in the genomic evolution of bovine hepaciviruses, which was consistent with previous studies conducted in other hepaciviruses [12,47,48], collectively suggesting that purifying selection is the dominant evolutionary pressure acting on the hepaciviruses genome. Interestingly, 1, 1, and 3 putative positively selected codons in the polyprotein coding region of bovine, human, and equine hepaciviruses were simultaneously predicted by FEL, FUBAR, and MEME methods. However, the functional significance of these sites putatively under positive selection should be experimentally assessed.
In this study, BovHepV strains were detected in blood-sucking ticks collected in cattle. However, as per the epidemiology of hepaciviruses, ticks are more likely to be the mechanical carrier rather than the arthropod vector for this virus, although ticks serve as the hosts of a variety of other viruses [13]. The low abundance of BovHepV in ticks in this study, and the similar situation of hepacivirus in Australian ticks [20], suggest that hepaciviruses identified in ticks were more likely to derive from the tick's vertebrate host and were present in the blood contained in the engorged tick rather than being from the tick itself, which need further verification.
In conclusion, a new BovHepV subtype was identified in ticks in Guangdong, China through unbiased RNA sequencing, which expands the knowledge about the genetic diversity and evolution of BovHepV, and shows extensive genetically divergent BovHepV strains in China.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/v13112206/s1, Figure S1: Multiple sequence alignment showing the example of variation in genomics of hepaciviruses in bovine, humans, and equine, Table S1: Information about the sequences used in this study.