Phylogenetic characterisation of Crimean-Congo haemorrhagic fever virus detected in African blue ticks feeding on cattle in a Ugandan abattoir.

Crimean-Congo haemorrhagic fever virus (CCHFV) is the most geographically widespread tick-borne virus. However, African strains are poorly represented in sequence databases. In addition, almost all CCHFV sequence data have been obtained from cases of human disease, while information regarding circulation of the virus in tick and animal reservoirs is severely lacking. Here, we characterise the complete coding region of a novel CCHFV strain detected in African blue ticks (Rhipicephalus (Boophilus) decoloratus) feeding on cattle in an abattoir in Kampala, Uganda. These cattle originated from a farm in Mbarara, a major cattle-trading hub for much of Uganda. Phylogenetic analysis indicates that the newly sequenced strain belongs to the African genotype II clade, which predominantly contains the sequences of strains isolated in West Africa in the 1950s and South Africa in the 1980s. Whilst the viral S (nucleoprotein) and L (RNA polymerase) genome segments shared >90% nucleotide similarity with previously reported genotype II strains, the glycoprotein-coding M segment shared only 80% nucleotide similarity with the next most closely related strains, which were from India and China. This segment also displayed a large number of non-synonymous mutations previously unreported in genotype II strains. Characterisation of this novel strain adds to our limited understanding of the natural diversity of CCHFV circulating in ticks and in Africa. Such data can inform the design of vaccines and diagnostics, as well as studies of the epidemiology and evolution of the virus, to support the establishment of future CCHFV control strategies.


Introduction
Crimean-Congo haemorrhagic fever virus (CCHFV) is a zoonotic pathogen primarily transmitted between vertebrate hosts through the bite of infected ticks, as well as through direct contact with infected livestock, ticks and bodily fluids [1]. CCHFV is the most geographically widespread of the tick-borne viruses; endemic throughout Africa, Asia, Eastern Europe and the Middle East, it circulates in an enzootic cycle between animals and ticks, sporadically emerging to cause severe outbreaks of human disease [2]. Outbreak fatality rates of up to 30% have been reported, although CCHF symptoms can range in severity from mild febrile illness to haemorrhagic fever and multi-organ failure [2].
CCHFV is classified within the genus Orthonairovirus (family Nairoviridae) and carries a tripartite, negative-sense RNA genome. The three genome segments are labelled small (S), medium (M) and large (L) according to their respective lengths [3]. The ~1.7 kb S segment encodes the viral nucleoprotein (NP) and, in the reverse orientation, an overlapping non-structural protein (NSs). The ~5.3 kb M segment encodes a single polyprotein precursor, which is post-translationally cleaved into two envelope glycoproteins (Gn and Gc), several secreted glycoproteins (GP38, GP85 and GP160) and the non-structural M protein, NSm. The pre-Gn precursor also contains a hypervariable region known as the mucin-like domain (MLD). The ~12.1 kb L segment encodes the RNA-dependent RNA polymerase.
Strains of CCHFV display a high degree of genetic variability for an arthropod-borne virus, with nucleotide diversity as high as 20% for the S segment and 31% for the M segment [4]. Genotypes are assigned to each genome segment independently based on phylogenetic analysis, and each genotype is largely associated with a distinct geographical region: three are African (genotypes I, II and III), two Asian (genotypes IVa and IVb) and two European (genotypes V and VI). However, limited sampling and uneven representation of CCHFV sequences have hindered phylogeographic and evolutionary analyses [2,5]. This is particularly true for African strains, which are poorly represented in sequence databases compared with strains from Europe, Asia and the Middle East. Furthermore, most CCHFV strains reported to date have been derived from cases of severe human disease, and insufficient sampling from tick vectors and reservoir hosts is almost certainly limiting our knowledge of the natural diversity of CCHFV and of the potential for spillover of unanticipated viral variants.
The recent detection of an African genotype III strain following an outbreak in Spain [6], and the presence of African genotype II M segments in viruses obtained from pools of ticks in India and China [7,8], have highlighted the need for a more complete genomic dataset to better understand the epidemiology of the virus, including viral transmission routes and the extent of re-assortment among genotypes. The potential for such re-assortment is important because many molecular tests for CCHFV can detect only a restricted number of closely related genotypes [9]. It also remains to be seen whether recently reported vaccine candidates, including one based on the S and M segments of a Hoti strain (genotype V) [10] and another based on the M segment of the IbAr10200 strain (genotype III) [11], can elicit a protective immune response against viruses belonging to more distantly related genotypes.

Materials and Methods
Ethical approval for this study (NS 673) was granted by the Uganda National Council for Science and Technology Review Board on 31/07/2019. Seventy ticks were collected from cattle, goats and a pig in an abattoir in Kampala, Uganda, between September and December 2019 (Table S1) as part of a study into the virome of ticks in Uganda. Ticks were frozen in liquid nitrogen immediately after collection and stored at -80 °C until processing. Individual ticks were morphologically identified using taxonomic keys [12] before being crushed with a sterile micropestle and resuspended in 250 µL of phosphate-buffered saline (PBS). Solid debris was pelleted by low-speed centrifugation (400 × g), and total nucleic acid (TNA) was extracted from 200 µL of the supernatant using the High Pure viral nucleic acid extraction kit (Roche) following the manufacturer's instructions. The extracted nucleic acid was then lyophilized and shipped at ambient temperature to the University of Cambridge, UK, for further analysis.
Upon arrival in the UK, TNA was reconstituted in 20 µL of RNase-free water (RFW) supplemented with 2 U of RNaseOUT (Thermo Fisher Scientific). Two µL of reconstituted TNA was reverse transcribed in a 20 µL reaction using Superscript III reverse transcriptase (Thermo Fisher Scientific) primed with 50 ng of random hexamers (Thermo Fisher Scientific), as per the enzyme manufacturer's instructions. Two µL of each reverse transcription reaction was used as direct input to screen for the presence of Nairovirus RNA by (RT-)PCR. PCR was performed in 20 µL reactions using Phusion DNA polymerase (New England Biolabs) and 0.5 µM pan-nairovirus primers targeting the S segment [13], set up according to the enzyme manufacturer's instructions. PCR cycling conditions were as follows: incubation at 98 °C for 2 min, followed by 45 cycles of 98 °C for 5 s, 60 °C for 20 s and 72 °C for 20 s, and a final extension at 72 °C for 5 min before holding at 4 °C. The resulting PCR products were visualised by gel electrophoresis, and amplicons of the expected size (~400 bp) were individually purified and confirmed to be CCHFV by Sanger dideoxy sequencing.
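As an illustration only, the cycling programme above can be written down as data, which makes it easy to check the total programmed hold time. This is a minimal sketch: the `Step` class and the helper name are hypothetical, and ramp rates and the open-ended 4 °C hold are ignored, so real run times on a given thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature in °C
    seconds: int   # hold time in seconds


# Cycling conditions as described above. The final 4 °C hold is open-ended,
# so it is deliberately left out of the timed programme.
INITIAL = [Step(98, 120)]                          # 98 °C, 2 min
CYCLE = [Step(98, 5), Step(60, 20), Step(72, 20)]  # denature, anneal, extend
N_CYCLES = 45
FINAL = [Step(72, 300)]                            # final extension, 5 min


def hold_time_seconds(initial, cycle, n_cycles, final) -> int:
    """Sum of programmed hold times only (ramp times are not modelled)."""
    per_cycle = sum(s.seconds for s in cycle)
    return (sum(s.seconds for s in initial)
            + n_cycles * per_cycle
            + sum(s.seconds for s in final))


total = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(total, total / 60)
```

With these values the programmed holds sum to 2,445 s (about 41 min); the actual instrument time depends on its ramp rates.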
Five RT-PCR-positive ticks from Mbarara were selected at random for further characterisation by Illumina metagenomic sequencing. Five µL of TNA from each tick was pooled (total volume 25 µL) and treated with 2 U of TURBO DNase (Ambion) for 20 min at 37 °C. DNase-treated RNA was purified using RNA Clean & Concentrator-5 spin columns (Zymo Research), eluted in 15 µL of RFW and quantified using the Qubit High Sensitivity RNA assay (Invitrogen). An Illumina sequencing library was prepared from 200 ng of purified RNA using the Zymo-Seq RiboFree Total RNA Library kit (Zymo Research), which incorporates a ribosomal RNA depletion step. The resulting library was sequenced on the Illumina NextSeq 500 platform using the 300-cycle mid-output kit (v2.5). The resulting paired-end reads were imported into CLC Genomics Workbench v7.5.1 (Qiagen), and sequences sharing 95% sequence identity over 35% of their length were assembled de novo. Contigs corresponding to the CCHFV S, M and L segments were identified by BLASTn searches against the redundant nucleotide sequence database with default settings [14].
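The assembly thresholds above (95% identity over 35% of a sequence's length) can be read as two separate checks on each alignment. The sketch below assumes they behave like CLC's "similarity fraction" and "length fraction" parameters; that mapping, the helper name `passes_thresholds` and the exact denominators are assumptions for illustration, not CLC's implementation. The 150-nt read length in the usage example assumes 2 × 150 paired-end reads from the 300-cycle kit.

```python
def passes_thresholds(read_length: int, aligned_length: int, identical_bases: int,
                      min_similarity: float = 0.95,
                      min_length_fraction: float = 0.35) -> bool:
    """Hypothetical check that one alignment meets both thresholds.

    length fraction = aligned_length / read_length
    similarity      = identical_bases / aligned_length
    """
    if read_length <= 0 or aligned_length <= 0:
        return False
    length_fraction = aligned_length / read_length
    similarity = identical_bases / aligned_length
    return length_fraction >= min_length_fraction and similarity >= min_similarity


# A 150-nt read aligned over 60 nt with 58 identities passes both checks;
# aligning over only 50 nt fails the length-fraction check (50/150 < 0.35).
print(passes_thresholds(150, 60, 58), passes_thresholds(150, 50, 50))
```

Keeping the two checks separate makes it clear that a short, perfectly matching overlap can still be rejected, while a long overlap must also be highly similar.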
The CCHFV genome segments (S, M and L) were independently characterised through phylogenetic analysis. MAFFT v7.427 [15] was used to align the open reading frame (ORF) of each genome segment against a representative set of CCHFV sequences from GenBank, including members of each genotype (African, Asian and European) and all available sequences belonging to the genotype to which the Mbarara strain was found to belong (genotype II). For the S and L segments, complete ORF sequences were analysed. For the M segment, an 857 bp region at the 5' end, corresponding to the MLD, was excluded from phylogenetic analysis because its extreme variability produced sub-optimal alignments. Including this region did not significantly alter phylogenetic clustering; however, it adversely affected bootstrap values. After trimming of the M segment alignment, the final datasets were 1,449, 4,323 and 11,838 nt long and included 50, 56 and 46 sequences for the S, M and L segments, respectively.
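Excluding a region such as the MLD has to be done in alignment columns rather than raw ORF coordinates, because gaps shift positions between sequences. A minimal sketch, assuming one reference sequence in the alignment defines the coordinates and gaps are written as `-`; the function names and toy sequences are hypothetical, and the exact MLD coordinates used in the study are not given here.

```python
def reference_columns(aligned_ref: str) -> list[int]:
    """Alignment column index of each ungapped reference position (0-based)."""
    return [col for col, base in enumerate(aligned_ref) if base != "-"]


def trim_reference_region(alignment: dict[str, str], ref_id: str,
                          start: int, end: int) -> dict[str, str]:
    """Remove the alignment columns spanning reference positions [start, end).

    Insertion columns (gaps in the reference) that fall inside the region are
    removed as well, so every sequence loses exactly the same block of columns.
    """
    cols = reference_columns(alignment[ref_id])
    first, last = cols[start], cols[end - 1]
    return {name: seq[:first] + seq[last + 1:] for name, seq in alignment.items()}


# Toy alignment: strainX carries an insertion (column 3) relative to the reference.
toy = {"ref": "ATG-CCGTA", "strainX": "ATGGCC-TA"}
print(trim_reference_region(toy, "ref", 0, 4))
# {'ref': 'CGTA', 'strainX': 'C-TA'}
```

Removing whole columns, rather than cutting each sequence at its own position, keeps the remaining alignment in register for downstream tree inference.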
Maximum likelihood phylogenies were inferred using IQ-TREE v1.6.11 [16] with 1,000 ultrafast bootstrap replicates [17]. Optimal substitution models for each alignment were selected using the ModelFinder algorithm [18] according to the Bayesian information criterion (GTR+F+I+G4 for the S and L segments and GTR+F+R3 for the M segment). Nucleotide identity matrices were generated for each segment using Clustal Omega [19], and intra-genotype nucleotide sequence similarity plots were generated using the EMBOSS v6.6.0 [20] Plotcon function with a window size of 4. Midpoint-rooted maximum likelihood phylogenies and sequence similarity plots were visualised in R v3.6.3 [21] using the ggplot2 [22] and ggtree [23] packages. Genotype II M segment sequences were also analysed for evidence of recombination using the RDP5 tool [24] and for evidence of episodic diversifying selection using the adaptive branch-site random effects likelihood (aBSREL) method [25]. Nucleotide alignments of genotype II sequences were translated into predicted amino acid alignments in SeaView v4.7 [26], and polymorphic and informative sites were identified using the ape v5.0 package [27] in R v3.6.3.
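Plotcon scores similarity with a substitution matrix over a sliding window. The sketch below is a simplified, identity-only analogue and not Plotcon's algorithm: each alignment column is scored as the fraction of sequence pairs sharing the same non-gap base, and the scores are averaged over a window of 4 columns to match the window size stated above. The function names and toy alignment are hypothetical.

```python
from itertools import combinations


def column_identity(column: str) -> float:
    """Fraction of sequence pairs that share the same non-gap base in one column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b and a != "-")
    return same / len(pairs)


def windowed_similarity(seqs: list[str], window: int = 4) -> list[float]:
    """Mean column identity in each sliding window (step 1) along an alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    scores = [column_identity("".join(s[i] for s in seqs)) for i in range(length)]
    return [sum(scores[i:i + window]) / window
            for i in range(length - window + 1)]


toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(windowed_similarity(toy, window=4))
```

Plotted against window position, these values give a profile in which variable regions (such as the MLD-coding region) appear as troughs.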

Results
A total of 70 ticks were collected from cattle (n = 48), goats (n = 21) and a pig (n = 1) and morphologically identified as Rhipicephalus appendiculatus (n = 16), Rhipicephalus (Boophilus) decoloratus (n = 40) and Amblyomma variegatum (n = 14) (Table S1). Nairovirus RNA was detected by RT-PCR in 8 of the 70 ticks (11.4%). All 8 positive ticks were identified as the African blue tick Rhipicephalus (Boophilus) decoloratus, and all were picked from cattle. Based on cattle movement permits, seven of the CCHFV-positive ticks were sampled from cattle originating from a farm in Mbarara, western Uganda, and one from a farm in Nakasongola, central Uganda. Sanger dideoxy sequencing of the RT-PCR products confirmed the presence of CCHFV and showed that all eight ~400 bp S segment fragments were identical. Phylogenetic analyses revealed that all 3 segments of the Mbarara CCHFV strain clustered within the African genotype II (Africa 2) clade, together with strains from the Democratic Republic of Congo, Uganda and South Africa (Figures 1–3). The S and L genome segments were highly similar to previously reported genotype II strains, sharing 91.79–94.34% nucleotide identity (98.76–99.38% amino acid identity) and 91.57–94.12% nucleotide identity (97.49–98.56% amino acid identity), respectively (Figure S1). Nucleotide sequence similarity within the genotype II clade was lowest (~40%) in the 5' region that encompasses the signal peptide and mucin-like domain (Figure 4). Regions coding for the secreted glycoprotein GP38 and the surface glycoproteins Gn and Gc also displayed some intra-genotype nucleotide variability (80–85% similarity), whilst nucleotide conservation was highest in the non-structural coding region. Analysis of the M segment for recombination and episodic diversifying selection revealed no evidence of either (p > 0.05).
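Pairwise percent identity values such as those reported above depend on how gaps are treated. As an assumption for illustration, the sketch below counts identity only over columns where neither sequence has a gap; Clustal Omega's own convention may differ, and the helper names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of sequences, keyed by sorted name pair."""
    names = sorted(alignment)
    return {(p, q): percent_identity(alignment[p], alignment[q])
            for p, q in combinations(names, 2)}


toy = {"a": "ACGT", "b": "ACGA", "c": "ACGT"}
for pair, pid in identity_matrix(toy).items():
    print(pair, f"{pid:.2f}%")
```

Counting gap columns as mismatches instead would lower the identity of length-variable regions, so the convention should be reported alongside the values.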
Alignment of genotype II predicted amino acid sequences revealed 164 residues unique to the Mbarara strain (singletons) (Table 1). These novel residues were located throughout all 3 genome segments, at a frequency ranging from average to above average for the genotype. The majority of singletons (148/164) were in the M segment, where the Mbarara strain displayed the greatest number of unique residues in every gene, including the MLD, the non-structural protein NSm and the glycoproteins GP38, Gn and Gc.
Table 1 notes: * Sites with at least two alleles. † Sites with at least two alleles, where each of those alleles is found in at least two sequences. ‡ Sites displaying amino acid residues unique within the genotype II clade. § Genotype II strains, excluding the Mbarara strain.
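The site categories defined in the Table 1 notes can be computed directly from an amino acid alignment. A minimal sketch with hypothetical names and a toy alignment: treating gaps (`-`) and ambiguous residues (`X`) as missing data is an assumption, and the study itself used the ape package in R rather than this code.

```python
from collections import Counter


def classify_sites(alignment: dict[str, str],
                   focal: str) -> tuple[list[int], list[int], list[int]]:
    """Return (polymorphic, informative, focal_singleton) column indices.

    polymorphic: at least two alleles at the site
    informative: at least two alleles, each found in at least two sequences
    singleton:   the focal sequence's residue occurs in no other sequence
    """
    polymorphic, informative, singletons = [], [], []
    for i in range(len(alignment[focal])):
        residues = [seq[i] for seq in alignment.values() if seq[i] not in "-X"]
        counts = Counter(residues)
        if len(counts) >= 2:
            polymorphic.append(i)
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative.append(i)
        r = alignment[focal][i]
        if r not in "-X" and counts[r] == 1 and len(residues) > 1:
            singletons.append(i)
    return polymorphic, informative, singletons


example = {"Mbarara": "MKLA", "s1": "MRTA", "s2": "MRTS", "s3": "MKTS"}
print(classify_sites(example, "Mbarara"))
# ([1, 2, 3], [1, 3], [2])
```

Column 2 is polymorphic but not informative, because the variant residue L occurs only once; that same column is a singleton for the focal strain.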

Discussion
Here we describe the detection and characterisation of a genotype II strain of CCHFV present in African blue ticks feeding on cattle from a farm in Mbarara, Uganda. To our knowledge, CCHFV has only rarely been described in this tick species (Rhipicephalus (Boophilus) decoloratus): twice in Senegal (1972 and 1975) and twice in Uganda (1978 and [1, 28]. It was not possible to determine whether the virus was derived from the tick itself or from the host blood meal. However, the tick appears to be the more likely source, as a limited number of experimental studies have found CCHFV viraemia to be transient and short-lived in cattle [29]. As a single-host tick species, Rhipicephalus (Boophilus) decoloratus is unlikely to play a major role in zoonotic transmission of the virus. However, these ticks may act as a reservoir for the virus, maintaining infection of the animal host and facilitating the infection of major vector species (e.g. Hyalomma spp.) through co-feeding [30]. Additionally, a number of two- and three-host Rhipicephalus species have also been shown to support the presence of CCHFV [30]. Therefore, given that Rhipicephalus ticks are common throughout Africa, further investigation of their role in CCHFV epizootiology and epidemiology is clearly warranted. Previous reports of high seroprevalence in livestock and animal handlers throughout Africa (e.g. Bukbuk et al. 2016 [31]) suggest that CCHFV-infected ticks and/or animals can often be found in abattoirs, highlighting the potential for focusing surveillance measures around these facilities. Notably, the majority of CCHFV-infected ticks sampled from the abattoir were picked from cattle originating from Mbarara. This district, situated in the cattle corridor of western Uganda, is a major source of beef and dairy products for the country. Indeed, cattle herds in Nakasongola have been largely sourced from Mbarara following their depletion during rebel occupation.
This may account for the surprisingly high degree of similarity between the Mbarara CCHFV strain and the strain detected in a tick picked from a cow from Nakasongola. Further sampling of CCHFV strains from other regions of Uganda is required to examine transmission dynamics within the country. Alternative explanations for this similarity include host-switching by ticks upon arrival at the abattoir, or an origin of the CCHFV-infected tick population within the abattoir itself.
Two of the Mbarara strain genome segments, the nucleoprotein-coding (S) and RNA polymerase-coding (L) segments, showed a high degree of nucleotide conservation, sharing >90% similarity with other African genotype II sequences, despite these having been derived from strains isolated several decades ago. A similar observation has previously been made for viruses from Pakistan and Nigeria [31], suggesting that these genome segments tend to remain genetically stable over time. In contrast, sequence diversity was greatest in the surface glycoprotein-coding (M) segment, which shared only 78% nucleotide similarity with strains previously reported from Africa and elsewhere. Interestingly, the most closely related M segment sequences belonged to several strains from Asia (e.g. MCL-19-T-1929 (India, 2019) and YL04057 (China, 2004)), which also clustered within the African genotype II clade, although nucleotide identity was still low (80%). The S and L segments of these Asian strains belonged to the Asian genotype IV, indicating that the M segment had likely been obtained through re-assortment with an African virus, a process made possible by intercontinental movement of CCHFV-infected ticks on livestock and migratory birds [32]. It is unclear whether the relatively low identity of the Mbarara strain M segment to previously reported genotype II strains reflects membership of a novel lineage or simply several decades of evolution. Distinguishing between these possibilities is made difficult by the paucity of closely related sequences. However, the stark dissimilarity to previously reported genotype II M segment sequences, in contrast to the S and L segments, suggests recombination or re-assortment with an as yet unidentified viral lineage. Unfortunately, many methods designed to identify recombination and re-assortment rely on including the parent lineages in the analysis, which is not possible with the current small sample of genotype II strains.
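Because genotypes are assigned to each segment independently, a between-genotype re-assortant shows up as a strain whose segment calls disagree. The sketch below flags such strains using the calls stated in this section; the data structure and function are hypothetical, and a majority vote is only a heuristic. It cannot flag the Mbarara M segment, whose divergence lies within genotype II, which is why identifying a putative parent lineage requires closely related reference sequences.

```python
from collections import Counter

# Per-segment genotype calls (S, M, L) as stated in the text.
SEGMENT_GENOTYPES = {
    "Mbarara": {"S": "II", "M": "II", "L": "II"},
    "MCL-19-T-1929": {"S": "IV", "M": "II", "L": "IV"},
    "YL04057": {"S": "IV", "M": "II", "L": "IV"},
}


def discordant_segments(calls: dict[str, str]) -> list[str]:
    """Segments whose genotype differs from the majority call across S, M and L.

    Heuristic only: if all three segments carry different genotypes, the
    'majority' is arbitrary and two segments will be flagged.
    """
    majority, _ = Counter(calls.values()).most_common(1)[0]
    return [seg for seg, genotype in calls.items() if genotype != majority]


for strain, calls in SEGMENT_GENOTYPES.items():
    print(strain, discordant_segments(calls) or "concordant")
```

Here the two Asian strains are flagged for their M segment, while the Mbarara strain is reported as concordant.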
Unique amino acid residues were detected throughout the Mbarara strain genome. However, much of the observed diversity, at both nucleotide and amino acid levels, was located in the M segment. A high proportion of this variability could be attributed to the region encoding the MLD, a highly glycosylated region of unproven function that is hypothesised to shield viral epitopes from the host immune response, based on the role of a similar domain in filoviruses [33]. This hypothesis is supported by a recent epitope-mapping study, which reported a high degree of immune reactivity against the MLD in CCHFV survivors [34]. Amino acid variation was also high in the genes encoding the viral glycoproteins, particularly the secreted glycoprotein GP38. The function of GP38 is similarly unknown; however, it is believed to act as a chaperone, assisting in pre-Gn folding [3]. Interestingly, GP38 has recently shown promise as an antigen, as anti-GP38 monoclonal antibodies protected mice against lethal CCHFV challenge [35]. Knowledge of the spectrum of GP38 variants in animal and vector reservoirs may therefore prove useful for the design of future vaccines.

Conclusions
CCHFV is a severely neglected pathogen, and the lack of genomic data from African strains makes exploring the evolutionary history of this novel strain difficult. In addition, the lack of viral data from animal and tick reservoirs highlights the need to better characterise the natural variation of CCHFV strains circulating outside of severe human cases. Indeed, to our knowledge only one other CCHFV strain has been reported from African ticks (IbAr10200). This genotype III virus was isolated in 1966 from a Hyalomma tick taken from a camel in Nigeria. Notably, the virus was passaged multiple times over several decades prior to sequencing, which likely introduced a number of mutations. It is therefore uncertain how representative this sequence is of the original virus, particularly for a tick-derived virus adapting to mammalian cell culture. In conclusion, generating additional CCHFV genome sequences directly from ticks, such as the sequence presented here, is vital for a more complete understanding of CCHFV ecology, transmission and evolution, all of which are necessary for the effective design and deployment of future vaccines and molecular diagnostic tests.
Supplementary Materials: Figure S1: Pairwise nucleotide identity comparisons for CCHFV genome segments S, M and L,