Whole-Genome Sequencing of Six Neglected Arboviruses Circulating in Africa Using Sequence-Independent Single Primer Amplification (SISPA) and MinION Nanopore Technologies

On the African continent, a large number of arthropod-borne viruses (arboviruses) with zoonotic potential have been described, and yet little is known of most of these pathogens, including their actual distribution or genetic diversity. In this study, we evaluated as a proof-of-concept the effectiveness of the nonspecific sequencing technique sequence-independent single primer amplification (SISPA) on third-generation sequencing techniques (MinION sequencing, Oxford Nanopore Technologies, Oxford, UK) by comparing the sequencing results from six different samples of arboviruses known to be circulating in Africa (Crimean–Congo hemorrhagic fever virus (CCHFV), Rift Valley fever virus (RVFV), Dugbe virus (DUGV), Nairobi sheep disease virus (NSDV), Middelburg virus (MIDV) and Wesselsbron virus (WSLV)). All sequenced samples were derived either from previous field studies or animal infection trials. Using this approach, we were able to generate complete genomes for all six viruses without the need for virus-specific whole-genome PCRs. Higher Cq values in diagnostic RT-qPCRs and the origin of the samples (from cell culture or animal origin) along with their quality were found to be factors affecting the success of the sequencing run. The results of this study may stimulate the use of metagenomic sequencing approaches, contributing to a better understanding of the genetic diversity of neglected arboviruses.


Introduction
Recent pandemics have illustrated that emerging and re-emerging infectious diseases are of utmost importance for the global population. Despite not being a novel phenomenon, the worldwide transport of passengers and cargo, extensive land use, and the ongoing increase in the world population combined with urbanization and deforestation are favoring the emergence and accelerating the spread of pathogens [1]. Most of the emerging infectious diseases are caused by zoonotic pathogens, with the importance of vector-borne diseases having increased greatly in recent decades [2]. Especially in Africa, there are a large number of (neglected) arthropod-borne viruses (arboviruses) with zoonotic potential, and yet little is known of most of these viruses regarding their actual distribution, life cycle, host ranges, and genetic diversity [3]. Therefore, it is of major importance to investigate these tropical arboviruses in order to reduce the threat they pose to human and animal health and to proactively prevent large-scale emergence [2]. Alongside reliable molecular diagnostics such as RT-qPCRs, producing longer sequence reads (up to the full genomes) is essential for the phylogenetic characterization of viruses. First-generation sequencing (e.g., The results of this study may contribute to, as well as encourage, the generation and provision of viral sequences of neglected tropical arboviruses, allowing a better understanding of their genetic diversity and distribution.

Virus Samples, Metadata and Cultivation
Six different viruses belonging to four different genera were analyzed by nanopore sequencing. For four of them, two different types of samples were tested and compared: samples of animal origin (vectors or hosts) from field studies or experimental animal trials and samples derived from cell culture. Moreover, for two selected viruses (CCHFV and RVFV), the obtained number of specific reads was compared for three different Cq values of the samples.
RNA extraction of in vitro samples was performed using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). The extraction of RNA from samples of animal origin was conducted using the NucleoMag® VET kit (MACHEREY-NAGEL GmbH & Co. KG, Düren, Germany) and a KingFisher extraction device (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions.

CCHFV
The African strain Ibar10200 (Africa-I lineage) was grown on Vero E6 cells (African green monkey kidney cells, Collection of Cell Lines in Veterinary Medicine, Friedrich-Loeffler-Institut, FLI; CCLV-RIE 0929) under biosafety level (BSL)-4 conditions. Analysis of the extracted sample by RT-qPCR [22] revealed a Cq value of 21. The CCHFV field samples originated from a study conducted in Mauritania in 2018 [23]. In this survey, Hyalomma ticks were collected from cattle and camels. The ticks were individually homogenized in AVL lysis buffer, and then RNA extraction was performed. For a better assessment of the influence of the Cq value on the quality of the sequencing output, three positive ticks with different Cq values (19; 26; 30) were selected for sequencing. The three field samples belonged to the lineages Africa I and III.

RVFV
The live-attenuated RVFV vaccine strain MP-12 was grown on Vero 76 cells (African green monkey kidney cells, Collection of Cell Lines in Veterinary Medicine, FLI; CCLV-RIE 0228). After heat inactivation in AVL lysis buffer at 70 °C for 10 min, RNA extraction of the supernatant was performed. The subsequent RT-qPCR [24] yielded a Cq value of 20. Furthermore, RNA from positive tissue samples originating from black rats (Rattus rattus) that were infected with RVFV strain 35/74 under BSL-3 laboratory conditions [25] was used as samples of animal origin. As for CCHFV, three samples with three different Cq values (lungs 20, kidney 24, and spleen 30) were used for nanopore sequencing.

DUGV
The Nigerian DUGV prototype strain IbAr 1792 (kindly provided by the World Reference Center for Emerging Viruses and Arboviruses, University of Texas Medical Branch (UTMB), Galveston, TX, USA) was grown on SW13 cells (human adrenal gland cells, kindly provided by Karolinska Institutet, Solna, Sweden), and RNA was extracted from the virus-containing supernatant. A Cq value of 15 was determined by RT-qPCR [26]. The field sample originated from a DUGV-positive Amblyomma tick collected in 2018 in Nigeria [27]. The tick was individually homogenized in AVL lysis buffer before RNA extraction was conducted, and RT-qPCR showed a Cq value of 20.

NSDV
The NSDV strain IG619 (kindly provided by the World Reference Center for Emerging Viruses and Arboviruses, UTMB, Galveston, TX, USA) was grown on SW13 cells. In RT-qPCR [28], the extracted RNA of the cell culture supernatant showed a Cq value of 15. A positive tissue (bovine liver) sample from an NSDV animal infection trial conducted under BSL-3 laboratory conditions [28] was used as a sample of animal origin. This sample had a Cq value of 23.
Pathogens 2022, 11, 1502

MIDV and WSLV
The MIDV strain MT MP160 and the WSLV strain SA H177 (kindly provided by the World Reference Center for Emerging Viruses and Arboviruses, UTMB, Galveston, TX, USA) were grown on BHK-21 cells (baby hamster kidney cells, Collection of Cell Lines in Veterinary Medicine, FLI, CCLV-RIE 0164), and RNA extraction was performed from the virus-containing supernatants. By using RT-qPCR (Supplementary Table S1), Cq values of 19 (MIDV) and 25 (WSLV) were obtained. Since no positive field samples or material from experimental infection trials of those viruses were available, only the cell culture supernatants could be applied for sequencing.

SISPA and Sample Preparation for Nanopore Sequencing
The SISPA methodology was carried out using primers and PCR conditions as outlined in the protocol published by Peserico et al. [15]. In the RT step, the first SISPA primer (GCCGGAGCTCTGCAGATATCNNNNNN) and dNTPs (1 µL each) were mixed with 11 µL of viral RNA and incubated at 65 °C for 5 min. Afterwards, a second master mix (4 µL 5× SSIV buffer; 1 µL DTT; 1 µL RNase inhibitor; and 1 µL SSIV reverse transcriptase (SuperScript IV Reverse Transcriptase Kit; Invitrogen, Waltham, MA, USA)) was added and incubated (23 °C for 10 min; 50 °C for 50 min; 80 °C for 10 min; one cycle each) in a GeneTouch Plus Thermal Cycler (Biozym Scientific GmbH, Hessisch Oldendorf, Germany). Double-strand synthesis was performed by adding 1 µL of Klenow polymerase (New England Biolabs, Ipswich, MA, USA) under the following conditions: 37 °C for 60 min and 75 °C for 10 min. The amplification of 5 µL of ds cDNA was carried out after adding the third master mix (5 µL 10× PfuUltra II reaction buffer; 1 µL PfuUltra II Fusion HS DNA Polymerase (both Agilent, Santa Clara, CA, USA); 1.25 µL dNTPs (Invitrogen, Waltham, MA, USA); 1 µL of the second SISPA primer; and 36.75 µL nuclease-free water). The following temperature profile was used: initial denaturation at 95 °C for 1 min; 45 cycles of denaturation at 95 °C for 20 s, annealing at 65 °C for 20 s and extension at 72 °C for 3 min; and a final extension at 72 °C for 3 min. Moreover, an additional SISPA primer (GACCATCTAGCGACCTCCAC-NNNNNNNN) by Chrzastek et al. [29] was used in the same concentrations and quantities as described in the protocol above [15]. The difference between these primers lies in the number of random N bases (6 N vs. 8 N) at the binding site; while the barcode length is identical for both (20 bp), the barcodes differ in their sequence.
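The primer layout and the reaction volumes given above can be checked with a few lines of arithmetic. The following is a minimal sketch only: the primer sequences and volumes are copied from the text, whereas the function and variable names are illustrative assumptions and not part of the published protocol [15]. The 1 µL of Klenow polymerase added for double-strand synthesis is not included in the totals.

```python
# Primer sequences and volumes (in microlitres) are copied from the protocol text.
# All names below are illustrative and not taken from the published protocol.

SISPA_PRIMERS = {
    "Peserico [15]": "GCCGGAGCTCTGCAGATATCNNNNNN",
    "Chrzastek [29]": "GACCATCTAGCGACCTCCACNNNNNNNN",
}

RT_STEP_1 = {"SISPA primer": 1.0, "dNTPs": 1.0, "viral RNA": 11.0}
RT_STEP_2 = {
    "5x SSIV buffer": 4.0,
    "DTT": 1.0,
    "RNase inhibitor": 1.0,
    "SSIV reverse transcriptase": 1.0,
}
PCR_MIX = {
    "ds cDNA": 5.0,
    "10x reaction buffer": 5.0,
    "DNA polymerase": 1.0,
    "dNTPs": 1.25,
    "second SISPA primer": 1.0,
    "nuclease-free water": 36.75,
}


def split_primer(primer):
    """Return the fixed tag and the number of trailing random N bases."""
    tag = primer.rstrip("N")
    return tag, len(primer) - len(tag)


def total_volume(*mixes):
    """Sum the volumes of one or more component dictionaries."""
    return sum(volume for mix in mixes for volume in mix.values())


def final_concentration(stock_x, volume, total):
    """Final buffer strength (in x) after dilution into the total volume."""
    return stock_x * volume / total


for name, primer in SISPA_PRIMERS.items():
    tag, n_random = split_primer(primer)
    print(f"{name}: tag of {len(tag)} nt followed by {n_random} random N")

rt_total = total_volume(RT_STEP_1, RT_STEP_2)
pcr_total = total_volume(PCR_MIX)
print(f"RT reaction: {rt_total} µL, buffer at {final_concentration(5, 4.0, rt_total)}x")
print(f"PCR reaction: {pcr_total} µL, buffer at {final_concentration(10, 5.0, pcr_total)}x")
```

With the volumes as stated, the RT reaction adds up to 20 µL and the PCR to 50 µL, so both buffers end up at 1× working strength.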
In order to evaluate whether one primer set is more suitable for the generation of specific reads, each virus sample was sequenced individually using one of the two SISPA primer sets. After the SISPA amplification step, all samples were purified using AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA) at a 1:1.8 sample-to-bead volume ratio, followed by a sample library preparation for MinION sequencing according to a previously published and adapted protocol [30]. This protocol combines the SQK-LSK109 kit with the EXP-NBD104 kit (both from Oxford Nanopore Technologies, Oxford, UK) to allow simultaneous sequencing of multiple samples. The prepared library was spotted onto a Flow Cell R9.4.1 (FLO-MIN106D, Oxford Nanopore Technologies, Oxford, UK) and sequenced with a MinION Mk1C instrument (Oxford Nanopore Technologies, Oxford, UK). Figure 1 provides an overview of the workflow. Sequencing was run for at least 48 h until all pores of the flow cell were depleted. For each run, six to ten different barcoded samples were sequenced. Usually, only reads of barcodes dedicated to a specific sample are used to build the consensus sequence, whereas the unclassified reads (all reads that could not be assigned to any of the barcodes used) are not included in the evaluation. Those unclassified reads represent a mixture of DNA sequences from all samples in one sequencing run and thus can be searched for additional virus-specific reads. Since two samples of the same virus from different origins were never included within the same run, both the classified and unclassified reads could be evaluated.
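The handling of unclassified reads described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the `Read` record, the run layout dictionary and all function names are hypothetical, and each read is assumed to carry a barcode label and its best-matching virus from a previous alignment step.

```python
from collections import Counter
from typing import NamedTuple, Optional

UNCLASSIFIED = "unclassified"


class Read(NamedTuple):
    barcode: str           # barcode label, or UNCLASSIFIED if demultiplexing failed
    virus: Optional[str]   # best reference hit, None if the read matched no virus


def unclassified_usable(layout, virus):
    """Unclassified reads are attributable only if all barcodes carrying this
    virus in the run come from one and the same sample origin."""
    origins = {origin for v, origin in layout.values() if v == virus}
    return len(origins) == 1


def specific_read_counts(reads, layout, virus):
    """Count virus-specific reads from the sample's barcodes and, where the
    run layout allows it, add the virus-specific unclassified reads."""
    counts = Counter({"barcoded": 0, "unclassified": 0})
    for read in reads:
        if read.virus != virus:
            continue
        if read.barcode == UNCLASSIFIED:
            counts["unclassified"] += 1
        elif layout.get(read.barcode, ("", ""))[0] == virus:
            counts["barcoded"] += 1
    total = counts["barcoded"]
    if unclassified_usable(layout, virus):
        total += counts["unclassified"]
    return {**counts, "total": total}


# Toy run: one tick sample sequenced with both primer sets, plus an unrelated sample.
layout = {
    "barcode01": ("CCHFV", "tick A"),
    "barcode02": ("CCHFV", "tick A"),
    "barcode03": ("RVFV", "cell culture"),
}
reads = [
    Read("barcode01", "CCHFV"),
    Read("barcode01", None),
    Read("barcode02", "CCHFV"),
    Read("barcode02", "CCHFV"),
    Read("barcode03", "RVFV"),
    Read(UNCLASSIFIED, "CCHFV"),
    Read(UNCLASSIFIED, None),
]
print(specific_read_counts(reads, layout, "CCHFV"))
```

In this toy layout the single unclassified CCHFV read is added to the three barcoded ones. If a second CCHFV sample of a different origin were present in the same run, the function would report the barcoded reads only.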

Analysis of MinION Sequence Data
The steps of the data analysis are shown in Panel VI "Evaluation" of Figure 1. Basecalling is the initial process of assigning nucleobases to electrical current changes as a result of nucleotides passing through the nanopores. Raw signals (Fast5 raw data reads) are translated into nucleotide sequences, and these sequences are provided for downstream analysis. After that, reads are demultiplexed (NGS reads are assigned to the sample of their corresponding barcode) and trimmed (removal of adapter sequences and low-quality bases). In this study, Fast5 raw data reads produced by the arbovirus libraries were basecalled (high accuracy), demultiplexed and trimmed using the Mk1C sequencer (Guppy version 3.2.10, Oxford Nanopore Technologies). Additional demultiplexing and adapter removal were performed using Porechop on the NanoGalaxy platform [31]. The quality of the basecalled and demultiplexed sequencing data was assessed with NanoPack (version 1.13.0, https://github.com/wdecoster/NanoPlot; accessed on 1 August 2022). Reads with a minimum quality of 7 were considered for further analysis. For consensus sequence generation from trimmed FastQ reads, alignment against redundant databases and mapping with reference genomes (version 20, https://rvdb.dbi.udel.edu/; accessed on 3 September 2020) were performed using k-mer alignment (KMA) [32] and Minimap2 [33]. The KMA readouts were used for computing the genome coverage and accuracy of the consensus sequence. Table 1 provides an overview of the total and the virus-specific read counts that were obtained for all six arboviruses (CCHFV, RVFV, NSDV, DUGV, WSLV and MIDV), including outcomes for the different origins (animals and cell culture). The read counts obtained by using the two different SISPA primer sets [15,29] and the total number of reads (including both primer sets as well as unclassified reads) are given for each sample.
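The read-quality cut-off mentioned above can be illustrated with a short sketch. It is not the NanoPack or Guppy implementation. It assumes Phred+33 encoded FastQ records of exactly four lines each, and it assumes that the mean read quality is derived from the averaged per-base error probabilities; both points, as well as all function names, are assumptions made for this example.

```python
import math


def mean_read_quality(quality, offset=33):
    """Mean Phred quality of a read, averaged over error probabilities."""
    if not quality:
        return 0.0
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def parse_fastq(lines):
    """Yield (name, sequence, quality) from four-line FastQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes one record per step;
    # an incomplete trailing record is dropped.
    for header, sequence, _plus, quality in zip(stripped, stripped, stripped, stripped):
        yield header[1:], sequence, quality


def filter_reads(records, min_quality=7.0):
    """Keep reads whose mean quality reaches the threshold (7 in this study)."""
    for name, sequence, quality in records:
        if mean_read_quality(quality) >= min_quality:
            yield name, sequence, quality


# Toy input: '+' encodes Q10 and '$' encodes Q3 in Phred+33.
fastq_text = "@read1\nACGT\n+\n++++\n@read2\nACGT\n+\n$$$$\n"
kept = [name for name, _, _ in filter_reads(parse_fastq(fastq_text.splitlines()))]
print(kept)
```

Averaging error probabilities rather than raw Phred scores penalizes reads with a few very poor bases more strongly; a plain arithmetic mean of the scores would be a simpler alternative.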

Results
Important quality parameters of the sequencing results (coverage, depth, read quality, read length and identity levels) of all samples are summarized in Table 2. Likewise, Table 2 includes the results obtained with the two different SISPA primer sets for preamplification, as well as the complete results, i.e., the results obtained with the two primer sets combined with the unclassified reads.
Table 1. Total and virus-specific read counts of all examined samples by using two different primer sets [15,29] and by combining results of both primer sets and unclassified reads. The increase (%) in specific reads while using unclassified reads is indicated in parentheses.

Tables 1 and 2 (column headers: Virus, Sample; table bodies not recovered).
P = Peserico et al. [15]. C = Chrzastek, Lee, Smith, Sharma, Suarez, Pantin-Jackwood and Kapczynski [29]. U = unclassified reads. - = no data available. Total reads = the total number of reads at the end of the sequencing run that is included in the downstream analysis. Specific reads/segment = the number of reads that align to a known reference genome/genome segment. Depth = the ratio between the total number of bases yielded by sequencing and the size of the genome. Genome coverage = the average number of reads that align to a known reference genome/genome segment. Mean read quality = the probability of a base being called incorrectly. A higher score indicates that a sequence is actually correct, and a lower score indicates that the sequence is more likely to be incorrect. Read length N50 = the length of the shortest read in the group of longest sequences that together make up (at least) 50% of the nucleotides in the sequence set (based on the median and mean length of a set of sequences). Identity levels in percent = the number of nucleotide matches in the alignment (aligned with known reference genome, matched or mismatched).
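Some of the parameters defined in the table notes can be written out as short functions. The sketch below follows the definitions as worded above (read length N50, depth as total bases divided by genome size, identity as the percentage of matching positions in the alignment). The function names and the toy numbers are illustrative assumptions, and the values reported by KMA or NanoPack may be computed somewhat differently.

```python
def n50(read_lengths):
    """Length of the shortest read among the longest reads that together
    contain at least 50% of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)
    half_of_bases = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise ValueError("read_lengths must not be empty")


def depth(total_bases, genome_size):
    """Ratio between the total number of sequenced bases and the genome size."""
    return total_bases / genome_size


def identity_percent(matches, aligned_length):
    """Percentage of matching positions in the alignment to the reference."""
    return 100 * matches / aligned_length


# Toy values for illustration only (not taken from Table 2).
print(n50([2, 3, 4, 5, 6]))
print(depth(24000, 12000))
print(identity_percent(99, 100))
```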
The generated genome sequences are deposited in Supplementary Table S2. The highest numbers of reads were found for unsegmented viruses, namely MIDV (400,767) and WSLV (296,123; both Table 1). On the other hand, considerably lower read numbers were obtained for the examined segmented bunyaviruses (DUGV L-segment: <57,423; CCHFV L-segment: <6691; NSDV L-segment: <4696; RVFV L-segment: <2567) (Table 1). With the described protocol [15], the primers of Chrzastek et al. [29] were less efficient than the primer set of Peserico et al. [15]. The best results were achieved by combining the sequencing results of the two different primer sets for the respective samples (Tables 1 and 2). Including the unclassified reads increased the output of specific reads further, by 13-57% (Table 1, P+C+U). The data obtained for the different CCHFV and RVFV samples indicated that the lower the Cq value in the RT-qPCR, the more specific reads were obtained in the sequencing run (Table 1).
Genome assembly (de novo and map-to-reference) was successfully performed for samples with low Cq values (Table 2), and identities of more than 98% between the investigated viruses and the reference sequences (Supplementary Table S3) were achieved in all target genomes (segments). Full genome sequences could be generated for all samples that showed Cq values of less than 22 in the respective RT-qPCR. Samples with Cq values of 23 to 27 showed varying results depending on the virus sequenced and the origin of the sample (e.g., generation of the whole genome of a WSLV cell culture sample with Cq 25). For the two samples with Cq values of 30, only a few specific reads were found (Tables 1 and 2). In comparison to samples of animal origin, more specific reads were produced when sequencing cell culture isolates (Table 1). For NSDV, good coverage was achieved only for the cell culture sample (Cq = 18), whereas the bovine sample (Cq = 23) did not yield any sequencing results (Table 2). Identity levels (KMA) to the reference sequences (Table 2) ranged from 77.45% to 99.9%, while the coverage varied much more, from 1.19% to 100% (Table 2). Like the mean read quality (Q), these two values were lower for samples with higher Cq values and/or for samples of animal origin.

Discussion
In recent years, third-generation sequencing with nanopores using MinION devices has become a reliable alternative and/or complement to second-generation sequencing techniques. Due to its small size and lower acquisition costs, the device can be a valuable game changer, improving diagnostic capacities both in the field and in well-equipped laboratory facilities. Presequencing enrichment is a crucial aspect of sample preparation. In this context, virus-specific whole-genome PCRs are considered the gold standard, since their high specificity allows whole-genome sequences to be obtained even from samples with lower viral loads. However, more or less complex primer mixes have to be prepared depending on the virus to be sequenced. Genetically very diverse viruses such as CCHFV share only 70-80% genetic identity among strains [34] and thus require different primer mixes for the amplification of different strains. Furthermore, the large amounts of viral amplicons produced by whole-genome PCRs bear a potential risk of laboratory contamination. Another approach for a broad enrichment of viral genomic RNAs in cell culture and animal samples is the so-called SISPA technique, which allows a more open-view approach due to its nonselective amplification. In the study described here, we have therefore assessed the suitability of this nonspecific enrichment method as a preamplification step for MinION sequencing of different arboviruses occurring in Africa and compared the MinION sequencing results obtained from different sample types.
A comparison of different samples of animal origin of CCHFV and RVFV showed that the highest number of specific reads was found in samples with lower Cq values, whereas the number of reads declined with increasing Cq values (Table 1). In contrast, no specific reads were obtained for an NSDV sample with Cq = 23, while more than 300,000 reads were found for WSLV with a Cq value of 25. Due to the limited comparability of different PCR assays, the Cq value can only be used as a vague indicator of the expected sequencing data outcome. However, the results of this study indicate that good sequencing performance can be expected for samples with Cq values below 22 when utilizing SISPA as a preamplification step for MinION sequencing. Besides a quantitative benchmark, sample quality should also be considered. Time and storage conditions of RNA samples strongly affect the quality of the nucleic acids [35]. In the case of field samples, a considerable amount of time can elapse from collection to the transportation/actual analysis in the lab, often making it difficult to fully maintain the cold chain.
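To give a sense of scale for the Cq differences discussed above, the following sketch converts a Cq difference into an approximate fold difference in template amount. It rests on the idealized assumption that the target doubles in every cycle (amplification efficiency of 100%) and that both samples were measured with the same assay; as noted above, values from different assays are only loosely comparable. The function name and the efficiency parameter are assumptions made for this example.

```python
def fold_difference(cq_low, cq_high, efficiency=1.0):
    """Approximate fold difference in starting template between two samples
    measured with the same RT-qPCR assay.

    efficiency = 1.0 corresponds to a perfect doubling per cycle, which is an
    idealization; real assays usually amplify somewhat less efficiently.
    """
    return (1.0 + efficiency) ** (cq_high - cq_low)


# Cq values of the three CCHFV tick samples described in the methods: 19, 26 and 30.
print(fold_difference(19, 26))
print(fold_difference(19, 30))
```

Under these assumptions, the tick sample with Cq 30 would contain roughly 2000-fold less target RNA than the sample with Cq 19, which is in line with the small number of specific reads obtained from it.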
The quality of the reads and results obtained (Table 2) generally correlated with the number of reads generated for each sample (Table 1), i.e., the lower the Cq value, the better the quality of the results. Moreover, samples derived from cell culture supernatants performed very well for most of the viruses studied, especially in the case of WSLV and MIDV (Table 1). That might be explained by the higher degree of purity of cell culture supernatants (less foreign and interfering DNA, fewer nucleases, etc.) and the fact that the samples can be further processed immediately without longer transport distances. All cell culture samples also resulted in a better mean read quality compared to the respective animal samples (Table 2).
In general, the primer set of Peserico et al. [15] appeared to be more efficient, which was to be expected since the SISPA protocol used was designed for this primer pair and was not specifically adapted for the other primer set [29]. Interestingly, RVFV seems to be the only one of the six viruses examined for which the primers of Chrzastek et al. [29] performed better. Based on our data, it is rather difficult to determine whether the number of random N bases, the difference in the barcode sequence, or the fine-tuning of the original protocol using the primer set of Peserico et al. was responsible for those findings. Including reads that were unclassified in the first iteration increased the number of specific reads by 13% to 100% (Table 1). Since every specific read is valuable for assembling the target genome when nonspecific primers are used for sample enrichment, this can be a very helpful supplement. The dual primer set approach and consideration of unclassified reads also resulted in considerably better coverage and depth for all viruses. Therefore, to increase data yield in multiplexed sequencing runs, it seems advisable to sequence samples in duplicates (preferably with two different primer sets) if sufficient RNA material is available and to use the unclassified reads. It has to be mentioned that unclassified reads can be used only if a single sample of a given virus (alone or in duplicate) is included in a sequencing run (e.g., two samples of CCHFV from the same tick or two samples of CCHFV from the same cell culture supernatant). Unclassified reads cannot be assigned to a sample when multiple samples of the same virus but of different origin have been sequenced in the same run (e.g., CCHFV from a tick and CCHFV from cell culture supernatant).
In this study, it has been shown that with good sample quality, the use of SISPA amplification and MinION sequencing can provide a nearly complete genome sequence of the virus in most cases. Regardless of read quality and coverage, very good identity levels (between 90% and 100%) were achieved for all viruses when comparing them with reference sequences in the public database. Even for the RVFV cell culture sample, excellent identity levels of 98.4-99.9% were obtained despite a comparatively low coverage of 11.29-24.97% (Table 2B).
In summary, this study demonstrates and underlines the broad applicability of enrichment with SISPA for MinION sequencing. As the method allows the generation of viral (full) genomes without the need for virus-specific whole-genome PCRs, the main application of SISPA is the sequencing of a broad range of pathogens previously detected by different PCR assays to obtain an initial overview of the genetic diversity within the sample panel. This makes it particularly interesting for emerging or neglected viruses that do not have a long history of published whole-genome primer protocols (e.g., MIDV or WSLV) and also for laboratories that are less well equipped. Nevertheless, enrichment by virus-specific whole-genome PCR would result in a better sequencing result for samples with poorer quality or a higher Cq value. Additionally, SISPA could be used in more open sequencing approaches (metagenomics) to identify yet unknown infectious agents. However, the initial quality of the samples represents the main limiting factor for a successful sequencing run. If an initial screening of the sample in a virus-specific RT-qPCR is possible, the Cq value can be taken as a rough indicator of the expected sequencing outcome.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pathogens11121502/s1, Table S1: RT-qPCR protocol for MIDV and WSLV; Table S2: Generated genome sequences of all six viruses; Table S3: Reference sequences used for the genome assembly (de novo and map-to-reference).