Novel Amplicon-Based Sequencing Approach to West Nile Virus

West Nile virus is a re-emerging arbovirus whose impact on public health is increasingly important as more and more epidemics and epizootics occur, particularly in America and Europe, with evidence of active circulation in Africa. Because birds constitute the main reservoirs, migratory movements allow the diffusion of various lineages across the world. It is therefore crucial to properly control the dispersion of these lineages, especially because some have a greater impact on public health than others. This work describes the development and validation of a novel whole-genome amplicon-based sequencing approach to West Nile virus. This study was carried out on different strains from lineages 1 and 2 from Senegal and Italy. The presented protocol showed good coverage using samples derived from several vertebrate hosts and may be valuable for West Nile virus genomic surveillance.


Introduction
The threat from new and re-emerging viruses has markedly increased in recent decades due to population growth, urbanization, and the expansion of global travel, which facilitate the rapid spread of infection during an outbreak. West Nile virus (WNV), an arbovirus belonging to the flavivirus genus, was first isolated in 1937 in Uganda [1] before spreading throughout the world [2]. The enzootic cycle includes mosquitoes and several vertebrate species including birds, allowing long-distance viral spread during migratory seasons [3,4].
Humans are considered WNV dead-end hosts because no human-to-mosquito transmission has been reported yet [5]. Most WNV infections are asymptomatic or may develop into self-limited febrile illness, but a very small percentage of cases progress to neuroinvasive disease with a range of symptoms and occasionally death [6,7].
Before 1990, WNV disease was considered to have a minor public health impact with only sporadic human cases. Since the first outbreaks reported in Algeria and Romania in 1994 and 1996, the virus has spread and caused large epidemics in North America, North Africa, and Western and Eastern Europe [7].
In Italy, areas with either proven active asymptomatic WNV circulation or high probability of human infection have been previously reported [8,9], and an increasing number of neuroinvasive human infections have been described [10,11].
In Africa, little evidence of WNV epidemics has been noted. In Senegal, where WNV was first isolated in an acute human case in 1970, the virus has also been detected in mosquitoes, birds, horses, and human samples. From 2012 to 2021, active WNV circulation in mosquitoes and humans was documented following a reintroduction event from Europe [12].
WNV exhibits great genetic diversity, with eight different lineages (excluding Koutango virus) currently circulating in the world [13]. Lineages 1 (WNV-L1) and 2 (WNV-L2) are the ones causing the main public health concern [7,12]. Genetic characterization of the detected strains makes it possible to track the routes of virus introduction, which is of particular interest to public health authorities when designing surveillance and countermeasure plans.
Genome sequencing of viruses has proven to be critical in the management of epidemics. Many approaches can be used to obtain viral whole genomes: (i) propagation in cell culture followed by metagenomic next-generation sequencing (mNGS) of the nucleic acids; (ii) hybrid capture using specific biotinylated probes; and (iii) a multiplex PCR-based target enrichment or amplicon-based protocol. This last approach became the most widely used one for SARS-CoV-2 genomic surveillance during the COVID-19 pandemic due to its applicability across a wide range of input titers, which allows clinical samples to be sequenced directly, as well as its high specificity and its scalability under resource-limited conditions at lower cost [14-17].
Due to the wide range of WNV hosts, many One Health studies focus on WNV. As genomic data are key information for understanding the mechanisms of the emergence and circulation of this virus, it is crucial to develop a rapid, reliable, and cost-effective sequencing tool that is more accessible than isolation methods or mNGS.
We describe here the development and evaluation of a whole-genome amplicon-based sequencing approach for WNV-L1 and WNV-L2 using Illumina technology in different types of vertebrate and mosquito species from Senegal and Italy.

Primer Design for Tiled Amplicon-Based Sequencing Systems for West Nile Virus
Primers were designed at IPD using the web-based tool Primal Scheme [18] to obtain two non-overlapping pools of WNV-targeting primers for multiplexed PCR reactions, generating approximately 400 bp amplicons tiled along the targeted genome. A WNV reference genome (accession number: NC009942) was chosen as the template. An alignment of WNV whole-genome sequences available on GenBank, representative of all WNV lineages in both Africa and Europe, was then used to identify nucleotide mismatches for potential correction at ambiguous sites of each primer, to ensure both good coverage and high specificity for diverse WNV lineages. Overall, the approach used was a two-pool multiplex amplicon-based sequencing scheme.
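The two-pool tiling logic can be illustrated with a minimal sketch. It is not the Primal Scheme algorithm, which also scores candidate primers; it only lays out fixed-length tiles and alternates them between two pools so that neighbouring, overlapping amplicons are never amplified in the same reaction. The function name, the 75 bp overlap, and the 11,000 nt genome length used in the usage note are illustrative assumptions, not values from this study.

```python
def tile_amplicons(genome_length, amplicon_length=400, overlap=75):
    """Lay out overlapping tiles and assign them alternately to pool 1 and pool 2.

    Returns a list of (start, end, pool) tuples with 0-based, end-exclusive
    coordinates. The overlap value is a hypothetical choice for illustration.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if not 0 <= overlap < amplicon_length / 2:
        # Keeping the overlap below half the amplicon length guarantees that
        # tiles sharing a pool (every second tile) never overlap each other.
        raise ValueError("overlap must be below half the amplicon length")

    step = amplicon_length - overlap
    tiles = []
    start = 0
    index = 0
    while True:
        end = min(start + amplicon_length, genome_length)
        pool = 1 if index % 2 == 0 else 2
        tiles.append((start, end, pool))
        if end == genome_length:
            break
        start += step
        index += 1
    return tiles
```

For example, `tile_amplicons(11000)` produces 34 tiles, the first being `(0, 400, 1)` and the second `(325, 725, 2)`.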

West Nile Virus Primer Pools Validation
Validation of the primer sets followed several steps: (i) inclusivity test by sequencing attempts on several WNV-L1 and WNV-L2 strains; (ii) specificity and sensitivity tests by sequencing attempts on several flaviviruses and other arboviruses, as well as serial dilutions of WNV-L1 and WNV-L2 culture isolates; and (iii) final validation by sequencing confirmed positive WNV samples derived from different species of vertebrates and mosquitoes from Italy and Senegal.

Sequencing of WNV-L1 and WNV-L2 Isolates
The designed primer systems were challenged for amplicon-based whole-genome sequencing of well-characterized WNV-L1 (n = 10) and WNV-L2 (n = 8) viral isolates from Senegal and Italy. The experiments were undertaken by the teams in Senegal and Italy, each with its local isolates. WNV strains from Senegal were obtained after infection of C6/36 monolayer cells with homogenized mosquito pools as previously described [12]. Isolates from Italy were obtained from birds' internal organ homogenates after two to three passages on Vero monolayer cell lines, followed by an infection on C6/36 cell lines. A genome coverage of 95% and above was targeted.

Validation on Confirmed Positive WNV Samples
Finally, sequencing attempts on both WNV-L1 and WNV-L2 positive samples from mosquitoes, birds, and horses from Italy and Senegal were conducted. The Ct values of the samples were confirmed by RT-qPCR using a consensus WNV assay [6] in Senegal and a molecular WNV sub-typing assay [19] in Italy, prior to sequencing.

Next-Generation Sequencing and Genome Assembly
Viral RNAs were extracted using the QIAamp viral RNA mini-kit (QIAGEN, Hilden, Germany) and were reverse-transcribed into cDNAs using the Superscript IV Reverse Transcriptase enzyme (ThermoFisher Scientific, Waltham, MA, USA). The synthesized cDNAs served as templates for direct amplification to generate approximately 400 bp amplicons tiled along the genome using two non-overlapping pools of WNV targeting primers at 10 nM and Q5® High-Fidelity 2X Master Mix (New England Biolabs) with the following thermal cycling protocol: 98 °C for 30 s; 35 cycles of 95 °C for 15 s and 65 °C for 5 min; and a final cooling step at 4 °C.
In Senegal, libraries were then synthesized by tagmentation using the Illumina DNA Prep kit and the IDT® for Illumina PCR Unique Dual Indexes. After a cleaning step with the Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN), libraries were quantified using a Qubit 3.0 fluorometer (Invitrogen Inc., Waltham, MA, USA) for manual normalization before pooling and loading onto the sequencer. Cluster generation and sequencing were conducted on an Illumina MiSeq instrument with 2 × 300 nt read length. Consensus genomes were generated using the nextflow-based nf-core viral reconstruction pipeline (https://github.com/nf-core/viralrecon, accessed on 20 January 2023) from the standardized nf-core pipelines [20,21]. The versions of nextflow and viralrecon used were v21.10.6 and v2.5, respectively.
In Italy, amplified DNA was diluted to obtain 100-500 ng of input DNA, then used for library preparation with an Illumina DNA Prep kit, and sequenced with a NextSeq 500 (Illumina Inc., San Diego, CA, USA) using a NextSeq 500/550 Mid Output Reagent Cartridge v2 for 300 cycles with standard 150 bp paired-end reads. After quality control and trimming with Trimmomatic v0.36 (Usadellab, Düsseldorf, Germany) [22] and the FastQC tool v0.11.5 (Bioinformatics Group, Babraham Institute, Cambridge, UK) [23,24], reads were de novo assembled using SPAdes v3.11.1 (Algorithmic Biology Lab, St Petersburg, Russia) [25]. The contigs obtained were analyzed with BLASTn to identify the best-matching reference. Mapping of the trimmed reads was then performed using the iVar computational tool [26] to obtain a consensus sequence.
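The horizontal coverage figures reported in the validation sections can be read as the percentage of reference positions that carry a called base in the consensus. The sketch below computes that percentage under two assumptions that are not stated in the text: uncovered positions are masked with N (or a gap) in the consensus, and the consensus is compared with a known reference length. The function names are hypothetical.

```python
def horizontal_coverage(consensus, reference_length):
    """Percentage of reference positions with a called base in the consensus.

    Assumes masked or missing positions appear as 'N' or '-' in the consensus;
    every other character (including IUPAC ambiguity codes) counts as called.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    called = sum(1 for base in consensus.upper() if base not in "N-")
    return 100.0 * called / reference_length


def meets_target(coverage_percent, target_percent=95.0):
    """True when a genome reaches the coverage target (95% in this study)."""
    return coverage_percent >= target_percent
```

For instance, a 10-position toy consensus `"ACGTNNNNAC"` has six called bases, so `horizontal_coverage("ACGTNNNNAC", 10)` gives 60.0 and does not meet the 95% target.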

West Nile Virus Oligonucleotide Primer Sets
A first multiplex primer system was designed based on a WNV-L1 reference genome (accession number: NC009942), generating a set of 35 oligonucleotide primer pairs that amplify overlapping products spanning almost the whole WNV genome.
This primer set (set A) was subsequently compared to an alignment of 15 sequences representing the different WNV lineages (Table S1). Degenerate bases were then introduced at relevant ambiguous sites of each primer in order to cover as many lineages as possible while maintaining specificity. The list of WNV primers in set A can be found in Table 1. Note that two extra primers (KOUV_2_RIGHT and KOUV_7_LEFT) were incorporated into set A to potentially extend the sequencing to Koutango virus, although this was not evaluated in this study.
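How a degenerate primer can be derived from an alignment is sketched below. For each position of a primer-binding site, the set of bases observed across the aligned sequences is replaced by the corresponding standard IUPAC ambiguity code. This is a simplification: the sketch degenerates every variable site, whereas the primers described here were degenerated only at sites judged relevant. The function name and the sequences in the usage note are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_primer(binding_sites):
    """Collapse aligned binding-site sequences into one degenerate primer.

    Returns (primer, degeneracy), where degeneracy is the number of distinct
    sequences the primer represents. Gaps or non-ACGT characters raise an error.
    """
    if not binding_sites:
        raise ValueError("at least one sequence is required")
    length = len(binding_sites[0])
    if any(len(site) != length for site in binding_sites):
        raise ValueError("all sequences must have the same aligned length")

    primer = []
    degeneracy = 1
    for column in zip(*binding_sites):
        bases = frozenset(base.upper() for base in column)
        if bases not in IUPAC_CODES:
            raise ValueError(f"unsupported characters in column: {sorted(bases)}")
        primer.append(IUPAC_CODES[bases])
        degeneracy *= len(bases)
    return "".join(primer), degeneracy
```

With the toy input `["ACGT", "ACAT", "TCGT"]`, the result is `("WCRT", 4)`: two variable sites with two bases each. The degeneracy value makes the trade-off explicit, since each added ambiguous site multiplies the number of oligonucleotide species in the pool and can reduce specificity.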

Table 1. WNV primers set A and WNV primers set B.
A second primer set (set B) was designed based on a WNV-L2 reference genome (accession number: MH021189) and was compared with an alignment of 82 WNV-L2 sequences (Table S2) to capture the diversity within the lineage. The list of WNV primers in set B can be found in Table 1.

Validation of Set A
Inclusivity Test
After the design of set A, seven WNV-L1 and three WNV-L2 isolates from Senegal were selected, and three viral culture supernatants for each lineage from three different Italian regions were processed for amplicon-based sequencing in triplicate. Overall, tiled amplicon whole-genome sequencing undertaken on strains from both Senegal and Italy yielded 99-100% horizontal coverage, with genome lengths between 10,961 nt and 11,018 nt for WNV-L1 and between 10,914 nt and 10,926 nt for WNV-L2 (Table 2).

Sensitivity Test
One representative isolate of each lineage, i.e., WNV 15217 (accession number: FJ483548) and WNV Thessaloniki_MC82m/2018 (accession number: MN652880) for WNV-L1 and WNV-L2, respectively, was selected to evaluate the detection limit of the set A primers under optimal conditions. Serial dilutions from 10^6 to 10^2 cp/µL were processed in triplicate for sequencing. The set A primers covered more than 95% of the total WNV-L1 genome down to 10^4 cp/µL. At 10^3 cp/µL, the horizontal coverage was between 91% and 94%, while at 10^2 cp/µL, 80% to 82% of the WNV-L1 sequence was recovered. However, poor coverage was observed in the WNV-L2 samples (between 17% and 35% completeness), as shown in Table 3.

Specificity Test
Amplicon-based whole-genome sequencing with set A primers was conducted on six flavivirus species (YFV, ZIKV, DENV-2, WSLV, KDGV, USUV), as well as RVFV and CHIKV, in order to assess the specificity of this WNV targeted approach. All the samples failed the bowtie2 1000 mapped-read threshold and no consensus genome could be assembled.
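The pass/fail criterion used here can be expressed as a simple filter on the number of reads mapped to the WNV reference. The sketch below only illustrates that logic; the sample names and read counts in the usage note are invented, and treating a count exactly equal to the threshold as passing is an assumption.

```python
MIN_MAPPED_READS = 1000  # threshold quoted in the text


def samples_passing(mapped_read_counts, min_mapped_reads=MIN_MAPPED_READS):
    """Return the sorted names of samples with enough mapped reads for consensus calling.

    mapped_read_counts maps a sample name to its number of reads mapped to the
    WNV reference. Samples below the threshold are excluded.
    """
    return sorted(
        name
        for name, count in mapped_read_counts.items()
        if count >= min_mapped_reads
    )
```

With hypothetical counts such as `{"off_target_1": 12, "off_target_2": 999, "wnv_control": 250000}`, only `"wnv_control"` is returned, which mirrors the expected outcome of a specificity test: non-WNV samples yield no consensus genome.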

WNV Set A Primers Validation on Real Homogenates
Thirty-one (31) WNV-L1 and fifty-four (54) WNV-L2 homogenates with known Ct values by RT-qPCR were selected for targeted sequencing using the set A primers. Homogenates were obtained from mosquito pools and the internal organs of birds with low to high viral loads.
Among WNV-L1 homogenates, horizontal coverage ranged from 34% to 100%. A total of 35% of the samples exceeded 95% horizontal coverage, and about 65% reached 90% horizontal coverage. Most complete genomes had Ct values between 16 and 28. However, we also noted that among the least well-covered samples, Ct values ranged from 25 to 35, highlighting that factors other than the viral load could be involved. Additionally, five samples were WNV-L1/WNV-L2 co-infections, and the amplicon-based approach yielded 87% to 96% WNV-L1 horizontal coverage, even when WNV-L2 had a higher viral load. Reasonably good coverage (between 74% and 92%) was obtained from four other samples from mosquitoes trapped in Senegal, for which viral co-infections with either alphaviruses, mesoniviruses, or flaviviruses were reported. All these results are summarized in Table 4. Regarding WNV-L2 homogenates, experiments undertaken with the set A primers were consistent with the data from inclusivity and specificity tests. Indeed, less than 6% of the samples processed had above 95% of the genome covered (3 out of 54), and 87% had ≤64% horizontal coverage, regardless of the viral load (Table 5).

Validation of Set B
Inclusivity Test
Five WNV-L2 isolates from Italy were selected to assess the set B primers. Complete (100%) horizontal coverage was obtained for all the strains after sequencing on an Illumina MiSeq (Table 6).

Sensitivity Test
In order to identify the set B primers' detection limit under optimal conditions, serial dilutions from 10^6 to 10^2 cp/µL of the strain WNV Thessaloniki_MC82m/2018 (accession number: MN652880) were processed in triplicate for sequencing (except the 10^2 cp/µL concentration, which was carried out in duplicate due to insufficient volume during the experiment). Complete (100%) horizontal coverage was obtained from 10^6 to 10^3 cp/µL, while the two replicates at 10^2 cp/µL covered 93% and 95% of the genome, as shown in Table 7.

Specificity Test
Similar to the test conducted for set A, no amplification was observed using set B on the six flavivirus species mentioned above, as well as RVFV and CHIKV.

WNV Set B Primers Validation on Real Homogenates
Fifteen WNV-L2 homogenates from Italy with known Ct values by RT-qPCR were selected for targeted sequencing using the set B primers. Homogenates were obtained from mosquito pools, as well as the internal organs of birds and horses with low to high viral loads. Overall, horizontal coverage between 97% and 100% was obtained on 14 out of 15 homogenates (93.3% with horizontal coverage > 95%). Only the horse sample exhibited 93% horizontal coverage. This sample was also the one with the lowest viral load (Ct value: 35). All these results are summarized in Table 8.

Validation of Set A + B
In order to obtain a system able to efficiently sequence both WNV-L1 and WNV-L2 strains, the first set of primers (set A) was combined with the second one (set B) in equal volume. The new system, set A + B primers, was evaluated in parallel with set A and set B by sequencing WNV-L1 (n = 4) and WNV-L2 (n = 7) positive samples from internal organs of birds and horses, as well as mosquito homogenates, at different Ct values (Table 9). In WNV-L1 samples, no loss of sensitivity was observed between set A and set A + B for any of the samples tested. Notably, for one sample from a yellow-legged gull at Ct value 25, horizontal coverage increased from 88% with set A to 93% with set A + B. In the same way, sequencing conducted on WNV-L2 samples worked just as well with set B as with set A + B, regardless of Ct values. Indeed, about 71% of the samples yielded a complete genome with 100% horizontal coverage (n = 5 out of 7).
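The proportions quoted across the validation sections can be re-derived from the sample counts given in the text. The short sketch below does this arithmetic; the helper name is hypothetical, and the counts are the ones stated in the corresponding sections.

```python
def percent(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# Counts taken from the text of the validation sections.
reported = {
    "set A, WNV-L2 homogenates above 95% coverage": (3, 54),
    "set B, WNV-L2 homogenates above 95% coverage": (14, 15),
    "set A + B, WNV-L2 samples with complete genome": (5, 7),
}

for label, (count, total) in reported.items():
    print(f"{label}: {count}/{total} = {percent(count, total)}%")
```

The three values come out as 5.6%, 93.3%, and 71.4%, respectively.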

Discussion
NGS is now an essential tool in the study of infectious diseases, both at the fundamental level and in its application to public health. The COVID-19 pandemic has been a clear example of the importance of obtaining information on the genetic signature of pathogens in real time. However, sequencing technology, and whole-genome sequencing in particular, remains an expensive approach with significant experimental constraints on the quality of the data generated (for instance, the host genome background, with a relatively low amount of genetic material from the pathogen of interest in clinical specimens). A multiplex PCR-based target enrichment or amplicon-based protocol [14] was widely used to overcome these challenges during SARS-CoV-2 genomic surveillance, yielding more than 14 million genomes in the GISAID platform at the time of writing this manuscript [27].
WNV is becoming a major health problem in Europe and cases have also recently been detected in Africa [7,12].
WNV cases are mainly due to lineages 1 and 2. The mechanisms of diffusion of viral strains, in particular by the migratory movements of birds, are actively studied. The genetic characterization of the identified strains allows better tracking of dissemination routes in support of effective sanitary measures. NGS showed the persistence of a WNV strain after winter in Andalusia in Spain, suggesting endemicity with potential future epidemics in the area [28]. Another recent genomic study evidenced continuous WNV-L2 circulation in Italy throughout the year [29], while a reintroduction event was identified from Europe to Senegal, highlighting a potential threat [12].
Genomic characterization is even more important because it has been shown that West African lineages have higher virulence and replicative efficiency in vitro and in vivo compared to similar lineages circulating in the United States and Europe [6]. Genomic surveillance is thus essential as it allows a better understanding of the dissemination and dynamics of WNV strains.
In order to ensure the sustainability of this type of surveillance, we describe here the development and evaluation of a whole-genome amplicon-based sequencing approach for WNV-L1 and WNV-L2 by Illumina technology in different types of vertebrate and mosquito species from Senegal and Italy.
Three sets of primers were then designed and assessed with WNV-L1 and WNV-L2 strains. Set A and set B are specific to WNV-L1 and WNV-L2 strains, respectively, while the third one, a mixture of the two previous sets, is able to amplify both lineages.
Thus, the use of one set or another depends on the context. Indeed, in the case where the lineage is already well defined, it is appropriate to use the specific sets, whereas set A + B fits more in a context where no lineage characterization could be made before sequencing.
The evaluation in this study could only be carried out with WNV-L1 and WNV-L2 strains. Because set A was designed from at least one representative of all the WNV lineages, it would be appropriate to undertake a similar evaluation with at least set A and set A + B on lineages other than WNV-L1 and WNV-L2. Moreover, the repetition of these experiments by other groups would allow the observed results to be refined, particularly in terms of correlation with Ct values. Indeed, even though this work was carried out rigorously by two teams in Senegal and Italy, external factors such as the sample quality after long-term storage or the sample type may have affected the results.
In any case, the approach presented in this manuscript could be a valuable tool for any WNV genomic investigation.