Viral Metagenomics on Cerebrospinal Fluid

Identifying the causative pathogen in central nervous system (CNS) infections is crucial for patient management and prognosis. Many viruses can cause CNS infections, yet screening for each individually is costly and time-consuming. Most metagenomic assays can theoretically detect all pathogens, but often fail to detect viruses because of their small genomes and low viral loads. Viral metagenomics overcomes this by enriching the viral genomic content of a sample. VIDISCA-NGS is one of the available workflows for viral metagenomics; it requires only a small input volume and allows multiple samples to be multiplexed per run. The performance of VIDISCA-NGS was tested on 45 cerebrospinal fluid (CSF) samples from patients with suspected CNS infections in which a virus had been identified and quantified by polymerase chain reaction. Eighteen were positive for an RNA virus and 34 for a herpesvirus. VIDISCA-NGS detected all RNA viruses with a viral load >2 × 10⁴ RNA copies/mL (n = 6) and 8 of the 12 remaining low-load samples. Only one herpesvirus was identified by VIDISCA-NGS; however, when the DNase treatment was omitted, 11 of 18 samples with a herpesvirus load >10⁴ DNA copies/mL were detected. Our results indicate that VIDISCA-NGS can detect low-load RNA viruses in CSF. Herpesvirus DNA in clinical samples is probably non-encapsidated and therefore difficult to detect by VIDISCA-NGS.


Introduction
For patients with a suspected central nervous system (CNS) infection, rapid and accurate diagnosis is vital to guide treatment and improve prognosis [1]. The differential diagnosis of such patients includes infectious etiologies, of which viruses are the most common [2], but also non-infectious etiologies, such as autoimmune diseases [3]. However, in more than half of cases, the cause remains unknown [4]. Identification of a virus can aid patient management, as it may prompt specific antiviral treatment, or stop or prevent ineffective antiviral, antibiotic, and/or immunosuppressive treatments, all of which have potentially harmful side effects. For example, when differentiating between an autoimmune and a viral origin, immunosuppression could have deleterious consequences if the disease is in fact caused by an unidentified virus [5].
During the last two decades, conventional diagnostics for viral CNS infections have shifted from non-specific culturing techniques towards highly-specific viral nucleic acid amplification tests, like

Materials and Methods
CSF samples that had previously tested positive by viral qPCR were selected from two biobanks of the departments of medical microbiology and neurology of the Amsterdam UMC (location AMC). HIV-1 qPCR was performed using the RealTime HIV-1 Viral Load Assay (Abbott Molecular, Abbott Park, IL, USA); the other viruses were tested by in-house qPCRs using previously published methods [19]. The first sample set consisted of anonymized leftover CSF samples (n = 27) sent in from patients with suspected CNS infection. The second set (n = 18) was selected from a clinical study on the etiology of encephalitis and meningitis in adult patients [2]. The study was approved by the medical ethics committee of the Academic Medical Centre, Amsterdam, The Netherlands (reference number 2014_290). All samples had a quantifiable viral load and were stored at −80 °C until library preparation for VIDISCA-NGS.
VIDISCA library preparation was performed as previously described [9,17]. Briefly, CSF samples were centrifuged and the supernatant was treated with TURBO™ DNase (Thermo Fisher Scientific, Waltham, MA, USA) to remove naked chromosomal or bacterial DNA. Nucleic acids were extracted using the Boom method [20], followed by reverse transcription with non-ribosomal random hexamers [21] and second-strand synthesis. DNA was digested with MseI (T^TAA; New England Biolabs, Ipswich, MA, USA) and ligated to adapters containing a sample identifier sequence. The sample cannot be "over-digested" during this fragmentation step, because fragmentation depends entirely on the presence of restriction enzyme recognition sites rather than on the duration of digestion. Ligation to an adapter destroys the recognition site (the junction sequence becomes TTAT), whereas ligation of two DNA fragments to each other restores it, allowing re-digestion. Next, small DNA fragments were removed with AMPure XP beads (Beckman Coulter, Brea, CA, USA) prior to a 28-cycle PCR with adapter-annealing primers. Small and large size selection was performed with AMPure XP beads to select DNA strands of 100 to 400 nucleotides. Libraries were analyzed with the Bioanalyzer (High Sensitivity Kit, Agilent Genomics, Santa Clara, CA, USA) and Qubit (dsDNA HS Assay Kit, Thermo Fisher Scientific) instruments to determine DNA length and concentration, respectively. Seventy sample libraries were pooled in equimolar concentrations; this number was chosen because it has worked well for other (non-CSF) sample types [16,17]. In total, 50 pmol DNA of the pool was clonally amplified on beads using the Ion Chef System (Thermo Fisher Scientific), and sequencing was performed on the Ion PGM™ System (Thermo Fisher Scientific) with the Ion 316 Chip (400 bp read length and 2 million sequences per run).
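Equimolar pooling requires converting each library's Qubit mass concentration and Bioanalyzer mean fragment length into a molar concentration. The following is a minimal sketch, assuming the commonly used approximation of 660 g/mol per double-stranded base pair and a hypothetical fixed target amount per library (the per-library amount is not stated here). `Library`, `molarity_nm`, and `pooling_volumes` are illustrative names, not part of the VIDISCA workflow.

```python
from dataclasses import dataclass

# Assumption: average molar mass of one double-stranded DNA base pair.
BP_MASS_G_PER_MOL = 660.0


@dataclass
class Library:
    """Hypothetical record of one barcoded library."""
    sample_id: str
    conc_ng_per_ul: float  # Qubit dsDNA HS reading
    mean_length_bp: float  # Bioanalyzer mean fragment length


def molarity_nm(lib: Library) -> float:
    """Molar concentration in nM, which equals fmol/uL.

    ng/uL -> g/L is a factor 1e-3 and mol/L -> nM a factor 1e9, net 1e6.
    """
    return lib.conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * lib.mean_length_bp)


def pooling_volumes(libs: list[Library], fmol_per_library: float) -> dict[str, float]:
    """Volume (uL) of each library that contributes the same molar amount."""
    volumes = {}
    for lib in libs:
        nm = molarity_nm(lib)
        if nm <= 0:
            raise ValueError(f"{lib.sample_id}: non-positive concentration")
        volumes[lib.sample_id] = fmol_per_library / nm  # uL = fmol / (fmol/uL)
    return volumes
```

For instance, a 300 bp library at 1.98 ng/µL is about 10 nM, so contributing a hypothetical 50 fmol takes about 5 µL; a library at half that concentration needs about 10 µL.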
The method for the DNase-free VIDISCA library preparation omitted the TURBO™ DNase step.
All VIDISCA-NGS reads of at least 45 nucleotides were translated into protein sequences and aligned to a local database of the NCBI eukaryotic viral Identical Protein Groups (downloaded March 2018) using UBLAST [22], the VIDISCA bioinformatics workflow [23], and an online metagenomic profiler (Taxonomer) [24] to identify probable viral reads and classify background sequences. Probable viral reads were confirmed when the original VIDISCA-NGS read could be aligned to a reference sequence of the virus with at least 80% nucleotide identity using CodonCode Aligner (version 6.0.2), and each alignment was manually inspected. Samples were considered VIDISCA-NGS positive when at least one viral read could be identified. The number of reads aligned to the reference sequence in CodonCode Aligner was taken as the number of viral reads per sample. VIDISCA-NGS analysis was performed blind to qPCR results to avoid bias. All statistical analyses were performed in R (version 3.5.1), and graphs were plotted using the R package ggplot2 (version 3.1.0).
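The read-level decision rules in this paragraph (minimum read length of 45 nucleotides, at least 80% nucleotide identity, and at least one viral read per positive sample) can be written as a short sketch. It assumes the pairwise alignment has already been produced elsewhere (in the study, CodonCode Aligner followed by manual inspection, which this code does not replace). It also assumes that identity is computed over gap-free alignment columns. That gap convention is an assumption, not a stated definition. All function names are hypothetical.

```python
MIN_READ_LENGTH = 45    # nt; shorter reads are not analysed
MIN_NT_IDENTITY = 0.80  # identity required to confirm a probable viral read


def passes_length_filter(read: str) -> bool:
    """True if the raw read (alignment gaps removed) is long enough."""
    return len(read.replace("-", "")) >= MIN_READ_LENGTH


def nucleotide_identity(aligned_read: str, aligned_ref: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap.

    Assumption: gap columns are excluded; other identity conventions exist.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_read.upper(), aligned_ref.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    return sum(a == b for a, b in columns) / len(columns)


def count_confirmed_reads(alignments: list[tuple[str, str]]) -> int:
    """Count (aligned read, aligned reference) pairs passing both rules."""
    return sum(
        1
        for read, ref in alignments
        if passes_length_filter(read)
        and nucleotide_identity(read, ref) >= MIN_NT_IDENTITY
    )


def is_vidisca_positive(n_viral_reads: int) -> bool:
    """A sample is called positive when at least one viral read is found."""
    return n_viral_reads >= 1
```

The count returned by `count_confirmed_reads` corresponds to the "number of viral reads per sample" used in the analyses that follow, under the stated assumptions.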

RNA Virus Detection by VIDISCA-NGS
Six samples were positive for enterovirus and eight for HIV-1 by VIDISCA-NGS, all of which were also qPCR positive (Figure 1). The RNA virus concentration in the VIDISCA-NGS positive samples ranged from 1.07 × 10² to 8.64 × 10⁵ RNA copies/mL (median: 8.63 × 10³ RNA copies/mL). Two samples positive for enterovirus and two for HIV-1 by qPCR were missed by VIDISCA-NGS, with viral loads ranging from 9.40 × 10² to 1.05 × 10⁴ RNA copies/mL (median: 2.54 × 10³ RNA copies/mL). To exclude the possibility that competition from background nucleic acids or other viruses hampered virus detection, we assessed whether co-infecting pathogens or large quantities of host genomic background had competed with viral sequences in the four VIDISCA-NGS negative samples. The background sequence profile of the negative samples was similar to that of the positive samples, indicating that no major sequence competition was present (Figure 2). Next, we determined whether the sequencing depth of the four negative samples, combined with their low viral load, may have been insufficient. All four missed samples had fewer than 10,000 sequence reads and a viral load below 2 × 10⁴ copies/mL, as shown in the lower left quadrant of Figure 1. Overall, this quadrant contained nine samples, of which five were positive and four negative by VIDISCA-NGS. The five positive samples had only one (n = 4) or two (n = 1) reads mapped to the detected RNA virus. These small numbers of viral reads suggest that such samples (low viral load combined with low sequencing depth) were at the detection limit of VIDISCA-NGS. Samples with a similarly low viral load but a higher sequencing depth (upper left quadrant of Figure 1) had, on average, more than five viral reads per sample. Moreover, sequencing depth correlated with the number of viral reads across all samples below 10⁴ RNA copies/mL (rho = 0.64, p = 0.02, Spearman's rank correlation test).
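Spearman's rank correlation, used above to relate sequencing depth to viral read number, is the Pearson correlation of the two variables' ranks. The sketch below computes rho only, giving tied values their average rank. It does not compute the p-value. The study's statistics were done in R, where `cor.test(x, y, method = "spearman")` is a common way to obtain both, although the exact call used is not stated. Function names are illustrative.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

Applied to per-sample read totals and viral read counts for the samples below 10⁴ RNA copies/mL, this gives the type of rho reported above. The underlying per-sample values are not reproduced in this text.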

DNA Virus Detection by VIDISCA-NGS
Only one sample was VIDISCA-NGS positive for a herpesvirus (VZV); it was also qPCR positive, at a concentration of 9.29 × 10⁷ DNA copies/mL. Of the samples that remained herpesvirus negative by VIDISCA-NGS, 33 were positive for at least one herpesvirus by qPCR (median: 9.01 × 10³ DNA copies/mL; range: 5.28 × 10³ to 1.62 × 10⁷ DNA copies/mL). Given this poor performance, we hypothesized that our library preparation method, which relies on a specific restriction enzyme, may have hampered herpesvirus detection. We therefore examined the number of putative VIDISCA-NGS fragments in the human herpesvirus genomes, that is, the number of unique genomic fragments that can theoretically be detected by VIDISCA-NGS based on the locations of MseI recognition sites and the resulting fragment lengths. All human herpesvirus genomes have at least 16 putative VIDISCA fragments (Table 1). By comparison, the enterovirus and HIV-1 genomes produced a nearly equal number of fragments and were detected at a high success rate, as described above.
[Figure legend fragment: … indicates human mitochondrial or genomic background, "Bacterial" indicates prokaryotic background, "Ambiguous" represents sequences with simultaneous hits to eukaryotes and prokaryotes, and "Unknown" represents sequences that do not match any reference sequence.]
Genes 2019, 10, 332
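The count of putative VIDISCA-NGS fragments can be approximated by an in silico MseI digest. The following is a minimal sketch under several assumptions:

- the genome is linear;
- only fragments bounded by an MseI cut on both sides are counted;
- identical fragment sequences are counted once;
- the 100 to 400 nucleotide size-selection window from Materials and Methods is used as the length filter, ignoring adapter length.

The exact criteria behind Table 1 may differ. Because TTAA is palindromic, scanning one strand finds every cut site.

```python
import re

MSEI_SITE = "TTAA"  # MseI recognition site; cuts T^TAA on both strands
CUT_OFFSET = 1      # the cut falls after the first T of the site


def msei_cut_positions(genome: str) -> list[int]:
    """0-based top-strand cut positions in a linear sequence.

    TTAA cannot overlap itself, so non-overlapping matching finds every site.
    """
    return [m.start() + CUT_OFFSET for m in re.finditer(MSEI_SITE, genome.upper())]


def putative_fragment_count(genome: str, min_len: int = 100, max_len: int = 400) -> int:
    """Distinct internal MseI fragments with length in [min_len, max_len]."""
    seq = genome.upper()
    cuts = msei_cut_positions(seq)
    fragments = {
        seq[start:end]
        for start, end in zip(cuts, cuts[1:])
        if min_len <= end - start <= max_len
    }
    return len(fragments)
```

Running `putative_fragment_count` on a herpesvirus, enterovirus, or HIV-1 reference sequence (loaded from a FASTA file, not shown) would give a comparable, assumption-dependent fragment count.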
Next, we hypothesized that the DNase treatment may have hampered the detection of herpesvirus DNA. The DNase treatment is performed prior to nucleic acid extraction to remove naked chromosomal and bacterial DNA. Viral genomic DNA is assumed to be protected from DNase by the virus particle; however, non-encapsidated viral DNA will also be degraded. We therefore repeated the library preparation for all 45 CSF samples, this time without a DNase treatment.

Virus Detection by DNase-Free VIDISCA-NGS
With the DNase-free VIDISCA-NGS, only eight samples contained sequences of an RNA virus (six HIV-1 and two enterovirus) (Table 2), indicating that background DNA seriously hampered the detection of RNA viruses. On the other hand, detection of herpesviruses greatly increased. Without a DNase treatment, 11 samples became VIDISCA-NGS positive: four for HSV-1/2, five for VZV, and two for CMV (Figure 3). The viral load of the herpesvirus-positive samples in DNase-free VIDISCA-NGS was higher (median: 1.04 × 10⁵ DNA copies/mL) than that of the negative samples (median: 4.42 × 10³ DNA copies/mL; p = 0.00009, Mann-Whitney U test). This association between viral load and VIDISCA-NGS detection became clearer when 10⁴ DNA copies/mL was used as a threshold: 11 of 18 samples with >10⁴ DNA copies/mL by qPCR were also positive by VIDISCA-NGS, but none below this threshold.
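The comparison above has two parts: a Mann-Whitney U test on the viral loads of VIDISCA-NGS positive versus negative herpesvirus samples, and a split of detection rates at 10⁴ DNA copies/mL. The sketch below computes U with the normal approximation, using a tie-corrected variance and a continuity correction. This is intended to follow the large-sample branch of R's `wilcox.test`. For small samples without ties, R reports an exact p-value instead, so results can differ from the value reported here. Function names and the test inputs are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for group x, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Variance of U, corrected for tied values.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = u - n1 * n2 / 2
    if var <= 0 or diff == 0:
        return u, 1.0
    # Continuity correction: shrink |diff| by 0.5 before standardising.
    z = (diff - math.copysign(0.5, diff)) / math.sqrt(var)
    return u, min(1.0, math.erfc(abs(z) / math.sqrt(2)))


def detection_by_threshold(
    loads: list[float], detected: list[bool], threshold: float = 1e4
) -> tuple[tuple[int, int], tuple[int, int]]:
    """(detected, total) above the threshold, then at or below it."""
    above = [d for load, d in zip(loads, detected) if load > threshold]
    below = [d for load, d in zip(loads, detected) if load <= threshold]
    return (sum(above), len(above)), (sum(below), len(below))
```

With the study's per-sample data, `detection_by_threshold` would return (11, 18) for the samples above 10⁴ DNA copies/mL, as stated in the text. The individual loads are not listed here.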
[Figure legend fragment: On the x-axis, the viral load in CSF is displayed; on the y-axis, the total number of sequence reads.]

Effect of a DNase Treatment on Virus Detection by VIDISCA-NGS
We identified several co-infecting DNA viruses (torque teno virus (TTV), n = 5; human papillomavirus (HPV), n = 5; and hepatitis B virus (HBV), n = 1) that were not included in the routine diagnostics of the CSF samples but were identified by VIDISCA-NGS (n = 11). Based on our observations for herpesviruses, we hypothesized that more non-herpes DNA viruses would be detected under the DNase-free condition. Surprisingly, no additional non-herpes DNA viruses were identified using the DNase-free method. On the contrary, of the 11 samples in which regular VIDISCA-NGS detected a non-herpes DNA virus, only four were positive without a DNase treatment (Figure 4).
To assess the overall effect of the DNase treatment, we determined the ratio of viral reads, adjusted for sequencing depth, between the two treatment arms for all viruses identified by VIDISCA-NGS in this study (Figure 5). All herpesviruses had substantially more, or roughly equal numbers of, viral reads in the DNase-free condition. In contrast, the opposite was true for non-herpes DNA viruses and RNA viruses.
Figure 5. Effect of DNase on the detection of RNA and DNA viruses by VIDISCA-NGS in CSF. The viral read ratio (x-axis) is calculated as the ratio between the number of viral reads for samples with and without a DNase treatment, adjusted for sequencing depth. Samples with a ratio >1 favor regular library preparation, whereas samples with a ratio <1 favor DNase-free library preparation. Green dots: non-herpes DNA viruses; orange diamonds: herpesviruses; blue triangles: RNA viruses. The y-axis displays the viral species.
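The depth-adjusted viral read ratio in Figure 5 can be expressed as the fraction of viral reads with DNase treatment divided by the fraction without it, so that values above 1 favor the regular protocol. This is a minimal sketch, assuming that "adjusted for sequencing depth" means dividing viral reads by total reads in each arm. The handling of zero counts is not described, so returning infinity (or NaN when both arms have no viral reads) is a hypothetical choice.

```python
import math


def depth_adjusted_ratio(
    viral_dnase: int, total_dnase: int, viral_free: int, total_free: int
) -> float:
    """Viral-read fraction with DNase divided by the DNase-free fraction.

    A value > 1 favours the regular protocol; < 1 favours the DNase-free one.
    """
    if total_dnase <= 0 or total_free <= 0:
        raise ValueError("total read counts must be positive")
    frac_dnase = viral_dnase / total_dnase
    frac_free = viral_free / total_free
    if frac_free == 0:
        # Hypothetical convention when the DNase-free arm has no viral reads.
        return math.inf if frac_dnase > 0 else math.nan
    return frac_dnase / frac_free
```

For example, 10 viral reads out of 10,000 with DNase versus 5 out of 20,000 without gives a ratio of 4, favoring the regular protocol.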

Discussion
Metagenomic assays have the potential to benefit the diagnosis of CNS infections. To this end, they need to meet certain prerequisites: besides being broad (preferably detecting all viruses), an assay should be fast, sensitive, and affordable. VIDISCA-NGS is a unique method for viral metagenomics that requires relatively limited sequencing depth and allows multiplexing, reducing the cost and runtime per sample [23]. As limited sequencing depth, multiplexing, and speed may come at the expense of sensitivity, we evaluated the performance of VIDISCA-NGS on 45 clinical CSF samples containing viruses quantified by conventional diagnostics (qPCR). VIDISCA-NGS detected an RNA virus in all medium to high viral load samples (>2 × 10⁴ RNA copies/mL) and in most (67%) of the low viral load samples. One VIDISCA-NGS positive HIV-1 sample had only 1.07 × 10² RNA copies/mL, demonstrating the capability to detect even very low viral loads.
Metagenomics has been used to detect novel or unexpected viruses in CSF in several studies [7][8][9][10], but only a few studies have evaluated its performance. Two studies investigated the limit of detection using dilutions of HIV-1 spiked into CSF. One used the Ribo-SPIA pipeline [25]; the other used a tailor-made protocol, including Nextera, for fragmentation and amplification [26,27]. Both studies used >5 million reads per sample and found a limit of detection of ≈10² RNA copies/mL for HIV-1, comparable to that of VIDISCA-NGS with 10,000 reads.
The vulnerability of herpesviruses to DNase is not unexpected. Boom et al. found that CMV DNA in serum and plasma is highly fragmented and susceptible to DNases [33]. Similarly, Perlejewski et al. described a four-fold decrease in HSV-1 reads when a DNase treatment was used for metagenomics on CSF [34]. Our study extends this knowledge by showing that the vulnerability to DNase also applies to the other herpesviruses. This vulnerability implies that the performance of metagenomic assays should not be evaluated on spiked samples: herpesvirus culture harvests contain infectious virions with non-fragmented DNA [33,35], whereas herpesvirus in cell-free clinical material is non-infectious and, as mentioned above, contains highly fragmented DNA [33,36]. The only two studies that examined the performance of a metagenomic assay for herpesvirus detection used virus culture harvests and found low limits of detection (≈10¹ and 10³ DNA copies/mL for CMV and HSV-1, respectively) [25,26]. Caution is warranted when translating these findings to a clinical setting, as virus culture harvests, especially of herpesviruses, do not accurately represent clinical material.
Herpesviruses have large DNA genomes and use rolling-circle amplification to produce head-to-tail concatemers of progeny virus [37]. During the lytic replication phase, large amounts of non-infectious naked progeny virus are released from the cell and may enter the CSF if replication occurs in the CNS compartment. Because of the high genome copy number and the generally low DNase activity in CSF [38], degradation may take considerable time. Naked herpesvirus DNA could thus persist in CSF for an extended period, even after the local infection has ceased. In theory, naked DNA could also persist for other DNA viruses, such as HPV and TTV, which use replication strategies similar to those of herpesviruses. However, detection of these DNA viruses by VIDISCA-NGS was not hampered by a DNase treatment (Figure 4), indicating that their viral DNA was part of intact virions.
Without amplification, the nucleic acid yield from CSF is generally too low for effective NGS library preparation for metagenomics [39]. For that reason, VIDISCA-NGS includes an amplification step to increase the number of viral genomic fragments from CSF. We previously found that viruses at concentrations >10⁴ copies/mL were detected in nasopharyngeal swabs when 5000 or more sequence reads were generated per sample [17]. Since then, we have used this number as a threshold to ensure sufficient sequencing depth for virus detection. Our current results suggest that this threshold may need to be increased for CSF: all RNA virus samples missed by VIDISCA-NGS had fewer than 10,000 reads, and a strong correlation between sequencing depth and the number of viral reads was observed. Increasing the sequencing depth could therefore improve the detection of low-load RNA viruses. We therefore recommend generating at least 10,000 reads per sample.
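The recommended minimum of 10,000 reads per sample interacts with multiplexing. With the nominal output of 2 million reads per run quoted in Materials and Methods, 70 samples would receive on average about 28,600 reads each. Because reads are not evenly distributed across samples, individual samples can still fall below the threshold. The sketch below shows a per-sample depth check and the ideal-case upper bound on multiplexing. The names are hypothetical, and real runs rarely reach nominal output or a perfectly even distribution.

```python
MIN_READS_PER_SAMPLE = 10_000      # minimum recommended above for CSF
NOMINAL_READS_PER_RUN = 2_000_000  # nominal output quoted for the chip used


def samples_below_depth(
    read_counts: dict[str, int], min_reads: int = MIN_READS_PER_SAMPLE
) -> list[str]:
    """Sample IDs whose read count falls below the recommended minimum."""
    return sorted(s for s, n in read_counts.items() if n < min_reads)


def mean_reads_per_sample(n_samples: int, run_reads: int = NOMINAL_READS_PER_RUN) -> float:
    """Average reads per sample if the run output were shared evenly."""
    return run_reads / n_samples


def max_samples_per_run(
    run_reads: int = NOMINAL_READS_PER_RUN, min_reads: int = MIN_READS_PER_SAMPLE
) -> int:
    """Upper bound on multiplexing, assuming a perfectly even read distribution."""
    return run_reads // min_reads
```

Samples flagged by `samples_below_depth` would be candidates for re-sequencing before a negative result is reported.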
In the current study, we multiplexed 70 samples per VIDISCA-NGS run. Although it is uncommon for many patients with encephalitis to present at the same time, this method could be of substantial benefit in outbreaks [40] and in research settings where large patient cohorts must be screened simultaneously. Because the performance of VIDISCA-NGS remains lower than that of qPCR, especially for the detection of herpesviruses, VIDISCA-NGS cannot replace conventional diagnostics. Nonetheless, we suggest using standard VIDISCA-NGS (including a DNase treatment) in parallel with conventional diagnostics, as it provides a cheap, low-input, and sensitive method to detect known, rare, and novel viruses in CSF.