Comparison of Nucleic Acid Extraction Methods for a Viral Metagenomics Analysis of Respiratory Viruses

Viral metagenomics next-generation sequencing (mNGS) is increasingly being used to characterize the human virome. The impact of viral nucleic acid extraction on virome profiling has been poorly studied. Here, we aimed to compare the sensitivity, sample cross-contamination, and reagent contamination of three extraction methods used for viral mNGS: two automated platforms (eMAG; MagNA Pure 24, MP24) and the manual QIAamp Viral RNA Mini Kit (QIAamp). Clinical respiratory samples (positive for Respiratory Syncytial Virus or Herpes Simplex Virus), one mock sample (including five viruses isolated from respiratory samples), and a no-template control (NTC) were extracted and processed through an mNGS workflow. QIAamp yielded a lower proportion of viral reads for both clinical and mock samples. The sample cross-contamination was higher when using MP24, with up to 36.09% of the viral reads mapping to mock viruses in the NTC (vs. 1.53% and 1.45% for eMAG and QIAamp, respectively). The highest number of viral reads mapping to bacteriophages in the NTC was found with QIAamp, suggesting reagent contamination. Our results highlight the importance of the extraction method choice for accurate virome characterization.

In particular, the extraction of viral nucleic acids is a crucial step in the molecular detection of viruses from clinical samples [23]. While there are many manual and automatic extraction methods available, it is important to choose the most sensitive and reliable one for mNGS. Numerous studies evaluating different extraction platforms in terms of their viral qPCR performance have found that the choice of extraction platform has a major impact on the reliability of the diagnostic results [24][25][26][27][28][29]. Furthermore, nucleic acid extraction methods can also impact bacteriome profiles [30][31][32] as well as the detection of particular viruses with mNGS [16,23,27,29].
Other potential issues related to extraction methods are sample cross-contamination (contamination from one sample to another) [24,33,34] and contamination by sequences present in the environment [32,34] or in the molecular biology reagents (referred to as the kitome) [35][36][37]. In viral mNGS studies, these two aspects constitute a major concern and must be precisely evaluated [38][39][40]. The impact of nucleic acid extraction methods on human virome characterization, kitome, and cross-contamination has thus far been poorly studied [41,42].
The aim of this study was to compare two automated extraction platforms commonly used in diagnostic laboratories, the eMAG (bioMérieux, Marcy-l'Étoile, France) and the MagNA Pure 24 (MP24) (Roche, Basel, Switzerland), and one manual extraction kit, the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany), which is among the most popular methods used in research laboratories. The performance of each extraction method was evaluated in terms of (1) its ability to detect different DNA and RNA viruses in one mock sample and in clinical samples, (2) its sample cross-contamination rate, and (3) the detection of the kitome.

Design of the Mock Virome
The mock virome included known concentrations of five viruses isolated from respiratory samples (Table 1). These viruses were selected as representatives of a wide range of virus characteristics, such as different virion sizes (ranging from 30 to 300 nm), the presence or absence of an envelope, different genome lengths (ranging from 7 to 150 kb), different genome types (dsDNA, ssRNA), and different genome compositions (linear, segmented). All the viruses were provided by the virology laboratory at the university hospital of Lyon (Hospices Civils de Lyon). This mix contained the cell culture supernatants of adenovirus 31 (AdV), respiratory syncytial virus A (RSV-A), herpes simplex virus 1 (HSV-1), influenza A virus, and rhinovirus. For each virus, clinical samples obtained from hospitalized patients were cultured with the appropriate cell line and media, and the viral supernatant was then collected (Table 2). Using the Ct values obtained by semi-quantitative real-time PCR assays (r-gene, bioMérieux, Marcy-l'Étoile, France), a mix with an identical Ct value for each virus was prepared. Individual 250 µL aliquots were prepared in triplicate for each extraction method to be evaluated (9 aliquots in total). Aliquots were stored at −80 °C (Figure 1).

Table 1. List of the five selected viruses included in the mock virome and the clinical respiratory samples. The mock virome consists of known concentrations of five viruses isolated from respiratory samples, selected as representatives of a wide range of virus characteristics (virion size, the presence or absence of an envelope, genome length, genome type (dsDNA, ssRNA), and genome composition (linear, segmented)). All the viruses were provided by the virology laboratory (Hospices Civils de Lyon). Human clinical respiratory samples were obtained from hospitalized patients. dsDNA: double-stranded DNA; ssRNA: single-stranded RNA.

Figure 1.
Overview of the study design. A mock virome containing five viruses isolated from respiratory samples representative of a wide range of virus characteristics (adenovirus 31, respiratory syncytial virus A, herpes simplex virus 1, influenza A virus, and rhinovirus) was prepared. Human clinical samples were obtained from hospitalized patients (one bronchoalveolar lavage positive for herpes simplex virus 1 (HSV-1) and one nasopharyngeal aspiration positive for respiratory syncytial virus A (RSV-A)). No-template controls (NTCs) and transport medium samples were included in the process to control for sample cross-contamination and kitome contamination. For the automatic extractors, NTCs were interspersed between the samples of each batch (n = 3 per batch), whereas for the manual method only one NTC was included per batch (n = 1). To assess the reliability and reproducibility of the experimental results, the extractions and next-generation sequencing (NGS) workflow were set up in triplicate (for the mock and respiratory samples), using the same amount of sample input. Finally, libraries were sequenced in the same run with Illumina NextSeq 500™ using a 2 × 150 paired-end (PE) high-output flow cell.

Sample Collection
Two additional patient samples (one positive for a DNA virus and one positive for an RNA virus) that were initially sent to our laboratory for routine viral diagnosis were also selected (Table 1). These clinical samples were a bronchoalveolar lavage (BAL) positive for HSV-1 and a nasopharyngeal aspiration (NPA) positive for RSV-A. These samples were stored at +4 °C for initial diagnosis and then diluted in transport medium (MEM medium + 1% L-glutamine + 1% fetal bovine serum + 2% Hepes) in order to obtain a sufficient volume for all the tests (up to 2.3 mL). Then, 250 µL aliquots were prepared in triplicate for each extraction method to be evaluated (9 replicate samples in total). The aliquots were stored at −80 °C (Figure 1).

Nucleic Acid Extraction
The selection of the manual kit and the automated platforms was based on their commercial and hospital availabilities. The three methods chosen were the NucliSENS eMAG platform (bioMérieux, Marcy-l'Étoile, France), the MagNA Pure 24 platform (Roche, Basel, Switzerland), and the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany), all of which are widely used in diagnostic laboratories. Frozen samples were thawed and homogenized by vortexing. Nucleic acids were extracted in parallel from 220 µL of the aliquot in triplicate for each kit, according to the manufacturer's instructions. For the NucliSENS eMAG platform, specific protocol B 2.0.1 was selected. For the MP24 platform, protocol pathogen 1000 was selected. In addition, in order to evaluate the cross-contamination during automated extractions, no-template controls (NTCs) were regularly interspersed between samples (i.e., 7 samples per series) (Figure 2). The QIAamp Viral RNA Mini Kit was used following the manufacturer's recommendation with the addition of an inert Linear Acrylamide (LA) carrier (Thermo Fisher Scientific, Waltham, MA, USA) to ensure the maximum recovery of nucleic acids. To assess the reproducibility of the experimental results, the extraction and NGS analysis were set up in triplicate (for mock and respiratory samples) using the same amount of sample input.

Figure 2.
Nucleic acids were extracted in triplicate from the same aliquot for each kit according to the manufacturer's instructions. In order to evaluate cross-contamination during the automated extractions, an NTC was regularly interspersed between samples (7 samples per series). To evaluate the kitome contamination, a transport medium sample was added in addition to the NTC. Here, each color represents a different extraction method. NTC: No Template Control; HSV: Herpes Simplex Virus; RSV: Respiratory Syncytial Virus; TM: Transport Medium.

Metagenomic Workflow
As previously described, we used an mNGS protocol optimized in our lab [43]. Briefly, after thawing, all the samples were supplemented with MS2 bacteriophage (Levivirus genus) from a commercial kit (MS2, IC1 RNA internal control; r-gene, bioMérieux) to check the validity of the process. Only the RNA internal control MS2 was added because, in contrast to a DNA internal control, it validates all the steps of our protocol (including the RT stage during amplification). A no-template control (NTC) consisting of RNase-free water was included to evaluate the contamination during the process. An additional negative control consisting of viral transport medium was added. For sample viral enrichment, a 3-step method was applied to 220 µL of vortexed sample spiked with MS2 (low-speed centrifugation, followed by the filtration of the supernatant and then Turbo DNase treatment), as described in detail in Bal et al. [43]. After viral enrichment, the total nucleic acids were extracted using one of the three methods selected for the study described above. After random nucleic acid amplification using modified whole transcriptome amplification (WTA2, Sigma-Aldrich, Darmstadt, Germany), libraries were prepared using the Nextera XT DNA Library kit and sequenced with Illumina NextSeq 500™ using a 2 × 150 PE high-output flow cell (Illumina, San Diego, CA, USA).

Bioinformatic Analysis
High-quality reads were retained using Trimmomatic in paired-end (PE) mode and were further analysed using Kraken 2, followed by Bracken for taxonomic abundance estimation [44]. A custom Kraken 2 database was used, made up of (1) the human, bacteria, fungi, archaea, and plasmid genome sequences provided with Kraken 2 and (2) an in-house viral genome database (viromedb, personal communication). The viromedb consists of complete viral genome sequences extracted from GenBank and RefSeq, processed with VecScreen and SeqClean to remove vector and adaptor sequences and with DustMasker to remove low-complexity sequences.
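As an illustration of how genus-level counts might be pulled out of a classifier report, the following minimal Python sketch parses a report in the standard six-column, tab-separated Kraken 2 layout (percentage, clade reads, directly assigned reads, rank code, taxonomy ID, indented name). It is not the pipeline used in this study: the function name, the example report, and its read counts and taxonomy IDs are hypothetical placeholders.

```python
from typing import Dict


def genus_read_counts(report_text: str) -> Dict[str, int]:
    """Return clade-level read counts for each genus (rank code 'G').

    Assumes the standard six-column, tab-separated Kraken 2 report layout;
    reports written with extra columns would need a different parser.
    """
    counts: Dict[str, int] = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])   # reads assigned to this taxon or below
        rank_code = fields[3].strip()  # 'G' marks genus; 'G1', 'G2' are sub-ranks
        name = fields[5].strip()       # names are indented by taxonomic depth
        if rank_code == "G":
            counts[name] = counts.get(name, 0) + clade_reads
    return counts


# Hypothetical report: the counts and taxonomy IDs are placeholders only.
EXAMPLE_REPORT = "\n".join([
    "90.00\t9000\t9000\tU\t0\tunclassified",
    "10.00\t1000\t0\tR\t1\troot",
    " 6.00\t600\t0\tD\t900000\t  Viruses",
    " 4.00\t400\t50\tG\t900001\t    Orthopneumovirus",
    " 2.00\t200\t200\tG\t900002\t    Simplexvirus",
])

if __name__ == "__main__":
    print(genus_read_counts(EXAMPLE_REPORT))
```

Using the clade-level column rather than the directly assigned column means that reads classified at species level are still counted towards their genus.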

Statistical Analyses
To compare the sensitivity of the three extraction methods, the mean proportion of total viral reads and specific viral abundance were determined. Kitome and sample cross-contamination were assessed by normalizing reads in reads per million (RPM), mapping the reads, and transforming them into log10(RPM). For the kitome assessment, a sample was considered to be positive for a particular virus when the log10(RPM) of this virus exceeded 1. Analyses were performed at the genus taxonomy level, except for the kitome, for which analyses were performed at the family taxonomy level. All the plots were constructed with ggplot2 and statistical analyses were performed with rstatix using R (version 3.6.1). For all the statistical tests, the Student's t-test was used.
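The normalization and positivity rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' R code: the function names are hypothetical, the read counts in the usage example are invented, and returning None for taxa with zero reads is an assumption, since the handling of zero counts is not specified here.

```python
import math
from typing import Dict, List, Optional


def reads_per_million(read_count: int, total_reads: int) -> float:
    """Normalize a raw read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return read_count * 1_000_000 / total_reads


def log10_rpm(read_count: int, total_reads: int) -> Optional[float]:
    """Return log10(RPM), or None when no read was assigned (assumption)."""
    if read_count == 0:
        return None
    return math.log10(reads_per_million(read_count, total_reads))


def positive_taxa(counts: Dict[str, int], total_reads: int,
                  threshold: float = 1.0) -> List[str]:
    """List taxa whose log10(RPM) strictly exceeds the threshold (here 1)."""
    positives = []
    for taxon, reads in counts.items():
        value = log10_rpm(reads, total_reads)
        if value is not None and value > threshold:
            positives.append(taxon)
    return sorted(positives)


if __name__ == "__main__":
    # Hypothetical family-level counts in one NTC with one million reads.
    example = {"Siphoviridae": 500, "Poxviridae": 5, "Microviridae": 0}
    print(positive_taxa(example, total_reads=1_000_000))
```

With a threshold of 1 on the log10 scale, a taxon is called positive only when it exceeds 10 RPM, so very low read counts in a control are not reported as contamination.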

Data Availability
The raw sequence data were deposited at SRA (PRJNA665071).

Ethics
Respiratory samples were collected for regular disease management during hospital stay and no additional samples were taken for this study. In accordance with the French legislation relating to this type of study, written informed consent from participants was not required for the use of de-identified collected clinical samples (bioethics law number 2004-800 of August 6, 2004). During their hospitalization in the Hospices Civils de Lyon (HCL), patients were made aware that their de-identified data, including clinical samples, may be used for research purposes, and that they could opt out if they objected to the use of their data.

Results
Three different extraction methods were evaluated: two automated extraction platforms (eMAG and MP24) and a manual extraction kit (QIAamp) (Figure 1).

Sensitivity for the Detection of the Targeted Viruses
To evaluate the sensitivity of each method, the mean proportion of viral reads out of the total reads generated was first compared (Figure 3).
To determine potential bias in the detection of DNA or RNA viruses, the relative abundance of Levivirus (Internal Quality Control) and targeted viruses in each triplicate were then compared (Figure 4). Levivirus was detected in all samples for MP24 and QIAamp, and in 8/9 samples for eMAG. For the targeted viruses, all the viruses were detected with all the extraction methods (Figure 4).
For the mock sample, a difference in the relative abundance of both the RNA and DNA viruses was noted when comparing the extraction methods.
Surprisingly, many reads associated with Dependoparvovirus, an ssDNA virus, were observed in the mock sample, particularly after the QIAamp extraction (29.4% with eMAG, 9.9% with MP24, and 61.4% with QIAamp).

Sample Cross-Contamination
The impact of the different extraction methods on sample cross-contamination was then evaluated from the NTC samples included between each sample during the extractions (Figure 2) by mapping the read count of viruses that were present in samples from the same batch: internal quality control (MS2, Levivirus) and targeted viruses (Adenovirus, Orthopneumovirus, Simplexvirus, Alphainfluenzavirus, Enterovirus, and Dependoparvovirus) (Figure 5). Levivirus was not found in any NTC extracted by eMAG and QIAamp but was found in 1/9 NTCs extracted by MP24 (MS2 log10(RPM) = 2.1).
Overall, the viral sample cross-contamination represented on average 0.002%, 0.107%, and 0.015% of the total reads generated from the NTC for the eMAG, MP24, and QIAamp extraction methods, respectively (corresponding to 1.53%, 36.09%, and 1.45% of the viral reads for the eMAG, MP24, and QIAamp extraction methods, respectively).
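The two ways of expressing cross-contamination (relative to all reads and relative to viral reads only) differ only in their denominator, which the following minimal Python sketch makes explicit. The function name and the read counts in the usage example are hypothetical and were chosen purely for illustration; they are not data from this study.

```python
from typing import Tuple


def contamination_percentages(contaminant_reads: int,
                              viral_reads: int,
                              total_reads: int) -> Tuple[float, float]:
    """Return cross-contamination as (% of total reads, % of viral reads).

    contaminant_reads: reads in an NTC mapping to viruses present in
                       other samples of the same batch
    viral_reads:       all reads in the NTC classified as viral
    total_reads:       all reads generated from the NTC
    """
    if total_reads <= 0 or viral_reads <= 0:
        raise ValueError("read totals must be positive")
    if not 0 <= contaminant_reads <= viral_reads <= total_reads:
        raise ValueError("expected contaminant <= viral <= total reads")
    pct_of_total = 100 * contaminant_reads / total_reads
    pct_of_viral = 100 * contaminant_reads / viral_reads
    return pct_of_total, pct_of_viral


if __name__ == "__main__":
    # Hypothetical NTC: few viral reads overall, a third of them contaminants.
    pct_total, pct_viral = contamination_percentages(
        contaminant_reads=1_080, viral_reads=3_000, total_reads=1_000_000)
    print(f"{pct_total:.3f}% of total reads, {pct_viral:.2f}% of viral reads")
```

Because an NTC contains very few viral reads, a contamination level that is negligible relative to the total read count can still account for a large share of the viral reads.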
Importantly, for MP24 we also noted that sample cross-contamination was more associated with a batch effect than with the position of the sample in the extraction cartridge. Hence, while NTC#1, #2, and #3 were all contaminated, there was less contamination in all NTCs from batch #2 than in the NTCs from the two other batches of extractions (Figure 5).

Kitome Assessment
The impact of the different extraction methods on the viral kitome contamination was then evaluated by detecting in the NTC and TM the presence of reads associated with viruses other than the targeted viruses. The log10(RPM) of the kitome was significantly higher with the QIAamp extraction compared to the eMAG and MP24 extractions (p < 0.01).
The viral kitome contamination generated from the NTC represented an average of 11.31 log10(RPM), 16.88 log10(RPM), and 70.77 log10(RPM) with the eMAG, MP24, and QIAamp extraction methods, respectively (Figure 6a).

Figure 7.
Viral family read counts associated with the kitome (the presence of reads associated with viruses other than the targeted viruses), normalized in log10(RPM), in each NTC and TM for the different extraction methods. A gradient of colors was defined from gray (no count) to blue (few counts) to red (highest counts). During the automated extractions, the NTCs were interspersed between samples (i.e., 3 NTCs per batch). For manual extraction, only one NTC was added. In addition to the NTC, transport medium was added. Analyses were performed at the family taxonomic level. A sample was considered to be positive for a particular virus when the log10(RPM) of this virus exceeded 1. HSV: herpes simplex virus; RSV: respiratory syncytial virus; NTC: no template control; TM: transport medium; RPM: reads per million.
The contaminants derived mainly from bacteriophage families. In particular, Siphoviridae was found in all three methods (ranging from 2.32 to 3.39 log10(RPM)), corresponding to 21.4%, 13.7%, and 4.8% of the total viral kitome reported with the eMAG, MP24, and QIAamp extraction methods, respectively.
Regarding the transport medium, the viral kitome contamination represented an average of 24.11 log10(RPM), 19.94 log10(RPM), and 72.45 log10(RPM) with the eMAG, MP24, and QIAamp extraction methods, respectively (Figure 6b). The same main viral families associated with the kitome were found for the two automatic extractors, with a majority of Poxviridae, while for the manual extractor the main family found was Siphoviridae (Figure 7b).
Overall, the kitome contamination was higher with the QIAamp extraction (p < 0.01). Siphoviridae bacteriophages were found in the three methods, while other contaminants such as Poxviridae were specifically found in the transport medium extracted by the automated methods.
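For readers who want to see what lies behind such a two-group comparison, the sketch below computes the test statistic of a two-sample Student's t-test with pooled variance. It is a minimal illustration, not the rstatix call used for the analyses: the function name and the replicate values are hypothetical, equal variances are assumed, and the p-value (which requires the t distribution with the returned degrees of freedom) is not computed.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t_statistic(a: Sequence[float],
                        b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a pooled-variance Student's t-test."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    return (mean_a - mean_b) / std_err, df


if __name__ == "__main__":
    # Hypothetical kitome log10(RPM) values for two methods (three NTCs each).
    method_a = [2.1, 2.4, 2.3]
    method_b = [3.2, 3.5, 3.1]
    t_value, dof = student_t_statistic(method_a, method_b)
    print(f"t = {t_value:.2f}, df = {dof}")
```

With only three replicates per group the test has few degrees of freedom, so a difference has to be large relative to the replicate scatter to reach significance.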

Discussion
In this study, we compared the performance of three extraction methods commonly used in clinical laboratories for viral mNGS analysis (eMAG, MagNA Pure 24, and QIAamp Viral RNA Mini Kit). The extractors yielding the highest proportion of viral reads were the two automatic extractors, eMAG and MP24. A previous study found that Qiagen kits tend to extract a high proportion of human nucleic acids, which could explain the lower viral proportion reported in the present study [29].
Despite this difference, all the viruses present in the mock or clinical samples could be detected with all the methods evaluated herein. Nonetheless, a difference in the relative abundance of RNA and DNA viruses was noted in the mock sample. The highest relative abundance of RNA viruses was reported with QIAamp, and the highest relative abundance of DNA viruses was reported with the eMAG and MP24 platforms. This bias should be taken into account during the interpretation of mNGS studies and underlines the importance of the extraction method choice, depending on the virus to be explored.
These differences could be due to the various properties of viruses, including the presence of an envelope, the type of genome, or the size of the virions. Yang et al. showed a better performance of RNA virus recovery with EasyMag (identical silica extraction technology and similar performance to that of eMAG [45]) compared to the MagNA Pure Compact. They explained this difference by a possible RNA degradation or by an ineffective binding of RNA to the magnetic beads [30]. Finally, the higher sensitivity of the QIAamp kit for the detection of RNA viruses might be explained by the kit having been initially intended for the extraction of RNA viruses. The detection of DNA viruses would still be possible through the capture of the RNA transcripts of DNA viruses. Meanwhile, a higher number of reads on RNA viruses for the QIAamp method was noted from the mock sample, which included both RNA and DNA viruses at equi-Ct; the supplementation of both DNA and RNA internal controls in NTCs would have been interesting for also evaluating the extraction bias in low-biomass samples. In addition to the differences related to the extraction methods, certain types of viruses or genomes can be preferentially amplified, as described for the Poliovirus by Lewandowska et al. [23].
Moreover, bioinformatics analysis of mNGS data can also impact the viral reads annotation. The presence of gaps at the level of taxonomic classification (genus) can bias the interpretation for these given taxonomic groups (notably phages). The choice of the viral reference database is therefore crucial in order to limit viral misclassifications or the lack of detection of new emerging viruses [46]. Moreover, short reads reduce the accuracy of viral read assignment. As there is currently no gold standard in de novo assembly software for virome assessment, extensive benchmarking will be necessary in order to choose the most suitable method for future studies. With the new advances in third-generation sequencing and the improvement in sequence quality generated by this approach, we expect that the use of longer reads will lead to an increase in specificity as compared to short-read technology.
Interestingly, we observed many reads associated with Dependoparvovirus only in the mock sample, especially with the QIAamp extraction. This can be explained by the presence of AdV in the mock, which might be associated with Adeno-associated dependoparvovirus [47] or with contaminants from the QIAamp column [35,48]. Our internal control (MS2, Levivirus) was detected for all replicates extracted with the three methods, except for two extracts with eMAG. As previously described, competition between the target viruses and MS2 might be observed during the process, leading to undetected MS2 [43].
To date, few studies have assessed the impact of different extraction methods on the performance of metagenomics. Klenner  and Paramyxovirus), and reported that the selection of the kits has only a minor impact on the yield of viral reads and the quantity of reads obtained by NGS [28]. However, this study only evaluated manual kits from the same manufacturer and with separate RNA and DNA extraction methods.
Conversely, several studies have highlighted the importance of the nucleic acid extraction protocol in producing high-quality extracts suitable for sequencing [23,27,29]. Lewandowska et al. compared the impact of three extraction methods (QIAamp Viral RNA mini Kit, PureLink Viral RNA/DNA Mini Kit, and automated NucliSENS EasyMAG) on the recovery of different viral genomes (adenovirus, poliovirus, HHV-4, influenza A virus). The EasyMAG extraction was more efficient for both RNA and DNA viruses, leading to a higher recovery of viral genomes. The mNGS results are highly susceptible to inaccurate conclusions resulting from the sequencing of contaminants [36]. In the present study, the viral contamination was higher with MP24 than with eMAG and QIAamp. Automated extraction platforms may lead to sample cross-contamination due to the generation of aerosols or robotic errors. Knepp et al. compared two automated extractors (BioRobot M48 instrument (Qiagen, Inc.) and MagNA Pure) and did not show contamination with the automated instruments [24]. However, they only investigated cross-contamination related to enterovirus (RNA virus), unlike our study, which evaluated a panel of RNA and DNA viruses.
The second source of contamination may come from the reagents (kitome) used throughout the process or from laboratory contaminants. The extraction method for which the kitome abundance was highest was the QIAamp. The main contaminants were Siphoviridae, Myoviridae, Microviridae, and Podoviridae, which is consistent with other studies that have reported similar findings with spin columns [34]. In order to monitor the kitome and avoid misinterpretation, it is important to implement negative controls at different steps of the process [38,42,49]. Here, we did not add any internal controls in the NTC in order to get the highest sensitivity in detecting the kitome (without using reads to sequence MS2). On the other hand, the internal control MS2 was added to the TMs (and was detected in all except one batch of the eMAG extraction) in order to estimate the potential contaminants present in the transport medium, as fetal bovine serum has previously been reported to contain DNA [50,51].
Furthermore, a computational approach for removing contaminants of viral origin should be developed, as previously described for bacteriome data [34,52].
Although the results show higher sample cross-contamination with the MP24 and higher kitome-related contamination with the QIAamp, other steps throughout sample processing can produce contamination. Here, we did not include an NTC at each step of the process to control for other sources of contamination. In addition, only a few respiratory samples were tested herein, and so further studies on a larger number of respiratory samples, as well as on other types of samples (stool, blood, and tissue) or other respiratory viruses (such as SARS-CoV-2, which has the largest human RNA virus genome), should be performed. While three commonly used extraction methods were evaluated in this study, it would be interesting to test others.

Conclusions
Our findings highlight the importance of extraction method choice for viral mNGS analysis. The eMAG platform yielded a higher proportion of viral reads, with a limited impact of reagents and sample cross-contamination compared to the QIAamp and MP24 extractors.

Funding: The eMAG® consumables and R-gene® kits necessary for this evaluation were provided by bioMérieux France. However, the data obtained during the evaluation were independently analyzed in the virology department of Lyon University Hospital, which possesses the entire final data bank. bioMérieux had no role in the study design, data collection and interpretation, or the decision to submit the work for publication.