Evaluation of Extraction Methods for Clinical Metagenomic Assay

(1) Background: Clinical metagenomics is a promising approach for identifying the etiological agents of infections of unknown cause. For the efficient detection of an unknown pathogen, the extraction method must be carefully selected to maximize the recovery of nucleic acids from different microorganisms. The aim of this study was to evaluate extraction methods that can isolate nucleic acids of good quality and quantity from different types of pathogens for use in clinical metagenomic identification. (2) Methods: A mock sample spiked with five different pathogens was used for the comparative evaluation of different commercial extraction kits. Extracted samples were subjected to library preparation and sequenced on a MiSeq. The extraction method selected on the basis of this comparative evaluation was then used to isolate the nucleic acids of all infectious agents in clinical respiratory samples with multiple infections. (3) Results: The protocol using the PowerViral® Environmental RNA/DNA Isolation Kit with a 5-min bead beating step achieved the best results with a low starting volume. The analysis of the tested clinical specimens showed that this protocol could successfully identify different types of pathogens. (4) Conclusions: The optimized extraction protocol in this study is recommended for clinical metagenomics applications in specimens with multiple infections from different taxa.


Introduction
Globally, infectious diseases are still the leading cause of human morbidity and mortality [1]. Respiratory infections are considered the third leading cause of death worldwide and the leading cause of death in developed countries, resulting in nearly 4.18 million deaths per year [2,3]. The main challenge in preventing and minimizing the burden of infectious disease is establishing a rapid and accurate laboratory method that can identify the etiological agent associated with an infection [4]. In clinical settings, a significant number of infectious diseases remain unidentified by currently available laboratory tests [5]. Approximately 30-70% of the pathogens associated with pneumonia, meningitis, and encephalitis are routinely unidentified in clinical laboratories [2,3,6]. The limited identification of a wide range of infectious agents that are
"Allprep_PowerViral_DNARNA" (Qiagen, Hilden, Germany); and (4) the DNA-RNA Pathogen Miniprep (Zymo Research, Irvine, CA, USA). Regardless of the volume of the starting material, the extracted samples were eluted in 50 µL of DNase-/RNase-free water, and each protocol was evaluated using two independent samples.
The MagNA Pure protocol yielded total nucleic acids and was the only automated extraction method in this study; it used the MagNA Pure Compact Instrument (Roche Diagnostics, Mannheim, Germany), which is based on magnetic bead technology. Briefly, a pretreatment step was performed using MagNA Pure Bacteria Lysis Buffer (BLB) (Roche Diagnostics, Mannheim, Germany) following the product instructions, with slight modifications to ensure complete lysis of the bacteria. First, 200 µL of the mock sample was mixed with 180 µL of BLB. Then, 10 µL of 100 mg/mL lysozyme was added, and the mixture was incubated at 37 °C for 30 min in a ThermoMixer (Eppendorf, Hamburg, Germany) with mixing at 700 rpm. After incubation, 20 µL of 10 mg/mL proteinase K was added, and the mixture was incubated at 65 °C for 10 min.
The treated samples were cooled on ice and then extracted with the MagNA Pure Compact NA Isolation Kit I. The eluted samples were labeled "MagNA".
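The pretreatment volumes above imply working enzyme concentrations that are not stated explicitly. The Python sketch below derives them, assuming that volumes are simply additive and that each enzyme's working concentration is taken in the total volume at the moment it is added; it is an illustrative calculation, not part of the published protocol.

```python
# Sequential additions in the MagNA pretreatment: (reagent, volume in µL, stock in mg/mL).
# Stock is None for components that carry no enzyme.
ADDITIONS = [
    ("mock sample", 200.0, None),
    ("MagNA Pure Bacteria Lysis Buffer", 180.0, None),
    ("lysozyme", 10.0, 100.0),
    ("proteinase K", 20.0, 10.0),
]


def working_concentrations(additions):
    """Return ({reagent: mg/mL at the moment it is added}, final volume in µL).

    Assumes volumes are additive (no evaporation or volume contraction).
    """
    total_ul = 0.0
    concentrations = {}
    for reagent, volume_ul, stock_mg_per_ml in additions:
        total_ul += volume_ul
        if stock_mg_per_ml is not None:
            # mass in µg = volume (µL) x stock (mg/mL); divided by current volume (µL) gives mg/mL
            concentrations[reagent] = stock_mg_per_ml * volume_ul / total_ul
    return concentrations, total_ul


concentrations, final_volume_ul = working_concentrations(ADDITIONS)
for reagent, conc in concentrations.items():
    print(f"{reagent}: {conc:.2f} mg/mL")
print(f"final volume: {final_volume_ul:.0f} µL")
```

Under these assumptions, lysozyme acts at about 2.56 mg/mL during the 37 °C step, and proteinase K at about 0.49 mg/mL in the final 410 µL.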

Direct-zol™ RNA Miniprep Plus Protocol
This protocol yielded RNA only. The mock sample (250 µL) was mixed with 750 µL of TRIzol™ LS Reagent (Ambion Life Technologies, Carlsbad, CA, USA) according to the manufacturer's instructions. The mixture was then applied directly to the column provided with the Direct-zol™ RNA Miniprep Plus Kit without phase separation, and the lysate was subjected to on-column DNA digestion according to the protocol. The eluted samples were labeled "Zol".

PowerViral® Environmental RNA/DNA Isolation Protocol
This protocol was intended for total nucleic acid extraction using the PowerViral® Environmental RNA/DNA Isolation Kit, which is a filter-based technique. The manufacturer's instructions were followed with a slight modification. Briefly, 200 µL of the mock sample was mixed with 600 µL of prewarmed PV1 buffer and 6 µL of β-mercaptoethanol (Sigma, Heidelberg, Germany). The mixture was added to ZR BashingBead Lysis Tubes containing beads of mixed sizes (0.5 mm and 0.1 mm) (Zymo Research, Irvine, CA, USA). The tube was placed in a TissueLyser (Qiagen, Hilden, Germany) and agitated at 30 Hz for 25 s, followed by a 5-s break and another 25 s of agitation. The resulting mixture was centrifuged at 4 °C (5430 R; Eppendorf, Hamburg, Germany), and the supernatant was processed according to the kit instructions. The eluted samples were labeled "MoBio".

ZymoBIOMICS™ DNA/RNA Miniprep Protocols
Three sample types were produced with the ZymoBIOMICS™ DNA/RNA Miniprep Kit, a filter-based kit that offers two procedures: "Copurification", which yields total nucleic acids, and "Parallel Purification", in which DNA and RNA are eluted separately.
As a step common to both procedures, bead beating was performed as described for the previous protocol on a mixture of 250 µL of the mock sample and 750 µL of DNA/RNA Shield. The manufacturer's protocol was then followed. Nucleic acids eluted with the "Parallel Purification" procedure were labeled "ZDNA" for the DNA samples and "ZRNA" for the parallel RNA samples, while those eluted with the "Copurification" procedure were labeled "Zymo".
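The four kits give rise to six sample labels, and which of them can feed the RNA or DNA library follows from what each eluate contains. The Python sketch below records that mapping as a small hypothetical data structure, using only the input volumes and procedures stated in this section; "TNA" (total nucleic acids) is a shorthand introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRun:
    label: str          # sample label used in the text
    procedure: str
    output: str         # "TNA" (total nucleic acids), "RNA" or "DNA"
    input_ul: int       # volume of mock sample used
    bead_beating: bool


RUNS = [
    ExtractionRun("MagNA", "MagNA Pure Compact NA Isolation Kit I", "TNA", 200, False),
    ExtractionRun("Zol", "Direct-zol RNA Miniprep Plus", "RNA", 250, False),
    ExtractionRun("MoBio", "PowerViral Environmental RNA/DNA Isolation Kit", "TNA", 200, True),
    ExtractionRun("Zymo", "ZymoBIOMICS DNA/RNA Miniprep, Copurification", "TNA", 250, True),
    ExtractionRun("ZDNA", "ZymoBIOMICS DNA/RNA Miniprep, Parallel Purification", "DNA", 250, True),
    ExtractionRun("ZRNA", "ZymoBIOMICS DNA/RNA Miniprep, Parallel Purification", "RNA", 250, True),
]


def eligible_labels(library_type):
    """Labels whose eluate contains the nucleic acid a library type needs."""
    return [run.label for run in RUNS if run.output in ("TNA", library_type)]


print("RNA library:", eligible_labels("RNA"))
print("DNA library:", eligible_labels("DNA"))
```

The resulting split (five labels for RNA, four for DNA) matches the sample sets reported in the RNA and DNA sequencing results.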

Nucleic Acid Quantification
Extracted nucleic acids were quantified with a Qubit 2.0 fluorometer (Invitrogen, Life Technologies, Carlsbad, CA, USA) using both the Qubit™ dsDNA HS Assay Kit and the Qubit™ RNA HS Assay Kit (Life Technologies, Eugene, OR, USA) following the manufacturer's instructions.

Library Preparation
The library preparation kit was selected according to the target nucleic acids, as described in the subsequent sections.

Preparation of the RNA Library
Libraries were prepared from the extracted RNA following the manufacturer's instructions for the KAPA RNA HyperPrep Kit for Illumina sequencing (KAPA Biosystems, Cape Town, South Africa), after the samples were diluted, if needed, to 100 ng in 10 µL using elution buffer EB (Qiagen, Hilden, Germany).

Preparation of the DNA Library
For DNA-Seq, libraries were prepared with the Nextera DNA Flex Library Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions, using at least 10 µL of each sample containing between 50 and 500 ng of DNA, as recommended for the kit.

MiSeq Sequencing
In both library preparation types, the concentration of each library was quantified using the Qubit™ dsDNA HS Assay Kit, and the average library size was estimated with the 2100 Bioanalyzer System (Agilent, Waldbronn, Germany) and an Agilent High Sensitivity DNA Kit (Agilent, Santa Clara, CA, USA). The final pooled library was loaded using the 300-cycle MiSeq Reagent Kit v2 (Illumina, Singapore), and the run was performed on a MiSeqDx platform (Illumina, San Diego, CA, USA) to generate paired-end reads.
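The text does not state how the Qubit concentrations and Bioanalyzer sizes were combined for pooling. A common approach, assumed here, converts each library to molarity using an average mass of about 660 g/mol per base pair of double-stranded DNA and then pools equimolar amounts. The Python sketch below illustrates this; the helper names and example numbers are hypothetical.

```python
DS_DNA_G_PER_MOL_PER_BP = 660.0  # common approximation for double-stranded DNA


def library_molarity_nm(conc_ng_per_ul, mean_size_bp):
    """Convert a dsDNA concentration (ng/µL) and mean fragment size (bp) to nM."""
    # ng/µL equals mg/L; mg/L divided by g/mol gives mmol/L; x 1e6 converts mmol/L to nM.
    return conc_ng_per_ul * 1e6 / (DS_DNA_G_PER_MOL_PER_BP * mean_size_bp)


def equimolar_volumes(molarities_nm, pool_nm, pool_ul):
    """Volume (µL) of each library so that each contributes pool_nm / n to a
    pool of pool_ul; the remaining volume is made up with buffer."""
    share_nm = pool_nm / len(molarities_nm)
    volumes = [share_nm * pool_ul / m for m in molarities_nm]
    if sum(volumes) > pool_ul:
        raise ValueError("libraries too dilute for the requested pool concentration")
    return volumes


molarity = library_molarity_nm(2.0, 500)  # hypothetical library: 2 ng/µL, 500 bp
print(f"{molarity:.2f} nM")
print(equimolar_volumes([4.0, 2.0], pool_nm=2.0, pool_ul=10.0))
```

A more dilute library simply needs a proportionally larger volume; the check raises an error when the libraries cannot reach the requested pool concentration at all.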

Analysis of NGS Data
The analysis was performed using GENEIOUS Prime software (version 2019.1; Biomatters Ltd., Auckland, New Zealand). First, paired reads were merged, and duplicates were removed using the Dedupe algorithm. Poor-quality sequences were trimmed from both ends with the BBDuk algorithm using an error probability of 0.05. A reference for each organism in the mock sample (Table 1) was loaded, and each sample was mapped sequentially with the standard GENEIOUS mapper: first against C. neoformans, after which the unused reads were mapped to K. pneumoniae, S. aureus, AdV, and, finally, ALKV. The NGS yields of the extraction assays were compared in two ways: (1) the number of reads mapped to each reference, expressed as reads per million, RPM = (No. of reads mapped / total No. of reads without duplicates) × 10^6; and (2) the coverage percentage (coverage %) of each microorganism's reference.
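The two comparison metrics, and the order-dependent assignment of reads during sequential mapping, can be sketched in Python as below. This is a simplification under stated assumptions: `read_hits` is a hypothetical precomputed map from read ID to the references a read aligns to, and crediting each read to the first matching reference in the mapping order only approximates how GENEIOUS passes unused reads on to the next reference.

```python
MAPPING_ORDER = ["C. neoformans", "K. pneumoniae", "S. aureus", "AdV", "ALKV"]


def reads_per_million(mapped_reads, total_reads_dedup):
    """RPM = mapped reads / total reads after deduplication x 10^6."""
    return mapped_reads * 1_000_000 / total_reads_dedup


def coverage_percent(covered_positions, reference_length):
    """Share of reference positions covered by at least one read, in percent."""
    return 100.0 * len(covered_positions) / reference_length


def sequential_assignment(read_hits, order=MAPPING_ORDER):
    """Credit each read to the first reference in `order` it aligns to, so a read
    used by an earlier reference is no longer available to later ones."""
    counts = {ref: 0 for ref in order}
    for hits in read_hits.values():
        for ref in order:
            if ref in hits:
                counts[ref] += 1
                break
    return counts


# Hypothetical hits: r1 aligns to two references, r3 to none.
hits = {"r1": {"K. pneumoniae", "S. aureus"}, "r2": {"C. neoformans"}, "r3": set()}
print(sequential_assignment(hits))
print(reads_per_million(500, 2_000_000))  # 250.0
```

Because the order matters, a read that aligns to both K. pneumoniae and S. aureus is credited only to K. pneumoniae, which is mapped first.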

Bead Beating Optimization
Bead beating was used in several of the protocols in this study. The bead beating step was optimized to improve the yield of difficult-to-lyse organisms, such as fungi and some Gram-positive bacteria. Because the samples could contain RNA viruses, which are easily degraded by the heat generated during beating, the process was performed in a cold room (4 °C). After different bead beating cycles were tested by real-time PCR with both C. neoformans and ALKV (see Appendix A), continuous bead beating for 5 min was chosen for subsequent NGS analysis using the PowerViral® Environmental RNA/DNA Isolation protocol (5-min MoBio samples) and the ZymoBIOMICS™ DNA/RNA Miniprep copurification protocol (5-min Zymo samples).

Statistical Analysis
Using SPSS Statistics (Subscriptions, IBM, Armonk, New York, United States), one-way ANOVA followed by Bonferroni's post hoc test was used to assess the statistical significance of differences between assays; differences were considered statistically significant at p ≤ 0.05.
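SPSS carried out these tests. To make explicit what one-way ANOVA followed by Bonferroni-adjusted pairwise comparisons computes, the stdlib-only Python sketch below derives the F statistic, pairwise t statistics based on the pooled within-group variance, and the Bonferroni adjustment of raw p-values. Converting F and t to p-values requires the F and t distributions (for example, from `scipy.stats`), which are omitted here, and the example data are made up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_within) for {group: [values]}."""
    values = [x for xs in groups.values() for x in xs]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
    ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, df_between, df_within, ms_within


def pairwise_t(groups, ms_within):
    """t statistic for every pair of groups, using the pooled ANOVA error term."""
    return {
        (a, b): (mean(groups[a]) - mean(groups[b]))
        / sqrt(ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        for a, b in combinations(groups, 2)
    }


def bonferroni(raw_p):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(raw_p)
    return {pair: min(1.0, p * m) for pair, p in raw_p.items()}


# Made-up duplicate values for three hypothetical assays
groups = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}
f_stat, df_b, df_w, ms_w = one_way_anova(groups)
t_stats = pairwise_t(groups, ms_w)
print(f_stat, df_b, df_w)
print(t_stats)
```

With three assays there are three pairwise comparisons, so each raw p-value is multiplied by 3 before being compared with the 0.05 threshold.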

Clinical Samples
Throat swabs were collected from patients admitted to King Abdulaziz University Hospital (KAUH) and routinely submitted to SIAU for testing with a panel of respiratory pathogens. Samples that tested positive for one or more respiratory pathogens were selected for the validation of the optimized extraction protocol under ethical approval number 290-17, dated 13 June 2017, from the Unit of Biomedical Ethics, King Abdulaziz University Hospital.

Analysis of Clinical Metagenomics
An established protocol (see Appendix B) was followed that targeted all microbes in a sample, regardless of their taxa or genome type. The generated reads were analyzed with the CosmosID bioinformatics platform online app (https://www.cosmosid.com/platform) (1.0, Rockville, MD, USA) against its databases of bacteria, viruses, fungi, and protists; the platform identifies unique and shared k-mers in the reference genomes and searches for matches in the queried metagenomic sample. Its filtration feature, which relies on internal statistical scores, separates confirmed organisms, which are likely to be present in the sample, from unconfirmed organisms, which need to be validated by another laboratory test. The frequency (f), which is the number of unique k-mers of a referenced organism found in the queried sample, was used for analysis.
The highest RNA concentrations were obtained with the extraction protocols that did not use bead beating. MagNA Pure produced the highest RNA yield (32.75 ng/µL), followed by Zol (18.65 ng/µL). The Zymo and ZRNA samples, which were both extracted with the same kit using different protocols, provided approximately the same yield, with an average of 12.5 ng/µL. Although bead beating was used for the MoBio samples, as for the Zymo and ZRNA samples, the RNA concentration of the MoBio samples was below the detection limit (<20 ng/mL). Therefore, according to the KAPA RNA HyperPrep protocol, 10 µL of the MoBio samples was used directly without dilution, and 14 cycles were used in the amplification step instead of the 6 cycles used for the other samples with higher concentrations.
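The input rule for the KAPA RNA libraries (100 ng in 10 µL, diluted with buffer EB when needed, used undiluted when the concentration is too low) can be expressed as a small Python helper. It is an illustrative sketch: the function name is hypothetical, and the concentrations are the averages reported above.

```python
TARGET_NG = 100.0   # RNA input per KAPA RNA HyperPrep library
INPUT_UL = 10.0     # fixed input volume


def rna_input_plan(conc_ng_per_ul):
    """Return (µL of sample, µL of buffer EB) giving TARGET_NG in INPUT_UL.

    Samples that are too dilute, or below the Qubit detection limit (None),
    are used undiluted at the full input volume.
    """
    if conc_ng_per_ul is None or conc_ng_per_ul * INPUT_UL <= TARGET_NG:
        return INPUT_UL, 0.0
    sample_ul = TARGET_NG / conc_ng_per_ul
    return sample_ul, INPUT_UL - sample_ul


for label, conc in {"MagNA": 32.75, "Zol": 18.65, "Zymo/ZRNA": 12.5, "MoBio": None}.items():
    sample_ul, eb_ul = rna_input_plan(conc)
    print(f"{label}: {sample_ul:.2f} µL sample + {eb_ul:.2f} µL EB")
```

The undiluted MoBio input is the case for which the text reports 14 rather than 6 amplification cycles.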

Sequencing RNA Targets
The average number of reads generated for each sample after merging the R1 and R2 reads was 3,822,832, ranging from 6,387,496 for the MoBio samples to 2,741,110 for the MagNA samples. After removing duplicates, the reads decreased by approximately 1% in the Zol and ZRNA samples, 7% in the MagNA and MoBio samples, and 20% in the Zymo samples.
For comparison between the protocols, Figure A1a in Appendix C shows the reads and reference coverage of the spiked pathogens for the duplicates of each RNA extraction protocol. Tables 2 and 3 show the average mapped reads and reference coverage, respectively, for each spiked pathogen in all extraction protocols. The highlighted cells contain the highest results without statistical significance between them.
For C. neoformans, Zymo and Zol had the highest RPM (90,684 RPM and 82,441 RPM, respectively), with no significant difference between them (p = 0.91), whereas the differences were significant (p ≤ 0.001) compared with MoBio (30,427 RPM) and MagNA (19,257 RPM). The highest coverage % was obtained for Zymo and ZRNA, at 0.09% and 0.08%, respectively, with no significant difference between them or compared with any other method (p ≥ 0.26).
For K. pneumoniae, the MagNA samples showed the highest RPM (784,302 RPM), with p ≤ 0.001 compared with the other methods. The highest reference coverage for K. pneumoniae was obtained by Zol (7.69%), followed by MoBio (7.20%), with no significant difference between them (p = 0.30), whereas the differences were significant compared with the rest of the extraction methods (p ≤ 0.05).
For S. aureus, ZRNA and MoBio showed the highest RPM (216,660 RPM and 210,541 RPM, respectively), with no significant difference between them (p = 1.00), but the differences were significant compared with both MagNA (46,263 RPM) and Zymo (143,569 RPM) (p ≤ 0.024). For reference coverage, MoBio had the highest coverage (64.10%), with a significant difference (p ≤ 0.001) compared with the rest of the extraction methods.
For AdV, the highest numbers of reads were obtained for MoBio (54,898 RPM) and Zymo (53,243 RPM); the difference between them was not significant (p = 1.00), but significant differences were found compared with the other methods (p ≤ 0.001). For ALKV, ZRNA and Zymo had the highest numbers of reads (12,945 RPM and 12,830 RPM, respectively), with no significant difference between them (p = 1.00); however, the differences were significant compared with the other methods (p ≤ 0.039). In all the extraction assays, the genomes of both AdV and ALKV were ≈100% covered regardless of the difference in RPM.
The highest DNA concentration was again obtained with the MagNA samples (14.1 ng/µL). The ZDNA samples had a higher concentration than the Zymo samples (13 ng/µL vs. 9.2 ng/µL), although both were extracted with the same kit using different protocols. The concentration was lowest in the MoBio samples (3.3 ng/µL), consistent with their low RNA concentration. For this reason, 17 µL of the MoBio samples was used for library preparation instead of the 10 µL used for the other DNA extraction protocols.
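The switch to 17 µL for the MoBio DNA can be checked against the 50 to 500 ng input range given for the Nextera DNA Flex kit. The short Python check below multiplies concentration by volume; it only reproduces arithmetic from the reported averages and is not a kit-specified calculation.

```python
MIN_NG, MAX_NG = 50.0, 500.0  # DNA input range stated for the library prep


def dna_input_ok(conc_ng_per_ul, volume_ul):
    """True if concentration x volume falls inside the input range."""
    return MIN_NG <= conc_ng_per_ul * volume_ul <= MAX_NG


def min_volume_ul(conc_ng_per_ul):
    """Smallest volume that reaches the lower input limit."""
    return MIN_NG / conc_ng_per_ul


for label, conc in {"MagNA": 14.1, "ZDNA": 13.0, "Zymo": 9.2, "MoBio": 3.3}.items():
    print(label, "10 µL ok:", dna_input_ok(conc, 10.0), f"min {min_volume_ul(conc):.1f} µL")
print("MoBio 17 µL ok:", dna_input_ok(3.3, 17.0))
```

At 3.3 ng/µL, 10 µL supplies only about 33 ng, whereas 17 µL supplies about 56 ng, inside the range; the minimum would be about 15.2 µL.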
On average, the number of merged paired reads of each sample was 3,920,271, with the maximum number of reads (6,140,864 reads) obtained for ZDNA samples and the minimum number of reads (140,716 reads) for MoBio samples. The duplicate reads did not exceed 0.2% of the generated reads in all protocols.
MoBio had the highest RPM for three of the four DNA pathogens in the mock sample: C. neoformans (601 RPM), with a significant difference (p ≤ 0.001) compared with the other methods; S. aureus (40,813 RPM), also with a significant difference (p ≤ 0.002) compared with the other methods; and K. pneumoniae (192,812 RPM), with no significant difference compared with the next highest result, i.e., MagNA (152,023 RPM; p = 0.07), but with a significant difference compared with the other protocols (p ≤ 0.004). Finally, for AdV, MagNA had the highest RPM (119,851 RPM), with a Figure A1b in the Appendix C). For the reference coverage %, ZDNA had the highest percentage for C. neoformans (0.48%), with a significant difference (p ≤ 0.026) over the other methods. For K. pneumoniae, MagNA, Zymo, and ZDNA showed the highest coverage, 13.51%, followed by MoBio, with 12.41%, with no significant differences between any of the extraction protocols (p ≥ 0.30). For S. aureus, Zymo and ZDNA covered almost the whole reference sequence (~99.9%), with a significant difference (p = 0.001) compared with MoBio, which covered only 23%. For AdV, all the extraction assays covered 100% of the AdV genome. As expected, no reads covered ALKV because it is an RNA virus (Table 5, Figure A1b in Appendix C). The highlighted cells contain the highest results without statistical significance between them.

Bead Beating Optimization
Because the extraction protocols that relied on bead beating in their lysis step showed promising results for both DNA and RNA extraction, we attempted to improve their NGS results, especially for C. neoformans. Bead beating of increasing duration, with or without 5-s breaks, was performed with a TissueLyser in a cold room (4 °C).
The protocol with 5 min of continuous bead beating gave the best results when assessed by real-time PCR (see Appendix A). The same mock sample (C. neoformans, K. pneumoniae, S. aureus, AdV, and ALKV) was extracted with the MoBio and Zymo protocols using a 5-min bead beating step and subjected to DNA-Seq (see Figure A1c in Appendix C).
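The two bead beating programs differ mainly in total agitation time. The Python sketch below tallies agitation and elapsed time per program and, assuming the TissueLyser stayed at the 30 Hz stated for the original program (the frequency for the 5-min program is not given), the approximate number of oscillations; the program names are labels introduced here.

```python
FREQ_HZ = 30  # stated for the 2 x 25 s program; assumed unchanged for 5 min

# Each program is a list of (agitation seconds, pause seconds after it).
PROGRAMS = {
    "2 x 25 s with 5 s break": [(25, 5), (25, 0)],
    "5 min continuous": [(300, 0)],
}


def summarize(segments, freq_hz=FREQ_HZ):
    """Return (agitation s, elapsed s, approximate oscillations)."""
    agitation = sum(a for a, _ in segments)
    elapsed = sum(a + p for a, p in segments)
    return agitation, elapsed, agitation * freq_hz


summary = {name: summarize(segments) for name, segments in PROGRAMS.items()}
for name, (agitation, elapsed, oscillations) in summary.items():
    print(f"{name}: {agitation} s agitation, {elapsed} s elapsed, ~{oscillations} oscillations")
fold = summary["5 min continuous"][0] / summary["2 x 25 s with 5 s break"][0]
print(f"{fold:.0f}-fold more agitation")
```

On this reckoning, the 5-min program delivers six times the agitation of the original 2 × 25 s program.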
Compared with the previous results obtained with 2 cycles of 25 s, 5-min bead beating improved both the number of reads and the reference coverage % for all DNA pathogens. For the 5-min MoBio samples, significant increases were found in the reads of C. neoformans (p = 0.024) and AdV (p = 0.002) and in the reference coverage of K. pneumoniae (p = 0.001) and S. aureus (p ≤ 0.001). For the 5-min Zymo samples, significant increases were observed in the mapped reads of AdV (p = 0.05) and K. pneumoniae (p = 0.004) and in the reference coverage of K. pneumoniae (p = 0.002). For AdV, whose genome was already 100% covered with 2 × 25 s, 5 min of bead beating increased the depth of coverage: in the MoBio samples, the mean depth rose from 30 with 2 × 25 s to 1828 with 5 min of bead beating (Figure 2). In a comparison between the 5-min bead beating results of the two protocols (Table 6), significant increases were found only for 5-min MoBio, in the number of reads of C. neoformans and AdV (p = 0.01 and p = 0.008, respectively).
To check the ability of the MoBio protocol to extract pathogens from different taxa in a real clinical sample, it was compared with the Zymo protocol, with both protocols applied to a clinical throat swab after 5 min of bead beating. The MoBio protocol outperformed the Zymo protocol, detecting three additional bacterial genera: Rothia, Neisseria, and Campylobacter. The metagenomics results (Table 7) showed that both protocols detected an RNA virus (Human metapneumovirus), DNA viruses (e.g., Staphylococcus phages), Gram-negative bacteria (e.g., Kingella denitrificans and Pseudomonas aeruginosa), and Gram-positive bacteria (e.g., Staphylococcus lugdunensis and Staphylococcus aureus), as well as the amoeba Naegleria fowleri.
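The AdV result separates breadth of coverage (the share of genome positions covered by at least one read) from depth (how many reads cover each position on average): breadth was already 100%, while the mean depth rose roughly 61-fold (1828 / 30). The Python sketch below computes both from alignment intervals with a difference array; the intervals are toy values, not data from this study.

```python
def depth_profile(alignments, genome_length):
    """Per-base depth from 0-based, half-open (start, end) alignment intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # coverage begins at start
        diff[end] -= 1    # and stops before end
    depth, running = [], 0
    for position in range(genome_length):
        running += diff[position]
        depth.append(running)
    return depth


def mean_depth(depth):
    return sum(depth) / len(depth)


def breadth_percent(depth, min_depth=1):
    return 100.0 * sum(d >= min_depth for d in depth) / len(depth)


shallow = depth_profile([(0, 5), (3, 8)], 10)
print(shallow, mean_depth(shallow), breadth_percent(shallow))

# Full breadth can coexist with very different depths.
full_shallow = depth_profile([(0, 10)], 10)
full_deep = depth_profile([(0, 10)] * 5, 10)
print(breadth_percent(full_shallow), mean_depth(full_shallow))
print(breadth_percent(full_deep), mean_depth(full_deep))
```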

Clinical Samples Analysis
Six throat swabs were selected for clinical metagenomic analysis using the MoBio extraction protocol with 5-min bead beating. As a precaution against degradation of RNA from other RNA viruses that had not been tested, a 1-min incubation on ice was added between consecutive minutes of beating. The generated data were deposited at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accession no. PRJNA636773. The generated sequences were submitted to the CosmosID app for analysis. A heat map of the filtered microbes based on the frequencies (f) generated by the CosmosID app is shown in Table A3 in Appendix D.
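CosmosID's k-mer databases and scoring are not described here, so the Python sketch below is only a toy illustration of the idea behind the frequency (f): k-mers that occur in exactly one reference genome act as signatures, and f is taken as the number of distinct signature k-mers found in the sample reads. Counting distinct k-mers (rather than occurrences) is an assumption, and the sequences are made up.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def signature_kmers(references, k):
    """For each reference, the k-mers that occur in no other reference."""
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    signatures = {}
    for name, own in ref_kmers.items():
        others = set().union(*(ks for other, ks in ref_kmers.items() if other != name))
        signatures[name] = own - others
    return signatures


def frequencies(reads, signatures, k):
    """Number of distinct signature k-mers of each reference seen in the reads."""
    sample = set().union(*(kmers(read, k) for read in reads))
    return {name: len(sig & sample) for name, sig in signatures.items()}


refs = {"A": "ACGTACGA", "B": "TTTTACGT"}  # hypothetical reference genomes
sigs = signature_kmers(refs, k=4)
print(frequencies(["GTACGA"], sigs, k=4))
```

Shared k-mers (here ACGT and TACG) carry no organism-specific signal and are excluded from both signatures.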

Sample No. 1
A 4-year-old male was hospitalized in the pediatric intensive care unit of KAUH because of chronic lung disease, with multiple previous admissions for aspiration-related chest infections. The analysis of his throat swab showed pathogens from different taxa. After CosmosID filtration, the fungal species Candida glabrata, Candida albicans, and Kluyveromyces marxianus were found in the sample with 799 f, 387 f, and 352 f, respectively. Gram-negative bacteria were also found, identified as Moraxella catarrhalis (867 f) and Haemophilus influenzae KR494 serotype f (398 f). Finally, the single-stranded RNA virus Human parainfluenza virus 2 was present with 2363 f. Without filtration, the Gram-positive bacteria Streptococcus pneumoniae and Staphylococcus aureus were both detected in the sample with low frequencies (3 f and 10 f, respectively), but their presence was confirmed with the FTD Respiratory Pathogens kit (Figure 3).
The colors grade from red, representing the maximum score, to green, representing the minimum score; gray represents no score.

Sample No. 2
A 1-year-old male was diagnosed with an unspecified respiratory disorder in the pediatric intensive care unit of KAUH. The H1N1 strain of Influenza A virus (an RNA virus) was detected with 1891 f. Additionally, Moraxella catarrhalis, Staphylococcus aureus, and Streptococcus pneumoniae were detected with frequencies of 535, 242, and 28 f, respectively. The Gram-negative bacterium Escherichia coli was found in the sample with 23 f. The highest frequency among all organisms in the sample was for the Gram-positive bacterium Dolosigranulum pigrum (85,751 f), followed by the Gram-positive bacteria Rothia mucilaginosa (14,625 f), Actinomyces graevenitzii (12,585 f), and Corynebacterium pseudodiphtheriticum (2429 f), which are part of the oropharyngeal flora and have been reported as opportunistic human pathogens [14][15][16]. Three Streptococcus species that are primary inhabitants of the human upper respiratory tract and are also considered respiratory pathogens were found in the sample: Streptococcus mitis (758 f), Streptococcus agalactiae (81 f), and Streptococcus pseudopneumoniae (37 f) [17][18][19]. Further tests performed by the hospital on other samples from the patient detected Gram-positive cocci sepsis in the blood and ESBL-producing E. coli in the respiratory sample culture.

Sample No. 3
The third throat swab belonged to an 84-year-old male with chronic renal failure who complained of a productive cough and presented with widespread inspiratory and expiratory wheezing. Candida albicans was found in the sample with 2965 f, in addition to the pathogenic bacteria Haemophilus parainfluenzae (16,865 f), Escherichia coli (588 f), and Streptococcus pneumoniae (230 f). The DNA virus Human gammaherpesvirus 4 (Epstein-Barr virus) was also detected, with 173 f. Streptococcus mitis (1275 f), Streptococcus agalactiae (282 f), and Rothia mucilaginosa (1257 f) were also detected in the sample. The highest frequency (31,095 f) was found for the Gram-positive bacterium Gemella haemolysans. Two other species of the same genus, Gemella morbillorum (2887 f) and Gemella sanguinis (4067 f), were also detected. These three species belong to the normal oral microbiota and have been reported as opportunistic pathogens that can cause severe infections, often in previously damaged tissue [20,21].

Negative Control
A limited number of viruses were detected after CosmosID filtration in the negative control, which underwent the same extraction process and the remaining steps of the metagenomics protocol, except for the depletion step (see Appendix B). They were identified as White clover cryptic virus 2, Red clover cryptic virus 2, Rosellinia necatrix partitivirus, Piscine myocarditis-like virus, and Dill cryptic virus. Only one pathogenic virus was found, Hepatitis C virus genotype 1, which was not detected in any sample of the run. The frequencies of these microbes detected in the clinical samples did not exceed 1%, except in sample no. 1 (5%).
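One way to read the 1% and 5% figures is as the share of a sample's total frequency contributed by taxa that also appear in the negative control. The Python sketch below computes that share under this interpretation, which is an assumption; the sample frequencies are made up, while the control taxa are those listed above.

```python
CONTROL_TAXA = {
    "White clover cryptic virus 2",
    "Red clover cryptic virus 2",
    "Rosellinia necatrix partitivirus",
    "Piscine myocarditis-like virus",
    "Dill cryptic virus",
    "Hepatitis C virus genotype 1",
}


def control_share_percent(sample_frequencies, control_taxa=CONTROL_TAXA):
    """Percent of a sample's total frequency (f) assigned to negative-control taxa."""
    total = sum(sample_frequencies.values())
    if total == 0:
        return 0.0
    flagged = sum(f for taxon, f in sample_frequencies.items() if taxon in control_taxa)
    return 100.0 * flagged / total


# Hypothetical frequencies for illustration only
example = {"Human parainfluenza virus 2": 95, "Dill cryptic virus": 5}
print(control_share_percent(example))
```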

Discussion
Selection of the proper extraction protocol is a critical step in a successful clinical metagenomics procedure, which aims to identify etiological agents that were not previously identified with routinely available techniques. The challenge in this step is adopting an optimized protocol that can extract nucleic acids from diverse microbial taxa, ranging from difficult-to-lyse organisms, such as yeasts, to organisms with easily degradable nucleic acids, such as RNA viruses.
In the present study, we compared a number of extraction protocols to select the best method for nucleic acid isolation from a pool of pathogens with different pathogenic and genomic characteristics, covering the major types of pathogens that might be found in a naturally co-infected clinical sample. The mock sample used in this study included the following pathogens: ALKV, representing enveloped RNA viruses; AdV, representing nonenveloped DNA viruses; S. aureus, representing Gram-positive bacteria; K. pneumoniae, representing Gram-negative bacteria; and C. neoformans, representing encapsulated yeasts. Because of these structural and genomic differences, each microorganism showed a different yield with each extraction method, which made identifying a single extraction protocol that is ideal for all types of pathogens difficult. Instead, the efficiency of the kits used in this study was judged by which protocols gave the highest number of reads for most of the included organisms, leading to high coverage of the reference genomes.
The reads generated in this study showed that the best results were obtained with the extraction protocols that used bead beating in the lysis step: the PowerViral® Environmental RNA/DNA Isolation Kit (MoBio samples) and the "Copurification" protocol of the ZymoBIOMICS™ DNA/RNA Miniprep Kit (Zymo samples). We initially followed the approach of Leite et al. [30], who performed 2 cycles of 25-s agitation with a 5-s break, which they recommended to avoid possible degradation of nucleic acids and which has also been reported by others [31,32]; however, we improved the results of the MoBio and Zymo assays by performing the bead beating step at 4 °C and increasing its duration to 5 min. In addition to the significant differences in favor of the 5-min MoBio protocol over the 5-min Zymo protocol, the MoBio extraction required 20% less starting material than the Zymo kit (200 µL vs. 250 µL) and involved fewer steps, reducing the possibility of human error or contamination.
Overall, all the extraction protocols achieved ≈100% coverage of AdV and ALKV, regardless of the number of reads, which could be due to their relatively short genomes (≈35 kb and ≈10 kb, respectively) and ease of extraction. Furthermore, the "Copurification" protocol of the ZymoBIOMICS™ DNA/RNA Miniprep Kit (Zymo samples), which targets both DNA and RNA, was competitive with the parallel protocol of the same kit, which extracts DNA and RNA separately (ZDNA and ZRNA samples). There were no significant differences between them in the RNA sequencing results, except for S. aureus, which had more mapped reads with ZRNA (p = 0.024) but better coverage with Zymo (p ≤ 0.001), or in the DNA sequencing results, except for the reads of S. aureus with Zymo (p = 0.049) and of AdV with ZDNA (p ≤ 0.001). Similar results were obtained by Kresse et al. [33], who compared the separate and simultaneous protocols supplied with the truEXTRACT kit and found no preference for one over the other. Given these data, our study showed that separate extraction did not necessarily give better results than the "Copurification" protocol when using this kit for clinical metagenomic investigations, making the "Copurification" protocol more suitable for clinical samples of limited quantity.
Additionally, the reads produced by RNA sequencing with the KAPA RNA HyperPrep Kit, which uses only RNA as input for library preparation, were sufficient for the detection of all the pathogens in the mock sample, even those with DNA genomes. This result supports the idea that infectious pathogens are transcriptionally active and that DNA pathogens can be identified by sequencing their rRNA [34,35]. Nonetheless, preparing both DNA and RNA sequencing libraries is still recommended for efficient coverage of all microbial taxa, as the DNA sequencing coverage of the bacteria used in the study differed significantly from that obtained by RNA sequencing (p = 0.001). Confidence in the results also increased when the same pathogen was detected in both libraries, as reported by Simner et al. [36].
The results from the clinical throat swabs showed that the chosen extraction method, which can extract both DNA and RNA from different types of pathogens, avoids the need for more than one extraction method and consequently reduces the quantity of clinical sample required. Moreover, it can support a single clinical metagenomics protocol, such as the one used in this study, that targets all taxa of etiological agents in a sample.

Conclusions
The above findings suggest that the PowerViral® Environmental RNA/DNA Isolation Kit, with a cooled 5-min bead beating step during lysis, can be useful in clinical metagenomics for samples with multiple infections from different taxa, whether RNA viruses or DNA microbes (fungi, bacteria, and DNA viruses), avoiding the need to use more than one extraction method. We recommend a large-scale study using the PowerViral® Environmental RNA/DNA Isolation Kit to evaluate and validate the usefulness of this approach for clinical metagenomics.

Acknowledgments: The authors are thankful to Muhammad Yasir Norwali, Hessa Al-Sharif, Randa Ba-Abdulah, Ahmed Hassan, Mohamed Al-Saadi, Norah Uthman, and Tagreed Al-Subhi for technical assistance. The authors also acknowledge the generous charitable donation from the late Sheikh Ibraheem Ahmed Azhar in the form of reagents and supplies as a contribution to the scientific research community.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Bead Beating Optimization
The optimization was performed on a mixture of 1:1-diluted C. neoformans and 1:10-diluted ALKV, representing the most robust cell structure and the most fragile genome, respectively, among the spiked microorganisms used in this study; the microbes were diluted in phosphate-buffered saline (PBS) to a final volume of 200 µL. Following the PowerViral® Environmental RNA/DNA Isolation Kit protocol (MoBio), different bead beating times were tested: sample 1 was beaten for 25 s, followed by a 5-s interval and another 25 s of agitation. The beating time was then increased stepwise from sample 2 to sample 11, with 5-s intervals between agitation periods, to reach a total beating time of 5 min in sample 11. Finally, sample 12 was beaten continuously for 5 min.
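The step size between samples is not stated explicitly. Sample 1 received 50 s of agitation and sample 11 received 300 s, which is exactly ten 25-s increments, so one plausible reading is that each successive sample received one additional 25-s cycle. The sketch below tabulates the schedule under that assumption (labeled in the code); it is an interpretation, not the authors' recorded settings.

```python
from dataclasses import dataclass
from typing import List

CYCLE_S = 25  # seconds of agitation per cycle (stated in the text)
PAUSE_S = 5   # seconds of rest between cycles (stated in the text)


@dataclass(frozen=True)
class BeatingPlan:
    sample: int
    cycles: int     # number of agitation periods
    beating_s: int  # total agitation time
    elapsed_s: int  # agitation plus pauses


def bead_beating_schedule() -> List[BeatingPlan]:
    plans = []
    for sample in range(1, 12):
        # ASSUMPTION: sample 1 gets 2 cycles and each later sample one more,
        # so sample 11 gets 12 cycles = 300 s (5 min) of agitation.
        cycles = sample + 1
        beating = cycles * CYCLE_S
        elapsed = beating + (cycles - 1) * PAUSE_S
        plans.append(BeatingPlan(sample, cycles, beating, elapsed))
    # Sample 12: one uninterrupted 5-min run, no pauses.
    plans.append(BeatingPlan(12, 1, 300, 300))
    return plans


if __name__ == "__main__":
    for plan in bead_beating_schedule():
        print(plan)
```

Under this reading, samples 11 and 12 receive the same 5 min of agitation and differ only in whether it is interrupted by rest intervals.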
The results of the different bead beating conditions were assessed by real-time PCR before proceeding to NGS. The primers and probes targeting C. neoformans and ALKV are listed in Table A1. The reactions were performed on an Applied Biosystems® 7500 Fast system (Applied Biosystems, Singapore) using the QuantiFast Probe RT-PCR Kit (Qiagen, Hilden, Germany) with 5 µL of the extracted sample.

Table A1. Primers and probes used in this study to detect C. neoformans and ALKV.
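Comparing bead beating conditions by real-time PCR amounts to comparing cycle threshold (Ct) values: a lower Ct means more starting template. The sketch below converts a Ct difference into a fold difference and ranks samples, assuming equal amplification efficiency across reactions; this is a common simplification and not necessarily how the results were evaluated in this study. Ranking should be done separately for each target, since conditions that favor C. neoformans may differ from those that favor ALKV.

```python
from typing import Dict, List


def relative_template(ct_condition: float, ct_reference: float,
                      efficiency: float = 1.0) -> float:
    """Fold difference in starting template implied by two Ct values.

    Assumes both reactions amplify with the same efficiency E (1.0 = perfect
    doubling per cycle), so fold = (1 + E) ** (Ct_reference - Ct_condition).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_reference - ct_condition)


def rank_conditions(ct_by_sample: Dict[int, float]) -> List[int]:
    """Order bead beating samples from lowest Ct (most template) to highest."""
    return sorted(ct_by_sample, key=lambda s: (ct_by_sample[s], s))
```

For example, a condition with a Ct 2 cycles lower than the reference implies roughly four times more template at 100% efficiency.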

Metagenomics Protocol
An established protocol was followed that targets all microbes in a sample, regardless of their taxa or genome type. After extraction using the maximum recommended elution volume (100 µL), the eluate was concentrated to 12 µL using the RNA Clean & Concentrator™-5 (Zymo Research, Irvine, CA, USA) according to the kit instructions. Complementary DNA synthesis was performed by mixing 10 µL of the concentrated eluate with 10 µM of non-ribosomal primers, a set of 96 hexanucleotides selected by Endoh et al. [39] to avoid annealing to rRNA, so that viral RNA is converted to cDNA while rRNA is depleted. The mixture was heated to 65 °C for 5 min and then placed on ice for 5 min. Then, 1× First-Strand Buffer, 0.1 M dithiothreitol (DTT), 40 U of RNaseOUT (Invitrogen, Carlsbad, CA, USA), and 400 U of SuperScript III Reverse Transcriptase (Invitrogen, Carlsbad, CA, USA) were added to the denatured RNA. The mixture was incubated at 25 °C for 5 min and then at 55 °C for 60 min, and the enzyme was inactivated at 70 °C for 15 min. Double-stranded DNA synthesis was carried out with DNA Polymerase I, Large (Klenow) Fragment (NEB, Ipswich, MA, USA) by preparing a mix of 1× NEBuffer 2, 10 mM dNTPs, 10 U of Klenow, and 20 U of RNase H (NEB, Ipswich, MA, USA). This mix was added to the previous reaction and incubated at 37 °C for 60 min, followed by a hold at 4 °C. The reaction product was then purified using 1.8× Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, USA) according to the kit instructions. The NEBNext Microbiome DNA Enrichment Kit (NEB, Ipswich, MA, USA) was applied only to samples 3, 4, and 5, following its instructions, to deplete human DNA, followed by a final purification step using DNA Clean & Concentrator (Zymo Research, Irvine, CA, USA) with elution in 35 µL. All purified samples, with or without the depletion step, were subjected to library preparation using Nextera DNA Flex Library Prep as described previously.

Appendix D

Table A3. Heat map for the filtered microbes in the studied clinical samples based on frequencies generated from the CosmosID app.
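The per-sample order of steps in the Metagenomics Protocol described above, including the host-depletion branch applied only to samples 3, 4, and 5, can be summarized as a short sketch. The function names and step wording are hypothetical; the reaction volume fed to the AMPure calculation is not given in the text, so it is left as a parameter.

```python
from typing import List

HOST_DEPLETION_SAMPLES = {3, 4, 5}  # samples given NEBNext host depletion (from the text)
AMPURE_RATIO = 1.8                  # bead-to-sample volume ratio (from the text)


def ampure_bead_volume(sample_ul: float, ratio: float = AMPURE_RATIO) -> float:
    """Volume of AMPure XP beads to add for a given bead:sample ratio."""
    if sample_ul <= 0 or ratio <= 0:
        raise ValueError("volumes and ratio must be positive")
    return sample_ul * ratio


def workflow_steps(sample_no: int) -> List[str]:
    """Ordered processing steps for one clinical sample."""
    steps = [
        "extract; elute in 100 uL",
        "concentrate to 12 uL (RNA Clean & Concentrator-5)",
        "RT: 10 uL + non-ribosomal hexamers; 65 C 5 min, ice 5 min; "
        "SuperScript III 25 C 5 min, 55 C 60 min, 70 C 15 min",
        "dsDNA synthesis: Klenow + RNase H, 37 C 60 min, hold 4 C",
        "purify: 1.8x AMPure XP",
    ]
    if sample_no in HOST_DEPLETION_SAMPLES:
        steps += [
            "deplete human DNA (NEBNext Microbiome DNA Enrichment)",
            "purify: DNA Clean & Concentrator; elute in 35 uL",
        ]
    steps.append("library prep: Nextera DNA Flex")
    return steps
```

For a hypothetical 50-µL reaction, a 1.8× cleanup would call for 90 µL of beads.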