High-Throughput Next-Generation Sequencing Respiratory Viral Panel: A Diagnostic and Epidemiologic Tool for SARS-CoV-2 and Other Viruses

Two serious public health challenges have emerged in the current COVID-19 pandemic namely, deficits in SARS-CoV-2 variant monitoring and neglect of other co-circulating respiratory viruses. Additionally, accurate assessment of the evolution, extent, and dynamics of the outbreak is required to understand the transmission of the virus. To address these challenges, we evaluated 533 samples using a high-throughput next-generation sequencing (NGS) respiratory viral panel (RVP) that includes 40 viral pathogens. The performance metrics revealed a PPA, NPA, and accuracy of 95.98%, 85.96%, and 94.4%, respectively. The clade for pangolin lineage B that contains certain distant variants, including P4715L in ORF1ab, Q57H in ORF3a, and S84L in ORF8 covarying with the D614G spike protein mutation, were the most prevalent early in the pandemic in Georgia, USA. The isolates from the same county formed paraphyletic groups, indicating virus transmission between counties. The study demonstrates the clinical and public health utility of the NGS-RVP to identify novel variants that can provide actionable information to prevent or mitigate emerging viral threats and models that provide insights into viral transmission patterns and predict transmission/resurgence of regional outbreaks as well as providing critical information on co-circulating respiratory viruses that might be independent factors contributing to the global disease burden.


Introduction
The global society, set back by the COVID-19 pandemic, now in its second year, has seen over 150 million cases and over 3.1 million COVID-19-related deaths [1].Conversely, united efforts around the world in the cultural, economic, and scientific realms, notably high-throughput diagnostic solutions, therapeutic options, and, more recently, the massive rollout of vaccinations, with more than 450 M people receiving at least one dose [1], are showing promise toward control of the pandemic.For many countries, having weathered several waves of the pandemic, and with spikes still being anticipated in the future, persistent effort and surveillance are still required.Human activity and variation in viral subtypes that determine transmissibility and pathogenicity have largely been implicated for these trends.Several SARS-CoV-2 variants have been identified, but three recently reported strains are of significant concern.The 20B/501Y.V1 or VOC 202012/01 variant of the B.1.1.7 (α) lineage that is defined by 17 mutations (14 non-synonymous mutations and 3 deletions) was identified in the UK [2,3], the 20C/501Y.V2 strain that emerged independently of the B.1.17 to B1.351 (β) lineage was identified in South Africa, and the B.1.617.2 (δ) variant has emerged most recently [4,5].The VOC 202012/01 SARS-CoV-2 strain was 43-90% more transmissible compared to the preexisting strains and led to "Tier 4" restrictions in December 2020 in the UK, and the delta variant is emerging as a similar threat [6].The emergence of these novel strains imposes the need to sequence the SARS-CoV-2 genome in clinical laboratories across the entire globe, as any lacunae in monitoring the variation in the SARS-CoV-2 genome can lead to serious public health consequences.In addition, another immediate public health deficit pertaining to other respiratory viruses is already of major concern since co-circulating respiratory viruses other than SARS-CoV-2 largely remained undocumented for nearly the entire year of 2020.
In the past, pandemics due to novel pathogens have amplified the incidence of respiratory tract infections (RTI), leading to morbidity and mortality exceeding the seasonal levels of the disease.The Global Burden of Disease (2017) data demonstrated that influenza contributed 11.5% of the total lower respiratory tract infections (LRTIs), leading to over 9 million hospitalizations and 145,000 deaths across all age groups [7].Other viral pathogens, including rhinoviruses, parainfluenza viruses, respiratory syncytial virus, and adenoviruses, also account for respiratory tract infections of varying severity, as either mono-infections or coinfections.Coinfection with viral, bacterial, or fungal pathogens has been associated with disease severity and death in the current pandemic [8].A metaanalysis including 30 studies and 3834 COVID-19 patients of all age groups and settings found that 7% of hospitalized patients had bacterial coinfection, with the most common being mycoplasma pneumonia, Pseudomonas aeruginosa, and Haemophilus influenza, and viral coinfections were identified in 3% of the patients [9].Another meta-analysis found a slightly higher viral coinfection rate of 7% [10].A study conducted in California found a coinfection rate of 20.7% among SARS-CoV-2 positive patients [11], while this was 3.3% in a Chicago study [12].In both of these studies, rhinovirus and enterovirus were the most common.In addition to multiplex RT-PCR assays used to identify coinfections, metagenomics sequencing has demonstrated utility in unbiased identification of coinfection or colonization [13,14].
Owing to the diversion of resources and supplies to SARS-CoV-2 testing, testing of viral pathogens that normally cause seasonal RTI has been largely neglected.The current practice has posed serious public health gaps both at a clinical and epidemiological level, especially now that the transmission and infective mutations are emerging, and the virus is persisting to more transmissible variants.Furthermore, as the COVID-19 vaccination process is underway, the screening of COVID-19 by RT-PCR-based SARS-CoV-2 detection methods needs to be complemented with two additional monitoring measures.The first is sequencing the SARS-CoV-2 genome to identify novel variants that can provide actionable information to prevent or mitigate emerging viral threats, and the second is to test for co-circulating respiratory viruses that might be independent factors contributing to the global disease burden.The use of multiple antimicrobial agents in moderate-severely ill COVID-19 patients can also be rationalized based on such studies.To address these clinical and public health challenges, we aimed to evaluate the performance of a new highthroughput next-generation sequencing respirator viral panel (NGS-RVP) that includes 40 viral pathogens; to analyze viral subtypes or mutational variants of SARS-CoV-2 in the state of Georgia, USA; to develop models to understand the spread of the virus in the state of Georgia; and to assess the other circulating viruses in the same population.All patient samples were tested for SARS-CoV-2 using an assay based on RNA extraction followed by TaqMan-based RT-PCR assay to conduct in vitro transcription of SARS-CoV-2 RNA, DNA amplification, and fluorescence detection (PerkinElmer Inc., Waltham, MA, USA).The assay targets specific genomic regions of SARS-CoV-2, the nucleocapsid (N) gene and ORF1ab with an RNA internal control (IC, bacteriophage MS2), to monitor the processes from nucleic acid extraction to fluorescence detection.The probes are labeled with FAM, ROX, and VIC dyes to differentiate the fluorescent signals from each target.The assay validation was performed as per FDA guidelines, following the manufacturer's protocol.In brief, a 300 µL sample was used for RNA extraction (chemagic 360 instrument, PerkinElmer Inc., Waltham, MA, USA), to which 5 µL internal control, 4 µL poly(A) RNA, 10 µL proteinase K, and 300 µL lysis buffer 1 were added.From 60 µL eluate, the RT-PCR reaction was set up, which included 10 µL of extracted nucleic acid and 5 µL of PCR master mix.PCR was performed using QuantStudio 3 and 5 Real-Time PCR Systems (Thermo Fisher Scientific, Waltham, MA, USA).The LoD of this assay is 20 copies/mL.

Next-Generation Sequencing
Library preparation was performed following the Illumina RNA Prep with Enrichment kits, which leverage BLT technology paired with fast enrichment (cat number 20040537, Illumina, San Diego, CA, USA).Briefly, 8.5 µL of extracted RNA, by the methodology described above, was denatured followed by first-and second-strand DNA synthesis.This was followed by tagmentation, which uses enrichment bead-linked transposomes (BLT) to tagment double-stranded cDNA.This process fragments cDNA and adds adapter sequences.After tagmentation, the fragments were purified and amplified to add index adapter sequences for dual indexing and P7 and P5 sequences for clustering.Four index sets, A, B, C, and D, each containing 96 unique, single-use Illumina DNA/RNA UD Indexes, were used.Following clean-up, libraries were quantified using Invitrogen Qubit dsDNA broad range Assay Kit (Thermo Fischer Scientific, Waltham, MA, USA).Subsequently, tion in BaseSpace Sequence analysis (v3.5.16,Illumina, San Diego, CA, USA) that yielded results for pathogen detection and coverage for each of the viral genomes.In addition, sequences were also analyzed through the Dragen metagenomics pipeline (Illumina) for the detection of viruses and bacteria, in addition to the 40 viruses listed in Table 1.

Performance Metric Evaluation
The performance metric was calculated for both clinical and reference control samples by comparing detection results with the RT-PCR assay results.Seven performance criteria, viz., positive percentage agreement (PPA), negative percentage agreement (NPA), positive predictive value (PPV), negative predictive value (NPV), accuracy, false-negative rate (FNR), and false-positive rate (FPR), were evaluated.

Limit of Detection and Reproducibility Studies
The limit of detection (LoD) studies were conducted as per the FDA guidelines [15].Briefly, SARS-CoV-2 reference material was sequentially diluted and singly sequenced at 107 copies/mL and 106 copies/mL and at 90 copies/mL, 30 copies/mL, and 10 copies/mL in triplicate to evaluate both intra-run and inter-run reproducibility.The lowest concentration detected in all three triplicates was determined as the preliminary LoD.To confirm the LoD, 20 replicates of preliminary LoD were analyzed and deemed as confirmed if at least 19/20 replicates were detected.

Phylogenetic Clustering of Genomes
Inference and visualization of the phylogeny of the SARS-CoV-2 sequences were performed through the Nextstrain Command-Line Interface (CLI) tool, utilizing the associated augur and auspice toolkits [1].To ensure proper phylogenetic inference, the following sequence exclusion criteria were applied: (i) sequences of length less than 23,000 base pairs (~77% of the full genome length) were excluded from the analysis; (ii) sequences without an associated metadata entry were identified and removed; and (iii) after constructing the phylogeny using a skyline coalescent method and setting a fixed clock rate of 7 × 10 −4 substitutions per site per year with a standard deviation of 2 × 10 −4 , temporal outliers outside of four interquartile ranges from the root-to-tip vs. time regression were removed.These steps resulted in a set of 286 sequence tips on the final tree.These parameter decisions were informed by prior analyses of SARS-CoV-2 sequences across North America [2].These choices were validated by the resultant estimate of the Time of Most Recent Common Ancestor (TMRCA) confidence interval, which spans from 2 November 2019 to 26 January 2020.Similarly, the estimated clock rate of 8.55 × 10 −4 substitutions per site per year also falls within the range of estimates found throughout the literature.The metadata for the final tree were integrated to allow for both temporal and geographical visualization in the accompanying Nextstrain instance.

Sequencing Performance
A typical sequencing run of the viral sequencing panel performed on the NextSeq550 platform consists of ~192 samples.In this study, a total of 523 samples were sequenced in 3 runs.The first run included 143 samples sequenced using version one (V1) of the panel.Runs 2 and 3 included 380 samples sequenced using version two (V2) of the panel.The quality metrics and clinical performance were evaluated for V2 of the panel that focuses less reads on human control RNA genes compared to V1, resulting in a more efficient focus on pathogen RNA genes.Runs 2 and 3 resulted in a cluster density of 221 ± 2.1, and the Q30 scores were 90.2% and 91.3%, respectively (Figure 1).
3 runs.The first run included 143 samples sequenced using version one (V1) of the panel.Runs 2 and 3 included 380 samples sequenced using version two (V2) of the panel.The quality metrics and clinical performance were evaluated for V2 of the panel that focuses less reads on human control RNA genes compared to V1, resulting in a more efficient focus on pathogen RNA genes.Runs 2 and 3 resulted in a cluster density of 221 ± 2.1, and the Q30 scores were 90.2% and 91.3%, respectively (Figure 1).

Performance Metric Evaluation/Analytical Performance
The performance metric was calculated using both the clinical samples and the reference control material.The PPA and NPA were found to be 95.98% and 85.96%, respectively.The FPR and FNR were found to be 14.04% and 4.02%, respectively.The accuracy of the assay was found to be 94.47% (Table 2).

Performance Metric Evaluation/Analytical Performance
The performance metric was calculated using both the clinical samples and the reference control material.The PPA and NPA were found to be 95.98% and 85.96%, respectively.The FPR and FNR were found to be 14.04% and 4.02%, respectively.The accuracy of the assay was found to be 94.47% (Table 2).

Limit of Detection and Reproducibility Studies
In the preliminary LoD study, all replicates were detected at the five tested concentrations using the exact biosciences reference control material.The LoD was determined to be 10 copies/mL with all 25 replicates detected.The inter-and intra-run evaluation using 90 copies/mL, 30 copies/mL, and 10 copies/mL of reference material sequenced in triplicates in two different runs demonstrated high reproducibility, as all replicates were detected in both runs (Figure 2).Furthermore, all 25 replicates were detected at 10 copies/mL, demonstrating high inter-run reproducibility (Figure 3).
In the preliminary LoD study, all replicates were detected at the five tested concentrations using the exact biosciences reference control material.The LoD was determined to be 10 copies/mL with all 25 replicates detected.The inter-and intra-run evaluation using 90 copies/mL, 30 copies/mL, and 10 copies/mL of reference material sequenced in triplicates in two different runs demonstrated high reproducibility, as all replicates were detected in both runs (Figure 2).Furthermore, all 25 replicates were detected at 10 copies/mL, demonstrating high inter-run reproducibility (Figure 3).

Co-Circulating Viruses
This study identified that 0.8% (4/483) of patients were infected with viruses other than SARS-CoV-2.Of the four patients, three had coinfection with human enterovirus  In the preliminary LoD study, all replicates were detected at the five tested concentrations using the exact biosciences reference control material.The LoD was determined to be 10 copies/mL with all 25 replicates detected.The inter-and intra-run evaluation using 90 copies/mL, 30 copies/mL, and 10 copies/mL of reference material sequenced in triplicates in two different runs demonstrated high reproducibility, as all replicates were detected in both runs (Figure 2).Furthermore, all 25 replicates were detected at 10 copies/mL, demonstrating high inter-run reproducibility (Figure 3).

Co-Circulating Viruses
This study identified that 0.8% (4/483) of patients were infected with viruses other than SARS-CoV-2.Of the four patients, three had coinfection with human enterovirus

Co-Circulating Viruses
This study identified that 0.8% (4/483) of patients were infected with viruses other than SARS-CoV-2.Of the four patients, three had coinfection with human enterovirus C109 isolate NICA08-4327, WU polyomavirus, and KI polyomavirus Stockholm 60 in addition to SARS-CoV-2.One patient negative for SARS-CoV-2 was found to be infected with human parainfluenza virus 4a.

Phylogenetic Clustering of Genomes
The final instance has been posted through the Nextstrain community platform [16].Using the open-source platform Nextstrain, we built an interactive visualization of the phylogenic analysis of our SARS-CoV-2 isolates.Two major clades for SARS-CoV-2 that were named lineages A and B in pangolin lineage can be identified from our current phylogenetic analysis.Through our Nextstrain thread, it was observed that the clade for pangolin lineage B contains certain distant variants, including P4715L in ORF1ab, Q57H in ORF 3a, and S84L in ORF8 covarying with the D614G spike protein mutation, which were found to be the most prevalent in the early phase of the pandemic in the state of Georgia.In addition, we found that isolates from the same county form paraphyletic groups in our analysis, which indicated virus transmission between counties (Figure 4).

Phylogenetic Clustering of Genomes
The final instance has been posted through the Nextstrain community platform [16].Using the open-source platform Nextstrain, we built an interactive visualization of the phylogenic analysis of our SARS-CoV-2 isolates.Two major clades for SARS-CoV-2 that were named lineages A and B in pangolin lineage can be identified from our current phylogenetic analysis.Through our Nextstrain thread, it was observed that the clade for pangolin lineage B contains certain distant variants, including P4715L in ORF1ab, Q57H in ORF 3a, and S84L in ORF8 covarying with the D614G spike protein mutation, which were found to be the most prevalent in the early phase of the pandemic in the state of Georgia.In addition, we found that isolates from the same county form paraphyletic groups in our analysis, which indicated virus transmission between counties (Figure 4).

Discussion
The COVID-19 pandemic has led to massive socio-economic disruption, with a major impact on healthcare systems across the globe.The magnitude of the pandemic required channeling almost all available resources toward the diagnosis, management, and treatment of COVID-19.The diversion of all resources to cope with the pandemic has led to two major knowledge/epidemiological/public health gaps that could further contribute to the ongoing crisis.The first is the lack of documentation of the variation in the SARS-CoV-2 genome around the world, and the second is the complete neglect of co-circulating viruses in the population that contribute to the global disease burden.This study is an attempt to address these two major public health lacunae by evaluating the NGS-RVP, which can identify mutational variants in the SARS-CoV-2 genome and other co-circulating respiratory viruses in a single assay.
Since December 2019, after the SARS-CoV-2 genome was characterized, various sequencing approaches have been utilized to study the virus and, for some technologies, microbiome and host responses as well.Various NGS technologies have proven their worth not just in the identification of the exact viral etiology and origin of COVID-19 but

Discussion
The COVID-19 pandemic has led to massive socio-economic disruption, with a major impact on healthcare systems across the globe.The magnitude of the pandemic required channeling almost all available resources toward the diagnosis, management, and treatment of COVID-19.The diversion of all resources to cope with the pandemic has led to two major knowledge/epidemiological/public health gaps that could further contribute to the ongoing crisis.The first is the lack of documentation of the variation in the SARS-CoV-2 genome around the world, and the second is the complete neglect of co-circulating viruses in the population that contribute to the global disease burden.This study is an attempt to address these two major public health lacunae by evaluating the NGS-RVP, which can identify mutational variants in the SARS-CoV-2 genome and other co-circulating respiratory viruses in a single assay.
Since December 2019, after the SARS-CoV-2 genome was characterized, various sequencing approaches have been utilized to study the virus and, for some technologies, microbiome and host responses as well.Various NGS technologies have proven their worth not just in the identification of the exact viral etiology and origin of COVID-19 but also in diagnostic assay development, vaccine design, and epidemiologic surveillance of viral transmission and evolution.By employing either shotgun metagenomics, amplicon-based, or hybrid capture-based sequencing on mid-to ultra-high-throughput platforms and singlemolecule sequencing such as nanopore sequencing, complex clinical and research questions have been answered [17][18][19].These techniques have been found to have almost comparable genome coverage, with amplicon-based approaches showing high sensitivity for low viral load samples at a lower cost.Metagenomics and hybrid capture, on the other hand, have the advantage of unbiased sequencing at a high-throughput scale and coinfection identification [20].In this study, we utilized an enrichment workflow, a hybrid-capture-based approach on a high-throughput platform, and a metagenomic bioinformatic pipeline to interrogate patient samples that include saliva samples.
The analytical performance analysis using 522 samples demonstrated the ease of use and clinical utility of the RVP.The assay had a hands-on time of ~10.5 h and an assay time of ~3 days.The sequencing run parameters met the recommended threshold values of the manufacturer, with a high consistency among samples for each parameter (both across runs and within each run).The ability to sequence both NPS and saliva samples simultaneously adds a substantial advantage to any clinical laboratory with respect to time and efficiency.
The performance evaluation demonstrated favorable PPA, NPA, FPR, and FNR, with a high overall accuracy of the panel.The inter-and intra-run evaluation demonstrated high reproducibility, with an LoD of 10 copies/mL, rendering this assay highly sensitive and accurate for the detection of the SARS-CoV-2 genome.It is noteworthy that 88.7% (275/310) and 93.8% (291/310) of positive samples were sequenced with a coverage of more than 90% and 60% of the SARS-CoV-2 genome, respectively, which is essential for the phylogenetic clustering and mutational analysis of the genome.The coinfection rate in this study was low (0.8%), which reflects a low rate of coinfection in this population.Other studies in different populations have found variable rates of coinfection.A study in the New York metropolitan area found a coinfection rate by other respiratory pathogens of less than 3% and infection by other non-SARS-CoV-2 coronaviruses of 13.1% [21].
Genomic sequencing and epidemiological analysis of SARS-CoV-2 are important for a proper understanding of the evolution and spread of this pathogen in Georgia.To this end, we built an interactive visualization of the phylogenetic analysis of our samples using the open-source platform Nextstrain, which could reflect the phylogeny and circulating diversity during the early SARS epidemic in Georgia.Through our Nextstrain thread, we found that the clade for pangolin lineage B that contains certain distant variants covarying with the D614G spike protein mutation had become increasingly prevalent at the early phase of the pandemic in the state of Georgia.Furthermore, the isolates from the same county forming a paraphyletic group suggest a prolonged period of unrecognized community spreading.Such a properly maintained visualization tool can help to improve public health efforts involving continued exploration of the evolutionary relationships among SARS-CoV-2 samples and provide a better understanding of genetic diversity across the whole genome over time.This study demonstrates that genomic epidemiology is essential in predicting disease transmission and pattern of transmission and that it has the potential to recognize the imminent resurgence of a regional outbreak.Given the challenges associated with establishing such a surveillance program, it is essential to develop and maintain the infrastructure for such analysis for future pandemics.
We propose that as the COVID-19 vaccination process is underway, and with COVID-19 diagnosis being focused on symptomatic individuals, it is important that representative samples are sequenced for SARS-CoV-2 genome and variant detection from different regions of a state/country/world.This would provide actionable information to prevent or mitigate emerging viral threats and predict transmission/resurgence of regional outbreaks.Furthermore, it is important to test for co-circulating respiratory viruses that might be independent factors contributing to the global disease burden.
the need for con-sent was waived, all PHI was removed, and the data were anonymized before being accessed for the study.

Data Availability Statement:
All relevant data is made available within the manuscript.

Figure 1 .
Figure 1.Sequencing performance of the viral sequencing panel performed on NextSeq550 platform.In Qscore distribution chart: blue bars represent raw clusters and the green bars represent clusters passing filter.In cluster density chart, each color line represents each lane.

Figure 1 .
Figure 1.Sequencing performance of the viral sequencing panel performed on NextSeq550 platform.In Qscore distribution chart: blue bars represent raw clusters and the green bars represent clusters passing filter.In cluster density chart, each color line represents each lane.

Figure 2 .
Figure 2. Inter-run and intra-run performance of the RVP panel sequenced on NextSeq500/550 (Run 1 and 2 are represented with different bars).

Figure 2 .
Figure 2. Inter-run and intra-run performance of the RVP panel sequenced on NextSeq500/550 (Run 1 and 2 are represented with different bars).

Figure 2 .
Figure 2. Inter-run and intra-run performance of the RVP panel sequenced on NextSeq500/550 (Run 1 and 2 are represented with different bars).

Table 2 .
Performance metric evaluation of the respiratory viral panel.

Table 2 .
Performance metric evaluation of the respiratory viral panel.