Characterization of Emerging Swine Viral Diseases through Oxford Nanopore Sequencing Using Senecavirus A as a Model

Emerging viral infectious diseases present a major threat to the global swine industry. Since 2015, Senecavirus A (SVA) has been identified as a cause of vesicular disease in different countries and is considered an emerging disease. Despite the growing concern about SVA, preventive and diagnostic strategies are lacking, a problem shared by all emerging infectious diseases. Using SVA as a model, we demonstrated that Oxford Nanopore MinION sequencing can be used as a robust tool for the investigation and surveillance of emerging viral diseases. Our results showed that MinION sequencing allowed rapid, unbiased pathogen detection at the species and strain level in clinical cases. SVA whole genome sequences were generated using both direct RNA sequencing and PCR-cDNA sequencing methods, with an optimized consensus accuracy of 94% and 99%, respectively. The advantages of direct RNA sequencing lie in its shorter turnaround time, higher analytical sensitivity, and the quantitative relationship between input RNA and output sequencing reads, while PCR-cDNA sequencing excelled at creating highly accurate sequences. This study developed whole genome sequencing methods to facilitate the control of SVA and provides a reference for the timely detection and prevention of other emerging infectious diseases.


Introduction
Emerging and re-emerging viral diseases have had a significant adverse impact on swine production and will be an ongoing challenge for the swine industry. Emerging infections can be caused by previously unknown/undetected agents or known pathogens spreading to a new geographic location or host. The appearance of emerging diseases has usually been characterized by sudden unpredictable outbreaks, which then spread across regions and countries [1]. This feature drives the need for effective emerging disease management via robust pathogen detection (including novel pathogens) and efficient epidemiological surveillance [2][3][4].
Over the last 40 years, emerging pathogens that have caused devastating swine diseases include porcine reproductive and respiratory syndrome virus (PRRSV), first described in the late 1980s [5]; porcine circovirus type 2 (PCV2), discovered in the late 1990s [6]; and, more recently, porcine epidemic diarrhea virus (PEDV), which appeared in the US in the early 2010s [7]. The current introduction of African Swine Fever (ASF) into China, affecting over half of China's swine herds, confirms the significant impact that viral diseases have on the swine industry. To date, ASF has spread from China to neighboring countries and is very likely to eventually enter other ASF-free regions, such as the United States, despite all attempts to keep it out [8]. The United States is now free of foot-and-mouth disease (FMD),

SVA Samples
An SVA lab isolate (GenBank: MN164664) and clinical samples of SVA-positive swine vesicular fluid were provided by Dr. Fabio A. Vannucci at the University of Minnesota Veterinary Diagnostic Lab. The SVA lab isolate was propagated in the NCI-H1299 non-small cell lung carcinoma cell line (ATCC CRL-5803) as previously described [31]. Negative swine sera were spiked with the SVA lab isolate to generate "spike-in" samples. In total, we tested 8 clinical samples, all of which were vesicular fluid from SVA-positive animals.

RNA Extraction
SVA RNA was extracted from cell culture SVA supernatants (cell culture samples), virus-free pig serum spiked with the SVA lab isolate (spike-in samples), and clinical vesicular fluids (clinical samples) using the QIAamp Viral RNA mini kit (Qiagen, Germantown, MD, USA) following the manufacturer's instructions, without the addition of carrier RNA and with a final elution in 50 µL nuclease-free water. The viral RNA was concentrated using a SpeedVac lab concentrator (Savant, NY, USA). A Qubit 3.0 fluorometer (Life Technologies, Carlsbad, CA, USA) and a NanoDrop 1000 spectrophotometer (Thermo Scientific, Waltham, MA, USA) were used for RNA quantity and quality assessments.

Oxford Nanopore Sequencing
Viral RNA was sequenced using 2 different kits, the direct RNA sequencing kit or the PCR-cDNA sequencing kit (ONT, Oxford, UK). The input RNA for the direct RNA sequencing (DRS) library preparation was isolated SVA RNA plus the RNA calibration strand (RCS, 1314 bp) provided in the sequencing kit, which was added to increase the amount of total input RNA, as recommended for optimized results. The RCS is Enolase II mRNA (YHR174W, NCBI Reference Sequence: NC_001140.6) provided at a concentration of approximately 50 ng/µL. Library preparation was performed according to the direct RNA sequencing online protocol (DRS_9080_v2, ONT, Oxford, UK), in which a sequencing adaptor is ligated to the 3′ end of the RNA and is used to initiate sequencing [32]. The input RNA for the PCR-cDNA sequencing was the extracted SVA RNA only. PCR-cDNA sequencing (PCS) libraries were generated according to the PCR-cDNA sequencing online protocol (PCS_9085_v109, ONT, Oxford, UK). After library preparation, the DRS and PCS libraries were loaded onto an R9.4.1 SpotON flow cell and sequenced using a MinION Mk I sequencer (ONT, Oxford, UK), which was connected to a computer and remotely controlled by the MinKNOW software (ONT, Oxford, UK).
For the genome sequencing of the cell culture SVA lab isolate, two sequencing replicates were performed for both DRS and PCS, with DRS starting from 60 ng SVA RNA plus 300 ng RCS and PCS starting from 60 ng SVA RNA alone. For the sequencing runs of clinical samples and negative pig serum spiked with the SVA lab isolate, the same amount of SVA RNA was used for the DRS and PCS library preparation, with 300 ng RCS added to each of the DRS samples to increase the total RNA input for optimized sequencing output. Samples were sequenced for approximately 6 h and the estimated sequence yield was monitored in real time. All samples were sequenced individually. Flow cells were reused following nanopore guidelines until the number of total available pores was under 300. To minimize potential contamination from previous runs and to protect the accuracy of the detection limit, samples with lower viral titers were sequenced first on a flow cell, followed by those with higher viral levels.

Bioinformatics Analysis
Basecalling was carried out using Guppy (ONT, Oxford, UK). Only raw sequencing reads that passed the quality filter of the Phred quality score ≥7 (pass reads) were used for downstream analysis.
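The pass/fail threshold described above can be illustrated with a minimal sketch. It assumes that the read-level quality is the mean of the per-base Phred scores averaged in error-probability space (as several nanopore QC tools do); the exact computation inside Guppy is not stated in this section, and the function names below are hypothetical.

```python
import math


def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of one read, averaged over error probabilities.

    Assumption: FASTQ qualities use the standard Phred+33 encoding.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each per-base Phred score Q to an error probability 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Average the probabilities, then convert back to the Phred scale.
    return -10 * math.log10(sum(probs) / len(probs))


def is_pass_read(quality_string: str, min_q: float = 7.0) -> bool:
    """True if the read meets the Phred >= 7 threshold used in the text."""
    return mean_read_quality(quality_string) >= min_q


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(is_pass_read("IIII"))  # high-quality read
print(is_pass_read("I#"))    # one very poor base dominates the mean
```

Averaging in probability space means that a few low-quality bases pull the read-level score down more strongly than an arithmetic mean of the Phred values would.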
For the DRS, raw reads of the RNA calibration strand (RCS, 1314 bp), which was used to enhance library preparation and sequencing performance, were filtered out by turning on the corresponding Guppy parameter. The total yield, pass read yield, read quality, and read length of raw reads from whole genome sequencing were analyzed using MinIONQC [33], a script written in R that provides quality control for Oxford Nanopore data.
For the sequencing of cell culture samples, raw pass reads were mapped to the SVA reference genome (GenBank: MN164664) using Minimap2 [34], then analyzed using Qualimap [35], generating raw error rates and coverage information that were then visualized using GraphPad Prism software (GraphPad Software, San Diego, CA, USA). The reads that mapped to the SVA reference genome (SVA reads) were extracted using SAMtools [36], and the SVA yield, average read length, and quality were determined using NanoPlot [37].
For the spike-in and clinical samples, taxonomic analysis at the species level was performed to identify pathogens present in the sample using What's In My Pot (WIMP), which is provided by ONT's subsidiary Metrichor [38]. An SVA custom database was created to analyze the SVA sequencing reads by downloading all of the SVA whole genomes available from GenBank (132 complete SVA genomes as of March 2019). To detect SVA at the strain level, pass reads were analyzed against this SVA database using the Basic Local Alignment Search Tool (BLAST) [39] to identify the strain with the best match based on BLAST bit score.
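The strain-level step above (best match by BLAST bit score) could be sketched as follows. This is an illustration only: it assumes BLAST results were saved in the standard 12-column tabular format (`-outfmt 6`, with bit score in the last column) and that the best match is simply the subject with the single highest bit score across all reads. The helper name, read IDs, and the second subject ID are hypothetical.

```python
import csv
import io
from typing import Tuple


def best_match_strain(blast_tsv: str) -> Tuple[str, float]:
    """Return (subject_id, bit_score) of the highest-scoring BLAST hit.

    Assumes tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best_subject, best_score = None, float("-inf")
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, bitscore = row[1], float(row[11])
        if bitscore > best_score:
            best_subject, best_score = subject, bitscore
    if best_subject is None:
        raise ValueError("no BLAST hits found")
    return best_subject, best_score


# Hypothetical hits for two reads against a custom SVA database.
example_rows = [
    ["read1", "MN164664", "92.1", "1500", "100", "18", "1", "1500", "10", "1509", "0.0", "2100"],
    ["read1", "HYPOTHETICAL_STRAIN", "90.3", "1500", "120", "25", "1", "1500", "10", "1509", "0.0", "1900"],
    ["read2", "MN164664", "91.0", "1200", "90", "18", "1", "1200", "300", "1499", "0.0", "1750"],
]
example_tsv = "\n".join("\t".join(r) for r in example_rows)
print(best_match_strain(example_tsv))
```

Other aggregation rules (for example, counting how many reads vote for each strain) are equally plausible; the text does not specify which was used.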

SVA Consensus Generation and Optimization
Viral consensus sequences were generated using different assemblers and the results were compared to determine the optimal assembler for each sequencing method. Four different assemblers, Canu [40], Miniasm [34], Racon [41], and wtdbg2 [42], were used to examine the reads from direct RNA sequencing and PCR-cDNA sequencing. For DRS, an optimal consensus sequence was generated, without the need for a reference or assembly, by extracting the longest read among all sequencing reads as a scaffold and mapping all pass reads to this longest read using Minimap2 [34], followed by consensus generation using Racon [41]. For PCS, de novo assembly was performed using the Canu assembler [40]. After determining the consensus generation strategies for both sequencing methods, optimization was performed in terms of total input sequencing yield and the pre-treatment of raw reads, using the cell culture virus for which the whole genome sequence was already known. Groups containing different input sequencing yields, ranging from 0.7 to 70 megabases (Mb), were generated by random selection from the dataset of total pass reads using fastq-tools-0.8 (https://homes.cs.washington.edu/~dcjones/fastq-tools/). Within the same yield group, three subgroups were formed using different raw read filters: (1) original pass reads (Phred quality ≥7) without further filters; (2) pass reads with a read length >1314 bp to remove short reads and the RCS; and (3) pass reads that could be mapped to the SVA database. The consensus length and accuracy were the two main parameters evaluated for comparison. Consensus accuracy was determined by comparing the consensus genome to the reference genome using the ClustalW pairwise alignment in Geneious v8.0.5 software (https://www.geneious.com, San Diego, CA, USA) [43].
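The read-selection steps described above (random subsampling to a target yield, the >1314 bp length filter, and picking the longest read as the DRS scaffold) can be sketched in a few lines. This is a simplified stand-in, not the fastq-tools implementation used in the study; reads are represented as hypothetical (name, sequence) tuples and all function names are illustrative.

```python
import random

RCS_LENGTH = 1314  # length of the RNA calibration strand, from the text


def subsample_to_yield(reads, target_bases, seed=0):
    """Randomly pick reads until the summed length reaches target_bases."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    picked, total = [], 0
    for name, seq in shuffled:
        if total >= target_bases:
            break
        picked.append((name, seq))
        total += len(seq)
    return picked


def length_filter(reads, min_length=RCS_LENGTH):
    """Keep reads strictly longer than min_length (drops short reads and RCS)."""
    return [(name, seq) for name, seq in reads if len(seq) > min_length]


def longest_read(reads):
    """Return the longest read, used as the scaffold for DRS polishing."""
    return max(reads, key=lambda r: len(r[1]))


# Toy reads; real input would be parsed from a FASTQ file.
reads = [("a", "A" * 100), ("b", "A" * 2000), ("c", "A" * 1314), ("d", "A" * 7000)]
print([name for name, _ in length_filter(reads)])
print(longest_read(reads)[0])
```

In the workflow described in the text, the scaffold read would then serve as the mapping target for Minimap2, with Racon generating the consensus.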
For the spike-in and clinical samples, a consensus sequence could be generated for most samples. SVA reads were first extracted by mapping all of the reads against the custom SVA database (generated above), followed by consensus generation using Racon for DRS and Canu for PCS. The consensus length and accuracy were calculated to indicate the performance of consensus generation with varying amounts of input viral copies.
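Consensus accuracy, as used above, is a percent identity derived from a pairwise alignment of the consensus against the reference. A minimal sketch, assuming the two sequences have already been aligned (for example by ClustalW) and are supplied as equal-length gapped strings, and assuming identity is defined as identical columns divided by all alignment columns; the exact definition reported by Geneious may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Gaps ('-') count as mismatches; columns that are gaps in both
    sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: one gap and one substitution in 10 columns.
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))
```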

Analytical Sensitivity Determination
Analytical sensitivity was determined for both direct RNA sequencing and PCR-cDNA sequencing using spike-in and clinical samples containing a range of input virus amounts. For spike-in samples, an SVA viral stock was 10-fold serially diluted from 1× to 10,000× to generate decreasing amounts of virus, which were then added to SVA-free pig serum. These spike-in samples ranged from 10² to 10⁷ viral copies/mL (Ct values ranging from 25 to 10). The Ct value and viral copies for all samples were determined by RT-qPCR at the University of Minnesota Veterinary Diagnostic Lab. For the clinical samples, 8 vesicular fluid samples ranging from 10² to 10⁶ viral copies/mL (Ct values ranging from 24 to 13) were sequenced. For both sample sets, viral RNA was extracted from 1 mL of sample, with half of the sample used for direct RNA sequencing (DRS) and half for PCR-cDNA sequencing (PCS).
For spike-in samples, the SVA strain was determined by running BLAST on the raw sequencing reads against the custom SVA database, and the detected strain was then compared to the known reference genome using ClustalW pairwise alignment in the Geneious software [43] to determine the accuracy of the strain level detection. For the clinical samples, the consensus sequence generated from Sanger sequencing was used as a partial reference genome. To obtain the whole genome reference (referred to as "reference sequence"), we used BLASTn to retrieve from GenBank the whole genome of the best-matching strain. A comparison was made between the strain identified as the best BLAST match of the MinION consensus sequence and the reference sequence. We then performed ClustalW pairwise alignment to compare the "best match strain for the MinION consensus" and the reference genome using Geneious v8.0.5 software (https://www.geneious.com, San Diego, CA, USA) to determine the detection accuracy [46]. To minimize the effect of contamination from previous runs on the accuracy of analytical sensitivity, samples with lower viral titers were sequenced first when using the same flow cell.
A correlation analysis was performed to test whether DRS and PCS were quantitative diagnostic methods. Because the total number of reads varied between sequencing reactions, reads were normalized by calculating the ratio of SVA reads to total reads so that samples could be compared. Linear regression analysis was then performed between the SVA reads/total reads ratio and the amount of input viral copies, using GraphPad Prism software (GraphPad Software, San Diego, CA, USA), to determine whether the two were correlated.
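The normalization and regression described above can be sketched without any external library. The analysis in the study was run in GraphPad Prism; the code below is only an illustration of the same two steps. The function names are hypothetical, and whether the viral copies were log-transformed before regression is not stated in the text, so the caller decides what to pass as x.

```python
def sva_fraction(sva_reads: int, total_reads: int) -> float:
    """Normalize SVA reads by the total reads of the same sequencing run."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return sva_reads / total_reads


def r_squared(x, y) -> float:
    """Coefficient of determination for a simple least-squares line y ~ x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    # For simple linear regression, r^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Toy values, not data from the study.
print(sva_fraction(50, 1000))
print(r_squared([1, 2, 3], [2, 4, 6]))
```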

Oxford Nanopore MinION Sequencing Data and Analysis Pipelines
The sequencing data were deposited at the NCBI Sequence Read Archive (SRA) and are available under accession numbers: SRR11124084 to SRR11124098.
Detailed information of our pipeline used for analyzing raw sequencing reads can be found at https://github.com/ShaoyuanTan/svaproject.

Assessment of Raw Reads from Direct RNA Sequencing and PCR-cDNA Sequencing
To evaluate and compare the general performance of DRS and PCS for viral whole genome recovery, two whole genome sequencing runs using a cell culture-grown virus with a known reference sequence were carried out for each method (Table 1). All runs started with 60 ng of SVA RNA for the library preparation and were sequenced for 6 h. The available pores for each sequencing run were recorded to indicate the condition of the flow cell (Table 1). SVA reads were extracted from total sequencing reads and analyzed. PCS performed much better than DRS in terms of SVA yield (DRS 4.5 Mb, PCS 66.1 Mb), average read length (DRS 1267 bp, PCS 1721 bp), and raw error rate (DRS 15.14%, PCS 11.23%) (Table 1). These differences could be explained by the intrinsic features of Oxford Nanopore DNA sequencing (PCS) and RNA sequencing (DRS): the latter is a novel technology still under development, whereas DNA sequencing has been well optimized. The main reason for the higher SVA yield from PCS is likely that, although the two methods started with the same amount of SVA RNA, PCS involves a PCR amplification step that increases the number of SVA DNA strands available for sequencing. Coverage analysis showed that PCS generated a more even coverage distribution than DRS (Figure 1). For DRS, significantly more coverage was seen at the 3′ end. This uneven distribution of DRS coverage has been observed previously and may be explained by partially degraded RNA and RNA secondary structures hampering the movement of the RNA through the nanopores, resulting in higher coverage where sequencing is initiated, which is at the 3′ end of the genome [47][48][49].
Table 1. Raw error rate (%): DRS 15.14 ± 0.32; PCS 11.23 ± 0.23 *.
* Data shown as the mean ± SD of 2 independent replicates.

Optimization of Consensus Sequence Generation
Different assemblers were tested to determine the best fit assembler for the sequencing data from DRS and PCS. In terms of consensus length and accuracy, DRS datasets were assembled best using Racon [41], and the PCS datasets were assembled best using Canu [40]. After choosing the assembler, pre-assembly read filters were examined to determine the optimal conditions for the generation of an optimized consensus sequence.

Datasets containing different sequencing yields (0.7, 7, and 70 Mb) were generated by randomly selecting reads from the total pass reads dataset. Within the same yield dataset, three groups were generated based on different filters: group 1 contained all the pass reads (Phred quality ≥7); group 2 consisted of pass reads with a length filter >1314 bp to remove short reads and all RCS reads (RCS was added to DRS to increase the efficiency of library preparation); and group 3 consisted of pass reads that mapped to the SVA database. The rationale behind the length filter was to test whether a dataset with longer reads on average would help with consensus generation, and at the same time to delete all remaining RCS reads. Although the RCS reads should all be removed during basecalling, more than a third of the RCS reads in fact remained in the pass reads dataset due to low filtering efficiency. The rationale for the mappable filter was the assumption that a "less noisy" dataset would be beneficial for SVA assembly and consensus generation, especially in some clinical samples where the desired viral RNA reads would account for <1% of the total sequencing reads. Using the "70 Mb yield" datasets as an example, we evaluated the effect of the different filters on read recovery, read length, and read quality (Table 2). The read recovery of the DRS dataset after the length filter (group 2) was 11% of the pass reads (group 2 yield/group 1 yield) and after the SVA mappable filter (group 3) it was 7% of the pass reads (group 3 yield/group 1 yield) (Table 2). The low recovery was due to the large number of short reads present, mainly RCS reads. Read recovery of the PCS dataset after the length filter (group 2) and SVA mappable filter (group 3) was 75% and 73% of the pass reads (group 1), respectively (Table 2). The average read length and Phred quality score were greater for PCS than for DRS irrespective of the filters (Table 2).
The average read length of the unfiltered DRS reads (group 1) was especially low, mainly due to the presence of short RCS reads, which account for >90% of reads that are less than 1314 bp (Table 2). Each read filter was examined at different sequence yields to determine the optimal conditions for the generation of a consensus sequence. For the DRS groups, as the starting yield increased, the length and accuracy of the generated consensus sequence increased (Table 3). The highest consensus length and accuracy were observed at the 70 Mb yield (~5 Mb of SVA reads, 7% SVA mappable rate) (Tables 2 and 3). At the same yield level, the consensus accuracy and length with different filters were similar, indicating that the sequencing yield is the leading factor for consensus accuracy and length and that the raw read filters have minimal influence on results (Table 3). A similar pattern was observed for PCS: within a sequencing yield, the different filters showed similar consensus length and accuracy (Table 3). However, for PCS, an increase in yield did not always result in better consensus generation, as 70 Mb of pass reads generated a less accurate and shorter consensus than the 7 Mb read group (Table 3). The most accurate consensus for PCS was generated using a total sequencing yield of 7 Mb (~5 Mb of SVA reads, 73% SVA mappable rate) (Tables 2 and 3).

Table 3. Performance of consensus generation using different raw read filters at different yields *.
Columns: Sequencing method; Group 1; Group 2 (Length Filter); Group 3 (SVA Mapped).

While both DRS and PCS can generate a nearly full-length SVA genome, the consensus from PCS achieved 99% accuracy, much higher than that of DRS, which only reached 94% accuracy (Table 3). In this study, no obvious differences were observed when comparing the different filters using the cell culture samples. However, the filters may be useful in situations not examined here, such as clinical samples from tissues that contain a large amount of host RNA. To make our pipeline applicable to all sample types, we used the SVA mappable reads (group 3 filter set) for the following spike-in and clinical sample analysis.
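The read recovery values reported in this section are simple yield ratios (filtered yield divided by group 1 yield). As a small worked check, using only the approximate DRS figures quoted in the text (about 5 Mb of SVA reads out of a 70 Mb pass-read dataset), the function below reproduces the reported 7% mappable rate. The function name is hypothetical.

```python
def read_recovery_percent(filtered_yield_mb: float, pass_yield_mb: float) -> float:
    """Read recovery = filtered yield / group 1 (pass read) yield, in percent."""
    if pass_yield_mb <= 0:
        raise ValueError("pass_yield_mb must be positive")
    return 100.0 * filtered_yield_mb / pass_yield_mb


# DRS, 70 Mb dataset: roughly 5 Mb of reads map to the SVA database.
print(round(read_recovery_percent(5, 70)))
```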

Determination of Analytical Sensitivity of Direct RNA and PCR-cDNA Sequencing
The analytical sensitivity of Oxford Nanopore DRS and PCS was evaluated by sequencing spike-in and clinical samples with a range of 4.7 × 10² to 1.0 × 10⁷ viral copies. After sequencing, the number of total reads from each run was determined (Table 4). To detect SVA at the species level in an unbiased and hypothesis-free manner, taxonomic analysis was performed using WIMP and the number of reads classified as SVA was recorded (Table 4). SVA was readily detected using both sequencing methods in both spike-in and clinical samples containing more than 5 × 10⁴ total viral copies. Using DRS on spike-in samples, SVA reads were detected in samples with as few as 4.7 × 10² SVA viral copies, while for PCS, SVA reads were detected in samples with 1.2 × 10⁴ viral copies or more. In clinical samples, the detection limit was 9.2 × 10² viral copies for DRS and 2.2 × 10³ viral copies for PCS. The number of total reads indicated the overall performance of sequencing, while the ratio of SVA reads to total reads suggested the presence and abundance of SVA in a sample (Table 4). As a reference for future experimental design, at least 1 SVA read should be obtained when sequencing a minimum of 5 × 10⁴ viral copies from a clinical sample and generating around 10⁴ total reads (Table 4). Of note, SVA was not detected using PCS in a clinical sample with 1.2 × 10⁴ viral copies, but was detected in other clinical samples with lower numbers of viral copies (Table 4). This could be explained by poor flow cell performance, since few total reads were generated for the 1.2 × 10⁴ viral copy clinical sample. For example, even though the 1.2 × 10⁴ viral copy sample had about five times more viral copies than the 2.2 × 10³ viral copy sample, it generated 16 times fewer total reads; poor sequencing performance and too few total reads therefore explain why SVA was not detected in the sample with the higher viral copy number. Our observation of varying total and SVA reads generated from samples with similar viral copies and sequencing time indicated inconsistent sequencing output between runs, mostly due to the condition of the flow cell used.
For epidemiological and precise infection control purposes, it is necessary to know not only which infectious virus is present, but also which strain of the virus is causing disease. Thus, to investigate whether DRS or PCS can identify the strain of SVA present in a sample, total reads were analyzed with BLASTn against our SVA whole-genome database and the sequence with the best match (top BLAST hit based on bit score) was considered to be the SVA strain present in the sample (Table 4). The percent identity between the top BLAST hit and the known sequence of the sample was determined to indicate the accuracy of strain level detection (Table 4). For the spike-in samples, a laboratory strain with a known whole genome reference sequence (MN164664) was used, and this sequence was also present in our SVA whole genome database. The MN164664 sequence was compared to the top BLAST hit to determine the percent identity (Table 4). For each of the clinical samples, a partial genome reference sequence was obtained using Sanger sequencing and compared with the top BLAST hit. Because these reference sequences are partial, they are not present in our SVA whole-genome database, so we did not expect 100% identity between the top BLAST hit and our reference sequence (Table 4). Both sequencing methods were 100% accurate when detecting strains for the spike-in samples, in which the reference strain was present in the SVA whole genome database and was identified as the best match (Table 4). For clinical samples, a comparison of the known partial genome to that of the top BLAST hit showed a sequence identity of 97.0-98.2% for both sequencing methods (Table 4). Some disagreements between the DRS and PCS "Best match" genomes revealed a limitation of detection accuracy, which can occur between highly similar strains (Table 4).
Sequencing accuracy was examined further by creating a consensus genome and comparing it to the known reference sequence. All raw reads that mapped to the SVA database were used to generate a consensus sequence. This consensus sequence (or the longest read when no consensus could be generated) was then compared to the known viral reference sequence (Table 4). A nearly complete SVA consensus genome (breadth of coverage >95%) was generated using both DRS and PCS from samples containing 1.1 × 10⁶ viral copies or more, giving an accuracy greater than 91% for DRS and greater than 99% for PCS (Table 4). In these experiments, a consensus genome coverage of 95% required a minimum of 299 SVA reads for DRS and 436 SVA reads for PCS. For samples containing fewer than 10⁶ viral copies, a shorter consensus genome was obtained with lower accuracy (Table 4).
Then, a quantitative relationship between the output SVA reads and the input SVA viral copies was investigated. To minimize variation between sequencing runs, SVA reads were normalized for each run based on the total reads generated. A correlation analysis between the ratio of SVA reads/total reads and input SVA viral copies was performed. DRS showed a strong linear relationship (r² = 0.99) while PCS showed a weak one (r² = 0.54), indicating that DRS was a quantitative method while PCS was not. This was not surprising, considering that PCS contains more steps than DRS that can introduce bias, such as PCR amplification and amplicon selection.
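Breadth of coverage, the criterion used above for a "nearly complete" genome, is the percentage of reference positions covered by at least one aligned base. A minimal sketch, assuming alignments are available as 0-based, half-open (start, end) intervals on the reference; the function names and the toy genome length are hypothetical and do not reflect the actual SVA genome.

```python
def depth_per_base(intervals, genome_length):
    """Per-position read depth from (start, end) alignment intervals."""
    diff = [0] * (genome_length + 1)  # difference array
    for start, end in intervals:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Percent of positions with depth >= min_depth."""
    if not depth:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depth if d >= min_depth)
    return 100.0 * covered / len(depth)


# Two overlapping alignments on a toy 100-base reference.
depth = depth_per_base([(0, 50), (40, 96)], genome_length=100)
print(breadth_of_coverage(depth))
```

The same depth profile could also be used to inspect coverage evenness along the genome, such as the 3′ bias reported for DRS.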

Discussion
The early and reliable detection of infectious agents as soon as clinical signs are observed is essential for efficient disease control. Delays and misdiagnosis inevitably lead to the spread of disease and escalation of adverse impacts. Prompt actions against an emerging pathogen are especially important because there is usually no existing immunity among the susceptible population, no vaccine, and no specific treatment against the pathogen. However, emerging infections are more difficult to identify since most diagnostics are based on previously known and expected infectious agents and miss unexpected pathogens. Diagnostic methods that are rapid, available at the point-of-care, able to detect new pathogens, and robustly applicable across a wide range of pathogens are greatly needed to effectively fight against emerging eventualities [50,51]. Among all pathogens, RNA viruses have the highest mutation rates, and are anticipated to have the highest possibility to cause the next emerging event [52]. They are also of special concern regarding zoonotic transmission due to their high adaptability to new hosts [53]. In this study, we evaluated Oxford Nanopore MinION sequencing for SVA investigation, aiming to provide insights and tools for the investigation of emerging RNA viral diseases through sequencing and bioinformatics.
Oxford Nanopore provides two methodologies for RNA sequencing: traditional amplicon sequencing (PCR-cDNA sequencing, PCS), which has lower error rates and higher throughput, but involves reverse transcription and PCR amplification, which is time consuming and loses some RNA genome structure information in the process; and direct RNA sequencing (DRS), an innovative technique under development that sequences RNA strands directly, thus eliminating the length limitations that can arise from reverse transcription and allowing for the detection of nucleic acid base modifications. Both sequencing methods used in this study can be used to detect unknown RNA viral pathogens. However, a poly(A) tail, which is present in the SVA genome, is needed for adapter ligation. Thus, this approach lends itself readily to the sequencing of RNA viruses with a 3′ poly(A) tail. Many important swine RNA viral pathogens have a 3′ poly(A) tail, such as coronaviruses (porcine epidemic diarrhea virus), picornaviruses (FMDV, SVA) and arteriviruses (PRRSV). RNA pathogens that do not contain a poly(A) tail (such as rotaviruses) can be sequenced after the enzymatic addition of a 3′ poly(A) tail, and this step can be added to any sequencing reaction without interfering with the sequencing of samples already containing a poly(A) tail [54]. This study provided a thorough comparison between the PCS and DRS methods, which is summarized in Table 5, aiming to provide guidance on the selection of a sequencing method for different clinical situations and purposes. We found that PCS is more time consuming but can generate a more accurate consensus, an advantage that was especially obvious with higher viral copy number samples.
Although DRS was observed to be less accurate, it was quicker to perform and just as sensitive and has unique and promising features such as the detection of nucleic acid modifications, as observed by other studies [55,56]. Despite their differences, both sequencing methods were able to accurately detect SVA at the strain level using raw reads and entry-level bioinformatics analysis (Table 5). Thus, a core sequencing laboratory with data analysis experts was not necessary for the detection of the strain of SVA present, suggesting it could be run on a farm or at least more quickly than other more analysis-intense sequencing methods.
The analytical sensitivity of a diagnostic method gives important information to help guide method selection based upon the situation. The evaluation of the analytical sensitivity of MinION sequencing requires the definition of either a sequencing time or a minimum number of total reads. In this study, a same-day report was desired, so a rapid turnaround time frame with only 6 h of sequencing was used. The analytical sensitivity of DRS and PCS was similar, with an input of 5 × 10⁴ viral copies or more (in 0.5 mL starting material) always generating SVA reads. DRS was slightly more sensitive, detecting SVA at approximately 10² to 10³ viral copies (per 0.5 mL), while PCS needed approximately 10³ to 10⁴ viral copies (per 0.5 mL). Previously, it had been shown that MinION sequencing of influenza virus had a detection limit of 10² to 10³ genome copies/mL for 48 h of sequencing, a sensitivity similar to that of our DRS experiments [57]. Similar to other studies, we observed great inconsistency between runs within the same sequencing time frame, which could be caused by factors such as varying flow cell conditions and sample quality [58]. Using the number of sequencing reads as well as the sequencing time (monitored in real time using the MinKNOW interface) can help minimize this run-to-run variation. In fact, we were able to determine that an input of more than 5 × 10⁴ viral copies from a clinical sample and around 10⁴ total sequence reads were needed to generate a minimum of one SVA read. If more SVA reads are desired for other purposes, such as whole genome generation, or if a lower amount of sample is used, then more total reads should be set as a target. In this study, an advantage of direct RNA sequencing over amplicon sequencing, such as PCR-cDNA, was that DRS showed a quantitative relationship between input viral titers and output sequencing reads.
Similarly, other research groups have observed a strong relationship between influenza viral titers and influenza sequencing reads using Oxford Nanopore direct RNA sequencing technology [57]. However, a hepatitis B virus (HBV) study using Oxford Nanopore amplicon sequencing (which includes a PCR step similar to our PCS protocol) observed considerable variability in total yields and in the proportion of mapped HBV reads between sequencing runs, and concluded that the method was not quantitative [59]. Amplicon sequencing, such as the PCR-cDNA protocol, includes more steps during library preparation, including the amplification and selection of PCR products, which could introduce bias, whereas direct RNA library preparation is simple and straightforward, with no additional amplification steps.
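The read-count target discussed above can be illustrated with a short calculation. The sketch below is not part of the study's analysis pipeline. It assumes, purely for illustration, that each read is an independent draw and that SVA reads make up a fixed fraction of all reads (about 1 in 10^4, taken from the observation that around 10^4 total reads were needed to obtain one SVA read). The function names and the 95% confidence level are hypothetical choices, and real runs vary considerably with flow cell condition and sample quality.

```python
import math


def prob_at_least_one(total_reads: int, target_fraction: float) -> float:
    """Probability of seeing at least one target read among total_reads.

    Assumption (not from the study): reads are independent draws, each
    with probability target_fraction of belonging to the target virus.
    """
    return 1.0 - (1.0 - target_fraction) ** total_reads


def total_reads_for_detection(target_fraction: float,
                              confidence: float = 0.95) -> int:
    """Smallest total read count that gives at least one target read
    with the requested probability, under the same independence assumption.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n.
    n = math.log(1.0 - confidence) / math.log(1.0 - target_fraction)
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative fraction only: roughly one SVA read per 10^4 total reads.
    fraction = 1e-4
    print(prob_at_least_one(10_000, fraction))
    print(total_reads_for_detection(fraction, confidence=0.95))
```

Under these assumptions, stopping a run at exactly 10^4 total reads would yield at least one SVA read only about 63% of the time, whereas a target of roughly 3 × 10^4 total reads would be needed for a 95% chance. This is one way to see why a higher total-read target is advisable when more SVA reads are wanted or less input material is available.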
While most sequencing is generally restricted to large laboratories, the portability of the MinION sequencer makes it suitable for diagnosis in the field. On-site diagnosis can greatly improve emerging infectious disease management, especially considering that emerging disease outbreaks can happen anywhere and are more likely to occur in developing countries or remote areas where there is a lack of veterinary infrastructure, expertise, and diagnostic capacities [4,60]. In fact, several field studies have confirmed these advantages of a portable sequencer, including work during a Zika virus outbreak in Brazil and a 2015 Ebola outbreak and a Dengue virus field investigation [28,61,62]; these studies concluded that the MinION sequencer was advantageous for rapid in-field disease detection.
There are a few limitations to using these sequencing methods for emerging disease detection. First, our method of species and strain detection largely depends on the genome database, GenBank. While viral genome information will be available in GenBank for emerging viral diseases caused by known viruses expanding to new hosts or geographical regions, it will not be available for previously unknown or newly discovered pathogens. However, the sequencing information for these unknown or newly discovered pathogens can be determined following MinION sequencing by carefully examining the unclassified sequencing reads. Second, the analytical sensitivity of MinION sequencing is lower than that of diagnostic PCR assays. However, PCR assays are limited to the detection of known pathogens, and during an outbreak high levels of the pathogen should be present, allowing for easy detection through MinION sequencing. In addition, even at its current sensitivity, sequencing can support PCR in the case of new pathogens by providing strain information for more effective disease control and for epidemiologic studies to track infection. Third, while this study provided a benchmark and foundation for portable sequencer use in disease diagnostics, more work is needed to achieve commercial diagnostics, such as improving the accuracy, the detection limit, and the consistency of flow cell performance.
This study evaluated MinION sequencing as a diagnostic tool for the detection of emerging viral diseases in swine by examining SVA infection as a model of an emerging disease. We demonstrated that the portable, easy-to-operate, low-maintenance MinION platform is an effective tool for the investigation of SVA. We provided a detailed pipeline of our analysis of raw reads to help investigators use this technology (https://github.com/ShaoyuanTan/svaproject). The methods established in this study provide a framework for prompt diagnostics of other emerging viral diseases. Infectious diseases will continue to emerge around the world, and it is increasingly important to be prepared for the next outbreak [63].