Full Genomic Characterization of a Saffold Virus Isolated in Peru

While studying respiratory infections of unknown etiology, we detected Saffold virus in an oropharyngeal swab collected from a two-year-old female suffering from diarrhea and respiratory illness. The full viral genome recovered by deep sequencing showed 98% identity to a previously described Saffold strain isolated in Japan. Phylogenetic analysis confirmed that the Peruvian Saffold strain belongs to genotype 3 and is most closely related to strains that have circulated in Asia. This is the first documented case report of Saffold virus in Peru and the only complete genomic characterization of a Saffold-3 isolate from the Americas.


Introduction
Saffold virus (SAFV) was first described in 2007, after it was isolated from a stool sample taken in San Diego, in 1981, from an eight-month-old female with fever of unknown origin [1]. SAFV belongs to the Cardiovirus genus of the Picornaviridae family. The Cardiovirus genus is composed of two species, encephalomyocarditis virus, for which only one serotype has been reported, and Theilovirus, for which 14 distinct serotypes are known: SAFV 1-11, Theiler's murine encephalomyelitis virus, Thera virus, and Vilyuisk human encephalomyelitis virus [2]. SAFVs are ubiquitous in populations around the world and have been linked to respiratory and gastrointestinal infections early in life [3][4][5][6]. Despite their extensive distribution, only a handful of SAFV genomic sequences from the Americas have been described [7][8][9]. Furthermore, many of the sequences publicly available are not complete genomes, but rather very small fragments of partial VP1 sequences. Aside from its original 1981 isolation in the USA, SAFV-1 has only been reported in Bolivia [9]. Other serotypes, including SAFV-2, 4, and 9, have been reported in Bolivia, Brazil, Canada, and the USA [7][8][9]. For the remaining serotypes, including SAFV-3, 5, 6, 7, 8, 10, and 11, there are no known reports of isolates from the Americas. Here, we provide complete genomic characterization of a SAFV-3 isolate collected from a two-year-old female from the Amazonian area of Maynas, in Loreto, Peru.

Results and Discussion
The SAFV strain described here was isolated from an oropharyngeal swab collected from a two-year-old female who presented with diarrhea, heart murmur, and symptoms of respiratory illness, including headache, sore throat, cough, rhinorrhea, and dyspnea. The patient was identified during routine respiratory surveillance efforts carried out by investigators from the U.S. Naval Medical Research Unit No. 6 in Peru and neighboring countries, with institutional review board approval from all partners involved (Protocols NMRCD.2010.0010 and NMRCD.2010.0008). The sample was initially cultured in LLC-MK2, MDCK, and Vero-E6 cells, but by day 18 cytopathic effect (CPE) had only been observed in LLC-MK2 cells and no pathogens had been identified in the original sample using traditional PCR or ELISA-based approaches. Additional analyses of the CPE-positive LLC-MK2 culture supernatant, using a highly multiplexed MassTag PCR approach that can detect over 20 respiratory pathogens simultaneously [10], also failed to identify any potential etiology. The sample then entered a pathogen discovery pipeline based on unbiased next-generation sequencing that eventually produced a match to a Japanese isolate of SAFV-3 (Accession #HQ902242.1) with 98% identity at the nt level. The consensus sequence generated (Figure 1) [...]
To further characterize our strain, we conducted phylogenetic analyses of both the full genome (ORF only, 6888 nt, 2296 aa) and the complete viral protein 1 (VP1) (810 nt, 270 aa). This latter region is expected to be under positive selection in order to avoid recognition by the host's immune system, and thus could potentially return different phylogenetic results when compared to the whole genome. To keep sampling diversity as broad as possible, trees were constructed using publicly available reference sequences that span the known diversity of SAFV strains in terms of genotype, year of isolation, and geographical origin (Table 1).
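The figures quoted above (98% identity at the nt level; a 6888 nt ORF encoding 2296 aa; an 810 nt VP1 encoding 270 aa) can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not part of the original analysis. The identity calculation skips gapped columns, which is a simplifying assumption and not necessarily how BLAST reports identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment, with '-'
    marking gaps. Gapped columns are ignored (a simplification).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def codon_count(nt_length: int) -> int:
    """Number of codons in a coding region whose length is a multiple of 3."""
    if nt_length % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    return nt_length // 3


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not SAFV data.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
    # Lengths taken from the text: ORF and complete VP1.
    print(codon_count(6888), codon_count(810))
```

The codon arithmetic confirms that the reported nucleotide and amino acid lengths are mutually consistent (6888 / 3 = 2296 and 810 / 3 = 270).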
The full genome tree includes 34 complete genome sequences covering serotypes 1-11, whereas the full VP1 tree includes the same 34 sequences used for the full genome tree plus 7 additional available sequences containing complete VP1 genes. Phylogenetic analyses of both full genome and complete VP1 sequences confirm that the Peruvian SAFV strain collected in 2012 belongs to genotype 3 and is most closely related to Asian strains ( Figure 2). We also constructed an additional VP1 tree, this time containing partial VP1 sequences available from several Bolivian isolates ( Figure S1). Although there are minor differences in the branching patterns of the two VP1 trees, the Peruvian strain remains closely associated with Asian strains within the SAFV-3 group despite the fact that a number of additional South American SAFV strains were included in the analysis. The fact that the Peruvian isolate is most closely related to SAFV strains that have circulated in Asia rather than in Europe, for example, should not be surprising given the available information. Specifically, there are no other reports of SAFV-3 from any country in the Americas, indicating that information from additional American strains will be needed to establish more robust phylogenetic relationships in support of theories of SAFV movements throughout the world. As it stands, the information presented here can only be used to support grouping of the Peruvian SAFV strain into group 3, with particular similarity to Asian strains. Nevertheless, it is worth noting that Peru has historically received a large number of immigrants from Asia, particularly from China and Japan. Along with further characterization of additional strains, this observation could be used to support a theory of introduction of SAFV into the Americas directly from Asia. 
Figure 2. Each strain is labeled using standard identifiers, including virus type, country of isolation (using ISO country codes), isolate name (if any), year of isolation, and GenBank accession number. Additionally, SAFV1-11 serotypes are color-coded for easy viewing. The Peruvian isolate described here is highlighted in bold and with an arrow. Full genome trees were constructed using a total of 36 publicly available complete genome sequences covering serotypes 1-11. Full VP1 trees were constructed using those same sequences plus 17 additional sequences containing complete VP1 genes. Scale bars represent the number of substitutions per site.

penicillin-streptomycin (Gibco, 15140-122. Thermo Fisher Scientific: Waltham, MA, USA). Cultures were maintained at 37 °C and 5% CO2 until CPE was observed.

MassTag PCR
CPE-positive culture supernatants were extracted using the QIAamp cador pathogen mini kit (QIAGEN, 54104. QIAGEN: Valencia, CA, USA) according to the manufacturer's instructions, and nucleic acids were analyzed by MassTag PCR essentially as described [10,11]. Briefly, RNAs were reverse-transcribed using SuperScript II (Thermo Fisher, 18064-014) and random primers (Thermo Fisher, 48190-011). Reverse transcription products were amplified using a panel of primers labeled with photocleavable mass codes (Agilent, custom. Agilent Technologies: Santa Clara, CA, USA) targeting influenza viruses A and B, respiratory syncytial viruses A and B, human parainfluenza viruses 1-4, human metapneumovirus, coronaviruses OC43 and 229E, enterovirus, rhinovirus, and adenovirus. Upon removal of unincorporated primers, tags were released by UV irradiation and analyzed using a 6100 Series Single Quadrupole LC/MS System (Agilent Technologies).

Sequencing Library Preparation
CPE-positive culture supernatants were preserved in TRIzol LS (Thermo Fisher, 10296-028) and RNA was extracted using the Direct-zol™ RNA MiniPrep kit (Zymo Research, R2050) according to the manufacturer's instructions. RNAs were converted to cDNA and amplified using sequence-independent single primer amplification as described [12], with the following modifications. To enhance coverage of the terminal ends, an oligo containing three rGTP at the 3′ end (GCCGGAGCTCTGCAGATATCGGCCATTATGGCCrGrGrG) and the FR40RV-T primer [12] were added during first-strand synthesis, and the reverse transcriptase was changed to Maxima H Minus (Thermo Fisher, EP0751), which has terminal transferase activity that enables addition of the rGTP-containing oligo to the 5′ end during cDNA synthesis. Amplicons were sheared to ~400 bp and used as starting material for TruSeq libraries (Illumina FC-121-4001. Illumina, Inc.: Hayward, CA, USA) prepared according to the manufacturer's instructions.

Sequencing and Bioinformatics
Sequencing was performed on a MiSeq using a 300-cycle kit (Illumina MS-102-2002). Cutadapt [13] and Prinseq-lite [14] were used to trim primers and remove poor-quality reads, respectively. Reads were assembled into contigs using Ray Meta [15], and annotation was done by BLAST in combination with custom scripts.
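The read clean-up step described above can be illustrated with a minimal Python sketch of 3′ quality trimming followed by length and mean-quality filtering. This is not the Cutadapt or Prinseq-lite implementation, and the settings actually used are not reported here. The function names, the Phred+33 encoding, and all thresholds (min_q, min_len, min_mean_q) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove bases from the 3' end while their quality is below min_q.

    min_q is a hypothetical threshold, not a value reported in the text.
    """
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq: str, qual: str, min_len: int = 50,
                  min_mean_q: float = 20.0) -> bool:
    """Keep a read only if it is long enough and its mean quality is adequate.

    Both thresholds are hypothetical illustration values.
    """
    if not seq or len(seq) < min_len:
        return False
    scores = phred_scores(qual)
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    # Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    seq, qual = trim_3prime("ACGTAC", "IIII##")
    print(seq, qual, passes_filter(seq, qual, min_len=4))
```

In practice the dedicated tools also handle adapter or primer matching, duplicate removal, and low-complexity filtering, none of which are attempted in this sketch.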

Phylogenetics
A number of available SAFV sequences were downloaded from GenBank to serve as references (Table 1). To cover as many genotypes, years, and geographical regions as possible, we considered 34 complete genomes (at least 6888 nt covering all 2296 aa of representative SAFV1-11 isolates, including the Peruvian isolate), 17 complete VP1 sequences (810 nt covering all 270 aa), and seven additional partial VP1 sequences (at least 312 nt) specifically from the Americas. Theiler's murine encephalomyelitis virus and Theiler's-like virus genomes were used as outgroups in the analysis. Alignments were constructed using MUSCLE and trees were generated using the Maximum Likelihood algorithm with 2000 bootstrap replicates in MEGA 6.0 [16,17]. Genetic distances were calculated using a General Time Reversible Gamma Distributed model.

Table 1. Sequences used for phylogenetic analysis. The Peruvian isolate is highlighted in gray. Sequences from isolates with complete genomes were used for both full genome and VP1 phylogenetic trees. Sequences from isolates with partial VP1 sequences were used for an additional VP1 tree that is almost identical to the complete VP1 tree (Figure S1). Abbreviations: SAFV, Saffold virus; TMEV, Theiler's murine encephalomyelitis virus.
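The bootstrap procedure mentioned in the Phylogenetics section resamples alignment columns with replacement and repeats the analysis on each pseudo-alignment. The Python sketch below illustrates only that resampling idea. It is not the MEGA implementation: it swaps in a simple p-distance for the General Time Reversible Gamma Distributed model, and it reports how often each sequence is the nearest neighbor of a query instead of building Maximum Likelihood trees. All function and sequence names are hypothetical, and the toy alignment is not SAFV data.

```python
import random
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either row."""
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else float("nan")


def bootstrap_columns(alignment: Dict[str, str],
                      rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-alignment by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}


def nearest_neighbor_support(alignment: Dict[str, str], query: str,
                             replicates: int = 100,
                             seed: int = 0) -> Dict[str, float]:
    """Fraction of replicates in which each sequence is closest to the query."""
    rng = random.Random(seed)
    others = [n for n in alignment if n != query]
    counts = {n: 0 for n in others}
    for _ in range(replicates):
        rep = bootstrap_columns(alignment, rng)
        closest = min(others, key=lambda n: p_distance(rep[query], rep[n]))
        counts[closest] += 1
    return {n: counts[n] / replicates for n in others}


if __name__ == "__main__":
    # Toy alignment with hypothetical labels.
    toy = {"query": "ACGTACGT", "ref_a": "ACGTACGT", "ref_b": "TGCATGCA"}
    print(nearest_neighbor_support(toy, "query", replicates=100))
```

A full analysis would instead infer a tree from every replicate and report, for each clade, the percentage of replicate trees that contain it.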

Conclusions
Our case report constitutes the first isolation of SAFV in Peru and the only complete genomic characterization of a SAFV-3 isolate from the Americas. Although the characteristics of the patient from whom the specimen was isolated agree with previous reports that the virus affects young children and that it is linked to both respiratory and gastrointestinal infections, little is known about the prevalence of SAFV infections in South American populations. At least two serological studies in Peru have shown the presence of neutralizing antibodies to closely related viruses of the Cardiovirus genus [18,19]. One of these studies reported that 21% of the population of the Amazonian city of Iquitos had neutralizing antibodies to encephalomyocarditis virus [18]. Interestingly, the authors also reported elevated cross-reactivity rates (43.5%), suggesting that many seroconversions could be due to the presence of closely related members of the Cardiovirus genus, including SAFV. Upcoming studies looking specifically at SAFVs should help elucidate both the prevalence and virulence of these pathogens, particularly in vulnerable populations. In turn, these may allow more robust characterizations of SAFV evolution in the Americas.

Disclaimers
The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of the Navy, Department of Defense, or the U.S. Government. We are military service members or employees of the U.S. Government. The work was prepared as part of our official duties. Title 17 U.S.C. §105 provides that "Copyright protection under this title is not available for any work of the United States Government." Title 17 U.S.C. §101 defines U.S. Government work as a work prepared by military service members or employees of the U.S. Government as part of that person's official duties. The work was supported by work unit number 847705.82000.25GB.B0016. The study protocol was approved by NAMRU-6's IRB in compliance with all applicable Federal regulations governing the protection of human subjects.