From Orphan Phage to a Proposed New Family–The Diversity of N4-Like Viruses

Escherichia phage N4 was isolated in 1966 in Italy and remained a genomic orphan for a long time. It encodes an extremely large virion-associated RNA polymerase that is unique among bacterial viruses and became characteristic of this group. In recent years, new and relatively inexpensive sequencing techniques have rapidly expanded the number of publicly available phage genome sequences. This revealed new members of the N4-like phage group, which grew from 33 members in 2015 to 115 N4-like viruses in 2020. Using new technologies and methods for classification, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) has moved the classification and taxonomy of bacterial viruses from purely morphological approaches to genomic and proteomic methods. The analysis of 115 N4-like genomes resulted in a major reassessment of this group and the proposal of a new family “Schitoviridae”, including eight subfamilies and numerous new genera.


Introduction
Escherichia phage N4 is a virulent phage that was originally isolated by Gian Carlo Schito from sewers in Genoa (Italy) in 1966 [1]. TEM analysis revealed a 70-nm-diameter capsid and a short tail, the characteristic features of a podovirus. Its genome consists of double-stranded DNA and has a length of 70,153 bp with direct repeats of about 400 bp and short 3′-noncohesive extensions [2]. Analysis of the N4 genome and its replication revealed unique characteristics. One is a phenomenon called lysis inhibition [3], which delays lysis and thereby increases the burst size upon infection. Further analysis revealed a gene for a large virion-associated RNA polymerase (vRNAP), which became characteristic of N4-like genomes, and the use of three DNA-dependent RNA polymerases in total for transcription; these polymerases have been studied intensively (Figure 1) [4]. The vRNAP is injected into the host cell along with the DNA [5]. The N4 genome is transcribed in three different temporal stages [6]. From a taxonomic perspective, phage N4 has a long history; from the first proposal in 1987 to establish a species Escherichia phage N4 [8] and its subsequent renaming to Escherichia virus N4 in 2015, it has persisted as a genomic orphan. At the time of writing (Master Species List #35, ratified March 2020), its genus is called Enquatrovirus and consists of only one species, representing four isolates.
Since 2015, following a comprehensive analysis of the 33 N4-like genomes available at that time [9], the number of publicly available N4-like phage genomes has nearly tripled [10]. The last report of the Bacterial and Archaeal Viruses Subcommittee [11] presented the new taxonomic classifications and reassessments that were achieved in 2018 and 2019 and listed a new order (Tubulavirales), ten new families, 22 new subfamilies, 424 new genera and 964 new species, which still represent only a fraction of the genomes currently available. However, it has to be taken into account that the ICTV does not classify viral strains or variants, i.e., those phage isolates with genomes that show ≥95% DNA sequence identity with an exemplar isolate of a species [12]. With regard to N4-like viruses, i.e., viruses encoding the vRNAP, only a rather small number have been officially classified by the ICTV so far. Currently, they are classified in 10 genera (Baltimorevirus, Enquatrovirus, Gamaleyavirus, Ithacavirus, Johnsonvirus, Jwalphavirus, Litunavirus, Luzseptimavirus, Mukerjeevirus and Shizishanvirus). This study provides further insight into the diversity and taxonomy of N4-like viruses using different approaches, such as genome-based phylogeny, for deeper classification.

Proposal of a New Family
To analyze the relationships between N4-like viruses and other podoviruses, we used ViPTree (https://www.genome.jp/viptree/; [66]), which is based on the Phage Proteomic Tree [67]. The results showed that the group of N4-like viruses is clearly monophyletic and forms a distinct clade (Figure 2). The distinct clustering of the newly proposed family was confirmed with a gene-sharing network analysis using vConTACT2 (Figure 3), where the cluster of N4-like viruses clearly separates from all other dsDNA bacterial viruses. In fact, the deep branch lengths in the ViPTree and limited connectedness in the gene-sharing network show that there are no unifying genomic features among all members of the Podoviridae to justify the current membership. Panproteome analysis revealed that seventeen N4-like proteins are conserved in this proposed family of phages: RNAP 1 (EPNV4_gp15), RNAP 2 (EPNV4_gp16), vRNAP (EPNV4_gp50), EPNV4_gp24, EPNV4_gp25, DNA polymerase (EPNV4_gp39), EPNV4_gp42, DNA primase (EPNV4_gp43), EPNV4_gp44, EPNV4_gp52, EPNV4_gp54, major capsid protein (EPNV4_gp56), tape measure protein (EPNV4_gp57), portal protein (EPNV4_gp59), EPNV4_gp67, large terminase subunit (EPNV4_gp68), and EPNV4_gp69 (Table 2). Based on the different analyses, we propose a new family "Schitoviridae" in honor of Gian Carlo Schito, who isolated Escherichia phage N4, the first isolated species of this group.

Proposal of New Subfamilies and Genera
Results of an all-by-all pairwise intergenomic similarity analysis (nucleotide identity) with VIRIDIC gave strong evidence for the proposal of eight new subfamilies and 30 genera, which were confirmed by phylogenetic analysis of the terminase large subunit and vRNA polymerase genes, i.e., all proposed taxa are monophyletic in these marker gene trees (Supplementary Figures S1 and S2). In line with previously established taxa, we used 95% and 70% nucleotide sequence identity over the length of the genome as species and genus demarcation criteria, respectively [11,12,68,69]. At the subfamily level, members of the same subfamily share at least 40% intergenomic similarity as calculated with VIRIDIC, with members of different subfamilies sharing little to no nucleotide identity [70].
The proposed subfamily "Migulavirinae" consists of two previously ratified genera (eight species), Litunavirus and Luzseptimavirus, representing phages with Pseudomonas aeruginosa as their host. The subfamily "Enquatrovirinae" contains three genera (14 species), Gamaleyavirus, Enquatrovirus and the newly proposed genus "Kaypoctavirus", and includes phages infecting members of the Enterobacteriaceae like E. coli, Shigella boydii or Klebsiella pneumoniae. N4-like viruses infecting Achromobacter xylosoxidans were grouped into four genera (eight species) in the proposed subfamily "Rothmandenesvirinae", named in honour of Lucia Rothman-Denes, who worked on N4 and its RNA polymerases. The subfamily "Erskinevirinae" was named after John M. Erskine, who in the early 1970s was one of the first people to isolate phages against Erwinia. It consists of two genera, "Yonginvirus" and Johnsonvirus, with three species and represents most of the N4-like viruses against Erwinia. The relatively large subfamily "Rhodovirinae" consists of seven genera, "Aorunvirus", "Raunefjordvirus", "Aoquinvirus", "Pomeroyivirus", "Sanyabayvirus", "Plymouthvirus" and Baltimorevirus, and contains aquatic viruses infecting members of the Rhodobacteraceae. Two further proposed subfamilies (five genera), "Fuhrmanvirinae" (named after American oceanographer and marine biologist Jed Alan Fuhrman) and "Pontosvirinae", mainly consist of phages against marine Vibrio species. The "Humphriesvirinae" subfamily, named in honour of James C. Humphries, who was the first to isolate a Klebsiella phage, comprises five genera with viruses infecting different genera of the Enterobacteriaceae like Escherichia, Klebsiella or Salmonella.

Discussion
The constantly rising number of sequences provides the scientific community with valuable data for answering various scientific questions. However, the taxonomic classification of phage genomes has not kept pace, which has led to large numbers of unclassified genomes in the INSDC. While the ICTV makes a huge effort to manage this problem and improvements have been made at the genus and subfamily level (2019: 103 proposals, 2020: 188 proposals submitted [68]), it is clear that concerted efforts, both by the ICTV and the wider community of phage biologists, are required to address the issue of family-level classification. The creation of the family Herelleviridae from the subfamily Spounavirinae and related phages [69] provided the blueprint for the creation of new families of tailed phages and started the dismantling of the morphology-based families Myoviridae, Siphoviridae and Podoviridae. Following that example, we used some of the methods tried and tested for the creation of a new family (Phage Proteomic Tree, vConTACT2) and the delineation of its internal structure (genome-distance comparisons, phylogenetic analysis of signature genes) to define the new family "Schitoviridae" of N4-like phages, to be removed from the family Podoviridae.
For panproteome construction with PIRATE, the settings used were identity thresholds of 30% and 35%, a cdhit lowest percentage id of 95 and an E-value for BLAST hit filtering of 1E-5. For Proteinortho, the search options were adjusted so that the minimum percent identity and coverage of the best BLAST hits were 30% and 50%, respectively. All other parameters were left as default.
The CoreGenes5.0 webserver (https://coregenes.ngrok.io/) was used with the OrthoMCL option with an E-value of 1E-5. CoreGenes5.0 uses the GET_HOMOLOGUES package to implement the ortholog clustering [76,77]. We considered signature genes to be gene products present in all members of the proposed family where there was consensus between two or more of the analyses.
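The consensus rule for signature genes can be illustrated with a minimal Python sketch. It assumes that each analysis has already been reduced to a set of gene identifiers found in all members of the proposed family; the function name, the tool labels and the gene identifiers in the example are hypothetical and are not outputs of PIRATE, Proteinortho or CoreGenes5.0.

```python
from collections import Counter


def consensus_signature_genes(core_sets, min_support=2):
    """Return genes reported as core by at least `min_support` analyses.

    core_sets maps an analysis name to the set of gene identifiers that
    the analysis found in every member of the proposed family.
    """
    counts = Counter(
        gene for genes in core_sets.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= min_support)


# Hypothetical per-analysis results, for illustration only.
core_sets = {
    "analysis_A": {"gp15", "gp16", "gp50", "gp56"},
    "analysis_B": {"gp15", "gp50", "gp56", "gp68"},
    "analysis_C": {"gp50", "gp68", "gp99"},
}

signature = consensus_signature_genes(core_sets)
print(signature)
```

Here `min_support=2` mirrors the "two or more of the analyses" criterion; a gene reported by only one analysis is excluded.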

VIRIDIC Analysis
The Bacterial and Archaeal Viruses Subcommittee uses nucleotide-based sequence similarities as a crucial feature for taxonomic classification of viruses at the ranks of species and genus. We therefore employed the online tool VIRIDIC (Virus Intergenomic Distance Calculator, http://rhea.icbm.uni-oldenburg.de/VIRIDIC/) [70] for the calculation of pairwise intergenomic similarities amongst the phage genomes of this study. We chose 95% DNA sequence identity as the criterion for the demarcation of species within genera, so each of the proposed species differs from the others by more than 5% at the DNA level. For the demarcation of genera and subfamilies, we chose 70% and 40% DNA sequence identity, respectively. Based on this analysis, new genera and subfamilies were identified (Supplementary Table S1).
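As an illustration of how the 95%, 70% and 40% thresholds can be applied to a table of pairwise intergenomic similarities, the following Python sketch assigns the lowest shared rank to a pair of genomes and groups genomes at a chosen threshold. The single-linkage grouping is an assumption made for this sketch and is not claimed to be the clustering procedure implemented in VIRIDIC; the genome names and similarity values are invented.

```python
# Demarcation thresholds (percent intergenomic similarity) from the text.
SPECIES, GENUS, SUBFAMILY = 95.0, 70.0, 40.0


def shared_rank(similarity):
    """Return the lowest rank two genomes share at this similarity."""
    if similarity >= SPECIES:
        return "species"
    if similarity >= GENUS:
        return "genus"
    if similarity >= SUBFAMILY:
        return "subfamily"
    return "none"


def cluster(names, sims, threshold):
    """Single-linkage grouping (an assumption of this sketch).

    sims maps a pair of genome names to their similarity in percent.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), s in sims.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented values, for illustration only.
names = ["A", "B", "C", "D"]
sims = {
    ("A", "B"): 96.2, ("A", "C"): 74.0, ("B", "C"): 72.5,
    ("A", "D"): 12.0, ("B", "D"): 10.5, ("C", "D"): 11.0,
}

species_groups = cluster(names, sims, SPECIES)
genus_groups = cluster(names, sims, GENUS)
print(species_groups, genus_groups)
```

In this toy example, A and B fall into one species, A, B and C into one genus, and D remains separate at every rank.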

Supplementary Materials:
The following are available online at http://www.mdpi.com/2079-6382/9/10/663/s1, Figure S1: Phylogenetic analysis using the terminase protein sequences of N4-like phages. The amino acid sequences were aligned using MUSCLE with MEGA7 [78]. The tree was constructed using the maximum likelihood algorithm. The percentages of replicate trees were assessed with the bootstrap test (100); Figure S2: Phylogenetic analysis using the vRNA polymerase protein sequences of N4-like phages. The amino acid sequences were aligned using MUSCLE with MEGA7 [78]. The tree was constructed using the maximum likelihood algorithm. The percentages of replicate trees were assessed with the bootstrap test (100); Figure S3: ViPTree analysis of N4-like viruses with related podoviruses; Table S1: VIRIDIC analysis of N4-like phages.