How to Name and Classify Your Phage: An Informal Guide

With this informal guide, we try to assist both new and experienced phage researchers through two important stages that follow phage discovery: naming and classification. Providing an appropriate name for a bacteriophage is not as trivial as it sounds, and the effects can be long-lasting in databases and in official taxon names. Phage classification is the responsibility of the Bacterial and Archaeal Viruses Subcommittee (BAVS) of the International Committee on Taxonomy of Viruses (ICTV). While the BAVS aims to provide a holistic approach to phage taxonomy, this can be a little overwhelming for individual researchers who have isolated and sequenced a new phage. We therefore provide these researchers with an informal guide to phage naming and classification, taking a “bottom-up” approach from the phage isolate level.


Introduction
Virus taxonomy is currently the responsibility of the International Committee on Taxonomy of Viruses (ICTV, [1]), which published its first report in 1971. The Bacterial and Archaeal Viruses Subcommittee (BAVS) within ICTV is responsible for classifying new prokaryotic viruses. New proposals can be submitted year-round by the public. These taxonomy proposals (TaxoProps) are evaluated by relevant study groups (SGs) and the BAVS [2], and are then discussed and voted on by the executive committee (EC) during the yearly meeting. All ICTV-accepted proposals are finally ratified by the members of the IUMS (International Union of Microbiological Societies) Virology Division through an email vote.
Bacterial virus taxonomy has undergone a number of changes since the discovery of bacteriophages in the early 20th century. Electron microscopy (which led to the recognition of different phage morphologies) and nucleic acid content provided the basis for the first classification scheme [3,4]. Since then, genome composition and morphology have been the major criteria for classification at the family rank, with the current taxonomy comprising 22 families of bacterial or archaeal viruses.
For many years, the grouping of prokaryotic viruses in lower rank taxa such as genus and subfamily proceeded at a minimal pace. Taking the tailed phage families as an example, the 5th Report of ICTV (1991) described one genus in each of the families Myoviridae, Podoviridae, and Siphoviridae [5]. This increased to 16 genera spread over the three families by the 7th Report [6] and 18 genera by the 8th Report [7]. As improved nucleotide sequencing techniques increased the number of publicly available bacteriophage sequences, researchers started to question the large number of bacteriophage genomes that remained unclassified [8]. These concerns would prove prescient, and in the coming years next-generation sequencing methods would spur an explosion in bacteriophage sequencing (Figure 1). The availability of genome sequence data also gave rise to a range of potential classification or grouping schemes, such as the Phage Proteomic Tree [10], phage network clusters [11], k-mer-based grouping [12], signature genes-based grouping [13], or whole genome nucleotide identity-based grouping [14], which were not always compatible with the rules laid out in the ICTV Code and/or the International Code of Virus Classification and Nomenclature (ICVCN). Since the 8th Report of ICTV, both genome and proteome-based methods have been used by the BAVS to classify phages into species, genera, and subfamilies, resulting in 14 subfamilies, 204 genera, and 873 species in the 2015 taxonomy release [15][16][17][18][19][20].
In this paper, we provide a naming and classification guide for researchers who have isolated and sequenced a novel bacteriophage isolate specifically; however, these guidelines can also be applied to archaeal viruses. The guide will follow a "bottom-up" approach (i.e., starting at the species level), rather than the "top-down" approach which was used in the past to assign isolates to a family based on morphology.

Congratulations, You Have Isolated a Bacteriophage!
You have just isolated and sequenced a bacteriophage, so what do you do next? Well, hopefully, you plan on publishing your finding and submitting the sequence to one of the public databases that are part of the International Nucleotide Sequence Database Collaboration (INSDC): GenBank, the European Nucleotide Archive (ENA), or the DNA Data Bank of Japan (DDBJ) [21]. That is great, but now the big question is: "What do I name my phage?" Didn't think about that one, did you? Well, it turns out the name that you give to your new virus isolate is pretty important, as it will be used in publications, mentioned in conversations among colleagues, and identify the sequence of your virus in public databases and other resources.
Perhaps the most important rule of bacteriophage naming is "do not use an existing name." There are already four dissimilar bacteriophages named N4, making it very difficult to distinguish between them. So before naming your bacteriophage, and definitely before publishing a report on it, please take the time to compare proposed names against those already used within the field. A good, if somewhat dated, place to start is Bacteriophage Names 2000 [22] or its recent extension, the website Phage Name Check [23]. An up-to-date list of bacteriophage names can be found by searching the NCBI Nucleotide database [24] with the term "vhost bacteria[filter] AND ddbj_embl_genbank[filter]" [9]. This search will return all bacteriophage isolate names currently associated with sequences in INSDC databases, both those classified by ICTV and those that have yet to undergo official classification.
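As a rough illustration of such a name check, the sketch below compares a candidate identifier against a list of names you have already collected, and builds (without sending) a query URL for the search term quoted above. The function names, the normalization rule, and the use of the NCBI E-utilities `esearch` endpoint with `db=nuccore` are our assumptions for illustration; they are not part of any official naming procedure.

```python
import re
from urllib.parse import urlencode

# Search term quoted in the text for listing phage names in NCBI Nucleotide.
NCBI_TERM = "vhost bacteria[filter] AND ddbj_embl_genbank[filter]"


def esearch_url(term=NCBI_TERM, retmax=20):
    """Build (but do not send) an E-utilities esearch URL for the term."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    query = urlencode({"db": "nuccore", "term": term, "retmax": retmax})
    return base + "?" + query


def name_key(identifier):
    """Hypothetical normalization: drop spaces, hyphens and underscores,
    lowercase, and treat the letter O and the digit 0 as the same."""
    key = re.sub(r"[\s\-_]+", "", identifier).lower()
    return key.replace("o", "0")


def find_clashes(candidate, existing):
    """Return the existing names that collapse to the same key."""
    wanted = name_key(candidate)
    return [name for name in existing if name_key(name) == wanted]


if __name__ == "__main__":
    known = ["Felix O1", "T4", "N4"]  # toy list, not a real database dump
    print(find_clashes("Felix-01", known))
    print(esearch_url())
```

The normalization is deliberately aggressive, so a hit only means "look more closely", not that the name is definitely taken.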
The current approach to bacteriophage naming is a tripartite construct consisting of the bacterial host genus name, the word "phage", and a unique identifier, for example "Escherichia phage T4". Since the first two components of this naming construct are not unique, the third component is critical to the usability of the name. Leafing through a list of bacteriophage names, it is clear that there are a number of approaches to providing unique identifiers in names. For example, one approach to constructing unique identifiers includes information about phage morphology and host [25]. So the name Escherichia phage vB_EcoM-VR20 denotes a virus of Bacteria, infecting Escherichia coli, with myovirus morphology, named VR20. In this case, the latter part of the name (VR20) will be the phage's common name (you can refer to the phage as VR20 in your paper), and the BAVS will only use this part when naming a taxon (see Section 2.4). One limitation to this approach is that one needs to employ electron microscopy or computational methods to derive the correct morphotype. While there are few hard and fast rules for unique identifiers, please be careful when choosing one, because it is likely to be used as shorthand in a variety of contexts for years to come.
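To make the structure of such an identifier concrete, here is a minimal sketch that splits a name like the example above into its parts. The regular expression and the morphotype table are our assumptions: only the "M" (myovirus) code is confirmed by the example in the text, and the "S" and "P" codes are included as plausible counterparts for the other two tailed phage morphologies.

```python
import re

# Assumed single-letter morphotype codes; only "M" appears in the text.
MORPHOTYPES = {"M": "myovirus", "S": "siphovirus", "P": "podovirus"}

# <Genus> phage vB_<3-letter host code><morphotype letter>-<common name>
PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) phage "
    r"vB_(?P<host>[A-Z][a-z]{2})(?P<morph>[A-Z])[-_](?P<common>\S+)$"
)


def parse_phage_name(name):
    """Return the parts of a vB_-style isolate name, or None if the
    name does not follow that pattern (e.g. 'Escherichia phage T4')."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return {
        "host_genus": match.group("genus"),
        "host_code": match.group("host"),
        "morphotype": MORPHOTYPES.get(match.group("morph"), "unknown"),
        "common_name": match.group("common"),
    }


if __name__ == "__main__":
    print(parse_phage_name("Escherichia phage vB_EcoM-VR20"))
```

Names that do not use this identifier style simply return `None`; that is not an error, since the tripartite construct does not require it.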
Please use the following bacteriophage naming guidelines:
• Always use the complete host genus name, followed by a space, followed by the word "phage", followed by a space, followed by a unique identifier (e.g., Escherichia phage T4).
• Use only the isolation host genus in the name, rather than higher order taxa names (such as Enterobacteria, Pseudomonad, or the generic Bacteriophage) or lower order taxa names like Staphylococcus aureus DSM 1234. This approach to species naming was officially adopted by the BAVS in 2015 and should also be used in isolate names [
As a caveat to these guidelines, we would like to launch a plea for accuracy and precision in the use and referencing of existing phage names. For example, Salmonella phage Felix O1 has been cited as Felix-O1, FelixO1, Felix 01, and Felix-01. These inconsistencies cause confusion within the community and must be avoided.

What Is the Relationship between Bacteriophage Isolate Names and Taxa Names?
The rules for naming taxa are described in the International Code of Virus Classification and Nomenclature [26]. Typically, the name of a species is based on the name of the first sequenced isolate, which then becomes the type isolate. Current bacteriophage species names replace "phage" in the tripartite isolate name construct with "virus", so the isolate Escherichia phage T4 belongs to the species Escherichia virus T4. Higher order taxa names are derived from unique identifiers used in isolate names as in the genus T4virus. Sometimes these unique identifiers are too similar to existing taxon names, inappropriate, or do not otherwise conform to ICTV taxa naming standards, and a different taxon name must be chosen.
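The substitution described above is mechanical enough to sketch in a few lines. The helper names below are hypothetical, and the genus helper only reproduces the "T4virus" pattern from the example; as noted, the name actually assigned to a taxon may differ when the identifier is unsuitable.

```python
def species_name(isolate_name):
    """Turn '<Genus> phage <identifier>' into '<Genus> virus <identifier>'."""
    genus, word, identifier = isolate_name.split(" ", 2)
    if word != "phage":
        raise ValueError("expected the tripartite form '<Genus> phage <id>'")
    return f"{genus} virus {identifier}"


def genus_label(identifier):
    """Follow the pattern of the example genus name in the text (T4virus)."""
    return f"{identifier}virus"


if __name__ == "__main__":
    print(species_name("Escherichia phage T4"))
    print(genus_label("T4"))
```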

Now It Is Time to Publish Your Phage Sequence
Once you have isolated, sequenced, and named your new bacteriophage, it is time to start thinking about sharing your data with the world. Today, sharing your results is not simply about publishing in a peer-reviewed journal. While such descriptions are central to the scientific process, so, too, is the sequence of your new bacteriophage. Though often overlooked, submitting your new bacteriophage sequence to a public INSDC database such as GenBank is critical to making your sequence publicly available for generations to come. Please keep in mind that in this age of bioinformatics and computational biology, it is likely that over time the sequence record for your new bacteriophage will be accessed far more often than a traditional publication. In other words, do your best to provide detailed and accurate information about your phage when you submit the sequence to an INSDC database. This includes providing the most accurate classification data possible.
If known, lineage information should be included in INSDC sequence submissions using the "lineage" field or in a source note. For example, if you have sequenced a new phage that belongs to the species Escherichia virus T4, provide the name of your new virus, e.g., "Escherichia phage My_New_Virus," and the lineage "Viruses; dsDNA viruses, no RNA stage; Caudovirales; Myoviridae; Tevenvirinae; T4virus; Escherichia virus T4". In cases where your new phage cannot be placed in an existing species, provide a lineage that reflects classification. For example, if your new phage belongs to the genus T4virus, provide the lineage "Viruses; dsDNA viruses, no RNA stage; Caudovirales; Myoviridae; Tevenvirinae; T4virus".
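The two lineage strings above differ only in where they stop. As a small sketch (the function and constant names are ours, and the rank list is copied from the example in the text), a lineage can be kept as an ordered list and cut at the lowest taxon you can justify:

```python
# Ranks copied from the Escherichia virus T4 example in the text.
T4_LINEAGE = [
    "Viruses",
    "dsDNA viruses, no RNA stage",
    "Caudovirales",
    "Myoviridae",
    "Tevenvirinae",
    "T4virus",
    "Escherichia virus T4",
]


def lineage_string(ranks, deepest=None):
    """Join ranks with '; ', stopping after `deepest` when it is given."""
    if deepest is not None:
        ranks = ranks[: ranks.index(deepest) + 1]
    return "; ".join(ranks)


if __name__ == "__main__":
    print(lineage_string(T4_LINEAGE))             # isolate of a known species
    print(lineage_string(T4_LINEAGE, "T4virus"))  # new species, known genus
```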
Please use the following guidelines when submitting to public databases:
• Do include lineage information for all submitted sequences. Even if your bacteriophage is novel and does not belong to a described species, provide the most accurate lineage information possible, including genus and/or family, using the criteria discussed in this manuscript.
• Do include accurate genomic composition information when no other lineage information is available or can be inferred. In most cases, it should be possible to place a new isolate within the higher-order dsDNA, ssDNA, dsRNA, or ssRNA lineage groupings.
• Do identify prophages using the "proviral" location descriptor.

Classifying Bacteriophage
So why does taxonomy even matter? Well, taxonomy offers a very useful way of aggregating genome sequence data around a collection of genetic and/or molecular attributes. In this way, rules describing taxa are effectively search terms that allow you to retrieve a set of sequences with similar characteristics. Taxonomy also provides context to sequences when searching for sequence similarities. Knowing that a newly sequenced virus is highly similar to a previously classified one immediately tells you something about the new virus: the expected gene content, host range, environmental niche, etc. Bacteriophage classification also supports the organization of genome sequence data within public databases. Each viral species is represented by at least one "reference" genome in the NCBI Viral RefSeq database. Other validated genomes belonging to the same species are stored as so-called "genome neighbors" of the RefSeq genome [9,28]. This arrangement provides a compressed dataset where each species is represented by one or more representative sequences (typically from type isolates), as well as a method for retrieving a set of validated genomes from each viral species.

Does My Phage Represent a New Species?
The first question you need to answer is a basic one: "Does my newly sequenced phage belong to an existing species?" The main species demarcation criterion for bacterial and archaeal viruses is currently set at a genome sequence identity of 95%, meaning that two viruses belonging to the same species differ from each other by less than 5% at the nucleotide level. This can be calculated by comparing your sequence to existing phage genomes. There are several tools to do this (e.g., BLASTN [29], PASC [30], Gegenees [31], or EMBOSS Stretcher), but each comparison needs to be checked for genomic synteny. While it is common for larger dsDNA phages to differ in their genome organization, isolates showing high levels of rearrangements can never belong to the same species.
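To show what the 95% criterion means arithmetically, the sketch below computes percent identity from two sequences that have already been globally aligned by one of the tools above (equal length, "-" for gaps). Counting gap columns as differences and treating exactly 95% as "same species" are our assumptions, and the code does not perform the synteny check that the text requires.

```python
SPECIES_THRESHOLD = 95.0  # percent nucleotide identity, from the text


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical bases.

    Columns where both sequences have a gap are ignored; columns with a
    gap in one sequence count as differences (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def same_species(aligned_a, aligned_b):
    """Apply the identity threshold only; synteny still needs checking."""
    return percent_identity(aligned_a, aligned_b) >= SPECIES_THRESHOLD


if __name__ == "__main__":
    reference = "A" * 100          # toy sequence, not a real genome
    close = "C" * 4 + "A" * 96     # 4 substitutions
    distant = "C" * 6 + "A" * 94   # 6 substitutions
    print(percent_identity(reference, close), same_species(reference, close))
    print(percent_identity(reference, distant), same_species(reference, distant))
```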
If your phage belongs to an existing species, be sure to specify that taxonomic lineage when depositing the sequence into GenBank or other INSDC databases. If results suggest that your phage represents a novel species, congratulations! The appropriate Study Groups and BAVS will, with your help in providing data, create a new species based on your phage. To place this new species in a higher taxon, we will move up to the genus level in the next section.
We recommend that you alert the appropriate BAVS Study Group chair or the Subcommittee Chair [2], who will advise you on how to proceed. This will generally involve filling out an ICTV Taxonomy Proposal Submission Template (TaxoProp for short), which is available from the ICTV website [32]. The ICTV website also includes examples of completed TaxoProps.

Is My Phage a Member of an Existing Genus?
The BAVS currently describes a genus as a cohesive group of viruses sharing a high degree of nucleotide sequence similarity (>50%), which is distinct from viruses of other genera. For each genus, defining characteristics can be determined, such as average genome length, average number of coding sequences, percentage of shared coding sequences, average number of tRNAs, and the presence of certain signature genes in member viruses. The latter can in turn be used for phylogenetic analysis with other bacterial or archaeal viruses encoding this gene to find monophyletic groups as well as higher order relationships.
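Taken together with the species criterion, the two nucleotide thresholds suggest a first-pass triage of a new isolate. The sketch below is a deliberate simplification under our own assumptions: it takes a single best identity value against classified phages, treats exactly 95% as the same species, and ignores the other genus characteristics listed above, so its output is a starting point for discussion rather than a classification.

```python
def provisional_placement(best_identity):
    """Map the best nucleotide identity (percent) against classified
    phages to a rough suggestion, using the 95% and >50% thresholds."""
    if not 0 <= best_identity <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    if best_identity >= 95:
        return "isolate of an existing species"
    if best_identity > 50:
        return "candidate new species in an existing genus"
    return "candidate for a new genus"


if __name__ == "__main__":
    for value in (97.2, 71.0, 34.5):
        print(value, "->", provisional_placement(value))
```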
All the genera currently in the ICTV database have a taxonomy history (TaxoProp) accessible through the website, which researchers can use to assess the genus inclusion criteria. If your phage falls into an existing genus, the BAVS will define the new viral species within the existing genus. If the phage is sufficiently different from existing isolates, we can define a new genus, according to the characteristics described above. The minimum requirements for the creation of a new genus are the description of the average genome characteristics of its proposed members (size, GC content, tRNAs, coding sequences), a nucleotide identity comparison with visualization, a comparison of the predicted proteomes, and phylogenetic analysis of at least one conserved gene, all of these with the appropriate outliers to demonstrate cohesiveness of the new genus.
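The average genome characteristics mentioned above (size, GC content, tRNAs, coding sequences) are simple to tabulate once each proposed member has been annotated. The sketch below assumes a hypothetical record layout (a dict with `sequence`, `cds`, and `trna` keys) and uses toy values; the CDS and tRNA counts would come from your own annotation pipeline.

```python
from statistics import mean


def gc_content(sequence):
    """GC percentage computed over unambiguous bases (A, C, G, T) only."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(1 for b in bases if b in "GC") / len(bases)


def genus_summary(members):
    """Average the characteristics of the proposed genus members."""
    if not members:
        raise ValueError("at least one member is required")
    return {
        "mean_length": mean([len(m["sequence"]) for m in members]),
        "mean_gc": mean([gc_content(m["sequence"]) for m in members]),
        "mean_cds": mean([m["cds"] for m in members]),
        "mean_trna": mean([m["trna"] for m in members]),
    }


if __name__ == "__main__":
    toy_members = [  # toy records, not real genomes
        {"sequence": "GGCCAATT", "cds": 2, "trna": 0},
        {"sequence": "GGCCAATTAT", "cds": 4, "trna": 2},
    ]
    print(genus_summary(toy_members))
```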
While you can propose a new genus and species based upon a unique virus, the BAVS generally recommends that genera be established when two or more related viruses have been deposited in one of the INSDC databases.

What about Subfamilies and Families?
In the current taxonomy releases, bacterial and archaeal viruses are classified at the family rank according to the morphology of their virions; for example, phages with short tails are placed in the family Podoviridae. This means that for proper classification, electron micrographs of the viral particles should be taken. Based on the morphology and the genomic information necessary for classification in species and genus, we can now determine whether your isolate falls in an existing subfamily of viruses. If your new phage, in its newly created genus, is genomically or proteomically similar to phages in an existing subfamily, the genus can be added to the subfamily. The criteria for inclusion can vary between subfamilies, and can be found in the TaxoProps describing the respective subfamily.
At this time, subfamilies are only created when they add necessary hierarchical information (ICVCN Rule 3.2). In practical terms, this means that a new subfamily is created when two or more genera show an obvious relation which is not adequately described at the family level. For instance, in the family Podoviridae, the subfamily Autographivirinae groups all podoviruses that contain an RNA polymerase gene in their genome [16]. The requirements for the creation of a new subfamily are not easily defined, but should definitely include the description of at least two clearly related genera within a family, with evidence that the new subfamily is cohesive.
In a genome-based taxonomy, the tailed phage families Myoviridae, Siphoviridae, and Podoviridae have become an artificial "ceiling" prohibiting the accurate description and grouping of the genomic diversity present among their member phages. For example, T4-related phages, infecting a wide range of host bacteria from different phyla, are characterized by the presence of a set of 30 conserved (core) proteins [33], but also have more distant cousins in the Far-T4 group sharing only 10 core proteins [34]. These phages are currently all classified within the family Myoviridae, but have the genomic diversity to represent a new order. Another example involves the "lambdoid phages", comprising both siphoviruses (Escherichia phage Lambda) and podoviruses (Salmonella phage P22), which cannot be grouped together in the same family at this time. The BAVS is therefore working on a new system that would abolish these families in favor of genome/proteome-based family descriptions.

My Phage/Virus Does Not Fit in Anywhere, What Now?
In the very special circumstance that your new phage does not fit in with any known bacterial or archaeal virus, genomically or morphologically, it is the first representative of a new family. In this case, we strongly urge you to contact the BAVS Chair or the chair of an appropriate Study Group to work together to define the demarcation criteria for this new family.

Proposed Software to Use
This is a non-exhaustive list with suggested software to use. The BAVS as a subcommittee is not associated with the developers of the software described below.

Conclusions
Bacteriophage genomics, ecology, and evolution are quickly growing fields, and large numbers of new phages are being discovered, named, sequenced, and deposited into public databases. This poses semantic, logistical, and taxonomical challenges that we have tried to address in this informal guide. It is also important to understand that taxonomy is ever changing because of the unremitting flow of new information. The effort of classification is currently undertaken by a small group of dedicated scientists, but with input from the larger phage community (this means you, dear reader) we can increase the effort while keeping it manageable for each individual researcher.