RNA Phage Biology in a Metagenomic Era

The number of novel bacteriophage sequences has expanded significantly as a result of many metagenomic studies of phage populations in diverse environments. Most of these novel sequences bear little or no homology to sequences in existing databases (and are referred to as the “viral dark matter”). In addition, these sequences are primarily derived from DNA-encoded bacteriophages (phages), with few RNA phages included. Despite rapid advances in high-throughput sequencing, few studies enrich for RNA viruses, i.e., target the viral rather than the cellular fraction and/or RNA rather than DNA via a reverse transcriptase step, in an attempt to capture the RNA viruses present in microbial communities. It is timely to compile existing and relevant information about RNA phages to provide an insight into many of their important biological features, which should aid in sequence-based discovery and in their subsequent annotation. Without comprehensive studies, the biological significance of RNA phages has been largely ignored. Future bacteriophage studies should be adapted to ensure that RNA phages are properly represented in phageomic studies.


Introduction
Bacteriophages, commonly known as "phages", are the most abundant biological entities on the planet, with approximately 10^31 in the biosphere [1]. Phages were independently identified in 1915 by Twort and in 1917 by d'Hérelle [2,3]. They are viruses that can alter microbial populations and play a major role in the diversity patterns of those populations [4]. They were first recorded as antibacterial agents by d'Hérelle and quickly developed into clinical aids against bacterial infections, particularly across Eastern Europe [3,5,6]. The first known RNA phage, f2, which infects Escherichia coli, was described more than 40 years after the discovery of DNA phages [7]. Weissmann (1974) suggested that RNA phages offered a means to examine basic biological processes at an in-depth molecular level [8]. Since their identification, RNA phages have served as valuable models for understanding not just essential viral processes but also fundamental molecular mechanisms such as RNA genome replication, translational control, and gene regulation [9][10][11][12].
The RNA phage MS2, isolated by Alvin John Clark in 1961, and the highly similar phage f2 [13] have become key models in molecular biology and genetics. The MS2 phage coat protein gene was the first gene to be completely sequenced, in 1972, by Fiers and his colleagues [14]. In addition, the genome of the MS2 phage was the first to be fully sequenced, in 1976, also by Walter Fiers and colleagues [15]. This preceded the sequencing of the first DNA-based genome, that of phage phiX174, in 1977 [16]. RNA phages have also provided scientists with a model system for understanding the biology of many on RNA-dependent RNA-polymerase (RdRp) analysis. This was the first report of an RNA phage with a natural affinity for a Gram-positive host. In addition, a recent pre-print has described an RNA virus, a planarian-infecting member of the Nidovirales, with a genome of 41.1 kb in length, significantly longer than the previous largest RNA virus genome of 30 kb [30]. These findings highlight that there are certainly many RNA viruses yet to be discovered and described, including RNA phages.
This review focuses on examining known RNA phages, both dsRNA and ssRNA, which target bacterial cells. Outlined are their mechanisms of adsorption through to the release of progeny. Future endeavors may use conserved features of RNA phages as genetic signatures to aid in prospective metagenomic exploration of RNA phages in the "dark matter" via sequence-based targeting.

Cystoviridae
Currently there are seven recognized species of Cystoviridae listed in the 2017 ICTV report. The type species of the Cystovirus family is phi6, which for a long time was thought to be unique as a dsRNA phage. The Cystoviridae have a tri-segmented, linear dsRNA genome, with the concatenated genome varying in size from 12.7 kb (phi2954) to 15.0 kb (phi8). Individual genome segments range in size from 2.9 kb to 6.4 kb (Figure 1). The three genome segments, large (L), medium (M) and small (S), are transcribed into separate polycistronic mRNAs that are predicted to be translated by the host machinery into 12 proteins. A lipid membrane envelops a double-layered proteinaceous nucleocapsid (NC) [31]. Cystoviridae genes are ordered into functional units within the segments: the L-segment contains genes for the virion core (P1, P2, P4, and P7), the M-segment encodes the complex essential for host recognition (P3 and P6) and the S-segment is responsible for the shell protein of the nucleocapsid (P8 (except in phi8), P9, P12, and P5) [34][35][36][37][38][39][40][41][42]. P5 and P11 are transcript variants of the same gene [43]. The noncoding regions that flank the coding sequences within the segments are required for efficient genome replication and packaging. The 5′ untranslated region (UTR) of the plus strand region encodes a cis-acting RNA sequence known as the pac sequence [44]. The segment-specific pac sequence is composed of 200 nucleotides located within a number of stem-loop structures. The pac sequences act in unison with other fundamental structural elements to ensure the correct packaging of the genome when required.
The integral-membrane, fusogenic P6 protein is responsible for securing the receptor-binding protein of Cystoviridae, P3, to the viral envelope. It is this multimeric spike protein, P3, which enables the recognition of the host bacteria receptor pilin, the protein monomer making up bacterial pili, by the phi6 phage. The P3 protein of phages phi8, phi12, phi13, and phiYY have been suggested to be a single polypeptide or a multimer [45]. The P3 protein of phi6 adsorbs to host type IV pili of its target, Pseudomonas syringae, which then retracts to bring the phage into close proximity of the host membrane [46,47]. This form of attachment is also exploited by phiNN and phi2954 [37,40]. Other members of Cystoviridae, such as phi8, phi12, phi13, and phiYY, utilize their heteromeric P3 protein to attach to the lipopolysaccharide (LPS) on the cell surface [48]. The P3 protein of these species differs in its composition as it contains two or three different polypeptides (P3a, P3b and, in some cases, P3c). The P6 protein is activated following the removal of P3 and then mediates the fusion of the viral membrane with the host membrane to release the NC into the periplasmic space.
The loss of the viral membrane around the NC enables the muralytic (peptidoglycan-degrading) enzyme P5, located on the NC surface, to degrade the peptidoglycan layer of the bacterial cell wall [49,50]. The permeabilization of the host plasma membrane facilitates the translocation of the NC across the cytoplasmic membrane of the host cell through an endocytosis-like process, driven by P8 [51,52]. Upon entry into the cytoplasm, the P8 shell of the NC dissociates to reveal the naked dodecahedral polymerase complex (PC). The release of P8 stimulates the PC, which is transcriptionally active. This is the characteristic mechanism dsRNA viruses exploit in order to replicate their genome: delivery of the nucleic acids in a specialized icosahedral capsule containing the necessary RNA metabolism enzymes, such as mRNA-synthesizing enzymes. This nano-compartment enables the dsRNA genome to remain "hidden" from any antiviral mechanisms of the host and avoids dsRNA-induced host responses [53]. It also provides a safe environment for phage replication and translation. The dimeric P7 protein acts as an assembly and packaging cofactor by accelerating the rate of immature PC assembly through stabilization of the entire complex [54,55].
The core particle is composed of P1, P2, P4, and P7. These proteins are involved in the transcription of the phi6 genome. The monomeric RdRp of the P2 gene is activated by PC entry into the cytoplasm. This enzyme catalyzes the semi-conservative transcription of polycistronic mRNAs within the core particle [56]. Bacterial hosts lack the capability to synthesize complementary strands from the RNA template, so all characterized RNA viruses, including phages, encode their own enzymes. The RdRp attaches to the 3′ end of the single-stranded mRNA transcripts and through primer-independent de novo initiation it efficiently replicates and transcribes the phage genome [57,58]. The suggested transcription mechanism involves the dsRNA genome unwinding as it is pulled through one channel of P2 and nucleotide triphosphates (NTPs), oligonucleotides, manganese (Mn2+) and magnesium (Mg2+) ions entering through another [59]. Initially the template strand overextends and it locks into a "specificity pocket" [59]. The strand then reverses, in the presence of two cognate NTPs, to form the functional initiation complex. The reaction is primed through the activity of one of the NTPs as it serves as the carboxyl-terminal domain of the protein. It has been suggested that Cystoviridae control transcription through an interchange of two independent mechanisms: (a) plus-sense initiation sites are preferred by the polymerase and (b) initiation-competent ssRNA templates have more available transcription initiation sites [60]. Initiation is the rate-limiting step of transcription, located at the 3′-terminal cytidine nucleotide of the −ve ssRNA template.
By directly releasing the mRNA transcripts into the cytoplasm, the dsRNA genome is never exposed to the host cytoplasm which helps the phage to avoid host defense mechanism activation. The mRNA transcripts are used as templates for translation of the necessary proteins. The early stage of infection is characterized by equal amounts of mRNA from L, M, and S segments [61,62]. However, only the L-segment transcripts are efficiently produced in this early stage, to give rise to an increased level of PC proteins and the formation of empty PCs.
The large free-strand +ve ssRNA is then translated to form P1, P2, P4, and P7, which are subsequently assembled to form empty PCs [43]. The hexameric nucleoside triphosphatase (NTPase) motor of P4 directs the bundling of the three genome segments in the form of +ve ssRNA into the empty PCs by recognition of 5′ pac sequences [44,63]. This packaging is controlled through the expression of segment-specific binding sites on the PC. Binding sites specific for the S-segment are exposed initially to allow P4 to package the S-segment into the empty PC. A conformational change of the PC alters the binding site to become M-segment specific to package this segment of the genome into the viral progeny. Another change allows the packaging of the L-segment. Once the PC expands to a threshold size, these +ve ssRNA transcripts are then converted into dsRNA by a single round of negative strand synthesis by the RdRp P2 [64]. Studies by Pirttimaa and colleagues (2002) found that of the 12 P4 hexamers, one is both functionally and structurally unique [65]. Although studies focused on the basic molecular mechanisms of phages have exploded in recent years, the exact transcriptional and translational processes of Cystoviridae are yet to be fully described.
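The ordered packaging described above (S first, then M, then L, each step exposing the binding site for the next) can be pictured as a simple sequential gate. The following is a minimal sketch only, not a biophysical model: the function names are hypothetical, and the rule that packaging halts when the expected segment is absent is an assumption drawn from the ordering in the text.

```python
# Order of segment packaging as stated in the text: S, then M, then L.
PACKAGING_ORDER = ("S", "M", "L")


def package_segments(available):
    """Return the segments packaged into an empty PC, in order.

    `available` is the set of +ve ssRNA segments present in the cytoplasm.
    Assumption: the binding site for the next segment is exposed only after
    the previous one is packaged, so packaging stops at the first gap.
    """
    packaged = []
    for segment in PACKAGING_ORDER:
        if segment not in available:
            break  # next binding site is never exposed
        packaged.append(segment)
    return packaged


def ready_for_minus_strand_synthesis(packaged):
    """True only when all three segments were packaged in the expected order."""
    return tuple(packaged) == PACKAGING_ORDER
```

For example, `package_segments({"S", "L"})` stops after the S-segment, because the M-specific site must be filled before the L-specific site becomes available in this simplified picture.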
The size and organization of this PC is regulated through the activity of inner capsid protein P1 and P4 [55,66,67]. P1 is conserved throughout dsRNA viruses, although it appears to vary in multimeric status [55,68]. Transcription is initiated following effective replication of the dsRNA genome. As the infection progresses, the M and S segment mRNA predominate to produce the proteins essential to virion assembly. The naked PC is encapsulated in a newly synthesized NC shell. The membrane protein P9, along with morphogenic P12, have crucial roles in construction of a new phospholipid membrane around the NC particle from the host plasma membrane [69]. The spike protein complex of P3 and P6 is the last component attached to the surface, to ensure the progeny are capable of receptor recognition.
Cystoviridae are categorized as virulent phages as they induce lysis of their host bacterium at the end of the infection cycle in order to release viral progeny, through P5 and P10 activity [49,50]. However, recent findings have shown that phi6 is capable of forming a pseudolysogenic carrier state within its host [70]. The Cystoviridae species phi6 targets the Gram-negative bacterium Pseudomonas syringae, an important plant pathogen. This phage was first isolated in the 1970s in the USA from Pseudomonas-infected bean straw [71].
Six additional Cystoviridae have recently been isolated and characterized and are listed in the 2017 ICTV report, with another five requiring further analysis. Their genetic and structural similarities with phage phi6 suggest that there will be an expansion of this phage taxonomic family with further classification required. Sampling of various legumes in the USA has resulted in the isolation of additional dsRNA phages but these have not been characterized beyond their sequences [48,72,73]. Assorted environmental sources in Europe and Asia have yielded more novel dsRNA phages: Pseudomonas phage phiNN was isolated from a freshwater sample in Finland, while Pseudomonas phage phiYY came from hospital sewage waste in China [37,42]. Phage isolate phiYY has been found to target strains of P. aeruginosa, an opportunistic pathogen of immuno-compromised individuals. This suggests there may be potential to develop a phage therapy to combat Pseudomonas infections in these individuals.
It is clear from the recent isolations of Cystoviridae from multiple environments, with only a single member infecting a Gram-positive host, that there are many more RNA phages yet to be discovered. Recently, Alphonse and Ghose (2017) examined known Cystoviridae using their encoded RdRp [32]. While ssRNA phage genomes have high mutation rates [74], RdRp appears to be conserved amongst RNA phage genomes and thus might be a good candidate as a genetic signature to identify further RNA phage sequences. However, identification of Cystoviridae in metagenomic datasets using a marker such as the RdRp is complicated by the tri-segmented nature of the Cystoviridae genomes. Therefore, sequence based detection of all three genomic segments of Cystoviridae, particularly if they are divergent from sequences present in public repositories, will be challenging. Incorporation of genetic tags from each of the three segments will greatly enhance de novo efforts of finding Cystoviridae members.
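The requirement that all three segments be recovered can be made concrete with a small bookkeeping step applied after any marker search. The sketch below assumes that some upstream tool (not specified here) has already labelled contigs as L, M or S hits; the function name and the input format are hypothetical and serve only to illustrate the idea of per-segment tags.

```python
from collections import defaultdict

# A candidate Cystoviridae genome needs evidence for every segment.
REQUIRED_SEGMENTS = frozenset({"L", "M", "S"})


def samples_with_complete_genomes(hits):
    """Return sample IDs in which all three segments were detected.

    `hits` is an iterable of (sample_id, segment_label) pairs produced by
    an upstream search; labels other than L, M or S are ignored.
    """
    seen = defaultdict(set)
    for sample_id, segment in hits:
        if segment in REQUIRED_SEGMENTS:
            seen[sample_id].add(segment)
    return sorted(s for s, segments in seen.items() if segments == REQUIRED_SEGMENTS)
```

A sample with only an RdRp-bearing L-segment hit would be excluded here, which reflects the difficulty noted above: a single marker identifies one segment, not the whole genome.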

Leviviridae
The Leviviridae family encompasses phages with a positive-sense, single-stranded, monopartite RNA genome of 3.3-4.3 kb in length. The nonenveloped, somewhat spherical virion capsid is composed of 178 copies of the dimeric coat protein (CP) and a single copy of the maturation protein (Figure 2). The 5′ end of the genome carries a triphosphate cap.
There are two genera of Leviviridae: Levivirus and Allolevivirus. These genera were historically differentiated through serological cross-reactivity, sedimentation, molecular weight and density [76]. More recently, the number of known genes in their genomes has been used to distinguish between Levivirus and Allolevivirus members, with three and four, respectively (Figure 2). These genera are subdivided into genogroups: Levivirus has MS2-like (genogroup I) and BZ13-like (genogroup II), and Allolevivirus has Qβ-like (genogroup III) and F1-like (genogroup IV) [27].
Leviviridae phages that target E. coli ("coliphages") and are male-specific adsorb along the fertility (F) pilus, encoded by the F-plasmid of Escherichia coli or the chromosomal marker Hfr, whereas in non-coliphage species alternative pili are exploited [77]. Alternatively, coliphages that can infect cells via the cell wall are classified as somatic [78]. The presence of enteroviruses in water from pollution is often detected through the identification of RNA coliphages as biomarkers [79]. Phages that utilize F-pili are classified as male-specific phages. The way in which the Leviviridae phages induce lysis of their host is a notable difference between the genera: Levivirus phages encode a separate lysis polypeptide, whereas Allolevivirus phages utilise their maturation protein in lysis mediation [80,81]. These proteins are two canonical "single gene lysis" (SGL) systems that are utilised by small phages; the third is the E lysin from phage φX174, a ssDNA Microviridae representative [82]. The lysis mechanism, and specific protein where applicable, is fundamental to the lifecycle of the phage.

Levivirus
The type species of Levivirus is the Enterobacteria phage MS2, a member of the MS2-like phages (genogroup I). Phages of the Levivirus genus infect their host targets through the initial adsorption of the virion along the sides of pili using the "maturation A-protein" (Mat L) as the receptor binding protein [83]. This results in the self-proteolytic cleavage of the A-protein into at least two fragments and a structural change of the F-pilus [84]. This induces the release of the phage RNA into the host bacterium. Studies have reported that the two largest polypeptide components are transferred into the host along with the genomic RNA [84]. The fragmented Mat L binds the RNA at two distinct regions: the Mat L coding region and the 3′-UTR [85]. It appears that the Mat L-RNA complex may be injected into the cell as opposed to free RNA, suggesting that the Mat L protein may have a greater biological role than originally envisaged [84]. The exact mechanism of how the Mat L-RNA complex gains entry to the host remains undescribed, but could involve a type IV secretion system (T4SS) homolog [86]. It has been postulated that Mat L may also contribute to the replication process of the RNA genome.
As the nucleic acid is a single copy of +ve ssRNA, it functions both as the genome template and mRNA upon infection. Thus, there is constant competition between replication and translation processes as the ribosome and replicase run in opposite directions along the template strand [87]. The two events are independent of each other with the secondary structures of the +ve ssRNA strand and formation of a complementary negative strand of RNA maintaining this equilibrium. It has been noted that in the 3′-terminal sequence of the Levivirus genomes, there is a signature sequence of 5′-ACCACCCA-3′ [88].
For effective genome replication, Leviviruses encode an RdRp gene that codes for the catalytic β-subunit of the replicase. This protein associates with three host proteins, ribosomal protein S1 [89] and the translational elongation factors EF-Tu and EF-Ts [90], to form a functional polymerase unit, the holoenzyme. The role of EF-Tu has been established as delivering an aminoacyl-tRNA to the ribosome when in its GTP-bound form [91]. This GTP is hydrolyzed to form GDP-bound EF-Tu following a codon-anticodon match within the ribosomal complex. This displaces the EF-Tu; EF-Ts then binds to the GDP-bound EF-Tu and removes the GDP molecule. This allows the EF-Tu to be recycled for further elongation rounds [92,93]. Sequestration of these elongation factors inhibits initiation of translation. The S1 protein functions as a translational initiation factor. The sole purpose of this protein is to recognize the template plus strand; the core complex of the three remaining proteins is sufficient to synthesize new +ve ssRNA strands [94].
Studies have shown that there are two internal sequences which are key to the recognition of the plus strand by the replicase, the S site and the M site [95]. The S site is described as being a uracil-rich sequence of approximately 100 nucleotides, located just before the initiation codon of the coat protein. The secondary structure of the S site is poorly defined. The M site is of similar length, forms a branched stem-loop structure and resides within the replicase coding region [96]. These two sites are simultaneously bound by the S1 protein to allow for effective replication by the replicase through enhanced recognition of the template to the active site [97].
The RNA template is protected from cellular nuclease degradation through an unknown mechanism. There is an additional host factor required for successful translation that has been isolated but not genetically identified in the case of Levivirus species. This protein does not interact with the polymerase machinery but instead binds directly to the 3′ terminus of the mRNA template [98]. The replicase will associate to the start site and initiate negative-strand synthesis by replicating through the genome. This strand is used to synthesize new +ve ssRNA genomes for the viral progeny.
As the infection cycle reaches the end-stage, the CP-dimers bind the replicase gene start site, located within a hairpin-structured operator, and act as translational repressors [99,100]. This results in a packaging signal that stimulates the assembly of functional viral progeny. At the same time, there is an increase in quantities of the lysis protein, with a single lysis protein required for each phage progeny. Since the lysis protein lyses the cell without affecting the integrity of the peptidoglycan network, and in the absence of muralytic enzyme activity, it is referred to as an "amurin" [101]. Research focused on this protein has revealed that it is primarily localized in Bayer's patches, the periplasmic zones of adhesion between the inner and outer membrane [102]. The exact mechanism by which this 75-amino acid lysin induces host lysis is not exactly known [103]. However, the current proposal is that the lysis protein forms lesions and hydrophobic pores in the inner membrane that dissipates the proton motive force (PMF) [104]. This alteration in PMF activates autolysis of the bacterial host through certain enzymes such as DD-endopeptidases and lytic transglycosylases. Supporting research has shown alteration in the average length in glycan strands and degree of cross-linkage, suggesting the activation of the aforementioned enzymes [105].
Nonetheless, the molecular information and functioning schema of such an autolytic pathway have yet to be identified [82]. Recent findings have indicated that the lysis of host cells by MS2 lysin is dependent on a range of host factors, including host chaperone DnaJ [106]. This post-translational regulator allows for another level of control of both quantity and activity of the lysis protein of MS2.
Translation of the phage genes requires the ribosome to associate with the RNA through a Shine-Dalgarno sequence, the start codon and the host ribosomal S1 protein. The S1 protein can bind the S site, as mentioned above. This creates a situation whereby the S1 protein of the ribosome and replicase are competing for the same RNA binding site.
There are a variety of systems that regulate protein synthesis, including RNA secondary structure, ribosome access to the initiation codon, and folding kinetics [107][108][109]. The secondary structures of the +ve ssRNA are the predominant factors in determining different protein yields; e.g., the CP gene is free from any secondary structures as it is required in high copy numbers (178 per virion), whereas the replicase gene is trapped in tight secondary structures as only one copy per progeny is required [110]. The open reading frame (ORF) of the coat protein is readily available for ribosomal translation. As the ribosome moves along the RNA transcript, it disrupts the secondary structure to allow hidden genes to be translated. Following CP gene translation, the initiation codon of the replicase gene becomes available, resulting in the synthesis of the replicase β-catalytic subunit. The translation of the lysis and replicase genes is dependent on successful translation of the CP gene.
Newly synthesized viral particles require only one copy of the lysis protein and the Mat L protein [111][112][113]. The ORF of the lysis gene overlaps the replicase gene in a +1 frameshift, with the termination sequence of the lysis protein located in the coding region of the replicase gene [103,114]. Studies by van Duin and his colleagues (1990) on the translational control of the lysis protein provided key information as to the role of secondary structures in transcriptional regulation. Their work demonstrated that the formation of a stable hairpin in the RNA between the Shine-Dalgarno sequence and the start codon of the lysis gene represses the expression of the lysis gene. Following successful transcription, during translation there is incomplete dissociation of the ribosome from the mRNA as it creates the CP protein [115,116]. The ribosome backtracks to reinitiate at the start codon for the lysis protein in approximately 5% of translational cycles [117,118]. The lysis protein is produced at low levels towards the end of the infection cycle. This allows for gradual accumulation of the lysis protein to ensure that the viral progeny have sufficient time to mature.
The Mat L protein is only translated from newly synthesized genome templates [119]. The strong secondary structure formed by the Shine-Dalgarno sequence and the S1-binding sequence prevents translation of the 5′ end in the normal mRNA structure, where the Mat L gene is positioned. In nascent RNA strands, an alternate, shorter hairpin structure is created that enables translation of the maturation gene at the 5′ terminus by allowing access and binding of the ribosome. This RNA-folding intermediate of newly synthesized strands gives the ribosome access to the start codon of the A-protein.
A recently isolated RNA phage for Acinetobacter species, AP205, was found to have an unusual genome structure with the lysis gene located at the 5′ terminus [120]. Although the genome of AP205 mirrors the typical Levivirus genome map, the secondary structure and 3′-UTR follow that of Allolevivirus. This phage has yet to be approved as a Levivirus. Potential Leviviruses of Pseudomonas, phages PP7 and PRR1, have also been isolated and characterized; both exhibit particular hallmarks of Levivirus phages [121][122][123][124].

Allolevivirus
The type species of Allolevivirus is Qβ, the representative of the Qβ-like phages (genogroup III). Species of Allolevivirus contain a longer version of the genome with an extension of the C-terminal of the CP gene [76]. The presence of this minor-CP A1 (MCPA1) protein, also known as the 'read-through protein', is a feature unique to Allolevivirus phages [125]. Both the MCPA1 and the maturation A2 (MA2) proteins are essential for host attachment [76]. The majority of Allolevivirus members were found to encode an Arg-Gly-Asp (RGD) motif, essential for host cell recognition and attachment, within their MCPA1 and/or MA2 [88]. This motif is absent in the Levivirus phages. Similar to the signature 3′-terminal sequence of Levivirus, Allolevivirus species contain a 5′-TCCTCCCA-3′ sequence within the 3′-terminal of their genome [88].
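The two 3′-terminal signatures reported in [88] (ACCACCCA for Levivirus, TCCTCCCA for Allolevivirus) lend themselves to a simple first-pass screen of candidate contigs. The sketch below is illustrative only: the function name is hypothetical, and the 50-nucleotide search window is an assumption chosen for the example, not a value taken from the literature.

```python
# 3'-terminal signature sequences as given in the text (DNA alphabet).
SIGNATURES = {
    "Levivirus": "ACCACCCA",
    "Allolevivirus": "TCCTCCCA",
}


def classify_by_3prime_signature(genome, window=50):
    """Guess the genus from the signature found near the 3' end.

    `genome` may be written as RNA (U) or cDNA (T). `window` is the number
    of 3'-terminal nucleotides searched (an assumed value). Returns the
    genus name, or None if no signature or both signatures are found.
    """
    seq = genome.upper().replace("U", "T")
    tail = seq[-window:]
    matches = [genus for genus, motif in SIGNATURES.items() if motif in tail]
    return matches[0] if len(matches) == 1 else None
```

A match here is only suggestive; a divergent phage may lack either signature, so a negative result should not be read as evidence of absence.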
The underlying translation and replication mechanisms are similar to Levivirus with minor variations. The host factor that associates with the functional replicase has been isolated, purified and genetically characterized for Qβ as the protein encoded by the host factor of Qβ (hfq) gene of E. coli [126,127]. This nonspecific ssRNA binding protein, Hfq, aids polymerase association to the 3′ end of the +ve ssRNA template. The start of the 5′-terminal begins with a GG sequence. There is a nontranslated A residue attached to the extreme 3′ terminus in a CCA sequence, following activity of the terminal nucleotidyl transferase (TNTase) domain of RdRp [128][129][130]. This does not serve as a template nucleotide; instead, RNA synthesis begins at the penultimate C residue.
The transition between replication and translation is similar to that of the above-mentioned Levivirus, with a slight change: the translation of MA2 is controlled in a temporal manner as opposed to by a structural intermediate. This is dictated by the length of time it takes for the polymerase to move from the start site of the maturation gene to the complement of the Shine-Dalgarno sequence [131,132]. Once it has been translated, these two sequences bind to form a strong secondary structure to prevent continuous translation of the same gene. The additional MCPA1 protein is formed following ribosomal read-through of the leaky stop codon (UGA) of the CP gene [118]. It is read as a tryptophan codon (UGG), which promotes gene expression of the MCPA1 protein. The ribosome occasionally, in approximately 5% of cases, translates past this leaky termination sequence for an additional 600 nucleotides to form a C-terminal extension of the CP [133]. This protein is incorporated in low quantities into viral progeny and is essential for successful infection. Studies of the amino acid sequence and the three-dimensional structure of the MCPA1 protein have shown it to be unique to the small group of Allolevivirus phages [133]. The MA2 and the MCPA1 protein, whose exact role is unknown as of yet, are essential for successful infection of pili-positive hosts.
Another notable difference in the infection pattern of Allolevivirus is the absence of a lysis gene in the genome. Instead, the MA2 protein has a secondary function to induce the lysis of the host cell for release of viral progeny [134]. The MA2 protein is referred to as an amurin as it does not destroy the peptidoglycan layer directly through muralytic activity. It is also known as an "antibiotic protein" due to the similarity in function to antibacterial agents which target cell walls [135]. It has been reported that MA2 induces host cell lysis by inhibiting the enzymatic activity of MurA, a UDP-N-acetylglucosamine-enolpyruvyl transferase. This is an essential enzyme in the production of peptidoglycan as it catalyses the first committed step, the biosynthesis of murein precursor [136]. At the next stage of cell division, the inhibition of cell wall biosynthesis leads to host lysis and release of the phage progeny.
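The read-through of the leaky UGA stop codon of the CP gene, described earlier in this section, can be illustrated with a toy translation routine in which the first in-frame UGA is optionally decoded as tryptophan. This is a minimal sketch using the standard genetic code; the function name and the example sequence are hypothetical, and the routine makes no attempt to model the roughly 5% frequency of the event. Note that an extension of 600 nucleotides corresponds to about 200 additional codons.

```python
import itertools

# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna, read_through_uga=False):
    """Translate an in-frame RNA sequence up to the first effective stop.

    If `read_through_uga` is True, the first UGA encountered is decoded as
    tryptophan (W) and translation continues to the next stop codon.
    """
    protein = []
    uga_read_through_used = False
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            if codon == "UGA" and read_through_uga and not uga_read_through_used:
                protein.append("W")
                uga_read_through_used = True
                continue
            break
        protein.append(amino_acid)
    return "".join(protein)
```

With the made-up sequence `AUGGCUUGAGGUGAUUAA`, normal termination yields the short product and read-through yields the extended one, mirroring the relationship between CP and its C-terminally extended form.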
A study by Friedman et al. (2009) noted that the sequences of the Levivirus and Allolevivirus genera were highly similar in the position of ORFs, the length of proteins and the catalytic β-domains of the RdRp [88]. The YGDD motif of the replicase protein, which is conserved across all positive-sense ssRNA viruses, was recorded throughout the Leviviridae.
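Because the YGDD motif is reported as conserved in the replicase, a literal motif scan over predicted proteins is one of the simplest screens that could be applied to metagenomic ORFs. The sketch below assumes protein sequences in one-letter code; the function name is hypothetical, and an exact string match is used purely for illustration (a real search would more plausibly rely on profile-based methods).

```python
def find_motif(protein, motif="YGDD"):
    """Return the 1-based start positions of every occurrence of `motif`."""
    protein = protein.upper()
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # report in 1-based coordinates
        start = protein.find(motif, start + 1)
    return positions
```

The same helper could be pointed at the RGD motif mentioned for the Allolevivirus MCPA1 and MA2 proteins by passing `motif="RGD"`, although so short a motif will produce many chance matches.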
Although both Levivirus and Allolevivirus phages target the pilus of their hosts as receptors to initiate, the fact there is no conserved infection mechanism suggests that there may be varying mechanisms for the RNA to enter the cell. Originally thought to only affect plasmid-encoded appendages, there have been Leviviridae specific for genome-encoded pili of Gram-negative bacteria, such as Pseudomonas phage PP7 and Acinetobacter phage AP205.

Discussion
Only a limited number of RNA phages have been identified to date, and their "true" diversity and abundance in nature remain unknown. Current approaches used for the isolation, selection, and purification of viral particles, including precipitation by polyethylene glycol (PEG) and caesium-chloride (CsCl) gradient purification, are almost certainly biased against RNA phages [137]. The bias of these methods towards DNA phages goes a long way towards explaining why RNA phages are under-represented in genome databases.
The fragile nature of RNA and the widespread presence of RNases in human- and animal-derived samples also hinder studies involving RNA phages. The development of RNA phage-selective isolation protocols would greatly enhance these efforts. For example, separation of the DNA and RNA fractions of samples and complete eradication of unwanted RNases are recommended. It should also be noted that the low abundance of RNA phages in databases will result in fewer "hits" for novel sequences. As research into the RNA fraction of the phage community expands, we can expect the databases to become more representative of the wider RNA phage community.
An interesting paper recently proposed that members of the Picobirnaviridae family may not be eukaryotic viruses as originally thought, but may in fact represent a novel family of RNA phages [138]. This research involved analysis of bacterial ribosome binding sites (RBS) upstream of the coding sequences in their bi-segmented, dsRNA genomes. It was noted that an RBS motif, thought to be unique to prokaryotic-infecting viruses, was enriched in the picobirnaviruses. This finding suggests that these dsRNA viruses could be classified as putative bacteriophages. Furthermore, an additional study has supported this hypothesis by proposing that picobirnaviruses are in fact a novel RNA phage family of high genomic diversity [139]. This type of analysis demonstrates the possibility that more members of RNA virus populations may in fact be mischaracterized. A more robust method for classification of RNA phages would help to resolve this issue.
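The kind of RBS analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the cited studies: the motif `AGGAGG` (the Shine-Dalgarno consensus), the 20-nucleotide upstream window, the function names, and the toy sequences are all assumptions chosen for the example. The sketch reports the fraction of coding sequences that carry the motif in the region immediately upstream of the start codon; enrichment would then be judged by comparing this fraction between virus families.

```python
def upstream_region(genome, cds_start, window=20):
    """Return up to `window` nucleotides immediately upstream of a CDS start."""
    return genome[max(0, cds_start - window):cds_start]


def rbs_fraction(genome, cds_starts, motif="AGGAGG", window=20):
    """Fraction of CDS starts with `motif` in the upstream window.

    `motif` and `window` are illustrative assumptions, not published parameters.
    """
    if not cds_starts:
        return 0.0
    hits = sum(motif in upstream_region(genome, s, window) for s in cds_starts)
    return hits / len(cds_starts)


# Invented toy genome: gene 1 has an upstream motif, gene 2 does not.
gene1_upstream = "UUAGGAGGUUUAC"
gene1 = "AUGGCUUAA"
spacer = "U" * 20
gene2 = "AUGGCCUAA"
genome = gene1_upstream + gene1 + spacer + gene2
starts = [len(gene1_upstream), len(gene1_upstream) + len(gene1) + len(spacer)]

fraction = rbs_fraction(genome, starts)
print(f"{fraction:.2f} of coding sequences have an upstream RBS motif")
```

A real analysis would also need to allow for degenerate motif variants and for variable spacing between the motif and the start codon, both of which this sketch ignores.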
Identifiable RNA phage-specific genes, such as the RdRp gene, capsid gene, maturation protein gene, or NTPase gene, can serve as features with which to mine metagenomic databases for RNA phages. However, since the RdRp gene is conserved amongst RNA viruses, unique genetic elements of the Leviviridae and Cystoviridae families should also be used in specific studies. Contigs with homologs to either the leviviral or the cystoviral RdRp gene are potential RNA phages and should be subjected to further analysis. Based on the recent studies mentioned above, homologs to the RdRp gene of picobirnaviruses should also be included [138]. The study by Krishnamurthy and colleagues, which identified 20 unique RNA phage phylotypes, used nucleotide identity to the RdRp and maturation genes to categorize these phages [29]. The specific 3′-terminal sequences of Levivirus and Allolevivirus members could be used to further classify these phages. Signature features of Cystoviridae members, such as the muralytic enzyme gene or the nucleocapsid shell protein gene, could also serve as genetic signatures when screening the databases for RNA phages [42].
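The screening logic outlined above can be expressed as a simple triage step applied after a homology search. The following Python sketch assumes that such a search has already been run and that each contig comes with a set of detected marker names. The marker labels, the function name `triage_contig`, and the contig identifiers are hypothetical, and the rules are a simplification of the paragraph above rather than an established pipeline.

```python
# Family-specific signature genes named in the text (labels are hypothetical).
LEVI_SIGNATURES = {"maturation"}
CYSTO_SIGNATURES = {"muralytic_enzyme", "nucleocapsid_shell"}


def triage_contig(hits):
    """Assign a provisional label to a contig from its set of marker hits.

    An RdRp hit is required, because it is the common RNA virus marker;
    family-specific signatures are then used to refine the label.
    """
    if "RdRp" not in hits:
        return "not a candidate"
    levi = len(hits & LEVI_SIGNATURES)
    cysto = len(hits & CYSTO_SIGNATURES)
    if levi > cysto:
        return "putative Leviviridae"
    if cysto > levi:
        return "putative Cystoviridae"
    # RdRp alone (or conflicting signatures) is not family-specific.
    return "RNA virus candidate, needs further analysis"


# Invented example input: contig identifier -> markers detected on it.
contig_hits = {
    "contig_1": {"RdRp", "maturation", "capsid"},
    "contig_2": {"RdRp", "muralytic_enzyme"},
    "contig_3": {"RdRp"},
    "contig_4": {"capsid"},
}

labels = {name: triage_contig(hits) for name, hits in contig_hits.items()}
for name, label in labels.items():
    print(name, "->", label)
```

Requiring the RdRp hit first and the family signatures second mirrors the point made above: the RdRp alone flags an RNA virus, while the additional genes are what separate phage families from other RNA viruses.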
A common theme of this review is the need for greater efforts to be directed towards the discovery of more RNA phages for all potential applications, such as tools for advancing molecular biology and as potential phage therapeutics. The rise in antimicrobial resistance across bacteria is not a novel problem, but it is alarming. The host range of RNA phages could offer therapeutic potential against some of the pathogens on the World Health Organization's (WHO) list of deadly pathogens, including some of the Gram-negative members of the ESKAPE pathogens, such as Klebsiella pneumoniae, Acinetobacter baumannii, and Pseudomonas aeruginosa. Clinical isolates of P. aeruginosa have been found to be resistant to most of the antibiotics normally used to treat infections caused by this organism [140]. PP7, an unclassified Levivirus phage of P. aeruginosa, has been identified which targets this bacterium via a pilin-specific mechanism [141]. Further studies of the therapeutic parameters of RNA phages such as PP7 should be carried out to examine their efficacy in controlling these pathogens and to explore their potential use as components of phage therapy cocktails.