3.2. Whole Genome and Proteome Comparisons of the Acinetobacter Phages
To visualize relationships between the sequenced Acinetobacter phages, the phage genomes were analysed using whole genome dot plots, pairwise sequence identity and shared proteins. Dot plots and nucleotide sequence alignments after genome co-linearization indicate that, while there is substantial diversity among the sequenced Acinetobacter phages, there is also sufficient similarity to allow these phages to be placed in seven discrete clusters designated A–G (Figure 1 and Figure S1). Because several phages share a substantial proportion of their gene content without having substantial nucleotide identity, clusters were defined on the basis of at least 40% shared gene content.
With the exception of clusters A and C, each cluster possesses a minimum of 40% ANI and greater than 50% conservation at the protein level (Figure S2). This division is further supported by morphological similarity (Figure 2) and by low standard deviation in genome size, G + C content and the number of genes encoded between members of each group (Table 1). Three of the proposed clusters correspond to ICTV taxonomic assignments: Cluster A comprises phages related to members of the Tevenvirinae subfamily, while Clusters B and E correspond to the formally established genera Ap22virus and Fri1virus, respectively. Two phages, ME3 and Presley, did not reveal a clear relationship to the other phages and are designated as genomic singletons. Several phages exhibited low ANI to the defined clusters. Specifically, phiAC-1 exhibits ANI ranging from 16.2% to 21.4% with phages of the Ap22virus (Cluster B), while phages F1245/05, Acibel007 and Petty possess between 9.3% and 25.6% ANI with phages assigned to the Fri1virus (Cluster E).
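The 40% shared-gene-content rule can be illustrated with a minimal Python sketch. It assumes that each phage is represented by the set of protein-group identifiers it encodes, that shared gene content for a pair is the mean of the two directional fractions of shared groups, and that clusters are formed by single linkage (a phage joins a cluster if it meets the threshold with any member). These choices, the function names and the toy data are illustrative assumptions, not a description of the exact procedure used in this study.

```python
from __future__ import annotations

from itertools import combinations


def shared_gene_content(a: set[str], b: set[str]) -> float:
    """Mean of the two directional fractions of shared protein groups.

    Assumed metric for illustration: |A & B| / |A| and |A & B| / |B|, averaged.
    """
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return (shared / len(a) + shared / len(b)) / 2


def cluster_phages(proteomes: dict[str, set[str]], threshold: float = 0.40) -> list[set[str]]:
    """Single-linkage clustering with a union-find structure (assumed linkage rule)."""
    parent = {name: name for name in proteomes}

    def find(x: str) -> str:
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of phages that meets the shared-gene-content threshold.
    for p, q in combinations(proteomes, 2):
        if shared_gene_content(proteomes[p], proteomes[q]) >= threshold:
            parent[find(p)] = find(q)

    # Collect members under their root to form clusters.
    clusters: dict[str, set[str]] = {}
    for name in proteomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical toy proteomes (protein-group IDs), not real phage data.
    toy = {
        "phageA": {"g1", "g2", "g3", "g4", "g5"},
        "phageB": {"g1", "g2", "g3", "g6", "g7"},
        "phageC": {"g8", "g9", "g10"},
    }
    print(sorted(sorted(c) for c in cluster_phages(toy)))  # [['phageA', 'phageB'], ['phageC']]
```

Single linkage can chain together genomes that are only indirectly related, which is one reason secondary evidence such as ANI, morphology and genome statistics is useful for confirming cluster boundaries.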
The comparison of gene content provides a secondary assessment of genomic diversity. During a manual pairwise comparison, it was observed that some candidate ORFs had not been identified and that others had optimal ribosome binding sequences that differed from those in the published annotations. For these reasons, each phage was re-annotated prior to protein clustering (File S1). Initially, proteins from the five clusters were assembled separately into groups using OrthoMCL. From these results, sets of proteins were identified for each cluster of phages that (i) were present in all genomes within that cluster, (ii) were encoded on two or more, but not all, of the genomes in a cluster and (iii) were unique to a single phage genome (Figure 3).
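The three-way split of protein groups within a cluster can be expressed compactly. The sketch below assumes that OrthoMCL output has already been parsed into a mapping from each phage to the set of group identifiers it encodes, with orphans treated as single-member groups; the parsing step is not shown, and the function name and toy data are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def classify_groups(cluster: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split a cluster's protein groups into core, accessory and unique sets.

    `cluster` maps each phage name to the set of protein-group IDs it encodes
    (assumed to be pre-parsed from OrthoMCL output). Assumes >= 2 phages.
    """
    # Count in how many genomes of the cluster each group occurs.
    counts = Counter(g for groups in cluster.values() for g in groups)
    n = len(cluster)
    return {
        "core": {g for g, c in counts.items() if c == n},             # (i) in all genomes
        "accessory": {g for g, c in counts.items() if 2 <= c < n},    # (ii) two or more, not all
        "unique": {g for g, c in counts.items() if c == 1},           # (iii) a single genome
    }
```

For a two-member cluster the accessory category is necessarily empty, since any group found in two genomes is found in all of them.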
To estimate the gene content relationship between all 37 phages, OrthoMCL was used to cluster all 4065 proteins, which assembled into 737 groups of two or more proteins and 975 orphans (File S2). This approach allowed the inter- and intra-cluster relationships in gene content amongst the Acinetobacter phages to be represented as a network phylogeny, the results of which agree with the clusters designated from nucleotide sequence comparisons (Figure 4). Highlighting their status as singletons, 260 and 80 of the proteins encoded by ME3 and Presley, respectively, were unique orphans. The relationships between the more distantly related phages are more apparent from the gene content analysis: Petty, F1245/05 and Acibel007 each encode 22 proteins (41.5% to 48.9% of ORFs) that represent core protein groups of the Fri1virus. Similarly, 29 proteins (35.4% of ORFs) encoded by phiAC-1 fall within the core Ap22virus protein groups. These more distant relationships are represented by common branches in the network phylogeny (Figure 4).
Functional inferences for the grouped and unique protein sequences were obtained using a combination of BLASTP, InterProScan and the HHsuite tools HHblits and HHsearch. Of the 4067 proteins, a putative function based on bioinformatic analysis could be ascribed to 1762 (322 protein groups and 172 orphans), while 2305 (56.6%) were annotated as hypothetical proteins of unknown function. The majority of proteins formed mutually exclusive groups common to two or more members of a single cluster (File S2).
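A gene-content network such as the one in Figure 4 is usually built from a pairwise distance matrix. The distance used here is not stated, so the following sketch assumes a Jaccard distance (one minus the number of shared protein groups divided by the number of groups in their union) computed over presence/absence of OrthoMCL groups, with orphans counted as single-member groups. A matrix of this kind is the sort of input a split-network method accepts, but the function and toy data are hypothetical illustrations.

```python
from __future__ import annotations


def gene_content_distances(proteomes: dict[str, set[str]]) -> tuple[list[str], list[list[float]]]:
    """Symmetric Jaccard distance matrix over protein-group presence/absence (assumed metric)."""
    names = sorted(proteomes)
    matrix: list[list[float]] = []
    for p in names:
        row: list[float] = []
        for q in names:
            a, b = proteomes[p], proteomes[q]
            union = a | b
            if p == q or not union:
                row.append(0.0)
            else:
                # 0.0 = identical gene content, 1.0 = no shared groups.
                row.append(1.0 - len(a & b) / len(union))
        matrix.append(row)
    return names, matrix
```

Because orphans enlarge the union without ever being shared, genomes rich in unique orphans (as ME3 and Presley are) sit far from every other genome under this metric, consistent with their status as singletons.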
A total of 37 core and three accessory protein groups were shared between two clusters of phages, predominantly between the myoviruses. Clusters C and D shared the greatest number of protein groups, including 17 virion structural and assembly proteins in addition to a predicted thioredoxin, endodeoxyribonuclease, replicative helicase and DNA polymerase. Each of the T4-like and Fri1-like phages encodes a predicted ATP-dependent DNA ligase and deoxynucleotide monophosphate (dNMP) kinase that group together in the OrthoMCL analysis. Three core structural proteins are shared between the Acinetobacter siphoviruses: the predicted portal vertex protein, the major capsid protein and a putative tail completion protein.
The three accessory protein groups shared between clusters correspond to a hypothetical protein (clusters A and E), an HNH family homing endonuclease (clusters B and E) and a glycoside hydrolase family 24 endolysin (clusters A and B). Examination of pairwise comparison maps indicates that, in some phages, accessory and unique orphan genes are not uniformly distributed across the genome. The genomic modules containing genes responsible for virion assembly, nucleic acid metabolism and genome replication tend to be highly conserved in both gene content and order. Although the comparisons of gene content define sharp boundaries between the clusters, there are examples of mosaicism both within and between clusters that are particularly apparent in the tailspike and endolysin genes.
As major determinants of host specificity, phage receptor binding proteins appear to be subject to relatively frequent recombination events [88,89]. Twenty-five phages representing Clusters B, D and E and the singleton ME3 were predicted to encode tailspikes adopting a parallel β-helix structure (File S2). Tailspike proteins tend to exhibit a bimodular structure consisting of an N-terminal virion binding domain and a C-terminal receptor binding domain that often possesses enzymatic activity [90]. While the N-terminal tailspike sequence is both highly conserved and specific to each cluster of Acinetobacter phages, there appear to be two clear examples of domain mosaicism in which the C-terminal sequence is highly conserved between two members of different clusters. Specifically, the C-terminal sequences of the Cluster B myovirus IME-AB2 and the Cluster E podovirus Abp1 share 97% identity. The myoviruses WCHABP12 and AM24 also possess highly similar (94% identity) C-terminal sequences but divergent N-termini.
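The contrast between conserved C-termini and divergent N-termini that defines these mosaic tailspikes can be quantified by computing identity separately on either side of a domain boundary in a pairwise alignment. This minimal sketch assumes the two sequences are already aligned (gaps written as "-") and that the boundary column is known; the toy sequences are invented, and the mosaic thresholds in `looks_mosaic` are hypothetical rather than values used in this study.

```python
from __future__ import annotations


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns in which neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def domain_identities(aln_a: str, aln_b: str, boundary: int) -> tuple[float, float]:
    """Return (N-terminal, C-terminal) percent identity split at an alignment column."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return (
        percent_identity(aln_a[:boundary], aln_b[:boundary]),
        percent_identity(aln_a[boundary:], aln_b[boundary:]),
    )


def looks_mosaic(n_id: float, c_id: float, low: float = 50.0, high: float = 90.0) -> bool:
    """Hypothetical thresholds: divergent N-terminus and conserved C-terminus."""
    return n_id < low and c_id >= high


if __name__ == "__main__":
    # Invented, pre-aligned toy sequences with a boundary at column 10.
    a = "MKTAYIAKQR" + "GGSWNDTLVE"
    b = "MQSLFVEKHT" + "GGSWNDTLVE"
    n_id, c_id = domain_identities(a, b, 10)
    print(n_id, c_id, looks_mosaic(n_id, c_id))  # 20.0 100.0 True
```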
Acinetobacter phage endolysins comprise a mixture of single-domain and multidomain endolysins, the latter pairing a lysozyme-like domain with a peptidoglycan-binding domain. While the position of the lysis cassette is conserved within each cluster, the endolysins from the 37 phages form four protein groups that do not segregate according to cluster. ME3 appears to have its lysis functions encoded in two separate, adjacent genes (locus tags ME3_7 and ME3_8) encoding a predicted peptidoglycan-binding protein and a lysozyme, respectively.
In the following sections, the common characteristics and distinguishing features for each cluster of phages are briefly discussed.
3.2.1. Cluster A: The T4-Like Acinetobacter Phages
The Acinetobacter phages 133, Acj9, Acj61, Ac42 and ZZ1 all belong to the large and ubiquitous myovirus subfamily Tevenvirinae. The T4-like morphology is characterised by an elongated icosahedral capsid and a contractile tail exhibiting transverse striations and a collar [86]. The tail terminates in a baseplate that carries six long kinked tail fibers and six short tail spikes. In the quiescent state, the long tail fibers are held in a folded configuration by whisker fibers extending from the collar. With the exception of ZZ1, whose dimensions were obtained without magnification control, all the Acinetobacter T4-like phages exhibit similar dimensions: a moderately elongated head of 120 by 86 nm and a contractile tail of 111 by 16 nm, identical in morphology to that of T4, terminating in a baseplate with long and short tail fibers [86]. The five sequenced Acinetobacter T4-like phages encapsulate genomes of 159 to 169 kbp encoding between 241 and 257 ORFs (Table 1) and exhibit low intra-cluster ANI of between 10% and 26%.
Previous studies have demonstrated that the T4-like phages share a core genome consisting of genes involved in DNA replication, virion structural components and assembly chaperones [62]. The Acinetobacter T4-like phages share 122 conserved core proteins, and each encodes between 68 and 89 proteins that form accessory protein groups. Gene products unique to each phage account for approximately 20% of its genome. Relative to other members of the Tevenvirinae, CoreGenes analysis indicates that >40% of the gene products have a homolog encoded by T4, RB69, RB49, JS98, SP18, JD18 and CC31 (File S3).
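A figure such as ">40% of the gene products have a homolog" is a ratio of query proteins with a qualifying hit to all query proteins. CoreGenes performs its own BLASTP-based comparison, which the sketch below does not reproduce; it only illustrates the ratio using standard BLAST tabular output (`-outfmt 6`, in which the 12th column is the bit score). The bit-score cut-off of 75 and the function name are illustrative assumptions.

```python
from __future__ import annotations


def fraction_with_homolog(blast_tab: str, n_query_proteins: int, min_bitscore: float = 75.0) -> float:
    """Fraction of query proteins whose best hit meets a bit-score cut-off (assumed cut-off).

    `blast_tab` is BLASTP output in -outfmt 6 (12 tab-separated columns).
    The total number of query proteins is passed separately because proteins
    with no hit at all do not appear in BLAST output.
    """
    if n_query_proteins <= 0:
        raise ValueError("n_query_proteins must be positive")
    best: dict[str, float] = {}
    for line in blast_tab.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, bitscore = fields[0], float(fields[11])
        # Keep only the best-scoring hit per query protein.
        best[query] = max(bitscore, best.get(query, 0.0))
    with_homolog = sum(1 for score in best.values() if score >= min_bitscore)
    return with_homolog / n_query_proteins
```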
The presence of homologs of Alt, MotA, AsiA and gp55 in the Cluster A Acinetobacter phages suggests that transcriptional regulation follows a pattern similar to that of T4. Alt is an internal head protein that is injected along with the T4 genome; its mono-ADP-ribosyltransferase activity modifies the host RNA polymerase, enhancing transcription of early T4 genes. The middle transcription activator MotA binds to the MotA box to activate transcription in the presence of AsiA-associated host RNA polymerase [91]. Gp55 is a sigma70 family protein that binds to the host RNA polymerase and facilitates recognition of the late promoter sequence motif [92].
The low level of sequence identity between the T4-like Acinetobacter phages confirms that each represents a distinct T4-like species. With the isolation of related phages, the T4-like Acinetobacter phages may form genera in addition to the 11 already established within the Tevenvirinae subfamily but, for the time being, represent unclassified species [62,93].
3.2.2. Cluster B: The AP22-Like Acinetobacter Myoviruses (Ap22virus)
Cluster B comprises nine myoviruses: AP22, AB1, IME-AB2, LZ35, YMC-13-01-C62, YMC11/12/R2315, YMC11/12/R1215, WCHABP1 and WCHABP12, which encapsulate genomes of between 43.2 and 46.4 kbp and encode between 82 and 89 ORFs (Table 1, Figure 5). Three of these phages have been examined by electron microscopy. Micrographs of IME-AB2 and AB1 show a single morphotype consisting of an isometric head and a contractile tail terminating in a baseplate with short tail fibers [58,68]. This grouping has been recognized independently, and an ICTV proposal to create a new genus within the family Myoviridae, the Ap22virus, was ratified in 2016 [93]. These phages share at least 40% ANI and between 61.5% and 100% of their proteins. We note that YMC-13-01-C62, YMC11/12/R2315 and YMC11/12/R1215 have average nucleotide sequence identities of 99%, which is above the current ICTV species demarcation criterion of 95%, indicating that they should be considered isolates of the same phage species [94]. The AP22-like genomes exhibit a modular and syntenic organization with the majority of genes encoded on the forward strand. In each of these phages, the putative head morphogenesis protein is separated from a predicted prohead protease by between seven and 13 ORFs of unknown function. The endolysin and holin are encoded at the end of the structural and assembly gene module, followed by two modules encoding genes involved in nucleotide metabolism, recombination and superinfection immunity (Figure 5). The Ap22virus core genome consists of 47 gene products that include virion structural and assembly proteins, a superinfection immunity protein, a SaV-like domain protein, a primase/helicase and transcriptional regulatory proteins.
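Whole-genome identity values as low as those reported in this section (10% to 26% within Cluster A) are most easily read as identity normalised over the entire genome rather than over aligned regions alone. Assuming that interpretation, the sketch below sums identical positions across non-overlapping local alignment blocks, divides by the query genome length and compares the result with the 95% ICTV species criterion cited above. The block values are invented, and this is not necessarily the exact calculation behind the reported ANI values.

```python
from __future__ import annotations


def genome_wide_identity(blocks: list[tuple[int, float]], genome_length: int) -> float:
    """Percent of the query genome made up of identical aligned positions.

    `blocks` holds (aligned_length, percent_identity) for non-overlapping
    local alignment blocks on the query genome (assumed already de-duplicated).
    """
    identical = sum(length * pid / 100.0 for length, pid in blocks)
    return 100.0 * identical / genome_length


def same_species(identity: float, threshold: float = 95.0) -> bool:
    """Apply a 95% species demarcation cut-off (treating exactly 95% as same species is an assumption)."""
    return identity >= threshold


if __name__ == "__main__":
    # Invented example: two blocks covering most of a 45 kbp genome.
    ident = genome_wide_identity([(44000, 99.5), (500, 90.0)], 45000)
    print(round(ident, 2), same_species(ident))  # 98.29 True
```

Because unaligned sequence contributes nothing to the numerator, two genomes can share highly similar modules and still score well below any species threshold under this definition.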
While members of the Ap22virus show little nucleotide similarity to other phages in the extant sequence database, tBLASTx identified a small number of myoviruses infecting the genera Aeromonas (51 and vB_AsaM-56), Burkholderia (Bcep1, Bcep43 and Bcep781), Edwardsiella (GF-2) and Xanthomonas (OP2) that encode protein homologs. CoreGenes comparisons show that these phages possess between 19 and 27 homologs (File S2) of the virion structural and assembly genes of the Ap22virus.
Phage phiAC-1 is clearly more distantly related to members of the Ap22virus, exhibiting its greatest nucleotide sequence identity with AP22, at 21.4%. This is reflected in the OrthoMCL analysis, in which 34 (41.5%) phiAC-1 proteins are designated as unique orphans, and the differences are apparent in the visual comparative analysis (Figure 5). In contrast to the other phages of Cluster B, all of which infect strains of A. baumannii, phiAC-1 is propagated on Acinetobacter soli strain KZ-1 and is reported to show a narrow host range [73]. Functions could be predicted for only seven of the 34 orphan phiAC-1 proteins: a tailspike protein, a single-stranded DNA-binding protein, a YqaJ/RecB-like exonuclease, a phosphoadenylyl sulphate reductase, a putative helicase loader and two proteins with a predicted EF-hand domain and a domain of unknown function (DUF1376), respectively.
3.2.3. Cluster C: Myoviruses Acibel004 and phiAbaA1
vB_AbaM_Acibel004 has a 99.7 kbp genome encoding 156 ORFs and 22 tRNAs, with genes organised into several functional modules: lysis, DNA packaging, virion structure and assembly, nucleic acid metabolism, genome replication and tRNA. The virion consists of an isometric capsid 70 nm in diameter and a 105 nm long contractile tail [59], which in the quiescent state appears to show a triangular cluster of tail fibers at the tail terminus (Figure 2). The fibers appear to form a hexagonal pyramid meeting at an apex. This structure is also evident in micrographs of phiAB11 and Abp53 and appears to be lost upon contraction of the tail. Twenty-six proteins were identified from analysis of Acibel004 virions by electrospray ionisation mass spectrometry, and several show similarity to the putative structural proteins of Pseudomonas phages PAK-P1, PAK-P3 and KPP10 [59], a relationship that is confirmed by CoreGenes analysis (File S3). Acibel004 shares 20.5% ANI and 56.4% of its proteins with phiAbaA1. A further tentative relationship can be found with Abp53, a phage morphologically similar to Acibel004 [95], for which a 10,290 bp partial sequence annotated to include several virion structural genes is available [JF317274.1]. The partial Abp53 sequence shows 71% identity across 4% of the Acibel004 genome by dc-megablast. Lastly, we note that Lin et al. [87] have described an additional Acinetobacter phage of similar morphology, phiAB11.
3.2.4. Cluster D: Myoviruses AM24 and YMC13/03/R2096
The grouping of phages AM24 and YMC13/03/R2096 into Cluster D is supported by an ANI of 75.4% and a total of 134 proteins shared across the two genomes. The AM24 and R2096 genomes encode 18 and 17 tRNAs in addition to 168 and 170 ORFs, respectively (Table 1). Both phages exhibit an almost identical organization consisting of four modules. No putative small terminase subunit could be identified within the DNA packaging and virion structure and assembly module. An additional tail fiber and a SleB-like cell wall hydrolase are encoded upstream of the large terminase subunit, separated by a cluster of tRNAs. Both phages are replete with ORFs encoding proteins of unknown function, which account for approximately 70% of all genes.
No related phages within the wider sequenced population were identified by dc-megablast searches. However, CoreGenes comparisons of phages identified by tBLASTx demonstrate that these two phages share at most 20% of their proteins with members of the Pakpunavirus and the Pseudomonas syringae phages KIL1, KIL2, KIL3, KIL4 and KIL5 (File S3). The shared gene products are predominantly involved in virion structure and assembly but also include proteins involved in nucleic acid replication and metabolism. Based on the nucleotide identity and shared gene content of these two phages, we propose the creation of a new genus within the family Myoviridae, named the “R2096virus” after the first isolated member.
3.2.5. Cluster E: Acinetobacter Phages of the Subfamily Autographivirinae; Genus Fri1virus
The Cluster E Acinetobacter phages represent the recently established genus Fri1virus within the subfamily Autographivirinae [93] and comprise nine podoviruses: AB3, Abp1, IME-200, phiAB1, phiAB6, PD-AB9, PD-6A3, WCHABP5 and the type species Fri1 (Table 1). Like other phages of the Autographivirinae, the Fri1-like viruses all encode their own single subunit RNA polymerase (RNAP) and share a common overall genomic organization with genes encoded solely on the forward strand [84]. Alongside Fri1virus, six other genera (Phikmvvirus, Sp6virus, Kp32virus, Kp34virus, Pradovirus and T7virus) have been defined within the Autographivirinae.
The Fri1-like phages possess an average genome size and G+C content of 41.7 kbp and 40.15%, respectively. We note that, compared with its close relatives, the AB3 genome is missing approximately 10 kbp of sequence corresponding to the left end of the genome [96]. Two truncated ORFs that are not annotated in the associated GenBank entry, present at the right and left ends of the sequence, are predicted to encode a putative DNA maturase B and DNA helicase, respectively [96]. Although these phages share between 34.7% and 40.8% of their proteins with members of the genera Phikmvvirus and Kp34virus and exhibit a similar genome organization, there are sufficient differences to separate them from previously established genera within the Autographivirinae. These phages encode a single subunit RNAP situated adjacent to the structural genes. They also encode a class I holin with three predicted transmembrane domains and an endolysin with a glycoside hydrolase family 19 domain (InterPro: IPR000726), situated between the tail fiber and the small terminase subunit. Each phage is predicted to encode two rho-independent transcriptional terminators, located downstream of the RNAP and major capsid protein genes. In addition, the recognition and specificity loops of the RNA polymerase are conserved within these viruses and differ substantially from those reported for the Phikmvvirus and Kp34virus [17]. Phylogenetic analysis of the RNA polymerase demonstrates that these phages fall into a single monophyletic clade (Figure 6).
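Rho-independent terminators are generally recognised as a GC-rich stem-loop followed by a run of thymidines. Dedicated predictors (for example TransTermHP) score such structures thermodynamically, and the predictor used in this study is not stated here. The toy scanner below only looks for a perfect inverted repeat with a short loop followed by a T-rich tail, to show the shape of the signal; the stem length, loop range and tail thresholds are arbitrary assumptions.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_terminator_candidates(
    seq: str,
    stem: int = 8,
    loop_range: tuple[int, int] = (3, 9),
    tail_len: int = 8,
    min_t: int = 5,
) -> list[tuple[int, int]]:
    """Toy scan for stem-loop + T-tail motifs (all thresholds are assumptions).

    Returns (start of the 5' stem, end index just past the 3' stem) pairs.
    Only perfect Watson-Crick stems are considered; no folding energy is computed.
    """
    seq = seq.upper()
    hits: list[tuple[int, int]] = []
    for i in range(len(seq)):
        left = seq[i:i + stem]
        if len(left) < stem:
            break
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:
                break
            # The 3' arm must be the reverse complement of the 5' arm.
            if right == revcomp(left):
                tail = seq[j + stem:j + stem + tail_len]
                if len(tail) == tail_len and tail.count("T") >= min_t:
                    hits.append((i, j + stem))
    return hits
```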
3.2.6. Cluster F: The 531-Like Acinetobacter Siphoviruses
Cluster F comprises two phages, Bphi-B1251 and YMC11/11/R3177 (R3177), which share 61% ANI and 67% of their proteins. Bphi-B1251 encapsulates a 45.4 kbp genome with a G+C content of 39.05% encoding 66 ORFs, while R3177 encapsulates a larger genome of 47.6 kbp encoding 80 ORFs (Table 1). Bphi-B1251 is described as a podovirus, both in the GenBank accession record and in the associated genome announcement [84]. However, homologs of the structural genes of these phages suggest that both are in fact siphoviruses, as indicated by the presence of a tape-measure gene of 4.9 and 4.3 kbp in Bphi-B1251 and R3177, respectively. Supporting this assignment, R3177 has recently been confirmed as a temperate member of the Siphoviridae and exhibits a 531-like morphology [83]. The 531-like phages exhibit a slightly elongated head of 73 × 59 nm and a 252 nm long tail characterised by multiple transverse disks that give the tail a segmented appearance [29]. Analysis of R3177 and Bphi-B1251 ORFs using HHsuite allowed the prediction of additional structural, replication and maintenance proteins. Dot plot alignment of Bphi-B1251 and R3177 shows that a number of genomic modules exhibit localised differences (Figure 7). In the virion structural and morphogenesis gene module, these non-homologous regions correspond to different head-tail joining proteins and a HicAB-like type II toxin-antitoxin cassette in R3177. Additionally, R3177 encodes an integrase, an excisionase and transcriptional regulatory proteins that are absent in Bphi-B1251, indicating that, despite substantial nucleotide and proteomic similarity, these two related phages follow different lifestyles. While few protein homologs in other phages were identified by BLASTp, R3177 exhibits strong sequence similarity (>70% coverage and >95% identity by BLASTn) to putative prophage regions in the sequenced genomes of A. baumannii strains SSA12, KAB04, KAB02, YU-R612 and NCGM237. It would therefore be of interest to determine whether Bphi-B1251 and R3177 are related to the unsequenced phages 531 and B9PP, two prophages of similar morphology induced from Acinetobacter isolates HER1032 and HER1096, respectively [29]. The close similarity between these two phages and their lack of relatives among the sequenced phages are sufficient to propose the establishment of a new genus, which we tentatively name the “B1251virus.”
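The prophage criterion above (>70% coverage and >95% identity by BLASTn) combines two quantities that are easy to miscompute when a hit is split into several HSPs: coverage should count overlapping query intervals only once. The sketch below merges HSP intervals on the query before computing coverage and uses a length-weighted mean for identity. The tuple layout, function names and toy HSPs are hypothetical, and the weighted identity is a simplification (overlapping HSPs are counted twice in it).

```python
from __future__ import annotations


def query_coverage(hsps: list[tuple[int, int]], query_length: int) -> float:
    """Percent of the query covered by the union of HSP intervals (1-based, inclusive)."""
    if not hsps:
        return 0.0
    spans = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    start, end = spans[0]
    for s, e in spans[1:]:
        if s <= end + 1:
            # Overlapping or adjacent interval: extend the current span.
            end = max(end, e)
        else:
            covered += end - start + 1
            start, end = s, e
    covered += end - start + 1
    return 100.0 * covered / query_length


def passes_prophage_filter(
    hsps: list[tuple[int, int, float, int]],
    query_length: int,
    min_coverage: float = 70.0,
    min_identity: float = 95.0,
) -> bool:
    """`hsps` holds (qstart, qend, pident, aln_length) tuples, e.g. from BLASTN tabular output."""
    if not hsps:
        return False
    coverage = query_coverage([(q1, q2) for q1, q2, _, _ in hsps], query_length)
    total = sum(length for _, _, _, length in hsps)
    identity = sum(pid * length for _, _, pid, length in hsps) / total
    return coverage > min_coverage and identity > min_identity
```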
3.2.7. Cluster G: The Loki-Like Acinetobacter Siphoviruses
The two Cluster G phages, vB_AbaS_Loki and IME_AB3, are closely related. Morphologically, Loki is a B1 siphovirus that resembles Burkholderia phage vB_BceS_KL1 [99], with an isometric capsid measuring 57 nm across opposite apices and a non-contractile tail that measures 176 nm in length and 10 nm in diameter and exhibits transverse striations [85]. Loki and IME_AB3 encapsulate genomes of 41 and 43 kbp with G+C contents of 44.4 and 45.6 mol%, respectively (Table 1). ANI between the two phages is 58.9%, and their 109 encoded ORFs form 45 protein groups and 19 orphans. The genome is organised into three modules: one encoding the virion structure and morphogenesis, one encoding DNA replication and metabolism, and a final module encoding the holin and endolysin that is replete with genes of unknown function (Figure 8). Loki and IME_AB3 are distinguished both by the constituent genes of this last module and by the structure of the endolysin. The Loki endolysin is modular, comprising a lysozyme-like domain (InterPro: IPR023346) and peptidoglycan-binding domains (InterPro: IPR018537), whereas the IME_AB3 endolysin is globular and contains a single lysozyme domain (InterPro: IPR000726). Loki and IME_AB3 each share between 28 and 31 (49.1% to 58.8%) protein homologs with the Pseudomonas and Burkholderia phages of the Septima3virus. These shared proteins represent the majority of genes encoded in the structural and replication gene modules, and these phages differ predominantly in the module encoding the endolysin and holin. Owing to the similarities in nucleotide sequence, gene content and genome organisation, we propose the establishment of a new genus, the “Lokivirus”, named after the first fully characterised member.