In Silico Prophage Analysis of Halobacterium salinarum ATCC 33170



Introduction
Halobacterium salinarum is an aerobic archaeon found in hypersaline environments, such as salted fish, salt flats, and salterns. These archaea play a crucial role in the ecology and biogeochemical cycles of hypersaline environments and have therefore garnered significant attention for the unique adaptations that allow them to thrive in high-salt conditions. These adaptations include the production of compatible solutes to maintain osmotic balance and modified membrane proteins [1]. Furthermore, H. salinarum can perform phototrophy using bacteriorhodopsin, a reddish light-driven proton pump. Because of the extreme conditions in which this archaeon lives, it possesses robust DNA repair mechanisms to counteract the damaging effects of high salt and UV radiation, ensuring genome stability in harsh environments [2]. We are also particularly interested in H. salinarum as a source of archaeol, one of the main components of our sulfated lactosyl archaeol (SLA) archaeosome adjuvant. SLA archaeosomes are liposomal vesicles composed of a sulfated disaccharide group covalently linked to the free sn-1 hydroxyl backbone of an archaeal core lipid derived from H. salinarum [3]. This adjuvant has shown great promise in pre-clinical studies, and a better understanding of potential prophage elements within H. salinarum is necessary to support the continued progression of SLA archaeosomes toward clinical applications [4][5][6]. The prophage status of this strain is important, as culturing this organism in a good manufacturing practice (GMP) facility requires the absence of active prophages to eliminate phage contamination risks to the facility and resulting products.
Extremophiles such as H. salinarum thrive in harsh environments and maintain intricate interactions with viruses that target archaea. Haloarchaeophages are viruses that specifically infect halophilic archaea [7]. These viruses exhibit a wide range of morphologies, including tailed and non-tailed phages, with double-stranded DNA genomes being the most common [7]. They possess specialized mechanisms to attach to and inject their genetic material into host cells, leading to the replication and assembly of new virus particles. Haloarchaeophages have demonstrated high host specificity, selectively infecting particular species or strains of halophilic archaea.
The study of haloarchaeophages has provided valuable insights into the biology of halophiles and their viral interactions [7,8]. These viruses influence the population dynamics and diversity of halophiles, serving as agents of selection and contributing to the evolution of their hosts. Moreover, the infection and lysis of halophilic archaea by haloarchaeophages release organic matter and nutrients back into the environment, influencing the biogeochemical cycles of hypersaline habitats.
Prophages are viral genomes integrated into the host genome, remaining dormant until triggered to enter the lytic cycle. Studies have identified prophages in halophilic archaea, such as Haloferax volcanii, Haloquadratum walsbyi, and Halobacterium halobium, among others [9][10][11]. Prophage elements in halophile genomes indicate that viruses can integrate their genetic material into the host genome and coexist with their hosts for extended periods. Under certain conditions, such as stress or changes in the host cell environment, the prophage may be induced to enter the lytic cycle, producing new virus particles and lysing the host cell. The study of prophages in halophiles provides insights into the genetic diversity and evolution of halophilic archaea. It also contributes to understanding viral-host interactions in extreme environments and the mechanisms underlying viral latency and reactivation.
Haloarchaeophages also hold promise in various biotechnological applications. Their stability in high salt concentrations and extreme conditions makes them valuable tools for genetic engineering, DNA delivery systems, and bioremediation. The unique properties of haloarchaeophages offer opportunities to develop innovative molecular biology and biotechnology approaches.
This study aims to perform a comprehensive genetic analysis of Halobacterium salinarum strain ATCC 33170 to identify any prophage sequences encoded within its genome. Bioinformatic analyses were used to identify putative prophage sequences within the ATCC 33170 genome. Functional annotation and prediction of prophage-associated genes provide insights into potential roles in host-virus interactions, such as lysogeny, host manipulation, or environmental adaptation. The findings of this study contribute to our understanding of the genetic landscape of Halobacterium salinarum strains and their viral interactions. Unraveling the prophage sequences and their potential functions will provide valuable insights into the coevolutionary dynamics between halophilic archaea and viruses in highly saline environments and provide data to support the further development of components, such as archaeol, derived from halophiles.

[...] then analyzed again without database restrictions to identify regions with high viral hits spanning more than 5 kb.
The contig file stemming from the Flye assembly was also submitted to PHASTER (www.phaster.ca, accessed on 13 January 2022) with the checkbox for submitting a file consisting of multiple separate contigs selected [13]. The results were downloaded and indicated the presence of an incomplete prophage region on contig 14. An in-depth analysis of this region was conducted in Geneious Prime. This analysis revealed that the incomplete prophage region contains six genes, which were translated to obtain protein sequences for further analysis with BLASTP, PHYRE2, and InterProScan [14][15][16].
The six proteins were submitted to BLASTP against the non-redundant protein database and a restricted viral database, and the top hit was documented. The proteins were also submitted to PHYRE2 to identify the closest protein templates based on the predicted structure [14]. Results were downloaded and analyzed for protein functionality and origin.

Scaffold Construction Flye Contig Assembly
Flye assembled the collective 65 ATCC and NCBI contigs into 21 contigs with a total length of 2.4 Mb (Table 1). The longest contig was 1.03 Mb, and the assembly had an N50 length of 718,706 base pairs (bp). PHASTER analysis identified a single incomplete prophage region, 7 kb in length and spanning bases 3957-11,025 on contig 14, encoding six proteins (Table 2). The prophage completeness score given was 40, with the scoring scale described as follows: incomplete prophage, score < 70; questionable prophage, score 70-90; and intact prophage, score > 90. The resulting PHASTER BLAST hits of this region are listed in Table 2 and include two hypothetical proteins of unknown function (gene product (gp) 1 and gp3), a putative pore-forming tail tip protein (gp2), two putative transposases (gp4 and gp5), and insertion sequence element Dka2 orfB (gp6). Examination of the PHASTER hits for taxonomic association reveals that the six proteins are spread across several phage families (Table 3). The taxonomic hits are Myoviridae (2), Ackermannviridae, Siphoviridae, Bicaudaviridae, and an unclassified archaeal virus. The genome lengths of these phages ranged from 17,666 bp to 219,372 bp, significantly larger than the 7 kb region identified as an incomplete prophage. Further investigation into gp2, which was initially identified as matching the putative pore-forming tail tip protein of Vibrio phage vB_VchM_Kuja (Kuja), was completed using BLASTP against virally-restricted and unrestricted databases. The resulting protein hits were to UDP-glucose 6-dehydrogenase (Tables 4 and 5). As such, it appears that the Kuja protein is incorrectly labeled in the PHASTER database, as it is labeled as UDP-glucose 6-dehydrogenase in NCBI and encoded in the metabolism region of the genome.
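The two summary statistics quoted above, the assembly N50 and the PHASTER completeness category, can be illustrated with a minimal Python sketch. The function names and the toy contig lengths are hypothetical and are not taken from the study; the score thresholds are those quoted in the text, and the handling of the boundary values 70 and 90 as "questionable" is an assumption based on that wording.

```python
def n50(contig_lengths):
    """Return the N50: the contig length at which the running total of
    lengths (longest first) reaches at least half of the assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def classify_phaster_score(score):
    """Map a completeness score onto the three categories quoted in the
    text: incomplete (< 70), questionable (70-90), intact (> 90)."""
    if score < 70:
        return "incomplete"
    if score <= 90:
        return "questionable"
    return "intact"


# Toy contig lengths (hypothetical; not the ATCC 33170 assembly).
toy_lengths = [10, 8, 6, 4, 2]
print(n50(toy_lengths))            # running total reaches 18 of 30 at the 8-unit contig
print(classify_phaster_score(40))  # score reported for the contig 14 region
```

With the toy lengths the N50 is 8, and the score of 40 reported for the contig 14 region falls in the incomplete category.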

BLAST
Analysis of all contigs with BLASTX and discontiguous megablast on an unrestricted database produced hits exclusive to halophiles without phage or prophage hits present. With the database restricted to viruses, hits are randomly spaced throughout the genome without long stretches (>4 kb) of sequential phage hits. Focus on the incomplete prophage region of contig 14 using BLASTP with the unrestricted database shows 100% identity and 94-100% coverage against Halobacterium spp. (Supplementary Table S1). When the database is restricted to viruses, the gp hits within the incomplete prophage region are to myoviral and siphoviral phages (Table 4). Two notable hits are for gp4 and gp5, with 100% coverage and identity to proteins of the Halobacterium phage phiH. These two phiH proteins are predicted transposases.
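The criterion used here, the absence of long stretches (>4 kb) of sequential phage hits, can be sketched as an interval-merging check. This is an illustrative sketch rather than the procedure used in the study: the function names, the example coordinates, and the `max_gap` tolerance for treating neighbouring hits as sequential are all hypothetical.

```python
def merge_hit_regions(hits, max_gap=500):
    """Merge (start, end) hit intervals on one contig when the gap between
    consecutive hits is at most max_gap bases (hypothetical tolerance)."""
    merged = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the current region to cover this hit.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def long_viral_stretches(hits, min_len=4000, max_gap=500):
    """Return merged regions longer than min_len bases (inclusive coordinates)."""
    return [(s, e) for s, e in merge_hit_regions(hits, max_gap)
            if e - s + 1 > min_len]


# Hypothetical hit coordinates for illustration only.
clustered = [(100, 900), (1000, 2500), (2600, 4800), (20000, 20600)]
scattered = [(100, 900), (5000, 5600), (12000, 12800)]
print(long_viral_stretches(clustered))
print(long_viral_stretches(scattered))
```

In this toy example the clustered hits merge into one region of 4701 bases that exceeds the threshold, whereas the scattered hits, which resemble the pattern described above, yield no candidate region.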

PHYRE2
The six protein sequences from the predicted incomplete prophage region of contig 14 were submitted to PHYRE2 to model the proteins and compare them to known structural templates. The top Protein Data Bank (PDB) hits for gp1 through gp3 are to bacterial and archaeal proteins with 100% confidence (Table 5). The top PDB hit for gp1 is a UTP-glucose-1-phosphate uridylyltransferase from the bacterium Corynebacterium glutamicum with 95% coverage. The enzyme UDP-glucose 6-dehydrogenase from the hyperthermophilic archaeon Pyrobaculum islandicum is the top PDB hit for gp2 with 97% coverage. Gp3 has 89% coverage to the cell division control protein 6 homolog 1 (Cdc6/Orc1) from the thermophilic archaeon Sulfolobus solfataricus.
The top PDB hits for gp4 and gp6 were to an uncultured archaeon Cas14a.1 protein with 82 and 85% coverage, respectively, and 100% confidence. These gene products also had a single hit to bacteriophage T7 DNA primase/helicase. The gp4 T7 hit ranked 93 of 120, with 47.9% confidence and 16% coverage. The gp6 T7 hit ranked 80 of 120, with 82% confidence and 20% coverage. Finally, the gp5 top hit was a transposase from the IS200-like superfamily of insertion sequences with 100% confidence and 95% coverage. Hit 30 of 69 in the gp5 results was to a phage replication organizer domain, with 8.9% confidence and 25% coverage.

InterProScan
Significant database hits were obtained with InterProScan for the six proteins of interest (Table 6). The database hits confirm the results obtained with PHYRE2 and BLASTP.

PADLOC
The contigs were analyzed with PADLOC to identify any putative antiviral defense systems encoded. There were six hits across five contigs (Table 7). Contig 1 was found to encode PDC-S70, which is a putative Phage Defense Candidate with an unknown function [26]. Contig 8 encodes the HEC-05 system, a helicase, methylase, ATPase (Hma)-embedded candidate [26]. Two SoFic proteins, which ligate AMP onto target proteins, are encoded on contigs 16 and 17 [27,28]. Finally, contig 19 encodes two putative DNA-modification system proteins (DMS_other) labeled Specificity_I and REase_I (Table 7). The PADLOC database warns that DMS_other hits, which comprise various DNA-modification system proteins, such as restriction-modification, Bacteriophage Exclusion (BREX) [29], and Defense Island System Associated with Restriction-Modification (DISARM) [30], are generated with a permissive model that may not return genuine phage defense systems and should therefore be treated with caution [31].

Discussion
Halophilic euryarchaeotes, a group of extremophilic archaea adapted to high-salt environments, have attracted considerable scientific interest due to their ecological significance and unique adaptations. They play a crucial role in the biogeochemical cycling of hypersaline environments and offer valuable insights into the limits of life on Earth [32][33][34]. Hypersaline waters and salt crystals contain high numbers of haloarchaeal cells and their viruses, representing a worldwide distributed reservoir of orphan genes and possibly novel virion morphotypes [7,8]. The study of haloarchaeal-associated viruses, known as halophages, provides a deeper understanding of viral-host interactions and unveils potential biotechnological applications due to their unique features [8].
One seemingly pervasive type of pleomorphic halophage is the Haloarcula virus His2, which shares protein similarity with several prophages in the halophiles [40][41][42]. For example, Haloquadratum walsbyi was found to encode an incomplete prophage related to His2 and two pleomorphic haloviruses, HRPV-1 and HHPV1, which also have a similar block of homologs related to His2 [10,35,42]. Furthermore, the Halorubrum pleomorphic virus-1 (HRPV-1), isolated from a solar saltern, was found to encode three structural proteins, VP3, VP4, and VP8 [36,43]. The HRPV-1 encoded proteins show significant similarity to the proteins of the minimal replicon of plasmid pHK2 of Haloferax sp. and the His2 phage [36,42,44]. As several halophage sequences are available from a broad array of halophiles, analysis of the H. salinarum genome for the presence of prophage sequences is possible.
PHASTER analysis of the 21 Flye contigs identified a 7 kb portion of contig 14 encoding putative prophage elements. The assigned prophage completeness score was low (i.e., 40), suggesting an incomplete prophage region. Further analysis of all contigs using BLASTX and discontiguous megablast did not reveal other areas encoding potential prophage elements in the genome. While some of the genes appear to be viral in origin, BLASTP analysis with a restricted database shows mixed viral-family hits, which is highly unusual for an intact phage.
PHYRE2 analysis of the six proteins of interest revealed high homology to various proteins from bacteria, archaea, and a transposon (IS200) with 100% confidence, as discussed in the Results section. The gp1 protein was modeled to a glucose-1-phosphate uridylyltransferase, or UGPase, an enzyme that catalyzes UDP-glucose production from glucose-1-phosphate and UTP [45]. This enzyme is widespread due to its role in glycogen synthesis and in forming glycolipids, glycoproteins, and proteoglycans [46][47][48]. Although glycoproteins are featured in some halophage capsids, this enzyme was not modeled to any phage proteins [43]. The gp2 PHYRE2 results are to UDP-glucose dehydrogenase, which catalyzes a two-step NAD-dependent oxidation of UDP-glucose (UDP-Glc) to produce UDP-glucuronic acid (UDP-GlcA) [49]. Studies of this enzyme have demonstrated its importance in polysaccharide biosynthesis and detoxification [49]. This enzyme was also not modeled to a phage protein with PHYRE2.
Modeling of the gp3 protein hit the crystal structure of a heterodimer of Cdc6/Orc1 initiators bound to the origin DNA from Sulfolobus solfataricus [50]. Cellular initiators form higher-order assemblies on replication origins, using ATP to remodel duplex DNA and facilitate the loading of replisome components [51]. This protein appears to be a core component of the basal initiation machinery used to recognize the origin of replication in S. solfataricus, suggesting it is not a phage-derived protein.
Further investigation of gp4 and gp6 showed that their structures are similar to the cryo-EM structure of the Cas12f1-sgRNA-target DNA complex [52]; Cas12f1 was formerly designated Cas14a. The type V-F Cas12f proteins are compact and associate with a guide RNA to cleave single- and double-stranded DNA targets [53]. A cryo-electron microscopy structure revealed that two Cas12f1 molecules assemble with the single guide RNA to recognize the double-stranded DNA target [52,53]. The two Cas12f1 protomers play distinct roles in nucleic acid recognition and DNA cleavage, explaining how the miniature enzyme achieves RNA-guided DNA cleavage [54,55]. Each protein has a single hit to the T7 bacteriophage primase/helicase, though the modeled region is small and the hits are ranked 93 and 80 for gp4 and gp6, respectively. The gp5 protein is modeled to the crystal structure of the IS200 transposase of Sulfolobus solfataricus [56]. IS200 transposases, present in many bacteria and archaea, are distinct from other groups of transposases. Two monomers form a tight dimer, with the catalytic site at the interface between the two monomers [56]. A phage hit corresponding to a DNA replication organizer membrane protein of phage Phi29 was identified [57]. The hit was ranked 30, having 9% confidence and 25% identity over 12 amino acids; thus, it is unlikely that gp5 is a phage-derived protein.
Examination of the contigs for phage-defense systems with PADLOC revealed six putative genes on five contigs. To defend against viruses, archaea are thought to primarily use the Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated genes (CRISPR-Cas) system [58,59], Toxin-Antitoxin (TA) systems, Restriction Modification (RM) systems, and alteration of cell surface proteins [60,61]. No CRISPR-Cas or TA systems were identified in the analysis, only RM, AMP-ligating, and unidentified systems. Overall, the phage-defense system carriage rate of H. salinarum is low compared to bacteria, which can carry upwards of 15 defense systems [62].
As this strain will be used to produce adjuvants, the mutational behavior of closely related strains should be assessed. To investigate the evolutionary dynamics of H. salinarum, a mutation-accumulation experiment was conducted with genome sequencing, and the results were compared to those for the moderately halophilic archaeon Haloferax volcanii [63]. The mutation accumulation in H. salinarum over 1250 generations revealed a base-substitution rate of 3.99 × 10⁻¹⁰ per site per generation, comparable to that of H. volcanii [63]. However, dissimilarities in genome-wide insertion-deletion rates and mutation spectra suggest unique evolutionary pathways. Notably, H. salinarum is characterized by a high rate of spontaneous mutations attributed to mobile genetic elements (MGEs), including ISH elements and transposons [64,65]. Studied since the 1980s, these elements are associated with the insertional inactivation of genes, as well as genome inversions and rearrangements. For example, differences between laboratory strains NRC-1 and R1 primarily stem from the dynamic mobilome, exemplifying the impact of MGEs on the evolutionary landscape of H. salinarum [65].
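To give a sense of scale for the quoted base-substitution rate, the following back-of-the-envelope sketch multiplies the rate by a genome size and a number of generations. The genome size is an assumption: the 2.4 Mb total length of the Flye assembly reported in the Results is used as a stand-in, although the mutation-accumulation study [63] may have used a different strain and genome size. The function name is hypothetical.

```python
# Rate and generation count as quoted in the text from [63].
SUBSTITUTION_RATE = 3.99e-10  # base substitutions per site per generation
GENERATIONS = 1250            # generations in the mutation-accumulation experiment

# ASSUMPTION: the 2.4 Mb Flye assembly length is used as the genome size.
GENOME_SIZE_BP = 2.4e6


def expected_substitutions(rate, genome_size_bp, generations):
    """Expected number of base substitutions per lineage, assuming a
    constant per-site rate across the genome and across generations."""
    return rate * genome_size_bp * generations


per_generation = expected_substitutions(SUBSTITUTION_RATE, GENOME_SIZE_BP, 1)
over_experiment = expected_substitutions(SUBSTITUTION_RATE, GENOME_SIZE_BP, GENERATIONS)
print(f"per genome per generation: {per_generation:.2e}")
print(f"per lineage over {GENERATIONS} generations: {over_experiment:.2f}")
```

Under these assumptions the expectation is roughly one base substitution per lineage over 1250 generations, which puts the emphasis in the text on MGE-driven changes rather than point mutations into context.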

Conclusions
In conclusion, no genes encoding major phage structural proteins, such as capsid or coat proteins, were identified in the genome of H. salinarum ATCC 33170. An incomplete prophage region was identified using PHASTER and further investigated using BLASTP, InterProScan, and PHYRE2. Together, these results suggest that this archaeal strain lacks the capacity to produce infectious phage particles. Additionally, proteins from the incomplete prophage region did not hit any viral proteins with significant confidence or coverage. The whole genome analysis of Halobacterium salinarum ATCC 33170 with various bioinformatics software programs suggests that no intact prophage regions capable of producing functional virions are encoded. These findings support the safe production of H. salinarum ATCC 33170 in a GMP facility due to its low risk of active prophages, facilitating its use for commercial and medical applications. Further, studying halophiles and their interactions with haloarchaeophages provides a fascinating insight into the adaptations and dynamics of life in highly saline environments. Further research in this field will enhance our understanding of these unique organisms and their viruses while uncovering their potential applications in diverse areas of science and technology.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/applmicrobiol4020042/s1, Table S1: BLASTP results from an unrestricted non-redundant database search against putative incomplete prophage region-encoding proteins of contig 14.

Table 2. PHASTER results from incomplete prophage region of contig 14.

Table 3. Taxonomic information on PHASTER hits for each protein.

Table 4. BLASTP results from a virus-restricted, non-redundant database search.

Table 5. Top PHYRE2 hit per gene product from the incomplete prophage region of contig 14.

Table 6. InterProScan hit results per gene product from the incomplete prophage region of contig 14.

Table 7. Putative antiviral defense systems encoded by H. salinarum identified using PADLOC.
* Strand refers to the open reading frame being encoded on the forward strand (+) or reverse strand (-).