Phage Genome Diversity in a Biogas-Producing Microbiome Analyzed by Illumina and Nanopore GridION Sequencing

The microbial biogas network is complex and intertwined, and therefore relatively stable in its overall functionality. However, if key functional groups of microorganisms are affected by biotic or abiotic factors, the entire efficacy may be impaired. Bacteriophages are hypothesized to alter the steering process of the microbial network. In this study, an enriched fraction of virus-like particles was extracted from a mesophilic biogas reactor and sequenced on the Illumina MiSeq and Nanopore GridION sequencing platforms. Metagenome data analysis resulted in identifying 375 metagenome-assembled viral genomes (MAVGs). Two-thirds of the classified sequences were only assigned to the superkingdom Viruses and the remaining third to the family Siphoviridae, followed by Myoviridae, Podoviridae, Tectiviridae, and Inoviridae. The metavirome showed a close relationship to the phage genomes that infect members of the classes Clostridia and Bacilli. Using publicly available biogas metagenomic data, a fragment recruitment approach showed the widespread distribution of the MAVGs studied in other biogas microbiomes. In particular, phage sequences from mesophilic microbiomes were highly similar to the phage sequences of this study. Accordingly, the virus particle enrichment approach and metavirome sequencing provided additional genome sequence information for novel virome members, thus expanding the current knowledge of viral genetic diversity in biogas reactors.


Introduction
Bacteriophages or phages, viruses that infect Bacteria and Archaea, are increasingly being recognized as the most abundant biological entities on earth, with an estimated amount of 10 31 particles on our planet [1,2]. Moreover, they play a critical role in shaping microbial diversity and composition. For instance, it is estimated that approximately 20% of them included steps for the concentration of viral particles before DNA extraction and shotgun sequencing to generate a virus-specific (virome) dataset [29,35].
Calusinska and colleagues presented a detailed characterization of dsDNA-and RNAviruses obtained from biogas plants and observed their huge genetic diversity in nine different fermentation samples [29]. The authors primarily identified head-tailed phages of the order Caudovirales, constituting 77% of all virus sequences identified and analyzed in their study. The most abundant phages in the studied sample were taxonomically assigned to the families Myoviridae, Siphoviridae, and Podoviridae, accounting for 13.2%, 51.5%, and 6.9% of all classified dsDNA viruses, respectively. However, thousands of viral sequences were still left that did not share a detectable homology with reference phage genomes deposited in nucleotide sequence databases. Consequently, limited data on biogas viromes impede the understanding of the importance of phages on biogas microbiomes and the entire biogas process. Applying a microarray-based approach, Zhang and colleagues found Enterobacteria phages (family Myoviridae) that were most abundant in Chinese fullscale wastewater treatment plants [36]. This observation corresponds to previous findings described by Calusinska et al. [29]. Analyses of metaproteome data originating from eleven biogas reactors revealed that about 1.6% of all identified mass spectra referred to predicted phage proteins [17]. These results clearly showed that phages are not only an integral part of biogas microbiomes, but that they are also very active.
The overall aim of this study was to extend the reference repository with new viral genome information in the context of AD microbiomes. Furthermore, the present work also provides an adapted protocol for the enrichment of bacteriophages from biogas microbial communities as preparation for phage-metavirome studies. Based on virome sequence data, phages were taxonomically classified and functionally analyzed in the context of the AD process. Moreover, the genetic relatedness of phage genomes originating from a biogas microbiome was compared with metagenome sequence information originating from different biogas plants deposited in the NCBI database to deduce their genetic repertoire, specifically emphasizing functions of importance regarding the biogas reactor environment.

Reactor Set-Up and Sampling
A continuous stirred tank reactor (CSTR) with a working volume between 16 and 17 L (24 L total volume) was investigated during this study. The biogas reactor was operated under mesophilic conditions, with the temperature maintained at 38 • C by a heated water jacket. Seed sludge obtained from a local biogas plant (Kaim Agrar-Energie, Nauen, Germany) was initially used for the inoculation of the laboratory-scale reactor and was described in a previous study on the semi-continuous anaerobic digestion of whole-crop cereal silage (wheat: rye [1:1], DAH Energie, Oberkrämer, Germany) at an organic loading rate of 3.5 g organic dry mass L −1 d −1 over a period of 12 months (unpublished data). After a subsequent rest period of about 6 months including stirring under mesophilic conditions, the whole reactor content was removed from the CSTR and particles >5 mm were separated by a food mill. The CSTR was then re-inoculated with this resulting slurry and supplemented with 24 g of microcrystalline cellulose (MCC, Roth, Karlsruhe, Germany) serving as a standardized substrate. After 7.25 d of operation, additional supplementation with 82.5 g sodium carboxymethyl cellulose (CMC, Roth) was carried out. Biogas production was continuously monitored using a volumetric gas counter (promethano GVSM-Siphon Type II, BlueMethano, Berlin, Germany) (Supplementary Material Figure S1) and resulted, after substrate additions, in biogas yields of 735 mL g ODM −1 (MCC) and 92 mL g ODM −1 (CMC). The methane concentration was measured in the gas sample collected in gas-tight bags (Tesseraux, Bürstadt, Germany) using a gas analyzer (Multitec 540, Sewerin, Gütersloh, Germany). Additional monitoring comprised continuous measurement of temperature and pH (Pt1000I-S & EGA161 L/PG-S, Meinsberg, Waldheim, Germany), impeller torque (RZR 2102 control, Heidolph, Schwabach, Germany), and the weight of the reactor content (PW12C3, HBM, Darmstadt, Germany). Sampling for phage enrichment was conducted on day 16.9 after MCC addition.

Electron Microscopy
To investigate the morphology of the viral particles, transmission electron microscopy (TEM) was performed. The samples were treated according to the protocol of Ackemann with small modifications [37]. Therefore, 2 mL of the enriched (by centrifugation steps referring to protocol 2.4 and filtered with 0.8 µm CLearLine ® polypropylene filters) biogas reactor sample mixed with 2 mL of SMG Buffer (50 mM Tris-HCl p.H 7.5, 100 mM NaCl, 8.1 mM MgSO4, and 0.01% (w/v) gelatin) was centrifuged at 18,000× g for 60 min. The pellet was washed twice and resuspended with 1.5 mL of 0.1 M ammonium acetate [37]. The samples were stored at 4 • C until processing. For staining, 50 µL of the solution was pipetted on a 100-mesh copper grid coated with 0.4% (w/v) formvar in dichloroethane for absorption for 1 min, and negatively stained with one drop of 1% (w/v) potassium phosphotungstate. The incubation lasted for 10-20 s. The copper grids were analyzed with two microscopes, a Philips BioTwin CM120 (F.E.I. Company, Hillsboro, OR, USA) at approximately 60 kV, and images were taken using the DITABIS Imaging Plate system and the FEI Tecnai G 2 20 S-TWIN (F.E.I. Company, Hillsboro, OR, USA) at 200 kV. The TEM pictures were manually improved by using Adobe Illustrator for contrast increasing and sharpening.

Enrichment and Purification of Phage Particles from a Biogas Microbiome
A sample of the biogas reactor content was mixed with a 1:1 volume of SMG Buffer and 2 g of glass beads of the size Φ5 mm. According to the protocol of Calusinska et al., 2018, the sample was then shaken for 30 min at 250 rpm to detach the phage-like particles from the plant fibers [27]. Subsequently, the sample was centrifuged for 1 h at 48,000× g (Beckman L-70 Ultracentrifuge) to sediment large-sized material. The supernatant was sequentially filtered with 6 µm, 0.8 µm, 0.45 µm (CLearLine ® Polypropylene filters), and 0.22 µm pore size sterile filters (ROTILABO ® CME (mixed cellulose esters)). The flow-through was centrifuged at 43,000× g and 4 • C for 4 h (Beckman L-70 Ultracentrifuge, Krefeld, Germany). The pellet was resuspended with 1.5 mL of the BE buffer of the Macherey Nagel DNA Extraction kit NucleoSpin Microbial DNA Mini Kit Machery Nagel, Düren, Germany) to generate a pellet with a high number of phages.

Phage DNA Extraction
The DNA extraction was performed by applying the NucleoSpin Microbial DNA Mini Kit for DNA (Machery Nagel, Düren, Germany) with small modifications. The input of the sample was increased to 1.5 mL. Therefore, the first three steps of the protocol (prepare and lyse the sample and adjust the DNA binding conditions) were repeated five times to increase the efficiency. The DNA concentration was determined by fluorometric quantification using a QubitTM (Invitrogen, Life Technologies, Darmstadt, Germany).

MinION Library Preparation and Sequencing
The extracted DNA was subjected to an additional RNase treatment and subsequent clean up using a Genomic DNA Clean & Concentrator (gDCC) kit (Zymo Research Europe GmbH, Freiburg, Germany) according to the manufacturer's instructions. In total, 5 ng of phage DNA was required to prepare the Oxford Nanopore Technologies (ONT) rapid PCR genomic library using the Rapid PCR Barcoding Kit (SQK-RPB004) (Oxford Nanopore Technologies GmbH, Oxford Science Park, UK). Sequencing was performed on an Oxford Nanopore Technologies GridION Mk1 sequencer at the CeBiTec (Center for Biotechnology), Bielefeld University (Bielefeld, Germany), using an R9.4. flow cell running for 48 h.

Illumina Library Preparation and Sequencing
For short-read sequencing purposes, 100 ng of the total phage DNA was used. The Illumina sequencing library was constructed using the Nextera DNA Flex Library Prep/Illumina DNA Prep Kit (Illumina Inc., Eindhoven, The Netherlands) according to the manufacturer's instructions. Subsequently, sequencing of the library was performed on the Illumina MiSeq system applying the Illumina MiSeq Reagent Kit v3 following the 2 × 300 nt indexed high output run protocol.

Base Calling, Read Processing, and Assembly
The base calling and read processing of the nanopore data, as well as the virome assembly, were performed as recently described with small modifications [38]. In brief, MinKNOW (v4.2.10) was used to control the sequencing run protocol followed by base calling with bonito (v0.3.0). Virome assembly was performed using canu v2.1.1 [39]. The resulting contigs were then polished with the Illumina data using Pilon [40], which was run for ten iterative cycles. Bwa-MEM 0.7.12 [41] was used for Illumina read mapping for the first five iterations, and bowtie2 v2.3.2 [42] was used for the second set of five iterations. In the next step, the resulting contigs were filtered by different methods and manual curation of the functional annotation. The final metagenome-assembled viral genome (MAVG) data set only contained contigs with genes typical for phages.

Virome Filtering, Annotation, and Genome Analysis
For the initial analysis, the "What the Phage" (WtP) pipeline, an easy-to-use and parallelized multitool approach for phage identification combined with annotation and classification applying the program HostPhinder [43], was used [44]. Based on the results obtained from this pipeline, the data were manually inspected and filtered for phage contigs. The final virome was annotated by means of prokka v1.13.7 with the virus reference dataset [45]. For further manual annotation and genome analysis, phage genome sequences were imported into GenDB 2.0 [46]. The ORFs of these phage genomes of interest were individually blasted again with the BLASTp of the NCBI database. The visualization of phage genomes was performed using the Snapgene Viewer software, and the visualization of graphs was performed applying the R package ggplot2 [47]. Network analysis was performed applying the vConTact 2.0 network by using ProkaryoticViralRefSeq 94 [48,49]. The network visualization was generated using Cytoscape 3.8.2 [50]. The lifestyle of the phage community was predicted by applying the Phage Classification Tool Set (PHACTS), which is using a local database and a so-called Gini coefficient, that measures how important a protein is towards classifying a phage's lifestyle [51].

Fragment Recruitment
To determine the distribution and abundance of the phage genomes in different biogas communities, fragment recruitments were performed. To this end, the tool SparkHit [52] was used as described previously [53] and genome sequences of each MAVG were employed as a template. Corresponding computations were scaled-up and parallelized by using the de.NBI Cloud environment (https://www.denbi.de/cloud, accessed on 12 September 2021). SparkHit was applied on metagenome FASTQ files from 66 downloaded biogas metagenome datasets, clearly referenced as a biogas plant at the ENA (www.ebi.ac.uk/, accessed on 8 September 2021). Randomly chosen two million reads of each FASTQ file were compared with each single MAVG genome. The alignment identity threshold was set to >50% similarity. The result of the fragment recruitment was visualized using R.

Occurrence of Virus-Like Particles in the Studied Biogas Reactor by Means of Transmission Electron Microscopy
To analyze the presence of virus-like particles being part of the biogas-producing microbial community, a fermentation sample was directly taken from the mesophilic (38 • C) laboratory-scale biogas reactor. The sampled biogas reactor operated under wet fermentation conditions, characterized by high liquid and relatively low total solid percentages (approx. 5% dry solids). The activity of the microbiome was ensured by biogas production from cellulose derivatives prior to sampling for phage enrichment. The digestion of the commonly used reference substrate microcrystalline cellulose (MCC) resulted in biogas production of 735 mL g −1 organic dry mass observed at 7.25 days. This is consistent with the range of biogas formation proposed by the VDI guideline for MCC (Supplementary Material Figure S1A,B) [54]. Subsequent supplementation of the cellulose derivative carboxymethyl cellulose (CMC) yielded only 92 mL biogas g −1 organic dry mass after 9.65 days (Supplementary Material Figure S1A,B). In contrast with MCC, the synthetic compound CMC has been described to show virtually any activity in anaerobic digestion [55].
A digestate sample was taken and examined for the occurrence of phage-particles by means of transmission electron microscopy (TEM). The TEM analysis revealed phages with the morphology of Myoviridae (phages with long contractile tails) ( Figure 1A,E), icosahedral phage-like particles ( Figure 1B), Podoviridae (short non-contractile tails) ( Figure 1C), and Rudiviridae (rigid rod-like particles) ( Figure 1D). Unfortunately, the TEM analysis of phagelike particles from biogas digestate is often a challenging task because of sample impurities (e.g., undigested plant fibers or corn residues). Therefore, phenotypic phage particle assignment is often only possible to a limited extent. This is in line with the previous study characterizing viral morphologies in AD reactors by means of TEM including wastewater treatment plants, farm digesters, and biogas units treating a mixture of municipal solid waste and agro-food residues [29].

Occurrence of Virus-Like Particles in the Studied Biogas Reactor by Means of Transmission Electron Microscopy
To analyze the presence of virus-like particles being part of the biogas-producing microbial community, a fermentation sample was directly taken from the mesophilic (38 °C) laboratory-scale biogas reactor. The sampled biogas reactor operated under wet fermentation conditions, characterized by high liquid and relatively low total solid percentages (approx. 5% dry solids). The activity of the microbiome was ensured by biogas production from cellulose derivatives prior to sampling for phage enrichment. The digestion of the commonly used reference substrate microcrystalline cellulose (MCC) resulted in biogas production of 735 mL g −1 organic dry mass observed at 7.25 days. This is consistent with the range of biogas formation proposed by the VDI guideline for MCC (Supplementary Material Figure S1A, B) [54]. Subsequent supplementation of the cellulose derivative carboxymethyl cellulose (CMC) yielded only 92 mL biogas g −1 organic dry mass after 9.65 days (Supplementary Material Figure S1A,B). In contrast with MCC, the synthetic compound CMC has been described to show virtually any activity in anaerobic digestion [55].
A digestate sample was taken and examined for the occurrence of phage-particles by means of transmission electron microscopy (TEM). The TEM analysis revealed phages with the morphology of Myoviridae (phages with long contractile tails) ( Figure 1A,E), icosahedral phage-like particles ( Figure 1B), Podoviridae (short non-contractile tails) ( Figure  1C), and Rudiviridae (rigid rod-like particles) ( Figure 1D). Unfortunately, the TEM analysis of phage-like particles from biogas digestate is often a challenging task because of sample impurities (e.g., undigested plant fibers or corn residues). Therefore, phenotypic phage particle assignment is often only possible to a limited extent. This is in line with the previous study characterizing viral morphologies in AD reactors by means of TEM including wastewater treatment plants, farm digesters, and biogas units treating a mixture of municipal solid waste and agro-food residues [29].

Virome Separation, Enrichment, Sequencing, and Bioinformatic Analysis
To gain deep insights into the taxonomic composition of the phage community obtained from the studied biogas reactor, an extracted fraction enriched for virus-like particles was sequenced on the Illumina MiSeq and Nanopore GridION sequencing platforms. The Illumina data were used to improve the base accuracy, and thus significantly reduce the error rates in the obtained sequences. The Nanopore sequencing run generated 4.04 million reads, yielding a total of 15.47 Gb. The mean read length averaged 4140 bp, with a maximum read length of 8723 bp. The Illumina sequencing run (2 × 300 bp) resulted in 32.3 million reads, yielding 9.7 Gb. After the initial tests, the diversity of the dataset was lower than expected. To avoid oversampling problems in the assembly, the Nanopore dataset was shrunk to 56,000 reads, with 208 Mb. The following canu assembly with the shrunk data set generated 629 contigs with a total size of 5.3 Mb. These contigs were polished by the Illumina data and pilon, followed by annotation using the WtP pipeline and PROKKA. In total, 375 contigs representing the metagenome-assembled viral genomes (MAVGs) could be identified and selected for further analyses, whereas the remaining contigs were filtered out. The final virome had a size of 4.61 Mb and included 7117 predicted genes and 7 tRNAs. The annotation was manually inspected and refined in GenDB 2.0 [46]. The read coverage per contig showed a large difference between 1-fold and 9083-fold, representing the abundance range of viral genomes within the biogas virome. Details of the MAVGs genomes are listed in the supplementary information (Supplementary Material Table S1).

Phage Diversity in the Studied Biogas Plant as Analyzed by Means of Viral Metagenome Data
In addition to the TEM analysis described earlier in this Section 3.1., the sequencing approach and bioinformatics analysis revealed 375 MAVGs representing individual phage genomes, with the 30 most abundant and two largest MAVGs listed in Table 1. In total, 104 of the 375 MAVGs could only be classified to the superkingdom Viruses, indicating that one third of the viral sequences originating from the investigated biogas reactor were so far unknown and represented putative novel viral genetic diversity (Supplementary Material  Table S1). However, 269 of the 375 MAVGs were assigned to the order Caudovirales, with the most abundant families within this order being Siphoviridae (133 MAVGs), Myoviridae (101 MAVGs), and Podoviridae (28 MAVGs), followed by the remaining seven MAGs of the order Caudovirales, which were not classifiable to any other known family ( Figure 2). In addition, MAVGs 157 and 515 were assigned to the families Inoviridae (class Faserviricetes) and Tectiviridae (class Tectiliviricetes), respectively. At the genomic level, Caudovirales are an highly diverse group featuring mosaic genomes composed of conserved and variable regions [30]. The predominance of this order in biogas-producing plants was reported previously [36,56,57] and is now confirmed by this study.  Furthermore, among others, one Tectivirus (MAVG 515) was also identified within the metavirome. Tectiviruses are supposed to be one of the ancestors of the Polintons (large eukaryotic DNA transposons) that are suggested to be the first group of eukaryotic dsDNA viruses which have evolved from archaeal and bacterial ancestors [61,62].
Apart from the order Caudovirales, characterized by dsDNA genomes, the presence of viruses with ssDNA genomes in biogas-producing plants has not yet been demonstrated and has, therefore, been questioned. However, in this study, MAVG 157 of the family Inoviridae (order Tubulavirales), previously described as a virion with a circular, positive sense ssDNA genome [63], could be identified ( Table 1). Members of this family have a unique morphology, visible as flexible filaments or rigid rods, caused by the helical symmetry of the capsid [64]. These phages infect Gram-positive, Gram-negative, or cell wall-less bacteria. A characteristic feature of this family is that its members neither enter Within the family Siphoviridae, MAVG 80 was particularly noticeable since it dominated the phage community. Its genome showed a 9083-fold coverage by metagenomic reads, indicating that MAVG 80 is one of the most abundant phage species within the biogas virome analyzed. Members of the family mentioned above represent phages with dsDNA packaged into a capsid connected to a tail [33]. Siphoviridae-like head-tailed viruses have already been reported for methanogenic digesters previously [29,33]. Moreover, a recent study on a Siphoviridae phage, designated Blf4 and isolated from a commercial biogas plant in Germany, also described that this phage infected methanogenic Archaea of the species Methanoculleus bourgensis [58]. Considering the relatively high abundance of Methanoculleus members in AD [19,59,60] and its metabolic relevance, Blf4 and Siphoviridae-like phages infecting members of the genus Methanoculleus genus might profoundly affect their host's abundance and, thus, the efficiency of the entire biogas process.
Furthermore, among others, one Tectivirus (MAVG 515) was also identified within the metavirome. Tectiviruses are supposed to be one of the ancestors of the Polintons (large eukaryotic DNA transposons) that are suggested to be the first group of eukaryotic dsDNA viruses which have evolved from archaeal and bacterial ancestors [61,62].
Apart from the order Caudovirales, characterized by dsDNA genomes, the presence of viruses with ssDNA genomes in biogas-producing plants has not yet been demonstrated and has, therefore, been questioned. However, in this study, MAVG 157 of the family Inoviridae (order Tubulavirales), previously described as a virion with a circular, positive sense ssDNA genome [63], could be identified ( Table 1). Members of this family have a unique morphology, visible as flexible filaments or rigid rods, caused by the helical symmetry of the capsid [64]. These phages infect Gram-positive, Gram-negative, or cell wall-less bacteria. A characteristic feature of this family is that its members neither enter typical lytic nor lysogenic cycles. Instead, virions are released from cells by extrusion, causing chronic infection without killing the host [64,65]. They mobilize DNA within the microbiome, and thus play a role in the evolution of microorganisms. MAVG 157 of the family Inoviridae is 8.471 bp long and showed threefold genome coverage, based on the sequencing data obtained. This is the first study so far indicating the occurrence of ssDNA phages in a biogas-producing microbiome.

Protein-Based Phage Similarity Network Showed Relationships with Phages Infecting Members of the Classes Bacilli and Clostridia
To predict the taxonomic relatedness of biogas plant phages with phages deposited in the ProkaryoticViralRefSeq v94 database and to indicate a narrow host range of virion studies, a genome-and network-based clustering classification method applying the program vConTACT v2.0 [49] was used ( Figure 3). The resulting whole-genome clustering network was composed of 2991 phages in total. MAVGs were grouped based on their gene content profiles into viral clusters (VCs) related to candidate viral genera as assigned by the International Committee on the Taxonomy of Viruses [65]. The network revealed that biogas plant phages were spread across 65 VCs, with 75 additional phage genomes that were considered to be outliers, 134 singletons, and 13 MAVGs showing overlaps with other VCs. The latter are those whose genomes showed no common genomic features among all known viruses to justify the membership, and, therefore, are presumably specific to the biogas reactor environment. Moreover, these results suggest that the 357 MAVGs most likely represent novel viruses due to the low proportion of hits to reference phage genomes, and since they form VCs distinct from reference phages.
In total, nine MAVGs were affiliated with phages that infect bacterial host cells of the phyla Firmicutes, Actinobacteria, and Proteobacteria. Among the cluster related to Proteobacteria, only MAVG 101 (family Podoviridae) was related to phages that attack members of this phylum. A further three biogas plant phages were clustered with viruses infecting members of the phylum Actinobacteria. They originated from the vConTACT v2.0 database and belonged to the families Podoviridae and Siphoviridae.  [49]. The vConTact 2.0 network generated by using the ProkaryoticViralRefSeq v94 virus database in conjunction with the phage dataset originating from a laboratory-scale biogas reactor (375 sequences) analyzed in this study using the perfused forced directed layout. Related bacteriophages are represented as triangles in pink with the edges between them representing shared protein clusters. Node colors were used to indicate phage hosts, whose taxonomic affiliations were positioned right next to the respective cluster. Ellipses around the groups indicate the biogas phages of this study. Network visualization was generated using Cytoscape 3.8.2.

Insights into the Biogas Plant Phages Life Cycle by Comprehensive Genome Analysis
To predict the lifestyle (temperate or virulent life cycle) of the phage community obtained from the mesophilic laboratory-scale biogas reactor, the 375 MAVGs of this study were analyzed using the Phage Classification Tool Set (PHACTS) [51]. Out of the 375 phages, PHACTS was able to confidently determine the lifestyle of 40 MAVGs confidently ( Figure 4C). Unreliable results were obtained for the other 335 MAVGs; sometimes, these MAVGs were classified as virulent or as temperate at the same time. Most likely, the low proportion of the confidently predicted results was due to the novelty of biogas phage  [49]. The vConTact 2.0 network generated by using the ProkaryoticViralRefSeq v94 virus database in conjunction with the phage dataset originating from a laboratory-scale biogas reactor (375 sequences) analyzed in this study using the perfused forced directed layout. Related bacteriophages are represented as triangles in pink with the edges between them representing shared protein clusters. Node colors were used to indicate phage hosts, whose taxonomic affiliations were positioned right next to the respective cluster. Ellipses around the groups indicate the biogas phages of this study. Network visualization was generated using Cytoscape 3.8.2.
Among the phylum Firmicutes, five biogas plant phages created viral clusters with other phages, infecting members of the classes Bacilli and Clostridia. These phages belonged to the family Siphoviridae or represented virions with unknown taxonomy. These results suggest that members of the classes Bacilli and Clostridia are most likely affected when phages initiate the lysis of host microorganisms, and thus might cause significant process disturbances due to the decimation of essential microbial groups within the AD.

Insights into the Biogas Plant Phages Life Cycle by Comprehensive Genome Analysis
To predict the lifestyle (temperate or virulent life cycle) of the phage community obtained from the mesophilic laboratory-scale biogas reactor, the 375 MAVGs of this study were analyzed using the Phage Classification Tool Set (PHACTS) [51]. Out of the 375 phages, PHACTS was able to confidently determine the lifestyle of 40 MAVGs confidently ( Figure 4C). Unreliable results were obtained for the other 335 MAVGs; sometimes, these MAVGs were classified as virulent or as temperate at the same time. Most likely, the low proportion of the confidently predicted results was due to the novelty of biogas phage genome sequences in the database consulted for comparison. Furthermore, some of the obtained sequences were too short for meaningful comparative analysis, such as the protein-based phage similarity network analysis, as they might not play an important role for the phage lifecycle. In total, 14 out of 40 MAVGs were classified as virulent, whereas 26 MAVGs were predicted to follow a temperate life cycle ( Figure 4C). Unfortunately, none of the confidently predicted phages represented the most abundant MAVGs of this study. Therefore, the ten most abundant and the two longest phages identified in this study were analyzed in detail to deduce their potential role within the studied biogas microbial community. Predicted open reading frames (ORFs) were initially classified according to nine different functional groups: DNA replication (e.g., DNA polymerase and exonuclease), host modification, putative functions, DNA modification (e.g., transposase), structural proteins (e.g., capsid proteins and tail proteins), host lysis, lysis-related proteins, auxiliary metabolic genes (AMGs), and packaging proteins ( Figure 4A, Supplementary Material Table S2 and Figure S2). A substantial fraction of the genes in MAVGs encodes hypothetical proteins (78.2%) with unknown functions, indicating the need for further experimental work addressing the functional characterization of phage-related genes. Genes encoding proteins involved in mandatory functions were also not further considered. On the other hand, genes involved in host modifications, host lysis, and lysogeny-related proteins were of particular interest and, therefore, were examined more closely. A gene involved in host modification was only found in MAVG 6, the largest phage genome in this study ( Figure 4B). Its genome harbored a gene encoding an ADP-ribosyltransferase (ADPRT), which acts on the α subunit of the host RNA polymerase [66]. The activity of ADPRT was described to facilitate gaining control over the infected host cell primarily through shifting gene expression toward the phage genes at different stages of growth [66]. Brown and colleagues also assumed that many ADPRT-encoding genes are accessory genes that are horizontally transferred to confer a fitness advantage to the host [67,68]. Shmakov et al. also predicted that ADPRT is usually encoded by CRISPR-Cas systems loci, which might help to stabilize the prophages and plasmids in the host bacteria [69].
Furthermore, the phosphoadenosine phosphosulfate reductase (PAPS reductase) gene representing an AMG, was identified in 13 MAVGs studied (80, 112, 137, 189, 588, 62, 106,  149, 162, 303, 630, 631, and 655), four of them being among the ten most-abundant virions (MAVG 80, 112, 137, and 189), and MAVG 588 being the second-longest phage genome. AMGs are known to modulate the fitness of the phage by providing functions to the microbial organisms to improve the energy metabolism for phage reproduction [6,7]. AMGs featuring functional relevance in carbon metabolism, assembly of iron-sulfur clusters, nitrification, methane oxidation, and other metabolic processes are believed to give an advantage by improving utilization of nutrients within their host [8,9]. PAPS reductase has been described to be involved in assimilatory pathways of sulfate reduction, suggesting that phages can influence the metabolism and cycling of this essential element [70,71] and thus have an impact on the biogas microbial community (Supplementary Material  Table S3). Furthermore, it was hypothesized that PAPS reductase is an accessory protein belonging to the clade of CysH family enzymes associated with type-IV CRISPR-Cas systems [69], representing the bacterial and archaeal adaptive immunity systems against virus particles [72]. Each of the PAPS reductases encoding MAVGs was identified as viral contigs, indicating that the identified PAPS gene is embedded in a phage region.

Occurrence of Phage Relatives in Biogas-Producing Microbial Communities as Deduced from Publicly Available Metagenome Data
To evaluate the prevalence or, rather, abundance of phages within the microbial communities of different biogas plants, 66 publicly available biogas metagenome datasets obtained from the NCBI database were mapped on the viral genome sequences using the

Occurrence of Phage Relatives in Biogas-Producing Microbial Communities as Deduced from Publicly Available Metagenome Data
To evaluate the prevalence or, rather, abundance of phages within the microbial communities of different biogas plants, 66 publicly available biogas metagenome datasets obtained from the NCBI database were mapped on the viral genome sequences using the program SParkHit. Fragment recruitments were performed by mapping a maximum of ten million sequences from the corresponding metagenome representing the biogas microbial community. The mapping approach was aimed at the reconstruction of phage genomes (represented by the MAVGs) from the previously published metagenome data. The metagenome fragment mapping results were distinguished into the following groups: (I) the occurrence of all 357 phage sequences in the metagenome and (II) the abundance of a single MAVG in the metagenome. In total, 27 out of the 66 metagenomes analyzed (≥0.1% of all metagenome sequences) were assigned to group (I) with different abundance values. Particularly high abundance values ranging between 0.5% and 1.8% of all metagenome sequences were detected for 7 out of the 66 metagenome data sets (accession numbers NCBI: SRR3184553 with (0.5%), ERR843252 (0.6%), ERR843254 with (0.7%), ERR843253 (0.8%), SRR2917898 (1.1%), SRR2917897 (1.3%), and SRR2917896 (1.8%)) ( Figure 5). The abundance values for the longest phage genome in the studied metavirome, MAVG 6, placed this virion on the fifth place in the ranking (0.040%, accession number SRR2917897). MAVG 6 was also found to be prevalent in the microbiome described by Güllert and colleagues [73], but appeared to be slightly abundant also in other mesophilic biogas-producing microbial communities (Supplementary Material Table S4). MAVG 6 was classified as belonging to the family Siphoviridae; however, the results obtained from the protein-based phage similarity network analysis indicated only a distant relationship with previously known phages. Together with 19 other MAVGs, MAVG 6 formed an independent cluster in the plot, which indicated the novelty of this virion (Figure 3).
To determine whether the biogas virome members were present in other habitats, metagenome fragment mappings were also carried out using publicly available metagenome data originating from cow dung, chicken gut, fish gut, pig gut, and soil environments. The phage abundance values achieved for these data were negligible, indicating a specific niche for the phages analyzed in this study, which is clear for the biogas plant environment.
Information obtained from the fragment recruitment approach showed that the MAVGs analyzed within this study can also be found in different biogas plants and most The highest number of mapped sequences within group (I) was found in the metagenome originating from a production-scale biogas plant located near Cologne, Germany (between 1.1% and 1.8%; accession numbers SRR2917898 and SRR2917896, respectively). This biogas plant was operated under mesophilic (40 • C) conditions and fed with a mixture of maize silage (69%), cow manure (19%), and chicken manure (12%) [73]. Güllert and colleagues reported that a substantial number of cellulosome-producing bacteria of the phylum Firmicutes (57%), followed by Bacteroidetes (11%) and Actinobacteria (7%), were detected in the studied reactor [73]. Interestingly, the abundance values for phage sequences in this metagenome were unusually high (1.8%), because, on average, between 0.1% and 0.5% of viral sequences were previously detected in AD microbiomes [17]. Unfortunately, the analysis of the viral community was not an objective of the study by Güllert et al. [73]. However, our results indicated significant similarity between the phages presented in Güllert's data and the metavirome of this study.
The biogas plant phage members were also detected with abundances between 0.5% and 0.8% in the microbiome originating from a mesophilic industrial biogas plant operating under dry fermentation conditions and fed with maize silage (63%), green rye (35%), and chicken manure (2%) (ERR843252, ERR843253, and ERR843254) [59]. As expected, Clostridia and Bacilli were also the most abundant classes of Firmicutes in the studied microbiome, with 14.3% and 1.4% of the total number of analyzed reads, respectively. As proposed in the previous chapter, phages from this study are most probably specific for Firmicutes members residing in mesophilic communities in particular.
Detailed analysis of fragment recruitment results also revealed that several of the analyzed MAVGs appeared to be highly abundant in mesophilic microbial communities, and were hardly found in microbiomes originating from thermophilic biogas reactor systems (group II). This trend, represented by the color change in Figure 5 (from blue to orange), indicates that the phages obtained in this study are most likely specific for mesophilic microorganisms residing in the biogas microbiome. MAVG 28 falls into this category and was found to be the most abundant virion among all MAVGs in this study. Its highest amount (0.1%) within the publicly available biogas metagenome datasets was detected in the mesophilic biogas plant studied by Stolze and colleagues [59]. Compared to the otherwise detected small proportion of viral sequences in biogas-producing communities being, on average, ≥1%, the abundance value achieved for MAVG 28 was surprisingly high. MAVG 28 was taxonomically classified as belonging to the family Myoviridae. Its genome is 44.297 bp long and was proposed to undergo the lytic cycle (non-confidently) of replication, as deduced from its genetic repertoire. MAVG 558 seemed to be less prominent in most of the analyzed microbiomes. However, this virion was the second most-abundant genome in the dataset described by Güllert and colleagues (between 0.044% and 0.074%) [73] and represented an unclassified phage with a genome length of 52,688 bp. Similar to MAVG 28, the MAVG 558 was proposed to encode the temperate cycle of replication (non-confidently).
The abundance values for the longest phage genome in the studied metavirome, MAVG 6, placed this virion on the fifth place in the ranking (0.040%, accession number SRR2917897). MAVG 6 was also found to be prevalent in the microbiome described by Güllert and colleagues [73], but appeared to be slightly abundant also in other mesophilic biogas-producing microbial communities (Supplementary Material Table S4). MAVG 6 was classified as belonging to the family Siphoviridae; however, the results obtained from the protein-based phage similarity network analysis indicated only a distant relationship with previously known phages. Together with 19 other MAVGs, MAVG 6 formed an independent cluster in the plot, which indicated the novelty of this virion (Figure 3).
To determine whether the biogas virome members were present in other habitats, metagenome fragment mappings were also carried out using publicly available metagenome data originating from cow dung, chicken gut, fish gut, pig gut, and soil environments. The phage abundance values achieved for these data were negligible, indicating a specific niche for the phages analyzed in this study, which is clear for the biogas plant environment.
Information obtained from the fragment recruitment approach showed that the MAVGs analyzed within this study can also be found in different biogas plants and most probably affect mesophilic microbial community members. Therefore, viruses may shape AD microbiomes.

Conclusions
Many of the environments in which phages interact with their bacterial hosts are anaerobic. Some of these scenarios are of importance regarding biotechnological applications, such as energy recovery by the anaerobic digestion of biomass. In the present study, high-throughput metavirome sequencing of samples from a laboratory-scale biogas reactor was performed. The aim was to provide an adapted protocol for the enrichment of bacteriophages from biogas digestates as preparation for phage-metavirome studies, and, in addition, to gain a better understanding of the phage's community composition and its influence on the biogas microbiome. The protocol for the enrichment of virus-like particles from a biogas digestate was successfully established and serves as an orientation aid for other studies focusing on the isolation of bacteriophages from AD fermentation samples. Although this study does not fully reflect the complexity of the biogas-producing microbiomes, the obtained results provide some insights into the ecological role of phages within the analyzed biogas microbial community. Unfortunately, the interpretation of the generated data was limited due to the restricted availability of appropriate reference genome sequences in public databases, which prevented their complete characterization when relying on sequence homology-based identification. Accordingly, the enrichment and purification of phage particles applied in this study and subsequent sequencing led to the odentification of further key phage species, thus providing genome sequence information for novel phage members of the biogas-producing community.
The occurrence of phages in the biogas microbiome certainly has a significant impact on the composition of the biogas community and, therefore, the dynamics of biomass conversion. The obtained results indicated that Firmicutes followed by Actinobacteria, in particular, may severely be affected by phage infections. As bacteria from the phylum Firmicutes are predominantly involved in the hydrolysis of the fed substrate, it can be assumed that phages might strongly affect this step of the biomethanation process. Furthermore, several phages from this study are most probably specific for Firmicutes and Actinobacteria members residing in mesophilic communities in particular. Thus, further genetic analysis of the gene contents of the identified phage contigs could shed light on phage lifestyles in the biogas microbiome and expand the knowledge on the interactions between phages and their hosts under anaerobic conditions. Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/microorganisms10020368/s1, Figure S1: (A) Biogas formation VN (L) and pH (moving average, window size = 15) during the anaerobic digestion of carboxymethyl cellulose (CMC) in the investigated continuous stirred tank reactor (CSTR) prior to sampling at t = 9.7 d. CMC was added to a final concentration of 0.5% at t = 0 (YB, CMC = 92 mL g ODM ORFs that were found in two or more genomes are colored with the same color (e.g., PAPS reductase in bright red). Hypothetical proteins are depicted in grey and are not labeled. The ORFs identified are indicated by arrows in the direction in which they are translated. Genomes were generated with Snapgene; Table S1: The number of enriched and sequenced phages originating from the biogas reactor; Table S2: Functional groups of genetic determinants found in the most abundant phages in the metavirome: DNA modification, DNA replication, structural proteins, temperate phage proteins, putative proteins, host modification, packaging, host lysis, hypothetical proteins, and AMG; Table S3: Occurrence of the auxiliary metabolic genes (AMGs), namely phosphoadenosine phosphosulfate reductase (PAPS reductase), detected in MAVGs originating from the biogas reactor sample; Table S4: The occurrence of 357 MAVGs in different biogas-producing microbial communities as deduced from the fragment recruitment approach.