Metagenomic Analysis of Virioplankton from the Pelagic Zone of Lake Baikal

This study describes two viral communities from the world’s oldest lake, Lake Baikal. For the analysis, we chose the under-ice and late spring periods as the most productive for Lake Baikal. These periods show the maximum seasonal biomass of phytoplankton and bacterioplankton, which are targets for viruses, including bacteriophages. In both periods, the dominant viruses were tailed bacteriophages of the order Caudovirales that belong to the families Myoviridae, Siphoviridae and Podoviridae. Annotation of functional genes revealed that during the under-ice period, the “Phages, Prophages, Transposable Elements and Plasmids” category (27.4%) represented the bulk of the virome. In the late spring period, it comprised 9.6% of the virome. We assembled contigs by two methods: separately for each virome and by cross-assembly of both viromes. A comparative analysis of the Baikal viromes with those from other environments showed that viromes group into soil, marine and freshwater clusters. Viromes of lakes Baikal, Michigan, Erie and Ontario form the joint World’s Largest Lakes clade.


Introduction
Viral communities that inhabit the water column (virioplankton) represent the range of all viruses, including those that affect eukaryotes, as well as archaeal viruses and bacteriophages. Viruses can contain both RNA and DNA, and their numbers are comparable in aquatic environments [1]. Bacteriophages constitute a large part of viral communities [2] and are active participants in the microbial loop. They affect genetic diversity and control the number of heterotrophic bacteria and cyanobacteria, and this factor largely determines the content and structure of plankton [3].
The study of viral communities is difficult for several reasons. The primary reason is that only a small part of the viruses and their hosts can be cultivated. Moreover, because no single gene is common to all viral genomes, viral communities cannot be studied in the same way as bacteria and eukaryotes, which are analysed using the 16S rRNA and 18S rRNA genes, respectively. Therefore, individual groups of viruses are studied by signature genes. For example, genes for capsid proteins, such as g20 and g23 [4-6], serve as targets for identification of T4-like myoviruses, the polA DNA polymerase gene [7] for podoviruses, the polB gene [8] for members of the family Phycodnaviridae and the RdRp gene [9,10] for RNA viruses.
Until recently, the focus of aquatic virologists was on the viromes of marine environments, whereas there were fewer studies of freshwater viromes. The first study of freshwater bodies investigated the viral communities in fish ponds [11]. Subsequently, studies characterised RNA viromes from Lake Needwood (Maryland, USA) [12] and the viral diversity from Lake Limnopolar (Antarctica) [13]. There were also metagenomic studies of viruses from two sites of an aquaculture facility [14], two lakes in France, Bourget and Pavin [15], four freshwater bodies located in the Sahara desert [16] and the Feitsui Reservoir in North Taiwan [17]. In 2013, metagenomic analysis of viral communities from East Lake (China) revealed high genetic diversity of viruses [18]. Recently, six arctic freshwater bodies in Spitsbergen (Svalbard, Norway) were studied and compared with an Antarctic lake. The authors revealed a taxonomic similarity in the investigated water bodies. However, the viromes differed at the fine-grain genetic level. These data indicate differences among dominant species of viruses. Single-stranded DNA (ssDNA) viruses predominated in Arctic viromes; most viruses remained unidentified [19]. The viromes of the Great Lakes, including Ontario and Erie, exhibited the dominance of bacteriophage sequences as well as a high content of plant and animal viruses; a comparative analysis indicated a similar composition of viruses in both lakes [20]. In the eutrophic Lake Matoaka (USA), sequences that belonged to tailed bacteriophages prevailed along with Podoviridae family members [21]. In the virome from Lake Michigan, like other similar viromes, most of the generated open reading frames (ORFs) were assigned to hypothetical proteins [22]. In Lake Lough Neagh (Northern Ireland), 85% of the virome did not have homologues in the extant sequence databases [23]. These data were consistent with most previous metagenomic analyses. 
Subsequent analysis of the annual dynamics in the content of the viral community indicated that 20% of the viruses were not detected during specific periods of the year [24].
To date, there have been no studies on sequencing viromes from the pelagic zone of Lake Baikal. Recent studies focused on the identification and genetic diversity of viruses of the Tevenvirinae subfamily (order Caudovirales, family Myoviridae) via the signature genes g20 [25] and g23 [26,27]. Lake Baikal is oligotrophic; it is the deepest lake in the world (1642 m) and holds a large volume of fresh water (2.36 × 10⁴ km³). Geological, geographical and hydrological characteristics of the lake reflect its uniqueness and the high endemism of its aquatic organisms [28]. Although some studies characterised viruses of Lake Baikal [25,26,29,30], their genetic diversity remains insufficiently investigated.
Spring is an important period in Lake Baikal. According to long-term data, this period corresponds to the highest algal biomass and rates of primary production [31-33]. The studied periods are regarded as early spring (under-ice period: February, March and April) and late biological spring (May and June) according to Kozhov [34]. The aims of this study were to determine the taxonomic composition of viral communities in the pelagic zone of Lake Baikal during the under-ice and late spring (open-water) periods, to perform functional annotation of genes and a comparative analysis with viromes previously obtained from other lakes, and to assemble contigs and compare them with the RefSeq and GenBank databases.

Sample Collection and Sequencing
For the concentration of the viral fraction and DNA extraction, ~25 L samples were taken on 22 March and 8 June 2018, 7 km and 3 km, respectively, from the Listvyanka settlement (southern basin of Lake Baikal). From each layer (0, 5, 10, 15, 20, 25 and 50 m), 3.5 L were sampled and mixed to obtain an integrated sample of the 0-50 m layer. Subsequently, the sample was filtered through polycarbonate filters with a pore size of 0.4 µm (Millipore, Burlington, MA, USA) to remove phyto- and zooplankton. The filtrate was concentrated using a VivaFlow 200 tangential flow ultrafiltration system (Sartorius, Göttingen, Germany) to a final volume of ~20 mL. It was passed through a nozzle with a pore size of 0.2 µm (Sartorius, Göttingen, Germany) to remove bacteria and concentrated using a Vivaspin Turbo 15 (50 kDa; Sartorius, Göttingen, Germany) to a volume of ~100 µL. The samples were processed for several hours; during filtration and concentration, the samples were refrigerated at 4 °C.
The samples were processed with sodium dodecyl sulphate (SDS) and proteinase K. DNA was extracted using the standard phenol-chloroform method. The DNA concentration was measured using a Qubit 2.0 Fluorimeter (Invitrogen, Carlsbad, CA, USA), according to the manufacturer's instructions (~50 ng of DNA was obtained). The extracted DNA was stored at −80 °C.
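The pooling and concentration steps above imply some simple arithmetic, sketched below. The depths and volumes are taken from the text; the function and variable names are illustrative, and the calculation is a nominal one that assumes no loss of volume or particles on the 0.4 µm and 0.2 µm filters.

```python
# Nominal volume bookkeeping for the integrated 0-50 m sample.
# Depths and volumes come from the text; names are illustrative only.
DEPTHS_M = [0, 5, 10, 15, 20, 25, 50]
VOLUME_PER_DEPTH_L = 3.5


def pooled_volume_l(depths, volume_per_depth_l):
    """Total volume of the integrated sample, in litres."""
    return len(depths) * volume_per_depth_l


def concentration_factor(initial_l, final_l):
    """Fold reduction in volume between two processing stages."""
    return initial_l / final_l


pooled = pooled_volume_l(DEPTHS_M, VOLUME_PER_DEPTH_L)     # 7 x 3.5 L
tangential_flow = concentration_factor(pooled, 20e-3)      # to ~20 mL
centrifugal = concentration_factor(20e-3, 100e-6)          # ~20 mL to ~100 uL
overall = concentration_factor(pooled, 100e-6)             # whole workflow
```

Seven layers of 3.5 L give 24.5 L, which matches the stated ~25 L; the nominal overall volume reduction is about 245,000-fold.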
DNA was fragmented with a Covaris S2 (USA), and libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA). Whole genome amplification technology was not used. The obtained DNA libraries were sequenced on a MiSeq device (Illumina, San Diego, CA, USA) using the Kit v3 2 × 300 reagents (Illumina, San Diego, CA, USA) in the "Genomics Core Facility" (ICBFM SB RAS, Novosibirsk, Russia).
The samples were labelled as follows: 7 km from the Listvyanka settlement (March)-BVP1, and 3 km from the Listvyanka settlement (June)-BVP2; BVP1_2 indicated the cross-assembled sample.

Water Chemistry Analyses
The concentrations of total phosphorus, total nitrogen, total organic carbon, nitrate, nitrite and chlorophyll a were determined as previously described [33]. The concentration of oxygen was measured using the SBE 25 Sealogger CTD (Sea-Bird Electronics, Bellevue, WA, USA).
Samples (1 L volume) for qualitative analysis of phytoplankton were fixed with Lugol's solution and then concentrated by sedimentation. Algae were counted twice in a 0.1 mL Nageotte chamber under a "Peraval" light microscope with 720× and 1200× magnification. Thin cells of Synedra acus subsp. radians (Kütz.) Skabitchevsky formed by the sexual process did not have silica valves, and thus 50 mL samples were filtered onto 1 µm pore-size "Millipore" polycarbonate filters and stained by DAPI for visualisation of nuclei and chloroplasts. Subsequently, the cells were counted directly on the filter using an Axio Imager M1 microscope with 200× magnification.

Bioinformatics Analysis
The viromes were analysed with the online pipeline Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST), to which the raw data were uploaded [38].
The assembly and analysis of contigs included the following stages. Quality control was performed using the FastQC program (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then, the obtained data were processed with Trimmomatic v. 0.36 [39], using the parameter SLIDINGWINDOW:4:20. Sequences shorter than 50 nucleotides were excluded from the analysis; adapters were removed. The SPAdes 3.13.0 metagenomics assembler, metaSPAdes, with default parameters was used for the de novo assembly. All calculations were performed on the HPC-cluster "Akademik V.M. Matrosov" ("Irkutsk Supercomputer Centre of SB RAS", http://hpc.icc.ru).
Contigs assembled with SPAdes were processed using the VirSorter (v. 1.0.3) tool, and the contigs that belonged to viruses were sorted out (Virome database). Contigs shorter than 5 kilobase pairs (Kbp) were removed before processing. This length threshold was chosen as the optimal balance between recall and precision. According to simulations, it is also the minimum length necessary for the correct identification of viral sequences [40]. Then, the closest homologues were determined using blastn (DB RefSeq 2019 and GenBank 2019), with an e-value threshold of 10⁻³. To predict genes, the MetaGeneMark tool [41] was used; subsequently, a search in the NR NCBI (2019) database was performed manually with blastp. The contig and genome architectures were drawn using EasyFig [42].
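The read-trimming settings above (a 4-base window, a mean quality threshold of 20 and a 50-nucleotide minimum length) can be illustrated with a minimal sketch. It is a simplified re-implementation of the sliding-window idea for illustration, not Trimmomatic's own code, and its exact cut position may differ from the tool's; the function names are hypothetical and Phred+33 quality encoding is assumed.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, quals, window=4, min_avg=20):
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean quality is below min_avg.

    Simplification: reads shorter than the window are returned unchanged.
    """
    for start in range(0, len(seq) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


def passes_min_length(seq, min_len=50):
    """Mirror the stated exclusion of sequences shorter than 50 nt."""
    return len(seq) >= min_len


# Toy read: 60 good bases (Q30) followed by 4 poor bases (Q2).
read = "A" * 60 + "C" * 4
quals = [30] * 60 + [2] * 4
trimmed, trimmed_quals = sliding_window_trim(read, quals)
keep = passes_min_length(trimmed)
```

In this toy read the window starting at position 58 is the first whose mean quality (16) falls below 20, so 58 bases are retained and the read passes the length filter.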
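The two numerical cut-offs in this step (contig length of at least 5 Kbp and a blastn e-value of at most 10⁻³) can be sketched as follows. The text does not state the BLAST output format used, so the sketch assumes the standard 12-column tabular output (BLAST+ `-outfmt 6`, with the e-value in column 11); the function names and demo records are hypothetical.

```python
import io

MIN_CONTIG_LEN = 5000  # 5 Kbp cut-off stated in the text
MAX_EVALUE = 1e-3      # blastn e-value threshold stated in the text


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_len=MIN_CONTIG_LEN):
    """Keep contigs that are at least min_len bases long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


def best_hits(blast_tab_lines, max_evalue=MAX_EVALUE):
    """Lowest-e-value hit per query from tabular BLAST lines (assumed outfmt 6)."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical demo input: one short and one 5,000-base contig.
demo = io.StringIO(">short_contig\nACGT\n>long_contig\n" + "A" * 5000 + "\n")
kept = filter_by_length(read_fasta(demo))
```

Here only `long_contig` survives the length filter; `best_hits` would then be applied to the blastn results for the retained contigs.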
The original unprocessed reads were uploaded to the MG-RAST server (BVP1 ID mgm4814173.3, BVP2 ID mgm4816981.3) and NCBI (SRA project PRJNA547700).
Table 1 shows temperature, Secchi disk transparency, concentrations of total phosphorus, organic carbon and nitrogen and the content of oxygen, nutrients and chlorophyll a in the pelagic zone of Lake Baikal. Based on these data, the trophic state of Lake Baikal was assigned as oligotrophic with mesotrophic characteristics, according to the Vollenweider & Kerekes classification [50]. The diatom Fragilaria radians (Kützing) D.M. Williams and Round, also known as S. acus subsp. radians (Kützing) Skabitchevsky (AlgaeBase taxonomy [51]), dominated the phytoplankton in March and June. The number of virus particles under the ice was slightly higher than during the open-water period. Likewise, the total number of bacteria was lower in June than in March (Table 1).

Taxonomic Composition
The bulk of sequences (84.1% for BVP1 and 57.3% for BVP2) did not show any similarity with sequences from the databases (Figure 1). This has been noted for all previously obtained viromes. We assigned 8.8% (BVP1) and 4.5% (BVP2) of all annotated sequences to the Viruses domain.
The share of single-stranded viruses was 0.02% for BVP1 and 0.003% for BVP2.

Analysis of Sequences at the Genus Level
At the genus level, most annotated sequences belonged to T4-like viruses of the Myoviridae family.
In the BVP1 sample, we identified 27.41% (19,044) of the classified reads as a part of the functional category "Phages, Prophages, Transposable elements, Plasmids" (Figure 2). These genes are associated with phage replication and packaging of virus particles (e.g., terminase, integrase, helicase and primase). "Phages, prophages" was the largest part of this category (97% of all classified reads in the category), whereas 1.3% of reads belonged to Gene Transfer Agents (GTA). Within the same category, a small number of reads were assigned to "Pathogenicity islands" (1.57%) and "Transposable elements and integrons" (0.16%). Notably, the subgroup with the most reads in "Phages, prophages" was "r1t-like streptococcal phages" (54.1%).
In the BVP2 sample, MG-RAST annotated 28 functional categories; each was subdivided into distinct subsystems. Of the classified reads, 9.6% (34,871) belonged to the functional category "Phages, Prophages, Transposable elements, Plasmids". The largest category, with 13.2% (47,855) of reads, was the clustering-based subsystems category (e.g., biosynthesis of galactoglycans and related lipopolysaccharides, catabolism of an unclassified compound and other clusters identified as unclassified). The NULL subcategory included 29,684 sequences, with the prevalence of Ribonucleotide reduction (3428) and Phosphate metabolism (3211) at level 3. The subgroup with the most reads in "Phages, prophages" was "r1t-like streptococcal phages" (58.6%), similar to BVP1; phage protein and phage terminase annotations were most commonly identified.

Contig Analysis
The total number of obtained contigs was 255,462 for BVP1, 388,735 for BVP2 and 544,501 for BVP1_2 (cross-assembled). Further analysis included only contigs with a length of 5 Kbp or more (Table 4). After VirSorter processing, the number of contigs was 376 in BVP1, 776 in BVP2 and 1136 in cross-assembled BVP1_2. We annotated these contigs with blastn against the RefSeq 2019 and GenBank 2019 databases; the e-value threshold was 10⁻³ (Table 5). At the first step, contigs were annotated using the RefSeq database. Contigs that were not assigned to a closest relative at the first step were then annotated using the GenBank database. The remaining contigs were not assigned to a closest relative because they had very low similarity.
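The two-step annotation described above (RefSeq first, GenBank only for contigs left unassigned) amounts to a simple fallback lookup, sketched below. The function name, contig identifiers and hit labels are hypothetical placeholders rather than results from this study; each hit table is assumed to already contain only hits that pass the e-value threshold.

```python
def annotate_two_step(contig_ids, refseq_hits, genbank_hits):
    """Assign each contig its closest relative, preferring RefSeq.

    refseq_hits and genbank_hits map contig id -> closest relative and are
    assumed to contain only hits that passed the e-value threshold.
    Returns (annotations, unassigned), where annotations maps
    contig id -> (database name, closest relative).
    """
    annotations, unassigned = {}, []
    for contig_id in contig_ids:
        if contig_id in refseq_hits:
            annotations[contig_id] = ("RefSeq", refseq_hits[contig_id])
        elif contig_id in genbank_hits:
            annotations[contig_id] = ("GenBank", genbank_hits[contig_id])
        else:
            unassigned.append(contig_id)
    return annotations, unassigned


# Hypothetical placeholders, not data from the study.
contigs = ["contig_a", "contig_b", "contig_c"]
refseq = {"contig_a": "phage_X"}
genbank = {"contig_a": "phage_Y", "contig_b": "phage_Z"}
annotations, unassigned = annotate_two_step(contigs, refseq, genbank)
```

In this toy case `contig_a` keeps its RefSeq assignment even though a GenBank hit exists, `contig_b` falls back to GenBank and `contig_c` remains unassigned.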

The number of annotated contigs with a length greater than 25 Kbp was 16 in BVP1, 31 in BVP2 and 54 in cross-assembled BVP1_2.
The BWA program was used to map the raw reads back to the contigs. In BVP1, 14% of reads mapped to the VirSorter contigs; in BVP2, 12.8% did. Table 5 shows the 10 contigs with the lowest e-values for each assembly. Other annotated contigs are shown in Supplementary Table S2.
Figure 3 shows the genetic map of the BVP1_2_NODE_212 contig. Based on the analysis of the terminase large subunit, we classified this phage as closely related to the cultured marine Synechococcus phage Bellamy (MF351863) and an uncultured Mediterranean phage (KT997817). According to the results of the MetaGeneMark analysis, the number of predicted ORFs was 32. We visualised the read coverage of the contig in the IGV genome browser (Supplementary Information Figure S1). Visualisation indicated no gaps and uniform coverage.

Comparative Analysis of Viromes
A dendrogram constructed with agglomerative hierarchical clustering demonstrated a clear separation of viromes into groups: soil, marine and freshwater (Figure 4). Viromes from Lake Baikal (BVP1 and BVP2) form a cluster with viromes of the world's largest lakes (Michigan, Ontario and Erie). BVP1 has a separate branch in this cluster. BVP2 is located closer to viromes from Lakes Ontario and Erie. Viromes from Lakes Matoaka and Lough Neagh, as well as fish ponds, form a eutrophic cluster. Table 6 shows the main characteristics of the lakes.
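Agglomerative hierarchical clustering, the technique behind the dendrogram, can be illustrated with a minimal sketch. The text does not state the distance measure or linkage rule used, so Bray-Curtis dissimilarity and average linkage are assumptions here; the sample names and profile values are invented for illustration and are not data from any virome.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def average_linkage(profiles, distance=bray_curtis):
    """Agglomerative clustering with average linkage.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample names.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def cluster_distance(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Invented toy profiles (relative abundances of three arbitrary groups).
toy_profiles = {
    "freshwater_A": [0.60, 0.30, 0.10],
    "freshwater_B": [0.55, 0.35, 0.10],
    "marine_A": [0.20, 0.20, 0.60],
    "marine_B": [0.30, 0.10, 0.60],
    "soil_A": [0.10, 0.80, 0.10],
}
merge_history = average_linkage(toy_profiles)
```

With these toy values the two freshwater profiles merge first and the two marine profiles second; the order of merges and their distances are what a dendrogram draws. The later merges in this toy example say nothing about how real soil, marine and freshwater viromes relate.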


Discussion
Using high-throughput sequencing, we obtained the first data on the content and structure of viral communities from the pelagic zone of Lake Baikal, the world's oldest lake and its largest freshwater lake by water volume.
The Lake Baikal pelagic zone is an oligotrophic water body, as stated previously [52] and confirmed by our data on the concentrations of total phosphorus, nitrogen and chlorophyll a and on water transparency. In May and June, the water column showed a steady stratification that corresponded to the spring period (inverse thermal stratification). At Lake Baikal, early June belongs to the period of biological spring. Water chemistry characteristics in March and June 2018 were similar to those of previous years of observation; no interannual changes were found. Notably, under the ice, the abundance of the diatom F. radians, which dominated the plankton of the lake, was slightly higher than in June, as was the concentration of chlorophyll a. In general, the concentration of chlorophyll a corresponded to the values we previously observed in March and June 2016 at the same station [33]. The total number of virus particles and bacteria was similar to that in the same periods of other years [33,53].
Although the viral sequences in the two viromes showed significant and distinct diversity, they were similar in taxonomic composition; bacteriophages of the order Caudovirales (89.3%) dominated. In other freshwater bodies, viruses from the order Caudovirales also predominated among the virus fraction. Lake Limnopolar, lakes of the Svalbard archipelago and Lakes Bourget and Pavin are the exception. Bacteriophages can be easily isolated from natural samples and sequenced, and thus their sequences predominate in databases. This fact explains their prevalence in viromes [22], and our data support this statement.
In virioplankton of Lake Baikal, the bulk of the order Caudovirales comprised phages that belong to the Myoviridae family. The members of this family were also the most numerous in Lakes Erie and Ontario [20], East Lake (Aug 2009, Dec 2009, and June 2010) [18], Lake Michigan [22], in tropical water bodies of Singapore [54] and freshwater bodies of the Sahara desert [16].
Sequences of the Baikal viromes annotated from databases were mostly bacterial (85.2% for BVP1 and 92.9% for BVP2). The bacterial part is probably overestimated because prophage sequences in metaviromes are erroneously classified: in the databases, they are recorded as being of bacterial origin [55].
Of special interest is the small number of ssDNA viruses in Lake Baikal; their content was 0.02% (BVP1) and 0.003% (BVP2). A similarly low content was noted in Lake Lough Neagh (0.5%). In contrast, the share of ssDNA viruses in Lakes Pavin and Bourget was 80% and 85%, respectively [15]. In Arctic lakes, ssDNA viruses dominated the viromes in the spring sample (74%), whereas in the summer sample they comprised 9% [19]. This distinction is due to the difference in the preparation of metagenomic libraries for sequencing [56,57]. In most studies, sampling for virome sequencing was performed from the surface water layers (0-5 m). In some cases, the shallowness of the water body limited the sampling depth.
Notably, sequences of the viral fraction were mostly dissimilar to the sequences from databases [13,20]. This so-called "viral dark matter" [58] represents content and structure that remain unknown in all studied samples [18,20,21,23].
After assembly of the reads, no whole-genome sequences exhibited significant similarity to the reference sequences. Long contigs (over 25 Kbp) represented 5% (BVP1), 5% (BVP2) and 5.6% (BVP1_2) of the sequences that belonged to viruses after VirSorter processing. The identified pool of the Baikal contigs probably represents endemic Baikal viruses, or the lack of close matches reflects the small number of reference genomes from freshwater bodies in the databases. In general, cross-assembly yields longer contigs.
The BVP1_2_NODE_212 contig was the most similar to the reference sequences and had a great number of known genes. According to the RefSeq database, Prochlorococcus phage P-TIM68 was the closest relative of this contig. Although Prochlorococcus species are marine cyanobacteria, previous studies determined that Prochlorococcus phages may contain several genes that are similar to other T4-type viruses. They are common for all cyanophages, and in our case the same picture is likely [59]. For example, in Lake Michigan many annotated viral proteins belong to Prochlorococcus phage P-SSM2 [22].
Comparative analysis of the functional categories in the Baikal viromes demonstrated that the "Phages, Prophages, Transposable elements, Plasmids" category prevailed during the under-ice period. The predominance of this category likely indicates active replication of viruses under the ice, when bacteria are also more abundant. In the late spring, the clustering-based subsystems category contained the most sequences, and the NULL subcategory represented the bulk of sequences from this category. The NULL subcategory may contain some wrong assignments, which may reflect either the uniqueness of these sequences or the lack of sequences with known functions in the SEED database.
Agglomerative hierarchical clustering of viromes clearly separated viromes into freshwater, soil and marine groups. Studies by other authors confirm such clustering [15,21].
On the dendrogram, the Baikal viromes are located closest to viromes from other lakes that are among the world's largest in terms of area, such as Ontario, Erie and Michigan, and form the joint World's Largest Lakes (WLL) clade. Lakes Michigan, Erie and Ontario are a part of the Great Lakes system in North America; they are located in the temperate climate zone, like Lake Baikal. Lake Michigan was previously mesotrophic; currently, its trophic state is defined as oligotrophic [60]. Lake Erie is mesotrophic, and Lake Ontario is oligo-mesotrophic. The grouping of these viromes into the WLL cluster likely reflects several characteristics, including morphometry, geographical location and trophic state, all of which determine the content and structure of viral communities. Centric diatoms, primarily members of the genera Aulacoseira and Stephanodiscus, are the drivers of the spring planktonic communities in the Great Lakes. In Lake Baikal, the complex of endemic diatoms, mainly Aulacoseira baicalensis and Stephanodiscus meyeri, dominated under the ice. However, since 2007 Baikal phytoplankton have shown a change in the diatom communities: a smaller and weakly silicified diatom, F. radians (same as S. acus), has dominated the spring plankton of the lake [33].
Similar to Lake Baikal, most viruses in the viromes of Lake Michigan belong to the order Caudovirales, within which the families Myoviridae, Siphoviridae and Podoviridae dominate (27%, 23% and 9%, respectively, of the total number of known virus sequences) [22]. In viromes of Lakes Erie and Ontario, the double-stranded DNA (dsDNA) phages of the order Caudovirales were also the most numerous; the Myoviridae family predominated (79.7%). Podoviridae (7.9%) and Siphoviridae (4.5%) were the second and third predominant families [20]. Among the remaining viruses, those that infect algae (the Phycodnaviridae family; 4.3%) and insects and other animals (the Iridoviridae family; 2.6%) were the most represented. Notably, viromes studied in Lakes Ontario, Erie and Michigan were sampled from the surface in the coastal area. In our case, we obtained viromes from the pelagic layer of 0-50 m.
Nevertheless, they formed a joint cluster. As mentioned above, a separate branch represents BVP1 (under-ice period) in this cluster. The seasonal taxonomic shift can explain the observed nature of the separation of the Baikal viromes and their position on the dendrogram because viromes from Lakes Michigan, Ontario, Erie, and BVP2 were sampled in the summer, namely June and July.
Two clusters comprised the neighbouring eutrophic clade: viromes from fish ponds and viromes from the eutrophic shallow lakes Lough Neagh (Northern Ireland) and Matoaka (USA).
River viromes, which group with viromes from Lake Limnopolar, represent a separate branch of the freshwater clade. This clustering is likely due to the similar content of the dominant virus families. In Lake Limnopolar, ssDNA viruses of the families Circoviridae, Nanoviridae and Microviridae prevail in the spring period, whereas in the summer period, Phycodnaviridae and bacteriophages of the order Caudovirales prevail. The families Microviridae and Myoviridae dominate in samples from the Amazon River. Viruses from the families Circoviridae, Podoviridae, Phycodnaviridae and Siphoviridae were also numerous.
After a comparative analysis of the results, we concluded that methodological specifics of sample preparation for high-throughput sequencing, read depth and bioinformatics data processing are crucial. Standardisation of virome processing requires platforms with an installed pipeline, such as MG-RAST, to avoid problems with different settings of processing programmes, sequence annotation and evaluation of alpha and beta diversity. The presence of bacterial DNA, as well as ultramicrobacteria (less than 0.2 µm in size) that can pass through a 0.2 µm filter, significantly complicates the analysis of viromes. In this study, we used prefiltration through filters with a pore size of 0.2 µm. The separation of virus particles in a gradient of caesium chloride [35] or chemical flocculation [61] may be possible alternatives. Notably, there is a lack of data on the whole genomes of viruses. The isolation and cultivation of virus strains followed by the characterisation of their genomes is an important component for future studies.

Conclusions
For the first time, we characterised the content of viruses in the pelagic zone of Lake Baikal, the world's oldest lake and its largest freshwater lake by volume. The data showed the presence of a significant number of bacteriophage taxa compared to eukaryotic and archaeal viruses. Members of the order Caudovirales dominated the bacteriophages, with the prevalence of the Myoviridae family; Siphoviridae and Podoviridae were the second and third dominant families, respectively. We assembled a contig with a length of 29,000 bp that belongs to a cyanophage. A comparative analysis of viromes indicated their separation into marine, freshwater and soil clades and allowed us to define the WLL cluster, which includes viral communities of Lake Baikal and the Great Lakes of North America.