Ecology of Subglacial Lake Vostok (Antarctica), Based on Metagenomic/Metatranscriptomic Analyses of Accretion Ice

Lake Vostok is the largest of the nearly 400 subglacial Antarctic lakes and has been continuously buried by glacial ice for 15 million years. Extreme cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier) and dissolved oxygen (delivered by melting meteoric ice), in addition to limited nutrients and complete darkness, combine to produce one of the most extreme environments on Earth. Metagenomic/metatranscriptomic analyses of ice that accreted over a shallow embayment and over the southern main lake basin indicate the presence of thousands of species of organisms (94% Bacteria, 6% Eukarya, and two Archaea). The predominant bacterial sequences were closest to those from species of Firmicutes, Proteobacteria and Actinobacteria, while the predominant eukaryotic sequences were most similar to those from species of ascomycetous and basidiomycetous Fungi. Based on the sequence data, the lake appears to contain a mixture of autotrophs and heterotrophs capable of performing nitrogen fixation, nitrogen cycling, carbon fixation and nutrient recycling. Sequences closest to those of psychrophiles and thermophiles indicate a cold lake with possible hydrothermal activity. Sequences most similar to those from marine and aquatic species suggest the presence of marine and freshwater regions.


Introduction
Nearly 400 subglacial lakes have been discovered in Antarctica, the largest of which is Lake Vostok [1–5]. While Lake Vostok covers an area (15,690 km²) that is about 80% of the size of the Laurentian Great Lake Ontario, it holds a larger volume of water (5,400 km³), due to its depth (maximum depth = 510 m). It lies beneath 3,700 to 4,200 m of ice, and has been continuously ice-covered for the past 15 million years, with the only known influx of water originating from melting of the overriding glacier. Lake Vostok consists of a northern and a southern basin (Figure 1). Little is known about the northern basin, but more is known about the southern basin because of studies based on an ice core that was drilled over the southeastern corner of the lake. While glacial ice melts over portions of the lake, water from the lake freezes (i.e., accretes) to the bottom of the glacier in other parts of the lake, creating an accretion ice layer that is over 200 m thick in some locales [2,6–8]. Because the glacier moves across the lake at a rate of approximately 3 m per year, the accretion ice holds a temporal record that spans approximately 5,000 to 20,000 years [7], as well as a spatial record of the surface waters of the lake. The accretion ice from the core represents several parts of the southern portion of the lake, including a region near a shallow embayment on the southwestern corner of the lake and the southern portions of the southern main basin.
Accretion ice from the ice core has been analyzed to determine the concentrations of specific ions (e.g., Na⁺, K⁺, Ca²⁺, Cl⁻, SO₄²⁻) [9–12] and biomass [13–19] originating from several locations in the lake. The ice that formed in the vicinity of the shallow embayment (3,538–3,608 m, termed type 1 accretion ice) contains fine particulate matter [7], as well as relatively high concentrations of ions [9], organisms and nucleic acids [9,16,17]. Conversely, accretion ice that formed over the southern basin (3,609–3,769 m, termed type 2 ice) contains low concentrations of particulates, ions, organisms and nucleic acids. Dozens of bacterial cells, fungal cells and sequences have been reported from several accretion ice core sections [12–19], with the highest numbers concentrated in the core sections that represent regions of the lake near the shallow embayment. They include many types of organisms that are common to other aquatic environments, as well as many that remain unidentified. Sequences from thermophilic bacteria have been reported, indicative of possible hydrothermal activity in the lake [20–22]. Autotrophic and heterotrophic species have also been reported from the accretion ice [16,17]. These reports indicate that Lake Vostok might be more biologically complex than previously concluded. Another feature of Lake Vostok is that it lies entirely below current mean sea level. Its surface is more than 200 m below sea level, and the deepest point is almost 800 m below sea level. A recent study [23] concluded that Lake Vostok lies within a graben (similar to those in the Great Rift Valley in Africa) that formed more than 60 million years ago. A second study, based on radar data [24], reported that 35 million years ago, when Antarctica was free of ice, the Southern Ocean was in the immediate vicinity of Lake Vostok. However, by 34 million years ago, ice had covered the lake and lowered sea level, which might have isolated it from a direct connection to the ocean.
The fact that parts of Lake Vostok contain moderate levels of salt, and that sequences from marine organisms have been detected in the accretion ice, indicates that this lake might have a complex history. In this study, we utilize metagenomic/metatranscriptomic sequence data ([22]; Supplementary Tables S1–S10) to reconstruct the possible ecology of Lake Vostok.
Figure 1. Source of ice core sections used in this study. (a) Location of Lake Vostok (small rectangle) in Antarctica; (b) Detail of the outline of Lake Vostok, as indicated by radar [1,8,23,24]; (c) Detail of the southern end of Lake Vostok, showing the locations of the shallow embayment, ridge, southern basin, track of glacier to the drill site (dashed line), and approximate locations where the accretion ice samples (V5 and V6) were formed.

Summary of Results
Two samples, each consisting of meltwater from two accretion ice core sections, were analyzed. Sample V5 consisted of meltwater from two ice core sections (3,563 m and 3,585 m; Figure 1) that accreted in the vicinity of the shallow embayment on the southwestern corner of Lake Vostok. Sample V6 consisted of meltwater from two ice core sections (3,606 m and 3,621 m) that accreted on the western side of the southern basin. Sequences have been deposited in the NCBI (National Center for Biotechnology Information) GenBank database (accession numbers are provided in the Experimental Section; BLAST results are presented in Supplementary Tables S1–S10 [22]). For the V5 sample, 36,754,464 bp of sequence data was obtained that included 94,728 high quality 454 sequence reads, with a mean length of 388 bp. For the V6 sample, 1,170,900 bp of sequence data was obtained that included 5,204 high quality reads, with a mean length of 225 bp. The lower quantity of sequence data for V6 is consistent with our previous results from the same core sections, which indicated much lower concentrations of cells and viable cells in the V6 ice core sections compared to the V5 ice core sections [16,17]. However, the lower number of sequences and the shorter average read lengths also might indicate that the nucleic acids in this sample were degraded to a greater extent than those in the V5 sample. Overall, approximately 15% of the sequences were unique, while the remaining 85% were additional copies from the unique set of sequences. A total of 3,718 unique sequences were retrieved from V5 (3,146 were rRNA gene sequences), of which 1,863 could be classified to species (Table 1; Supplementary Tables S1–S6 [22]), and 184 unique gene sequences were retrieved from V6 (111 were rRNA gene sequences), of which 133 could be classified to species (Table 2; Supplementary Tables S7–S10). Approximately 94% of the unique sequences in V5 and 85% in V6 were from Bacteria. Sequences closest to those from species of autotrophs and heterotrophs were present.
Only two unique Archaea sequences were found (both in V5), and they were most similar to species of methanotrophs from deep-ocean sediments. The remaining sequences were closest to those from species of Eukarya (6% in V5 and 15% in V6), including more than 200 unique sequences from multicellular organisms, most of which were Fungi (primarily ascomycetes and basidiomycetes). In general, the species indicated by the sequence comparisons were organisms specific to lakes, brackish water, oceans/seas, soil, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants.
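The read statistics quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only figures stated in this section; the dictionary layout and the function name summarize are illustrative choices, not part of the original analysis.

```python
# Figures quoted in the Summary of Results for the two samples.
samples = {
    "V5": {"total_bp": 36_754_464, "reads": 94_728,
           "unique": 3_718, "rrna": 3_146, "to_species": 1_863},
    "V6": {"total_bp": 1_170_900, "reads": 5_204,
           "unique": 184, "rrna": 111, "to_species": 133},
}


def summarize(stats):
    """Derive per-sample ratios from the reported counts."""
    return {
        # Mean read length = total bases / number of reads.
        "mean_read_bp": stats["total_bp"] / stats["reads"],
        # Share of unique sequences that are rRNA gene sequences.
        "rrna_fraction": stats["rrna"] / stats["unique"],
        # Share of unique sequences classified to species.
        "species_fraction": stats["to_species"] / stats["unique"],
    }


summary = {name: summarize(stats) for name, stats in samples.items()}

for name, s in summary.items():
    print(f"{name}: mean read {s['mean_read_bp']:.0f} bp, "
          f"rRNA {s['rrna_fraction']:.1%}, "
          f"classified to species {s['species_fraction']:.1%}")
```

The computed mean read lengths match the reported 388 bp and 225 bp. The ratios also show that a larger share of the V6 unique sequences was classified to species, although from a much smaller set.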

Extremophiles
A large number of the sequences were most similar to those from psychrophilic and psychrotolerant species (Tables 1 and 2; Supplementary Tables S1–S10). Within the Gammaproteobacteria, there were 33 unique sequences closest to various Psychrobacter species (most with rRNA SSU gene identities of greater than 97%), all described as psychrophiles. Also present were sequences closest to psychrophilic or psychrotolerant species of Actinobacteria, Alphaproteobacteria, Archaea, Archaeplastida, Bacteroidetes, Betaproteobacteria, Firmicutes, Chromalveolata, and Opisthokonta (both Animalia and Fungi). Conversely, there were many sequences that were closest to those from several thermophilic species (many with rRNA SSU gene identities greater than 97%). While most were found in V5 ice (46 with rRNA sequence identities greater than 97% to known taxa), a few were found in V6 ice (2 with rRNA sequence identities greater than 97% to known taxa). A number of sequences most similar to those of sulfur-oxidizing bacteria were found in V5 (Tables 1 and 2). Previously, several gene sequences from a thermophilic bacterium, Hydrogenophilus thermoluteolus, were reported from the Lake Vostok accretion ice [20,21].
Several sequences from marine species, as well as from halophilic and halotolerant species, were present in the metagenome/metatranscriptome data set (Tables 1 and 2). Sequences closest to Jeotgalicoccus halotolerans, Nesterenkonia halotolerans, and other halophilic bacteria were found in V5 accretion ice, several of which have rRNA SSU gene identities greater than 98%. Some of these species were alkalitolerant. A number of sequences in V5 and V6 were most similar to sequences from marine species, including marine bacteria (many with rRNA identities above 97%), a sea squirt-associated bacterium (99% rRNA SSU gene identity to Pseudomonas xanthomarina), an oyster pathogen (100% identity to a hypothetical protein from Perkinsus marinus), a sea anemone (78% identity to a hypothetical protein from Nematostella vectensis, a small sea anemone, related to Hydra spp.) and a marine mollusk (100% rRNA SSU identity to Nutricola tantilla). The presence of marine, halophilic and halotolerant species is suggestive of marine layers, or other regions with high ion concentrations, within the lake or lake sediments. Saltwater layers and submarine brine lakes have been reported at the bottom of the Mediterranean Sea and Gulf of Mexico [25,26]. The number of V5 sample sequences with [...] 49, respectively). The combination of possible halophiles, psychrophiles and thermophiles (at the 97% identity levels; Tables 1 and 2; Figure 2), in addition to higher concentrations of ions and particulate matter in the V5 sample, is suggestive of a diversity of conditions in the southwestern region of the lake. In the V6 samples, there were more sequences closest to psychrophiles (7) than thermophiles (2). This is consistent with the presence of hydrothermal activity in the vicinity of the shallow embayment, and colder conditions in the main basin.

Metabolic Classification
We considered only sequences whose species and functional characteristics had been clearly identified. Genes for glycolysis, the TCA (tricarboxylic acid) cycle and the synthesis of most amino acids were present (Supplementary Figures S1–S5). Incomplete pathways were found for the synthesis of arginine, histidine and proline. Alternate enzymes might be used for these processes, as some sequences could not be identified unambiguously. Because some of the genes for each of these pathways were present, it is assumed that some of the unidentified sequences may encode the missing enzymes in the pathways.
Sequences closest to those from bacteria capable of nitrogen fixation, nitrosification, nitrification, nitrate reduction, denitrification, anammox, assimilation and decomposition were present (Figure 3). Several types of nitrogen-fixing bacteria were indicated by the sequences, including species of Azospirillum, Azotobacter, Bacillus, Burkholderia, Cyanobacteria, Frankia, Klebsiella, Rhizobium, Rhodobacter, Rhodopseudomonas and Sinorhizobium. Sequences of nitrifying bacteria included species of Methylococcus, Nitrobacter, Nitrococcus, Nitrosococcus and Nitrosomonas. Species important in other parts of the nitrogen cycle were within the genera Alcaligenes, Bacillus, Clostridium, Micrococcus, Paracoccus, Proteus, Pseudomonas, Streptomyces and Thiobacter. Several sequences from planctomycetes were closest to sequences from species capable of anammox metabolic processes in marine environments.
Ecological characterization of organisms based on BLAST search results from the V5 and V6 metagenomic/metatranscriptomic sequences. The taxonomic classifications listed are based on the highest identities to sequences from species within the taxa that have been documented to have the functions specified in the boxes. These have been grouped primarily by Phylum for simplicity. Greater than 95% of the taxa are either primary producers (autotrophs), bacteria involved in the nitrogen cycle, or decomposers, including bacteria and fungi, and a few chromalveolates (shaded green box). Primary and secondary consumers comprise less than 5% of the species richness (i.e., number of unique sequences). Organisms listed in black font were determined by sequences that exhibited ≥97% sequence identity with sequences in GenBank (NCBI). Organisms listed in red font exhibited <97% sequence identity or were suggested because the sequences were closest to symbionts, parasites or pathogens of those organisms.
Sequences of genes and organisms involved in many phases of carbon fixation cycles and pathways were found (Figure 3). Three forms of carbon fixation were indicated. The sequences indicated that most of the organisms utilize either the reductive TCA (rTCA) cycle (three classes of Proteobacteria, and Chlorobi) (Tables 1 and 2) or the reductive pentose phosphate (rPP; Calvin-Benson) cycle (Archaeplastida, Chromalveolates, Cyanobacteria, and three classes of Proteobacteria) [27]. Based on the frequency of gene sequences, the most common mode of CO₂ fixation was the rTCA cycle (Tables 1 and 2), while the rPP cycle was the second most common. The two Archaea in V5 may fix carbon via the reductive acetyl-CoA (rACA) pathway [27]. However, mRNA gene sequences for enzymes in this pathway were not found in searches of the metagenome/metatranscriptome data set.
A large number and diversity of sequences from phototrophs were present in the accretion ice, including 228 cyanobacterial, 11 algal, 12 chromalveolate, and other unique sequences. Sequences for many of the genes involved in the light reactions of photosynthesis in cyanobacteria were found in the accretion ice. These included light-independent protochlorophyllide reductase and oxidase, phycocyanobilin oxidoreductase, a phycoerythrin subunit and several genes involved in carotenoid biosynthesis. However, gene function was not measured in this study, and it is possible that some of the gene sequences were from pseudogenes or inactive genes, or that they originated from organisms once entrapped in the meteoric ice.

Eukaryotes
While only about 6% of the unique sequences were closest to eukaryotes (221 from V5, 87 of which have sequence identities of [...]; and 29 from V6, 24 of which have identities of [...]), diverse taxonomic groups were represented. Most of the sequences were most similar to those from Fungi (91 sequences in V5 and 22 in V6), including one rRNA SSU sequence that was 99% similar to a marine fungus sequence that previously had been recovered from a deep-sea thermal vent [28]. Several sequences from species of Animalia were found, including 21 sequences closest to those from arthropods (16 in V5 and 5 in V6), many of which are predatory or parasitic, including sequences closest to species of Daphnia (planktonic crustaceans; 98% identity), Ellipura (springtails, some of which are aquatic or marine), Branchiopoda (fairy shrimp, primarily freshwater; 93% identity) and Entomobryidae (slender springtails, some of which are aquatic; 89–98% identity). Additionally, V5 contained sequences closest to an unidentified bilaterian, a rotifer (closest to Adineta sp., which is a hardy, cosmopolitan, freshwater species; 98% identity), a tardigrade (closest to Milnesium sp., which is a hardy, predatory, cosmopolitan, freshwater species; 93% identity), a mollusk (Nutricola tantilla, a small [maximum diameter of 9 mm] marine bivalve that lives in sediments to about 120 m water depth; 100% identity) and a cnidarian (related to Nematostella sp., a small sea anemone; 78% identity). Several Archaeplastida sequences were found in V5 and V6. Sequences closest to an uncultured lobster gut bacterium (98% identity), Verminephrobacter sp. (an annelid nephridia symbiont; 92% identity), Renibacterium salmoninarum (a salmonid fish pathogen; 98% identity), a rainbow trout intestinal bacterium T1 (93% identity) and Photorhabdus asymbiotica (a nematode symbiont; 98% identity) all were found in the V6 accretion ice sample.
Additionally, sequences closest to Carnobacterium mobile (associated with fish; 100% identity), Macrococcus sp. (a bivalve-associated bacterium; 97% identity), Pseudomonas xanthomarina (associated with sea squirts; 99% identity) and Mycobacterium marinum (associated with fish; 99% identity) were found in the V5 accretion ice sample. All of the species are dependent on intimate associations (symbiotic or parasitic) with their eukaryotic hosts, which are crustaceans, annelids, tunicates, nematodes or fish. Species of all of these animal groups have been found in the vicinity of deep-sea thermal vents and elsewhere in marine and aquatic environments [29–37]. Additional indications of animals in the lake came from sequences of several species in the Enterobacteriaceae, which were present in both the V5 and V6 samples. These included sequences closest to several strains/species of E. coli, Erwinia, Klebsiella, Salmonella, and Shigella (most with sequence identities ≥97%), all of which are found in the digestive systems of fish and other aquatic and marine animals. In addition, sequences closest to those from species of Fusobacteria (some with ≥97% identities) that are parasitic on animals, Alphaproteobacteria (≥97% identities) that are animal symbionts, and Tenericutes that are arthropod symbionts and pathogens (up to 91% sequence identity) were found in the V5 sample.

Possible Marine Environment in Lake Vostok
Organisms in Lake Vostok have had millions of years to adapt and evolve. The lake has been continuously ice-covered for approximately 15 million years [1,4,10,23,24,38]. During parts of the Miocene (15 to 25 million years ago, mya), it was intermittently ice-free, and before that, a cooling period from 25–34 mya left the lake ice-covered much of the time [38]. However, prior to 35 mya (during the Eocene), most of Antarctica (including the Lake Vostok region) was free of ice, sea levels were higher and extensive complex ecosystems were present on the continent, complete with lakes and streams, as well as diverse sets of microbes, plants (including extensive forests), fungi and animals. During these times, Lake Vostok probably contained a species-rich biota. Currently, Lake Vostok is below sea level, and it is separated from regions that once were occupied by ocean water by low ridges that are approximately 20–50 m above current mean sea level [23,24,38]. However, 35 million years ago, sea levels were 50–100 m higher. Therefore, this low area may have been a strait that connected Lake Vostok with the ocean, thus making it a large marine bay (Figure 4). Our metagenomic/metatranscriptomic data set included sequences that are most similar to sequences from marine organisms, including many Bacteria, halophiles, a marine mollusk, a sea anemone, a marine thermal vent fungus and two Archaea that previously were described from deep-ocean sediments. As temperatures in Antarctica began to cool and sea levels dropped about 34 million years ago, Lake Vostok became ice-covered and isolated from the ocean [24]. Aerobic organisms that survive in cold and partially shaded aquatic environments could have survived in the epilimnion. Anaerobes would have been limited to the benthic and sediment regions. Hydrothermal regions may have been present much of the time, because rifting of the region began more than 60 million years ago [23].
As freshwater from the melting glacier flowed into the lake, ion gradients were likely to develop. These gradients are common in other ice-covered Antarctic lakes [39–42], and the differences in ion concentrations in the embayment compared to the main basin are consistent with this hypothesis. Hydrothermal activity in the vicinity of the embayment might cause mixing of any stratified layers in the lake in that region, leading to the higher ion and mineral inclusion concentrations in the accretion ice from that region. Once current continuous ice-cover began approximately 15 million years ago, additional selection would have occurred as light intensities decreased, and oxygen concentrations and water pressures increased. Although extinctions would probably have been extensive, it is unlikely that all life would disappear from the lake and its sediments. Based on the number of unique species matches in BLAST searches of our accretion ice metagenomic/metatranscriptomic data set, we estimate the total number of species to be at least 1,996 in Lake Vostok, of which over 90% (approximately 1,800) are Bacteria. This converts to approximately 4 species mL⁻¹, assuming that the concentration of nucleic acids in the accretion ice is representative of the concentration of nucleic acids and organisms in the lake water, and that the samples are representative of the lake in general. However, as ice forms, it can force out many components, including some molecules, minerals and cells. Therefore, the concentrations of organisms and nucleic acids are probably higher in the lake water than in the accretion ice. The calculated concentration is below the number of species found in ocean water, sediments and most freshwater lakes [43–46]. Because the accretion ice represents primarily water at the surface of the lake, species density and richness in the lake could be much higher in some locales.
In any case, a relatively high number of sequences matching sequences from a diverse set of species were found in Lake Vostok accretion ice. The 15–35 million years of coverage by ice appears to have been ample time for many organisms to adapt to the extreme conditions that developed in the lake, which has likely led to numerous speciation events.
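The species estimate given in this section can be reproduced from numbers reported elsewhere in the text. The sketch below assumes that the total of 1,996 is the sum of the species-level classifications for V5 and V6, and that the per-millilitre figure was obtained by dividing by the combined meltwater volume of the two samples (250 mL each). The text does not spell out either step, so both are interpretations, and the variable names are illustrative.

```python
# Species-level matches per sample, as reported in the Summary of Results.
species_v5 = 1_863
species_v6 = 133

# Meltwater volume per sample in mL, as reported in the Experimental Section.
volume_per_sample_ml = 250

# Assumption: the total estimate is the simple sum over both samples.
total_species = species_v5 + species_v6

# Assumption: density is computed over the combined volume of both samples.
total_volume_ml = 2 * volume_per_sample_ml
species_per_ml = total_species / total_volume_ml

# "Over 90%" is treated as exactly 90% here, giving a lower bound.
bacterial_lower_bound = 0.90 * total_species

print(f"total species: {total_species}")
print(f"species per mL: {species_per_ml:.2f}")
print(f"bacterial species (90%): {bacterial_lower_bound:.0f}")
```

Under these assumptions the sum equals the reported 1,996, and the density rounds to the reported value of approximately 4 species per mL.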

Acquisition and Processing of Ice Core Sections
All ice core sections were selected from the USGS NICL (United States Geological Survey, National Ice Core Laboratory, Denver, CO, USA). They were selected based on desired depths and for the absence of cracks (to avoid possible external contamination). They were shipped frozen to our laboratory. Sections were surface sterilized according to a tested method that assures destruction and removal of all surface contaminating cells and nucleic acids, while preserving cells and nucleic acids frozen in the ice [16,17,47,48]. Briefly, quartered ice core sections, 6–16 cm in length (total volume approximately 125 mL), were warmed at 4 °C for 30 min (to avoid thermal shock and cracking) before surface decontamination. The work surfaces in a room (under positive pressure) separate from the main laboratory were treated with 0.5% sodium hypochlorite, 70% ethanol and UV irradiation for one hour prior to surface sterilizing and melting the ice core sections. Inside a sterile laminar flow hood, the ice core sections were surface decontaminated by total immersion in a 5.25% sodium hypochlorite solution (pre-chilled to 4 °C for at least 2 h) for 10 s, followed by three rinses with 800 mL of sterile water (4 [parts per billion] total organic carbon, TOC, and autoclaved). Then, the core section was transferred into a sterile funnel and melted at room temperature, with collection of 25–50 mL aliquots. This protocol significantly reduces the risk of contamination of the ice core meltwater samples [47,48]. The meltwater was then frozen at −20 °C. Sample V5 included meltwater from Vostok 5G core sections at depths of 3,563 and 3,585 m, corresponding to type 1 ice that accreted in the vicinity of the embayment. Sample V6 included core sections 3,606 and 3,621 m, corresponding to type 2 ice that accreted over a portion of the southern main basin of Lake Vostok (Figure 1). A total of 250 mL of meltwater was used for each sample (approximately 125 mL from each ice core section).
The meltwater samples were filtered sequentially through 1.2, 0.45 and 0.22 µm Durapore filters (Millipore, Billerica, MA, USA). The filters were stored at −80 °C for future reference. Then, the filtered meltwater was subjected to ultracentrifugation at 100,000 × g for 16 h to pellet cells and nucleic acids. Two control samples (pu[...] ppb TOC; and the same water, autoclaved and subjected to concentration by ultracentrifugation) also were processed using the same protocols. The V5, V6, and control samples were ultracentrifuged on different days to lessen potential cross-contamination. Pellets were rehydrated in 50 µL 0.1× TE (1 mM Tris [pH 7.5], 0.1 mM EDTA).

DNA and RNA Extraction
Nucleic acid extraction was performed using MinElute Virus Spin Kits (QIAGEN, Valencia, CA, USA), and the nucleic acids were eluted in 150 µL AVE buffer (water with 0.04% sodium azide). This kit isolates both RNA and DNA. The eluted nucleic acids were further concentrated by precipitating overnight at −20 °C with 0.5 M NaCl in 80% ethanol. They were then pelleted by centrifugation at 16,000 × g for 15 min, washed with cold 80% ethanol and centrifuged at 16,000 × g for 5 min. They were dried under vacuum and then resuspended in 15 µL 0.1× TE.

cDNA Synthesis and Amplification of cDNA and DNA
Complementary DNAs (cDNAs) were synthesized from the extracted RNAs. The procedure was performed using a SuperScript Choice cDNA kit (Invitrogen, Grand Island, NY, USA), according to the manufacturer's instructions, using 10 µL of the extracted RNA and 80 pmol of random hexamer primers. The cDNA was then mixed with 10 µL of extracted DNA (less than 1 ng/µL) from the same meltwater sample, and EcoRI (Not I) adapters (AATTCGCGGCCGCGTCGAC, dsDNA) were added using T4 DNA ligase. The final concentration of components in each reaction for addition of EcoRI adapters was: 66 mM Tris-HCl (pH 7.6), 10 mM MgCl₂, 1 mM ATP, 14 mM DTT, 100 pmol EcoRI (Not I) adapters and 0.5 units of T4 DNA ligase, in 50 µL total volume. The reaction was incubated at 15 °C for 20 h. Then, the reaction was heated to 70 °C for 10 min to inactivate the ligase. [Note: The cDNAs and DNAs were mixed in order to maximize the amount of nucleic acids, necessary for successful pyrosequencing. Thus, the cDNA comprised the metatranscriptomic fraction (of which most was from rRNA) and the DNA comprised the metagenomic fraction of each sample].
The products were size fractionated by column chromatography. Each 2 mL column contained 1 mL of Sephacryl® S-500 HR resin. TEN buffer (10 mM Tris-HCl [pH 7.5], 0.1 mM EDTA, 25 mM NaCl; autoclaved) was utilized to wash the columns and elute the samples through the columns. [...]
They were assembled using MIRA 3.0.5 (Whole Genome Shotgun and EST Sequence Assembler [49]), using the following command line: job=denovo,genome,accurate,454. From the assembly, the average lengths for V5 and V6 were 539 and 318, respectively. The average quality scores were 39.7 and 40.2, respectively; maximum coverages were 3.3 and 6.2, respectively; and average coverages were 2.2 and 3.5, respectively. Initial taxonomic analyses were performed on MG-RAST [50] and Galaxy [51]. Batch Mega-BLAST searches were performed to determine taxonomic and gene identities. The BLAST execution file was set up to retrieve the top 10 similar sequences, with e-value cutoffs of 10⁻¹⁰. The sequences were subjected to batch BLASTN similarity searches on the OSC, and then sorted according to gene, taxon, and similarity e-values using FileMaker (FileMaker, Inc., Santa Clara, CA, USA). The top BLASTN hit was used to determine taxonomic classification (when genus and species names were provided), also considering the lengths and percent similarities of the matches. The sequences were divided into four category files: V5 rRNA genes, V6 rRNA genes, V5 mRNA genes and V6 mRNA genes (full list of sequences and descriptions in reference [22], and presented in Supplementary Tables S1–S10). The rRNA gene results were used primarily to determine taxonomic classifications. Each species was then categorized according to temperature requirements, growth requirements, metabolic functions and ecological niche, based on NCBI descriptions, publications cited in the NCBI descriptions, and internet sources. Some mRNA sequences were used to determine or confirm species identifications, where possible.
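The hit-selection step described above (retain up to 10 hits per query that pass the e-value cutoff, then take the top hit for classification) can be illustrated with a short script. This is a minimal sketch, not the procedure used in the study, where sorting was done in FileMaker. It assumes BLAST results in the common 12-column tab-separated layout (query, subject, percent identity, alignment length, ..., e-value, bit score), and it ranks hits by e-value and then bit score, which is one plausible reading of "top hit". The function names and the three example rows are hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-10  # cutoff stated in the text
MAX_HITS = 10          # number of similar sequences retrieved per query


def parse_hits(lines):
    """Parse 12-column tab-separated BLAST lines (an assumed format)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield {
            "query": fields[0],
            "subject": fields[1],
            "pident": float(fields[2]),
            "length": int(fields[3]),
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        }


def top_hits(lines):
    """Return the best hit per query among hits passing the e-value cutoff."""
    by_query = defaultdict(list)
    for hit in parse_hits(lines):
        if hit["evalue"] <= EVALUE_CUTOFF:
            by_query[hit["query"]].append(hit)
    best = {}
    for query, hits in by_query.items():
        # Lowest e-value first; ties broken by highest bit score.
        hits.sort(key=lambda h: (h["evalue"], -h["bitscore"]))
        kept = hits[:MAX_HITS]
        best[query] = kept[0]
    return best


# Hypothetical example rows, for illustration only.
example = [
    "read1\tsubjA\t98.5\t380\t5\t0\t1\t380\t10\t389\t1e-150\t650",
    "read1\tsubjB\t91.0\t300\t27\t0\t1\t300\t5\t304\t1e-80\t400",
    "read2\tsubjC\t85.0\t60\t9\t0\t1\t60\t1\t60\t1e-5\t50",
]
best = top_hits(example)
for query, hit in best.items():
    print(query, hit["subject"], hit["pident"], hit["length"], hit["evalue"])
```

In this example, read2 is dropped because its only hit does not pass the cutoff. The percent identity and alignment length are kept with each top hit because the text notes that both were considered when assigning a taxonomic classification.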

Metabolic Analysis
Sequences were uploaded onto the KAAS-KEGG (KEGG Automatic Annotation Server; KEGG, Kyoto Encyclopedia of Genes and Genomes, Kyoto, Japan) pathway website and compared by BLAST against the default set of bacterial and eukaryotic genes [52]. The sequences were compared to known sequences from 40 taxa (23 provided on the KAAS-KEGG site, and 17 additional taxa added manually). The additional taxa were: Cryptococcus neoformans JEC21; Thalassiosira pseudonana; Dictyostelium discoideum; Burkholderia mallei ATCC 23344; Campylobacter jejuni NCTC11168; Desulfovibrio vulgaris DP4; Caulobacter crescentus CB15; Micrococcus luteus; Acidobacterium capsulatum; Flavobacterium johnsoniae; Fibrobacter succinogenes; Fusobacterium nucleatum; Opitutus terrae; Gemmatimonas aurantiaca; Rhodopirellula baltica; Chlorobium limicola; Chloroflexus aurantiacus. Based on the results from the KAAS-KEGG analysis, metabolic pathways that were present in our dataset were identified. Tables of enzymes that matched our sequences from each pathway were retrieved. Subsets of these are presented in Supplementary Figures S1–S5.

Conclusions
Lake Vostok accretion ice contains nucleic acids from a diversity of species. While many were closest to those from psychrophilic species, a large number of sequences were closest to those from thermophilic species. Sequences from both anaerobes and aerobes were represented, as well as halophiles, aquatic and marine species. The list of taxa included approximately 94% Bacteria and 6% Eukarya, including over 100 species of multicellular Eukarya. While most were fungi (primarily ascomycetes and basidiomycetes), a number of animals also were indicated, including a rotifer, tardigrade, nematode, bivalves, sea anemone, crustaceans, and possibly fish (as suggested by the presence of sequences that were most similar to those from bacterial symbionts and pathogens of fish species). The species indicated by the sequences include those that participate in many parts of the nitrogen cycle, as well as those that fix, utilize and recycle carbon. Because of the higher concentrations of nucleic acids and viable organisms in accretion ice compared to the overriding meteoric ice [16,17], it is likely that the organisms and nucleic acids in the accretion ice originated in the lake water. The indications of large numbers of marine, halophilic and halotolerant organisms suggest that marine layers, or other saline regions, might exist in the lake. They may have originated millions of years ago at a time when the lake might have been physically connected with the surrounding ocean. Therefore, Lake Vostok might contain a complex interdependent set of organisms, zones and habitats that have developed over the tens of millions of years of its existence.