Genomic Insights of Dyadobacter tibetensis Y620-1 Isolated from Ice Core Reveal Genomic Features for Succession in Glacier Environment

Glaciers have been recognized as biomes, dominated by microbial life. Many novel species have been isolated from glacier ecosystems, and their physiological features are well characterized. However, genomic features of bacteria isolated from the deep ice core are poorly understood. In this study, we performed a comparative genomic analysis to uncover the genomic features of strain Dyadobacter tibetensis Y620-1 isolated from a 59 m depth of the ice core drilled from a Tibetan Plateau glacier. Strain D. tibetensis Y620-1 had the smallest genome among the 12 cultured Dyadobacter strains, relatively low GC content, and was placed at the root position of the phylogenomic tree. The gene family based on a nonmetric multidimensional scaling (NMDS) plot revealed a clear separation of strain D. tibetensis Y620-1 from the reference strains. The genome of the deep ice core isolated strain contained the highest percentage of new genes. The definitive difference is that all genes required for the serine-glyoxylate cycle in one-carbon metabolism were only found in strain D. tibetensis Y620-1, but not in any of the reference strains. The placement of strain D. tibetensis Y620-1 in the root of the phylogenomic tree suggests that these new genes and functions are of ancient origin. All of these genomic features may contribute to the survival of D. tibetensis Y620-1 in the glacier.


Introduction
Glaciers and ice sheets comprise approximately 70% of the total freshwater on Earth [1]. Although they are the largest freshwater reservoirs on Earth, only recently have those systems been recognized as biomes dominated by microorganisms [1][2][3]. Microbe-mediated biogeochemical cycles on glaciers have both local and global impacts [2,4]. Thus, it is important to understand the physiology and genomic features of these microorganisms.
In spite of the fact that the glacial environment is too hostile for the proliferation and survival of advanced organisms, the snow and ice ecosystems are not so extreme for microorganisms [5,6], and viable microorganisms have been found in ice cores drilled from polar and Tibetan Plateau glaciers [7,8]. Interconnected liquid veins along three-grain boundaries in ice were proposed as a habitat in which psychrophilic bacteria can move and obtain energy and carbon from the solution in the liquid veins [8]. Recently, many novel species have been described from glaciers in the Alps [9][10][11], Tibetan Plateau [12][13][14][15][16], Antarctic [17,18], and Arctic [19][20][21], suggesting that the cold origin of endemic species [22]. To survive in cold environments, psychrophilic bacteria possess special adaptation strategies in terms of both physiology and molecular basis [23][24][25]. The physiological features (e.g., growth temperature, salinity, pH; composition of fatty acids, menaquinone; enzyme activities and assimilation of general carbon sources) have been well described. However, the genomic features of these glacier isolated type strains are poorly characterized.
In the present study, the genome of a type strain Dyadobacter tibetensis Y620-1 isolated from a 59 m depth of the ice core drilled from Yuzhufeng Glacier on the Tibetan Plateau was compared to the genomes of 12 Dyadobacter cultured isolates, and one metagenome assembled genome. The genus Dyadobacter was first proposed by Chelius and Triplett [26], within the phylum Bacteroidetes, class Sphingobacteria. Bacterial members of this genus are gram-negative rods that have been isolated from diverse habitats, i.e., plant, soil, freshwater, seawater, glacier, subterranean sediment, and desert sand [27]. Our aim was to investigate the genomic features of the deep ice core isolated strain D. tibetensis Y620-1 and identify the potential strain specific metabolism pathways that facilitate its survival in the glacial environments.

Materials and Methods
Ice core samples were drilled from the Yuzhufeng Glacier on the Tibetan Plateau of China (94 • 14.77 E, 35 • 39.64 N) in 2009. The type strain Y620-1 was isolated from a 59 m depth of the ice and has been proposed as a novel species named as Dyadobacter tibetensis Y620-1 [28].
The genome of strain D. tibetensis Y620-1 was sequenced in 2012 and described by Liu et al. (2014). The reference genomes were downloaded from the NCBI database in March 2018 ( Table 1). The completeness of genomes was estimated using CheckM [29], genomes with a completeness of less than 95% and contamination over 5% were removed. AAI and ANI (Average nucleotide and amino acid identity) were calculated using compareM: https://github.com/dparks1134/CompareM [30] and ANI calculator http://enve-omics.ce.gatech.edu/ani/ [31], respectively. To remove potential differences introduced through different annotation methods, all the genomes analyzed were annotated simultaneously in the present study with RAST (Rapid Annotation using Subsystem Technology) [32] and PROKKA [33].
For phylogenomic clustering, Runella limosa DSM 17973 and Rudanella lutea DSM 19387 were chosen as the out-group. The two strains are close relatives to the Dyadobacter genus [34] and are placed right at the lineage outside Dyadobacter. In general, out-groups that are closely related to the in-group species are better suited for phylogeny reconstruction than the distantly related ones [35]. The Maximum Likelihood phylogenomic tree was constructed using PhyloPhlAn2 [36]. Neighbor-Joining and Bayesian trees were constructed using MEGA 5.05 and Mrbayes 3.2, respectively, with the concatenated protein sequences produced by PhyloPhlAn2 [37][38][39].

General Features of the Dyadobacter Genomes
Dyadobacter strains with high quality non-redundant genomes were isolated from a wide range of habitats, e.g., soil, desert sand, fresh water, plant, and bioreactor ( Table 1). The size of the Dyadobacter genomes ranged from 5.18 to 8.74 Mbp. Out of the 13 Dyadobacter genomes, 12 were cultured with completeness >99.69 %, and one (Dyadobacter sp. UBA7685) was assembled from the metagenome of a water sample with completeness of 97.02%. The genome size of the strain D. tibetensis Y620-1 (5.31 Mb) was the smallest among the 12 cultured Dyadobacter strains. The genomic GC content (guanine-cytosine content) of the 13 Dyadobacter genomes ranged from 41.26% to 52.08%. Most of the strains that were able to grow at ≤5°C have considerably lower GC contents (≤47.00 %) than those with a minimum growth temperature ≥10°C (GC content ≥50.23 %) [16,26,[42][43][44][45][46][47][48]. The genomic GC content of strain D. tibetensis Y620-1 was 43.45%, which was lower than all the Dyadobacter genomes, except for the strain D. koreensis DSM 19938 (41.26 %). The CRISPRs (Clustered regularly interspaced short palindromic repeats) were only identified from strain D. tibetensis Y620-1 and Dyadobacter sp. 50-39 with 6 and 5 copies, respectively. Seven strains were predicted to contain a full rRNA operon. The copy number of 16S rRNA varied widely (from 1 to 4 copies) in the genomes of Dyadobacter. For example, the genome of strain D. tibetensis Y620-1 contained one 16S rRNA gene, while strain D. fermentans DSM 18053 had four copies of 16S rRNA genes. Harboring a lower copy number of rRNA operon suggested strain D. tibetensis Y620-1 being having an oligotrophic lifestyle [49]. The copy number of tRNA ranged from 30 to 43 in the Dyadobacter genomes. The 13 genomes contained 3 to 7 copies of cold shock genes. Strain D. tibetensis Y620-1 contained the largest number of cold shock genes among the 13 genomes with 5 CspA and 2 CspG genes been identified. Other components of csp family (CspB, CspC, CspD, CspE, CspF and CspI) were not contained by any of the 13 genomes.

Distribution of Dyadobacter Strains in Their Phylogenomic Tree
To infer the ancestral state, a robust phylogenomic tree is needed to describe the evolutionary relationship of the taxa. We obtained a robust evolutionary position of the 13 Dyadobacter strains using three different phylogenomic approaches (ML (Maximum Likelihood) and NJ (Neighbor Joining), and Bayesian, Figure 1). Strains isolated from different environments were mixed in the phylogenomic tree. Strain D. koreensis DSM 19938 and Dyadobacter sp. UBA7685 isolated from fresh water were located in the deep lineage with strains isolated from the soil and bioreactor. The plant associated strain Dyadobacter sp. Leaf189 was placed in the middle lineage with strains isolated from the soil and desert sand. Strain D. tibetensis Y620-1 was isolated from the 59 m depth of an ice core, with the smallest genome placed in the basal position of the phylogenomic tree.

Average Nucleotide and Amino Acid Identity
We calculated the pairwise AAI and ANI of the Dyadobacter with the two out-group strains. The inter-genus AAI and ANI were not higher than 69.70% and 60.91%, respectively. The intra-genus ANI and AAI ranged from 70.48% to 99.33% and 67.92% to 99.29%, respectively ( Figure 2). The highest pairwise AAI values observed was 99.33% between strain Dyadobacter sp. UBA7685 assembled from metagenome and Dyadobacter sp. 50-39 isolated from a bioreactor, suggesting that all investigated genomes represented non-redundant genomes based on the proposed threshold of 99.5% AAI suggested by Parks et al. (2017). The rest of the pairwise ANI were all lower than 95%, suggesting that these are different species [50]. Thus, the 13 Dyadobacter genomes could represent 12 distinct species (Dyadobacter sp. UBA7685 and Dyadobacter sp. 50-39 could be the same species). Genome clustering based on AAI and ANI matrix was consistent with their phylogenomic positions, for example, Dyadobacter sp. UBA7685 and Dyadobacter sp. 50-39 with the highest ANI were placed together (Figures 1 and 2).

Distribution Pattern of Function Genes and Gene Families
We annotated the Dyadobacter genomes on the RAST server. The functional genes were classified into four hierarchy levels: category, subcategory, subsystem and role. There were 26 categories, 99 subcategories, 372 subsystems and 1498 roles identified, and no substantial differences were observed among the 13 genomes (Table S1). The distribution of genes in the 26 categories did not differ significantly between strain D. tibetensis Y620-1 and the reference strains (chi-square test, P > 0.05). At the subcategory level, nine genes related to inorganic sulfur assimilation were specific to strain D. koreensis DSM 19938 and were not identified in other genomes. There were 16 strain specific subsystems in the 13 Dyadobacter genomes. Most interestingly, twenty-five genes related to serine-glyoxylate cycle were specific to strain D. tibetensis Y620-1, and seventeen genes related to L-fucose utilization were specific to strain Dyadobacter sp. SG02. There were 134 specific roles distributed in 12 Dyadobacter genomes except Dyadobacter sp. UBA7685. Strain D. koreensis DSM 19938 and D. tibetensis Y620-1 were very divergent with 48 and 33 strain specific roles, and the rest had no more than 20.
We constructed a gene family matrix of the 13 Dyadobacter genomes. Genes in these genomes were clustered into 10,898 families, alternatively pan genomes. The Core genome of the 13 Dyadobacter genomes comprised 1382 gene families (Table S2). The number of gene families (10,898) was much higher than the function type of the genes (1498 types, defined by RAST), suggesting a high sequence diversity of genes with the same function in the Dyadobacter. Ordination of functional genes using two-dimensional non-metric multidimensional scaling (NMDS) revealed a clear separation of strain D. tibetensis Y620-1 and D. psychrophilus DSM 22270 (Figure 3). Strain D. psychrophilus DSM 22270 was isolated from hydrocarbon contaminated soil, and it is a psychrophilic bacterium [44]. The genus Dyadobacter showed a conserved range of the coding density around 1.2 genes per 1 kb sequences (adjusted to 0.12 genes per 100 bp sequence, Figure 4), slightly higher than the average coding density of prokaryotic species (one gene per 1 kb of sequence) [51]. Protein coding sequences (CDS) that cannot be assigned to any known function or gene family may represent new genes [52]. We analyzed the density of new genes (genes of function unknown) in the Dyadobacter genomes. The results showed that the density of new genes vary greatly, ranging from 17% to 34% in Dyadobacter genomes (22% in average) (Figure 4). Strain D. tibetensis Y620-1 has the highest density of new genes of 34%, more than ten percent higher than that of the other isolates (the metagenome assembled genome was not included for its relative low completeness), and was twice that of strain D. soli DSM 25329 isolated from farm soil near Daejeon, South Korea [43]. It is worth noting that 771 genes with a known function present in the genome of other Dyadobacter species are missing in D. tibetensis. These genes are most related to Cofactors/Vitamins/Prosthetic Groups/Pigments (110 genes, 14%), Amino Acids and Derivatives (95 genes, 12%), Carbohydrates (95 genes, 12%), Protein Metabolism (67 genes, 9%) and RNA Metabolism (55 genes, 7%) ( Figure 5).

Specific Functions in One-Carbon Metabolism of D. tibetensis Y620-1
We analyzed the strain specific function of D. tibetensis Y620-1 and 30 genes assigned in one-carbon metabolism were detected. This was substantially higher than the other strains, which typically only contained 5-7. These genes were further divided into two subsystems: one-carbon metabolism by tetrahydropterines and serine-glyoxylate cycle ( Figure 6). In the 12 reference strains, all the one-carbon metabolism related genes belonged to the subsystem tetrahydropterines. Genes related to tetrahydropterines were also present in strain D. tibetensis Y620-1. However, 25 genes in the subsystem serine-glyoxylate cycle that were presented in strain D. tibetensis Y620-1 were absent from the 12 reference strains ( Figure 6).

Discussion
The 13 Dyadobacter genomes showed high genetic diversity in genome size, GC content, rRNA operon copy number and the number of cold shock protein genes. These features may enable them to colonize in diverse habitats such as plants, soils, freshwater, seawater, subterranean sediment sample and desert sand [27]. However, the strain D. tibetensis Y620-1 isolated from a deep ice core of Yuzhufeng glacier is located in the basal position of the Dyadobacter phylogenomic tree and separated from other freshwater isolated strains and the psychrophilic strain D. psychrophilus DSM 22270. Thus, strain D. tibetensis Y620-1 may represent a highly glacier-adapted species. NMDS analysis of gene families reveals a clear separation of D. tibetensis Y620-1 from the mesophilic strains, suggesting glacial environment adaptation. The well-characterized psychrophilic strain D. psychrophilus DSM 22270 is also clearly separated from the mesophilic strains. However, the strain D. tibetensis Y620-1 and D. psychrophilus DSM 22270 are located far away from each other (the strain D. tibetensis Y620-1 in the bottom-left of the plot while the strain D. psychrophilus DSM 22270 in the upper-right of the plot). The separation of the two cold adapted strains in the NMDS plot may reveal different functions of the two strains. Strain Y620-1 was isolated from a glacier ice core, where the primary productivity is much lower than that of soil. Thus, the ability in carbon metabolism may differ between these two psychrophiles.
A limited difference was detected at the category and subcategory levels. However, at the subsystem level, presence and absence of genes related to one-carbon metabolism could clearly differentiate D. tibetensis Y620-1 from the other 12 reference strains. Genes related to serine-glyoxylate cycle present in D. tibetensis Y620-1 are not identified from any of the 12 reference strains. One-carbon compounds can be generated from various renewable sources, such as digestion of organic matter [53]. The serine-glyoxylate cycle is unique since it is the only naturally evolved oxygen-insensitive pathway that can synthesize acetyl-CoA (the two-carbon building block) from multiple groups of one-carbon compounds without carbon loss [54]. In the oligotrophic glacial environment, one of the survival challenges is to obtain metabolic substrates [55]. One-carbon compounds may support microbial communities in the cold and oligotrophic environment [56]. The presence of genes relate to serine-glyoxylate cycle may enable the strain D. tibetensis Y620-1 to utilize simply formed and newly produced carbon sources, e.g., decomposed microbial residues entrapped in the glacier and labile organic carbon freshly derived from photosynthetic bacteria [57][58][59][60]. The carbon and energy sources in the veins of the ice core were estimated to be able to maintain the bacterial population for thousands of years [8]. Oligotrophic lifestyle could also be revealed by the lower copy number of rRNA operon in D. tibetensis Y620-1 [49]. All genes required for serine-glyoxylate cycle [54] are found and are specific to the glacier isolated strain D. tibetensis Y620-1, suggesting the utilization of one-carbon may be one of the strategies for adaptation to the oligotrophic condition in the glacier environments.
Low-temperature habitats are hot spots of microbial diversity and evolution. These environments may harbor microorganisms that process novel metabolic functions [61]. Our results showed that D. tibetensis Y620-1 had the highest density of novel genes compared with other genomes. The basal placement of D. tibetensis Y620-1 in the phylogenomic tree suggests that these new genes and functions could be ancient origin. This is supported by the view that distribution of bacteria may not result from widespread contemporary dispersal but is an ancient evolutionary legacy, as revealed by the evolutional analysis of cold desert cyanobacteria and thermal traits of Streptomyces sister-taxa [62,63].