Genomics of Methylotrophy in Gram-Positive Methylamine-Utilizing Bacteria

Gram-positive methylotrophic bacteria have been known for a long time, and some serve as model organisms for characterizing the details of methylotrophy pathways and enzymes within this group. However, genome-based knowledge of methylotrophy within this group has so far been limited to a single species, Bacillus methanolicus (Firmicutes). The paucity of whole-genome data for Gram-positive methylotrophs limits our global understanding of methylotrophy within this group, including their roles in specific biogeochemical cycles, as well as their biotechnological potential. Here, we describe the isolation of seven novel strains of Gram-positive methylotrophs that include two strains of Bacillus and five representatives of Actinobacteria classified within two genera, Arthrobacter and Mycobacterium. We report whole-genome sequences for these isolates and present comparative analysis of the methylotrophy functional modules within these genomes. The genomic sequences of these seven novel organisms, all capable of growth on methylated amines, present an important reference dataset for understanding the genomic basis of methylotrophy in Gram-positive methylotrophic bacteria. This study is a major contribution to the field of methylotrophy, aimed at closing the gap in the genomic knowledge of methylotrophy within this diverse group of bacteria.


Introduction
Methylotrophy, the ability to utilize substrates containing no carbon-carbon bonds (C1 substrates), is widespread in the domain of Bacteria, with representatives found in Proteobacteria, Firmicutes, Actinobacteria, Verrucomicrobia and the NC10 candidate phylum [1,2]. At the genomic level, methylotrophy has been characterized in detail in Proteobacteria, with the first genome published in 2004 [3] and the count of currently available genomes nearing one hundred [2]. While methylotroph representatives within Verrucomicrobia and the NC10 candidate phylum have only been characterized recently, representative genome sequences are already available [4][5][6].
The goal of this study was to expand the whole-genome knowledge of methylotrophy in Gram-positive bacteria through comparative genomics of divergent taxa within this group. Here, we describe the isolation of seven novel strains of Gram-positive methylotrophs, including two strains of Bacillus (Bacillaceae) and five representatives of Actinobacteria, belonging to the genera Arthrobacter (Micrococcaceae) and Mycobacterium (Mycobacteriaceae). We report whole-genome sequences for these isolates and present a comparative analysis of methylotrophy functional modules within these genomes.

Permissions
No specific permissions were required for the sampling location (47° 38.075′ N, 122° 15.993′ W) or sampling activities. The field studies did not involve endangered or protected species.

Strain Isolation and Cultivation
Arthrobacter sp. MA-N2 was isolated from a 2004 Lake Washington sample [19], and the rest of the strains were isolated from a 2011 Lake Washington sample ([20]; Table 1). All strains were isolated from methylamine enrichments, which were set up as previously described [20,21]. Axenic cultures were routinely maintained on solid media supplemented with methylamine. For long-term storage, culture stocks were frozen at −80 °C with 10% dimethyl sulfoxide as a cryoprotective agent.

DNA Isolation, Whole Genome Sequencing, Assembly and Genome Annotation
Biomass for genomic DNA isolation was collected from plates. DNA was isolated as previously described [22]. The draft genomes were generated at the Department of Energy Joint Genome Institute (JGI) using Illumina, Pacific Biosciences (PacBio) or a combination of the two technologies (see Table 1). All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. The raw reads were assembled using HGAP (version: 2.0.0) [23] for PacBio datasets, Allpaths, Version r41554, for hybrid Illumina/PacBio datasets, and a combination of Allpaths, Version r41554 [24], and Velvet, Version 1.1.05 [25], for the Illumina datasets. Genes were identified using Prodigal [26], followed by a round of manual curation using GenePRIMP [27] for the draft genomes in fewer than 10 scaffolds. The predicted coding sequences were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, Universal Protein Resource (UniProt), TIGRFam, Pfam, Kyoto Encyclopedia of Genes and Genomes (KEGG), COG, and InterPro databases. The tRNAScanSE tool [28] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [29]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and the RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERence of RNA Alignment (INFERNAL) [30]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform developed by the Joint Genome Institute, Walnut Creek, CA, USA [31].

Phylogenetic Analysis
Average amino acid identity (AAI) values were computed via reciprocal BLAST best hits between pairs of proteomes in accordance with previously described methods [32], except that the predicted protein products were used directly as the subject for alignments rather than translated genomic sequences [20,21].
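The reciprocal-best-hit step can be illustrated with a minimal Python sketch. It assumes the BLAST results have already been reduced to one best hit per query protein; the `HitTable` layout, the function names and the toy identities are hypothetical, and averaging the two search directions is an assumption of this sketch rather than a detail taken from [32].

```python
from typing import Dict, List, Tuple

# Hypothetical layout: query protein ID -> (best subject ID, percent identity).
HitTable = Dict[str, Tuple[str, float]]


def reciprocal_best_hits(a_vs_b: HitTable, b_vs_a: HitTable) -> List[Tuple[str, str, float]]:
    """Return (a_id, b_id, identity) for pairs that are each other's best hit."""
    pairs = []
    for a_id, (b_id, ident_ab) in a_vs_b.items():
        back = b_vs_a.get(b_id)
        if back is not None and back[0] == a_id:
            # Assumption: average the identity reported in the two directions.
            pairs.append((a_id, b_id, (ident_ab + back[1]) / 2.0))
    return pairs


def average_amino_acid_identity(a_vs_b: HitTable, b_vs_a: HitTable) -> float:
    """Mean percent identity over all reciprocal best-hit protein pairs."""
    pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
    if not pairs:
        raise ValueError("no reciprocal best hits between the two proteomes")
    return sum(identity for _, _, identity in pairs) / len(pairs)


# Toy proteomes (illustrative values only, not data from this study).
a_vs_b = {"a1": ("b1", 90.0), "a2": ("b2", 60.0), "a3": ("b1", 40.0)}
b_vs_a = {"b1": ("a1", 90.0), "b2": ("a2", 62.0), "b3": ("a3", 35.0)}
aai = average_amino_acid_identity(a_vs_b, b_vs_a)
```

In the toy data, a3 hits b1 but b1 hits a1 back, so that pair is not reciprocal and does not contribute to the average. Any identity or alignment-coverage cutoffs applied in practice are not shown here.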

Reconstruction of Methylotrophy Pathways
Automated gene annotations created using the IMG pipeline were curated manually for genes involved in key metabolic pathways. Reconstruction of methylotrophy pathways was modeled after prior analysis of the genomes of B. methanolicus [13][14][15]. Proteins with experimentally proven functions in methylated amine metabolism were also employed [10,18]. In the case of multiple functional counterparts, these were categorized into "types" based on reciprocal BLAST comparisons. In the cases of very divergent counterparts, some of the "type" categories are very tentative, as low homology may result either from a long evolutionary history or from a lateral transfer from a distant source.
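One way to picture the grouping of functional counterparts into "types" is single-linkage clustering on pairwise identity, sketched below in Python. The 60% threshold, the protein labels and the identity values are all hypothetical; the curation described here was manual, so this is an illustration of the idea and not the procedure actually used.

```python
from typing import Dict, List, Set, Tuple


def group_into_types(
    proteins: List[str],
    identities: Dict[Tuple[str, str], float],
    threshold: float = 60.0,  # hypothetical cutoff, not taken from the study
) -> List[Set[str]]:
    """Single-linkage grouping: proteins joined by any pair at or above threshold."""
    parent = {p: p for p in proteins}

    def find(p: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), identity in identities.items():
        if identity >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy example with made-up labels and identities.
proteins = ["A_1", "B_1", "A_2", "C_2", "B_3"]
identities = {
    ("A_1", "B_1"): 90.0,
    ("A_2", "C_2"): 73.0,
    ("B_3", "A_1"): 55.0,
    ("B_3", "A_2"): 52.0,
}
types = group_into_types(proteins, identities)
```

With these toy values the sketch yields three groups, with B_3 left on its own. Groups formed near the threshold are exactly the tentative cases noted above, since a fixed cutoff cannot distinguish deep divergence from acquisition from a distant source.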

Gram-Positive Methylotrophs Isolated from Lake Washington Are All Facultative Methylated Amine Utilizers
As part of a large-scale methylotroph isolation project from Lake Washington sediment samples, a number of Gram-positive strains were isolated. Three of the isolates described here were identified as Arthrobacter, two as Bacillus and two as Mycobacterium species (Table 1). These were selected for genomic sequencing out of multiple isolates, and each isolate in the same phylogenetic affiliation originated from a different enrichment microcosm. The Arthrobacter and the Bacillus strains were only isolated from the enrichments set up at 10 °C or at room temperature, while the Mycobacterium strains were only isolated from enrichments set up at 30 °C. The Arthrobacter species were represented by two distinct phylotypes, forming either bright yellow (strain 31Y) or whitish colonies (strains 35W and MA-N2). The two Bacillus isolates and the two Mycobacterium isolates, respectively, were similar phenotypically. All isolates were facultative methylotrophs, able to grow on rich media, such as nutrient broth agar (Difco), and on multiple multicarbon substrates, such as succinate or glucose, as previously reported for these types [1]. However, none was able to grow on methanol.

The Newly-Sequenced Genomes Represent a Diversity of Gram-Positive Methylotrophs
Based on the 16S rRNA gene sequence, the closest named relative of all three Arthrobacter isolates was Arthrobacter aurescens (97.4% to 99.8% 16S rRNA gene sequence identity). However, they were only distantly related to each other (96.7% to 97.5% 16S rRNA gene sequence identity). Both white isolates were somewhat more closely related to the yellow isolate than to each other.
The two Mycobacterium isolates were closely related to each other (99.4% 16S rRNA gene sequence identity) and to several named species of Mycobacterium (M. rhodesiae, M. gilvum, M. smegmatis; 97.6% to 97.8%).
The two Bacillus isolates were closely related to each other (100% 16S rRNA gene sequence identity), but not to the described Bacillus species, including the well-studied B. methanolicus species strains, MGA3 and PB1, with which they only showed 93.0% and 93.1% 16S rRNA gene identity, respectively.
We further assessed the genomic diversity of the isolates in terms of genome-genome similarity, via calculating average amino acid identity indices (AAI) [16,17]. As expected, the two Mycobacterium genomes and the two Bacillus genomes were found to be very similar in terms of protein-protein identity (95% and 98% AAI, respectively; Figure 1). However, the proteins translated from the Bacillus genomes shared only 61.2% AAI with the proteome of B. methanolicus. The Bacillus (low GC Gram-positive organisms) proteomes shared only 39% AAI with the Arthrobacter or the Mycobacterium (high GC Gram-positive organisms) proteomes. Within the high GC Gram-positive group, significant divergence was uncovered, with the three Arthrobacter species only sharing 68% to 78% AAI and the Mycobacterium genomes sharing only 50% AAI with the Arthrobacter genomes (Figure 1).

Figure 1.
Phylogenetic relationships among the strains described. Each dot represents a comparison between two genomes and shows their 16S rRNA gene identity (y-axis) plotted against the average amino acid identity (AAI) of the genes shared between the two genomes (x-axis).

Methylotrophy Pathways Deduced from the Novel Genomes Agree with Prior Knowledge, but Suggest Novel Primary Oxidation Modules
Methylotrophy pathways so far have been only thoroughly analyzed at the systems level in one representative of Gram-positive methylotrophs, B. methanolicus. The methylotrophic metabolic scheme in this bacterium [13][14][15], as well as the individual methylotrophy genes previously identified via functional studies in a variety of Gram-positive methylotrophs [9,10,12,18] served as guides in reconstructing methylotrophy metabolic modules in the new genomes.

Primary C1 Oxidation Modules
None of the organisms included in this study were able to grow on methanol. Thus, not surprisingly, no genes for methanol oxidation were identified in any of the genomes, by using protein queries for the NAD-dependent methanol dehydrogenase from B. methanolicus [13,15] or the N,N-dimethyl-4-nitrosoaniline oxidoreductase from Mycobacterium sp. [12].
Queries for methylated amine oxidation modules produced the following results. Genes for trimethylamine monooxygenase (Tmo) [18] were identifiable in the genomes of Arthrobacter (two copies in strain MA-N2) and Mycobacterium strains, but not in the Bacillus genomes (Table 2). The Arthrobacter counterparts shared approximately 90% amino acid identity (AI), while Arthrobacter/Mycobacterium AI values were approximately 87%, which is significantly above the AAI values (Figure 1). The identities with proteobacterial counterparts, including the characterized proteins from Alphaproteobacteria [18], were approximately 60%. Likely, the same proteins are responsible for the dimethylamine monooxygenase activity, as no additional candidates for this activity were identifiable.
Remarkably, only small fragments of the mao gene (encoding methylamine oxidase, Mao) are identifiable in the genomes of the Mycobacterium strains, as parts of conserved genome islands containing other methylotrophy genes (Figure 2). As the sizes of the truncated genes are very similar (129 and 123 nucleotides, respectively), it seems likely that these genes are remnants of a deletion event. The scenario in which two independent assemblies resulted in these truncated genes is less likely. If no functional Mao is encoded in the Mycobacterium genomes, how then is methylamine oxidized by these species? We hypothesize that it is oxidized via the N-methylglutamate pathway, which is widespread among diverse microbial taxa [34][35][36]. In the vicinity of the truncated mao genes, in each genome, we identified a cluster of genes similar to the genes encoding gamma-glutamylmethylamide synthetase (Gma) and the three subunits of N-methylglutamate synthase (MgsA-C), well-characterized in Proteobacteria [34,35]. No such gene clusters are present in other genomes included in this study. Gene clusters potentially responsible for the N-methylglutamate dehydrogenase function (MgdA-D; Table 2), also essential for the N-methylglutamate pathway, were also identified in the Mycobacterium genomes. mgdA-D gene homologs at approximately 50% AI were identified in Arthrobacter strains 31Y and MA-N2, but not in the remaining genomes (Table 2).
The product of trimethylamine oxidation by the monooxygenase is trimethylamine N-oxide, which is cleaved into dimethylamine and formaldehyde by trimethylamine N-oxide demethylase (Tmd) [1,18]. While this enzyme has been partially purified from a number of sources, including methylotrophic bacteria [19,37], the sequence of the protein responsible for this activity remained unknown. Recently, it has been proposed that an aminomethyltransferase carries out this function in Alphaproteobacteria, in whose genomes the respective gene is co-located with the gene for Tmo [38]. In the genomes characterized here, we were unable to identify such a gene in the vicinity of the Tmo genes. Instead, we observed the presence of a conspicuous gene encoding a protein predicted to be a ferredoxin and a flavodoxin, possessing an NAD(P)-binding motif. These predicted properties match the properties of the characterized Tmd enzymes [1,19,37], suggesting this gene as a candidate for encoding the Tmd. Clearly, this proposed function requires experimental verification. Other conspicuous genes were present in the vicinity of putative methylated amine oxidation genes, annotated as ethanolamine utilization genes EutQ, whose function so far remains unknown. These encode small proteins of the cupin family. In each case, almost identical copies of EutQ are found near Tmo and Mao genes (Figure 2), and these are highly related (>80% AI) among the Arthrobacter and the Mycobacterium species. EutQ-like genes are also found in the C1 gene cluster in Bacillus, but they are not related to the genes in Arthrobacter or Mycobacterium. We hypothesize that the eutQ genes encode proteins important for methylated amine oxidation; thus, we included these proteins in our gene inventory analysis (Table 2, Figure 2).

Potential for Oxidation of Formaldehyde to CO2 via Linear Pathways
Gram-positive methylotrophs have been suggested to utilize the cyclic pathway for formaldehyde oxidation to CO2, based on high activities of the respective enzymes [1,18]. However, experimental evidence was obtained for B. methanolicus that a linear pathway is also active [15]. We were able to identify genes for the relevant H4F-linked C1 transfers in all of the genomes. With the exception of strain 31Y, a single FolD (bifunctional methylene-H4F dehydrogenase/methenyl-H4F cyclohydrolase) was encoded, and their reciprocal AI fit well within the AAI ranges (Figure 1). The genome of strain 31Y encoded a second FolD, with only 32% AI with its homolog. The latter gene was part of a unique gene cluster encoding C1 transfer reactions, including an additional, non-homologous methenyl-H4F cyclohydrolase (Mch), PurU (formyl-H4F deformylase), a formaldehyde dehydrogenase (NAD-independent), an aminomethyltransferase and a molybdenum oxidoreductase of unknown function (genes 2855-2863). Only one other genome encoded a recognizable formaldehyde dehydrogenase (Fadh), the genome of Mycobacterium sp. 141, with 78% AI to its counterpart in Arthrobacter sp. 31Y. Only the genomes of Arthrobacter strains 31Y and MA-N2 encoded formyl-H4F ligases similar to the one in the genome of B. methanolicus (Fhs) [13,15]. However, all genomes encoded PurU enzymes: a single copy in the Bacillus genomes, two copies (similar, but not identical) in the Mycobacterium genomes and one to three relatively divergent copies in the Arthrobacter genomes. The prevalence of PurU (an irreversible enzyme) over Fhs (FtfL, a reversible enzyme) [39] indeed supports a proposal for the dissimilatory function of the H4F-linked pathway [15].
Formate dehydrogenases (Fdh) are encoded by all genomes, and they represent a variety of phylogenetically distant types. The genomes of the Mycobacterium species encode a three-subunit type (Fdh1; Table 2). Only one Arthrobacter strain, MA-N2, encodes a homolog of this enzyme (AI in the range of AAI). The genomes of the Bacillus species encode a single-subunit enzyme (Fdh2) highly similar to the one annotated in B. methanolicus [13,15], but without homologs in either Arthrobacter or Mycobacterium. The Arthrobacter species encode three other types unrelated to the first two, as follows. Strains 31Y and MA-N2 encode Fdh3 (AI 90%); strains 31Y and 35W encode Fdh4 (AI 73%); and MA-N2 encodes Fdh5, which is equally distantly related to Fdh3 and Fdh4 (AI 52%-58%) and unrelated to Fdh1 and Fdh2.

Oxidation of Formaldehyde to CO2 by the Cyclic Pathway
Gram-positive methylotrophs are known to use the dissimilatory RuMP cycle for formaldehyde oxidation to CO2 [1,37]. This cycle shares its early reactions with the assimilatory RuMP cycle: those catalyzed by hexulose phosphate synthase (Hps) and hexulose phosphate isomerase (Hpi), which produce fructose 6-phosphate. The genes for these enzymes are described below. Genes for glucose phosphate isomerase (Pgi) were identified in all genomes, with two nearly-identical copies present in the Mycobacterium strains. Interestingly, the proteins encoded by the Arthrobacter strains were unrelated to the mycobacterial counterparts, with AI (<30%) being significantly below AAI. The Bacillus enzymes were unrelated to the counterparts from both Mycobacterium and Arthrobacter (AI < 29%).
Glucose 6-phosphate dehydrogenases (Zwf) are also represented by a variety of phylogenetic types, with multiple copies present in each genome (Table 1). The Actinobacteria share one type, with AI 70% between Arthrobacter and Mycobacterium species (above AAI). The remaining types are specific to individual groups with low AI among the groups (<40%). The genomes encoded one to two 6-phosphogluconolactonases. The Arthrobacter counterparts were only distantly related to the mycobacterial counterparts (AI 46%-47%), and none shared any identity with the Bacillus counterparts.
The phylogenetic landscape of the enzyme concluding the oxidative cycle, 6-phosphogluconate dehydrogenase (Gnd), was also complex, with three recognizable types being encoded. All three groups appeared to share one type, with AI values slightly above AAI. The second type was only shared among the Actinobacteria, again with AI being above AAI. The third type was exclusive to Bacillus.

The Assimilatory RuMP Cycle
Multiple copies of the first enzyme of the RuMP cycle, Hps, were identified in all genomes, in each case identical or nearly identical to each other within each genome. The actinobacterial genomes contained two copies each, and the counterparts from Mycobacterium and Arthrobacter revealed AI (87% to 91%) that was significantly higher than AAI, suggesting that these enzymes were likely shared via lateral transfers among Actinobacteria. The Bacillus genomes each contained five copies of Hps genes, and these revealed low AI (41%-43%) with the actinobacterial counterparts, in agreement with AAI. One to three genes were identified in each genome encoding Hpi, and the trend for relative evolutionary distances was repeated: the actinobacterial counterparts were more related to each other than expected from AAI (74%-77%), while the Bacillus counterparts were related to the actinobacterial counterparts at the AAI value. Multiple phosphofructokinases are also encoded. While in the Arthrobacter species, most copies are highly related (the exception is gene 4492 in Arthrobacter 31Y), the Mycobacterium species encode two dissimilar enzymes. One shares unusually high AI (90%) with Arthrobacter, suggesting a lateral transfer event, while the second is shared among the species at AI equaling AAI, suggesting vertical inheritance. Two distinct types are present in Bacillus (60% AI). Multiple copies of fructose bisphosphate aldolases (Fba) were present, of which one was shared by Arthrobacter and Mycobacterium species at AI > 80%, the other being more divergent between these species. The Bacillus counterparts were unrelated to the actinobacterial counterparts. Similar trends were observed for many of the enzymes that participate in regeneration of the acceptor molecule as part of the RuMP cycle [1]. 
Of the multiple copies of transketolase (Tkt), transaldolase (Tal), ribulose phosphate epimerase (Rpe), ribose phosphate isomerase (Rpi), fructose 1,6-/sedoheptulose 1,7-bisphosphatase (GlpX), triosephosphate isomerase (Tpi), glyceraldehyde phosphate dehydrogenase (Gap) and phosphoglycerate kinase (Pgk), some copies revealed higher than expected AI between the Arthrobacter and Mycobacterium counterparts, while others were not phylogenetically related, suggesting complex histories.
To test for the potential for the alternative variant of the RuMP cycle, we queried the genomes for the presence of genes potentially encoding phosphogluconate dehydratase and 2-keto-3-deoxy-6-phosphogluconate aldolase (Eda) [1]. Gene homologs for the latter were only detected in Arthrobacter strains 31Y and MA-N2, but gene homologs for the former were not detectable in any of the genomes.

Analysis of Distinct C1 Gene Clusters Suggests a Means for the Evolution of Methylotrophy in Gram-Positive Methylotrophs
In all of the genomes analyzed here, many of the genes implicated in methylotrophy were found as parts of genomic C1 metabolism islands (Figure 2; note that as these are draft genomes, the islands as shown may be incomplete, and some of the singleton genes may actually belong to islands). These islands in most cases contain genes for both primary C1 oxidation and the downstream functions that include oxidation and assimilation of formaldehyde. Clustering of C1 genes is also typical of other methylotrophs, including Proteobacteria, Verrucomicrobia and NC10 phylum species [39]. In cases where the regulation of gene or protein expression was studied, coordinated regulation was noted for genes/proteins involved in specific branches of C1 metabolic pathways [15,40], and co-transcription is a common mechanism [41]. Thus, one strategy for evolving methylotrophy seems to be clustering of relevant genes on the chromosomes. A second strategy appears to be multiplication of genes for key and likely rate-limiting functions, such as primary C1 oxidation and the early steps of C1 assimilation. Multiple and nearly identical copies were identified for these functions in most of the genomes analyzed here. Lateral transfers of gene clusters among diverse Gram-positive methylotrophs represent another prominent mechanism. The proteins encoded by the specific C1 gene clusters in the Arthrobacter and the Mycobacterium species (Figure 2) display significantly higher relatedness than predicted from whole genome comparisons (Figure 1), strongly suggesting that these species either exchanged these clusters or acquired them from the same donor. However, an argument for the vertical evolution of methylotrophy in Gram-positive bacteria is also valid, as proteins in C1 pathways are only distantly related between the Bacillaceae and Actinobacteria, while the biochemical schemes are essentially identical.
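The comparison that underlies the lateral-transfer argument (protein-level AI set against genome-level AAI) can be written down as a short Python sketch. The AI values below are the Arthrobacter/Mycobacterium figures quoted in the text, with the lower bound used where a range was reported; the 15-point margin, the labels and the function name are hypothetical choices for illustration, not criteria used in the study.

```python
from typing import Dict, List

# Arthrobacter vs Mycobacterium genome-level AAI as quoted in the text (Figure 1).
GENOME_AAI = 50.0

# Protein-level AI between Arthrobacter and Mycobacterium counterparts,
# taken from the text (lower bound where a range was given).
gene_ai: Dict[str, float] = {
    "Tmo": 87.0,
    "Hps": 87.0,
    "Hpi": 74.0,
    "Zwf": 70.0,
    "Pfk_copy1": 90.0,
    "Pfk_copy2": 50.0,  # described as "AI equaling AAI"
    "Pgl": 46.0,
}


def flag_elevated(ai_by_gene: Dict[str, float], genome_aai: float,
                  margin: float = 15.0) -> List[str]:
    """List proteins whose AI exceeds the genome AAI by at least `margin` points.

    The margin is an arbitrary illustrative cutoff, not a statistical test.
    """
    return sorted(name for name, ai in ai_by_gene.items()
                  if ai - genome_aai >= margin)


candidates = flag_elevated(gene_ai, GENOME_AAI)
```

Proteins flagged this way are only candidates: elevated identity is also expected for slowly evolving proteins, so gene context such as the shared islands in Figure 2 remains part of the argument.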
How many of the Arthrobacter, Bacillus and Mycobacterium species are methylotrophic? Can we predict this capability from looking at the constantly growing genomic databases? Indeed, BLAST analyses identify homologs for most of the genes encoding functions necessary for methylotrophy (Table 2) in multiple sequenced genomes. However, no predictions could be made based on single genes. For example, even the "signature" functions, such as Hps and Hpi, are not necessarily indicative of methylotrophy, as they are also present in the genomes of non-methylotrophs [38,42]. The situation is further complicated by the fact that most of the reactions involved in assimilation or dissimilation of formaldehyde, including C1 transfer reactions or sugar phosphate inter-conversions, are common to all life. Thus, complete sets need to be identified in the genomes, with special attention to gene clustering. The genomes whose sequencing we are reporting here present some of the first blueprints for such analyses, in addition to the genomes of B. methanolicus [13][14][15].
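A screen along these lines could combine two checks: whether every required function has a homolog, and how many of those functions sit together on one scaffold. The Python sketch below assumes annotations are available as (function, scaffold, start) tuples; the three-function `REQUIRED` set, the 5000 bp gap and the toy coordinates are hypothetical placeholders, and a real screen would use the full inventory in Table 2.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical annotation record: (function label, scaffold ID, gene start in bp).
Annotation = Tuple[str, str, int]

# Illustrative subset only; a real screen would use the full gene inventory.
REQUIRED: Set[str] = {"Tmo", "Hps", "Hpi"}


def missing_functions(annotations: List[Annotation], required: Set[str]) -> List[str]:
    """Required functions with no annotated homolog in the genome."""
    return sorted(required - {fn for fn, _, _ in annotations})


def largest_cluster(annotations: List[Annotation], required: Set[str],
                    max_gap: int = 5000) -> int:
    """Most distinct required functions found in one run of neighboring genes.

    A run is a series of required genes on the same scaffold whose consecutive
    start coordinates differ by at most `max_gap` bp (an arbitrary cutoff).
    """
    by_scaffold: Dict[str, List[Tuple[int, str]]] = {}
    for fn, scaffold, start in annotations:
        if fn in required:
            by_scaffold.setdefault(scaffold, []).append((start, fn))

    best = 0
    for genes in by_scaffold.values():
        genes.sort()
        run: Set[str] = set()
        previous_start = None
        for start, fn in genes:
            if previous_start is not None and start - previous_start > max_gap:
                run = set()  # gap too large: start a new run
            run.add(fn)
            best = max(best, len(run))
            previous_start = start
    return best


# Toy genomes with made-up coordinates.
genome_a = [("Tmo", "s1", 1000), ("Hps", "s1", 3000), ("Hpi", "s1", 4500), ("Hps", "s2", 90000)]
genome_b = [("Hps", "s1", 1000), ("Hpi", "s1", 2000)]
```

In this toy case, genome_a carries all three functions in one run, whereas genome_b has Hps and Hpi together but lacks Tmo, mirroring the point that Hps and Hpi alone are not indicative of methylotrophy. For draft genomes, islands split across scaffolds would be undercounted by this approach.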

Conclusions
The genomic sequences of seven novel strains representing the genera of Arthrobacter, Bacillus and Mycobacterium, all methylated amine utilizers, present an important reference dataset for understanding the genomic basis of methylotrophy in Gram-positive methylotrophic bacteria. This study is a major contribution to the field of methylotrophy, aimed at closing the gap in the genomic knowledge of C1 metabolism within this diverse group of bacteria.