Reconstructing Genomes of Carbon Monoxide Oxidisers in Volcanic Deposits Including Members of the Class Ktedonobacteria

Microorganisms can potentially colonise volcanic rocks using the chemical energy in reduced gases such as methane, hydrogen (H2) and carbon monoxide (CO). In this study, we analysed soil metagenomes from Chilean volcanic soils, representing three different successional stages with ages of 380, 269 and 63 years, respectively. A total of 19 metagenome-assembled genomes (MAGs) were retrieved from all stages, with a higher number observed in the youngest soil (1640: 2 MAGs, 1751: 1 MAG, 1957: 16 MAGs). Genomic similarity indices showed that several MAGs had amino-acid identity (AAI) values >50% to the phyla Actinobacteria, Acidobacteria, Gemmatimonadetes, Proteobacteria and Chloroflexi. Three MAGs from the youngest site (1957) belonged to the class Ktedonobacteria (Chloroflexi). Complete cellular functions of all the MAGs were characterised, including carbon fixation, terpenoid backbone biosynthesis, formate oxidation and CO oxidation. All 19 environmental genomes contained at least one gene encoding a putative carbon monoxide dehydrogenase (CODH). Three MAGs had a form I coxL operon (coxL encodes the large subunit of CO dehydrogenase). One of these MAGs (MAG-1957-2.1, Ktedonobacterales) was highly abundant in the youngest soil. MAG-1957-2.1 also contained genes encoding a [NiFe]-hydrogenase and hyp genes encoding accessory enzymes and proteins. Little is known about the Ktedonobacterales through cultivated isolates, but some species can utilise H2 and CO for growth. Our results strongly suggest that the remote volcanic sites in Chile represent a natural habitat for Ktedonobacteria and that they may use reduced gases for growth.


Introduction
Volcanic eruptions provide a model for understanding soil-forming processes and the roles of pioneer bacteria during early biotic colonisation. Recently, it has been demonstrated that the structure of microbial communities can play a key role in the direction of plant community succession pathways [1]. This is due in part to bacterial contributions to weathering of volcanic rocks, which releases nutrients, resulting in some of the most fertile soils in the world.
period, the soil formation evolved as indicated by their different levels of soil organic matter ranging from 65.33% in the most recent soil (1957) to 9.33% in both medium (1751) and oldest soils (1640) [20].
In particular, the most recent (youngest) soil was expected to reveal microbial adaptations to the challenging environmental conditions and thus the metabolic processes that initiate microbial colonisation. Therefore, functional metabolic modules annotated in the environmental genomes were analysed, with a main focus on the poorly characterised class Ktedonobacteria (Chloroflexi). Three Ktedonobacteria MAGs were obtained and all contained genes encoding CO and H2 oxidation. Additional MAGs from other phyla were also found to contain these genes. Our study advances the understanding of the ecology of Ktedonobacteria and their potential to act as early colonisers in volcanic soils.

Sequencing
The DNA from the volcanic soils used in our study had been previously extracted [20]. The soil physico-chemical characteristics have been published [20], showing a pH of 5.6 in both the medium and oldest soils and 4.7 in the youngest soil, and nitrogen (mg/kg) of 25 (1640), 26 (1751) and 36 (1957). Briefly, the soil samples originated from three sites of different ages, dated by the latest lava eruption (1640, 1751 and 1957; map in Reference [20]). A total of nine samples (triplicate per site) were sequenced on an Illumina MiSeq at the Max-Planck-Genome Centre, Köln, Germany. The metagenome was analysed on a high-performance computer using 650 GB RAM and 64 cores at the Thünen Institute of Biodiversity, Braunschweig, Germany.

Quality Control
The sequence reads were checked using FastQC version 0.11.8 [21]. Low-quality reads were discarded using BBDuk version 38.68, quality-trimming to Q15 using the Phred algorithm [22]. A schematic overview of the steps and programs used is shown in Figure 1.
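To make the Q15 threshold concrete, the following minimal sketch converts a Phred score to a base-call error probability using the standard relation P = 10^(-Q/10). The function names are hypothetical, and the right-end trim shown here is a deliberately naive illustration of a quality cut-off, not a reimplementation of BBDuk's trimming algorithm.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Inverse conversion: error probability back to a Phred score."""
    return -10 * math.log10(p)


def naive_right_trim(qualities, threshold=15):
    """Return how many leading bases are kept after dropping the trailing
    bases whose quality is below the threshold.

    Assumption: this simple rule only illustrates the idea of a Q15
    cut-off; BBDuk's own Phred-based trimming is more elaborate.
    """
    end = len(qualities)
    while end > 0 and qualities[end - 1] < threshold:
        end -= 1
    return end


if __name__ == "__main__":
    # Q15 corresponds to roughly a 3% chance that a base call is wrong.
    print(round(phred_to_error_probability(15), 4))
    print(naive_right_trim([30, 30, 20, 10, 5]))
```

Under this relation, a Q15 cut-off tolerates an error probability of about 0.03 per base, which is a fairly permissive threshold compared with Q20 (0.01) or Q30 (0.001).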

Metagenome Assembly and Binning
All trimmed Illumina reads were merged into longer contiguous sequences (scaffolds) using de novo assemblers Megahit version 1.2.8 [23] with k-mers 21, 29, 39, 59, 79, 99, 119 and 141, and MetaSPAdes (SPAdes for co-assembly) version 3.13.1 [24,25] with k-mers 21, 31, 41, 51, 61, 71 and 81. Triplicate samples were co-assembled in order to improve the assembly of low-abundance organisms. Assembly quality was checked with MetaQuast version 5.0.2 [26], showing that the best quality was obtained with SPAdes for our samples (data not shown). Downstream analysis was carried out using the scaffolds retrieved from SPAdes. Krona charts [27] were recovered from MetaQuast runs to identify taxonomic profiles. Downstream binning analysis was performed with two sets of scaffolds: full-size scaffolds and scaffolds larger than 1000 bp.
Metagenomic binning of the assembled scaffolds was carried out with the metaWRAP version 1.2.1 pipeline [28], whose binning module employs three binning software programs: MaxBin2 [29], metaBAT2 [30], and CONCOCT [31]. Completeness and contamination metrics of the extracted bins were estimated using CheckM [32]. The resulting bins were collectively processed to produce consolidated metagenome-assembled genomes (MAGs) using the bin_refinement module (criterion: completeness > 70%; contamination < 5%). Both sets of MAGs (18 from scaffolds larger than 1000 bp and 17 from full-size scaffolds) were aggregated, visualised with VizBin [33] and then dereplicated using dRep [34]. Only the highest scoring MAG from each secondary cluster was retained in the dereplicated set. The abundance of each MAG in the different sites was calculated using BLASTN version 2.5.0+ [35], keeping only hits with >95% identity and an e-value threshold of 1e-5 for the analysis [36]. A final heatmap was constructed using the function heatmap.2 from the gplots package version 3.0.4 [37] in R version 4.0.2 (https://www.r-project.org).
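The two numerical filters described above (bin quality and BLASTN hit quality) can be sketched as simple predicates. This is an illustration only: the class, function names and example values are hypothetical, and treating the e-value of 1e-5 as an upper bound is an assumption, since the text gives the value without stating the direction of the comparison.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    name: str
    completeness: float   # percent, as estimated by a tool such as CheckM
    contamination: float  # percent


def passes_refinement(b: BinQuality,
                      min_completeness: float = 70.0,
                      max_contamination: float = 5.0) -> bool:
    """Apply the stated criterion: completeness > 70% and contamination < 5%."""
    return b.completeness > min_completeness and b.contamination < max_contamination


def keep_blast_hit(percent_identity: float, evalue: float,
                   min_identity: float = 95.0,
                   max_evalue: float = 1e-5) -> bool:
    """Keep hits with > 95% identity.

    Assumption: the e-value of 1e-5 is used as an upper bound.
    """
    return percent_identity > min_identity and evalue <= max_evalue


if __name__ == "__main__":
    # Hypothetical bins, not values from the study.
    bins = [
        BinQuality("bin_a", 92.1, 1.3),
        BinQuality("bin_b", 68.0, 0.5),
        BinQuality("bin_c", 85.4, 7.9),
    ]
    print([b.name for b in bins if passes_refinement(b)])
```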

Functional Annotation
The open reading frames (ORFs) in all scaffolds of each MAG were predicted using Prodigal (v2.6.3) [38]. Functions were annotated using Cognizer [39] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation framework [40]. The annotations of the predicted proteins from KEGG were used to confirm protein functional assignment and identify pathways. Complete pathways were identified using KEGG BRITE pathway mapping [40]. Aerobic carbon-monoxide dehydrogenases and hydrogen dehydrogenase were also identified using KEGG ortholog annotations. CODH was further distinguished as form I and form II (putative CODH) based on active site motifs present in coxL genes (e.g., Reference [41]).
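As an illustration of identifying these functions from KEGG ortholog (KO) annotations, the sketch below checks a MAG's set of KO identifiers against the KOs listed for Table S4 in the Supplementary Materials of this article. The function name and the `require_all` switch are hypothetical. Note that KO presence alone does not separate form I from form II coxL, which in the text relies on active-site motifs.

```python
# KO identifiers as listed for Table S4 in the Supplementary Materials.
KO_SETS = {
    "CO oxidation": {"K03518", "K03519", "K03520"},  # coxS, coxM, coxL
    "formate oxidation": {"K00122", "K00123"},       # formate dehydrogenase, major subunit
    "H2 oxidation": {"K00436"},                      # NAD-reducing hydrogenase large subunit
}


def detect_functions(annotated_kos, require_all=True):
    """Report which functions are supported by a MAG's KO annotations.

    require_all=True demands every KO of a set (a strict reading of
    "complete"); require_all=False accepts any single KO of the set.
    Both modes are illustrative assumptions, not the authors' rule.
    """
    found = set(annotated_kos)
    result = {}
    for function, kos in KO_SETS.items():
        hits = kos & found
        result[function] = (hits == kos) if require_all else bool(hits)
    return result


if __name__ == "__main__":
    # Hypothetical annotation set for one MAG.
    example = {"K03518", "K03519", "K03520", "K00123"}
    print(detect_functions(example))
    print(detect_functions(example, require_all=False))
```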

Phylogenomic Analysis
Taxonomic classification of MAGs was performed using the classify_bins module from metaWRAP, which relies on the NCBI_nt database. MAGs were also screened using the RAST Server (Rapid Annotations using Subsystems Technology [42,43]), which also allowed us to retrieve information on closely related genomes in order to construct the phylogenetic tree.
To estimate intergenomic similarity, amino-acid comparisons between MAGs and their closest relative genomes present in the databases were calculated based on reciprocal best hits (two-way AAI) using the enveomics collection (http://enve-omics.gatech.edu/ [44]).
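The idea behind a two-way AAI based on reciprocal best hits can be sketched as follows. This is a simplified illustration and not the enveomics implementation: the tuple layout of the hits, the use of bit score to choose the best hit, and the averaging of the two directional identities are all assumptions made for the example.

```python
def best_hits(hits):
    """Map each query protein to its best-scoring subject.

    `hits` is an iterable of (query, subject, percent_identity, bitscore).
    Assumption: the best hit is the one with the highest bit score.
    """
    best = {}
    for query, subject, identity, bitscore in hits:
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


def two_way_aai(hits_ab, hits_ba):
    """Mean identity over reciprocal best hits between genomes A and B.

    Returns None when no reciprocal best hit exists.
    """
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    identities = []
    for a, (b, identity_ab, _) in best_ab.items():
        back = best_ba.get(b)
        if back is not None and back[0] == a:
            # Average the identities reported in the two search directions.
            identities.append((identity_ab + back[1]) / 2)
    if not identities:
        return None
    return sum(identities) / len(identities)


if __name__ == "__main__":
    # Hypothetical protein search results between a MAG (a*) and a reference (b*).
    hits_ab = [("a1", "b1", 80.0, 200), ("a1", "b2", 60.0, 100), ("a2", "b2", 50.0, 90)]
    hits_ba = [("b1", "a1", 80.0, 200), ("b2", "a1", 60.0, 100)]
    print(two_way_aai(hits_ab, hits_ba))
```

In the example, only the pair a1/b1 is reciprocal: a2 points to b2, but b2 points back to a1, so that pair is excluded from the average.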
The phylogenetic affiliation of MAGs was determined by constructing a genomic tree using FastTree version 2.1.11 [45]. Reference genomes were manually downloaded from the National Center for Biotechnology Information (NCBI) Refseq database (Supplementary Table S1). Conserved genes from the extracted bins and the reference genomes were concatenated using Phylosift version 1.0.1 [46].
Phylogenetic analysis of the gene encoding the large subunit of CO dehydrogenase (coxL) was performed using the Maximum Likelihood method with a JTT matrix-based model [47]. Bootstrap values (100 replicates) are shown where support is ≥ 70%. The scale bar indicates substitutions per site. All gapped positions were deleted, resulting in 420 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [48,49].

Accession Number
Raw metagenomic data and environmental genomes derived from binning processes were deposited in the Sequence Read Archive (SRA) under the bioproject accession number PRJNA602600 for raw data and PRJNA602601 for metagenome-assembled genomes.

MAGs Recovery
A total of ~3-4 million scaffolds were recovered from the soil metagenomes in each site. Even though all the sites underwent similar sequencing efforts (from 2.3 GB in 1640 and 1957 to 2.4 GB in 1751), the youngest soil had the largest number of scaffolds, with 499 sequences > 50 kb (N50 length of 700), compared to the oldest soil, with only 12 scaffolds > 50 kb (N50 length of 914) (Table 1). A total of 19 MAGs with a completeness of >70% and a contamination of <5% (2 from 1640, 1 from 1751 and 16 from 1957) were retrieved and characterised.
N50: length such that scaffolds of this length or longer include half the bases of the assembly; L50: number of scaffolds that are longer than, or equal to, the N50 length and therefore include half the bases of the assembly (https://www.ncbi.nlm.nih.gov/assembly/help/#globalstats).
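The N50 and L50 definitions given above translate directly into a short calculation. The following is a minimal sketch with a hypothetical function name and made-up scaffold lengths; it is not the code used by MetaQuast.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths.

    N50: length such that scaffolds of this length or longer contain at
    least half of all assembled bases. L50: number of scaffolds needed
    to reach that half, counting from the longest.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("at least one scaffold length is required")


if __name__ == "__main__":
    # Hypothetical scaffold lengths (bp): total 30, half 15.
    # 10 alone is below 15; 10 + 8 = 18 reaches it, so N50 = 8 and L50 = 2.
    print(n50_l50([10, 8, 6, 4, 2]))
```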

MAG Identification
MAGs were affiliated to the phyla Actinobacteria, Proteobacteria, Acidobacteria, Gemmatimonadetes, Chloroflexi, Firmicutes and Verrucomicrobia (Figure 2). In the oldest soil, two environmental genomes were retrieved related to Actinomycetales (Actinobacteria) and Rhodospirillales (Proteobacteria). The only MAG retrieved from the middle soil was related to Acidobacteria. MAGs binned from the youngest soil included six assigned to Acidobacteria, one to Proteobacteria, one to Firmicutes, three to Actinobacteria, one to the phylum Gemmatimonadetes, one to Verrucomicrobia and three to the phylum Chloroflexi (Figure 2).

Metabolic Characterisation of MAGs
Genes encoding enzymes involved in carbohydrate and energy metabolism, such as carbon fixation, sulphur metabolism, ATP synthesis and nitrogen metabolism, as well as terpenoid backbone biosynthesis, were found in all the MAGs (Figure 4). Other functions, including xenobiotic biodegradation, fatty acid metabolism, nucleotide metabolism and vitamin metabolism, among others, were also found (Supplementary Table S2).

Characterisation of CODH and Hydrogenase Genes in MAGs
Three MAGs (MAG-1640-1.1, MAG-1751-1.1 and MAG-1957-2.1) encoded form I of the CO-dehydrogenase large subunit (coxL). These MAGs were each associated with a particular soil, with low abundance in the metagenomes of the other sites (Figure 5A). In addition to these three form I coxL-encoding MAGs, 15 other scaffolds from MAGs containing form II coxL-like genes were recovered (data not shown), but the function of form II CoxL is not yet known. The arrangement of genes encoding form I CODH in each of the MAGs is shown in Figure 5B. All three MAGs show the canonical arrangement of the three structural genes of CODH, that is, the MSL (coxM-coxS-coxL) order. The genes encoding the [NiFe]-hydrogenase and its accessory proteins were identified only in MAG-1957-2.1, whereas only some of the accessory hyp genes were found in the other two MAGs (Figure 5B). A phylogenetic analysis of the form I coxL genes showed that they are affiliated with Actinobacteria (MAG-1640-1.1), the Nitrospirae member Candidatus Manganitrophus noduliformans (MAG-1751-1.1) and Chloroflexi (MAG-1957-2.1) (Figure 6). This grouping is consistent with the results of PhyloSift (Figure 2), except for MAG-1751-1.1, which PhyloSift loosely associated with Acidobacteria (although with an amino-acid identity of only 40%) rather than Nitrospirae.

Complete Metabolic Characterisation of Ktedonobacterales MAGs
Here, we focused on the Ktedonobacterales MAGs because of their apparent importance in early soil formation. Three Ktedonobacterales MAGs were identified in the 1957 soil metagenomes but were not found in the older soils (Figures 2 and 3). Two of the MAGs (MAGs 1957-2.1 and 1957-3.1), affiliated to the class Ktedonobacteria (phylum Chloroflexi), contained genes for the complete electron transport chain, citric acid metabolism, nitrogen metabolism, sulphur metabolism, several transporters, the complete gene set for carbon monoxide oxidation (CO dehydrogenase), herbicide degradation and degradation of aromatics, as well as the major subunit of the formate dehydrogenase and a hydrogenase. MAG 1957-6.1 (Ktedonobacteria) had pathways very similar to those of the other Chloroflexi MAGs, except that a step for CO oxidation and the electron transport chain were absent (Figure 4, Supplementary Table S2, Figure 7).

Characterisation of MAGs
In this study, metagenome-assembled genomes retrieved from Llaima volcano were characterised. This study builds on a previous study [19] in which 16S rRNA gene amplicon-based sequences from those soils were analysed. The main objective of this study was to characterise genomes from those sites and to analyse the functions of the abundant but poorly characterised Ktedonobacteria (phylum Chloroflexi) present at Llaima volcano. The relative abundance of the main phyla, based on classification of scaffolds larger than 500 bp, showed that microbial communities change as the soils age (Supplementary Figure S1). This corroborates findings from a previous study [19]. For example, the relative abundance of Chloroflexi is higher in the younger soils (from 28% in the youngest soil to 7% in the oldest soil), and the opposite trend is observed for members of the phylum Proteobacteria, whose abundance increases as the soil ages (from 42% in the youngest soil to 59% in the oldest soil) (Supplementary Figure S1). Except for those related to Firmicutes and Verrucomicrobia, and to a lesser extent Acidobacteria and Proteobacteria, the extracted environmental genomes had an amino acid identity >50% with their closest reference genome (Supplementary Table S3), which suggests that they belonged to those genera [53].
A total of 16 MAGs were recovered from the youngest soil (1957) (Figure 3). This soil is only partially vegetated (about 5%) by mosses and lichens. The microbial community in this area likely harbours populations able to grow as facultative chemolithoautotrophs or mixotrophs on carbon monoxide, hydrogen or methane. This high relative abundance of MAGs with genes for CO and hydrogen utilisation in the youngest soils is consistent with reports by King and colleagues for Hawaiian and Japanese volcanic deposits (21- to 800-year-old sites). For some of those sites, microbial community structure changed as the soil matured, with members of the phylum Proteobacteria dominating vegetated sites while younger sites were enriched with Ktedonobacteria within the Chloroflexi and characterised by relatively high rates of atmospheric CO uptake [7,14,54].
MAGs were most abundant in the soil site from where they were retrieved (Figure 3). Relatively few MAGs were retrieved from the two older soils, which can be explained by the higher diversity in these soils and the decreased likelihood of recovering MAGs from groups such as Actinobacteria, Acidobacteria and Chloroflexi that were less common in them. In fact, several of the MAGs retrieved had a low relative abundance within the soils (Figure 3), which is consistent with their relative abundance of 16S rRNA genes in these soils [19]. Binning at the strain level remains a technical challenge [55], with the chances of retrieving MAGs at a given sequencing effort being reduced with increasing microdiversity (intra-population genetic diversity) and overall community diversity [56]. We previously reported that as the soil recovered and vegetation established, the microbial population appeared to enlarge and become more diverse [19], which explains the lower number of MAGs retrieved from more mature soil (1640 sample), compared to the younger sites (1957).

Metabolic Characterisation of MAGs
In the three MAGs containing form I coxL, the gene was found in an operon structure (Figure 5B) typical of known CO oxidisers [41]. Form I coxL has been definitively associated with CO oxidation at high concentrations and also at sub-atmospheric levels [41]. Thus, even at low abundance, the presence of these cox-containing MAGs strongly suggests a capacity for atmospheric CO uptake at all the sites.
Most of the complete functions found from the Ktedonobacteria MAGs were also found in three reference genomes: Ktedonobacter racemifer DSM 44963 [51], Thermogemmatispora carboxidivorans PM5, isolated from a geothermal biofilm on Kilauea Volcano, Hawaii (USA) [50], and Dictyobacter volcani W12 [52]. According to our genomic analyses, all of these reference strains possess formate-, H2- and CO-dehydrogenases, as do the MAGs recovered in the present study. Burkholderia strains (phylum Proteobacteria) [57], members of the phylum Chloroflexi [14] and other members of the phyla Proteobacteria and Actinobacteria [58] have also been reported as CO-oxidisers in Hawaiian volcanic deposits. coxL genes encoding the large subunit of the CO dehydrogenase have been found in Proteobacteria species from Kilauea and Miyake-jima volcanoes [10,14,54].
The taxonomies of MAGs 1640-1.1 and 1957-2.1 were consistent for coxL (Figure 6) and phylogenomic analyses (Figure 2). In contrast, MAG-1751-1.1 clustered weakly with Acidobacteria based on genomic analysis (40% amino acid identity with a reference genome, see Supplementary Table S3) but did not cluster with Candidatus Manganitrophus noduliformans as did the coxL sequence from this MAG.
The large subunit of the NAD-reducing hydrogenase was also found in several MAGs (Figure 4). Hydrogen metabolism has been shown to provide an additional energy source for some microorganisms and has been observed in bacteria and archaea [67]. Hydrogen dehydrogenases have also been found in members of the genus Cupriavidus (phylum Proteobacteria) from volcanic mudflow deposits in the Philippines, suggesting their potential contribution to hydrogen uptake [68].

Conclusions
This study provides further evidence that poorly characterised groups, such as Ktedonobacteria, establish in remote volcanic sites and may use reduced gases for growth. Further studies are needed to demonstrate the activity of these pathways and their significance in volcanic deposits.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2076-2607/8/12/1880/s1, Figure S1: Relative abundance of main phyla based on classification of scaffolds >500 bp. Other phyla include members of Planctomycetes, Gemmatimonadetes, Cyanobacteria, Firmicutes, Armatimonadetes and Patescibacteria, Table S1: List of strains used for the tree construction, Table S2: Summary table of other complete cellular functions in the MAGs isolated from sites 1640, 1751 and 1957, retrieved from KEGG analysis, Table S3: Summary of the metagenome-assembled genomes (MAGs) isolated in the present study. Average amino-acid identity (AAI) was calculated by comparing the MAGs with their closest reference genomes identified based on the RAST results. The RefSeq accession numbers of the reference genomes are provided in parentheses, Table S4: Summary of enzymatic functions for CO-, H2- and formate-oxidation in the Ktedonobacteria reference genomes. Carbon monoxide oxidation: K03518 carbon monoxide dehydrogenase small subunit coxS, K03519 carbon monoxide dehydrogenase medium subunit coxM, K03520 carbon monoxide dehydrogenase large subunit coxL. Formate oxidation: K00122 formate dehydrogenase, K00123 formate dehydrogenase major subunit. Hydrogen oxidation: H2 dehydrogenase: K00436 NAD-reducing hydrogenase large subunit.