Characterization of Martelella soudanensis sp. nov., Isolated from a Mine Sediment

Gram-stain-negative, strictly aerobic, non-spore-forming, non-motile, rod-shaped bacterial strains, designated NC18T and NC20, were isolated from sediment of near-vertical borehole effluent originating 714 m below the surface in the Soudan Iron Mine, Minnesota, USA. The 16S rRNA gene sequences showed that strains NC18T and NC20 grouped with members of the genus Martelella, including M. mediterranea DSM 17316T and M. limonii YC7034T. Both NC18T and NC20 had a genome size of 6.1 Mb and a G + C content of 61.8 mol%. Average nucleotide identity (ANI), average amino acid identity (AAI), and digital DNA–DNA hybridization (dDDH) values between the novel strains and related Martelella type strains were below the species delineation thresholds. Pan-genomic analysis showed that NC18T, NC20, M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T had 8470 pan-genome orthologous groups (POGs) in total. The five Martelella strains shared a core of 2258 POGs, which were mainly associated with amino acid transport and metabolism, general function prediction only, carbohydrate transport and metabolism, translation, ribosomal structure and biogenesis, and transcription. The major fatty acids (>5%) of the two novel strains were summed feature 8 (C18:1 ω7c and/or C18:1 ω6c), C19:0 cyclo ω8c, C16:0, C18:1 ω7c 11-methyl, C18:0, and summed feature 2 (C12:0 aldehyde and/or iso-C16:1 I and/or C14:0 3-OH). The sole respiratory quinone was ubiquinone-10 (Q-10). On the basis of polyphasic taxonomic analyses, strains NC18T and NC20 represent a novel species of the genus Martelella, for which the name Martelella soudanensis sp. nov. is proposed. The type strain is NC18T (=KCTC 82174T = NBRC 114661T).


Introduction
Although deep subsurface environments are characterized by relatively low available organic carbon and elevated pressure, they are the largest habitats for prokaryotes, estimated to contain from 12% to 20% of the total biomass of microorganisms on Earth [1,2]. The deep biosphere contains a variety of functionally active microbial communities, but microbes there face challenges such as limited availability of electron donors or acceptors, limited pore space and fracture networks, and competition with other microorganisms for growth [3]. The Soudan Iron Mine is located in northern Minnesota, USA, along the southern edge of the Canadian Shield Transects and the Archaean Animikie ocean basin. The mine reaches a depth of 714 m below the surface, providing access to deep subsurface brines entrained in a massive hematite formation [3,4]. Previous cultivation-based and metagenomic studies reported the presence of diverse microbial communities, including the iron-oxidizing Marinobacter and the iron-reducing Desulfuromonas, in a calcium- and sodium-rich brine from these boreholes that reaches salinities as high as 4.2% (w/v) [3,5,6].
Halophiles and halotolerant bacteria are able to survive even in highly saline environments that are unfavorable for the existence of most life forms [7]. In recent decades, halophilic and halotolerant bacteria have been studied in biotechnological applications [8].

Enrichment Culture and Isolation
Strains NC18T and NC20 were isolated from sediment near a descending exploratory borehole (47°49.2′ N, 92°14.5′ W), Soudan Mine Diamond Drill Hole 942, located at the bottom (level 27) of the Soudan Underground Mine State Park in Soudan, Minnesota, USA. The sediment sample was serially diluted with 0.85% NaCl, and the suspensions were plated on R2A agar (BD, Franklin Lakes, NJ, USA) supplemented with 2% NaCl (w/v) and incubated at room temperature for 7 days. Circular, smooth, white-to-cream-colored colonies, designated strains NC18T and NC20, were isolated and subsequently purified three times. Cultures were maintained on R2A plates supplemented with 2% NaCl at 30 °C, and stocks were preserved in R2A broth with glycerol (20%, v/v) at −80 °C.

Phenotypic Analysis
Cell morphology and flagellation of strains NC18T and NC20 were observed with a transmission electron microscope (CM20, Philips, Amsterdam, The Netherlands) operated at 80 kV, using cells grown on a marine agar (MA; BD, Franklin Lakes, NJ, USA) plate for 2 days at 30 °C. Cells were negatively stained with 2% (w/v) uranyl acetate and air-dried before the grids were examined. The presence of spores was analyzed by phase-contrast microscopy (ECLIPSE 80i, Nikon, Tokyo, Japan) at a magnification of 1500×, using cells that had been grown for 1 week at 30 °C on MA. Gram staining was performed using the bioMérieux Gram-staining kit according to the manufacturer's instructions. The colony color and morphology of strains NC18T and NC20 were examined on MA plates incubated for 2 days at 30 °C.
Growth at different temperatures (4, 10, 12, 15, 20, 25, 30, 35, and 40 °C) and pH levels (pH 4.0-10.0 at intervals of 1.0 pH unit) was tested in marine broth 2216 (MB; BD, Franklin Lakes, NJ, USA). The pH was adjusted with 0.1 N NaOH or 0.1 N HCl solutions and checked after autoclaving. Salt tolerance was investigated in marine broth supplemented with 0-15% (w/v, at 1% intervals) NaCl. Growth of strains NC18T and NC20 at different temperatures, pH levels, and NaCl concentrations was determined by measuring OD600 with a spectrophotometer (Optizen POP, Mechasus, Daejeon, Korea). Growth under anaerobic conditions was determined on MA plates at 30 °C for 10 days in an anaerobic Gaspak jar (OXOID) with Anaero-PACK (Mitsubishi Gas Chemical Co., Tokyo, Japan).
Catalase activity was determined with 3% (v/v) hydrogen peroxide. Casein hydrolysis was examined on MA supplemented with 1% (w/v) skim milk. Enzyme activities, acid production from different carbohydrates, and the assimilation of various substrates were determined using API ZYM, API 20E, and API 20NE, respectively, according to the manufacturer's instructions (bioMérieux, Marcy l'Etoile, France).

Chemotaxonomic Analysis
For the analysis of cellular fatty acids and respiratory quinones, cells of NC18T and NC20 were grown on MA for 2 days at 30 °C. Fatty acid methyl esters were prepared and analyzed according to the standard MIDI protocol (Sherlock Microbial Identification System, version 6.2) and identified using the RTSBA 6.0 database of the Microbial Identification System [26].
Respiratory quinone was extracted by the chloroform-methanol extraction method and analyzed using high-performance liquid chromatography (HPLC) as previously described [27].

Phylogenetic Analysis
Genomic DNA was prepared using an AccuPrep Genomic DNA Extraction Kit (Bioneer, Daejeon, Korea) according to the manufacturer's instructions. DNA was precipitated using 1 volume of chilled isopropanol and 0.1 volume of 3 M sodium acetate, followed by overnight incubation at −20 °C. The DNA pellet was collected by centrifugation at 13,800× g for 30 min at 4 °C, washed with 70% ethanol, air-dried, and resuspended in nuclease-free water (Qiagen, Germantown, MD, USA).
The 16S rRNA gene was amplified using universal primers 27F and 1492R [28]. The 16S rRNA gene sequences (1482 nt) of strains NC18T and NC20 were identical to the corresponding regions of their genomic sequences and were compared with related sequences from the EzBioCloud server (available online: www.ezbiocloud.net (accessed on 16 July 2021)) [29]. The 16S rRNA gene sequences were aligned with those of closely related species using the CLUSTAL X software program [30]. Gaps were edited in the BioEdit program [31], and phylogenetic trees were reconstructed using the neighbor-joining, maximum-parsimony, and maximum-likelihood algorithms in MEGA 6.0 software [32]. Bootstrap analysis with 1000 replications was performed to determine confidence values of individual branches in the phylogenetic tree. The 16S rRNA gene sequences of strains NC18T and NC20 were deposited in GenBank/EMBL/DDBJ under accession numbers MT367774 and MT367775, respectively.
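As an illustration of how a pairwise 16S rRNA similarity value can be obtained from two aligned sequences, the following minimal Python sketch counts identical positions over the columns in which neither sequence has a gap. It is only a sketch under stated assumptions: the function name pairwise_identity and the toy sequences are hypothetical, and the calculation is not claimed to reproduce the exact procedure of the EzBioCloud server.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Alignment columns in which either sequence carries a gap are skipped
    (an assumption of this sketch; other tools may treat gaps differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Hypothetical toy fragments, not the NC18T or NC20 sequences.
identity = pairwise_identity("ACGT-ACGTA", "ACGTTACGAA")
print(f"{identity:.1f}%")
```

Two sequences that are identical over all compared columns give 100%, as reported for strains NC18T and NC20.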

Comparative Genomic Analysis
DNA G + C contents of strains NC18T and NC20 were calculated from the whole-genome sequences. Overall genome relatedness indices (OGRI) among strains NC18T, NC20, and reference strains with available genome sequences, namely M. mediterranea DSM 17316T (GenBank assembly accession GCA_002043005.1), M. endophytica YC6887T (GCA_000960975.1), and M. lutilitoris GH2-6T (GCA_005924265.1), were estimated from average nucleotide identity (ANI) values, obtained with the ANI calculator employing the OrthoANIu algorithm [48], and digital DNA-DNA hybridization (dDDH) values, obtained with the genome-to-genome distance calculation (GGDC) method [49]. The online calculator GGDC 2.1 (available online: http://ggdc.dsmz.de/ggdc.php# (accessed on 20 May 2021)) was used to calculate the dDDH values with the recommended Formula 2 [50]. Identified CDSs were classified into functional groups by reference to orthologous groups in EggNOG (available online: http://eggnogdb.embl.de (accessed on 20 May 2021)) [43]. To estimate the similarity between genomes at the orthologous protein level, two-way average amino acid identity (AAI) based on reciprocal best hits was calculated using the AAI calculator (available online: http://enve-omics.ce.gatech.edu/aai (accessed on 20 May 2021)) [51]; because amino acid sequences change more slowly than nucleotide sequences, AAI is more sensitive over greater evolutionary distances. The Bacterial Pan-Genome Analysis Tool (BPGA) pipeline [52] was used to define core (shared with all five strains), accessory (shared with more than two but not all strains), and unique (strain-specific) pan-genome orthologous groups (POGs) of the five Martelella strains. POG clustering was carried out using the USEARCH algorithm with an identity value of 0.5.
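The partition of orthologous groups into core, accessory, and unique sets can be sketched in a few lines of Python. This is an illustrative sketch only, not the BPGA implementation: the function classify_pogs, the group IDs, and the toy presence data are hypothetical, and the sketch assumes the common convention that any group present in at least two but not all strains counts as accessory.

```python
from collections import Counter

# Strain labels follow the five genomes compared in the text.
STRAINS = ["NC18T", "NC20", "DSM 17316T", "YC6887T", "GH2-6T"]


def classify_pogs(presence, strains):
    """Label each orthologous group as 'core', 'accessory', or 'unique'.

    presence maps a group ID to the set of strains that carry the group.
    """
    strain_set = set(strains)
    labels = {}
    for pog, members in presence.items():
        n = len(set(members) & strain_set)
        if n == len(strain_set):
            labels[pog] = "core"        # found in every strain
        elif n == 1:
            labels[pog] = "unique"      # strain-specific
        elif n >= 2:
            labels[pog] = "accessory"   # assumed: two or more, but not all
        # groups absent from all listed strains are skipped
    return labels


# Hypothetical toy input; the group IDs are invented for illustration.
toy_presence = {
    "POG0001": set(STRAINS),
    "POG0002": {"NC18T", "NC20"},
    "POG0003": {"GH2-6T"},
    "POG0004": {"NC18T", "NC20", "DSM 17316T"},
}

labels = classify_pogs(toy_presence, STRAINS)
counts = Counter(labels.values())
print(labels)
print(dict(counts))
```

With real data, the presence sets would come from the clustering step, and the three counts would sum to the total number of POGs.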
Phenotypic examination revealed several traits shared by the novel strains and closely related type strains. However, strains NC18T and NC20 could be clearly differentiated from the type strains by their ability to grow at a higher maximum NaCl concentration (13%); their ability to ferment or oxidize L-rhamnose; the presence of urease; the absence of aesculin hydrolysis and acetoin production; and the absence of esterase lipase activity. Strain NC18T could also be differentiated from strain NC20 by its optimal growth at pH 7, its maximum growth temperature of 40 °C, and the presence of β-glucosidase and N-acetyl-β-glucosaminidase activities. The detailed morphological, physiological, and biochemical characteristics of strains NC18T and NC20 are given in Table 1; the species description is in Section 4.1. Thus, these distinguishing phenotypic properties suggest that strains NC18T and NC20 are separate from other species of the genus Martelella.
Table 1. Differential characteristics of strains NC18T and NC20 with related taxa within the genus Martelella. +, positive; -, negative; w, weak; NR, not reported.

Phylogenetic Characterization
The 16S rRNA gene sequences of strains NC18T and NC20 were 1482 bp long and showed 100% similarity to each other. NC18T and NC20 showed the highest sequence sim[…] The MLSA tree also agreed with the taxonomic positions of the two strains shown by the phylogenetic tree based on the 16S rRNA gene (Figure 2, Supplementary Figures S4 and S5).

General Genomic Features
The genomic features of strains NC18T and NC20 are shown in Table 3. The genome sizes of NC18T and NC20 were 6,109,459 and 6,109,677 bp, respectively. The NC18T genome was predicted to have 5849 genes, 5502 protein-encoding genes, 6 rRNAs, and 48 tRNAs. The NC20 genome was predicted to have 5830 genes, 5585 protein-encoding genes, 6 rRNAs, and 48 tRNAs. The DNA G + C content of both NC18T and NC20 was 61.8 mol% (Table 1), which is within the known range of DNA G + C contents for the genus Martelella (52.8-62.6 mol%). Apart from the general function prediction only category (R), the predicted functional genes of both strains based on the COG database mainly belonged to amino acid transport and metabolism (E; 537 and 526 orthologs for NC18T and NC20, respectively), carbohydrate transport and metabolism (G; 534 and 524 orthologs), transcription (K; 500 and 496 orthologs), and inorganic ion transport and metabolism (P; 458 and 451 orthologs) (Supplementary Table S2).
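For readers unfamiliar with how a G + C content such as 61.8 mol% is derived from a genome sequence, the following minimal Python sketch computes the share of G and C among the unambiguous bases of one or more contigs. It is an assumption-laden illustration: the function name gc_content and the toy contigs are hypothetical, and ambiguous symbols such as N are simply ignored, which may differ from the tool used by the authors.

```python
def gc_content(contigs) -> float:
    """G + C content (mol%) over all contigs, ignoring ambiguous bases."""
    gc = 0
    at = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# Hypothetical toy contigs, not taken from the NC18T or NC20 assemblies.
toy_contigs = ["GGCCATGCNN", "ATGC"]
print(f"{gc_content(toy_contigs):.1f} mol%")
```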

Comparative Genomic Characterization
ANI, AAI, and dDDH values between NC18T and NC20 were 99.9%, 100%, and 100%, respectively (Tables 4 and 5). These values indicate that the two isolates belong to a single species. In contrast, ANI values between NC18T and the reference strains M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T were 88.1%, 80.2%, and 80.4%, respectively (Table 4). AAI values between NC18T and the reference strains M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T were 87.7%, 76.3%, and 77.8%, respectively (Table 5). These ANI and AAI values were lower than the 95% threshold used to identify isolates as belonging to the same bacterial species [53,54]. The dDDH values between strain NC18T and the reference strains M. mediterranea DSM 17316T, M. endophytica YC6887T, and M. lutilitoris GH2-6T were 34.9%, 23.9%, and 23.7%, respectively (Table 4), which were below the 70% threshold for species delineation [53]. Altogether, these results indicate that strain NC18T represents a novel species of the genus Martelella.
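The threshold comparison described above can be written out as a short Python check using the values reported in the text. This is a sketch rather than part of the authors' workflow: the variable and function names are hypothetical, and requiring all three indices to meet their thresholds at once is an assumption made here for illustration.

```python
# Species delineation thresholds cited in the text (percent).
ANI_THRESHOLD = 95.0
AAI_THRESHOLD = 95.0
DDDH_THRESHOLD = 70.0

# Values reported in the text for comparisons against strain NC18T.
comparisons = {
    "NC20": {"ani": 99.9, "aai": 100.0, "dddh": 100.0},
    "M. mediterranea DSM 17316T": {"ani": 88.1, "aai": 87.7, "dddh": 34.9},
    "M. endophytica YC6887T": {"ani": 80.2, "aai": 76.3, "dddh": 23.9},
    "M. lutilitoris GH2-6T": {"ani": 80.4, "aai": 77.8, "dddh": 23.7},
}


def same_species(values) -> bool:
    """True if ANI, AAI, and dDDH all reach their thresholds (assumed rule)."""
    return (
        values["ani"] >= ANI_THRESHOLD
        and values["aai"] >= AAI_THRESHOLD
        and values["dddh"] >= DDDH_THRESHOLD
    )


for strain, values in comparisons.items():
    verdict = "same species" if same_species(values) else "different species"
    print(f"NC18T vs {strain}: {verdict}")
```

Under this rule only NC20 falls within the same species as NC18T, in line with the conclusion drawn in the text.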
To gain a deeper understanding of the genomic diversity among the Martelella strains, pan-genome analysis was performed. The pan-genome curve showed that the size of the pan-genome increased with the addition of new genomes (Supplementary Figure S6), whereas the core genome slowly decreased as genomes were added one by one. Pan-genomic analysis showed that the two novel strains and the three related species had 8470 POGs in total: 2258 POG core, 3617 POG accessory, and 2595 POG unique (Figure 3). Each of the five strains contained strain-specific genes (POG unique), and their number varied considerably (9-971) among strains, suggesting that changing gene flow led to strain specificity (Supplementary Table S3). Most of the POG core was classified into basic functions: amino acid transport and metabolism (E), general function prediction only (R), carbohydrate transport and metabolism (G), translation, ribosomal structure and biogenesis (J), and transcription (K) (Supplementary Figure S7), which are related to obtaining necessary nutrients from various environments and maintaining a lifestyle. Function unknown (S) occupied a large proportion, which reflects the current gaps in the understanding of Martelella genomes. Most of the POG accessory was classified in a pattern similar to that of the POG core.
Figure 3. Comparison of POG distribution of strains NC18T and NC20 with related taxa within the genus Martelella.
In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis indicated that the POG unique was mainly involved in carbohydrate metabolism, membrane transport, amino acid metabolism, and lipid metabolism, with carbohydrate metabolism the most prominent (Supplementary Figure S8), which corresponded with the variable carbon source use observed in the API tests (Table 1). These genomic characteristics indicate the diversity of metabolic pathways among Martelella strains. In addition, the phylogenetic tree based on the concatenated POG core (Supplementary Figure S9) showed that the two novel strains were distinct from the three other Martelella strains while clustering with M. mediterranea DSM 17316T.
This result was consistent with the phylogenetic tree based on 16S rRNA gene sequences of strains NC18T and NC20 and other related taxa. Thus, the genetic distinctiveness shown by the ANI, AAI, and dDDH values suggests that strain NC18T is separate from other recognized species of the genus Martelella. Pan-genomic analysis also showed distinct patterns of gene content and metabolic pathways within this new Martelella species.

Conclusions
We isolated two Martelella strains and performed phenotypic, chemotaxonomic, phylogenetic, and genomic analyses to identify these strains as a novel species. Phylogenetic analysis showed that strains NC18T and NC20 grouped with members of the genus Martelella, including M. mediterranea DSM 17316T and M. limonii YC7034T. Genetic distinctiveness shown by ANI, AAI, and dDDH values suggested that strains NC18T and NC20 are separate from other recognized species of the genus Martelella. Pan-genomic analysis showed that most of the POG core of NC18T and NC20 was related to amino acid transport and metabolism in the COG categories and to carbohydrate metabolism in the KEGG pathways. On the basis of polyphasic taxonomic data, strain NC18T represents the type strain of a novel species of the genus Martelella, for which the name Martelella soudanensis sp. nov. is proposed.

Supplementary Materials: […] Evolutionary distances calculated using the Jukes-Cantor model. Evolutionary history inferred using the maximum-likelihood method; Figure S3: Phylogenetic tree based on 16S rRNA gene sequences of strains NC18T and NC20 with other related taxa using 1393 bp sequence. Evolutionary history inferred using the maximum-parsimony method; Figure S4: Multilocus sequence analysis (MLSA) tree based on universally conserved protein sequences of strains NC18T and NC20 with other related taxa. Evolutionary distances computed using JTT matrix-based method. Evolutionary history inferred using the maximum-likelihood method; Figure S5: Multilocus sequence analysis (MLSA) tree based on universally conserved protein sequences of strains NC18T and NC20 with other related taxa. Evolutionary history inferred using the maximum-parsimony method; Figure S6: Pan-genome curve of five Martelella strains. Analysis performed using the Bacterial Pan-Genome Analysis Tool (BPGA) pipeline with default parameters; Figure S7: Functional POGs annotation of five Martelella strains using COG database; Figure S8: Functional POGs annotation of five Martelella strains using KEGG database; Figure S9: Phylogenetic tree using concatenated POG core based on pan-matrix of five Martelella strains; Table S1: Accession numbers of 31 universally conserved gene sequences used in multilocus sequence analysis (MLSA) tree; Table S2: COG functional classification of the genome belonging to strains NC18T and NC20; Table S3: Pan-genomes of five Martelella strains.