Metagenomics of Bacterial Diversity in Villa Luz Caves with Sulfur Water Springs

New biotechnology applications require in-depth preliminary studies of biodiversity. The methods of massive sequencing using metagenomics and bioinformatics tools offer us sufficient and reliable knowledge to understand environmental diversity, to know new microorganisms, and to take advantage of their functional genes. Villa Luz caves, in the southern Mexican state of Tabasco, are fed by at least 26 groundwater inlets, containing 300–500 mg L−1 H2S and <0.1 mg L−1 O2. We extracted environmental DNA for metagenomic analysis of collected samples in five selected Villa Luz caves sites, with pH values from 2.5 to 7. Foreign organisms found in this underground ecosystem can oxidize H2S to H2SO4. These include: biovermiculites, a bacterial association that can grow on the rock walls; snottites, that are whitish, viscous biofilms hanging from the rock walls, and sacks or bags of phlegm, which live within the aquatic environment of the springs. Through the emergency food assistance program (TEFAP) pyrosequencing, a total of 20,901 readings of amplification products from hypervariable regions V1 and V3 of 16S rRNA bacterial gene in whole and pure metagenomic DNA samples were generated. Seven bacterial phyla were identified. As a result, Proteobacteria was more frequent than Acidobacteria. Finally, acidophilic Proteobacteria was detected in UJAT5 sample.


Introduction
The tropical rainforest is one of the world's main biomes, biologically possessing the greatest wealth of the tropics [1][2][3], but also the most endangered [4]. The Mexican humid tropics (MHT) or warm-humid tropics is in the southeast region of Mexico. It occupies just 11% of the country's landscape [5], but it has the greatest biological diversity [6]. One of the characteristic epicontinental ecosystems in the MHT are underground aquatic biomes, which are interconnected with springs and the Chichonal volcano by an aquifer network that spans hundreds of kilometers in a region lying North of the Sierra de Chiapas in southeastern Mexico [7]. Thus, Villa Luz (VL) caves are characterized by the presence of elemental sulfur embedded in the walls with 300-500 mg L −1 H 2 S and <0.1 mg L −1 O 2 in the air, as well as karst springs and colloidal sulfur [8][9][10][11]. H 2 S concentration in the atmosphere varies and is often quite toxic. Besides the fascinating hydrology and atmosphere, VL caves have a diverse biological community that appears to be largely dependent on the mineral-rich waters [8].
(Sulphur cave), that is fed by about 26 sulfidic springs, and Cueva Luna Azufre (Sulfur Moon cave), with non-sulfidic springs [8,12]. Azufre (Sulphur cave), that is fed by about 26 sulfidic springs, and Cueva Luna Azufre (Sulfur Moon cave), with non-sulfidic springs [8,12]. These caves are divided into 13 different chambers (I-XIII) [10,29], see Figure 2. They were formed by the folding of a micritic block of limestone in the Cretaceous period and are limited to the South by a normal fault, which probably controls the location of its main entrance. Front chambers receive a certain amount of light through the roof skylights, while the more inner chambers are completely dark. Several anastomosing streams that flow through the cave are fed by springs that emerge from the limestone floor, most of which contain H2S and possibly gas bubbles with CO2. Based on the chemistry and physical nature, the springs have been classified into two groups. Group A members are characterized by containing between 300 and 500 mg L −1 H2S, and less than 0.1 mg L −1 O2. This water is slightly oversaturated with calcite and oversaturated with gypsum and dolomite. It is recognizable in the caves by the elemental sulfur contained in the walls above the high-water mark, the white bacterial filaments on the wet rock surfaces, and the pyrite deposits in the water-covered sediments or rocks. Group B springs have <0.1 mg L −1 H2S and ≤4.3 mg L −1 O2. They are characterized by travertine and iron oxides (red-yellow color) precipitation, calcite and dolomite oversaturation, and gypsum sub-saturation. Both type (AB-type) springs have elements of the first two spring groups and their composition is the most abundant in these caves. Based on total dissolved solids and chemistry in general, it has been proposed that the A and B groups have a similar origin and composition, suggesting that H2S oxidation takes place first in group B. The causes or controls on water oxidation are still unknown [7][8][9][10]. These caves are divided into 13 different chambers (I-XIII) [10,29], see Figure 2. They were formed by the folding of a micritic block of limestone in the Cretaceous period and are limited to the South by a normal fault, which probably controls the location of its main entrance. Front chambers receive a certain amount of light through the roof skylights, while the more inner chambers are completely dark. Several anastomosing streams that flow through the cave are fed by springs that emerge from the limestone floor, most of which contain H 2 S and possibly gas bubbles with CO 2 . Based on the chemistry and physical nature, the springs have been classified into two groups. Group A members are characterized by containing between 300 and 500 mg L −1 H 2 S, and less than 0.1 mg L −1 O 2 . This water is slightly oversaturated with calcite and oversaturated with gypsum and dolomite. It is recognizable in the caves by the elemental sulfur contained in the walls above the high-water mark, the white bacterial filaments on the wet rock surfaces, and the pyrite deposits in the water-covered sediments or rocks. Group B springs have <0.1 mg L −1 H 2 S and ≤4.3 mg L −1 O 2 . They are characterized by travertine and iron oxides (red-yellow color) precipitation, calcite and dolomite oversaturation, and gypsum sub-saturation. Both type (AB-type) springs have elements of the first two spring groups and their composition is the most abundant in these caves. Based on total dissolved solids and chemistry in general, it has been proposed that the A and B groups have a similar origin and composition, suggesting that H 2 S oxidation takes place first in group B. The causes or controls on water oxidation are still unknown [7][8][9][10].

Environmental Sampling and DNA Extraction for Metagenomic Analysis
Snottite, biovermiculite, and sediment samples were collected from 10 pre-selected sites ( Figure  3), considering their physicochemical characteristics (temperature, pH, and H2S smell). These samples were transported to laboratory in a thermal cooler, in which they were pre-frozen with liquid nitrogen and stored at −40 °C until subsequent DNA extraction. Metagenomic DNA from 0.5g of environmental samples was extracted using the FastDNA ® SPIN Kit for Soil (MP Biomedicals, Santa Ana, CA, USA), following the manufacturer's recommendations. Snottite samples were frozen with liquid nitrogen and then were ground to achieve better disintegration of the embedded cells within the mucus-like layers and to facilitate mechanical and chemical lysis. Sediment samples were centrifuged at 10,000× g for 20 min with a centrifuge 5810R (Eppendorf, Hamburg, Germany) to completely extract excess water and precipitate bacterial cells. As for the biovermiculites samples, the DNA was directly extracted. The DNA obtained was used to amplify 1400 base pairs (bp) of bacterial 16S ribosomal DNA (rDNA) gene using the oligonucleotides: NVZF 5´-GCG GAT CCG CGG CCG CTG CAG AGT TTG ATC CTG GCT CAG-3´ and NVZR 5´-GGC TCG AGC GGC CGC CCG GGT TAC CTT GTT ACG ACT T-3´ [30]. The PCR reaction mix consisted of 50 ng DNA template, 1x PCR buffer (Qiagen, Hilden, Germany), 0.025M MgCl2 (Merck, Darmstadt, Germany), 200 μM each deoxynucleotide (dNTP) (Promega, Madison, Wisconsin, USA), 0.6 μM each oligonucleotide, and 0.5U HotStarTaq DNA Polymerase (Promega). PCR reaction conditions were: an initial denaturation step of 94 °C for 10 min followed by 30 cycles at 94 °C for 1 min, and 58 °C for 1 min, 72 °C for 1 min, and a final extension at 72 °C for 10 min, with a T-gradient Thermo cycler, the CFX96 Touch TM Real-Time PCR (Bio-Rad, Singapore, Singapore). The PCR products were evaluated by submerged electrophoresis in 1.2% agarose gel stained with ethidium bromide and quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA, USA). Finally, they were diluted to 100 ngμL −1 by bTEFAP analysis, for which the Research and Testing Laboratory services (Lubbock, TX, USA), were required.

Environmental Sampling and DNA Extraction for Metagenomic Analysis
Snottite, biovermiculite, and sediment samples were collected from 10 pre-selected sites (Figure 3), considering their physicochemical characteristics (temperature, pH, and H 2 S smell). These samples were transported to laboratory in a thermal cooler, in which they were pre-frozen with liquid nitrogen and stored at −40 • C until subsequent DNA extraction. Metagenomic DNA from 0.5g of environmental samples was extracted using the FastDNA ® SPIN Kit for Soil (MP Biomedicals, Santa Ana, CA, USA), following the manufacturer's recommendations. Snottite samples were frozen with liquid nitrogen and then were ground to achieve better disintegration of the embedded cells within the mucus-like layers and to facilitate mechanical and chemical lysis. Sediment samples were centrifuged at 10,000× g for 20 min with a centrifuge 5810R (Eppendorf, Hamburg, Germany) to completely extract excess water and precipitate bacterial cells. As for the biovermiculites samples, the DNA was directly extracted. The DNA obtained was used to amplify 1400 base pairs (bp) of bacterial 16S ribosomal DNA (rDNA) gene using the oligonucleotides: NVZF 5 -GCG GAT CCG CGG CCG CTG CAG AGT TTG ATC CTG GCT CAG-3 and NVZR 5 -GGC TCG AGC GGC CGC CCG GGT TAC CTT GTT ACG ACT T-3 [30]. The PCR reaction mix consisted of 50 ng DNA template, 1x PCR buffer (Qiagen, Hilden, Germany), 0.025M MgCl 2 (Merck, Darmstadt, Germany), 200 µM each deoxynucleotide (dNTP) (Promega, Madison, Wisconsin, USA), 0.6 µM each oligonucleotide, and 0.5U HotStarTaq DNA Polymerase (Promega). PCR reaction conditions were: an initial denaturation step of 94 • C for 10 min followed by 30 cycles at 94 • C for 1 min, and 58 • C for 1 min, 72 • C for 1 min, and a final extension at 72 • C for 10 min, with a T-gradient Thermo cycler, the CFX96 Touch TM Real-Time PCR (Bio-Rad, Singapore, Singapore). The PCR products were evaluated by submerged electrophoresis in 1.2% agarose gel stained with ethidium bromide and quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA, USA). Finally, they were diluted to 100 ng µL −1 by bTEFAP analysis, for which the Research and Testing Laboratory services (Lubbock, TX, USA), were required.

Pyrosequencing and Sequence Analysis
Clone library and pyrosequencing preparation services were requested from the Research and Testing Laboratory. To perform PCR we used specific universal primers 28F (5'-GAGTTTGATCNTGGCTCAG-3') and 519R (5'-GTNTTACNGCGGCKGCTG-3') of the 16S rRNA gene V1 and V3 variable regions. A systematic check was performed to remove low-quality reads in accordance with Brown et al.'s (2009) strategies [31]. These involve eliminating: (i) Sequences that do not perfectly match the 3bp key code and primer sequence at the start of read, (ii) Sequences that do not perfectly match at least the first 10bp of the distal primer, (iii) Sequence reads that contain any undetermined nucleotide (N), and (iv) Sequence reads <50 bp after removing both primers [32]. The data was obtained using an ad hoc channel written in Perl. All statistics were obtained with RStatistics software, making use of several open-source libraries such as GData [33] and Vegan [34]. The group sequences were calculated to have 0.97% of similarity and 80% of overlap by using the Cluster Database at High Identity with Tolerance (CD-HIT) software [35]. Taxonomic affiliations were assigned using the Ribosomal Database Project (RDP) classifier [36] and all data were tabulated. Readings with RDP score value <0.8 were assigned below the taxonomic rank/range and left in the last rank as unidentified.

Samples and Chemical Properties
Samples from five different sites in the VL caves were selected (UJAT1, UJAT2, UJAT3, UJAT4, and UJAT5). As extreme values, we measured from 27 °C to 30 °C, while the pH was found to be from 2.5 to 7. As seen in Table 1, UJAT1 and UJAT2 samples contained high chemical concentrations, UJAT4 and UJAT5 samples contained intermediate chemical concentrations, and the UJAT3 sample was obtained from a microenvironment with low chemical concentrations, mainly CO2, CH4, and H2S. The UJAT5 sample represents an acidophilus microniche (pH 2.5).

Pyrosequencing and Sequence Analysis
Clone library and pyrosequencing preparation services were requested from the Research and Testing Laboratory.
To perform PCR we used specific universal primers 28F (5 -GAGTTTGATCNTGGCTCAG-3 ) and 519R (5 -GTNTTACNGCGGCKGCTG-3 ) of the 16S rRNA gene V1 and V3 variable regions. A systematic check was performed to remove low-quality reads in accordance with Brown et al.'s (2009) strategies [31]. These involve eliminating: (i) Sequences that do not perfectly match the 3bp key code and primer sequence at the start of read, (ii) Sequences that do not perfectly match at least the first 10bp of the distal primer, (iii) Sequence reads that contain any undetermined nucleotide (N), and (iv) Sequence reads <50 bp after removing both primers [32]. The data was obtained using an ad hoc channel written in Perl. All statistics were obtained with RStatistics software, making use of several open-source libraries such as GData [33] and Vegan [34]. The group sequences were calculated to have 0.97% of similarity and 80% of overlap by using the Cluster Database at High Identity with Tolerance (CD-HIT) software [35]. Taxonomic affiliations were assigned using the Ribosomal Database Project (RDP) classifier [36] and all data were tabulated. Readings with RDP score value <0.8 were assigned below the taxonomic rank/range and left in the last rank as unidentified.

Samples and Chemical Properties
Samples from five different sites in the VL caves were selected (UJAT1, UJAT2, UJAT3, UJAT4, and UJAT5). As extreme values, we measured from 27 • C to 30 • C, while the pH was found to be from 2.5 to 7. As seen in Table 1, UJAT1 and UJAT2 samples contained high chemical concentrations, UJAT4 and UJAT5 samples contained intermediate chemical concentrations, and the UJAT3 sample was obtained from a microenvironment with low chemical concentrations, mainly CO 2 , CH 4 , and H 2 S. The UJAT5 sample represents an acidophilus microniche (pH 2.5).

Bacterial Diversity Distribution
The pyrosequencing studies with V1-V3 hypervariable regions of bacterial 16S rRNA gene by PCR amplified from five selected sites provided 20,901 readings with an average size of 434.2 bp (standard deviation (SD) average: 55.3). Using the Chao 1 estimator [33,34,38], the taxonomic analysis of sequences revealed the presence of 27 and 81 families in the UJAT2 and UJAT1b samples, respectively. According to the Shannon Index, the diversity had all values >1, with a maximum of 3.02 for UJAT1b sample. UJAT3 sample reported index of 0.73. See Table 2. These high Shannon index values indicate a diversity-balanced distribution. Only in the UJAT3 sample we found a diversity decrease, corresponding to increased dominance of Enterococcaceae and Anaerolineaceae members (Figure 4). Samples are grouped by sampling location, as displayed in Figure 4, which shows that the UJAT1a and UJAT1b samples closely resemble each other, together with the UJAT2 sample. Then, the UJAT4 and UJAT5 samples are of another cluster separate from the UJAT3 sample and of the first mentioned samples. Rarefaction curves obtained at family level, indicate that all samples except UJAT4, where the estimated Chao1 indicates the presence of up to 81.1 families (standard error (SE): 6.13), tend to reach a plateau, which indicates that the sequencing depth was sufficient to carry out a thorough description of each sample ( Figure 5).

Discussion
The number of metagenomic studies has increased in recent years [39]. Metagenomics has been used to evaluate and exploit biodiversity in many habitats, including extremophiles environments [40][41][42][43][44][45]. In this study, we determined the prokaryotic diversity of sulphydric hot springs in the VL caves of Tacotalpa, Tabasco, Mexico, with severe limitations or total absence of light. We found different bacterial communities that were dominated by Proteobacteria, Firmicutes, Chloroflexi, Chlorobi, Bacteroidetes, and Actinobacteria. We also found the phylum Acidobacteria, although with very little dominance. A dominance of Proteobacteria was observed in this study and is in accordance with other cave studies [46,47]. This suggests that the presence of this community is a consequence of the increase in organic matter entering this cave [47]. Although the interaction of these bacteria might develop metabolic capability against possible contamination by infiltration of human or animal organic matter [48], the bat guano could be the main source of organic matter responsible for making Proteobacteria the dominant phylum in this cave. Dominance and pH found in the UJAT5 sample microbiota (Figure 4; Table 1) suggest that this might correspond to acidophilic Proteobacteria, which in the case of iron oxidizers has been the focus of a large amount of research due to its significance in environmental biotechnology [49].
Gram-positive Firmicutes, Bacteroidetes, and Chloroflexi bacterial phyla are described as follows: Firmicutes are present in all aquatic environments, Bacteroidetes are green but not sulfurous, and Chloroflexi have low abundance in oligotrophic waters [50] . Wemheuer et al. (2013) were focused on the evaluation and exploitation of the prokaryotic diversity in two microbial communities obtained from different hot springs in Kamchatka; using the metagenomic approach, they found that the most abundant groups in the samples belonged to Proteobacteria, Thermotogae, and Thaumarchaeota, but they did not find Acidobacteria. This phylum is widely distributed and is abundant in soils, it is not restricted to acidic environments and is made up of oligotrophic organisms negatively correlated with soil organic matter [51]; however, their ecological and metabolic functions are not accurately known, because we do not have pure cultures neither do we have complete genomes sequences [30,[52][53][54][55]. Acidobacteria phylum is identified by a diverse collection of 16S rRNA gene sequences (>1500 in the Data Base Project ribosome [24] obtained from different environments, including soils and sediments [56,57], soil crusts of sand dunes [58], sewage [59,60], sewage distribution systems [61], mire or quagmire [62], acid mine drainage [63], intertidal hot springs [37], submarine hydrothermal vents shallow [64], surfaces Paleolithic rock paintings and catacombs [65][66][67][68], and interactions of species of this phylum with plants [43]. In situ hybridization with specific probes for Acidobacteria has also confirmed the presence of this phylum in many environments and revealed multiple cellular morphotypes, including cocci, short rods, and thin filaments [69]. Numerous 16S rRNA gene sequences from this phylum have also been identified in different active and ancient cave systems worldwide [70]. However, our knowledge of acidobacterial diversity is still rather incomplete [71], and even more considering only from caves [70].
Finally, our study revealed that all the bacteria identified herein are characteristic of caves and, in the Acidobacteria, Proteobacteria, and Actinobacteria cases, they are predominant bacterial communities on volcanic terrain [72], as is the case with this cave, which is very close to the Chichonal volcano. Currently, it is becoming widely understood that speleogenesis is induced by sulfuric acid, where sulfuric acid causes the dissolution of limestone and results in the precipitation of gypsum, a fact that has been implicated in the formation of numerous caves [8,9,12]. However, there are very few studies on the role of bacteria in the speleogenesis induced by sulfuric acid and its metabolism in this environment type.

Conclusions
To sum up, 20,901 reads of bacterial 16S rRNA gene sequences spanning V1-V3 hypervariable regions corresponded to seven phyla: Proteobacteria, Firmicutes, Chloroflexi, Chlorobi, Bacteroidetes, Actinobacteria and Acidobacteria. Proteobacteria phylum dominance could be due to the increased presence of organic matter, not only of bat guano but also that caused by man and animals, directly or through infiltrations. For the UJAT5 sample, we generated 6691 reads, which, due to the physicochemical characteristics (Table 1) and relative frequency obtained (Figure 4), could confirm the presence of acidophilic Proteobacteria in VL caves. All the bacterial communities identified are characteristic of caves, while Acidobacteria, Proteobacteria, and Actinobacteria are typical of volcanic surface terrain.