The Santorini Volcanic Complex as a Valuable Source of Enzymes for Bioenergy

Abstract: Marine microbial communities are an untapped reservoir of genetic and metabolic diversity and a valuable source for the discovery of new natural products of biotechnological interest. The newly discovered hydrothermal vent field of the Santorini volcanic complex, located in the Aegean Sea, is gaining increasing interest for potential biotechnological exploitation. The conditions in these environments, i.e., high temperatures, low pH values and high concentrations of heavy metals, often resemble harsh industrial settings. Thus, these environments may serve as pools of enzymes with enhanced catalytic properties that may benefit biotechnology. Here, we screened 11 metagenomic libraries previously constructed from microbial mat samples covering the seafloor and the polymetallic chimneys of Kolumbo volcano, as well as mat samples from Santorini caldera, to mine, in silico, genes associated with bioenergy applications. We particularly focused on genes encoding biomass hydrolysis enzymes such as cellulases, hemicellulases and lignin-degrading enzymes. A total of 10,417 genes were found for three specific groups of enzymes, i.e., the endoglucanases, the three different beta-glucosidases BGL, bglX and bglB, and the alpha-galactosidases melA and rafA. Overall, we concluded that the Santorini-Kolumbo volcanic ecosystems constitute a significant resource of novel genes with potential applications in bioenergy that deserve further investigation.


Introduction
Among the many extreme habitats found on Earth, marine hydrothermal vents (HVs) represent very promising targets for the discovery of some of the Earth's oddest and toughest microbes. These microbes have enormous potential in so-called "blue biotechnology", i.e., the area of biotechnology which exploits the diversity of marine organisms [1,2]. Hydrothermal systems associated with volcanic activity can support diverse ecosystems, providing an oasis for microorganisms with many diverse abilities, yet they are largely unexplored. The geochemical conditions allowing life in such environments could only be met by nonphotosynthetic, lithotrophic microbial communities, which have evolved a broad range of new metabolic capabilities [3,4]. However, despite the vital role of microorganisms in HV ecosystems and their exploitation potential in biotechnology, key questions about their ecology, diversity and metabolic capabilities remain open because many HV environments are still largely unexplored [5]. Due to the challenges in accessing samples, they are among the least understood ecosystems on Earth [6].
The Hellenic Volcanic Arc (HVA), located in the Aegean Sea, contains geologically unique and poorly studied hydrothermal vents. The HVA developed as a response to the northward subduction of the African plate beneath the active margin of the European plate [7,8]. In this arc, volcanism and hydrothermal activity occur through thinned continental crust [9]. The HVA consists of five islands: Methana, Milos, Santorini, Nisiros and Kos [10]. In these areas, the vents release substantial amounts of CO2, H2 and H2S into the water column, creating an environment where chemolithoautotrophy plays a key role [10,11]. The Santorini volcano is world famous because of its recent major explosive eruption that happened ~3600 years ago. This volcanic field extends for 20 km and consists of approximately 20 submarine cones. The largest cone is the submarine Kolumbo volcano, which is situated 505 m below sea level. The diameter of this volcano is 3 km and its crater is 1500 m wide. The northernmost part of the crater is covered by a large number of polymetallic chimneys and vents which release fluids of high (up to 220 °C) and low temperatures (up to 70 °C) [12]. The exterior of these chimneys and the seabed of the crater are covered with microbial mats of white/grey and reddish/orange colours. The volcanic discharge of CO2 results in local pH reduction [13,14]. As a probable consequence of the elevated metal concentrations, high temperatures and low pH values, the macrofaunal species typically found at HV sites are absent. Indeed, the only thriving biological entities are of bacterial and archaeal origin [9].
Metagenomics, which is the study of the genetic content of all microbial organisms in an environmental sample, has the potential to impact a broad range of biotechnologies. The newly discovered HV field of Kolumbo volcano of the Hellenic Volcanic Arc (HVA) is gaining increasing interest for potential biotechnological exploitation; however, only recently have the first biogeochemical/biotechnological [9,15,16] and metagenomic investigations of its seafloor and water column [5,17,18] been undertaken. These investigations, which were conducted on microbial mat samples covering the seafloor of the Santorini-Kolumbo volcanic system and the surfaces of Kolumbo chimneys, revealed that both Kolumbo crater and Santorini caldera harbor highly complex prokaryotic communities composed mainly of chemolithoautotrophs [5,15]. However, our most striking finding at the seafloor of this volcanic complex until now is the enormous number of genes and pathways linked to key metabolic processes with great exploitation potential in different biotechnological fields, including bioremediation and biomedicine [5]. Indeed, in our libraries we identified genes and groups of organisms linked to anaerobic degradation of aromatic hydrocarbons, as well as CRISPR elements [5]. According to Barrangou and Horvath [19], these elements are antiviral defense mechanisms with great potential in biomedical applications.
In this study, we investigated genes of interest in the bioenergy sector in order to evaluate the exploitation potential of the Kolumbo volcano in the emerging and challenging field of renewable energy. Current society and the current economy are largely dependent on petroleum as a fuel source; however, petroleum feedstocks are becoming scarcer and more expensive. Thus, there is a growing demand to develop strategies for fuel production from alternative carbon sources such as plant biomass instead of petroleum [20]. Lignocellulose represents an abundant and sustainable raw material for biofuel production accounting for approximately 70% of the carbon fixed by land plant photosynthesis [21,22]. Conversion of lignocellulosic biomass to biofuels has been an attractive alternative to petroleum usage; however, its conversion to biofuels is still complex and expensive. Thus, the development of a strategy capable of finding more effective enzymes with improved catalytic properties in a cost-efficient manner, e.g., enzymes of enhanced thermal and mechanical stability and better pH response, may revolutionize the biotechnology and chemical industries [23].
We explored metagenomic data from 11 samples of microbial mats covering the seafloor and the polymetallic chimneys of Kolumbo volcano, as well as mat samples from Santorini caldera, to mine genes of bioenergy importance. We particularly focused on genes encoding proteins that can effectively hydrolyze biomass such as cellulose, hemicellulose and lignin.

Source of the Investigated Metagenomic Data
A total of 11 metagenomic libraries previously constructed by IMBBC-HCMR and the Joint Genome Institute, Department of Energy, USA (JGI, DoE) were investigated [5,17,18].
The metagenomes were constructed from samples collected within the framework of oceanographic sampling expeditions of the Hellenic Centre for Marine Research and the Rhode Island University, USA (Figures 1 and 2), and they are all available in the IMG/M data management system of JGI, DoE, USA. These included:
(2) The sample RED2, which was retrieved from the bottom pool fluid of the Kalisti Limnes located at the slopes of Santorini caldera. This fluid was a mixture of hydrothermal fluid with seawater, plus reddish Fe-mat and Fe-rich suspended particulate material. The suspended particulate material, which was further used for metagenomic analysis, was collected by pumping the sample fluid through a GF/F filter (RED2: 36.4542
Samples RED1 and WH1 were collected during the sampling expedition EN419 that took place in the Santorini volcanic complex in May-June 2006 with the R/V Endeavor of Rhode Island University, in a joint campaign with the Hellenic Centre for Marine Research. High-quality DNA was prepared at IMBBC-HCMR, and library construction and bioinformatic analysis were performed at JGI, DoE, USA. During this expedition, the bottom water temperature within the CO2-poor Santorini caldera was close to 22 °C, whereas the temperatures at the seafloor of the CO2-rich Kolumbo volcano varied from 60 to 70 °C [5].
Samples RED2 and RED3 of Santorini caldera, the white mat sample WH2, the three layers of sulfide chimney CH1-V16c, the sulfide chimney CH2 and the water samples WAT1 and WAT2 were collected within the framework of the sampling campaign 2Biotech of the EU-FP7 SeaBioTech project that took place in September 2013 with the R/V Aegeao. Samples were collected with the Remotely Operated Vehicle (ROV) Max Rover of the HCMR. High-quality DNA extraction, library construction, sequencing and bioinformatic analysis were carried out by IMBBC-HCMR and JGI-DoE.
During the SeaBioTech expedition, the temperatures recorded using the CTD (SBE 39; Seabird Scientific) mounted on the remotely operated vehicle were as follows: at the slopes of Santorini caldera, the pool fluid temperature of sample RED2 was 29 °C, whereas the ambient seawater above the sample RED3 was 16.5 °C; in Kolumbo crater, the temperature of the ambient seawater above the seabed varied from 15.8 to 16.5 °C (samples WAT1 and WAT2), the temperature in the microbial mats covering the seabed varied from 60 to 70 °C (sample WH2) and the temperatures of the ambient water surrounding the sulfide chimneys V16 (CH1-V16c) and V59 (CH2) were 15.9 and 16.2 °C, respectively [15,17,18].

List of Genes
We focused on the biomass hydrolysis process mediated by enzymes of interest to the bioenergy sector. Plant biomass consists of three major biopolymers, i.e., cellulose, hemicellulose and lignin. Cellulose, a homopolymer of beta-1,4-linked glucose units, is a highly recalcitrant substrate that is protected by hemicellulose and lignin. It is a major component of primary (up to 15-30%) and secondary cell walls (up to 40%) [22,24]. The second most abundant biomass component is hemicellulose, a group of short-chain, branched, substituted polymers of sugars [22,25]. The sugar monomers are galactose, mannose, xylose, rhamnose and arabinose [22,26]. Lignin is a class of hydrocarbon polymers consisting of aliphatic and aromatic structures. It is an amorphous polymer that contains phenyl propane units.
In many processes, cellulases must withstand the harsh conditions of industrial bioconversion, such as temperatures above 50 °C and low or high pH values [23,27]. Microorganisms can produce a series of enzymes enabling the degradation of all these biopolymers (Table 1). The enzymes capable of degrading cellulose, named cellulases, comprise three major categories, namely endoglucanases (EGs), exoglucanases/cellobiohydrolases (CBH) and beta-glucosidases (BGL) [22,28]. The enzymes degrading hemicellulose consist of xylanase, which is the major enzyme, xyloglucanase, mannan endo-1,4-beta-mannosidase and beta-1,3- and beta-1,4-glucanase/licheninase. The major group of xylanase enzymes consists of endoxylanases, exoxylanases or glycosyl hydrolases (GHs), glucuronidases, acetylxylan esterases, ferulic acid esterases and alpha-galactosidase. The lignin-degrading enzymes consist of oxidoreductases (i.e., laccases) and heme peroxidases such as lignin (LiP) and manganese peroxidases (MnP) (Table 1) [22]. Using their Enzyme Commission (EC) numbers, we searched for the presence of genes encoding these enzymes in the metagenomic libraries (Table 1). We used the KEGG Orthology database of the 11 metagenomes, which is available in the IMG/M data management system of JGI, DoE, Berkeley, CA, USA.
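The EC-number search described above can be illustrated with a minimal Python sketch. It is not the IMG/M workflow used in the study: it assumes the functional annotations have been exported to a tab-separated table with the hypothetical columns `metagenome`, `gene_id` and `ec_number`, and the rows shown are made up for illustration. Only the three EC numbers quoted in the text (3.2.1.4, 3.2.1.21 and 3.2.1.22) are taken from the paper.

```python
import csv
import io
from collections import Counter

# EC numbers quoted in the text; the labels are shorthand for this sketch.
TARGET_EC = {
    "3.2.1.4": "endoglucanase",
    "3.2.1.21": "beta-glucosidase",
    "3.2.1.22": "alpha-galactosidase",
}


def count_genes_by_ec(handle, targets=TARGET_EC):
    """Count annotated genes per (metagenome, enzyme) for the target EC numbers.

    Assumes a tab-separated table with the header columns
    'metagenome', 'gene_id' and 'ec_number' (a hypothetical export layout).
    """
    counts = Counter()
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        ec = row["ec_number"].strip()
        # Annotations are sometimes written as 'EC:3.2.1.4'; drop the prefix.
        if ec.startswith("EC:"):
            ec = ec[3:]
        if ec in targets:
            counts[(row["metagenome"], targets[ec])] += 1
    return counts


# Made-up rows for illustration only; these are not data from the study.
# The last row carries an EC number outside the three target groups.
TOY_TABLE = (
    "metagenome\tgene_id\tec_number\n"
    "WH2\ttoy_gene_1\tEC:3.2.1.4\n"
    "WH2\ttoy_gene_2\t3.2.1.4\n"
    "WH2\ttoy_gene_3\tEC:3.2.1.21\n"
    "RED2\ttoy_gene_4\tEC:3.2.1.22\n"
    "RED2\ttoy_gene_5\tEC:1.10.3.2\n"
)

toy_counts = count_genes_by_ec(io.StringIO(TOY_TABLE))
for (metagenome, enzyme), n in sorted(toy_counts.items()):
    print(metagenome, enzyme, n)
```

Keying the counter on (metagenome, enzyme) pairs mirrors the layout of a per-library gene count table, so the same function could be applied to each of the 11 libraries in turn.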

Phylogenetic Tree of Endoglucanases, Beta-Glucosidases and Alpha-Galactosidases
A total of 222 endoglucanases, 226 beta-glucosidases bglX and 216 alpha-galactosidases, melA and rafA, from the metagenome WH2 were selected for the construction of the phylogenetic trees. Putative amino acid sequences of these enzymes were extracted from the IMG/M database. A series of reference enzymes from the public database GenBank, i.e., 14 endoglucanases, 17 beta-glucosidases bglX and 14 alpha-galactosidases melA and rafA from different microbial strains and metagenomes from hydrothermal vent sites, were obtained and used for the tree constructions. Signal peptides were then identified and removed using the software SignalP (http://www.cbs.dtu.dk/services/SignalP/; accessed on 24 February 2021) [30]. Aligned and trimmed sequences were used to construct the phylogenetic trees using both RAxML-NG (https://github.com/amkozlov/raxml-ng; accessed on 24 February 2021) [31] and IQ-TREE (http://www.iqtree.org/; accessed on 24 February 2021) [32]. The phylogenetic trees were built with RAxML-NG using the model DEN and with IQ-TREE using the models LG+R5 for endoglucanases and WAG+R4 for alpha-galactosidases and beta-glucosidases. In the case of RAxML-NG, 1000 bootstrap replicates were requested; however, the algorithm converged after 900 bootstrap trees. Trees were visualized using iTOL (https://itol.embl.de/upload.cgi; accessed on 24 February 2021) [33]. Both programs produced similar results; therefore, only the trees obtained by RAxML-NG are presented.
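As an illustration of the signal peptide removal step, the following Python sketch trims the N-terminal signal peptide from each sequence once the length of the predicted peptide is known. It is an assumption-laden sketch, not the authors' pipeline: the function name `strip_signal_peptides`, the dictionary-based input and the toy sequences are all hypothetical, and the signal peptide lengths are assumed to have been parsed beforehand from the prediction output.

```python
def strip_signal_peptides(sequences, signal_lengths):
    """Return mature sequences with predicted signal peptides removed.

    sequences: dict mapping sequence id -> amino acid string.
    signal_lengths: dict mapping sequence id -> number of N-terminal
        residues in the predicted signal peptide (hypothetical input,
        assumed to be parsed from the predictor's output). Sequences
        without a prediction are returned unchanged.
    """
    mature = {}
    for seq_id, seq in sequences.items():
        n = signal_lengths.get(seq_id, 0)
        if not 0 <= n < len(seq):
            raise ValueError(f"invalid signal peptide length {n} for {seq_id}")
        # Keep everything after the predicted cleavage site.
        mature[seq_id] = seq[n:]
    return mature


# Toy sequences for illustration only; these are not sequences from the study.
toy_sequences = {
    "toy_1": "MKKLLALAAAQADTPSG",  # first 10 residues treated as signal peptide
    "toy_2": "MSTNPKPQR",          # no signal peptide predicted
}
toy_signal_lengths = {"toy_1": 10}

mature = strip_signal_peptides(toy_sequences, toy_signal_lengths)
for seq_id, seq in mature.items():
    print(seq_id, seq)
```

Trimming before alignment keeps the poorly conserved signal region from influencing the alignment and the resulting tree.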

Results and Discussion
Through the enormous amount of metagenomic data being produced from a wide range of different environments, we can develop a better understanding of how to effectively harness various renewable energy sources [34]. Here, we explored the genetic content of 11 large metagenomic libraries previously developed from the extreme environments of the Santorini-Kolumbo volcanic complex in the context of bioenergy. We were particularly interested in genes involved in lignocellulosic biomass conversion to biofuels. Different groups of gene-encoded enzymes were mined and the results of gene counts are presented in Table 1.
One of the biggest challenges in biofuel research is the discovery of new cellulases of improved performance in order to reduce the amount required and the cost of their production. The investigated environment of the Santorini volcanic complex is advantageous from this point of view because of its harsh conditions. A high number of genes encoding cellulases and hemicellulases were encountered in most of the metagenomes of the present study, whereas lignin-degrading enzymes were almost absent. Interestingly, high gene counts were found for three specific groups of enzymes, i.e., the endoglucanases (EG: 3384 genes), the three different beta-glucosidases (BGL: 450 genes, bglX: 3009 genes, bglB: 1078 genes) and the alpha-galactosidases (melA: 1083 genes; galA, rafA: 1413 genes). Endoglucanases (EGs) are very important components in biomass conversion to biofuels since they are considered the initiators of cellulose hydrolysis. They are involved in the conversion of plant components into sugar. These enzymes act by cleaving internal beta-glycosidic bonds in the cellulose chain. The ends of the cellulose chain then become accessible to cellobiohydrolases, which produce cellobiose. Beta-glucosidase further cleaves cellobiose into glucose molecules [35]. Endoglucanases have an EC number of 3.2.1.4 and constitute one of the most important groups of cellulases, which is widely distributed in archaeal, bacterial, fungal and other eukaryotic species [36]. For cellulases and endoglucanases, much research has been carried out on the fungus Trichoderma reesei [37] as well as on the Gram-positive, endospore-forming and cellulolytic bacterium Clostridium, which can produce a variety of chemicals including n-butanol [38]. However, the expression of endoglucanases from their host microbial communities adds to the final cost of the biofuel product. Therefore, there is a growing interest in exploiting the potential of thermostable bioprocessing enzymes.
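The gene counts quoted above can be checked against the overall total of 10,417 genes reported in the abstract. The short Python sketch below does only this arithmetic; the dictionary labels are shorthand chosen for illustration, while the numbers are those given in the text.

```python
# Gene counts as quoted in the text; labels are shorthand for this sketch.
GENE_COUNTS = {
    "endoglucanase (EG)": 3384,
    "beta-glucosidase BGL": 450,
    "beta-glucosidase bglX": 3009,
    "beta-glucosidase bglB": 1078,
    "alpha-galactosidase melA": 1083,
    "alpha-galactosidase rafA": 1413,
}

total = sum(GENE_COUNTS.values())

# Share of each enzyme group in the total, in percent.
shares = {name: 100 * n / total for name, n in GENE_COUNTS.items()}

print(f"total genes: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The sum is 10,417, in agreement with the abstract, and the endoglucanases and bglX together account for more than half of the mined genes.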
In the present study, we were able to mine more than 3000 genes encoding endoglucanases. Interestingly, these genes were characterized by high diversity. Figure 3a presents the phylogenetic analysis of 222 partial endoglucanase amino acid metagenome sequences that were mined from the high-temperature WH2 sample. This analysis revealed three distinct clusters of a diverse set of endoglucanase enzymes, which are related to members of bacterial (green color; Figure 3a) and archaeal communities (red color; Figure 3a), as well as to community samples (blue color; Figure 3a).
Although several studies have been carried out to mine novel endoglucanases from microbial strains [39,40] or metagenomic samples [41,42], few investigations exist from high-temperature environments such as hydrothermal vents. Indeed, despite the high demand for new extremozymes, the number of known thermoactive cellulases such as endoglucanases is low and mostly originates from archaeal communities [43,44]. In the present study, we present a large set of diverse enzymes closely related to enzymes from both thermophilic bacterial and archaeal communities, providing promising candidates for high-temperature biotechnological processes.
Figure 3. Phylogenetic maximum-likelihood tree based on partial (a) endoglucanase, (b) beta-glucosidase and (c) alpha-galactosidase amino acid sequences of the WH2 metagenome. Reference sequences from various microbial species and other metagenomes that were retrieved from GenBank are also included. The branches denoting sequences derived from GenBank are colored according to the taxonomic domain they come from, i.e., green for bacteria, red for archaea and blue for community samples. The scale bar represents 1 amino acid substitution per 100 positions. Dots on the branches represent bootstrap support values; the larger the dot, the higher the bootstrap value (close to 100%).
In addition to endoglucanases, beta-glucosidase genes with an EC number of 3.2.1.21 were also very abundant in almost all metagenomic libraries. The most abundant beta-glucosidase, bglX, also known as periplasmic beta-glucosidase, hydrolyzes terminal, nonreducing beta-D-glucosyl residues with release of beta-D-glucose. Beta-glucosidases are of great importance in industrial processes and they are mainly used for the enzymatic saccharification of cellulosic matrixes [45]. These enzymes are ubiquitous, occurring in organisms representing all domains of life from bacteria to highly evolved mammals. Thermotoga species are an important source of hyperthermophilic glycosidases [46], whereas different Aspergillus species have also been studied as a source of glucosidases that can improve the efficiency of the saccharification step [47]. The phylogenetic analysis of beta-glucosidase amino acid sequences revealed the presence of diverse enzymes. A large number of different clusters were produced containing reference enzymes from both thermophilic bacterial and archaeal communities, as well as from metagenomic libraries from high-temperature environments (Figure 3b). These enzymes were most closely related to beta-glucosidases mined from metagenomes of the hydrothermal vent sediments in the hydrocarbon-rich Guaymas Basin [48,49] and to the beta-glucosidase of the thermophilic Thermotoga sp. [48,49].
High gene counts were also recorded for alpha-galactosidases, melA and rafA, with an EC number of 3.2.1.22 (Table 1). These enzymes catalyze the hydrolysis of alpha-(1,6) bonds of galactose residues in galacto-oligosaccharides. A number of industrial applications of alpha-galactosidases are known, mainly in the food and pharmaceutical industries, but also in the bioenergy sector [50]. In the present study, the alpha-galactosidase phylogeny also showed the presence of a large number of clusters. These enzyme clusters are closely related to diverse enzymes, including those of the thermophilic Geobacillus sp. 12AMOR1, which was isolated from an Arctic deep-sea hydrothermal vent site [51], and of the hyperthermophilic archaeon Thermococcus barophilus, which was isolated from a deep-sea hydrothermal vent of the Mid-Atlantic Ridge [52]. Interestingly, many clusters were not closely related to any known alpha-galactosidases, suggesting that these enzymes are novel.
Advances in genetics and biotechnology are introducing a new view of converting complex biomass into value-added products such as biofuels. In this context, emphasis has been placed upon the genetic conversion of biomass into fuels. Indeed, the production of cost-effective cellulases and hemicellulases for biomass hydrolysis is the major challenge in the successful development of biofuels. In these categories, the enzymes that have gained more importance in recent years include the endoglucanases, beta-glucosidases and alpha-galactosidases due to their high thermostability [36] and their high biomass hydrolysis efficiency [53,54,55].
Techno-economic studies have indicated that a cost-efficient biomass-to-ethanol conversion strategy requires a series of steps in biorefinery operation to be comprehensively addressed, including the use of more stable enzymes [22]. According to Gusakov et al. [56], the production cost of an enzyme is connected to the productivity of the enzyme-producing strains, while the hydrolytic efficiency of the enzymes depends on both the properties of the individual enzymes and the synergies between them [22]. Several strategies have been implemented to obtain biocatalysts with improved performance. Although these strategies range from rational and computational designs to de novo enzyme designs, the highly demanding industrial conditions, in terms of extreme pH values and temperature, still limit the expression and/or activity of enzymes. Thus, finding more suitable blueprints to improve the catalytic properties of enzymes in a cost-efficient way can revolutionize the field of bioenergy [23,57].
The presence of a large number of enzymes from a high-temperature submarine hydrothermal vent site holds great promise for the identification of novel enzymes of high thermostability and improved catalytic properties. Further research on these data will provide valuable information to aid the discovery of endoglucanases, beta-glucosidases and alpha-galactosidases with potential in biofuel production.

Conclusions
Enzymes from extreme environments such as submarine volcanoes and hydrothermal vents, which are characterized by elevated temperatures and low pH conditions, are expected to display enhanced thermal or mechanical stability and better pH responses. Here, we explored the metagenomes of the extreme environments of the Santorini-Kolumbo volcanic complex. We mined a large number of genes encoding enzymes involved in lignocellulosic biomass conversion to biofuels. Since this pool of enzymes originates from environments where temperature and pH resemble those of industrial settings, we anticipate that they can offer a cost-efficient solution in bioenergy production.