Metagenomic Approaches as a Tool to Unravel Promising Biocatalysts from Natural Resources: Soil and Water

Abstract: Natural resources are considered a promising source of microorganisms responsible for producing biocatalysts with great relevance in several industrial areas. However, a significant fraction of the environmental microorganisms remains unknown or unexploited due to the limitations associated with their cultivation in the laboratory through classical techniques. Metagenomics has emerged as an innovative and strategic approach to explore these unculturable microorganisms through the analysis of DNA extracted from environmental samples. In this review, a detailed discussion is presented on the application of metagenomics to unravel the biotechnological potential of natural resources for the discovery of promising biocatalysts. An extensive bibliographic survey was carried out between 2010 and 2021, covering diverse metagenomic studies using soil and/or water samples of different types and from different locations. The review comprises, for the first time, an overview of the worldwide metagenomic studies performed in soil and water and provides a complete and global vision of the enzyme diversity associated with each specific environment.


Introduction
The natural resources available on Earth have played an important role in the history of human civilization since they have provided the necessary materials and energy for the preservation and proliferation of life. Besides the basic sustenance, proper use of these resources can also contribute to the improvement of our comfort, protection and well-being [1]. Over the centuries, we have witnessed the continuous exploitation of some finite natural resources, which resulted in the degradation of important ecosystems, subsequently creating severe environmental, economic and technological issues [2,3]. This overconsumption of materials and energy will eventually lead to inevitable resource depletion. Consequently, there is a need to moderate the demand for finite natural resources and simultaneously search for efficient and sustainable ways to extract and convert energy/materials from renewable resources [4]. In this context, biocatalysts can play a crucial role. Due to their high specificity and selectivity, the biocatalysts can generally ensure the effective conversion of substrates, minimizing the formation of undesirable side-products and reducing the energetic costs associated with the process. Hence, the use of suitable and robust biocatalysts can greatly contribute to the implementation of greener and sustainable bioprocesses that efficiently compete with the classical chemical routes. Currently, the use of biocatalysts for the valorization of alternative non-finite resources, under the concept of bioeconomy and the EU Green Deal, has gained increased attention.
Unexplored or slightly explored environments, such as soil and water, are interesting sources of novel and promising biocatalysts. Despite the clear differences at the physicochemical level, soil and water are both regarded as natural bio-reservoirs with great microbial diversity. For this reason, these environments have been the focus of several microbial studies in the last few decades. Microorganisms are considered important suppliers of various bioproducts, such as enzymes, with applications in several industrial areas. In fact, in the last decade, we have seen a significant increase in the demand for enzymes [5], which is easily explained by their great biotechnological potential. However, the presence of a significant number of unculturable microorganisms in both soil and water can limit some microbial studies aimed at finding novel biocatalysts or make them unfeasible. In this context, metagenomics can play a crucial role.
Metagenomics has emerged as a culture-independent technique that allows exploring the genetic material of whole microbial communities present in a given environment [6]. This technique has been successfully used to identify novel enzymes with promising catalytic activities, and some of them have been patented and already translated to the market [7]. Two different metagenomic approaches have been described, namely, sequence-based and function-based metagenomics. In both cases, an initial step of DNA extraction from an environmental sample is needed [8]. The sequence-based studies allow the identification of candidate genes, while the function-based screenings include the detection and isolation of clones from metagenomic libraries with a positive response to the desired phenotype [9]. The construction of a metagenomic library requires the selection of the most suitable expression vector, in which the environmental DNA fragment will be inserted. In addition to other aspects, such as the quality and size of the environmental DNA, this selection depends on the purpose of the functional screening. Plasmids can be used when DNA fragments are small (≤15 kb insert size) and contain only individual genes. On the other hand, some expression vectors, such as fosmids and cosmids (<40 kb insert size), or bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs) (>40 kb insert size), allow the recovery of large biosynthetic gene clusters that encode the production of one or more specialized metabolites. Besides the expression vector, expression systems should be selected in such a way that gene expression and target gene detection are maximized. Escherichia coli has been widely used due to the extensive genetic knowledge of this microorganism, which makes it suitable for effective and profitable cloning and protein expression [10,11].
However, the demand for robust biocatalysts requires functional screenings at temperatures other than the growth temperatures of mesophilic hosts. Therefore, other expression hosts, such as Thermus thermophilus, have already been shown to be good candidates as expression hosts in the functional metagenome analysis [12].
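The insert-size ranges quoted above for the different vector classes can be written as a simple lookup. The following is a minimal sketch in Python, not a protocol: the function name `smallest_suitable_vector` is hypothetical, and because the text gives strict inequalities on both sides of 40 kb, assigning an insert of exactly 40 kb to the BAC/YAC class is an assumption made here.

```python
def smallest_suitable_vector(insert_kb: float) -> str:
    """Return the smallest vector class whose quoted insert range covers insert_kb.

    Ranges follow the text: plasmids (<=15 kb), fosmids/cosmids (<40 kb),
    BACs/YACs (>40 kb). Exactly 40 kb is not covered by the text; sending it
    to the BAC/YAC class is an assumption of this sketch.
    """
    if insert_kb <= 0:
        raise ValueError("insert size must be positive")
    if insert_kb <= 15:
        return "plasmid"
    if insert_kb < 40:
        return "fosmid/cosmid"
    return "BAC/YAC"


if __name__ == "__main__":
    for size in (3, 15, 35, 120):
        print(f"{size} kb -> {smallest_suitable_vector(size)}")
```

In practice the choice also depends on DNA quality and on the purpose of the functional screening, as noted above, so a size-based lookup of this kind would only be a first filter.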
This review discusses the most significant works reported in the last decade about metagenomic studies performed in soil and water environments for the discovery of novel and interesting enzymes. The bibliographic survey was carried out in the international database Scopus between 2010 and 2021 using several search criteria as illustrated in Figure 1. Briefly, in the first criterion, the keywords used were: "metagenomic" and "enzyme" and "soil" or "water". After that, for each natural resource, the search was refined according to the type and origin of the samples. Finally, the different groups of enzymes were considered in the search. Based on the results obtained in the bibliographic survey, the studies were divided into three categories according to the type of sample (i.e., raw resources, human manipulated resources and unspecified resources) as illustrated in Figure 2. Within the category of raw resources, differences were established according to the temperature of each specific environment. In the case of human manipulated resources, three conditions were considered for soil, namely, samples from polluted environments, composting processes and agricultural lands and grassland. For the water samples resulting from human manipulation, two conditions were defined, namely, contaminated groundwater/freshwater and coastal water. In the category of unspecified resources, both for soil and water, all studies that did not present enough information to classify them according to the categories mentioned above were included. Overall, for each study, the environmental conditions (temperature and pH), the type of metagenomic approach and its main characteristics, as well as the catalytic activity, were reviewed. Furthermore, the number of citations found in the international database Scopus was also presented as an indicator of the impact of the study. All this information was summarized in the tables presented in each specific subsection.
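The first search criterion and the survey period described above can be expressed as a generic Boolean query and a year filter. This is only an illustrative sketch: the helper names are hypothetical, the grouping of "soil" or "water" inside parentheses is an assumption about how the keywords were combined, and the string is not written in the field-code syntax of any particular database.

```python
def build_query(required: list[str], alternatives: list[str]) -> str:
    """Combine required keywords with AND and alternative keywords with OR.

    The parenthesised OR group is an assumed reading of the keyword list
    given in the text ("metagenomic" and "enzyme" and "soil" or "water").
    """
    required_part = " AND ".join(f'"{term}"' for term in required)
    alternative_part = " OR ".join(f'"{term}"' for term in alternatives)
    return f"{required_part} AND ({alternative_part})"


def in_survey_window(year: int, start: int = 2010, end: int = 2021) -> bool:
    """True when a publication year falls inside the 2010-2021 survey period."""
    return start <= year <= end


if __name__ == "__main__":
    print(build_query(["metagenomic", "enzyme"], ["soil", "water"]))
    print([y for y in (2008, 2010, 2015, 2021, 2022) if in_survey_window(y)])
```

The later refinement steps (type and origin of the samples, enzyme groups) would add further terms in the same way.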

Soil
Soil is more than just land; it represents one of the most important natural resources available on Earth. It is considered the basis of agriculture, an important water filter, a natural reserve of carbon and water and also the regular habitat for several living organisms. Soils originate from rocks through complex chemical, mechanical and biological processes which result in the formation of small particles and grains [13]. In terms of composition, soils are generally constituted by solid, liquid and gas fractions in a 2:1:1 ratio (volume basis). The solid fraction is mainly composed of inorganic materials but also contains a small portion of organic matter (around 1-5%) [14]. The liquid phase is generally an aqueous solution of electrolytes (e.g., micronutrients and macronutrients). On the other hand, the main gases present in soil are water vapor, CO2, O2 and N2 [13]. Depending on its origin and composition, different types of soil can be defined. Microbial diversity and functionality are strongly dependent on the specific characteristics of each type of soil. The classification established for the different soil samples considered in this review is presented in Figure 3a. A total of 173 metagenomic studies aiming to explore promising enzymes were performed using soil samples.


Figure 3. (b) Assignment of the soil type to each collected sample from different environments (forest and their associated soils, wet mud/coastal soils, arid/semiarid soils, glacial soils, mountain soils, agricultural soils, compost, contaminated soils, industrial sludge and aquatic sediments).

Raw Resources
Depending on the geographical location, the soil may present itself in various forms at the Earth's surface as the result of environmental conditions, organic matter content and type of vegetation [15]. An abiotic factor with high importance in any ecosystem is temperature since it can directly or indirectly affect microbial activity and, consequently, the composition of microbial communities. Each microbial species has a different temperature tolerance range and is capable of producing distinct types of biocatalysts at different production rates. Still, certain microorganisms can thrive and function well metabolically in adverse conditions, notably at extreme temperatures [16,17].
Therefore, the sampling locations in which the soil was directly collected and evaluated "as found in nature" (raw resources) were classified according to the recorded temperature: low temperature (location temperature below 20 °C); moderate temperature (location temperature between 20 and 45 °C); and high temperature (location temperature above 45 °C) [17].
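The temperature classes defined above amount to a threshold rule, which can be sketched as follows. The function name is hypothetical, and treating both boundary values (20 °C and 45 °C) as part of the moderate class is an assumption based on the wording "between 20 and 45 °C".

```python
def temperature_class(location_temp_c: float) -> str:
    """Classify a raw-resource sampling site by its recorded temperature (deg C).

    Thresholds follow the text: below 20 -> low, 20 to 45 -> moderate,
    above 45 -> high. Inclusion of the boundary values in the moderate
    class is an assumption of this sketch.
    """
    if location_temp_c < 20:
        return "low temperature"
    if location_temp_c <= 45:
        return "moderate temperature"
    return "high temperature"


if __name__ == "__main__":
    for temp in (-5, 4, 20, 37, 45, 70):
        print(f"{temp} C -> {temperature_class(temp)}")
```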
As shown in Figure 3a, a greater number of studies were conducted in low-temperature environments (27 studies), followed by high-temperature environments (26 studies) and, finally, moderate temperature environments (5 studies). The unique characteristics of extreme environments, namely, extreme temperatures, make them more interesting places to find promising and robust enzymes, which is probably the main reason for the reduced number of studies conducted in moderate temperature environments. Furthermore, there is an increasing need to find highly stable biocatalysts with efficient activities capable of acting in various industrial processes that make use of severe conditions, similar to what occurs in hot and cold environments [18]. Table 1 compiles all the data considered in the category of raw resources.

Low-Temperature Environments
Cold soils, in addition to being exposed to very low temperatures, are also subject to other harsh conditions, such as freeze-thaw cycles, UV radiation and restricted availability of water and nutrients. Indeed, arid/semiarid regions, aquatic environments, polar regions, mountainous and forest areas are examples of cold habitats that have been the focus of metagenomic studies. Although they usually have low biodiversity, the microorganisms that inhabit them have acquired survival machinery and, for this reason, these types of habitats constitute stimulating reservoirs of biotechnological molecules [74].
Deep-sea sediments, mainly sands and clays from several aquatic environments, are the type of samples that most contribute to the exploration of enzymes in these cold habitats. The South China Sea is a marginal sea and one of the most studied environments since it constitutes an important reservoir of sediments rich in organic materials [75]. Other regions of the Pacific Ocean, such as Suruga Bay in Japan [31], as well as the depths of the Atlantic Ocean [51] and the Arctic Ocean, notably in the Barents Sea area [48], have also been studied to search not only for lipolytic enzymes but also for enzymes belonging to the glycosyl hydrolase class. Additionally, the submarine tufa columns of the Ikka Fjord in Greenland [21] and the karstic Lake Arreo in Spain [40], which, in addition to being permanently cold, are alkaline, are valuable sources of microorganisms adapted to these environments. In addition to aquatic sediments, other types of samples with metagenomic exploitation potential come from arid/semiarid and mountain soils, for example in the Ladakh region [70] and the Apharwat mountain [41,44], respectively, both in the northwestern Himalayas, which have distinct geo-climatic characteristics such as extremely cold and dry weather, high altitude and glacial and permafrost soils. Relevant examples of the latter are the Karuola glacier in the Tibetan Plateau [45] and the Kolyma Lowland permafrost in north-eastern Siberia [47], which are extremely hostile environments inhabited by unique microbial communities.

Moderate Temperature Environments
There are few studies in which sampling sources are identified as moderate temperature environments. Still, certain environments of this nature have other characteristics that also make them interesting for metagenomic purposes. An example is the Caatinga biome of João Câmara (Brazil) which presents sandy loam soil and constitutes an ecosystem of high biological relevance due to the features of the area, such as the semiarid climate, the high exposure to UV radiation and the long periods of drought [71,76]. Another example is the solar saltern of Goa, which differs by its high salinity and represents an important source for metagenomic studies given the difficulty in cultivating halophilic microorganisms through conventional techniques [52].

High-Temperature Environments
High-temperature environments have also proven to be important sources of very useful thermostable enzymes with applications in various industrial fields, such as food and chemical synthesis industries. In addition to geothermally heated environments, such as hot springs and hydrothermal vents, arid/semiarid regions and environments subject to natural composting processes are often good targets for the application of metagenomic tools [77]. Compost samples from Expo Park in Japan, produced from leaves and branches, are a good example of natural composting since they have been studied over a few years [24,42,59,69]. Once thermophilic composting reaches high temperatures, there is a greater predominance of microorganisms capable of degrading complex molecules, with this type of environment being a potential source of lignocellulose-degrading enzymes and, for this reason, an interesting subject of study [78]. Several arid/semiarid regions have been explored given the typical characteristics of these environments, including deserts [66] and also other sites, more specifically the Turpan Basin, which represents China's hottest place and has proven to be a valuable source of different types of highly thermostable enzymes [57]. Hot springs and hydrothermal vents from different parts of the planet, e.g., Caldeirão hot spring in Portugal [28], Solnechny hot spring in Russia [46] and Solfatara-Pisciarelli hydrothermal pool in Italy [34], have also contributed to finding robust enzymes through the construction of metagenomic libraries using DNA extracted from wet mud and/or sediments collected from these places. In all the metagenomic studies, the expression host used was E. coli, except for the metagenomic library constructed from sediments of a hot spring in the Azores, Portugal, for which T. thermophilus was used as the host.
Using this thermophilic host, Leis and co-workers intended to increase the probability of detecting genes derived from extreme environments that would encode for new thermostable biocatalysts and allow the screening of phenotypes that are not observable in E. coli [43].
In some studies performed from these raw resources, an additional enrichment step was performed to provide favourable growth conditions for certain microorganisms of interest, often present in small abundance, to the detriment of others [79]. These enrichments were implemented by introducing specific substrates, such as cellulose, xylan, chitin, starch and glucose [31,33,38], and even olive oil [47], that stimulate specific microbial activities. On the other hand, culture enrichment also occurred by controlling environmental conditions, in particular the temperature, which is generally in agreement with the temperature of the sampling locations [26].
Over the past decade, in addition to function-based metagenomic screenings, sequence-based metagenomic screenings have also been performed. Sequence-based metagenomics showed that the phyla that predominate in high-temperature environments are Crenarchaeota, Thaumarchaeota, Acidobacteria and Proteobacteria, which are capable of mineral-based metabolism, are generally associated with soil and are found more specifically in sediments from hot springs and hydrothermal vents [21,31,34,36,43].

Human Manipulated Resources
In addition to being studied as it is found in nature, without any kind of alteration, soil can be studied in a variety of scenarios that result from human manipulation, including contaminated/polluted soils, agricultural soils and controlled composting. According to Figure 3a, 31 metagenomic studies were performed in polluted soil samples, 27 studies were conducted in agricultural fields and grasslands and 21 studies were performed on samples provided by composting facilities. Table 2 comprises all the data considered in the category of human manipulated resources.
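The study counts quoted for Figure 3a can be tallied per category as a quick consistency check. The sketch below uses only the numbers stated in the text (173 soil studies in total; 27, 5 and 26 for the raw temperature classes; 31, 27 and 21 for the human manipulated classes). The remainder it computes is not a figure reported in the review: it would correspond to the unspecified category only under the assumption, not stated in the source, that each study is counted in exactly one category.

```python
# Counts quoted in the text for soil samples (Figure 3a).
TOTAL_SOIL_STUDIES = 173
RAW = {"low temperature": 27, "moderate temperature": 5, "high temperature": 26}
HUMAN_MANIPULATED = {"polluted": 31, "agricultural/grassland": 27, "composting": 21}


def subtotal(counts: dict[str, int]) -> int:
    """Sum the study counts of one category."""
    return sum(counts.values())


def remainder(total: int, *groups: dict[str, int]) -> int:
    """Studies left after subtracting the listed categories from the total.

    Only meaningful if no study is counted in more than one category,
    which is an assumption of this sketch.
    """
    return total - sum(subtotal(group) for group in groups)


raw_total = subtotal(RAW)
human_total = subtotal(HUMAN_MANIPULATED)
implied_unspecified = remainder(TOTAL_SOIL_STUDIES, RAW, HUMAN_MANIPULATED)

if __name__ == "__main__":
    print("raw resources:", raw_total)
    print("human manipulated resources:", human_total)
    print("remainder (assumes no overlap):", implied_unspecified)
```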

Polluted Environments
The intensification of industrialization, urbanization and mining has negatively affected the soil as a natural resource. Soil has been contaminated by different sources, namely, industrial sewage, solid wastes and urban activities. Organic and inorganic pollutants, such as heavy metals, alkaline or acidic constituents, toxins, oil contaminants and others, have been responsible for soil contamination [146].
The following categories of polluted samples have been used as the object of metagenomic studies: soils contaminated by oil and its constituents (such as polycyclic aromatic hydrocarbons (PAHs)), fertilizers and other alkaline pollutants, industrial sludges and sediments. Oil production sites [80], soils where oil spills or runoffs have occurred [50,101,111,139], soils near industrial areas [50,87,117,140] and soils treated with fertilizers [136] were particularly analyzed. Since pollutants are rich in toxic compounds, they affect the activity and diversity of microbial communities present in these adulterated soils. Therefore, metagenomic studies have been developed in this type of compromised environment to unravel the gene clusters that encode enzymes involved in the biodegradation of the various pollutants already mentioned. Only the microorganisms that carry machinery capable of resisting and degrading these types of recalcitrant compounds can survive in such environments. The toxic compounds can even act as substrates and enrich some specific microorganisms [147].
The acid mine drainage in Carnoulès (France) is considered an interesting reservoir of enzymes capable of degrading polymers and pollutants while simultaneously producing antimicrobial agents, since this polyextreme environment, in addition to being highly acidic, presents high concentrations of heavy metals, such as iron and arsenic, as a consequence of mining [100]. Another example is the saline-alkali soil of Lop Nur (China), which is characterized by extreme aridity and is a location that suffers from severe human manipulation. Since it serves as a basis for monitoring and verification of nuclear tests [148], microorganisms present in this soil are certainly subject to a high degree of stress and, for this reason, the soil may present an interesting microbial diversity and functionality [143]. Other areas exposed to other components such as fats [99,129] or chitin [97,98] have also been the focus of metagenomic studies as they are potential sources of new genes encoding groups of specific enzymes (lipolytic and chitinolytic enzymes, respectively). Activated sludge from different municipal [50,88,135,145] or industrial effluents, such as pesticide [138], swine [113] and paper and pulp [50] industries, is also a rich source of microorganisms producing enzymes capable of degrading protein, lipids and other pollutants. Of the thirty-one studies, two analysed constructed 16S rRNA gene libraries, one from activated sludge from a swine wastewater treatment facility and the other from soil contaminated and enriched with chitin. In both studies, it was found that the most predominant phylum was Proteobacteria [98,113]. Nevertheless, different samples can reveal other dominant phyla, since the composition of the sources (activated sludge from industrial or municipal wastewater treatment plants or treated soils) and the type of treatment accomplished may influence the bacterial diversity.

Agricultural Lands and Grassland
Land use and management have a great influence on the functioning of the soil ecosystem. Microbial diversity and functionality are sensitive to land use considering the important role of soil microorganisms in soil formation processes and nutrient cycling [149].
Hence, several metagenomic studies were implemented in fields designed for agriculture and/or grasslands, many of them subject to the ploughing and cultivation of different crops. Rhizosphere soils of, for example, red pepper plants and strawberry plants represent a complex but interesting ecosystem due to the symbiosis and parasitism interactions that happen between plants and microorganisms in these soil regions [92,115,142]. Cotton [132], wheat [116,141], sugarcane [83,90,103], corn [81,144], straw [86] and paddy [94,142] fields are examples of agricultural environments from which samples were collected, in particular from topsoil, to be analysed through metagenomic approaches. The selection of topsoil samples and not samples at higher depths is due to the presence of a higher soil microbial biomass on the surface since there is larger evidence of litter composition and root turnover rates in this type of land [149]. Another important fact is that decomposition of a variety of lignocellulosic residues occurs in these environments making them attractive to isolate lignocellulose-degrading enzymes relevant to several industrial applications.
Three large-scale research landscapes [133] in Germany (Hainich-Dün, Schorfheide-Chorin and Schwäbische Alb) are defined as exploratory environments. They present different geological and climatic conditions and are characterized by the different intensities of use and management of agricultural fields and grasslands. Therefore, they certainly have great microbial diversity. In these environments, 37 novel lipolytic enzymes, the vast majority belonging to the hormone-sensitive lipase family, were reported (Table 2). These exploratory environments, together with the Oak Park research facility in Ireland [128], are valuable sources of promising biocatalysts, notably lipases/esterases, as their land is essentially fertilized with compost and/or manure and is subject to crop rotation. The crop rotation system benefits certain chemical and physical properties of the soil, which is very important and favourable for soil microorganisms [150].

Industrial Composting
Among the different manipulated sources, composting is considered one of the most important bio-reactions for renewable bioenergy on the planet due to the huge variety of microorganisms capable of degrading lignocellulosic biomass. Composting is a sustainable and efficient microbiological process in which the stabilization of the organic matter occurs due to the passage through a thermophilic phase promoted by the proliferation of thermophilic microorganisms [78,151].
The great contribution of composting to the circular economy has led to an increase in the number of composting facilities that are responsible for the production of compost, rich in humic substances, with high agronomic value in organic fertilization of agricultural soils. Different parameters are controlled and adjusted throughout the industrial composting process, such as temperature, pH, humidity, nature of organic materials, particle size and C/N ratio [151].
For the metagenomic studies in which composting samples were used, the sample collection essentially occurred during the thermophilic phase of composting that reaches high temperatures (above 45 °C) due to microbial metabolic activity. Certain studies refer to microbial diversity and confirm the prevalence of thermophilic microorganisms, namely, Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria, producing enzymes able to degrade the complex molecules that compose the lignocellulosic biomass [43,82,108,118]. Additionally, some studies also report enrichment with lignocellulosic substrates, namely, switchgrass, steam-exploded spruce, cutter chips, Whatman filter paper, pre-treated Miscanthus giganteus and wheat straw. These substrates are first incubated with the composting samples at a constant temperature and pH, defined according to the phase at which they were collected, to increase the abundance of thermophilic microorganisms and develop a suitable consortium able to degrade lignocellulosic biomass. Over the years, several lignocellulose-degrading enzymes have been reported through the analysis of industrial composting metagenomes [82,93,104,108,109].

Unspecified Resources
The designation "unspecified" includes studies from soil resources that do not fall into either the raw resources section or the section of resources subject to human manipulation, since neither the temperature of the sample collected, the sampling site nor the existence of any human activity which may alter the natural properties of the sample is mentioned.
A large part of the unspecified resources are forests that are made up of microorganisms responsible for mediating biogeochemical cycles in terrestrial ecosystems. Additionally, forest soils have a high microbial diversity due to the great accumulation of organic matter [15]. Several metagenome studies have been developed, notably in soil samples from the Amazon rainforest [152], mangrove forests [153][154][155][156][157], peat-swamp forests [158,159], beech forests [160] and chestnut groves [161], to assess microbial diversity, as well as the metabolic capacities of these communities to decompose natural biomass. A metagenome from the Eucalyptus sp. forest in Brazil revealed that two of the most abundant phyla are Actinobacteria and Firmicutes, commonly associated with forest soils [15,162].
This category also includes samples from soils with high moisture content, such as alluvial soils from Eulsukdo Island (South Korea) [142,163], soils collected from lakes [64,145] and even soils with a significant salinity content [145,164], mountain soils such as an acid peatland site in Germany [159] and arid areas, namely, soils of the Cerrado region of Brazil [165,166] that present a high clay content, low pH and high iron levels. All these environments have interesting characteristics that give them unique ecosystems with high enzymatic potential. Table 3 comprises all the data considered in the category of unspecified resources.

Water
Water is essential in our life. Although frequently perceived as just an ordinary substance, it plays a vital role on the Earth since life as we know it could not exist without water. As a natural resource, water is fully distributed over the planet. The oceans contain the greatest fraction of water (96.5%), while continental water is composed of freshwater (2.5%) and saline groundwater (1%) [182]. Water is present in its three physical forms (liquid, solid and vapour) depending on the climate conditions. Aqueous environments (saline and freshwater) are considered important sources of microorganisms capable of inhabiting and resisting in different physical forms, such as icebergs (important spots for marine life), seas and oceans, thermal springs, glaciers, lakes and ponds. The microbial diversity and functionality found in water are strongly dependent on the environmental conditions (e.g., salinity, temperature, physical form, nutrients, pH or depth). It is known that a large part of the aquatic microbial resources remains unexplored mainly due to access limitations. Nevertheless, the use of metagenomic tools has significantly aided the discovery of novel biocatalysts from aquatic metagenomes [183]. In this review, the water samples used in the metagenomic studies were divided according to their origin (Figure 4a) and main characteristics (Figure 4b). In the last decade, a total of 26 metagenomic studies with water samples to find promising enzymes were reported (Figure 4a).
As shown in Figure 4b, in the last decade the most studied aqueous samples were hyper/saline water (46.2%), followed by contaminated water (30.8%), groundwater/freshwater (15.4%) and, finally, brackish water (7.7%).
The first two types of water have been the most explored probably due to their extreme characteristics.
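The shares reported for Figure 4b can be cross-checked against the total of 26 studies. The short Python sketch below back-calculates the number of studies implied by each percentage. The resulting counts are an inference from the rounded percentages, not figures stated in this review, and the names `reported_share` and `implied_counts` are purely illustrative.

```python
# Percentages as reported for Figure 4b; the total of 26 studies is taken from the text.
TOTAL_STUDIES = 26
reported_share = {
    "hyper/saline water": 46.2,
    "contaminated water": 30.8,
    "groundwater/freshwater": 15.4,
    "brackish water": 7.7,
}


def implied_counts(shares, total):
    """Back-calculate whole-study counts from rounded percentages (an inference)."""
    return {name: round(pct * total / 100) for name, pct in shares.items()}


counts = implied_counts(reported_share, TOTAL_STUDIES)

# Recompute the percentages from the inferred counts as a consistency check.
recomputed = {name: round(100 * n / TOTAL_STUDIES, 1) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n} studies ({recomputed[name]}%)")
```

Under this assumption, the inferred counts add up to the stated total and reproduce the reported percentages after rounding to one decimal place.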

Raw Resources
As mentioned before, the microbial communities of natural water resources are strongly associated with environmental factors, one of which is temperature [184]. Indeed, in addition to the most common aquatic environments on the planet, such as seawater or lakes, extreme temperature environments also allow the exploration of a diversity of enzymes capable of catalysing reactions under moderate or extreme conditions [185]. Therefore, raw water resources were classified in the same way and according to the same temperature ranges as soil resources: low temperature (location temperature below 20 °C); moderate temperature (location temperature between 20 and 45 °C); and high temperature (location temperature above 45 °C) [17].
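The classification rule above can be written as a small function. The following Python sketch is only an illustration: the function name `classify_temperature` is hypothetical, and treating the boundary values of 20 °C and 45 °C as belonging to the moderate class is an assumption, since the text does not state how the limits themselves are handled.

```python
def classify_temperature(temp_c: float) -> str:
    """Assign a sampling location to a temperature class.

    Cut-offs follow the text: below 20 °C is low, 20 to 45 °C is moderate,
    above 45 °C is high. Inclusive handling of 20 and 45 is an assumption.
    """
    if temp_c < 20:
        return "low"
    if temp_c <= 45:
        return "moderate"
    return "high"


# 0.8 °C is the Baltic Sea value cited in this review; the other values are
# arbitrary illustrations of the class boundaries.
for temp in (0.8, 20.0, 45.0, 80.0):
    print(f"{temp} °C -> {classify_temperature(temp)}")
```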

Low-Temperature Environments
Low-temperature aquatic environments are immediately associated with marine and/or deep environments, which are intrinsically related to high pressures. The decrease in temperature and the increase in depth reduce the diffusion of nutrients and energy and the abundance of prokaryotic cells, respectively. These extreme conditions require the microorganisms found in these zones to adapt their metabolism, for example, to low nutrient concentrations, which makes them fascinating targets for the bioprospection of novel microbial capabilities and, accordingly, promising enzymes [200,201].
Low-temperature water samples were collected essentially from different geographical locations in the Atlantic Ocean. Surface seawater was collected from opposite sides of the ocean, namely from an ecosystem observation site in New Jersey, USA [195], and from the brackish Baltic Sea in Poland [187]. The latter recorded the lowest water temperature (0.8 °C), probably because the Baltic Sea was formed as a consequence of the last glaciation, which occurred 10,000-15,000 years ago, and has therefore undergone remarkable changes in its physicochemical characteristics [202]. In these different geographical locations, different groups of enzymes were reported, namely, a ribulose 1,5-bisphosphate carboxylase/oxygenase and a cytosolic β-glucosidase with a wide range of catalytic activities. Additionally, the chemocline of the Urania basin in the Mediterranean Sea, a deep-sea anoxic hypersaline basin, was the subject of a metagenomic study to find interesting carboxylesterases [50]. The extreme factors that characterize the Urania basin (hypersalinity, low temperature and anoxia), together with the typical features of chemoclines, allow this habitat to accommodate a highly diverse microbial community with pronounced microbial activities, such as CO2 fixation and exoenzyme activities [203].

Moderate Temperature Environments
In moderate temperature environments, samples of different types of water are included: groundwater/freshwater, hyper/saline water and brackish water.
Hyper/saline water samples from the surface of the South China Sea were the ones that most contributed to the metagenomic studies in this category. The seasonal average water temperature, which falls within the range defined for this group (20-45 °C), and the unique environmental properties of the South China Sea potentially contribute to the diversity, novelty and uniqueness of genes encoding valuable enzymes. Indeed, different groups of enzymes have already been explored in the South China Sea, including β-glucosidases, laccases and esterases [75,186,190,192].
As the lakes of the Amazon region remain unexplored, the freshwater metagenome of Lake Poraquê was functionally analysed. Since the Amazon is the largest hydrographic basin on Earth, the great genetic and metabolic diversity of the microorganisms present in this important region may result in the discovery of new enzymes of biotechnological interest, such as enzymes involved in the degradation of plant cell walls [188].
The brackish samples of the Caspian Sea were also assessed, since this environment presents a salinity and ionic concentration very similar to those of human serum. Therefore, there is a high probability that the secretory enzymes (more specifically, L-asparaginases) found in the Caspian Sea microbiome exhibit greater stability under the physiological conditions of human serum, which may give them interesting therapeutic applicability [199].

High-Temperature Environments
High-temperature habitats (>45 °C) are inhabited by heat-resistant microorganisms and some of these environments also combine other extreme conditions, for example, alkalinity, acidity, salinity, pressure and heavy metals [17]. Samples have been studied mainly from hyper/saline water environments and groundwater/freshwater environments.
Among the different aquatic environments, the hypersaline anoxic deep-sea basins in the Red Sea, namely, the Atlantis II deep brine pool, have received increased attention. These are characterized by a high temperature, extreme salinity, acidic pH, extremely low levels of light and oxygen and high concentrations of heavy metals. This extreme environment is therefore expected to be an attractive location for the search for biocatalysts that can function under harsh conditions, not just those that characterize the Red Sea. Examples of biocatalysts found in this extreme environment include an esterase capable of acting in the presence of heavy metals; a mercuric reductase extremely relevant in the detoxification system for mercuric/organomercurial species; a nitrilase useful in bioremediation processes, fine chemicals and pharmaceutics; a 3′-aminoglycoside phosphotransferase and a beta-lactamase with potential application as thermophilic selection markers; and a thioredoxin reductase important in the maintenance of the redox balance and in counteracting oxidative stress inside cells [189,193,194,197,198].
Hot springs are another source commonly known for their high temperatures. A metagenomic library constructed from a groundwater sample from the Lobios hot spring in Spain was evaluated by sequence-based and functional metagenomics approaches, given the high temperatures and alkaline pH values of this site. Moreover, the microbial biodiversity and metagenomic potential of this source had not been sufficiently explored. This study reported a novel esterase belonging to family VIII and showed that the dominant prokaryotic phyla in this location, as in other hot springs on the planet, were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae and Chloroflexi. Additionally, the dominant archaeal phylum was Thaumarchaeota [191].

Human Manipulated Resources
Anthropogenic activities interfere negatively with the natural water cycle in many ways. Several water bodies, such as oceans, rivers and groundwater, have been contaminated not only by natural events but particularly by human interventions [204]. According to Figure 4a, 9 metagenomic studies were performed in contaminated water samples. This section was divided into groundwater/freshwater (4 studies) and coastal water (5 studies). Table 5 compiles all the data considered in the category of human manipulated resources.
Table 5 (only the final row is recoverable): Oil industry products contaminated groundwater (Czech Republic); 6.9; 8.5; sequence-based; -; -; haloalkane dehalogenases; 3; [208]. * Obtained from the international database Scopus (21 March 2022).

Groundwater/Freshwater
The main anthropogenic sources of water contamination are refineries, mines, factories and wastewater treatment plants, among others [204].
Over the past 10 years, different metagenomic studies have been carried out in contaminated groundwater and freshwater sources for the acquisition of novel enzymes. Some examples of contaminated sources are the formation water of a coalbed in the Jharia coalfield (India), which is defined as an extreme environment [206]; the Eryuan Niujie hot spring in Yunnan (China), which has a high content of fats due to the wastes resulting from the livestock slaughter that occurs in its vicinity [207]; and groundwater from an area in the Czech Republic that has been continuously contaminated with various products from an oil industry for 50 to 70 years [208].
The activities associated with each of these sites exert a continuous selective pressure on the microorganisms that live in these environments to develop enzymes capable of acting in the production of biodiesel and in the degradation of organophosphorus compounds and halogenated pollutants, respectively [206-208].

Coastal Water
The category of coastal waters is essentially composed of water samples from oil-contaminated harbours, where contamination results from the numerous ships circulating in these areas each year and from the unintentional spills of hydrocarbons that may occur during the loading and unloading of petroleum-derived substances. The main pollutant responsible for the contamination of these sites is oil, and the studies have mainly been focused on the Mediterranean Sea in Italy and the Barents Sea in Russia, allowing the finding of robust carboxylesterases [50].
The importance of the metagenomic analysis of water samples from these types of sources is justified by the abundance of microbial species capable of degrading hydrocarbons, which can potentially be applied in the bioremediation of ecosystems. To this end, additional enrichments with crude oil or specific hydrocarbons (e.g., pyrene, naphthalene and phenanthrene) are carried out to mimic the conditions of the sites from which the samples are collected.

Unspecified Resource
The tropical underground water of the Yucatán Aquifer in Mexico was the only resource considered unspecified. Nevertheless, it is a very interesting freshwater resource since it consists of very permeable and porous limestone that allows the infiltration of water into the deepest layers of the soil. Additionally, the Yucatán Aquifer presents cracks or interconnected spaces that allow water from distant zones and sources to move freely, carrying a collection of microorganisms from different places. Thus, although natural selection is expected to favour some microorganisms while eliminating others, the aquifer may also present a high microbial diversity from diverse origins [209].
In this way, this aquifer can represent a potential and interesting source for the acquisition of a catalogue of enzymes suitable for the degradation of natural polymers, including proteins.

Soil versus Water
As previously demonstrated, soil and water have both been reported as promising sources of biocatalysts. Nevertheless, the number of metagenomic studies focused on soil has been considerably higher, as illustrated in Figure 5. Between 2010 and 2021, the period considered in this review, soil was always the preferred source in the search for new biocatalysts, representing more than 69% of the studies each year. It is also important to highlight that the number of metagenomic studies for the identification of enzymes has been decreasing for both natural resources. In fact, a significant reduction in the number of studies has been observed since 2017, although in 2021 this number went up again (Figure 5), probably as a consequence of the pandemic situation which, in general, resulted in a sharp increase in publication volume [210]. In recent years, we have witnessed a change in the focus of metagenomic studies from functionality to taxonomy and gene annotation. The development of more accurate software allows access to the complex data generated by the sequence-based metagenomics approach. Consequently, it is possible to explore the biosynthetic gene cluster diversity and to understand the significant role of the microorganisms in the global biogeochemical cycles.
The global distribution of the metagenomic studies performed with soil and water samples is illustrated in Figure 6a. This figure shows that soil and/or water samples from five continents (America, Europe, Asia, Africa and Antarctica) and four oceans (Atlantic, Pacific, Arctic and Indian) have already been explored through metagenomic studies aiming at the discovery of new enzymes. Regarding the soil samples, Asia was the continent with the highest number of studies (~52%), followed by Europe (~24%) and America (~13%) (Figure 6b). The samples were mostly collected from human manipulated environments, which can be justified by the higher demographic density and industrial activity attributed to these continents. The number of studies performed with soil samples from Africa or Antarctica was very low (<2%). When considering the studies performed in marine sediments, the highest percentage was found for the Pacific Ocean (~5%) (Figure 6b).
On the other hand, the marine water samples studied by metagenomic approaches were mostly collected in the Atlantic Ocean (~27%), Indian Ocean (~19%) and Pacific Ocean (~15%) (Figure 6c). In terms of continental water, samples were obtained from Asia (~15%), Europe (~8%) and America (~8%).
The main groups of enzymes identified in the metagenomic studies include lipolytic enzymes, glycosyl hydrolases, oxidoreductases, phosphatases/phytases and proteases. The global distribution of these enzymes according to the type of sample used in the metagenomic study is represented in Figure 7a. The soil samples from Asia, Europe and America generally resulted in the identification of lipolytic enzymes and glycosyl hydrolases (Figure 7b). Oxidoreductases were mostly obtained from Asian soil samples. On the other hand, some phosphatases/phytases were identified in samples from European soils. Additionally, the metagenomic studies performed with both European and Asian soil samples showed the presence of other types of enzymes such as RNase [69], rhodanese [70], trehalose synthase [143], sulfatases [145] or wax ester synthase [142]. The soil samples collected in Africa and Antarctica, as well as the marine sediments from the Pacific, Antarctic and Atlantic Oceans, essentially allowed the identification of lipolytic enzymes. For the water samples, the lipolytic enzymes were also the predominant group, being identified in almost all the studied locations (except for samples from America) (Figure 7c). Nevertheless, proteases were only identified in samples collected in America. Glycosyl hydrolases were found in continental water from America and marine water from the Pacific and Atlantic Oceans. On the other hand, oxidoreductases were found in marine water samples from the Pacific and Indian Oceans. Similar to the soil, other enzymatic activities were identified in the water samples (except for samples from America and the Arctic Ocean) such as nitrilase [197], beta-lactamase [198], L-asparaginases [199], fumarase [196] or haloalkane dehalogenases [208]. However, no phosphatases/phytases were identified in the metagenomic studies of water.
Overall, it was discovered that lipolytic enzymes, glycosyl hydrolases, oxidoreductases and proteases could be found in both natural resources (Figure 8). Nevertheless, phosphatases and phytases were only identified in soil samples. Furthermore, the enzymatic activities included in the "others" group were significantly different for soil and water samples (Tables 1-5).
The five groups of enzymes most frequently identified in the metagenomic studies (namely, lipolytic enzymes, glycosyl hydrolases, oxidoreductases, phosphatases/phytases and proteases) correspond to catalytic activities of great interest to industry. Lipolytic enzymes, such as lipases and esterases, can catalyse either the hydrolysis or the synthesis of ester bonds. They are considered robust enzymes, which can resist the harsh conditions of some industrial processes, such as high temperature and pH and the presence of organic solvents. These enzymes have been applied in the food, cosmetic, pharmaceutical, detergent, laundry and oleo-chemical industries [211]. Lipases and esterases showing resistance to high pH values (>8.5) were mostly found in soil samples, namely, petroleum hydrocarbon-contaminated soil (esterase, optimal pH = 9.0 [117]), Brazilian Cerrado soil (lipase, optimal pH = 9.0-9.5 [165]), compost units (esterase, optimal pH = 10.0 [114]), fat-contaminated soil (lipase, optimal pH = 10.0 [129]) or cold desert soil (esterase, optimal pH = 11.0 [39]). On the other hand, lipases and esterases with tolerance to high temperatures (≥65 °C) were identified in hot spring soils (esterase, optimal temperature 80 °C [43]; lipase, optimal temperature 65 °C [61]) and compost units (esterase, optimal temperature 75 °C [43,121]). Additionally, thermo- and halotolerant esterases were found in water samples from the Red Sea (esterase, optimal temperature 65 °C [189]) and the South China Sea (esterase, optimal temperature 65 °C [190]).
Furthermore, other interesting enzymes able to catalyse the hydrolysis of ester compounds were obtained from compost (cutinase [59]), wheat field soil (feruloyl esterase [116]) or contaminated soil from wood-processing industries (carboxylesterases [50]).
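The optimal pH and temperature values listed above for lipolytic enzymes lend themselves to a simple screening step. The Python sketch below shows one way such records might be stored and filtered. The class `LipolyticEnzyme`, the helper functions and the thresholds passed to them are illustrative assumptions; the values are those quoted in this section, with the Cerrado lipase entered at the lower bound of its reported pH range (9.0-9.5).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LipolyticEnzyme:
    kind: str                          # "esterase" or "lipase"
    source: str                        # sampling environment as named in the text
    resource: str                      # "soil" or "water"
    opt_ph: Optional[float] = None     # optimal pH, if reported above
    opt_temp_c: Optional[float] = None # optimal temperature in °C, if reported above


# Values transcribed from the paragraph above (hypothetical record layout).
ENZYMES = [
    LipolyticEnzyme("esterase", "petroleum hydrocarbon-contaminated soil", "soil", opt_ph=9.0),
    LipolyticEnzyme("lipase", "Brazilian Cerrado soil", "soil", opt_ph=9.0),  # range 9.0-9.5
    LipolyticEnzyme("esterase", "compost units", "soil", opt_ph=10.0),
    LipolyticEnzyme("lipase", "fat-contaminated soil", "soil", opt_ph=10.0),
    LipolyticEnzyme("esterase", "cold desert soil", "soil", opt_ph=11.0),
    LipolyticEnzyme("esterase", "hot spring soil", "soil", opt_temp_c=80.0),
    LipolyticEnzyme("lipase", "hot spring soil", "soil", opt_temp_c=65.0),
    LipolyticEnzyme("esterase", "compost units", "soil", opt_temp_c=75.0),
    LipolyticEnzyme("esterase", "Red Sea", "water", opt_temp_c=65.0),
    LipolyticEnzyme("esterase", "South China Sea", "water", opt_temp_c=65.0),
]


def alkaline(enzymes: List[LipolyticEnzyme], min_ph: float = 8.5) -> List[LipolyticEnzyme]:
    """Keep enzymes whose reported optimal pH exceeds min_ph."""
    return [e for e in enzymes if e.opt_ph is not None and e.opt_ph > min_ph]


def thermotolerant(enzymes: List[LipolyticEnzyme], min_temp_c: float = 65.0) -> List[LipolyticEnzyme]:
    """Keep enzymes whose reported optimal temperature is at least min_temp_c."""
    return [e for e in enzymes if e.opt_temp_c is not None and e.opt_temp_c >= min_temp_c]


water_thermo = [e.source for e in thermotolerant(ENZYMES) if e.resource == "water"]
most_alkaline = max(alkaline(ENZYMES), key=lambda e: e.opt_ph)

print("Thermotolerant esterases from water:", water_thermo)
print("Highest optimal pH:", most_alkaline.source, most_alkaline.opt_ph)
```

Keeping pH and temperature as optional fields reflects the fact that, for each enzyme, this section quotes only one of the two parameters.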
The group of glycosyl hydrolases includes several enzymes which promote the breakdown of carbohydrates into simple sugars through the hydrolysis of specific glycosidic bonds. This kind of enzyme has been widely explored in biorefining processes for the hydrolysis of different polysaccharides of plant origin and the subsequent production of important added-value products like biofuels. Furthermore, glycosyl hydrolases are also interesting enzymes for the cosmetic (e.g., toothpaste additives), food (e.g., dairy, baking or brewing processes) and feed industries [212]. Three metagenomic studies performed with samples from marine water and freshwater led to the identification of β-glucosidases [186-188], important enzymes in the hydrolysis of short-chain oligosaccharides into glucose. As expected, enzymes belonging to the families of cellulases (e.g., endoglucanases and β-glucosidases), xylanases (e.g., endoxylanases and β-xylosidases) and amylases were widely found in soil samples. Chitinases able to catalyse the hydrolysis of chitin, commonly present in the exoskeleton of some animals, were also identified in metagenomic studies with soil. The natural presence of plant and animal detritus in soils justifies the existence of microorganisms with the catalytic ability to convert cellulose, xylan, starch or chitin.
The family of proteases includes the enzymes which promote proteolysis, i.e., the breakdown of proteins into smaller peptides or amino acids. Proteases are well-established enzymes in the food and feed industry, being used as stabilizers, meat tenderizers and additives for better digestion, or even for the improvement of brewing and baking processes. They can also be applied in the leather and detergent industries, as well as in photography or therapeutic uses [213]. According to the metagenomic studies analysed here, proteolytic activity seems to be more common in soils than in water. Alkaline proteases were found in desert environments [66], forest soil [177], hot spring sediments [67] and oil-contaminated soil [111].
Oxidoreductases are a vast group of enzymes able to catalyse the transfer of electrons from one molecule (reductant) to another (oxidant). These enzymes are regarded as important biocatalysts in the textile (dye bleaching), pulp and paper (bleaching), food and beverage (stabilizer) and pharmaceutical industries (synthesis of bioactive compounds), as well as in biorefining processes for the bioconversion of lignocellulose [214]. They can also have an important role in the bioremediation of industrial and municipal wastewater contaminated with organic compounds such as textile dyes, pharmaceuticals, hormones or personal care products [215]. Diverse oxidoreductases were reported in the metagenomic studies analysed here, including laccases [154,192], oxygenases [137], alcohol and aldehyde dehydrogenases [63,65], D-amino acid oxidases [136] and bilirubin oxidases [135]. Nevertheless, the majority of these enzymes were identified in soil samples.
Phosphatases are enzymes involved in the cleavage of ester bonds and the release of phosphate groups. Phytases are a type of phosphatase that acts in the hydrolysis of phytic acid, a specific organic form of phosphorus. These enzymes are of great interest in human and animal nutrition, since they reduce the phytate content of food products and contribute to more efficient digestion and absorption of phosphorus [216]. Phytases are common and very useful biocatalysts in soils since they are considered primary agents for dephosphorylation [217]. In fact, phytases were only found in the metagenomic studies performed with soil samples [141,159].
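The comparison between the two resources can be summarized with simple set operations. The Python sketch below encodes only the five main enzyme groups as described in this section (the "others" group is deliberately left out), and the variable names are illustrative.

```python
# Main enzyme groups reported per resource, as summarized in this section.
soil_groups = {
    "lipolytic enzymes",
    "glycosyl hydrolases",
    "oxidoreductases",
    "phosphatases/phytases",
    "proteases",
}
water_groups = {
    "lipolytic enzymes",
    "glycosyl hydrolases",
    "oxidoreductases",
    "proteases",
}

shared = soil_groups & water_groups      # groups found in both resources
soil_only = soil_groups - water_groups   # groups reported only for soil
water_only = water_groups - soil_groups  # groups reported only for water

print("Shared:", sorted(shared))
print("Soil only:", sorted(soil_only))
print("Water only:", sorted(water_only))
```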

General Conclusions and Future Perspectives
Metagenomics proved to be an efficient technique to explore natural resources such as soil and water. Following either a sequence- or a function-based approach, several enzymes with distinct catalytic activities were identified. In the last decade, the number of metagenomic studies performed with soil samples was considerably higher than that performed with water samples. This fact can be related to some limitations reported for DNA extraction. When working with water, high volumes of samples need to be collected and filtered to allow the extraction of a quantity of DNA considered representative of the microbial communities present in the environment. Furthermore, water sources are immensely vast and, in most cases, not fully accessible without specific equipment and protection (e.g., oceans, seas and groundwater). On the other hand, as terrestrial living organisms, we find soil a much more accessible and familiar environment. Nevertheless, it is clear that the biotechnological potential of soil remains barely explored. The global distribution of the metagenomic studies carried out for enzyme discovery in the last decade clearly shows that a vast number of promising environments are still waiting for their potential to be unravelled. The lack of economic and human resources in the African continent may explain why only two metagenomic studies were reported, both with compost samples from the same composting facility. However, other continents like America and Europe could also be explored further. The same conclusions apply to water samples.
The catalytic activities found in the metagenomic studies were mostly distributed in five representative groups. The origin and composition of the samples are generally connected with the biocatalytic potential of the microbial communities. Parameters like the pH and temperature of the sampling location are frequently related to the optimal conditions reported for the biocatalysts. Additionally, in most cases, it was possible to establish an association between the type of enzymes and the main constituents of the sample. Lipolytic enzymes were often found in environments containing petroleum hydrocarbons, oils or fat. Glycosyl hydrolases were mostly identified in soils since these environments are naturally composed of plant and animal debris. Thus, the microbial communities benefit from producing these types of enzymes to convert the complex polysaccharides of plant and animal origin. Similarly, phytases are enzymes with a catalytic action that properly fits the soil environments where phosphorus assumes an important role. The presence of oxidoreductases can also be more expected in soils due to the key role they have in the conversion of lignin from plants. For proteases, no clear association was established from the metagenomic data reported.
In recent years, more accurate and advanced software to access and analyse the complex sequencing data generated in metagenomic studies has been developed. This fact has greatly contributed to a better understanding of the interactions established between microbial communities and the environment by providing important information about biosynthetic routes and taxonomic annotation. As a consequence, the motivation of the metagenomic studies has shifted. The studies initially focused on microbial functionality have been replaced by studies exclusively directed to whole-genome sequencing and metabolite prediction. In the future, metagenomics is expected to consolidate its status as a key technique to unravel the biotechnological potential of microbial resources, thus contributing to the search for novel and effective bioactive compounds according to the market trends.