Insights into Red Sea Brine Pool Specialized Metabolism Gene Clusters Encoding Potential Metabolites for Biotechnological Applications and Extremophile Survival

The recent rise in antibiotic and chemotherapeutic resistance necessitates the search for novel drugs. Potential therapeutics can be produced by specialized metabolism gene clusters (SMGCs). We mined for SMGCs in metagenomic samples from the Atlantis II Deep, Discovery Deep and Kebrit Deep Red Sea brine pools. Shotgun sequence assembly and secondary metabolite analysis shell (antiSMASH) screening revealed 2751 Red Sea brine SMGCs, pertaining to 28 classes. The predicted SMGC products were categorized as those (1) commonly abundant in microbes (saccharides, fatty acids, aryl polyenes, acyl-homoserine lactones), (2) with antibacterial and/or anticancer effects (terpenes, ribosomal peptides, non-ribosomal peptides, polyketides, phosphonates) and (3) with miscellaneous roles conferring adaptation to the environment/special structure/unknown function (polyunsaturated fatty acids, ectoine, ladderane, others). Saccharide (80.49%) and putative (7.46%) SMGCs were the most abundant. Selected Red Sea brine pool sites had distinct SMGC profiles, e.g., for bacteriocins and ectoine. The most promising candidates, specialized metabolites (SMs) with pharmaceutical applications, were addressed. Prolific SM-producing phyla (Proteobacteria, Actinobacteria, Cyanobacteria) were ubiquitously detected. Sites harboring the largest numbers of bacterial and archaeal phyla had the most SMGCs. Our results suggest that the Red Sea brine niche constitutes a rich biological mine, with the predicted SMs aiding extremophile survival and adaptation.


Introduction
In an era of antibiotic resistance and concern over a looming post-antibiotic era, there is a pressing need to combat resistance and discover novel antibiotics [1][2][3]. In the USA alone, around 2 million people a year acquire a bacterial infection that is resistant to all available antibiotics [4]. Additionally, anticancer chemotherapeutic resistance is another recent biomedical challenge, which arises either intrinsically or extrinsically, following therapy [5]. Therefore, the search for new chemotherapeutics is a necessity [3][4][5].
Nature is considered a mine to explore for small molecules, which may be used as new therapeutic drug leads. Until 2014, a large portion of the rising number of drugs was attributed to natural products [6]. Many organisms and microbes produce specialized metabolites that have a plethora [...]
Metagenomic prokaryotic DNA was then extracted from each site and 454 shotgun sequencing was performed, followed by read assembly. Taxonomic classification for archaeal and bacterial phyla was performed by protein-based phylogeny using the metagenomics rapid annotation using subsystems technology (MG-RAST) tool [37]. The antibiotics and secondary metabolite analysis shell (antiSMASH) tool was then used for annotation, for identifying specialized metabolism gene clusters (SMGCs) by comparing translated amino acid sequences with profile hidden Markov models (pHMMs) of signature biosynthetic genes, and for structure prediction of the specialized metabolites [9]. The Antibiotic Resistance Target Seeker (ARTS) tool was used to detect housekeeping and/or resistance genes within the SMGCs [38]. The predicted specialized metabolites were grouped by their potential functions into three major groups. Lastly, top candidate SMGCs were identified in the Red Sea brine dataset.
Figure 2. Overview of the specialized metabolism gene clusters encoded by the Red Sea brine pool metagenomes. The detected gene clusters are named as denoted by antiSMASH [9]. Normalized SMGC values were used.

Red Sea Brine Pool SMGCs Code for Diverse Potential Functions
We classified the predicted Red Sea SMGCs according to the potential function of their products into three main groups: (1) products with predicted functions commonly abundant in microbes, including saccharides, fatty acids, aryl polyenes and acyl-homoserine lactones, (2) products with potential antibacterial and/or anticancer effects, including terpenes, ribosomal peptides, non-ribosomal peptides, polyketides and phosphonates, and (3) miscellaneous products that are predicted to confer adaptation to the environment/special structure/unknown function, including polyunsaturated fatty acids, ectoine, ladderane and others (Table 3). Where applicable, representative chemical structures of the product classes are also depicted in Table 3, e.g., the ladderane structure. Additionally, five core structures of the potential products coded by the SMGCs of the Red Sea dataset were computationally predicted (Figure 3). Two core structures were predicted in the KD LINF layer for non-ribosomal peptides, both encoded by putative NRPS clusters (Figure 3A). A chiral non-ribosomal peptide structure encoded by a putative NRPS cluster was predicted in the KD UINF layer, as well as a hybrid polyketide-non-ribosomal peptide chiral structure encoded by a T1PKS-NRPS cluster (Figure 3B). The fifth structure was a polyketide predicted in the ATII 1500 layer and was encoded by the putative hybrid cluster T1pks-pufa-otherks (Figure 3C). To prioritize specific SMGCs for further experimental work among all the detected SMGCs, Antibiotic Resistance Target Seeker (ARTS) analysis was used and revealed that four SMGCs harbored neighboring housekeeping and/or resistance genes (Figure 3D-F).
We identified a subset of SMGCs common in brine pool water samples and distinct from the sediment samples and the overlying water column. Six different SMGCs (Bacteriocin, cf_fatty_acid, cf_putative, cf_saccharide, other and T3PKS) were detected only in the ATII brine samples (ATII INF, ATII UCL, ATII LCL) and four SMGC types (Bacteriocin, cf_fatty_acid, cf_putative and cf_saccharide) were common among the KD samples (KD BR, KD LINF, KD UINF). Additionally, subsets of SMGCs were common in the water column overlying ATII; namely, the three SMGC types (cf_fatty_acid, cf_putative and cf_saccharide). Cf_saccharide was the only SMGC type common in both DD BR and DD INF. Note that four SMGCs (cf_fatty_acid, cf_putative, terpene and cf_saccharide) were detected in all of the sediment samples (ATII SDM, DD SDM and NB SDM) (Table S3).
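The shared and site-specific SMGC types described above amount to set intersections and differences over per-sample presence lists. The following is a minimal sketch of that bookkeeping; the sample names and presence sets are hypothetical placeholders and are not the values in Table S3.

```python
# Illustrative presence sets only; these are NOT the values from Table S3.
presence = {
    "site_A": {"cf_saccharide", "cf_fatty_acid", "cf_putative", "bacteriocin", "terpene"},
    "site_B": {"cf_saccharide", "cf_fatty_acid", "cf_putative", "bacteriocin"},
    "site_C": {"cf_saccharide", "cf_putative", "bacteriocin", "ectoine"},
}

def common_types(presence, sites):
    """Return the SMGC types detected in every one of the given sites."""
    if not sites:
        return set()
    return set.intersection(*(presence[s] for s in sites))

def unique_types(presence, site):
    """Return the SMGC types detected in `site` and in no other site."""
    others = set().union(*(v for k, v in presence.items() if k != site))
    return presence[site] - others

shared = common_types(presence, ["site_A", "site_B", "site_C"])
only_in_c = unique_types(presence, "site_C")
```

With real data, each presence set would hold the SMGC types whose normalized count is greater than zero for that sample.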
When hierarchical clustering was computed for the water samples only (Figure 4A), i.e., excluding the sediment samples, most brine SMGC profiles grouped together (ATII LCL, ATII UCL, DD BR, DD INF, ATII INF, KD BR and KD LINF), except for KD UINF. Yet, ATII 50-1500 m and KD UINF water samples clustered together (Figure 4A). Hierarchical classification of the sediment samples revealed that brine pool SMGCs clustered together (ATII SDM and DD SDM) (Figure 4B). When all the samples in the dataset were used as an input for hierarchical classification (i.e., Red Sea and non-Red Sea metagenomes), the non-Red Sea SMGCs clustered together as an outgroup to Red Sea samples (Figure S2).
Table 3. The potential functions of the array of specialized metabolites encoded by Red Sea brine SMGCs. SMGCs are named as denoted by antiSMASH tool [45]. Hyphens indicate hybrid clusters. Cf_fatty_acid: fatty acid putative cluster. Cf_putative: unknown type putative cluster. Cf_saccharide: saccharide putative cluster. Hserlactone: cluster coding for homoserine lactone. Lantipeptide: cluster coding for lanthipeptide. NRPS: cluster coding for non-ribosomal peptide synthetase. OtherKS: cluster coding for other types of polyketide synthases. Pufa: cluster coding for poly-unsaturated fatty acids. T1pks: type I polyketide synthase. T2pks: type II polyketide synthase. T3pks: type III polyketide synthase.

Red Sea Brine Prokaryotic Diversity in Relation to Specialized Metabolism Genes
To relate the microbial communities and their taxonomic profiles to specialized metabolite production, archaeal and bacterial phyla were investigated and related to the SMGCs found at each site (Figure 5, Tables S4-S6). The abundance of major archaeal and bacterial phyla in the dataset was previously reported, albeit for the sequence reads and not for the assembled contigs [22,36]. In this study, however, we focused on bacterial phyla with the potential to produce bioactive products, as certain bacterial phyla were reported to produce bioactive compounds such as antibacterial and anticancer agents.
A total of 5 archaeal phyla and 28 bacterial phyla were detected across the Red Sea assemblies analyzed (Figure 5, Table 2, Tables S4 and S5). ATII 1500 and KD LINF were the most diverse sites, each harboring all of the detected archaeal and bacterial phyla (33 distinct phyla), while NB SDM showed the lowest number of bacterial phyla (9 distinct phyla). The following phyla were detected in all the sites included in the dataset: Proteobacteria, Cyanobacteria, Acidobacteria, Actinobacteria, Bacteroidetes, Chlorobi and Firmicutes. Euryarchaeota and Thaumarchaeota were present in all the water samples (Figure 5, Table S4). Sediment samples harbored viruses (33.81%-52.72%) and Cyanobacteria (19.46%-38.18%), in addition to Proteobacteria (5.12%-31.51%), as major taxa. The percentage of viruses was lower in all the water samples (0.23%-3.96%). DD samples harbored mainly Thaumarchaeota (17.44%-21.50%) and Proteobacteria (51.63%-52.33%) as major taxa. ATII 50 harbored mainly Cyanobacteria (29.97%), while ATII 200-1500 harbored Thaumarchaeota (29.12%-33.08%) and ATII 50-1500 harbored Proteobacteria (48.86%-57.97%) as major taxa. ATII brine layers mainly harbored Proteobacteria (92.44%-97.16%) as the major taxon. KD BR mainly harbored Euryarchaeota (54.59%), while KD UINF-LINF harbored Thaumarchaeota (26.68%-66.62%), and all KD layers harbored Firmicutes (6.06%-16.57%) and Proteobacteria (7.07%-45.04%) as major taxa. The most abundant archaeal and bacterial genera were recorded for each site (Table S6). Additionally, we recorded the prokaryotic phyla unique to the Red Sea samples, those common to the Red Sea and other hydrothermal vent metagenomes, and those unique to the other hydrothermal vent metagenomes (Table S10).

Rare Leucine Codons within Red Sea Brine SMGCs and Low Similarity to Known Clusters with Characterized Products
The counts of TTA codons were recorded for all the Red Sea SMGCs, excluding saccharides and fatty acids (Table S7). All the Red Sea sites harbored rare TTA leucine codons except the DD INF layer, with a total of 7684 (absolute count) and an average of 512. The TTA codon counts were particularly high (>1000) in the ATII 50, KD UINF and ATII 700 water layers.
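To make the TTA tally concrete, the sketch below counts in-frame TTA codons in a coding sequence. It is an illustration only and not the antiSMASH implementation: it assumes the input is the coding strand starting at the first codon position, and the function name is hypothetical. The last line checks that the reported average is consistent with dividing the total by the 15 assemblies, which is itself an assumption about how the average was computed.

```python
def count_tta_codons(cds: str) -> int:
    """Count in-frame TTA codons in a coding sequence.

    Assumes `cds` is the coding strand and starts at codon position 1;
    trailing bases that do not complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return sum(1 for i in range(0, usable, 3) if seq[i:i + 3] == "TTA")

# Toy sequence: ATG TTA CTG TTA TAA contains two in-frame TTA codons.
toy_count = count_tta_codons("ATGTTACTGTTATAA")

# Out-of-frame occurrences (A TTA CG read as ATT ACG) are not counted.
out_of_frame = count_tta_codons("ATTACG")

# Assumption: the reported average is the total divided by all 15 assemblies.
mean_tta = 7684 / 15
```

Counting by codon rather than by substring matters because a TTA that straddles two codons does not encode leucine.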
To detect gene clusters homologous to the SMGC hits found in the Red Sea brine pool dataset, we used three algorithms embedded in the antiSMASH pipeline, (1) ClusterBlast, (2) KnownClusterBlast and (3) SubClusterBlast, for all the SMGCs except cf_saccharides and cf_fatty_acids (Table S8) [45]. These algorithms detect homologous gene clusters and thus identify taxonomic and functional characteristics for the SMGCs [45]. Out of the 301 SMGCs, 204 were found to have significant homology to gene clusters in the database, 30 SMGCs had identified homologous known gene clusters, while only 4 SMGCs showed significant hits with homologous subclusters (Table S8). Of the 301 SMGCs, 77 had no significant homologous gene clusters according to the ClusterBlast module embedded in antiSMASH (Table S8). The percentage of genes with similarity to the Red Sea SMGCs ranged between 2%-43% for the homologous gene clusters, 2%-77% for the homologous known gene clusters and 20%-66% for the homologous subclusters.

Saccharide and Putative SMGCs Are the Most Abundant Groups in the Red Sea Brine Dataset
Metagenomic read sequences were previously reported for ATII water column samples [19], brine ATII, DD and KD water samples [20], as well as for the sediment samples [36]; 16S rRNA and taxonomic analysis of the different sediment sections were previously discussed [28]. In this study we utilized the assembled metagenomes for all the previously mentioned samples, in order to detect and thoroughly analyze the specialized metabolism gene clusters encoded in the Red Sea brine pool metagenomic dataset.
A total of 2751 SMGCs belonging to 28 different SMGC types were detected in all 15 assembled metagenomes (Figures 1 and 2). The average number of detected clusters/Mb in the Red Sea dataset was 0.38 SMGC/Mb, ranging from 0.13 SMGC/Mb (NB SDM) to 0.67 SMGC/Mb (KD LINF) (Tables 1 and 2).
The average number of detected SMGC/Mb in more than 1000 studied bacterial genomes was reported to be 2.4 [46], approximately 6 times higher than that of the Red Sea brine metagenomic dataset. The number of SMGCs detected by antiSMASH scales linearly with bacterial genome size [46]; however, metagenomic data analysis is inherently more challenging than genomic data analysis, owing to the short reads followed by assembly [61]. This could explain the relatively limited number of detected SMGCs. Recent studies detected SMGCs in genomes of new soil-residing Pseudovibrio bacterial strains using antiSMASH [62,63].
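The density comparison above is simple arithmetic, sketched here for clarity. The two mean densities are the values quoted in the text; the helper name and the example assembly (50 clusters in 100 Mb) are hypothetical.

```python
def smgc_per_mb(n_clusters: int, assembly_bp: int) -> float:
    """Return the number of detected clusters per megabase of assembly."""
    if assembly_bp <= 0:
        raise ValueError("assembly size must be positive")
    return n_clusters / (assembly_bp / 1_000_000)

RED_SEA_MEAN_PER_MB = 0.38  # mean density reported for the Red Sea dataset
GENOME_MEAN_PER_MB = 2.4    # mean density reported for bacterial genomes [46]

# Ratio of the two reported means; about 6.3, i.e. roughly six-fold.
fold_difference = GENOME_MEAN_PER_MB / RED_SEA_MEAN_PER_MB

# Hypothetical assembly: 50 clusters detected in 100 Mb of contigs.
example_density = smgc_per_mb(50, 100_000_000)
```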
The most abundant SMGC class detected in the Red Sea samples coded for saccharides (80.49% of total SMGCs) (Figure 2). This is in concordance with the study of Cimermancic et al., wherein saccharides comprised the largest detected SMGC type (40% of total detected SMGCs) [46]. The second largest group was the cf_putative SMGCs (7.46% of the total SMGCs) (Table S2). Such putative gene clusters are unknown biosynthetic gene clusters, with no specific category assigned to them. It is likely that these novel gene clusters constitute a specific group of 'SMGC dark matter' [9], which we aimed to preliminarily categorize.
Selected SMGCs were characteristic of selected Red Sea sites, e.g., the terpene SMGC was common to all SDM samples (Table S3), suggesting that each group of sites possesses a distinct SMGC 'signature' profile. Interestingly, ATII LCL has the harshest physicochemical conditions [33] and harbors the most diverse SMGCs (Table 2), which underscores the importance of studying SMGCs in extreme environments.

Preliminary Evidence of Potential Products with Pharmaceutical Applications
In order to prioritize the search for novel antibacterial and anticancer compounds among the detected orphan SMGCs, we recommend characterizing the following SMGCs in the future: (1) SMGCs coding for terpenes, peptides, polyketides and phosphonates (Table 3, group 2 SMGCs); (2) SMGCs having products with predicted structures (Figure 3A-C), wherein the SMGCs can be expressed, the structures elucidated and their bioactivities characterized [46]; (3) type III PKSs, which are easier to clone than other PKSs and are capable of producing products with diverse structures [54]; and (4) SMGCs likely to encode novel metabolites, perhaps with antibacterial activity (Figure 3D-F): terpene, type III PKS, aryl polyene and lastly otherks-PUFA-T1PKS. This last category should be studied because core housekeeping genes and/or resistance genes were detected in close proximity to these SMGCs. Recently, targeted genome mining was successfully conducted through the detection of resistance and housekeeping genes within SMGCs [38]. Therefore, it is important to further explore these four SMGC categories as promising candidate Red Sea brine SMGCs.
Structural prediction of Red Sea SMGC products revealed chiral and non-chiral non-ribosomal peptides as well as a polyketide (Figure 3A-C). Only five specialized metabolite core structures were predicted with antiSMASH. Perhaps the Red Sea brine microbiota are evolutionarily distant from the well-characterized genes in the published databases, hindering further structural and functional prediction of the products. The few predicted chemical structures of Red Sea SMGC products (Figure 3A-C) remain to be verified by experimentation, e.g., mass spectrometry and comparative metabolomic studies [8].
Rare TTA leucine codons are especially enriched in specialized metabolism genes and cell differentiation genes; however, they should be optimized for successful expression [45]. The TTA codon counts were particularly high within ATII 50, KD UINF, and ATII 700 SMGCs (Table S7). This is a possible translational impediment when expressing SMGCs in heterologous hosts, especially for those three sites, and should be accounted for prior to expression.
This metagenome mining study has several limitations: (1) biosynthetic genes might have been missed, as antiSMASH searches for partial or complete gene clusters rather than individual genes; (2) small contigs were excluded, as the contig length cut-off was 1000 bp; and (3) antiSMASH detects SMGCs using a rule-based approach based only on known pathways, so it is likely that we obtained fewer hits, e.g., by missing SMGCs utilizing uncharacterized pathways [64].
Functional screening of the ATII LCL fosmid library led to the detection of orphan biosynthetic gene clusters that conferred antibacterial and anticancer effects [65]. Hence both functional [65] and computational detection of SMGCs in Red Sea brine pools corroborate that investigating the Red Sea brine pool niche for SMGCs of pharmaceutical application is a promising approach.

Red Sea Brine SMGCs form a Unique Cluster
Hierarchical clustering revealed that almost all brine water samples (ATII LCL, ATII UCL, DD BR, DD INF, ATII INF, KD BR and KD LINF) formed a unique SMGC brine cluster (Figure 4A), while KD UINF and ATII 200-1500 water samples clustered together. The physical conditions in KD UINF resemble those of deep Red Sea water, as the conditions are not as harsh as in other brine sites [20,27]. Also, the SMGC percentage composition of KD UINF was closer to that of the ATII 50-1500 m water sites, e.g., cf_saccharide abundance is similar in KD UINF and ATII water samples (77.78% in KD UINF and 58.22%-76.79% in ATII water samples), as opposed to 92.51% and 95.30% in the other KD samples. The detailed percent composition of each SMGC per Red Sea brine site is included (Table S9). Brine sediments had similar SMGC profiles, perhaps due to similar environmental conditions, as opposed to the NB site, which has the least harsh conditions (Figure 4B) [28]. Upon hierarchical classification of all of the included sites, the Red Sea metagenomic samples clustered together, in contrast with the non-Red Sea SMGC profiles, re-emphasizing the possibility of a Red Sea SMGC profile signature (Figure S2).
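The per-site percentage composition referred to above (Table S9) is each SMGC type's share of the total clusters detected at that site. A minimal sketch follows; the counts are hypothetical and are not taken from Table S9.

```python
def percent_composition(counts):
    """Return each SMGC type's share of the site total, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("site has no detected SMGCs")
    return {smgc: 100.0 * n / total for smgc, n in counts.items()}

# Hypothetical counts for one site (not from the study's tables).
site_counts = {"cf_saccharide": 30, "cf_putative": 6, "terpene": 4}
site_percent = percent_composition(site_counts)
```

Because the shares at a site sum to 100, they can be compared between sites regardless of sequencing depth.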

Environment-Microbe Interaction, Antagonistic Stressors and Extremophile Survival Implicated by Red Sea Brine SMGCs
Certain sites harbored unique SMGCs (Table 2), such as ladderane, which was detected only in KD UINF. Ladderane is exclusively present in the membranes of anaerobic ammonia-oxidizing (anammox) bacteria [66]. Given that anammox bacteria thrive in the presence of sulfide influx and can perform denitrification [67], and that KD is characterized by high H2S content [23], denitrification may be coupled to sulfide in KD UINF anammox bacteria.
Bacteriocin SMGCs, which are putative antagonistic stressors, were detected in all ATII and KD brine water samples (Table S3). An earlier study detected numerous halocins-bacteriocins and archaeocins produced by halophiles, which were hypothesized to play a role in ensuring microbial diversity in hypersaline environments [68]. Microbes inhabiting ATII and KD niches seem to be utilizing similar mechanisms for survival. KD samples also hosted ectoine synthase-coding genes in the brine-seawater interface layers (Table S3). Ectoine is a stress solute that enables halophiles to withstand high salt content [59], and thus a "salt-out" mechanism is likely to account for the survival of KD microbes in such high salinity [69].
Terpene SMGCs were detected in all SDM samples (Table S3). A previous study suggested that diterpenoids produced by the hyperthermophilic Chloroflexus aurantiacus may function in modulating membrane fluidity [70]. It is likely that terpenes provide the correct cell membrane fluidity for extremophiles [70]. Further experimentation is needed to shed light on the particular structure and function of the Red Sea terpenes. Terpenes with novel chemistries were isolated from a fungus inhabiting deep-ocean sediments, and conferred anticancer activity [71]. Terpene SMGCs were most enriched in the ATII 50 site, pointing to the possibility that shallow seawater favors terpene production. ATII brine water layers all harbored PKSIII SMGCs (Table S3). PKSIII enzymes are capable of producing different chemical structures, some of which have important bioactivities [54]. Different stress factors associated with the same brine pool, ATII, may have shaped the microbiomes of the overlying water and underlying sediment toward the terpene SMGC signature, while shaping the brine water layer microbiomes toward the PKSIII SMGC signature.

Prolific Specialized-Metabolite-Producing Phyla Detection and Red Sea Brine Pool SMGC Dark Matter Analysis
As expected, all Red Sea sites harbored Proteobacteria, Actinobacteria and Cyanobacteria (Table S5). These phyla are known to produce a huge repertoire of specialized metabolites [46,72-74]. Certain bacterial taxa produce above-average SMGCs/Mb, e.g., Myxococcus, Streptomyces and Gloeobacter, which belong to the aforementioned phyla, respectively [46]. Our analysis of the Red Sea brine dataset revealed that not all the phyla were shared between neighboring sites, even when their SMGCs clustered together, e.g., KD BR and KD LINF. Additionally, not all neighboring sites clustered together based on the SMGCs even if they harbored similar phyla, e.g., KD UINF and KD LINF (Figures 4 and 5). These observations hint at a debatable relationship between SMGCs and phylogeny, similar to that reported for genomic datasets [46].
Members of the most abundant bacterial and archaeal genera (Table S6) were reported to live in similar niches or to have biotechnological applications, e.g., Nitrosopumilus maritimus survives on the minimal ammonia available in its marine environment [75], and Prochlorococcus MIT9313 produces lanthipeptides [76]. Only 10% of the SMGCs were homologous to experimentally characterized gene clusters (Table S8), indicating a huge opportunity to further investigate the Red Sea brine pool SMGC dark matter. Several homologous gene clusters pertained to aquatic microbes (e.g., Mizugakiibacter sediminis) [77], while others pertained to halophilic microbes (e.g., Verrucomicrobia bacterium) [78]. SMGC similarity may contribute to microbial evolution in similar ecological niches [79], perhaps leading to a future halophilic SMGC signature profile.
When we compared the phyla in the Red Sea samples to those of the other included marine hydrothermal vents (GB VNT, KSW VNT, K VNT, JDF VNT, LC MM), we identified 145 unique prokaryotic phyla in the Red Sea samples (Table S10). The phyla unique to the Red Sea, the phyla unique to the other marine hydrothermal vent metagenomes and the phyla common to all sites are presented (Table S10).

Sampling, DNA Extraction and Sequencing
The overall study workflow that we employed is depicted in Figure 1 [19,20]. Additionally, sediment samples were included: sediments underlying the ATII brine pool (ATII SDM), comprising seven distinct layers; sediments underlying the DD brine pool (DD SDM), comprising seven distinct layers; and sediments underlying two brine-influenced sites (NB SDM) [28]. In total, 28 different samples were analyzed, including 12 water samples and 16 sediment samples, all of which were previously described [19,20,28].
The samples were previously collected in April 2010 aboard the research vessel Aegaeo, during the second leg of the KAUST/WHOI/HCMR Red Sea expedition (KAUST: King Abdullah University of Science and Technology, WHOI: Woods Hole Oceanographic Institution, HCMR: Hellenic Center for Marine Research) [19,20,28]. Further information on sample locations is tabulated in Table S1. The water samples were sequentially filtered through 3.0, 0.8 and 0.1 µm filters. DNA was then extracted from the 0.1 µm filter [28]. A Genome Sequencer (GS FLX) pyrosequencer was used for sequencing the DNA samples with the Titanium pyrosequencing kit (454 Life Sciences). Quality control of the obtained reads was performed using PRINSEQ-lite v0.20.4 [80] and CD-HIT-454 [81].

Bioinformatics Assembly
The assembly files were generated using Newbler assembler v2.6 [82]. For the Red Sea samples, default parameters were used for overlap layout consensus, except that extension over read tips was selected to cope with the low coverage in direct DNA 454 shotgun sequencing runs on the metagenomic samples. Each read was limited to one contig in the output. For the non-Red Sea samples, default parameters were used.
Distinct assembly files were generated for all the water samples, from particular sites and depths [19,20]. However, sediment (SDM) reads from the same brine pool were cross-assembled, i.e., the ATII SDM cross-assembly comprised samples obtained from seven different ATII sediment depths, and the DD SDM cross-assembly comprised samples from seven different DD sediment depths [36]. The NB sediment cross-assembly was derived from samples obtained from two brine-influenced sites [28]. Although NB SDM contains sediments from two distinct, spatially separate, brine-influenced neighboring sites, they were pooled to provide an outgroup for the brine sediments and to serve as non-brine sediment samples. This resulted in the generation of 15 assembly files that were used for downstream analyses. In addition, five assembly files were constructed for the non-Red Sea samples from other marine hydrothermal vent sites (JDF VNT, KSW VNT, K VNT, LC MM, GB VNT).

Annotation, SMGCs Analyses and Hierarchical Classification
The metagenomic assembly files were run on antiSMASH using contigs of size equal to or larger than 1000 bp. Annotation was performed using the Prodigal gene finding option for metagenomes. The SMGCs were detected by comparison of translated amino acid sequences with pHMM signatures for biosynthetic genes using the ClusterFinder algorithm [9]. The full SMGC detection analysis was performed in 08/2015 (i.e., with antiSMASH version 3.0 [9]). To account for the variability in sequencing depth, all counts of detected SMGCs were normalized by dividing each value by the number of assembled reads at the corresponding site and multiplying by 10^6; the normalized values were used for subsequent analyses.
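The normalization described above (SMGC count divided by the number of assembled reads at the site, times 10^6) can be sketched as follows. The function name and the example numbers are hypothetical and are not values from the study's tables.

```python
def normalize_counts(counts, assembled_reads):
    """Scale raw SMGC counts to clusters per million assembled reads."""
    if assembled_reads <= 0:
        raise ValueError("number of assembled reads must be positive")
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    return {smgc: n * 1_000_000 / assembled_reads for smgc, n in counts.items()}

# Hypothetical site: 80 saccharide and 4 terpene clusters, 2,000,000 assembled reads.
raw = {"cf_saccharide": 80, "terpene": 4}
normalized = normalize_counts(raw, 2_000_000)
```

Normalized values from different sites are then on a common scale, so a deeply sequenced site does not appear richer in SMGCs simply because more reads were assembled.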
In order to detect gene clusters homologous to all Red Sea brine SMGCs (excluding putative clusters for saccharides and fatty acids), the algorithms ClusterBlast, KnownClusterBlast and SubClusterBlast were used. The first algorithm compares the SMGCs with all gene clusters in the GenBank database of NCBI (National Center for Biotechnology Information), and its output is useful for identifying the closest organism based on SMGCs [45]. The second algorithm compares the SMGCs with the experimentally characterized gene clusters in the MIBiG database, thus giving an indication of the products likely to be synthesized by the SMGCs [45]. The third algorithm detects conserved operons with a known function and gives more specific information on the product synthesized by the SMGCs [45]. This analysis was performed in 11/2017. Putative clusters coding for the production of saccharides and fatty acids were excluded because the scope of the study was to investigate products of potential ecological functions and/or pharmaceutical applications. The TTA codons were also recorded for each site, and the absolute counts and normalized values were denoted for each assembly file. Additionally, predicted core structures of the specialized metabolites were recorded. All the aforementioned analyses were done using antiSMASH version 4.0 [45]. The contigs coding for products other than saccharides and fatty acids were screened for housekeeping and resistance genes within the SMGCs using the ARTS program [38].
Hierarchical classification of all the SMGCs detected in all the assembly files was performed using R version 3.3.1 (R Development Core Team 2016). Distinct heat maps were generated for water and sediment samples, using the normalized SMGC counts for each of the sites as input.
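To illustrate what hierarchical classification of SMGC profiles involves, the sketch below implements a small agglomerative clustering in plain Python. It is not the R code used in the study: the distance metric (Euclidean) and the linkage rule (average linkage) are assumptions, since the text does not state them, and the four toy profiles are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of sample names.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean pairwise distance between members of the two clusters.
            d = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
            d /= len(a) * len(b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges

# Hypothetical normalized profiles (three SMGC types per sample).
toy_profiles = {
    "brine_1": [90.0, 5.0, 1.0],
    "brine_2": [92.0, 4.0, 1.0],
    "water_1": [60.0, 20.0, 10.0],
    "water_2": [62.0, 19.0, 9.0],
}
history = average_linkage(toy_profiles)
```

In this toy input the two brine-like profiles merge first and the two water-like profiles second, which is the kind of grouping a dendrogram beside a heat map would display. A different distance or linkage choice can change the merge order on real data.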

Taxonomic Classification
Taxonomic trees for the archaeal, bacterial and viral phyla were generated with the metagenomics rapid annotation using subsystems technology (MG-RAST) tool, using the assembled contigs as the input sequences and comparing them to the non-redundant protein database M5NR (maximum e-value of 1e-5, minimum identity of 70%, minimum alignment length of 50, measured in amino acids for protein and bp for RNA databases) [37,83]. Based on the protein-based phylogeny, the detected phyla were denoted for each site, with a focus on the archaeal and bacterial phyla with a relative abundance ≥0.5% in at least one of the assemblies.
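The thresholds above can be expressed as two simple filters: one on individual similarity hits and one on phylum-level relative abundance. The sketch below applies them to local records; it does not call MG-RAST, and the `Hit` record, function names and example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    phylum: str
    evalue: float
    identity: float  # percent identity
    aln_len: int     # alignment length (amino acids for protein databases)

def passes(hit, max_evalue=1e-5, min_identity=70.0, min_aln_len=50):
    """Apply the e-value, identity and alignment-length cut-offs from the text."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.aln_len >= min_aln_len)

def reportable_phyla(abundance_by_assembly, min_pct=0.5):
    """Phyla with relative abundance >= min_pct in at least one assembly."""
    keep = set()
    for phyla in abundance_by_assembly.values():
        keep.update(p for p, pct in phyla.items() if pct >= min_pct)
    return keep

# Hypothetical hits and abundances, for illustration only.
hits = [
    Hit("Proteobacteria", 1e-20, 85.0, 120),
    Hit("Firmicutes", 1e-3, 90.0, 80),      # fails the e-value cut-off
    Hit("Cyanobacteria", 1e-10, 65.0, 200), # fails the identity cut-off
    Hit("Thaumarchaeota", 1e-8, 70.0, 50),  # exactly at the thresholds
]
kept = [h.phylum for h in hits if passes(h)]

abundance = {
    "assembly_1": {"Proteobacteria": 50.0, "Chlorobi": 0.2},
    "assembly_2": {"Proteobacteria": 45.0, "Chlorobi": 0.6, "Nitrospirae": 0.1},
}
reported = reportable_phyla(abundance)
```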

Conclusions
Our study highlights the importance of Red Sea brine pool water and sediment microbes and their potential capability to produce specialized metabolites. The ATII, DD and KD brine pool sites included in the study are thus worthy of bioprospecting (Figure 2, Table 3). The diverse potential functions of the products of the SMGCs detected in the Red Sea dataset, ranging from halophilic adaptations to polyketides and peptides of potential antibacterial activity, render it an attractive mine for such exploration. Our data provide insights into Red Sea brine SMGCs, focusing particularly on antibacterial and anticancer exploration. Promising SMGCs code for products with reported antibacterial and anticancer effects, namely terpenes, peptides, polyketides and phosphonates. Also of interest are SMGCs with predicted structures and SMGCs harboring housekeeping and/or resistance genes. Cloning such gene clusters would provide information on new 'cryptic' gene clusters that might be responsible for the synthesis of novel natural products, and improve our understanding of the evolution of extremophiles and their mechanisms of adaptation to such extreme environments [84]. Future studies are required to clone and express those SMGCs, in order to elucidate the novel chemical entities that may serve as antibacterial and/or anticancer drugs, among other potential functions.
Supplementary Materials: Figure S2: Heat map representing hierarchical classification of the SMGCs detected in all the metagenomes in the dataset. Table S1: The sampling locations of each of the sites in the dataset. Table S2: The SMGCs detected at each site (absolute values), with the name of each SMGC detected; the counts and totals for each site and for each SMGC are denoted.
Table S3: The normalized number of SMGCs detected at each site (normalized: number of SMGCs detected / number of assembled reads at each site × 10^6), with the name of each SMGC detected; the counts and totals for each site and for each SMGC are denoted. Cells in dark blue contain unique SMGCs, detected only once in all the data. Similarly colored cells indicate the SMGCs common to groups of sites (light pink: ATII water column depths, light blue: sediments, yellow: ATII brine layers, bright pink: KD brine layers). Table S4: Archaeal, bacterial and viral phyla detected for each of the Red Sea brine assembly files. The relative abundance is shown whenever it is ≥0.5% in at least one of the assemblies. Table S5: All the archaeal and bacterial phyla detected by MG-RAST in all the Red Sea brine sites. The relative abundance is shown for all the phyla, and values > 0 are highlighted in red. Table S6: The most abundant archaeal and bacterial genera in all the sites in the dataset. Table S7: The absolute and normalized counts of the rare leucine TTA codon in each site, for all the Red Sea brine SMGCs excluding the saccharides and fatty acids. Table S8: The best-hit homologous gene clusters for all the Red Sea brine SMGCs detected in the study by antiSMASH, excluding cf_saccharides and cf_fatty_acids, in all the sites included in the Red Sea dataset, as computed by the ClusterBlast algorithm of antiSMASH [1]. The homologous known gene clusters that were detected are also denoted, as well as homologous subclusters. Blue: SMGCs identified in 11/2017 upon re-running the contigs, with hits that did not appear in 8/2015. Green: SMGCs identified in 8/2015 but not when re-run in 11/2017. Table S9: The percentage of each SMGC relative to the total SMGCs detected per site among the Red Sea brine samples. Table S10: All the archaeal and bacterial genera detected in all sites included in the dataset.
Author Contributions: L.Z. conducted the computational analyses and data visualization. M.A. and M.N.M. performed the assembly of the metagenomic sequencing data. R.S. supervised this work. L.Z. and R.S. wrote the manuscript. R.S., L.Z. and M.A. edited the manuscript, discussed the results and commented on the manuscript. All authors read and approved the manuscript.
Funding: This work was funded by an AUC Faculty Support Grant to R.S.