Diverse and Abundant Secondary Metabolism Biosynthetic Gene Clusters in the Genomes of Marine Sponge Derived Streptomyces spp. Isolates

The genus Streptomyces produces secondary metabolic compounds that are rich in biological activity. Many of these compounds are genetically encoded by large secondary metabolism biosynthetic gene clusters (smBGCs) such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) which are modular and can be highly repetitive. Due to the repeats, these gene clusters can be difficult to resolve using short read next generation datasets and are often quite poorly predicted using standard approaches. We have sequenced the genomes of 13 Streptomyces spp. strains isolated from shallow water and deep-sea sponges that display antimicrobial activities against a number of clinically relevant bacterial and yeast species. Draft genomes have been assembled and smBGCs have been identified using the antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) web platform. We have compared the smBGCs amongst strains in the search for novel sequences conferring the potential to produce novel bioactive secondary metabolites. The strains in this study recruit to four distinct clades within the genus Streptomyces. The marine strains host abundant smBGCs which encode polyketides, NRPS, siderophores, bacteriocins and lantipeptides. The deep-sea strains appear to be enriched with gene clusters encoding NRPS. Marine adaptations are evident in the sponge-derived strains which are enriched for genes involved in the biosynthesis and transport of compatible solutes and for heat-shock proteins. Streptomyces spp. from marine environments are a promising source of novel bioactive secondary metabolites as the abundance and diversity of smBGCs show high degrees of novelty. Sponge derived Streptomyces spp. isolates appear to display genomic adaptations to marine living when compared to terrestrial strains.


Introduction
Microbially derived natural products are an important source of novel biotherapeutic agents, with more than 22,000 biologically active compounds being isolated to date from microorganisms [1]. Approximately 45% of these are produced by Actinobacteria, with members of the genus Streptomyces being particularly proficient producers [2]. Actinomycetes have produced a number of important drug [...] while antibacterial compounds are produced by Streptomyces scopuliridis SCSIO ZJ46, which was isolated from a depth of 3536 m in the South China Sea [21]. Streptomyces drozdowiczii SCSIO 10141, isolated from 1396 m in the South China Sea, is known to produce anti-infective metabolites [22]; cytotoxic and antibacterial compounds are produced by another deep-sea isolate, Streptomyces niveus SCSIO 3406 [23].
With this in mind and following an extensive screening regime of marine actinomycete bacteria which we had previously isolated from the shallow water sponge Haliclona simulans [24] and from two deep-sea sponges Lissodendoryx diversichela and Stelletta normanii [25], involving growth inhibition of a number of clinically relevant bacterial and fungal species, 13 marine sponge-derived Streptomyces spp. strains were chosen for whole genome sequencing in an attempt to determine the secondary metabolite biosynthetic potential of these strains.

Antimicrobial Activities
A total of over 540 actinomycetes, including some which had previously been isolated from shallow water and deep-sea sponges in Irish waters [24][25][26], were screened for growth inhibition of a number of clinically relevant bacterial and fungal/yeast species. Thirteen of these strains which displayed the most interesting range of bioactive antimicrobial activities, including growth inhibition of problematic anti-microbial resistant (AMR) human pathogens such as methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-intermediate Staphylococcus aureus (ViSA), were identified for subsequent analysis (Table 1).

Genome Sequencing
The genomes of these 13 Streptomyces spp. were sequenced using Illumina MiSeq paired-end sequencing. These strains were designated as follows: SM1, SM5, SM9, SM10, SM11, SM12, SM14, SM16, SM17, SM18 and FMC008, isolated from shallow water sponges, and B188M101 and B226SN101, isolated from deep-sea sponges. Genome assemblies resulted in a large number of contigs (n = 195-1592) (Table 2). Gap closure of these assemblies was hampered by the typically high GC content of the genomes of Streptomyces spp., as well as by the presence of many highly repetitive sequences in gene clusters such as polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) clusters. The genomes ranged in size from 6.41 to 8.44 Mb (Table 2).

Taxonomy and Phylogeny
The phylogeny of the 13 marine Streptomyces was determined by analysis of the 16S rRNA genes (Figure 1A), and by Feature Frequency Profile (Ffp) [27] of the whole genomes (Figure 1B), together with genomic analysis using Kraken (Supplementary Table S1) [28]. In the latter analysis, greater than 75% of the sequence reads were assigned closely to a known species while three of the strains (SM1, SM12 and SM14) could only be assigned at the genus level (Table 3).
The strains recruited to four main groups: Group A (comprising S. griseus, S. fulvissimus as well as SM11 and SM16 and both of the deep-sea strains, B188M101 and B226SN101), Group B (comprising S. albus, SM17, SM9 and FMC008), Group C (comprising S. sirex, SM18, SM5 and SM10) and Group D (comprising SM12 and SM14) (Figure 1B). Analysis of the 16S rRNA genes suggests that SM1 is closely related to the Group D isolates in a polyphyletic clade which is a sister clade to the Group B isolates; however, Ffp-based phylogeny indicates that SM1 falls outside of that polyphyletic clade.

Pan Genome and Core Genome
The pan genome (Figure 2A) of these marine isolates is open, with an average of 648 genes added for each additional genome considered and a total gene complement of 14,066 genes across the isolates. The core genome (Figure 2B) comprises 1699 genes, indicating that only 12% of the pan genome is shared by all of the strains examined here. The pan genome analysis highlights the genetic diversity within the genus, indicating that the biosynthetic potential of these marine Streptomyces warrants further investigation, including the potential for biodiscovery of novel secondary metabolites from these isolates.
The highest intra-group sequence conservation, and therefore the lowest diversity, was seen within the Streptomyces albus group (Group B: SM9, SM17 and FMC008), suggesting that this group may prove to be less interesting in terms of potential discovery of novel bioactive secondary metabolites. This hypothesis is supported by antiSMASH [29] analyses of the genomes of strains SM9 and FMC008 (which harbour the lowest abundances of smBGCs of any of the strains in this study, with 16 and 15 clusters, respectively), but not by the genome of strain SM17 (which comprises 49 secondary metabolism gene clusters, the fourth highest abundance of any of the strains in this study).
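
The relationship between the pan genome and the core genome can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the strain names and families in the toy data are hypothetical, and the function `pan_and_core` is an illustration rather than the method implemented in Get Homologues. Only the final ratio uses figures reported above.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan, core) gene-family sets for a strain -> gene families mapping."""
    families = list(genomes.values())
    pan = set().union(*families)        # families present in at least one genome
    core = set.intersection(*families)  # families present in every genome
    return pan, core


# Toy presence/absence data (hypothetical, not taken from the study).
toy = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g5"},
    "strainC": {"g1", "g2", "g3", "g6"},
}
pan, core = pan_and_core(toy)

# Figures reported in the text: 1699 core genes in a pan genome of 14,066 genes.
reported_core_fraction = 1699 / 14066

print(f"toy data: {len(core)} core of {len(pan)} pan-genome families")
print(f"reported core fraction: {reported_core_fraction:.1%}")
```

The reported fraction evaluates to roughly 12%, in agreement with the value quoted for the core genome.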

Secondary Metabolism Biosynthetic Gene Clusters
The 13 Streptomyces spp. genomes were investigated for the presence of smBGCs of potential interest. All of the genomes examined contained numerous smBGCs as identified by antiSMASH (Table 3; Supplementary Tables S2-S14). The 13 strains contain a combined total of 485 individual clusters which share homology to 87 distinct known gene clusters whose metabolic product is known. While some clusters were common to many strains, all strains also contained clusters unique to that strain only (48 gene clusters are found in only one strain). The most commonly shared clusters were those showing similarities to the known clusters which produce the compatible solute ectoine (11 strains), the siderophore desferrioxamine B and the anti-tumour metabolite herboxidiene (7 strains), the peptide siderophore coelichelin, the carotenoid light-harvesting pigment isorenieratene and the terpene hopene (6 strains). The abundances or types of particular smBGCs are not a clear indicator of phylogenetic relatedness or of the isolation source (host sponge or sampling depth).
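
Counts such as "found in 11 strains" or "found in only one strain" can be derived from a per-strain list of known-cluster hits. The following is a minimal sketch under the assumption that the antiSMASH results have been reduced to one set of best-hit cluster names per strain; the strain labels and hit lists are hypothetical placeholders, not data from Supplementary Tables S2-S14.

```python
from collections import Counter
from typing import Dict, Set


def strains_per_cluster(hits: Dict[str, Set[str]]) -> Counter:
    """Count in how many strains each known-cluster hit occurs."""
    counts: Counter = Counter()
    for clusters in hits.values():
        # Each strain holds a set, so it contributes at most once per cluster name.
        counts.update(clusters)
    return counts


# Hypothetical best-hit names per strain (illustration only).
hits = {
    "strain1": {"ectoine", "desferrioxamine B", "hopene"},
    "strain2": {"ectoine", "desferrioxamine B"},
    "strain3": {"ectoine", "coelichelin"},
}

counts = strains_per_cluster(hits)
singletons = sorted(name for name, n in counts.items() if n == 1)
most_shared = counts.most_common(1)[0]

print("most widely shared:", most_shared)
print("found in one strain only:", singletons)
```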

Group A isolates display the largest genome sizes (Table 1) and, with the exception of SM16 (n = 39), also possess the most abundant smBGCs (n = 51-54). The genomes of the Group A isolates show many similarities, with all strains harbouring gene clusters related to the known clusters which produce the previously characterised metabolites ectoine, desferrioxamine B, coelichelin, herboxidiene and isorenieratene, and also γ-butyrolactone, alkylresorcinol and griseobactin. Clusters for these latter three metabolites are shared by all Group A strains and are exclusive to that group. Three of the four Group A strains also harbour gene clusters with similarities to the clusters known to produce friulimicin (an antibiotic of the actinomycete Actinoplanes friuliensis), daptomycin, the heat-stable antifungal factor (HSAF) and the peptide morphogen AmfS.
Despite the phylogenetic relatedness of Group A isolates, the bioactivity profiles of the strains were not consistent. The deep-sea strains (B188M101, B226SN101) only displayed activity against yeasts while those sourced from shallow waters were only active against bacteria. Both Group A shallow water isolates (SM11, SM16) inhibited the growth of problematic drug resistant Staphylococcus spp. (Table 1). This makes further investigation of the smBGCs of these strains of high interest.
Groups B, C and D strains are less similar with few gene clusters shared by all members of any of those groups, except for ectoine biosynthesis genes which are highly similar in all Group B and Group C strains.
Two of the Group B strains (SM9, FMC008) harbour the fewest smBGCs (Table 3). [...] (Supplementary Figure S5A) are diverse and abundant and form clades which correlate closely with phylogeny and show high similarities to those domains in the genome of S. albus J1074. All Group B strains harbour gene clusters with low similarity to the antimycin production cluster from S. albus J1074. Two of the three Group B strains also harbour clusters with low similarities to complestatin biosynthesis genes (SM9 and SM17) from S. albus J1074, the desotamide production cluster (SM9 and FMC008) from Myxococcus fulvus HW-1 and mannopeptimycin production genes (SM9 and FMC008) from Streptomyces hygroscopicus. The abundance of unknown NRPS gene clusters in the genome of SM17, as well as its broad range of antimicrobial activities against E. coli and MRSA (Table 1), warrants further research efforts aimed at identifying the encoded metabolites.
Two of the three Group C strains contain gene clusters showing relatedness to the clusters known to produce mirubactin and paenibactin in Streptomyces sp. NTK 937, coelibactin in Streptomyces lividans TK24 and bafilomycin in Streptomyces lohii strain ATCC BAA-1276 (SM5 and SM18), together with cystothiazol A in Streptomyces griseus subsp. griseus NBRC 13350 and steffimycin in Streptomyces fulvissimus DSM 40593 (SM10 and SM18). This group contains numerous PKS, NRPS and PKS/NRPS hybrid clusters and displays antibiotic activity against E. coli (SM5 and SM10), hVISA (SM10) and MRSA (SM18), while also inhibiting the growth of Bacillus spp. (Table 1). Thirteen of the smBGCs in the genome of SM18, including PKS, NRPS, PKS/NRPS hybrid, lantipeptide and bacteriocin clusters, show no relatedness to known clusters in antiSMASH analysis.
The only secondary metabolism gene cluster common to Group D isolates displays low levels of similarity to the cluster which produces zorbamycin in Streptomyces flavoviridis ATCC 21892. Both Group D strains (SM12 and SM14) inhibited the growth of hVISA (Table 1), making them of particular interest because they appear phylogenetically distant from all other strains presented here. The strains could not be assigned to a species in Kraken analysis (Supplementary Table S1) and appear phylogenetically distant in Ffp analysis (Figure 1B). The genome of SM12 hosts 18 Type I PKS clusters, of which 10 share no homology with known gene clusters in the linked Minimum Information about a Biosynthetic Gene Cluster (MIBiG) database (https://mibig.secondarymetabolites.org/index.html) [30]. Increasing the taxonomic diversity of Streptomyces spp. strains available for screening regimes is predicted to increase the chemical diversity of identified metabolites. With that in mind, these phylogenetically remote strains with demonstrated bioactivities and numerous uncharacterised PKS gene cluster products are of high interest.
The genome of SM1, which does not recruit to the groups (A-D) described here, also harbours a unique secondary metabolism gene cluster profile. While SM1 does harbour clusters for the frequently observed desferrioxamine B and isorenieratene production genes, it is one of only two genomes presented here not to encode ectoine biosynthesis genes. Of the 28 gene clusters in the genome of SM1, as identified by antiSMASH, only nine of those clusters show homology to known gene clusters and three of these are unique to SM1 in our study. As this strain has displayed broad-range antimicrobial activity and appears only distantly related to other strains in Figure 1B and distantly related to the metabolically talented S. coelicolor A3(2) in Kraken analysis (Supplementary Table S1), it also warrants further investigation with respect to the further characterization of its metabolites.

Protein Family (Pfam) Domain Analysis
The genome assemblies of the 13 marine Streptomyces strains were highly fragmented, and so smBGC prediction can be hampered by particular clusters being fragmented across different contigs. This can lead to an over-prediction of the numbers of clusters present. Nonetheless, it was possible to analyse deduced proteins at the domain level. Of the 25 clades of ketosynthase (KS) domains from PKS gene clusters (Supplementary Figure S1), 12 include only one of the phylogenetic sub-groups (A-D) described earlier (Figure 1B), suggesting a link between the phylogeny of the isolates and a subset of the KS domains therein. As the remaining 13 clades of KS domains contain domains from two or more phylogenetic sub-groups, it appears that these KS domains are not phylogeny-related but represent a more general genetic diversity. The number of clades and the individual branch lengths in the tree (Supplementary Figure S1) indicate that very high levels of diversity are apparent even within a single protein domain type amongst a limited number of isolates from this single genus.
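
The distinction between clades that contain domains from only one phylogenetic sub-group and clades that mix sub-groups can be expressed as a simple partition. The sketch below assumes that clade membership has been exported from a domain tree as a list of strains per clade; the group assignments follow Figure 1B as described in the text, whereas the clade names and memberships are hypothetical.

```python
from typing import Dict, List, Tuple


def split_clades(
    clade_members: Dict[str, List[str]], strain_group: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Partition clades into those drawn from one sub-group and those that are mixed."""
    single, mixed = [], []
    for clade, strains in clade_members.items():
        groups = {strain_group[s] for s in strains}
        (single if len(groups) == 1 else mixed).append(clade)
    return single, mixed


# Sub-group membership of the marine isolates as described in the text.
strain_group = {
    "SM11": "A", "SM16": "A", "B188M101": "A", "B226SN101": "A",
    "SM9": "B", "SM17": "B", "FMC008": "B",
    "SM5": "C", "SM10": "C", "SM18": "C",
    "SM12": "D", "SM14": "D",
}

# Hypothetical clade memberships (illustration only, not read from Figure S1).
clade_members = {
    "clade1": ["SM11", "SM16"],
    "clade2": ["SM9", "SM12"],
    "clade3": ["SM5", "SM10", "SM18"],
}

single, mixed = split_clades(clade_members, strain_group)
print("single sub-group clades:", single)
print("mixed clades:", mixed)
```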

NRPS Gene Clusters
Protein domains from NRPS clusters (condensation starter, condensation DCL, condensation LCL, epimerization and heterocyclization) were analysed individually by sequence alignments and phylogenetic tree building (Supplementary Figures S2-S5). Thirteen distinct clades of condensation starter domains were observed (Supplementary Figure S2). These domains appear diverse but recruit more clearly to phylogenetic groups A-D (Figure 1B) than is observed in the KS domain analysis (Supplementary Figure S1). Six of the clades contain domains from only one phylogenetic sub-group while one clade includes domains from all four phylogenetic sub-groups. A single isolate may contain more than one condensation starter domain gene (where more than one NRPS gene cluster is present). These condensation starter domains may be highly similar (e.g., those of SM9; Supplementary Figure S2) or very different (e.g., those of SM12; Supplementary Figure S2). It is also noteworthy that although the genome of SM14 harbours two NRPS gene clusters, no condensation starter domain has been identified in these clusters. When considering the other NRPS domains, namely condensation DCL (Supplementary Figure S3), condensation LCL (Supplementary Figure S4), epimerization (Supplementary Figure S5A) and heterocyclization (Supplementary Figure S5B) domains, many phylogenetic clades harbour sequences which correlate with the taxonomic phylogeny described in Figure 1B. As before, no epimerization or heterocyclization domains were identified in the NRPS clusters of Group D strains (SM12 and SM14). Nonetheless, even within the clades identified, noticeable degrees of diversity are apparent (i) within strains with multiple gene clusters, (ii) within phylogenetic clades and (iii) overall. The tree topologies and branch lengths indicate high levels of gene diversity in marine Streptomyces spp.
Group A isolates (SM11, SM16, B188M101 and B226SN101) appear enriched for adenylation (Figure 3) and condensation (Supplementary Figures S2-S4) domains of NRPS gene clusters when compared to the other phylogenetic sub-groups described here. There appears to be a stronger link between taxonomic phylogeny and the evolutionary phylogeny of NRPS epimerization (Supplementary Figure S5A) and NRPS heterocyclization domains (Supplementary Figure S5B). These domains are highly abundant in Group A and Group B isolates but entirely absent from the genomes of Group D strains. The majority of predicted protein domains from the deep-sea isolates (B188M101 and B226SN101) are more similar to each other than to similar genes in shallow water or terrestrial isolates. This supports previous findings by our group, in which we reported a putative deep-sea sponge-specific microbiome [24] and secondary metabolome [31].

Siderophores, Bacteriocins and Lantibiotics
Phylogeny-related patterns of predicted protein domains of siderophores (IucA-IucC; Supplementary Figure S6), bacteriocins (DUF692; Supplementary Figure S7) and lantipeptides (LanC-like; Supplementary Figure S8) were notable, with the majority of the domains identified here clustering more closely to domains from within their own phylogenetic sub-group than to those of other sub-groups. Bioavailable iron in ocean waters has long been recognized as a limiting factor for growth [32]. Dissolved iron concentrations are orders of magnitude higher in surface and mesopelagic waters when compared to the deep sea [33]. Thus, it might be expected that microbes from deep waters would contain more genes involved in iron chelation, or display higher degrees of siderophore gene diversity. We have not observed this, however. Possible explanations are that the genes in the deep-sea strains are more highly transcribed than those of strains from shallower waters, or that the symbiotic relationship between sponges and their resident microbiota provides higher concentrations of iron to the microbes than is available to planktonic ocean microorganisms. Iron has been identified as an important mineral for sponge primmorph proliferation and morphogenesis [34], and it has been demonstrated that some sponges can accumulate trace elements, including iron, at concentrations far above those of seawater and sediments [35].
Eleven of the 13 genomes described here host DUF692 domains found in bacteriocin production genes, the exceptions being SM9 and FMC008 despite antiSMASH predictions of one bacteriocin gene cluster in each of those genomes. Conversely, the genome of SM14 hosts a DUF692 domain, but no bacteriocin cluster was predicted for this strain by antiSMASH. Although five NRPS gene clusters were predicted in the genome of SM10 by antiSMASH, only three DUF692 domains were identified. Nonetheless, two of those domains are notably different to all other DUF692 domains identified here and to each other. The genomes of all other strains host one DUF692 gene copy each. Degrees of conservation in all deduced protein sequences apart from the aforementioned SM10 sequences are evident from the phylogenetic tree (Supplementary Figure S7), though an evolutionary pattern correlated with taxonomic phylogeny can also be seen.
When considering the lantibiotic (LanC-like) domains, Groups A (SM11, SM16, B188M101 and B226SN101) and C (SM5, SM10 and SM18) as well as the terrestrial strains S. griseus and S. coelicolor, hosted more potential LanC-like domains when compared to Groups B (SM9, SM17 and FMC008) and D (SM12 and SM14) (Supplementary Figure S8). Strains B188M101 and SM16 hosted the most lantibiotic smBGCs in this study (four and five respectively) and also hosted the most LanC-like protein domain gene sequences (five and seven respectively). All seven of these protein domains on the SM16 genome are quite different to each other indicating that diverse lantipeptides may potentially be produced by that strain.

Marine Adaptations
The 13 marine Streptomyces genomes were interrogated for the presence of genes potentially involved in the biosynthesis and/or transport of compatible solutes and in other osmoregulatory systems, which have previously been suggested to play a role in marine adaptation of microorganisms [36]. The abundances of these genes were compared to abundances in the terrestrial Streptomyces spp. genomes which were used to construct the Ffp phylogenetic tree (Figure 4). The marine isolates appear to be enriched for mercury and arsenic transport systems, branched-chain amino acid transport (LIV), betaine and choline biosynthesis and heat shock proteins. The marine strains appeared, however, to be deficient in sodium-proton antiporters, tripartite ATP-independent periplasmic transporters (TRAP), betaine transport (BCCT), serine/threonine export (Rht), aquaporins (MIP) and cold shock proteins. It is interesting that the genomes of the marine strains are enriched for arsenic transport genes in light of the recent finding that sponge-associated Entotheonella spp. sequester arsenic in intracellular vesicles whilst residing in their sponge host, Theonella swinhoei [37]. More work will be necessary to determine whether the Streptomyces strains described here are preferentially exporting arsenic or importing and sequestering it as a host-protection symbiotic function. Our analysis also suggests that betaine may be the preferred compatible solute amongst marine Streptomyces spp., as indicated by the higher abundances of betaine biosynthetic genes when compared to those of ectoine or choline. This, however, needs to be experimentally confirmed by assessing transcription rates of those genes.
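
A comparison of gene abundances between marine and terrestrial genomes can be sketched as a ratio of mean per-genome copy numbers. The text does not state how enrichment was scored, so the ratio used here is an assumption made for illustration, and all counts in the example are hypothetical rather than values from Figure 4.

```python
from statistics import mean
from typing import Sequence


def mean_ratio(marine_counts: Sequence[int], terrestrial_counts: Sequence[int]) -> float:
    """Ratio of mean per-genome gene counts (marine / terrestrial)."""
    return mean(marine_counts) / mean(terrestrial_counts)


# Hypothetical per-genome copy numbers: (marine genomes, terrestrial genomes).
toy_counts = {
    "heat shock proteins": ([6, 5, 7], [4, 4, 4]),
    "cold shock proteins": ([2, 3, 1], [4, 5, 3]),
}

ratios = {gene: mean_ratio(m, t) for gene, (m, t) in toy_counts.items()}
enriched = [gene for gene, r in ratios.items() if r > 1]   # more copies in marine genomes
deficient = [gene for gene, r in ratios.items() if r < 1]  # fewer copies in marine genomes

for gene, r in ratios.items():
    print(f"{gene}: marine/terrestrial = {r:.2f}")
```

A ratio alone does not establish significance; with 13 marine and 12 terrestrial genomes, a statistical test on the per-genome counts would be a natural complement.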

DNA Extraction
Isolates were grown overnight (~16 h) in 5 mL liquid cultures. Cells were pelleted by centrifugation (6000 g) and supernatants were decanted and discarded. Cell pellets were resuspended in 467 µL TE buffer. 30 µL of 10% SDS and 3 µL of 20 mg/mL Proteinase K (Fermentas, Sankt Leon-Rot, Germany) were added to each tube, and tubes were incubated for 1 h at 37 °C. An equal volume of phenol/chloroform (phenol-chloroform-isoamyl alcohol mixture, ratio 25:24:1, Sigma Aldrich, Arklow, Ireland) was added and mixed well. Tubes were centrifuged at ~18,042 g for 10 min (Eppendorf Centrifuge 5417R, Eppendorf UK Ltd., Stevenage, UK). The upper phase from each tube was transferred to a fresh 2 mL Eppendorf tube, avoiding the interphase. 100 µL of 3 M sodium acetate (NaOAc), pH 5.2, was added to each tube and mixed well. 600 µL of isopropanol (Sigma Aldrich, Arklow, Ireland) was added, mixed well, and incubated at room temperature for 15 min. Tubes were then centrifuged at ~18,042 g for 20 min. Supernatants were removed and discarded. DNA pellets were washed with cold (4 °C) 70% EtOH. Tubes were centrifuged at ~18,042 g for 10 min. Ethanol was removed and discarded and DNA pellets were allowed to air-dry. DNA was resuspended in 1 mL TE buffer. 1 µL of RNase A was added to the tubes, which were then incubated at 37 °C for 30 min. DNA was again purified by phenol extraction. DNA was analysed by gel electrophoresis and quantified using a spectrophotometer (NanoDrop ND-1000, Thermo Scientific, Gloucester, UK). The DNA solutions were stored at −20 °C.
[...] (1-10 ng), sdH2O. PCR cycle conditions comprised initial denaturation at 95 °C for 3 min followed by 30 cycles of denaturation at 95 °C for 1 min, primer annealing at 50 °C for 1 min and extension at 72 °C for 1 min, followed by a final extension at 72 °C for 5 min. PCR products were analysed by electrophoresis on 1% agarose gels.
PCR amplicons were sequenced by capillary electrophoresis, single extension sequencing (Macrogen Inc., Korea), using a 3730xl DNA Analyser.
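
The volumes and cycle times given above imply the following working concentrations and programme length. This is a minimal arithmetic sketch using only the figures stated in the protocol; the variable names are illustrative, and the calculated time covers hold steps only, ignoring thermocycler ramping.

```python
# Lysis mix volumes from the protocol (µL).
te_ul, sds_ul, protk_ul = 467, 30, 3
total_ul = te_ul + sds_ul + protk_ul          # final lysis volume

# Final concentrations after dilution of the stocks into the lysis mix.
sds_final_pct = 10 * sds_ul / total_ul        # 10% (w/v) SDS stock
protk_final_mg_ml = 20 * protk_ul / total_ul  # 20 mg/mL Proteinase K stock

# PCR programme: initial denaturation, 30 three-step cycles, final extension (min).
initial_min = 3
cycles = 30
per_cycle_min = 1 + 1 + 1                     # denaturation + annealing + extension
final_extension_min = 5
hold_time_min = initial_min + cycles * per_cycle_min + final_extension_min

print(f"lysis volume: {total_ul} µL")
print(f"SDS: {sds_final_pct:.2f}%  Proteinase K: {protk_final_mg_ml:.2f} mg/mL")
print(f"PCR hold time: {hold_time_min} min")
```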

16S rRNA Based Phylogenetic Analysis
Sequences were manually edited for quality using FinchTV 1.4.0 (Geospiza, Inc.; Seattle, WA, USA; http://www.geospiza.com). Sequence alignment and tree construction were performed using MEGA version 6.
[...] [43]. It should be noted that not all isolates were tested against all test strains; only the tests whose results are shown in Table 1 were performed. Isolates were spotted onto the centre of Petri dishes containing SYP-SW agar and grown until the colony reached 1-2 cm in diameter. Test strains were grown overnight in 5 mL LB broth, with shaking at 200 rpm, until they reached an OD600 of 0.8. Test strains were then diluted 1:50 in LB soft agar (0.7% agar, w/v). Inoculated soft agar was poured onto the surface of the plates containing the isolates. Plates were incubated at 37 °C and examined the next day for zones of inhibition (clearance zones in the growth of the test strain around the colony of the isolate).

Whole Genome Sequencing
For whole genome sequencing, mate pair libraries were prepared using the Nextera XT DNA Library Preparation Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. All libraries were sequenced in 250 bp paired read runs on the Illumina MiSeq platform. Reads were trimmed for quality with Sickle (available at https://github.com/najoshi/sickle) and Scythe (available at https://github.com/vsbuffalo/scythe).

Bioinformatic Methods
Genome assemblies were performed using SPAdes version 3.1.1 with k-mer lengths of 21, 33, 55, 77, 81 and 91 [44]. Pan genome analysis was performed using GET_HOMOLOGUES [45]. Comparative genomic analyses were carried out on 25 genomes: the 13 isolates from Irish marine sponges presented here and the following 12 terrestrial Streptomyces spp. genomes from the sequence databases: S. albus J1074, S. avermitilis MA-4680, S. bingchenggensis BCW-1, S. coelicolor A3(2), S. davaoensis JCM 4913, S. fulvissimus DSM 40593, S. hygroscopicus subsp. jinggangensis 5008, S. griseus NBRC 13350, S. lividans, S. sirex AA-E, S. venezuelae ATCC 10712 and S. violaceusniger Tü4113. Potential bioactive gene clusters were identified by searching the genomes with antiSMASH version 3, and the output was parsed with R, a free software environment for statistical computing and graphics (The R Project, available at https://www.r-project.org/). Pfam domains often represented within these gene clusters were counted and graphed using R.
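The parsing and counting step can be sketched as follows. The authors used R for this step; the Python version below is only an illustration of the tallying logic. The input format (pairs of strain name and predicted cluster type), the function name and the example records are all assumptions made for this sketch and do not reproduce the antiSMASH output format or any result from this study.

```python
from collections import Counter, defaultdict


def tally_clusters(records):
    """Count predicted gene clusters per strain and per cluster type.

    records -- iterable of (strain, cluster_type) pairs; this flat format is
               hypothetical and would need to be extracted from the
               antiSMASH output beforehand.
    Returns a dict mapping strain -> {cluster_type: count}.
    """
    counts = defaultdict(Counter)
    for strain, cluster_type in records:
        # Normalise case and whitespace so that "NRPS" and "nrps " are pooled.
        counts[strain][cluster_type.strip().lower()] += 1
    return {strain: dict(type_counts) for strain, type_counts in counts.items()}


# Made-up records with placeholder strain names, for illustration only.
example_records = [
    ("strain_A", "nrps"),
    ("strain_A", "NRPS"),
    ("strain_A", "t1pks"),
    ("strain_B", "siderophore"),
]
for strain, type_counts in tally_clusters(example_records).items():
    print(strain, type_counts)
```

A table of this shape (strains by cluster types) is a convenient starting point for comparisons such as the relative abundance of NRPS clusters between strains.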

Conclusions
Analysis of the draft genomes of 11 Streptomyces spp. isolated from a shallow water sponge and an additional two isolated from deep sea sponges reveals the presence of highly abundant smBGCs. Amongst the 485 gene clusters identified, the majority display little or no homology with known smBGCs in the MIBiG database. The notably abundant PKS, NRPS, PKS/NRPS hybrid, bacteriocin and lantipeptide gene clusters are of particular interest in the search for novel antibiotic families and species, in light of the worrisome trends in the occurrence of antimicrobial-resistant human pathogens. This is particularly so because many of these strains effectively inhibited the growth of problematic pathogenic strains (hVISA, MRSA) in plate screens.
Supplementary Materials: The following are available online at www.mdpi.com/1660-3397/16/2/67/s1, Figure S1: Phylogenetic tree of deduced amino acid sequences of KS domains from PKS gene clusters from Streptomyces spp.; Figure S2: Phylogenetic tree of deduced amino acid sequences of condensation starter domains of NRPS gene clusters from Streptomyces spp.; Figure S3: Phylogenetic tree of deduced amino acid sequences of condensation DCL domains from NRPS gene clusters from Streptomyces spp.; Figure S4: Phylogenetic tree of deduced amino acid sequences of condensation LCL domains from NRPS gene clusters from Streptomyces spp.; Figure S5: Phylogenetic tree of deduced amino acid sequences of (A) epimerization domains and (B) heterocyclization domains from NRPS gene clusters from Streptomyces spp.; Figure S6: Phylogenetic tree of deduced amino acid sequences of IucA-IucC domains from PKS gene clusters from Streptomyces spp.; Figure S7: Phylogenetic tree of deduced amino acid sequences of DUF692 domains from bacteriocin gene clusters from Streptomyces spp.; Figure S8: Phylogenetic tree of deduced amino acid sequences of LanC-like domains from lantipeptide gene clusters from Streptomyces spp.; Table S1: Taxonomy of Streptomyces spp. isolates using KRAKEN; Table S2: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. B188M101 as predicted by antiSMASH; Table S3: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. B226SN101 as predicted by antiSMASH; Table S4: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM5 as predicted by antiSMASH; Table S5: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM10 as predicted by antiSMASH; Table S6: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM11 as predicted by antiSMASH; Table S7: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM12 as predicted by antiSMASH; Table S8: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM16 as predicted by antiSMASH; Table S9: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM17 as predicted by antiSMASH; Table S10: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM18 as predicted by antiSMASH; Table S11: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM1 as predicted by antiSMASH; Table S12: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. FMC008 as predicted by antiSMASH; Table S13: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM14 as predicted by antiSMASH; Table S14: Secondary metabolism biosynthetic gene clusters in the genome of Streptomyces sp. SM9 as predicted by antiSMASH.