Genome Mining for Antimicrobial Compounds in Wild Marine Animals-Associated Enterococci

New ecosystems are being actively mined for new bioactive compounds. Because of the large amount of unexplored biodiversity, bacteria from marine environments are especially promising. Further, host-associated microbes are of special interest because of their low toxicity and compatibility with host health. Here, we identified and characterized biosynthetic gene clusters encoding antimicrobial compounds in host-associated enterococci recovered from fecal samples of wild marine animals remote from human-affected ecosystems. Putative biosynthetic gene clusters in the genomes of 22 Enterococcus strains of marine origin were predicted using the antiSMASH5 and Bagel4 bioinformatic tools. At least one gene cluster encoding a putative bioactive compound precursor was identified in each genome. Collectively, 73 putative antimicrobial compounds were identified, including 61 bacteriocins (83.56%), 10 terpenes (13.70%), and 2 (2.74%) related to putative nonribosomal peptides (NRPs). Two of the species studied, Enterococcus avium and Enterococcus mundtii, are rare causes of human disease; their strains were found to lack any known pathogenic determinants yet possessed bacteriocin biosynthetic genes, suggesting possible additional utility as probiotics. Wild marine animal-associated enterococci from human-remote ecosystems provide a potentially rich source of new antimicrobial compounds of therapeutic and industrial value and of potential probiotic application.


Introduction
Drug-resistant bacteria kill an estimated 700,000 people worldwide each year, and new antimicrobial drugs are urgently needed [1][2][3]. This need is motivating the search for novel natural products of potential therapeutic value in new ecologies. Terrestrial life in proximity to humans has been screened for diverse natural products to a much greater extent than the larger but less accessible marine ecosystems. Blue biotechnology (or marine biotechnology) is an emerging field that investigates the rich diversity of bioactive molecules produced by marine organisms with potential industrial and therapeutic applications. [...] Computational technologies are being used for screening, presumptive chemical elucidation, and understanding the activities and biological aspects of new compounds [7,24]. Therefore, genome mining may represent a fertile strategy for identifying new biomolecules for future therapeutic and industrial applications. Accordingly, the aim of the present study was to examine 22 genomes of Enterococcus species isolated from fecal samples of 17 wild marine animals from remote ecologies for potential antimicrobial compounds and/or probiotic strains.

Diversity of Bacteriocins Genes among Wild Marine Animals-Associated Enterococci
A total of 30 unique bacteriocin species were identified, including 8 belonging to class I, 19 to class II, and 3 to class III (Figure 1). Although class II bacteriocins showed the greatest diversity, class III bacteriocins were the most common and the most widely distributed. Interestingly, eight new putative bacteriocins with no significant identity to known peptides were found among the marine enterococcal genomes: two new putative lanthipeptides (I and II) identified as class I, five unknown bacteriocins (I, II, III, IV, and V) identified as class II, and one unknown class III bacteriocin (VI) (Figure 1; Supplementary Table S4).
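The class composition implied by these counts can be checked with simple arithmetic. The sketch below is illustrative only: the helper name `class_shares` is hypothetical and not part of the authors' workflow; it converts category counts into percentages and also reproduces the 61/10/2 split of the 73 putative compounds reported in the abstract.

```python
def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total as a percentage, rounded to 2 decimals."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Unique bacteriocin species per class, as reported in the text (Figure 1).
unique_bacteriocins = {"class I": 8, "class II": 19, "class III": 3}
print(class_shares(unique_bacteriocins))
# {'class I': 26.67, 'class II': 63.33, 'class III': 10.0}

# The same helper reproduces the abstract's split of the 73 putative compounds.
compounds = {"bacteriocins": 61, "terpenes": 10, "NRP-related": 2}
print(class_shares(compounds))
# {'bacteriocins': 83.56, 'terpenes': 13.7, 'NRP-related': 2.74}
```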

Phylogenetic Relationship among Class II and III Bacteriocins Predicted from Wild Marine Animal-Associated Enterococcal Genomes
To gain insight into the phylogeny of the 30 class II and 19 class III bacteriocin genes identified, phylogenetic analysis was performed (Figure 2) to determine their relationships (Supplementary Table S5) to 16 reference sequences from the Bagel4 and UniProt databases (Supplementary Table S6). This analysis identified two groups with significant branch support (Figure 2). Group 1 included bacteriocins of both classes II and III. Class II bacteriocin gene clusters in Group 1 could be divided into subclasses IIa, IIb, and others, as follows: subclass IIa, mundticin AT06, enterocin P, bacteriocin T8, bacteriocin 31, and enterocin SE-K4; subclass IIb, enterocin X chain alpha and enterocin X chain beta; leaderless, enterocin EJ97; circular, carnocyclin A; other subclasses, sakacin Q, enterocin 96, uviB, and enterocin NKR-5-3D; and unknown bacteriocins I, II, III, IV, and V. Class III bacteriocins in Group 1 included enterolysin A, propionicin SM1, and unknown bacteriocin VI. In contrast, phylogenetic Group 2 included only the class II bacteriocin lactococcin 972.
The other bacteriocin sequences were also aligned with reference sequences (Supplementary Figures S5–S10). Conserved features identified included motifs such as YGN and cysteine residues (alignments of all class IIa bacteriocins can be found in Supplementary Figure S6). New putative bacteriocins I, II, and VI showed greater similarity to carnocyclin A, while unknown bacteriocins III, IV, and V were more closely related to enterocin X chain alpha (Xα) (Figure 2). Alignment of the unknown bacteriocins with carnocyclin A and enterocin Xα reference sequences allowed detection of conserved amino acid residues and motifs such as GxxxG or AxxxA (Figure 3). Putative novel bacteriocins I, II, and VI and carnocyclin A showed only 1.3% overall amino acid sequence identity (Figure 3A), whereas bacteriocins I and II shared 55.22% identity (Figure 3B). Putative bacteriocins III, IV, and V, which were closely related to enterocin Xα, had 9.2% overall amino acid sequence identity (Figure 3C), and bacteriocins III and V shared 43.4% identity (Figure 3D). Structural models of these putative class II and III bacteriocins, built with the I-TASSER package [62] using a combination of fragment-based and ab initio model building [63], are shown in Figure 4. Insight into structural features is important for understanding the biosynthesis, mode of action, and biological activity of bacteriocins.
The molecular models are in agreement with the expected protein folds (mostly alpha-helices with coil regions). Likewise, the most divergent model (bacteriocin VI) also occupies an isolated position within its group in the phylogenetic reconstruction, supporting its uniqueness among the unknown bacteriocins.
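The identity values and motifs discussed above can be illustrated with a small sketch. It is not the authors' pipeline (alignments were made with dedicated tools, and the "overall identity" across three or four sequences presumably counts only columns conserved in all of them); the helpers `percent_identity` and `find_motif` are hypothetical, identity is assumed to be counted over gap-free columns only, and the toy sequences are invented.

```python
import re


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Identity is counted over columns where neither sequence has a gap; other
    conventions (e.g., dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        identical += a == b
    return 100.0 * identical / compared if compared else 0.0


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, match) for every occurrence of a motif like 'GxxxG'.

    'x' stands for any residue; other motif letters must be uppercase residues.
    Matches may overlap because the pattern sits inside a zero-width lookahead.
    """
    pattern = "".join("." if c in "xX" else re.escape(c) for c in motif)
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq.upper())]


# Toy sequences, invented for illustration only.
print(percent_identity("MKV-LAG", "MKIQLAG"))  # 5 of 6 ungapped columns identical
print(find_motif("AGIIAGLLAG", "GxxxG"))       # [(2, 'GIIAG'), (6, 'GLLAG')]
```

The lookahead is what lets the two GxxxG matches share the central glycine, which matters for helix-packing motifs that often occur in tandem.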

Detection of Genes Associated with Enhanced Enterococcal Virulence
Among the 22 genomes evaluated, E. avium (L8) and E. mundtii (MP7-18) were found to be devoid of determinants associated with enhanced virulence, which have mainly been identified in E. faecalis and E. faecium strains (Figure 5A,B). All other enterococcal strains possessed at least one potential virulence-associated trait (Figure 5B). As expected, these traits were most common in E. faecalis, in which they have been most thoroughly studied. Some of these traits are encoded within the core genomes [25,26]. The single E. lactis genome harbored efaAfm and acm genes, while all E. faecalis genomes contained several genes associated with adhesion (ace, efaAfs), biofilm production (ebpA, ebpB, and ebpC), proteases (gelE and srtA), protection against oxidative stress (tpx), and quorum sensing and sex pheromones (cad, camE, cCF10, cOB1, and fsrB). Enterococcus faecalis genomes varied in the presence of hyaluronidase genes (hylA and hylB) and the adhesion-associated gene elrA.

Figure 5 (caption, partial): Determinants of resistance (light yellow) and virulence (dark yellow) were associated with the results of in silico screening for bacteriocins (green, blue, and purple). *Genomes showing duplicated bacteriocin genes (rectangles represent the number of these genes). The blue dash indicates the potential probiotic candidate strains (L8 and MP7-18). The illustration was designed using D3, R software, and Adobe Illustrator.

Resistome analysis (Figure 5B) revealed that all E. casseliflavus genomes (n = 3) possessed genes related to low-level vancomycin resistance (vanRC and vanXYC), as expected, since these are part of the core genome of that species [64]. All E. faecalis genomes (n = 10) contained core-genome genes [26] conferring resistance to trimethoprim (dfrE); to macrolides, fluoroquinolones, and rifamycins (efrA and efrB); and to pleuromutilins, lincosamides, and streptogramins (lsaA), and they encoded a multidrug and toxic compound extrusion transporter (emeA). In contrast, the single E. lactis genome possessed genes related to resistance to aminoglycosides (aac(6')-Ii); to macrolides, lincosamides, streptogramins, tetracyclines, oxazolidinones, phenicols, and pleuromutilins (eatAv); and to macrolides, lincosamides, and streptogramins (msrC). In addition, E. hirae genomes harbored genes related to aminoglycoside resistance (aac(6')-Iid; n = 6) and tetracycline resistance (tet(W/N/M), n = 2; tet(L), n = 1).
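The screen that singled out L8 and MP7-18 amounts to a presence/absence check against a list of flagged determinants. The sketch below is a hypothetical illustration, not the authors' code: `strains_without_determinants` is an invented helper, the flagged sets are a subset of genes named above, and the strain profiles are made up. Such a screen only flags candidates; the absence of known determinants does not by itself establish safety.

```python
def strains_without_determinants(
    profiles: dict[str, set[str]], flagged: set[str]
) -> list[str]:
    """Return, sorted, the strains whose detected genes include none of the flagged genes."""
    return sorted(strain for strain, genes in profiles.items() if not genes & flagged)


# Determinants named in the text (a subset, for illustration).
virulence = {"ace", "efaAfs", "efaAfm", "acm", "gelE", "srtA", "tpx", "hylA", "hylB", "elrA"}
resistance = {"dfrE", "efrA", "efrB", "lsaA", "emeA", "msrC", "vanRC", "vanXYC"}

# Hypothetical strain profiles; gene content is invented for illustration only.
profiles = {
    "strain_A": {"ace", "gelE", "dfrE", "bacteriocin_cluster_1"},
    "strain_B": {"efaAfm", "acm", "msrC"},
    "strain_C": {"bacteriocin_cluster_2"},
}
print(strains_without_determinants(profiles, virulence | resistance))
# ['strain_C']
```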

Discussion
Microbes associated with marine animals from remote ecologies may be important sources of new tools for managing human and/or microbial interactions. In this study, we explored Enterococcus strains from the microbiota of wild seabirds, sea turtles, and marine mammals, ranging from the Antarctic to the coast of Brazil, to identify potentially novel biosynthetic gene clusters (BGCs). These prospective BGCs were found in the generalist species E. faecalis, as well as in less common and less studied species, including E. avium, E. casseliflavus, E. hirae, E. lactis, and E. mundtii.
Putative bacteriocin genes were present in all enterococcal strains investigated, highlighting the competitive nature of the gut niche. Bacteriocin-encoding genes are known to be widely disseminated among enterococcal species of different origins [33,54,55]. However, possibly because of the novel environmental source of these strains, we found considerable diversity and novelty (Figure 1), with eight genomes possessing four or more bacteriocin gene clusters. This may be driven by variation in the diets of wild marine animals along migratory routes, combined with selection pressure for factors that control population structure and niche occupancy in the host gut.
In this study, we identified known bacteriocins, natural variants of known bacteriocins, and potentially new bacteriocins distributed among different enterococcal species. The potency and spectrum of bacteriocins against important pathogens vary according to the peptide subclass [34,35,66,70]. Among the class I bacteriocins identified in our in silico screening, sactipeptides, the new putative lanthipeptides, lasso peptides, and thiopeptides were found in high numbers (Figure 1). Sactipeptides are produced mainly by Gram-positive organisms; according to previous studies, the sactipeptides subtilosin A from Bacillus subtilis and thuricin CD from Bacillus thuringiensis have broad and narrow antimicrobial activity spectra, respectively [34,71]. A previous study also identified a sactipeptide BGC in Enterococcus mundtii QU25 [36], similar to one found in this study. Lantibiotics and thiopeptides are most active against Gram-positive pathogens, including MRSA, VRE, and Clostridium difficile [23,34]. In contrast, most lasso peptides show activity against Gram-negative pathogens; for example, microcin J25 (MccJ25) is active against some strains of Escherichia coli and Salmonella spp. [34].
The present study provides further evidence of the considerable biodiversity of BGCs for class II bacteriocins, with 19 unique bacteriocins identified, including five new putative bacteriocins (Figures 1 and 2; Supplementary Table S4). Class II bacteriocins are of special interest as potential therapeutic agents and have been proposed for larger-scale production in the food industry, human health, and veterinary medicine [72][73][74]. Because they are small (less than 10 kDa), unmodified peptides that do not require modification enzymes for their maturation [36,73], they may be produced by chemical synthesis at lower cost than other classes [73]. Complementing recombinant technologies, chemical synthesis of bacteriocins may allow further molecular engineering for enhanced potency, improved pharmacological properties, increased stability, and modified spectra of activity [73]. Class II bacteriocins and their analogs have been successfully prepared by chemical synthesis, including aureocin A53 (AucA), durancin A5-11, enterocin CRL35, lactococcin MMFII, leucocin A, pediocin PA-1, curvacin A, lacticin Q (LnqQ), mesentericin Y105, and sakacin P [72][73][74].
It is also important to highlight that class III bacteriocins were the most common and widely distributed among enterococci from wild marine animals and included the unknown bacteriocin VI (Figure 1). Furthermore, three different enterolysin A sequences were identified among the enterococcal species, two of them from E. hirae genomes, which represents the first report of enterolysin A in this species. Enterolysin A is a cell wall-degrading bacteriocin first reported to be produced by E. faecalis isolated from fish in Iceland [77]. Although class III bacteriocins are large proteins (more than 10 kDa) that are complex to produce by chemical approaches [61], enterolysin A has been reported to have broad-spectrum activity against pathogenic and nonpathogenic bacteria, cleaving peptide bonds within the stem peptide as well as within the interpeptide bridge of Gram-positive bacterial cell walls [33,78].
In addition to bacteriocins, a wide variety of novel gene clusters encoding putative terpenes, NRPs, polyketides, and other active compounds have been uncovered by in silico analysis, creating new opportunities for drug development [23,24,49,79]. NRPs and terpenes have been reported to be active against several antibiotic-resistant strains [80][81][82][83][84][85]. In one study, a small library of predicted NRPs was chemically synthesized based on the primary sequences of NRP clusters in the human microbiome, and a potent anti-MRSA (methicillin-resistant Staphylococcus aureus) peptide with a new mechanism of action, named humimycin, was identified [80]. The antitubercular agent levesquamide, a new hybrid polyketide-nonribosomal peptide (PK-NRP) marine natural product, was identified from its BGC and isolated from Streptomyces sp. [84]. In another study, the antibacterial activity of 33 free terpenes commonly found in essential oils was evaluated, and 16 compounds showed antimicrobial activity: eugenol exhibited rapid bactericidal action against Salmonella enterica serovar Typhimurium, terpineol showed excellent bactericidal activity against S. aureus strains, and carveol, citronellol, and geraniol were rapidly bactericidal for E. coli [81]. In this study, we also found terpene biosynthesis-related clusters in E. casseliflavus, E. hirae, and E. mundtii. Terpenes are secondary metabolites found in plants, bacteria, and fungi and have been shown to act as antibiotics, hormones, flavor or odor constituents, and pigments [86][87][88]. Beukers and collaborators [89] also identified putative genes or operons involved in terpene synthesis in E. hirae, E. villorum, E. gallinarum, E. durans, and E. casseliflavus strains isolated from bovine feces. The role of terpenes in enterococcal biology, including their possible involvement as bacteriocins, remains unclear [89].
Previous studies have examined the probiotic potential of enterococci from the marine environment [43,90,91]. Marine probiont strains have been used in finfish aquaculture because of their beneficial health effects and low potential to transfer antibiotic resistance genes to pathogens through horizontal gene transfer [92]. In a previous study from our group, the potential of 13 enterococci isolated from wild seals was evaluated; five (38.46%) showed activity against L. monocytogenes ATCC 35152 in the double-agar layer test, and one of them was considered a good candidate for probiotic application [43]. In the present study, genome screening for bacteriocins highlighted potential probiotic enterococcal strains lacking known virulence or resistance traits (Figure 5A,B). In particular, the E. avium (L8) genome contained gene clusters for bicereucin BsjA1 and BsjA2, enterocin NKR-5-3D, mundticin AT06, and unknown bacteriocin I, and the E. mundtii (MP7-18) genome encoded sactipeptide and mundticin AT06 variants. Members of the genus Enterococcus have not yet been granted generally recognized as safe (GRAS) status, although some are already used as probiotics and in the production of animal feed additives to prevent disease or improve growth [93,94]. New regulations for probiotics that distinguish between safe and potentially harmful strains are needed [35]. Applying genomic approaches in probiotic research would improve understanding of the molecular mechanisms that confer safe and beneficial traits on these organisms [95].
Host-associated microbes are a rich source of factors that regulate community structure in a manner compatible with host health [96,97]. Our findings show considerable novelty in the biosynthetic pathways found by exploring the genomes of wild marine animal-associated microbes from remote ecologies, with the potential to shape host-associated microbial population structures. The novel compounds and natural bacteriocin variants discovered here provide first leads for deriving new approaches to managing human-microbe interactions in health and disease. These data will also broaden the known range of structural variation and inform structure-activity relationships and synthetic biology. As a perspective for further studies, the data generated here may be combined with recombinant technologies, chemical synthesis, molecular engineering, and other strategies to increase biological potency, stability, and pharmacological properties in order to preserve or modify antimicrobial activity. Therefore, our results may contribute to the future development of bacteriocin-based drugs for managing animal and human health and for use as food preservatives.

Bacterial Strains
Twenty-two enterococcal strains previously described [26,98,99] were evaluated in the present study. Briefly, the collection includes Enterococcus species isolated from fecal samples (cloacal/anal swabs or intestinal content) collected from 17 wild marine animals. These animals, including sea turtles (n = 3), seabirds (n = 8), and marine mammals (n = 6), were found along the North Coast of Rio Grande do Sul, Southern Brazil, from Torres Beach [...] (Table 1). The enterococcal collection was stored frozen at −20 °C in skim milk supplemented with 20% glycerol, and cultures were routinely grown in brain heart infusion (BHI) at 37 °C for 18 h.

Genomic DNA Preparation, High-Throughput Sequencing, Assembly, and Annotation
The Enterococcus spp. strains were grown in BHI at 37 °C for 18 h. Genomic DNA was extracted using a commercial kit (DNeasy Blood & Tissue Kit, QIAGEN, Germantown, MD, USA) following the manufacturer's instructions with a minor modification: 50 µL of lysozyme (50 mg/mL) and 10 µL of mutanolysin (2500 U/mL; Sigma-Aldrich, St. Louis, MO, USA) were added and incubated for 30 min at 37 °C before the addition of 20 µL of proteinase K (20 mg/mL). Extracted DNA was quantified using the Qubit double-stranded DNA (dsDNA) high-sensitivity (HS) assay kit (Life Technologies, Carlsbad, CA, USA). Libraries for genome sequencing were prepared using the Nextera XT DNA kit and index primers (Illumina), and reads were generated with the HiSeq/MiSeq reagent kit version 2 (250 cycles) on Illumina HiSeq/MiSeq platforms. Reads were assembled de novo using CLC Genomics Workbench v8.0.3, and open reading frames (ORFs) were predicted using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [100]. Enterococcal species assignment was confirmed by pairwise comparison of average nucleotide identity (ANI) using JSpeciesWS [101].
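Once pairwise ANI values have been obtained (here, from JSpeciesWS), species confirmation reduces to applying a cutoff. The sketch below assumes the commonly used species boundary of about 95% ANI (often quoted as 95–96%); the text does not state which cutoff was applied. The helper `assign_species` and the ANI values are hypothetical.

```python
from typing import Optional


def assign_species(ani_to_refs: dict[str, float], threshold: float = 95.0) -> Optional[str]:
    """Return the reference species with the highest ANI (%) if it reaches the cutoff, else None."""
    if not ani_to_refs:
        return None
    species, best = max(ani_to_refs.items(), key=lambda item: item[1])
    return species if best >= threshold else None


# Hypothetical ANI values (%) of one query genome against type-strain references.
print(assign_species({"E. faecalis": 98.7, "E. faecium": 71.2}))  # E. faecalis
print(assign_species({"E. hirae": 93.0}))                          # None
```

Returning `None` below the cutoff, rather than the nearest species, keeps ambiguous genomes visible for manual review.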

Genome Mining for Antimicrobial Compounds
Putative biosynthetic gene clusters (BGCs) were predicted with default parameters using antiSMASH 5.0 (antibiotics and Secondary Metabolite Analysis Shell) [57] and Bagel4 (for bacteriocins and RiPPs, ribosomally synthesized and post-translationally modified peptides) [58]. Bacteriocins were classified according to previous proposals for enterococci [33] and lactic acid bacteria [36], which are based on biosynthetic mechanism and biological activity and accommodate the novel subclasses that have emerged in recent years.
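At its coarsest level, the classification used here separates post-translationally modified peptides (class I) from unmodified peptides, which are split by size into class II (less than 10 kDa) and class III (larger proteins, more than 10 kDa). The sketch below encodes only that top-level rule; the cited schemes [33,36] also use biosynthesis and activity to define subclasses, which this sketch ignores. The dataclass, helper name, example entries, and the treatment of exactly 10 kDa as class III are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PredictedBacteriocin:
    name: str
    modified: bool    # carries post-translational modifications (e.g., lanthipeptide, sactipeptide)
    mass_kda: float   # estimated mass of the mature product, in kDa


def coarse_class(b: PredictedBacteriocin) -> str:
    """Simplified rule: I = modified; II = unmodified and < 10 kDa; III = unmodified and >= 10 kDa."""
    if b.modified:
        return "I"
    return "II" if b.mass_kda < 10.0 else "III"


# Hypothetical predictions, for illustration only.
examples = [
    PredictedBacteriocin("example_lanthipeptide", modified=True, mass_kda=3.2),
    PredictedBacteriocin("example_small_peptide", modified=False, mass_kda=5.1),
    PredictedBacteriocin("example_lytic_protein", modified=False, mass_kda=34.0),
]
for b in examples:
    print(b.name, coarse_class(b))  # I, II, III respectively
```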

Phylogenetic Analysis
Amino acid sequences corresponding to the class II and class III bacteriocin genes found in this work, along with reference sequences from the antiSMASH 5.0 [57], Bagel4 [58], and UniProt databases, were aligned using MAFFT [102]. GUIDANCE2 [103] was used to filter unreliable positions and to generate a mega alignment encompassing five alternative alignments of the sequences. The mega alignment was used to infer the evolutionary history of these proteins by the maximum likelihood method, based on the VT model [104]. A discrete Gamma distribution was used to model evolutionary rate differences among sites, and the rate variation model allowed some sites to be evolutionarily invariable [105]. Branch support was assessed using the approximate likelihood-ratio test (aLRT) [106]. All evolutionary analyses were conducted in PhyML 3.0 [107]. Tree visualization and annotation were performed in Interactive Tree Of Life (iTOL) v [108].
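For readers less familiar with the rate-heterogeneity settings described above, the standard way an invariant-sites plus discrete-Gamma (+I+Γ) model combines the two components can be written as below. This is the textbook formulation, given as a sketch; the number of Gamma categories $K$ and the exact rate rescaling used in the authors' PhyML run are not stated in the text and are assumptions here.

```latex
L_s = p_{\mathrm{inv}}\,\delta_s\,\pi_{x_s}
    + \frac{1 - p_{\mathrm{inv}}}{K}\sum_{k=1}^{K} L_s(r_k),
\qquad
\ln L = \sum_{s=1}^{S} \ln L_s
```

Here $p_{\mathrm{inv}}$ is the proportion of invariable sites, $\delta_s = 1$ if site $s$ shows the same residue $x_s$ in every sequence (and 0 otherwise), $\pi_{x_s}$ is the equilibrium frequency of that residue under the VT model, $r_k$ is the mean rate of the $k$-th Gamma category (shape $\alpha$, mean 1), and $L_s(r_k)$ is the likelihood of site $s$ with all branch lengths multiplied by $r_k$. Implementations differ in whether the Gamma rates are additionally rescaled by $1/(1-p_{\mathrm{inv}})$ so that the overall mean rate stays 1.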

Molecular Modeling
The structural modeling of unknown bacteriocins (I, II, III, IV, and VI) was performed using the I-TASSER package [62,63] since they were not suitable for traditional comparative modeling, requiring a combination of fragment and ab initio model building. UCSF Chimera [109] was used to visualize and edit the new bacteriocin structural models. Physico-chemical parameters were calculated with ProtParam [110].
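ProtParam reports, among other parameters, the molecular weight of a sequence, which is the quantity behind the 10 kDa boundary between class II and class III bacteriocins. The sketch below computes an average molecular mass from standard average residue masses; it is an approximation of, not a substitute for, ProtParam (values may differ slightly in the last decimals), and it assumes an unmodified linear peptide, so leader cleavage, cyclization, or other modifications are ignored. The function name is hypothetical.

```python
# Average (not monoisotopic) residue masses in daltons for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one water for the free N- and C-termini


def average_mass_da(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified linear peptide."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


mass = average_mass_da("GG")  # glycylglycine, a trivial check
print(round(mass, 2))         # 132.12
print(mass / 1000 < 10)       # True: far below the ~10 kDa class II limit
```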

Potential Virulence Markers
The Comprehensive Antibiotic Resistance Database (CARD/RGI-2017) [111] and ResFinder 3.2 [112] were used to identify antimicrobial resistance genes, with default parameters for CARD and an identification threshold of 60% identity over at least 60% of the gene length for ResFinder. Virulence genes were predicted using VirulenceFinder [113], with a threshold of 85% identity over at least 60% of the gene length.
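The identity and coverage thresholds above act as a joint filter: a hit is reported only if it passes both. The sketch below applies that logic to hypothetical hits; the `Hit` record and `filter_hits` helper are invented for illustration, and whether each tool treats the cutoffs as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str
    identity: float  # % identity to the reference gene
    coverage: float  # % of the reference gene length covered by the alignment


def filter_hits(hits: list[Hit], min_identity: float, min_coverage: float) -> list[str]:
    """Keep genes whose hit meets both the identity and the coverage cutoff (inclusive)."""
    return [h.gene for h in hits if h.identity >= min_identity and h.coverage >= min_coverage]


# Hypothetical hits, for illustration only.
hits = [Hit("gelE", 99.1, 100.0), Hit("hylA", 82.0, 95.0), Hit("ace", 91.0, 45.0)]
print(filter_hits(hits, min_identity=85.0, min_coverage=60.0))  # ['gelE'] (85%/60% cutoff)
print(filter_hits(hits, min_identity=60.0, min_coverage=60.0))  # ['gelE', 'hylA'] (60%/60% cutoff)
```

The third hit shows why coverage matters: a high-identity match to only part of a gene is rejected under either cutoff.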

Figures Design
Figures were designed using D3 (or D3.js, a JavaScript library for visualizing data using web standards) [114], R software (R Development Core Team, 2019) [115], and Adobe Illustrator.

Conclusions
Our findings show that considerable novelty remains to be found by exploring the genomes of host-associated microbes from animals in remote ecologies for biosynthetic pathways with the potential to shape host-associated microbial population structures. The novel compounds and natural bacteriocin variants discovered provide first leads for deriving new approaches to managing human-microbe interactions in health and disease.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/md19060328/s1, Table S1: Sequencing statistics, genome sizes, fold coverage, and G+C content of the Enterococcus spp. sequenced. Table S2: Reference genomes used to confirm the enterococci species. Table S3: Putative antimicrobial compounds biosynthesis gene clusters (BGCs) data predicted with antiSMASH5 and Bagel4 software. Table S4: Class I, class II, and class III unknown bacteriocins BGCs data that were not previously identified in antiSMASH5 and Bagel4 databases. Table S5: Class II and class III bacteriocin sequences predicted with antiSMASH5 and Bagel4 software. Table S6: Reference sequences from Bagel4 and Uniprot databases. Figure S1: The alignment of putative enterolysin A (class III) sequences (first branch) from E. hirae genomes using Clustal Omega software. Figure S2: The alignment of putative enterolysin A (class III) sequences (second branch) from E. hirae genomes using Clustal Omega software. Figure S3: The alignment of putative enterolysin A (class III) sequences (third branch) from E. faecalis genomes using Clustal Omega software. Figure S4 Figure S5: The alignment of putative propionicin SM1 (class III) and reference sequence using Clustal Omega software. Figure S6: The alignment of putative Class IIa bacteriocins and reference sequences using Clustal Omega software. Figure S7: The alignment of putative class IIb bacteriocins and reference sequences using Clustal Omega software. Figure S8: The alignment of putative class II circular bacteriocin carnocyclin A and reference sequence using Clustal Omega software. Figure S9: The alignment of putative class II leaderless bacteriocin enterocin EJ97 and reference sequence using Clustal Omega software. Figure S10: The alignment of putative class II other bacteriocins and reference sequences using Clustal Omega software.

Data Availability Statement:
The novel genome sequences were deposited at DDBJ/ENA/GenBank as whole-genome shotgun projects under the accession numbers listed in Table S1.