Identification of Putative Novel Class-I Lanthipeptides in Firmicutes: A Combinatorial In Silico Analysis Approach Performed on Genome Sequenced Bacteria and a Close Inspection of Z-Geobacillin Lanthipeptide Biosynthesis Gene Cluster of the Thermophilic Geobacillus sp. Strain ZGt-1

Lanthipeptides are ribosomally synthesized and post-translationally modified polycyclic peptides. Lanthipeptides with antimicrobial activity are known as lantibiotics, so the discovery of novel lantibiotics offers a possible solution to the problem of antibiotic resistance. Using publicly available genome sequences and bioinformatic tools tailored to the detection of lanthipeptides, we designed a strategy for screening 252 firmicute genomes and detecting class-I lanthipeptide-coding gene clusters. The strategy identified 69 class-I lanthipeptide sequences, more than 10% of which are putatively novel. These putative novel lanthipeptides either have not been annotated on the original or RefSeq genomes, or have been annotated merely as coding for hypothetical proteins. Additionally, we identified bacterial strains not previously recognized as lanthipeptide producers. Moreover, we suggest corrections to certain firmicute genome annotations and recommend lanthipeptide records for enriching the databases of the bacteriocin genome mining tool (BAGEL). Furthermore, we propose Z-geobacillin, a putative class-I lanthipeptide coded on the genome of the thermophilic strain Geobacillus sp. ZGt-1. We provide lists of the putative novel lanthipeptide sequences and of the previously unrecognized lanthipeptide-producing strains so that they can be prioritized for experimental investigation. Our results are expected to benefit researchers interested in the in vitro production of lanthipeptides.

Table S1. Amino acid sequences of the unmodified antiSMASH-detected lanthipeptides and the corresponding nucleotide sequences. Sequences in lower case correspond to the leader peptide. The presented gene positions are based on those predicted by antiSMASH and confirmed by BLAST analysis. When the gene start position is larger than the end position, the gene is located on the reverse strand. Entries in bold represent data related to the putative novel lanthipeptides.
1 Gene identity is presented as the contig and the running number of the predicted gene within that contig. The contig numbers are the same as those in the NCBI genome record (LDPD01000000).
2 The low matching identity is explained by the 3288 undetermined nucleotides in the gene sequence.
3 The protein hit has 5 more amino acids at the N-terminus than the antiSMASH 3.0-predicted lanthipeptide.
4 The protein hit has 20 more amino acids at the N-terminus than the antiSMASH 3.0-predicted lanthipeptide.
5 The protein hit has 3 more amino acids at the N-terminus than the antiSMASH 3.0-predicted lanthipeptide.

Table S5. Annotation of the proteins coded by the lanthipeptide biosynthesis gene cluster of G. thermodenitrificans NG80-2. As described by NCBI, protein accession numbers with the WP_ prefix represent "non-redundant RefSeq protein records" that are "found in RefSeq genomes from multiple species". Locus-tags containing "RS" represent annotations of the RefSeq genome records, while the others were derived from the annotations of the original genome records.

Columns: protein coded by a member of the lanthipeptide gene cluster; accession number; locus-tag

The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (Table S1) is 100% identical to "ABC_RS22115", annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). The RefSeq annotation thus supports our analysis (Table S6). The core sequence of the predicted lanthipeptide was briefly mentioned by van Heel et al, 2016 [30]. The blastp analysis indicated that the predicted lanthipeptide is 56% identical to clausin (Table 2). Clausin has not been reported in the BAGEL database, so we recommend including it there as a class-I lanthipeptide.
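The Table S1 caption notes that a gene start position larger than the end position marks a gene on the reverse strand. The minimal Python sketch below shows how such coordinates can be turned into a coding sequence. It assumes 1-based, inclusive coordinates (the GenBank convention). The function names and toy contigs are hypothetical and are not taken from antiSMASH or BLAST output.

```python
from __future__ import annotations

# Complement table for DNA; 'N' (undetermined base) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_gene(contig: str, start: int, end: int) -> tuple[str, str]:
    """Return (strand, coding sequence) for 1-based, inclusive coordinates.

    start <= end: forward strand, the slice is read as-is.
    start > end: reverse strand, the slice end..start is reverse-complemented.
    """
    if min(start, end) < 1 or max(start, end) > len(contig):
        raise ValueError("coordinates fall outside the contig")
    if start <= end:
        return "+", contig[start - 1:end]
    return "-", reverse_complement(contig[end - 1:start])


if __name__ == "__main__":
    # Hypothetical toy contigs, not real genome data.
    print(extract_gene("CCATGAAACGTTAAGG", 3, 14))  # ('+', 'ATGAAACGTTAA')
    print(extract_gene("GGTTATTTCATGG", 11, 3))     # ('-', 'ATGAAATAA')
```

Both cases return the gene 5' to 3' on its coding strand. A reverse-strand gene therefore reads with its start codon first, just like a forward-strand gene.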
S.2.3.1.2. B. megaterium QM B1551
The aa sequences of the two antiSMASH-predicted lanthipeptides are identical, but the nt sequences of their two coding genes are not (Table S1). The nt sequence of the gene coding for lanthipeptide (I) is 100% identical to "BMQ_RS27575", and that of the gene coding for lanthipeptide (II) is 100% identical to "BMQ_RS27580". Both genes have been annotated in the RefSeq record as coding for class-I lanthipeptides (Table S2). BAGEL BLAST showed that both lanthipeptides are 56% identical to gallidermin.

Our analysis revealed several points worth noting, as discussed below. antiSMASH analysis predicted the gene coding for entianin (Table S1). Besides the original and RefSeq genome records of the strain, NCBI also holds a separately deposited record of the entianin gene cluster from the same strain (accession number HQ871873). On the RefSeq genome, the entianin-coding gene has been annotated as "GYO_RS39160", coding for a class-I lanthipeptide. In the entianin cluster record, the same gene has been annotated and named "etnS", coding for "EtnS" (AEK64494). The nt sequences of "GYO_RS39160" and "etnS" are identical, and both code for the same peptide. The two separate records therefore give different tags/names to the same gene and its lanthipeptide product. The entianin cluster (HQ871873) was sequenced by Fuchs et al and deposited in NCBI in 2011 [47], and the strain designation displayed for this record is DSM 15029T. The antimicrobial activity of the entianin cluster has been experimentally proven [47]. The original genome (CP002905) was sequenced by Earl et al in 2012 [46], and the RefSeq genome was annotated in 2017. The strain designation stated in both genome records is TU-B-10.
Accordingly, the naming of the gene and its lanthipeptide product in the RefSeq record should be re-evaluated to match the description of the experimentally verified entianin cluster (HQ871873). Moreover, the RefSeq genome record shows only one of the strain's designations (TU-B-10), while the entianin cluster record shows the other (DSM 15029T). To avoid confusion, we recommend presenting both designations in each record. Entianin is highly active against several Gram-positive bacteria [47].
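The text above contains two kinds of comparison. The two B. megaterium QM B1551 genes encode identical aa sequences from different nt sequences, whereas "GYO_RS39160" and "etnS" are identical at the nt level. The minimal Python sketch below separates these cases. It assumes the standard genetic code and complete, in-frame coding sequences. The function names and example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon; codons with unknown bases (e.g. N) become 'X'.

    A trailing partial codon is ignored.
    """
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[p:p + 3], "X") for p in range(0, len(cds) - 2, 3))


def compare_cds(cds_a: str, cds_b: str) -> str:
    """Classify two coding sequences by nt identity and translated aa identity."""
    if cds_a.upper() == cds_b.upper():
        return "identical nt"
    if translate(cds_a) == translate(cds_b):
        return "identical aa, different nt"
    return "different aa"


if __name__ == "__main__":
    # Hypothetical toy sequences: AAA/AAG both encode K, CGT/CGC both encode R.
    print(compare_cds("ATGAAACGTTAA", "ATGAAACGTTAA"))  # identical nt
    print(compare_cds("ATGAAACGTTAA", "ATGAAGCGCTAA"))  # identical aa, different nt
    print(compare_cds("ATGAAACGTTAA", "ATGGAACGTTAA"))  # different aa
```

The nt comparison comes first because identical nt sequences always translate identically. Only when the nt sequences differ does translation add information.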

S.2.3.1.4. B. subtilis subsp. spizizenii W23
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (Table S1) is 100% identical to "BSUW23_RS16845", annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). The RefSeq annotation thus supports our analysis (Table S6). BAGEL BLAST indicated that the predicted lanthipeptide is 100% identical to subtilin (Table 2; Table S2) and 95% identical to entianin.

S.2.3.1.5. B. thuringiensis YBT-1518
The antiSMASH analysis indicated that the class-I lanthipeptide cluster on the chromosome of strain YBT-1518 contains two putative genes coding for two different lanthipeptides. The nt sequence of the gene coding for the predicted lanthipeptide (I) (Table S1) is 100% identical to "YBT1518_RS19670", and that of the gene coding for lanthipeptide (II) is 100% identical to "YBT1518_RS19675". Both genes have been annotated on the RefSeq genome as coding for class-I lanthipeptides. The RefSeq annotation thus supports our analysis (Table S6). BAGEL BLAST showed that the predicted lanthipeptides I and II are 53% and 51% identical to gallidermin, respectively.

The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "GK_RS18085", annotated on the RefSeq genome as coding for a class-I lanthipeptide. The genome annotation therefore supports our results (Table S6). BAGEL BLAST showed that lanthipeptide (I) is 91% identical to geobacillin I (ABO65649), the experimentally characterized, antimicrobially active lanthipeptide produced by G. thermodenitrificans. Garg et al, 2012 reported the same lanthipeptide for strain HTA426 [34], which further supports our analysis (Table 2, Table S6).

The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide of strain CV56 (Table S1) is 100% identical to "CVCAS_RS03115", and that of the gene coding for the predicted lanthipeptide of strain IO-1 (Table S1) is 100% identical to "LILO_RS03015". Both genes have been annotated in the RefSeq genome records as coding for class-I lanthipeptides. The RefSeq annotations thus support our analysis (Table S6). BAGEL BLAST showed that the predicted lanthipeptide of strain CV56 is 100% identical to nisin A (Table 2). This agrees with the results of Gao et al, 2011 [35] and Marsh et al, 2010 [15] (Table S6). The predicted lanthipeptide of strain IO-1, on the other hand, is 100% identical to nisin Z (Table 2).
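The identities reported above (for example, 95% to entianin) are percent identities from pairwise alignments. The minimal sketch below implements one common definition: identical columns divided by alignment length, with gap columns counted in that length. This definition is an assumption for illustration and is not claimed to reproduce BAGEL BLAST's exact computation. Undetermined residues ('N' in nt, 'X' in aa) are counted as non-identical. This shows how undetermined nucleotides, as in footnote 2 of Table S1, can lower a matching identity. The function name and toy alignments are hypothetical.

```python
# Undetermined residue symbol per molecule type. 'N' is asparagine in proteins,
# so it is treated as undetermined only for nucleotide alignments.
UNDETERMINED = {"nt": "N", "aa": "X"}


def percent_identity(aln_a: str, aln_b: str, molecule: str = "aa") -> float:
    """Percent identity over all columns of a pairwise alignment.

    Both inputs are rows of the same alignment ('-' marks a gap). Gap columns
    count toward the alignment length; columns with a gap or an undetermined
    residue in either row are never counted as identical.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("rows must be non-empty and of equal length")
    skip = {"-", UNDETERMINED[molecule]}
    identical = sum(
        1 for a, b in zip(aln_a.upper(), aln_b.upper()) if a == b and a not in skip
    )
    return 100.0 * identical / len(aln_a)


if __name__ == "__main__":
    # Hypothetical toy alignments.
    print(percent_identity("MKT-LLAA", "MKTALVAA"))            # 75.0
    print(percent_identity("ACGTACGTAC", "ACGTNNNTAC", "nt"))  # 70.0
```

In the nucleotide example, the three undetermined bases alone account for the 30% drop in identity.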

S.2.3.4. Identification of Paenibacillus-associated lanthipeptide gene clusters
The nt sequence of the gene coding for the predicted lanthipeptide (Table S1) is 100% identical to "PPE_RS07020", which has been annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). The aa sequence of paenilan, the characterized class-I lanthipeptide produced by strain E681 [36], is identical to that inferred in our study for the same strain (Table S2). This in turn supports our analysis.

The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "MS7_RS09745", annotated on the RefSeq genome as coding for a class-I lanthipeptide. The genome annotation thus supports our analysis (Table S6). Moreover, BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BsaA2 (Table 2). Daly et al, 2010 experimentally verified the antimicrobial activity of BsaA2 in other S. aureus strains [8].
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (II) (Table S1) is 100% identical to "MS7_RS09750", annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). However, in RefSeq, and consequently also in antiSMASH, the start codon has been translated into an incorrect aa in the presented aa sequence. The nt sequence in RefSeq and antiSMASH shows the first codon as "TTG", which codes for leucine (L) and not the reported methionine (M). We therefore edited the start aa of the predicted lanthipeptide (II) to L (Table S1). BAGEL BLAST showed that lanthipeptide (II) is 83% identical to BsaA2 (Table 2).

S.2.3.5.2. S. aureus COL
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "SACOL_RS09635", annotated on the RefSeq genome as coding for a class-I lanthipeptide. The genome annotation thus supports our analysis (Table S6). Moreover, BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BsaA2 (Table 2). Daly et al, 2010 reported strain COL as a putative producer of BsaA2 [8].
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (II) (Table S1) is 100% identical to "SACOL_RS09645", annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). We made the same start-aa correction here as for lanthipeptide (II) of strain 11819-97 (Table S1). BAGEL BLAST showed that lanthipeptide (II) is 79% identical to BsaA2 (Table 2).

S.2.3.5.3. S. aureus ED133
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "SAOV_RS09475", and that of the gene coding for lanthipeptide (II) (Table S1) is 100% identical to "SAOV_RS09490". Both genes have been annotated on the RefSeq genome as coding for class-I lanthipeptides (Table S2), which supports our analysis (Table S6). BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BacCH91, produced by S. aureus strain CH91 [81] (Table 2; Table S2). Lanthipeptide (II) is 85% identical to BsaA2 (Table 2; Table S2). Only the core sequence of lanthipeptide (II) has been reported before: van Heel et al, 2016 mentioned it very briefly and inferred it by in silico analysis [30].

S.2.3.5.4. S. aureus M1
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "BN843_RS09660" annotated on the RefSeq genome as coding for a class-I lanthipeptide. The annotation of the genome thus supports our analysis (Table S6). Moreover, BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BsaA2 (Table 2).
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (II) (Table S1) is 100% identical to "BN843_RS09665", annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). Here as well, we edited the start aa of the predicted lanthipeptide (II) of strain M1 from M to L (Table S1). BAGEL BLAST showed that lanthipeptide (II) is 83% identical to BsaA2 (Table 2).

S.2.3.5.5. S. aureus MSSA476
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "SAS1746", which has been annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2). The RefSeq genome annotation thus supports our analysis (Table S6). BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BsaA2 (Table 2). Daly et al, 2010 reported strain MSSA476 as a putative producer of BsaA2 [8].
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (II) (Table S1) is 100% identical to "SAS_RS09300", annotated on the RefSeq genome as coding for a class-I lanthipeptide. Here as well, we edited the start aa of the predicted lanthipeptide (II) of strain MSSA476 from M to L (Table S1). BAGEL BLAST showed that lanthipeptide (II) is 83% identical to BsaA2 (Table 2).

S.2.3.5.6. S. aureus MW2
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "MW_RS09420", annotated on the RefSeq genome as coding for a class-I lanthipeptide. The RefSeq annotation therefore supports our analysis (Table S6). BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BsaA2 (Table 2). Daly et al, 2010 reported strain MW2 as a putative producer of BsaA2 [8].
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (II) (Table S1) is 100% identical to "MW_RS09425", annotated on the RefSeq genome as coding for a class-I lanthipeptide. Here as well, we edited the start aa of the predicted lanthipeptide (II) of strain MW2 from M to L (Table S1). BAGEL BLAST showed that lanthipeptide (II) is 83% identical to BsaA2 (Table 2).

S.2.3.5.7. S. aureus NCTC 8325
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "SAOUHSC_01953", annotated on the RefSeq genome as coding for a class-I lanthipeptide. The genome annotation thus supports our analysis (Table S6). BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BsaA2 (Table 2). Daly et al, 2010 reported strain NCTC 8325 as a producer of BsaA2 [8].
S.2.3.5.8. S. aureus RF122
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (I) (Table S1) is 100% identical to "SAB_RS08990", annotated on the RefSeq genome as coding for a class-I lanthipeptide (Table S2), which supports our analysis (Table S6). BAGEL BLAST showed that lanthipeptide (I) is 100% identical to BacCH91 and 81% identical to BsaA2. These results also agree with the findings of Marsh et al, 2010 [15].
The nt sequence of the gene coding for the antiSMASH-predicted lanthipeptide (II) (Table S1) is 100% identical to "SAB_RS08995" annotated on the RefSeq as coding for a class-I lanthipeptide (Table S2). Thus, the RefSeq annotation supports our analysis (Table S6). Based on the analysis using BAGEL BLAST, lanthipeptide (II) is 87% identical to BsaA2. Daly et al, 2010 also reported strain RF122 as a producer of variants of Bsa [8].
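The start-aa edits above for lanthipeptide (II) of strains 11819-97, COL, M1, MSSA476 and MW2 depend on reading the first codon, "TTG", with the standard codon assignments. The minimal sketch below performs that lookup. As a separate check, it also reports whether the codon is listed among the initiation codons of NCBI translation table 11 (the bacterial code), a list that includes TTG. The function name and example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard codon-to-amino-acid assignments in TCAG order ('*' = stop).
# NCBI translation table 11 (bacterial) uses the same amino-acid assignments.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Codons listed as initiation codons for NCBI translation table 11.
TABLE11_STARTS = frozenset({"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"})


def first_codon_report(cds: str) -> dict:
    """Report the first codon, its standard-table amino acid, and whether it
    is listed as a table-11 initiation codon."""
    codon = cds[:3].upper()
    return {
        "codon": codon,
        "standard_aa": CODON_TABLE.get(codon, "X"),
        "table11_start": codon in TABLE11_STARTS,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence beginning with TTG.
    print(first_codon_report("TTGAAACGT"))
    # {'codon': 'TTG', 'standard_aa': 'L', 'table11_start': True}
```

The two lookups are kept separate so that a codon's standard amino-acid assignment and its start-codon status can be inspected independently.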