Genomic Insight into Shimazuella soli sp. nov. Isolated from Soil and Its Putative Novel Class II Lasso Peptide

Strain AN120528T was isolated from farmland soil in South Korea. The strain grows well on R2A medium at 28 °C. The cells are off-white and have no hyphae. Phylogenetic analysis indicated that the strain is a member of the genus Shimazuella, with 98.11% 16S rRNA gene sequence similarity to Shimazuella alba KC615T and 97.05% similarity to S. kribbensis KCTC 9933T. Strain AN120528T shares common chemotaxonomic features with the other two type strains in the genus. It has MK-9 (H4) and MK-10 (H4) as its predominant menaquinones. The major fatty acids are iso-C14:0, iso-C15:0, anteiso-C15:0 and iso-C16:0. Diphosphatidylglycerol (DPG), phosphatidylethanolamine (PE), phosphatidylglycerol (PG), lipids (L) and aminolipids (AL) were identified as the major cellular polar lipids. Analysis of the peptidoglycan showed the presence of meso-diaminopimelic acid. Whole-genome sequencing revealed that the genome of the strain is approximately 3.3 Mbp in size. The strain showed a 77.5% average nucleotide identity (ANI) with S. alba KC615T. The genomic DNA (gDNA) G + C content is 39.0%. Based on polyphasic taxonomy analysis, it is proposed that strain AN120528T represents a novel species in the genus Shimazuella, designated as Shimazuella soli sp. nov. The type strain is AN120528T (=KCTC 39810T = DSM 103571T). Furthermore, shimazuellin I, a new 15-amino-acid peptide, was discovered in AN120528T through genome mining; it has the features of a lasso peptide, containing eight amino acids (-G-Q-G-G-S-N-N-D-) that form a macrolactam ring and seven amino acids (-D-G-W-Y-H-S-K-) that form a tail.


Introduction
The genus Shimazuella belongs to the family Thermoactinomycetaceae, order Caryophanales, class Bacilli and phylum Bacillota [1,2]. The genus has had only two validly published species [3]: Shimazuella kribbensis, the type species, and Shimazuella alba, which was proposed as a new member. Shimazuella spp. are all gram-positive, aerobic and mesophilic and have white-coloured cells [2,4]. The two species, S. kribbensis and S. alba, were both isolated from soil. The cells can grow on International Streptomyces Project medium number 2 (ISP 2), number 3 (ISP 3) and nutrient media. Based on chemotaxonomic study, Shimazuella contains MK-9 (H4) and MK-10 (H4) as predominant menaquinones and PE as a major polar lipid. The gDNA G + C contents are between 38.5 and 39.4%, and the genome size is about 3.98 Mbp. The major fatty acids are anteiso-C15:0 for both species; iso-C16:0, C16:0, iso-C15:0 and anteiso-C17:0 for S. kribbensis; and C20:0 and C18:0 for S. alba [2,4]. Both species have ribose and glucose in their cell-wall hydrolysates. In addition, Shimazuella spp. can grow on media with a pH of 6.0-8.0 in a temperature range of 28-37 °C and can tolerate NaCl up to 1%.
Bioactive microbial peptides are short fragments of proteins produced by microorganisms and have been revealed to have substantial potential as drugs that maintain physiological homeostasis, such as antimicrobials, antioxidants, gut homeostasis therapies and immunomodulators [5]. Lasso peptides in bacteria are ribosomally synthesised and post-translationally modified peptides (RiPPs) with a unique N-terminal macrolactam ring structure and a C-terminal linear tail [6]. The peptides can be classified into four types according to the number of disulfide bonds. Among them, class II lasso peptides feature no disulfide bridges, and the topology is stabilised by steric interactions. Class II lasso peptides are known for their notable biological activities, including their antimicrobial, peptide antagonist, protease inhibitory and anti-cancer activities [7]. This wide range of application potential motivates researchers to find novel lasso peptides using genome mining. However, the detailed mechanism of maturation of the peptide has remained elusive due to the lack of structural information about the enzymes involved [6].
During our investigation of the microbial diversity of soil in Korea, we isolated a novel species belonging to the genus Shimazuella. In this study, we describe the characteristics of this species, identified through polyphasic taxonomy analysis. We also report a novel class II lasso peptide, its biosynthetic gene clusters (BGCs) and the inferred biosynthesis mechanism revealed using genome mining.

Bacteria and Culture Condition
AN120528T was isolated in 2012 from soil at Goesan-gun, Chungcheongbuk-do, Republic of Korea (36°44′0.6″ N, 127°51′30.1″ E) by spreading on R2A and incubating at 28 °C for five days. The isolate was maintained as a glycerol (20%) stock solution at −75 °C.

Genome Sequencing and Genomic Analysis
The gDNA was prepared for sequencing using the MGIEasy DNA Library Prep Kit (BGI, Shenzhen, China) for de novo sequencing on the MGI platform. The resulting reads were quality trimmed to the Q30 confidence level. The read sequences (12× coverage) were assembled with CLC Assembly Cell 5.1.1 (Qiagen Inc, Cambridge, MA, USA) using default parameters. The sequences were deposited in the National Center for Biotechnology Information (NCBI) GenBank under accession number JAKWBN000000000. The draft genome was annotated by Rapid Annotation using Subsystem Technology 2.0 (RAST; https://rast.nmpdr.org (accessed on 10 May 2022)) [13]. The circular map was constructed using the PATRIC 3.5.43 online server for bacteria [14]. ANI values were calculated using OrthoANI software [15].

Chemotaxonomy
The cells of strain AN120528T were harvested from the culture broth and grown on TSA medium at 28 °C for three days. The fatty acids were extracted and methylated according to the instructions of the Microbial Identification System (MIDI) [24] and analysed with a gas chromatograph (Model 6890; Hewlett Packard Co., Wilmington, DE, USA). Menaquinones were analysed by high-performance liquid chromatography (HPLC) according to Tamaoka et al. (1983) [25]. Cell wall amino acids and sugars were identified by following the method described in Staneck and Roberts (1974) [26]. Polar lipids were examined by two-dimensional thin-layer chromatography (TLC, silica, 20 × 20 cm, Merck, Darmstadt, Germany). For the first- and second-dimension separations, a mixture of chloroform:methanol:DW (65:25:4, v/v) and a mixture of chloroform:acetic acid:methanol:DW (80:18:12:5, v/v) were used, respectively. Cell harvests were hydrolysed in 1.0 M sulfuric acid at 100 °C for 6 h and used for whole-cell sugar component analysis. The extracts were loaded onto cellulose TLC plates and developed twice with a mixture of n-butanol:DW:pyridine:toluene (10:6:6:1, v/v) [27].

Phylogenetic Analysis
Phylogenetic and phylogenomic analyses based on the 16S rRNA gene (1470 bp; GenBank accession number KX762321) and the genome sequence revealed that strain AN120528T was clearly a member of the genus Shimazuella, having the highest 16S rRNA gene sequence similarity to S. alba KC615T (98.11%; accession no. MG770674) [4], followed by S. kribbensis KCTC 9933T (97.05%; AB049939) [2] (Figure 1a,b). In addition, the neighbour-joining, maximum-parsimony and maximum-likelihood tree-making algorithms showed that strain AN120528T formed a monophyletic group with the two species of the genus Shimazuella.

Morphological, Physiological and Biochemical Characteristics
Strain AN120528T grew well on Bennett's agar, ISP 2, PDA and R2A. On R2A agar medium, the strain formed white colonies and the cells measured 1.0 × 1.2 µm (Figure 1c). Cells were gram-positive, aerobic, spore-forming and non-motile. Colonies were circular, opaque and creamy white on the R2A medium. AN120528T was able to grow in the range of 20-45 °C, with optimum growth at 28-40 °C, and at a pH of 6.0-7.0, with an optimum pH of 7.0, and it could tolerate up to 1.0% NaCl. Anaerobic growth was not observed. For cell growth, the strain utilised D-galactose and D-mannose as carbon sources and L-tyrosine and L-cysteine as nitrogen sources. It was positive for Tween 40 and 80 and negative for Tween 20. Cells were susceptible to all antibiotics tested. Table 1 shows several characteristics that distinguish strain AN120528T from the phylogenetically closely related strains.

Table 1. Physiological and biochemical properties of strain AN120528T and related strains. All strains were gram-positive, aerobic and non-motile, and anaerobic growth was not observed. All strains also have MK-9 (H4) and MK-10 (H4) as respiratory quinones, meso-diaminopimelic acid as the diamino acid of the peptidoglycan and ribose and glucose as the major cell-wall sugars.

Table 2 lists the chemotaxonomic characteristics of strain AN120528T and its related strains. The major fatty acids (>30%) of AN120528T were anteiso-C15:0 (32.3%) and iso-C15:0 (31.8%). Furthermore, the strain possessed MK-9 (H4) and MK-10 (H4) as the predominant menaquinones and meso-diaminopimelic acid in the cell-wall peptidoglycan. The major polar lipids were DPG, PG, PE, AL and L (Figure 2). The major cell-wall sugars were ribose and glucose.

Genome Analysis
The assembled draft genome of AN120528T was 3.37 Mbp, containing 25 contigs with an N50 length of 408,672 bp, 3408 coding sequences, 10 rRNA and 52 tRNA (Table 3). The gDNA G + C content was 39.0%. The GenBank accession number for the genome sequence of strain AN120528T is JAKWBN000000000. Figure 3a shows the comparative genomic circular map. The OrthoANI values between AN120528T and its related species S. alba KC615T (accession no. WUUL00000000) and S. kribbensis KCTC 9933T (ATZF01000001) were 77.53 and 77.60%, respectively (Figure 3b). Furthermore, the dDDH values between AN120528T and S. alba KC615T and S. kribbensis KCTC 9933T were 20.60 and 21.13%, respectively (Figure 3c). Based on the genome analysis on the RAST webserver, around 22% of detected genes (a total of 800 genes) were annotated in the subsystem (Figure 3d). The antiSMASH results on the functional metabolites showed BGCs for one lasso peptide, three terpenes, one type I polyketide synthase (T1PKS), one type III polyketide synthase (T3PKS), three non-ribosomal peptide synthetases and one azol(in)e-containing peptide.
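The assembly statistics reported here (N50 and G + C content) follow from simple definitions, which the following minimal Python sketch illustrates. It is not the authors' pipeline (the values in the text came from the assembler and annotation tools); the function names and the toy contigs are hypothetical and are used only to show how such numbers can be derived from contig sequences.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def gc_percent(sequences: Iterable[str]) -> float:
    """Return the pooled G + C content (%) over all sequences,
    ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Toy contigs (hypothetical; NOT the AN120528T assembly).
toy_contigs = ["ATGCGC" * 10, "ATATAT" * 5, "GGCC" * 3]
toy_n50 = n50(len(c) for c in toy_contigs)
toy_gc = gc_percent(toy_contigs)
print(toy_n50, round(toy_gc, 1))
```

Applied to the contigs of a real draft assembly (for example, sequences read from a FASTA file), the same two functions would give values comparable to the N50 and G + C content listed in Table 3.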

Genome Mining and Identification of Shimazuellin BGCs
Based on the precursor peptide sequence (protein ID: 00326) predicted by antiSMASH, the peptide was identified as a class II lasso peptide and was named shimazuellin because it is the first lasso peptide discovered in the genus Shimazuella. Proteins adjacent to the precursor (00327, 00328, 00329 and 00330), estimated to be essential enzymes in the biosynthetic pathway, were selected as queries, and a BLAST homology search was performed (Table 4). The putative precursor peptide sequence was shown to have low levels of similarity with hypothetical protein PPOP_1752 (GAC42395.1) and hypothetical protein PPOP_1273 (GAC41916.1) of Paenibacillus popilliae ATCC 14706, at 47.06% and 45.45%, respectively. The putative lasso peptide-related proteins, which were annotated as uncharacterized proteins, had the highest amino acid sequence similarity to the lasso peptide biosynthesis protein (67.09%; accession no. WP_028776449.1), the asparagine synthase-related protein (65.54%; WP_028776448.1) and the hypothetical protein (60.70%; WP_028776447.1) of strain KCTC 9933T. Although they had low sequence similarities even with closely related proteins, lasso-related enzymes were identified by the local presence of genes encoding proteins matching the pivotal motifs and domains for the precursor peptide, the lasso protease, the lasso cyclase, the RiPP recognition element (RRE) and the ABC transporter. The shimazuellin BGC housed the five major genes involved in precursor peptide production, biosynthesis, maturation and secretion. Although the sequences and functions of the peptides can vary markedly, the BGC for shimazuellin has a typical lasso peptide biosynthetic gene locus encoding a linear precursor peptide without disulfide bonds, three conserved proteins for peptide maturation and a transporter to export the matured peptide (Figure 4a).
ShiA consists of 22 amino acids for the leader peptide, 16 amino acids for the core peptide region and five amino acids for the truncated C-terminal tail [34]. The shiC gene encodes a protein with an amidotransferase, ATP pyrophosphatase and asparagine synthetase-like domain, which is responsible for both isopeptide bond formation and the subsequent macrocyclisation [35]. ShiB1, containing a pyrroloquinoline quinone protein D (PqqD) domain, also known as the RRE, binds to the leader region and transfers the precursor peptide to the protease ShiB2 for further processing [34,36]. ShiD is an ABC transporter that secretes shimazuellin from the cytoplasm to the extracellular space, and the presence of ShiD indicates that the biosynthesised lasso peptides could exhibit antimicrobial activity. The putative leader region, MEYNSEWVEPKLIYLGSVEELT, was shown to have a VXPXLXXXG conserved motif, which is commonly found in lasso leader peptides [37]. The leader and core regions are separated by the emblematic TG motif needed to remove the leader peptide during maturation [38]. Residues G and D, which can form macrolactam rings, were identified in the core peptide [39] (Figure 4b). Furthermore, a cleaved tail containing D residues was identified. This is not commonly found in gram-positive bacteria. In addition, in this study, class II lasso peptides that do not have a cleaved tail were found in S. alba KC615T (shimazuellin II and IV) and S. kribbensis KCTC 9933T (shimazuellin II and shimazuellin III) using comparative genome analysis.

Figure 5 shows the proposed mechanism of shimazuellin biosynthesis and secretion. Four steps seem to be necessary for the biosynthesis and secretion of shimazuellin based on the gene clusters encoding separate ShiA, ShiB2, ShiC, ShiB1 and ShiD. First, after a precursor peptide of shimazuellin is translated from mRNA, ShiB1 binds to the VXPXLXXXG region of the leader peptide in ShiA to recognise ShiA. Second, the ShiB2 protein removes the leader peptide via proteolysis of the TG region and thereby releases the core peptide of shimazuellin. Next, ShiC, the lasso cyclase, activates the Asp carboxylic acid in the form of an adenosine monophosphate ester before catalysing the macrolactam formation via condensation with the α-amino group. Finally, the ShiD-encoded ABC transporter performs cleavage in the tail (-LAKDE-) and exports the mature form of shimazuellin out of the cells.
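The sequence features described above can be checked with a short script. The following Python sketch is illustrative only and is not the method used in this study: it uses the leader sequence quoted in the text and the 15-residue ring-plus-tail sequence given in the abstract, and it assumes (as a general property of lasso peptides, not a statement from this paper) that the macrolactam ring spans seven to nine residues closed by an Asp or Glu side chain. All function names are hypothetical.

```python
import re

# Sequences quoted in the text.
LEADER = "MEYNSEWVEPKLIYLGSVEELT"  # putative leader region of ShiA
MATURE = "GQGGSNNDDGWYHSK"         # ring (GQGGSNND) + tail (DGWYHSK), per the abstract

# VXPXLXXXG written as a regular expression (X = any residue).
LEADER_MOTIF = re.compile(r"V.P.L.{3}G")


def find_leader_motif(leader: str):
    """Return (start, matched_text) with 1-based numbering, or None."""
    m = LEADER_MOTIF.search(leader)
    return (m.start() + 1, m.group()) if m else None


def has_tg_junction(leader: str, core: str) -> bool:
    """True when the leader ends in T and the core starts with G."""
    return leader.endswith("T") and core.startswith("G")


def ring_acceptor_positions(core: str, min_ring: int = 7, max_ring: int = 9):
    """Return 1-based positions of Asp/Glu residues that could close a ring of
    min_ring..max_ring residues with the N-terminal alpha-amino group.
    The 7-9 residue window is an assumption labelled in the lead-in."""
    return [i for i in range(min_ring, max_ring + 1)
            if i <= len(core) and core[i - 1] in "DE"]


motif = find_leader_motif(LEADER)
junction = has_tg_junction(LEADER, MATURE)
acceptors = ring_acceptor_positions(MATURE)
print(motif, junction, acceptors)
```

On these sequences, the motif search finds VEPKLIYLG starting at residue 8 of the leader, the leader-core junction forms a TG pair, and Asp residues at positions 8 and 9 of the core are both candidate ring acceptors; the eight-residue ring reported in this study corresponds to Asp8.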

Discussion
The genus Shimazuella, represented by S. kribbensis KCTC 9933T, was first proposed in 2007 [2]. However, the genus has consisted of only two recognized species so far. Based on our observation, along with previously published results, Shimazuella spp. prefer extremely limited or unique carbon sources for metabolism and take about 7 days or more to reach the stable late log phase or early stationary phase [2,4]. To survive in a soil community consisting of various microbes, Shimazuella is considered to have evolved a metabolism that prefers other sugars and does not use glucose, the basic carbon source for most organisms. Because of these physiological characteristics, antagonistic actions in the microbial community could make Shimazuella difficult to isolate when screening for a single strain from the environment. Furthermore, the difficulty of isolating Shimazuella was supported by an antibiotic susceptibility test and genome analysis in this study. These analyses found that Shimazuella spp. do not contain any antibiotic resistance genes except for one putative glycopeptide resistance gene cluster common to all strains and two antibiotic efflux-related genes in strain KC615T. In this study, we successfully isolated AN120528T from soil using extreme serial dilution to obtain rare bacteria and investigate their functionality. Comparative genome analysis showed that the ANI and dDDH values between strain AN120528T and its related species were lower than the cut-offs of 95-96% and 70%, respectively, for the delineation of a novel species [40,41]. Moreover, the results of the carbon utilisation and fatty acid composition allowed for the systematic differentiation of AN120528T from related species. Therefore, strain AN120528T represents a novel species of the genus Shimazuella, for which we propose the name Shimazuella soli sp. nov.
In addition, the description of the genus Shimazuella is as given previously [2,4], with the following modifications: its diagnostic polar lipids are DPG, PE and PG; its major fatty acids are anteiso-C15:0, iso-C14:0, iso-C15:0, iso-C16:0 and C16:0; and the G + C content is around 38.4-39.0%.
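The genome-based part of the species delineation argument amounts to comparing each pairwise value with a cut-off. The following minimal Python sketch restates that check using the OrthoANI and dDDH values reported in this study; the function name is hypothetical, and the lower bound of the 95-96% ANI range is used here as a conservative assumption.

```python
ANI_CUTOFF = 95.0   # lower bound of the 95-96% ANI species boundary (assumed conservative choice)
DDDH_CUTOFF = 70.0  # dDDH species boundary


def is_distinct_species(ani: float, dddh: float) -> bool:
    """True when both genome relatedness indices fall below the species cut-offs."""
    return ani < ANI_CUTOFF and dddh < DDDH_CUTOFF


# (OrthoANI %, dDDH %) between AN120528T and each reference, as reported in the text.
comparisons = {
    "S. alba KC615T": (77.53, 20.60),
    "S. kribbensis KCTC 9933T": (77.60, 21.13),
}

results = {name: is_distinct_species(ani, dddh)
           for name, (ani, dddh) in comparisons.items()}
for name, distinct in results.items():
    print(f"{name}: distinct species = {distinct}")
```

Both comparisons fall well below the cut-offs, which is consistent with the proposal of a novel species; the phenotypic and chemotaxonomic evidence is not captured by this check.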
Shimazuellin I, a new lasso peptide belonging to class II, was discovered in S. soli AN120528T, and lasso peptides with different sequences were also found in KC615T (shimazuellin II and IV) and in KCTC 9933T (shimazuellin II and shimazuellin III). Although sequence homology with known lasso peptide biosynthetic enzymes was very low, sequence-based protein 3D modelling, comparative structure analysis, conserved domain analysis and in silico molecular docking analysis indicated that the ShiA, ShiB1, ShiB2, ShiC and ShiD proteins could be involved in lasso peptide biosynthesis. Shimazuellin I of S. soli AN120528T was identified for the first time in this study and, unlike lasso peptides generally reported in gram-positive bacteria, has a cleaved C-terminal tail. In addition to shimazuellin I-IV, the following putative BGCs for various antimicrobial peptides were identified: enniatin, micrococcin P1, non-ribosomal tripeptide (D-Phe-D-Ala-Trp) and carnocyclin A in S. soli AN120528T; xenematide, lanthipeptide, micrococcin P1, sevadicin and xenotetrapeptide in KC615T; and micrococcin P1 and massetolide A in KCTC 9933T. It is inferred that Shimazuella possesses various antimicrobial peptides for the stable uptake of nutrients after it germinates in the presence of sufficient preferred sugars, even though it maintains a spore state under unfavourable conditions. This is similar to how Shimazuella has evolved a unique and limited carbohydrate metabolism to survive.
The trend in peptide science has changed from simply finding and applying natural peptides derived from organisms to the rational design of peptides with desirable physiological functions [42]. Major innovations in genomics, bioinformatics and sequencing technology have enabled the rational design of peptides with desirable biochemical activities. As a result, approximately 20 new therapeutic peptides have been released in the last 10 years, and dozens of peptides are in clinical development [43]. The continuous discovery of new peptides therefore helps make peptides applicable to a wide range of diseases. With the development of bioinformatics and next-generation sequencing technology, many lasso peptide gene clusters have now been identified in a variety of bacteria, revealing the amino acid sequence diversity of precursor peptides [44]. Based on their biological functions, the peptides may be useful tools for the treatment of metabolic syndrome, autoimmune disease, microbial infections and cancer [7,45,46]. Furthermore, the macrocyclic forms of lasso peptides are an appropriate backbone for epitope grafting due to their proteolytic stability and thermostability [47]. These redesigned peptides can be applied as molecular probes and drug carriers for therapeutics [48-50]. Therefore, in addition to determining the extent of their physiological function, the major motivations of genomic efforts are to mine new examples of lasso peptides and discover novel classes, such as shimazuellin found in S. soli AN120528T in our study. Considering the wide selection of active candidates, we suggest that the mining of shimazuellin can both contribute to an expansion in the scope of peptide therapeutics and be used in basic research that advances our peptide design capabilities. Taken together, we believe that the S. soli AN120528T and shimazuellin we report here could serve as useful information to boost peptide research in the postgenomic era.

Data Availability Statement:
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.