Draft Genome Sequence Data of Lysinibacillus sphaericus Strain 1795 with Insecticidal Properties

Abstract: Lysinibacillus sphaericus is of significant agricultural importance because it produces insecticidal toxins and chemical moieties with varying antibacterial and fungicidal activities. In this study, the genome of the L. sphaericus strain 1795 is presented. Illumina short reads sequenced on the HiSeq X platform were assembled with the SPAdes v3.15.4 software. The genome size, based on the cumulative length of 23 contigs, reached 4.74 Mb, with an N50 of 1.34 Mb. The assembled genome carried 4672 genes, including 4643 protein-encoding ones, 5 of which represented loci coding for insecticidal toxins active against the orders Diptera, Lepidoptera, and Blattodea. We also revealed biosynthetic gene clusters responsible for the synthesis of secondary metabolites with predicted antibacterial, fungicidal, and growth-promoting properties. The genomic data provided will help deepen our understanding of the genetic markers that determine the efficient application of the L. sphaericus strain 1795, primarily for biocontrol purposes in veterinary and medical applications against several groups of blood-sucking insects.

Dataset: The raw Illumina HiSeq X genome sequencing data were submitted to the NCBI SRA database in FASTQ format with BioSample SAMN37209907, under BioProject PRJNA1011199. The assembled genome is available in NCBI GenBank under ASM3119793v1.


Summary
Lysinibacillus sphaericus (formerly called Bacillus sphaericus) is a spore-forming bacterium first described as an insect pathogen nearly six decades ago by Kellen et al. [1,2]. Although initially perceived as a highly effective mosquito control agent [3-5], this species was later shown to exhibit a wide range of other characteristics, including insecticidal activity against insects other than Diptera, as well as bactericidal, fungicidal, plant growth-promoting, and bioremediation activities, among others, making it potentially useful in agriculture [1,6,7]. The majority of L. sphaericus strains produce spore-associated larvicidal binary toxins composed of two subunits called Tpp1 and Tpp2. These subunits were formerly known as BinA and BinB, respectively [1,8]. Some strains are also capable of producing the three-domain Cry toxin Cry48, which requires the binary Tpp49 protein to activate toxicity [9]. The extensive use of the spore-crystal complex alone does not fully exploit the insecticidal potential of strains secreting other toxins during the vegetative stage and leads to the emergence of resistant insects [1,7,10]. There is therefore a high demand for isolating, characterizing, and testing novel strains, especially those synthesizing previously unreported proteins and compounds with agriculturally valuable activities [11]. In this context, the ongoing accumulation of genomic data provides insights into the mechanisms underlying the potential usefulness of the isolates and could ease the selection of promising strains.

Isolation of Lysinibacillus sphaericus Strain 1795 and Characterization of Its Morphology and Insecticidal Activity
The Lysinibacillus sphaericus strain 1795 was isolated from a freshwater pond inhabited by Aedes sp. larvae located in Babolovsky park, Pushkin, St. Petersburg, Russia. When cultivated on a Lysogeny broth agar nutrient medium [12], the strain forms yellow-white, smooth, flat, shiny, and circular colonies (Figure 1a). The vegetative cells are rod-shaped, 0.6-1.0 × 1.5-5.0 µm in size, and capable of forming subterminal spores (Figure 1b). According to the information given in the certificate of deposition, the strain is highly toxic to the second instar larvae of a set of harmful mosquito species: Aedes caspius, Aedes communis, Aedes dorsalis, Aedes flavescens, Aedes leucomelas, and Culex pipiens molestus (Supplementary Data S1).
These data suggest that the Lysinibacillus sphaericus strain 1795 is suitable for the development of biological preparations against blood-sucking insects and underscore the importance of studying its genome to decipher the molecular determinants of its insecticidal properties.

Genome Assembly and Annotation
The whole genome of the 1795 strain was sequenced using the Illumina HiSeq X platform with 150 bp paired-end reads. According to the quality control performed with the FastQC v0.12.1 [13] and fastp v0.23.2 [14] programs, the sequencing data of the short-read DNA libraries, both raw and adapter-trimmed, were of high quality, i.e., showed uniform distributions of quality scores, GC content, etc. The draft de novo genome assembly generated using the SPAdes v3.15.4 software [15] consisted of 23 contigs with a total size of 4.74 Mb and an N50 of 1.34 Mb. The genome's completeness was 99%, while contamination was only 1%, as revealed with CheckM v1.2.2 [16] (Supplementary Data S2). Other properties of the assembly are presented in Table 1.
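The N50 value quoted above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following minimal Python sketch illustrates the calculation; the function name `n50` and the toy contig lengths are illustrative assumptions and do not correspond to the contigs of strain 1795 or to the QUAST implementation.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, not the contigs of strain 1795.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 40
```

In this toy case, the two longest contigs (50 + 40 = 90) are the first to cover at least half of the total length (150), so the N50 equals 40.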
Table 1. The main characteristics of the draft genome assembly of the L. sphaericus strain 1795 obtained using QUAST v5.2.0 [17] and CheckM v1.2.2 [16].
When utilizing the BUSCO v5.4.2 program [18], we found that the proportion of fully assembled single-copy orthologs was at least 99.8% when compared with both the Bacillales_odb10 and Bacilli_odb10 databases (Table 2). Therefore, the results indicate the high quality and completeness of the genome assembly.
Table 2. Estimation of the presence of BUSCO v5.4.2 markers [18] in the protein-coding genes present in the assembly. Shown are the numbers of orthologs found in the assembly, coupled with their respective percentages.
The annotation with the Prokka v1.14.6 tool [21] showed that the genome of the studied strain contained 4672 genes, 4643 of which were coding sequences, with 1128 of them marked as hypothetical proteins and lacking annotations (Supplementary Data S3).
The BtToxin_Digger v1.0.10 tool indicated that the analyzed genome housed loci coding for insecticidal toxins, namely, Mtx1Aa1, Mpp3Aa1, Tpp1Aa2, Tpp2Aa2, and Spp1Aa1. The respective toxins were shown to exert an effect on a wide range of insects belonging to the orders Diptera, Lepidoptera, and Blattodea. According to the inferences obtained with the BtToxin_Digger v1.0.10 [22] and CryProcessor v1.0 [23] utilities, the genome did not contain cry genes (Table 4).
With the DeepBGC v0.1.30 tool [25], we revealed seven biosynthetic gene clusters in the genome of the L. sphaericus strain 1795 responsible for the synthesis of secondary metabolites with predicted bactericidal properties. The antiSMASH v6.1.1 tool [26], in turn, identified eight biosynthetic gene clusters (Table 5). The clusters with the highest similarity to the known entities were fengycin and petrobactin. The former is known for its strong fungicidal activity, whereas the latter, being a siderophore, serves as a metal-chelating peptide that diminishes iron accessibility to pathogens, thereby contributing to the reduction in pathogenic microorganisms within the soil [27-29]. Siderophores could also exert a potential growth-promoting effect on plants by providing them with essential iron [30].
Table 5. Biosynthetic gene clusters harbored in the genomic assembly predicted with the antiSMASH v6.1.1 [26] and DeepBGC v0.1.30 [25] programs. The score reflects the accuracy of cluster prediction obtained with the DeepBGC v0.1.30 program, while the similarity to the known clusters is calculated with the antiSMASH v6.1.1 program. The "-" symbol indicates that the biosynthetic cluster was found by only one program out of the two used.
Having analyzed the gathered genomic data, we might conclude that the strain possesses insecticidal efficacy, along with potential bactericidal and fungicidal properties. It follows, therefore, that it may find further application primarily, but not exclusively, as a biological control agent against blood-sucking insects.

DNA Extraction
For total DNA isolation, the bacterial culture was grown for 12 h in a liquid Spizizen nutrient medium [31,32] with aeration at +28 °C. It was then centrifuged and washed three times with the buffer (0.01 M EDTA, 0.15 M NaCl, pH 8.0). Next, we added 500 µL of the above buffer and 15 µL of a 10 mg/mL Ribonuclease A solution (VWR International Ltd., Poole, UK) to the washed cells. To perform cell lysis, the samples were incubated for 60 min at +37 °C with 10 µL of lysozyme (PanReac AppliChem, Darmstadt, Germany) and 5 µL of mutanolysin (Sigma Chemical Co., St. Louis, MO, USA) added to the solution. The lysozyme had previously been diluted in a buffer (20 mM Tris-Cl pH 8.0, 2 mM EDTA, 1.2% Triton X-100) to a 10 mg/mL concentration. A mutanolysin working solution (1 mg/mL) was also prepared using a buffer with the following composition: 0.05 M TES, 1 mM MgCl2, pH 7.0. The purification of the sample from polysaccharides and proteins was carried out by adding 3 µL of proteinase K (600 U/mL; ThermoFisher Scientific, Bremen, Germany) to the cell lysate, followed by a 30 min incubation at +37 °C. The samples were then incubated with 10% sodium dodecyl sulfate (50 µL, 10 min incubation at +65 °C) and 80 µL of cetyltrimethylammonium bromide (CTAB) and NaCl solution in a ratio of 1:10 to achieve the effective denaturation of the proteins. Further DNA purification was performed through phenol-chloroform extraction, without the addition of isoamyl alcohol. The DNA was precipitated by adding isopropanol to the samples, followed by washing three times with freshly prepared 70% ethanol. At the last stage, 30 µL of Tris-EDTA (TE) buffer (pH 8.0) was added to dissolve the DNA, and the samples were left in a refrigerator at +4 °C for 18-24 h.

DNA Quality Control
The concentration of the isolated genomic DNA was assessed using a Qubit® 3.0 fluorimeter and a Qubit dsDNA BR Assay kit (Life Technologies, Eugene, OR, USA). The contamination with proteins, phenol, or other contaminating agents was evaluated using the 260 nm/280 nm and 260 nm/230 nm absorbance ratios, with a value of ≥1.8 indicating the purity of the sample. The measurements were carried out on a CLARIOstar Plus multimodal reader (BMG Labtech, Germany). Additional qualitative and quantitative analyses of the DNA samples were performed using electrophoresis in a 1% agarose gel stained with 0.002% ethidium bromide via comparison with the λ DNA/HindIII marker (Thermo Fisher Scientific, Inc., Waltham, MA, USA).
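The purity criterion described above can be expressed as a simple check on the two absorbance ratios. The sketch below is only an illustration: the function name `purity_check` and the example absorbance readings are hypothetical and are not measurements from this study.

```python
def purity_check(a260, a280, a230, threshold=1.8):
    """Return (is_pure, A260/A280, A260/A230) for one DNA sample."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbance readings must be positive")
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    # Both ratios must reach the threshold for the sample to count as pure.
    is_pure = ratio_280 >= threshold and ratio_230 >= threshold
    return is_pure, ratio_280, ratio_230


# Hypothetical readings, not data from the study.
print(purity_check(1.0, 0.5, 0.5))
```

A sample that fails either ratio is flagged as impure by this check; the direction of the deviation is not interpreted here.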

Whole Genome Sequencing, De Novo Genome Assembly, and Annotation
The whole genome sequencing was conducted on the Illumina HiSeq X platform (Illumina) in the paired-end mode, with a read length of 2 × 150 bp, by Macrogen Inc. (Seoul, Republic of Korea). Quality control of the short nucleotide reads was executed with FastQC v0.12.1 [13]. The removal of the adapter sequences and the additional quality control of the reads were performed using fastp v0.23.2 [14]. The de novo assembly of the genome was made using SPAdes v3.15.4 [15] in the "--careful" mode. The obtained assembly was then quality-controlled with QUAST v5.2.0 [17]. The taxonomy-wise completeness of the assembly was evaluated by calculating the percentage of the single-copy orthologs from the "Bacillales_odb10" and "Bacilli_odb10" databases using the BUSCO v5.4.2 software [18]. The benchmarking datasets used in the analysis were based on the v10 release of the OrthoDB database [33]. We also verified the taxonomical attribution by assessing the completeness and contamination level utilizing CheckM v1.2.2 [16].
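The order of the steps described above can be summarized as a list of command lines. The following Python sketch only builds the commands and does not run them. Apart from the "--careful" mode of SPAdes and the two BUSCO lineage datasets, which are named in the text, all file names, directory names, the thread count, and the remaining options are assumptions chosen for illustration; the exact commands used in the study are not given.

```python
from pathlib import Path


def build_commands(reads_1, reads_2, outdir, threads=8):
    """Build (but do not run) one command line per pipeline step."""
    out = Path(outdir)
    trimmed_1 = str(out / "trimmed_R1.fastq.gz")
    trimmed_2 = str(out / "trimmed_R2.fastq.gz")
    spades_dir = str(out / "spades")
    assembly = str(out / "spades" / "contigs.fasta")
    commands = [
        # Read-level quality control; the output directory must already exist.
        ["fastqc", reads_1, reads_2, "-o", str(out / "fastqc")],
        # Adapter removal and filtering of the paired-end reads.
        ["fastp", "-i", reads_1, "-I", reads_2, "-o", trimmed_1,
         "-O", trimmed_2, "--detect_adapter_for_pe"],
        # De novo assembly; "--careful" is the only option named in the text.
        ["spades.py", "--careful", "-1", trimmed_1, "-2", trimmed_2,
         "-t", str(threads), "-o", spades_dir],
        # Contiguity statistics such as the total length and N50.
        ["quast.py", assembly, "-o", str(out / "quast")],
    ]
    # Completeness against both lineage datasets mentioned in the text.
    for lineage in ("bacillales_odb10", "bacilli_odb10"):
        commands.append(["busco", "-i", assembly, "-l", lineage,
                         "-m", "genome", "-o", "busco_" + lineage])
    # CheckM expects a directory of bins; "bins" is assumed to hold a copy
    # of the assembly with a .fasta extension.
    commands.append(["checkm", "lineage_wf", "-x", "fasta",
                     str(out / "bins"), str(out / "checkm")])
    return commands


for command in build_commands("reads_R1.fastq.gz", "reads_R2.fastq.gz", "results"):
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run(command, check=True)` once the corresponding tools are installed and the assumed directories exist.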
In the next stage, we used fastANI v1.33 [20] to reveal the phylogenetically closest genomes belonging to the Lysinibacillus spp. downloaded from the NCBI RefSeq database [19] by picking the ten genomes with the highest average nucleotide identity (ANI) values when compared with our assembly. The selected dataset was then applied to train a model for Prodigal v2.6.3 [34], which was further used for the accurate prediction of coding sequences with Prokka v1.14.6 [21]. To increase the number of meaningful annotations, we included more than 700,000 protein sequences of the Bacillus spp. from the Identical Protein Groups database [35] as the most-trusted proteins for Prokka-derived annotations.
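The selection of the closest genomes described above amounts to sorting the fastANI results by ANI and keeping the top entries. The sketch below assumes the standard tab-delimited fastANI output (query, reference, ANI, number of mapped fragments, total query fragments); the function name `top_ani_hits` and the example rows are hypothetical and do not reproduce the values in Table 3.

```python
import csv


def top_ani_hits(lines, n=10):
    """Return the n (reference, ANI) pairs with the highest ANI values."""
    rows = csv.reader(lines, delimiter="\t")
    # Column 2 holds the reference genome, column 3 the ANI estimate.
    hits = [(row[1], float(row[2])) for row in rows if len(row) >= 3]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:n]


# Hypothetical fastANI output rows, not the results of the study.
EXAMPLE = (
    "query.fna\tref_A.fna\t98.70\t1400\t1500\n"
    "query.fna\tref_B.fna\t99.20\t1450\t1500\n"
    "query.fna\tref_C.fna\t81.05\t600\t1500\n"
)
print(top_ani_hits(EXAMPLE.splitlines(), n=2))
```

With a real output file, the same function can be called on an open file handle, since `csv.reader` accepts any iterable of lines.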

Table 3. The list of the phylogenetically closest assemblies relative to the genome of the studied strain according to the ANI value calculated with the fastANI v1.33 software [20].

Table 4. The repertoire of insecticidal toxins identified in the genome of the analyzed strain using the BtToxin_Digger v1.0.10 program [22]. The target species describe experimentally derived data deposited in the BPPRC (Bacterial Pesticidal Protein Resource Center) specificity database [24].