The First Data on the Complete Genome of a Tetrodotoxin-Producing Bacterium

Tetrodotoxin (TTX)-producing bacteria have attracted great interest as a model system for study of the TTX biosynthetic route. Here, we report the complete genome of the TTX-producing bacterium Bacillus sp. 1839. The genome of the strain Bacillus sp. 1839, previously isolated from the TTX-bearing marine ribbon worm Cephalothrix cf. simula, was obtained using second generation Illumina and third generation nanopore sequencing technologies. Phylogenetic analysis has classified this strain as Cytobacillus gottheilii.


Introduction
Tetrodotoxin (TTX), broadly distributed in marine ecosystems, is one of the most studied neurotoxins of the 20th-21st centuries [1]. Its ability to selectively bind voltagegated sodium channels resulted in the popularity of the toxin in medical, pharmaceutical, and scientific spheres [2]. Despite the wide distribution, the molecular basis of TTX biosynthesis is still unresolved. The first attempt to decipher the genes involved in TTX production was made by Liu et al. [3]. The authors suggested the TTX-producing ability of the bacteria Aeromonas sp. strain Ne-1 was associated with the copy number of plasmid pNe-1, containing 32 open reading frames encoding hypothetical proteins. However, the bacteria lost the plasmid after 18 h of culture. In the other work, the authors suggested an association between some natural product biosynthesis genes (polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS)) and the TTX-producing ability of the microflora of toxic gastropods [4].
Bacillus sp. 1839 was isolated from the TTX-bearing nemertean Cephalothrix cf. simula in 2014 [5]. Confocal laser scanning microscopy with polyclonal antibodies against TTX allowed us to reveal TTX-positive labeling in the cells of the strain. Further detailed investigations with immunoelectron microscopy with anti-TTX antibodies revealed that toxin labeling was directly linked with the sporulation forms of the bacterium [6]. The life cycle and sporulation conditions studies [7,8] showed that TTX labeling was preserved through numerous passages for several years after the discovery of the strain. In 2019, the TTX producing ability of the strain was confirmed with high-performance liquid chromatography with tandem mass spectrometry [9]. TTX was revealed in the culture of the strain enriched with spores, confirming previous results.
A bacterial strain with TTX production in laboratory conditions is of great interest for the toxin biogenesis investigation. This study is the first to present the complete genome of a bacterium with in vitro TTX producing ability.

General Genome Features of Bacillus sp. 1839
The specific features of the Bacillus sp. 1839 genome are summarized in Table 1 and Figure 1. The genome of the strain consists of a single 4.5 Mb circular chromosome with 39% GC content ( Figure 1A) and a 0.06 Mb plasmid with 34% GC content ( Figure 1B). The Bacillus sp. 1839 genome is predicted to include 4527 total genes, of which 4369 (96.5%) are protein-coding genes, 119 (2.6%) are RNA-coding, and 39 (0.9%) are pseudogenes (Table 1). Among the predicted genes, 3508 (77.5%) are associated with general Clusters of Orthologous Groups (COG) function categories (Table 2), however, 22.8% of them are poorly characterized and are assigned to the S group with unknown functions. Among all COG groups, genes encoding transcription (K, 6.2%), amino acid transport and metabolism (E, 5.8%), inorganic ion transport and metabolism (P, 5.7%), and carbohydrate transport and metabolism (G, 4.9%) are the most abundant. Using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, 2314 coding sequences (CDSs) of Bacillus sp. 1839 genome were assigned to 212 KEGG pathways (Supplementary Table S1). Among all the KEGG pathways, "Metabolic pathways" (547), "Biosynthesis of secondary metabolites" (254), "Microbial metabolism in diverse environments" (129), "Biosynthesis of amino acids" (104), "Biosynthesis of cofactors" (112), "Two-component system in signal transduction" (91), "ABC transporters" (90), and "Carbon metabolism" (81) accounted for the largest proportion. Comparative genomic analysis of Bacillus sp. 1839 using antiSMASH identified four gene clusters related to secondary metabolite production (Table 3).  . From outer to inner rings, circle 1: the size of the complete genome; circle 2: the predicted protein-coding genes on the sense and antisense strands (colored according to Clusters of Orthologous Groups categories); circle 3: RNA genes; circle 4: G + C content, with >39% G + C in blue, with ≤39% G + C in orange; circle 5: G + C skew, with G% > C% in green, with G% < C% in violet. . From outer to inner rings, circle 1: the size of the complete genome; circle 2: the predicted protein-coding genes on the sense and antisense strands (colored according to Clusters of Orthologous Groups categories); circle 3: RNA genes; circle 4: G + C content, with >39% G + C in blue, with ≤39% G + C in orange; circle 5: G + C skew, with G% > C% in green, with G% < C% in violet.  Mobile genetic elements predicted in the genome of Bacillus sp. 1839 are summarized in Supplementary Table S2. Analysis of transposable elements revealed numerous insertion sequences (IS) distributed over the genome of the strain. The majority of IS belong to the IS1182 (76), followed by IS3 (10), IS21 (3), IS4 (1), IS110 (1), and IS1595 (1) families. Eight genomic islands (GEIs) are found in the genome with the IslandViewer4. The largest GEI is also assigned as an intact prophage by the PHASTER server. A total of two intact (score >90), one incomplete (score <70), and one questionable (score 70-90) prophage regions were predicted in the genome of the strain, indicating previous phage infection. Two clustered regularly interspaced short palindromic repeats (CRISPR) loci were also detected.

Phylogenetic Analysis and Genome Similarity Measures
The phylogenetic tree based on 16S rRNA gene sequences revealed that Bacillus sp. 1839 is closest to Cytobacillus gottheilii ( Figure 2). Up to now, only two complete genome sequences of this species have been deposited to the National Center for Biotechnology Information (NCBI) database (Table 4). We used these genomes for a more detailed analysis. As shown in Table 5, the closely related Cytobacillus strains resulted in a high average nucleotide identity (ANI) (>97%), exceeding the threshold value of 95% for distinguishing different species. Digital DNA-DNA hybridization (dDDH) values of Bacillus sp. 1839 against reference genomes were in the range of 79.6-85.9%, also exceeding the 70% DDH cutoff for species delineation ( Table 6). This combination of analyses allowed us to classify Bacillus sp. 1839 as Cytobacillus gottheilii.

Discussion
In this study, we have described the genomic and phylogenetic features of the TTXproducing bacterium Bacillus sp. 1839. The phylogenetic analysis allowed to assign this strain to Cytobacillus gottheilii. The strains of this species were not earlier reported in TTXproduction. Due to the complex structure of TTX, its biosynthetic pathway is not even predicted to date. It is assumed that the carbon backbone of TTX may originate through a polyketide, C5 branched sugar, or terpene [10]. The specific guanidinium moiety of the toxin can be obtained from a donor, such as an arginine, via amidinotransferase, similarly to amidino group transfer from l-arginine in saxitoxin biosynthesis [11], or NRPS and PKS systems [10]. The KEGG pathway database used to map the Bacillus sp. 1839 genome assigned 15 CDSs to "Arginine biosynthesis" and 17 CDSs to "Arginine and proline metabolism" indicating the potential ability of the bacterium to involve arginine in TTX production. Terpene and PKS gene clusters were also mined in the genome of the strain using

Discussion
In this study, we have described the genomic and phylogenetic features of the TTXproducing bacterium Bacillus sp. 1839. The phylogenetic analysis allowed to assign this strain to Cytobacillus gottheilii. The strains of this species were not earlier reported in TTX-production. Due to the complex structure of TTX, its biosynthetic pathway is not even predicted to date. It is assumed that the carbon backbone of TTX may originate through a polyketide, C5 branched sugar, or terpene [10]. The specific guanidinium moiety of the toxin can be obtained from a donor, such as an arginine, via amidinotransferase, similarly to amidino group transfer from l-arginine in saxitoxin biosynthesis [11], or NRPS and PKS systems [10]. The KEGG pathway database used to map the Bacillus sp. 1839 genome assigned 15 CDSs to "Arginine biosynthesis" and 17 CDSs to "Arginine and proline metabolism" indicating the potential ability of the bacterium to involve arginine in TTX production. Terpene and PKS gene clusters were also mined in the genome of the strain using antiSMASH. This study is the first to show the genome of a bacterial strain capable of TTX production in the laboratory-a good candidate for the unraveling of the molecular mechanisms of TTX synthesis. Genome sequencing is the first step allowing further experimental work aimed at gene cloning and expression, and reconstruction of the TTX synthetic and metabolic networks. The genome description and taxonomic classification opens the door to the comparative study of mutational patterns, ecological adaptations, and virulence determinants of the TTX producer and its closely related bacterial strains. Moreover, the genome obtained gives the possibility to reveal genes and traits not present in the other representatives of this species or, on the contrary, to find common features indicating their TTX-producing ability. Further investigations with the strain will be focused on the transcriptome studies on different stages of the life cycle of the bacterium.

DNA Extraction
The strain Bacillus sp. 1839 (KF444411-KF444416) was previously isolated from the TTX-bearing nemertean Cephalotrix cf. simula (Ivata, 1952) [5]. For DNA analysis, a strain was obtained from the Collection of Marine Heterotrophic Bacteria, A.V. Zhirmunsky National Scientific Centre of Marine Biology, Far Eastern Branch of the Russian Academy of Sciences. The strain was aerobically cultivated in 2 mL of Youschimizu-Kimura liquid medium [5] at 23 • C overnight. Bacteria were centrifuged at 3000 g for 10 min. The bacterial pellet was suspended in 1 mL of lysis buffer containing 20 мM Tris-HCL (pH 8.0), 2 мM EDTA, 1.2% Triton X-100, and 20 mg/mL lysozyme, and incubated for 30 min at 37 • C. Genomic DNA of the strain was extracted using a GeneJET Genomic DNA Purification Kit #K0721 (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. For plasmid DNA extraction through Illumina sequencing, a GeneJET Plasmid Miniprep Kit # K0502 (Thermo Fisher Scientific, Waltham, MA, USA) was used. The DNA quality was evaluated using 1% agarose gel electrophoresis, an UV5Nano spectrophotometer (Mettler Toledo, Columbus, OH, USA), and a Qubit ® 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). The optical density ratio at 260/280 nm and 260/230 nm was >1.8 and >2.0, respectively. DNA was stored at −20 • C until further processing.

Genome Sequencing
The complete genome of strain Bacillus sp. 1839 was sequenced by the Illumina HiSeq 2500 (Illumina Inc., San Diego, CA, USA) and MinIon (Oxford Nanopore Technologies, Oxford, UK) platforms. Sequencing on the Illumina HiSeq 2500 system was performed at Genoanalytica Company (Moscow, Russia). Genome DNA was fragmented by the Covaris M220 sonicator (Covaris, Woburn, MA, USA). Libraries were constructed using the NEBNext ® Ultra™ II DNA Library Prep Kit for Illumina ® (Illumina Inc., San Diego, CA, USA) for paired-end sequencing, with a target average insert size of 200 bp. For nanopore sequencing, genomic DNA was fragmented by passing through a small gauge needle. The fragmented genomic DNA was used to construct the library with the 1D Genomic DNA by ligation kit SQK-LSK108 following the instructions provided by the manufacturer. Sequencing was carried out on a MinION Mk1B sequencer (Oxford Nanopore Technologies) using an R9.4.1 Flow Cell. The read qualities were examined by FastQC.

Genome Assembly
Low quality reads from the Illumina HiSeq 2500 system were filtered out before de novo genome assembly by a St. Petersburg genome assembler (SPAdes) v. 3.7.1. [12] (kmer = 127), followed by genome finishing using the CONTIGuator tool v. 2.7 3. [13]. The filtered Illumina reads were used to improve the de novo assembly from the MinIon reads. The reads from the MinIon system were assembled by the Staden Package pipeline v. 2.0.0b11 (http://staden.sourceforge.net/ (accessed on 6 June 2021)). The final contigs were circularized by Unicycler v. 0.4.8 [14] and validated by BUSCO [15]. The final assembly was 98.3% complete with 0.4% of the sequence predicted to be missing, as estimated by BUSCO.

Genome Annotation
The complete genome sequence of Bacillus sp. 1839 was annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP). The gene functions were determined against the NCBI UniProt/Swiss-Prot and non-redundant (NR) protein databases, COG, Gene Ontology (GO), and KEGG databases with the E-value cutoff set to 10 −5 and subsequent filtering for the best hit. tRNA and rRNA were identified by tRNAscan-SE v. 2.0 [16] and RNAmmer 1.2 [17], respectively. Circular representation of the genome including noncoding RNAs and gene function annotations was generated using the Circos software (version v.0.69-9) (http://www.circos.ca/ (accessed on 6 June 2021)). IS were predicted and classified with the ISFinder platform [18] against the ISfinder database v. 2.0 (http://www-is.biotoul.fr (accessed on 6 June 2021)). GEIs were detected with the Island-Viewer4 online server using IslandPick, SIGI-HMM, and IslandPath-DIMOB prediction methods with default parameters [19]. CRISPR loci were detected using the CRISPRCas-Finder online server [20]. Prophages in the genome were predicted with PHASTER online server [21,22]. The potential secondary metabolic gene clusters were predicted using antiSMASH v. 5.0 [23].

Phylogenetic Analysis and Genome Similarity Calculations
The 16S rRNA gene sequences of some strains closely related to Bacillus sp. 1839 were obtained by the BLASTN search against the NCBI database. The phylogenetic tree of Bacillus sp. 1839 and closely related species based on the 16S rRNA gene sequences was constructed using the neighbor-joining method [24] in MEGA X [25]. The evolutionary distances were computed using the maximum composite likelihood method [26] with 1000 bootstrap replications [27]. For genome similarity analysis, the genomes of two organisms closely related to the newly sequenced Bacillus sp. 1839 species were retrieved from the NCBI GenBank database ( Table 4). The ANI values between the Bacillus sp. 1839 and closely related species Cytobacillus spp. were calculated using the BLASTALL algorithm (ANIb) and tetranucleotide frequency correlation coefficient (Tetra) with default parameters of the web server JSpecies v 1.2.1 [28]. Pairwise dDDH values were calculated using the Genome-to-Genome Distance Calculator (GGDC 2.1) [29].