Genomic Analysis of Sphingopyxis sp. USTB-05 for Biodegrading Cyanobacterial Hepatotoxins

Sphingopyxis sp. USTB-05, which we previously identified and examined, is a well-known bacterial strain for biodegrading cyanobacterial hepatotoxins, including both nodularins (NODs) and microcystins (MCs). Although pathways for the biodegradation of [D-Asp1] NOD, MC-YR, MC-LR and MC-RR by Sphingopyxis sp. USTB-05 have been suggested, and several biodegradation genes have been successfully cloned and expressed, a comprehensive genomic analysis of Sphingopyxis sp. USTB-05 has not been reported. Here, based on second- and third-generation sequencing technology, we analyzed the whole genome of Sphingopyxis sp. USTB-05, which is 4,679,489 bp and contains 4,312 protein-coding genes. There are 88 protein-coding genes related to the biodegradation of NODs and MCs, of which 16 genes (bioA, hmgL, hypdh, speE, nspC, phy, spuC, murD, glsA, ansA, ocd, crnA, ald, gdhA, murC and murI) are unique. The genes for the transformation of phenylacetic acid CoA (PA-CoA) to CO2 were also found in Sphingopyxis sp. USTB-05. This study expands the understanding of the pathway for complete biodegradation of cyanobacterial hepatotoxins by Sphingopyxis sp. USTB-05.


Introduction
With the rapid development of the world's agricultural and industrial sectors, large amounts of wastewater and domestic sewage containing nitrogen and phosphorus are discharged into water bodies, resulting in increasing eutrophication of natural waters. Record-breaking harmful algal blooms and other severe impacts are becoming increasingly frequent [1]. Cyanobacterial hepatotoxins, including microcystins (MCs) and nodularins (NODs), are derived from cyanobacteria and are highly toxic, posing a risk to humans and aquatic animals. At least 279 variant structures of MCs have been reported [2], and 12 variant structures of NODs have been identified [3]. Microcystin-LR (MC-LR), MC-RR, MC-YR, and [D-Asp1] NOD have been found and studied. The World Health Organization (WHO) recommends that the concentration of MC-LR in drinking water should not exceed 1.0 µg/L [4]. The lethal dose of nodularins reported from poisoning episodes in certain places is around 50 µg/kg [5,6].
Biodegradation is an efficient and environmentally friendly method to eliminate hepatotoxins. Since Sphingomonas sp. ACM-3962 was first reported as an MC-LR-biodegrading bacterium in 1994 [7], a variety of MC-biodegrading bacterial strains from different ecosystems have been found. The majority of strains have been identified as Sphingomonas and Sphingopyxis of the family Sphingomonadaceae [8][9][10][11]. In surface water, Sphingomonas sp. ACM-3962 could biodegrade 1 mg/L MC-LR after a lag period of 2-8 days [12]. However, Sphingomonas sp. ACM-3962 has been shown to biodegrade MC-LR but not [D-Asp1] NOD [12]. Further studies found that at least three enzymatic reactions are involved in MC biodegradation by Sphingomonas sp. ACM-3962 [13], and identified the gene cluster involved in MC biodegradation (mlrD, mlrA, mlrB and mlrC) [12,14,15]. Currently, more than 29 MC-biodegrading strains have been identified in Sphingomonadaceae, including 23 strains containing mlr gene clusters and 6 strains having biodegradation gene clusters other than mlr [16][17][18]. Although studies have shown that MCs have many biodegradation pathways, the majority of them have yet to be thoroughly defined [19][20][21].
Although the bacterial strains, enzymes, and genes for biodegrading both MCs and NODs in Sphingomonadaceae have been well studied, genomic studies of hepatotoxin-biodegrading bacterial strains are rarely reported, and many of the genes encoding enzymes in the hepatotoxin biodegradation pathway have not been clarified. To fully comprehend the biodegradation process, it is essential to identify the corresponding genes and enzymes for biodegrading hepatotoxins through genomic data mining. We analyzed the whole genome of Sphingopyxis sp. USTB-05, which contains 88 genes related to the biodegradation of both NODs and MCs, 16 of which are unique (bioA, hmgL, hypdh, speE, nspC, phy, spuC, murD, glsA, ansA, ocd, crnA, ald, gdhA, murC, and murI). The genes for the transformation of phenylacetic acid CoA (PA-CoA) to CO2 were also found in Sphingopyxis sp. USTB-05.

General Genome Features of Strain USTB-05
The genome of Sphingopyxis sp. USTB-05 (4.679 Mb), with an overall GC content of 64%, accounts for 62.39% of the total encoding sequences (Figure 1 and Table 1). The genome sequence of strain USTB-05 has no CRISPR site and comprises 4,312 predicted protein-encoding sequences. Forty-eight tRNAs and one tmRNA were identified (Table 1). The 16S rRNA gene of strain USTB-05 is present as a single copy in the genome.
Figure 1. From inner to outer ring: circle one shows the position in megabases (black); circles two and three denote GC content and GC skew, respectively; circles four and five indicate forward and reverse strand CDS (purple), tRNA (light purple), and rRNA (light blue), respectively; circle six is the COG analysis of reverse strand CDSs; circle seven is the COG analysis of forward strand CDSs. Abbreviations: L, replication, recombination, and repair; E, amino acid transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; K, transcription; M, cell wall, membrane, envelope biogenesis; S, function unknown; H, coenzyme transport and metabolism; F, nucleotide transport and metabolism; P, inorganic ion transport and metabolism; O, posttranslational modification, protein turnover, chaperones; C, energy production and transformation; T, signal transduction mechanisms; J, translation, ribosomal structure, and biogenesis; I, lipid transport and metabolism; U, intracellular trafficking, secretion, and vesicular transport; D, cell cycle control, cell division, chromosome partitioning; V, defense mechanisms; N, cell motility; G, carbohydrate transport and metabolism; B, chromatin structure and dynamics.
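The GC content and GC skew tracks of a circular genome map are commonly computed per window along the sequence, with GC skew defined as (G - C)/(G + C). The following is a minimal sketch of that calculation, not the procedure used in this study: the function names, the window size, and the step size are hypothetical, and the short test sequence is illustrative only.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A, C, G, T bases of seq (0.0 if there are none)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, GC content, GC skew) for each full window along seq.

    The window and step values are assumptions chosen by the caller;
    a sequence shorter than one window yields a single partial window.
    """
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


if __name__ == "__main__":
    # Toy sequence for illustration only, not genome data.
    for start, gc, skew in sliding_windows("GGCCAATT", window=4, step=4):
        print(start, gc, skew)
```

Applied to a whole chromosome, the per-window values would be what the two inner rings of such a map display; the overall GC content is simply gc_content over the full sequence.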

Gene Ontology Annotation
Gene ontology (GO) is a standardized gene functional classification system that offers a dynamically updated controlled vocabulary. In the GO database, gene functions are categorized as biological processes, cellular components, and molecular functions. The GO analysis indicates that a total of 4,731 GO terms are associated with all unigenes (Figure 2, Supplementary Table S1). According to the secondary classification of the GO terms, all unigenes are sorted into 49 functional groups. Biological process is the main category of GO annotations (2,012, 42.53%), followed by cellular component (1,787, 37.77%) and molecular function (932, 19.70%). Most of the biological process category is represented by cellular process (29.08%) and metabolic process (27.98%), suggesting that the bacterium has strong metabolic activity. Other subcategories include response to stimulus (8.20%), cellular component organization or biogenesis (6.51%), biological regulation (6.46%), regulation of biological process (5.22%), growth (4.77%) and localization (3.28%). Cell (32.51%), cell part (32.51%), membrane (12.53%) and protein-containing complex (7.67%) are the main subcategories of the cellular component category. Catalytic activity (51.50%) and binding (34.23%) represent most of the molecular function category, suggesting that the bacterium has a high degree of molecular catalysis (Figure 2).
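The category shares quoted above follow directly from the three counts, which sum to the 4,731 GO terms. A short arithmetic check is sketched below; the variable names are illustrative and only the counts come from the text.

```python
# GO term counts per main category, as reported in the text.
go_counts = {
    "biological process": 2012,
    "cellular component": 1787,
    "molecular function": 932,
}

# Total number of GO terms across the three categories.
total = sum(go_counts.values())

# Percentage share of each category, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in go_counts.items()}

if __name__ == "__main__":
    print(total)
    for name, pct in shares.items():
        print(f"{name}: {pct:.2f}%")
```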

Cluster of Orthologous Groups Classification
The Cluster of Orthologous Groups (COG) is a database used to classify gene products based on their homology. In total, 62.39% of the unigenes of Sphingopyxis sp. USTB-05 are annotated in the COG database. A total of 4,040 classified unigenes are divided into 25 functional categories. The four most prevalent groups are function unknown (825, 20.42%), amino acid transport and metabolism (352, 8.71%), transcription (309, 7.65%), and lipid transport and metabolism (273, 6.76%). Because the biodegradation of Adda is completed through carbohydrate transport and metabolism (4.85%), this is also a key category to consider. In addition, the biodegradation of cyanobacterial hepatotoxins depends on various enzymes involved in cellular processes and signaling (20.15%). Thus, posttranslational modification, protein turnover, chaperones (4.06%) is also considered an important functional group (Figure 3, Supplementary Table S2).

Discussion
Biodegradation is an efficient way to remove cyanobacterial hepatotoxins from water bodies. 16S rRNA-based phylogeny is used to evaluate the taxonomic placement of bacteria [10]. In 2010, 16S rDNA sequence analysis showed that Sphingopyxis sp. USTB-05 was most similar to the reference strain Sphingopyxis sp. C-1 [22]. However, the evolutionary relationships among MC-biodegrading bacteria of the genus Sphingopyxis have been re-evaluated.
Hashimoto et al. [44] noted that the genes related to the biodegradation of MC-LR comprise many more than the four mlr genes. However, there are few genomic data on strains that biodegrade cyanobacterial hepatotoxins, which limits further research on the biodegradation mechanism. Based on metabolic pathway annotation with the KEGG database, the following genes may be related to the biodegradation of [D-Asp1]NOD, MC-LR, MC-YR, and MC-RR by Sphingopyxis sp. USTB-05. The genes gdhA, ansA, and ald, in the metabolic pathways of glutamate, alanine and aspartate, are related to the biodegradation of D-glutamate, D-alanine and D-erythro-β-methylaspartate. The genes involved in the biodegradation of L-arginine are speE, crnA, hypdh, nspC, ocd, spuC, and phy, in the metabolic pathway of proline and arginine. murC, murD, and murI take part in the biodegradation of D-glutamate. glsA, bioA and hmgL participate in the biodegradation of D-isoleucine (Table 2).
Adda is the detoxification end-product produced by the final enzymatic reaction. Sphingopyxis sp. USTB-05 was predicted to possess a full set of genes involved in phenylacetate biodegradation, in addition to the gene encoding AMP-forming phenylacetyl-CoA ligase (PA-CoA ligase) (Figure 4). Recently, genes and transposable elements associated with phenylacetate biodegradation have been identified near the mlr gene cluster [36,45]. Previous reports [36,37] identified a set of genes involved in phenylacetate biodegradation in Sphingopyxis sp. C-1. However, the gene encoding phenylacetyl-CoA ligase was absent. Sphingopyxis sp. YF1, which relies on the mlr-dependent biodegradation pathway, can also biodegrade Adda by the phenylacetic acid metabolism pathway [46].

Conclusions
The whole genome of Sphingopyxis sp. USTB-05 consists of a circular chromosome of 4,679,489 bp with 4,312 protein-coding genes, including 88 genes related to the biodegradation of both NODs and MCs, of which 16 genes (bioA, hmgL, hypdh, speE, nspC, phy, spuC, murD, glsA, ansA, ocd, crnA, ald, gdhA, murC, and murI) are unique. The genes for the transformation of phenylacetic acid CoA (PA-CoA) to CO2 were also found in Sphingopyxis sp. USTB-05. This study expands the understanding of the pathway for complete biodegradation of cyanobacterial hepatotoxins by Sphingopyxis sp. USTB-05.

Bacterial Strains
Sphingopyxis sp. USTB-05 was isolated and identified from the sediment of Dianchi Lake in Kunming, Yunnan, China [22].

DNA Extraction and Sequencing
Sphingopyxis sp. USTB-05 was initially incubated on the original solid isolation medium at 30 °C for 48 h. A single colony was selected and cultivated in the culture medium described previously [22]. The genomic DNA was extracted using the Rapid Bacterial Genomic DNA Isolation Kit (CoWin Biosciences, Taizhou, Jiangsu, China) according to the manufacturer's instructions. NanoDrop (Thermo Fisher Scientific, Waltham, MA, USA) analysis and gel electrophoresis were used to determine the purity and concentration of the DNA samples. A small-fragment second-generation genomic library with an insert size of 350 bp was constructed using the NEBNext® Ultra™ II DNA kit. The genome was sequenced using the Illumina X10 platform (Madison, WI, USA) [47]. The third-generation genomic library was constructed following the standard protocol of Oxford Nanopore Technologies (ONT, Oxford, UK).

Genome Assembly and Quality Control
Prior to genome assembly, the next-generation sequencing reads were quality-filtered with fastp software v0.23.2. Sequences with a quality value of Q < 25 and those containing linker fragments were removed. The initial FASTQ-formatted data for nanopore sequencing were obtained from the FAST5 files using the MinKNOW software v4.0.4 package. For genome assembly, a total of 11 Mb of Nanopore long reads with an N50 length of 8 kb were produced (Figure S1). SPAdes software (combined with its development process) was used for hybrid assembly, while Pilon software v1.5 was used to correct the assembly results.
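The N50 length quoted for the long reads is the length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal sketch of that definition follows; the function name and the example read lengths are hypothetical and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    Lengths are sorted from longest to shortest and summed until the
    running total reaches at least half of all bases; the length at
    which that happens is the N50.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50 requires at least one length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Made-up read lengths in bp, for illustration only.
    example_reads = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(example_reads))
```

In practice the lengths would be read from the FASTQ records of the nanopore run rather than typed in by hand.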

Genome Annotation
The online NMPDR-RAST server was used to predict the genes and coding sequences (CDSs). All unigenes were functionally annotated using the Pfam and Swiss-Prot databases. Circos, a visualization tool, was used to display variation in the genome's structure. The eggNOG annotation of protein-coding genes was completed with blast software v2.9.0 [48]. The GO annotation of protein-coding genes was performed using the Pfam and Swiss-Prot databases. KOBAS 3.0 software was used to assign KO pathway annotations to protein-coding genes. In addition, the CRISPRFinder software v4.2.19 was used to predict the clustered regularly interspaced short palindromic repeats (CRISPR) structure of the Sphingopyxis sp. USTB-05 genome [49]. The coding sequences of the genome were aligned using MUMmer version 4.0+ and analyzed in combination with the results of the genome annotation [50].

Nucleotide Sequence Accession Number
The sequence data were submitted to the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/ (accessed on 6 March 2022)) with accession numbers CP084712, CP084930-CP084933. The sequence data will be released on 31 October 2023.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxins14050333/s1. Table S1: Gene classification results: Gene Ontology (GO) classifications of genes. It includes the hierarchy of GO and the gene number in each GO term; Table S2: Gene classification results: Clusters of Orthologous Group (COG) classifications of genes. It shows the COG class, abbreviation, gene numbers and IDs in each COG term; Figure S1: Length distribution of the third-generation nanopore sequencing data; Figure S2: Comparison of genes related to hepatotoxin biodegradation in the genomes of Sphingomonas morindae sp. NBD5 and Sphingopyxis sp. USTB-05.

Conflicts of Interest:
There are no conflicts of interest to declare.