The Complete Structure of the Core Oligosaccharide from Edwardsiella tarda EIB 202 Lipopolysaccharide

The chemical structure and genomics of the lipopolysaccharide (LPS) core oligosaccharide of pathogenic Edwardsiella tarda strain EIB 202 were studied for the first time. The complete gene assignment for all LPS core biosynthesis gene functions was acquired. The complete structure of the core oligosaccharide was investigated by 1H and 13C nuclear magnetic resonance (NMR) spectroscopy, electrospray ionization mass spectrometry (ESI-MSn), and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). The following structure of the undecasaccharide was established: [structure not reproduced]. The heterogeneous appearance of the core oligosaccharide structure was due to the partial lack of β-D-Galp and the replacement of α-D-GlcpNAcGly by α-D-GlcpNGly. The glycine location was identified by mass spectrometry.


Introduction
Edwardsiella tarda is a Gram-negative bacterium and a pathogen of farmed fish. It is the etiological agent of a systemic disease called edwardsiellosis, which has been reported to affect a wide range of freshwater and marine fish [1,2]. In addition to fish, E. tarda is also an occasional human pathogen and is known to cause both gastroenteritis and extraintestinal infections in humans [3,4]. A number of virulence-associated systems and factors, such as the type III and type VI secretion systems, the LuxS/AI-2 quorum sensing system, and hemolysin systems, have been identified in E. tarda [5]. Additionally, a sialidase shows potential pathogenicity and immunogenicity [6].
In Gram-negative bacteria, the lipopolysaccharide (LPS) is one of the major structural and immunodominant molecules of the outer membrane. It consists of three moieties: lipid A, core oligosaccharide, and O-specific polysaccharide (O-antigen). The O-antigen is the external component of LPS, and its structure consists of a variable number of repeating units. The O-specific polysaccharide chains are transferred to lipid A-core to form LPS in a step involving WaaL, the putative bifunctional enzyme named O-antigen ligase. Another interesting feature is the high chemical variability shown by the O-antigen LPS, which is accompanied by a similar genetic variation in the genes involved in its biosynthesis, the so-called wb cluster (for a review, see [7]). Despite the emerging importance of this pathogenic microorganism, only four LPS structures of E. tarda strains have been investigated so far [8][9][10][11].
In studies of several Enterobacteriaceae such as Escherichia coli, Salmonella enterica, and Klebsiella pneumoniae, genes involved in LPS core biosynthesis are usually found clustered in a region of the chromosome, the waa gene cluster [12,13]. On the other hand, a careful analysis of several fully sequenced genomes suggested that genes for LPS core biosynthesis may not be clustered and may be distributed between several regions, e.g., as in Yersinia pestis [14] or Proteus mirabilis [15]. In other cases, only a single gene involved in LPS core biosynthesis lies outside the waa gene cluster, for instance in Plesiomonas shigelloides [16]. Nothing is known about the genomics or the LPS core structure of any E. tarda strain, apart from the role played by waaL (O-antigen ligase), which was characterized in strain EIB 202 [17]. E. tarda strain EIB 202 was isolated from moribund fish (Scophthalmus maximus) in a marine culture farm in China [18], and its full genome has been sequenced [19].
Here, the chemical structure of the core oligosaccharide of the pathogenic E. tarda strain EIB 202 is reported for the first time, together with the genomics of its core biosynthesis.

Isolation of the Core Oligosaccharide
LPS of E. tarda EIB 202 was isolated from bacterial mass with a yield of 0.5%. Mild acid hydrolysis of the LPS yielded eight polysaccharide (PS) and oligosaccharide (OS) fractions: PSI-VI, consisting of a core oligosaccharide substituted by several repeating units, and OSVII and OSVIII, the unsubstituted core oligosaccharide fractions. The high yield of PSI-VI suggested that the E. tarda EIB 202 LPS is of the smooth (S-LPS) type. The data presented herein concern the OSVIII fraction. The differences between the OSVII and OSVIII fractions are presented based on MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) and ESI-MSn (electrospray ionization mass spectrometry) analysis.

Structure Analysis of the Core Oligosaccharide Fractions
The chemical analyses of OSVIII showed the presence of 2,3,7-trisubstituted L,D-Hepp, 3,4-disubstituted L,D-Hepp, 7-substituted L,D-Hepp, two terminal D-GlcpN, two terminal D-Glcp, two 4-substituted D-GalpA, and 5-substituted Kdop. The analyses of OSVII showed the presence of the monosaccharides identified for OSVIII and, additionally, two sugar residues: a terminal D-Galp and a 3-substituted D-GlcpNAc, the latter identified in place of a terminal D-GlcpN of OSVIII.
The 1H NMR (nuclear magnetic resonance) spectrum of OSVIII contained main signals for nine anomeric protons and signals characteristic of the deoxy protons of the Kdop residue, belonging to the core oligosaccharide (residues A-J). The HSQC-DEPT (heteronuclear single-quantum correlation-distortionless enhancement by polarization transfer) spectra obtained for the OSVIII fraction contained signals for nine major anomeric protons and carbons and for the Kdo spin system (Figure 1 and Table 1). [...] and C-4 (δC 77.9 ppm) signals, the large vicinal couplings between H-2 and H-3, and the small vicinal couplings between H-3, H-4, and H-5. Residue I (δH/δC 5.47/102.5 ppm, 1J(C-1,H-1) ~174 Hz) was also recognized as the 4-substituted α-D-GalpA residue based on the similar characteristic five-proton spin system. Residues G (δH/δC 5.33/95.6 ppm, 1J(C-1,H-1) ~176 Hz) and J (δH/δC 5.29/97.0 ppm, 1J(C-1,H-1) ~176 Hz) were recognized as terminal α-D-GlcpN due to the large couplings between H-1, H-2, and H-3 and the small vicinal couplings between H-3, H-4, and H-5, as well as the chemical shift value of C-2 (δC 55.1 and δC 55.1 for G and J, respectively). The 1D 31P NMR spectrum showed no indication of phosphate groups in OSVIII.
Additionally, residue K (δH/δC 5.06/99.8 ppm, 1J(C-1,H-1) ~165 Hz) was recognized as terminal α-D-GlcpNAc from the low 13C chemical shift of the C-2 signal (δC 54.6 ppm) and the large vicinal couplings between all ring protons. The N-acetyl group at δH/δC 2.13/23.2 ppm (δC 175.9 ppm) was identified. The presence of heterogeneity in OSVIII was due to partial replacement of α-D-GlcpN [...]. In the HSQC-DEPT spectra of OSVIII, additional negative CH2 signals were detected at δH/δC 3.90/41.8 ppm. These resonances showed a correlation with a carbonyl carbon signal at δC 168.0 ppm in the HMBC (heteronuclear multiple bond correlation) spectra, suggesting the presence of glycine (residue L). The presence of this residue was also confirmed by mass spectrometry.
The monosaccharide sequence in OSVIII was established using NOESY (nuclear Overhauser effect spectroscopy) and HMBC experiments. NOESY spectra showed strong inter-residue cross-peaks between the following transglycosidic protons: [...] (Figure 2B). [...] In Figure 3A, the ion at m/z 162.00 corresponds to GlcN, while the ion at m/z 218.98 was explained by GlcN-Gly. A similar pair of fragment ions at m/z 1116.93 and m/z 1059.94, with a mass difference corresponding to the glycine residue, was also identified. These ions were not identified in the fragmentation spectrum of the ion at m/z 1813.15 (Figure 3B). In Figure 3C, the daughter ion at m/z 260.94 was subsequently attributed to the GlcNAc-Gly fragment. A similar pair of fragment ions at m/z 1911.91 and m/z 1651.79, with the mass difference corresponding to the glycine residue, was also identified. These ions were not identified in the fragmentation spectrum of the ion at m/z 2017.19 (Figure 3D). These observations indicate that glycine substitutes GlcN (residue J) in OSVIII and GlcNAc (residue K) in OSVII. The positions of glycine on these residues in OSVIII and OSVII were not determined.
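The glycine assignment for the Figure 3A ion pairs rests on simple mass arithmetic, which can be checked with a short sketch. The snippet below assumes singly charged ions and uses standard monoisotopic atomic masses; the function names and the 0.1 Da tolerance are illustrative assumptions, not values from this study.

```python
# Standard monoisotopic atomic masses (Da), rounded.
ATOMIC_MASS = {"C": 12.0, "H": 1.0078250, "N": 14.0030740, "O": 15.9949146}


def residue_mass(composition):
    """Monoisotopic mass of a residue given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


# Glycine residue = glycine (C2H5NO2) minus water, i.e. C2H3NO.
GLY_RESIDUE = residue_mass({"C": 2, "H": 3, "N": 1, "O": 1})


def is_glycine_shift(mz_with, mz_without, tol=0.1):
    """True if two singly charged ions differ by one glycine residue.

    tol (Da) is an illustrative tolerance, not a value from the paper.
    """
    return abs((mz_with - mz_without) - GLY_RESIDUE) <= tol


# Ion pairs quoted in the text for Figure 3A.
pairs = [(218.98, 162.00), (1116.93, 1059.94)]
for with_gly, without_gly in pairs:
    delta = with_gly - without_gly
    print(f"{with_gly} - {without_gly} = {delta:.2f} Da, "
          f"glycine shift: {is_glycine_shift(with_gly, without_gly)}")
```

Both Figure 3A pairs differ by about 57 Da, which matches the glycine residue mass (about 57.02 Da) within the assumed tolerance.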

Organization of the E. tarda Strain EIB 202 waa Gene Cluster
In most Enterobacteriaceae studied so far, the genes involved in core LPS biosynthesis were found clustered (waa gene cluster). When we inspected the currently available E. tarda strain EIB 202 genome, we found a clear region containing the waa gene cluster (encoded proteins ETAE_0083 to ETAE_0072). This waa region, as in the majority of Enterobacteriaceae, starts with hldE (encoded protein ETAE_0083), which encodes the ADP-L-glycero-D-mannoheptose-6-epimerase, and is flanked at its end by coaD (encoded protein ETAE_0071), which encodes phosphopantetheine adenylyltransferase [20].
Despite the genome annotation, it seems that most of the genes are shared with other Enterobacteriaceae, mainly K. pneumoniae or P. shigelloides, which were previously characterized by us [7,14]. Table 2 shows the proteins encoded by the E. tarda EIB 202 waa gene cluster.

Discussion
Here, the chemical structure and genomics of the complete undecasaccharide core structure of E. tarda EIB 202 LPS are presented for the first time. This core oligosaccharide is heterogeneous. The heterogeneity corresponds to the partial lack of β-D-Galp and the replacement of α-D-GlcpNAcGly by α-D-GlcpNGly. The functions of the genes found in the waa gene cluster of E. tarda strain EIB 202 seem to be in agreement with the chemical structure of the LPS-core. The E. tarda core LPS structure is highly similar to that of K. pneumoniae, at least up to the outer-core residue GlcNI, and to that of P. shigelloides 302-73 up to practically the last monosaccharide residue, which links the O-antigen LPS [7,14]. WabH and WapB are enzymes that transfer GlcNAc to a GalA in different acceptor substrates of the LPS-core in an α(1→4) linkage. WabG and WapC are enzymes that transfer GalA to a Hep, also in different LPS-core substrates and with different linkages, α(1→3) and α(1→7), respectively. It is important to note that, although the enzymes in each pair perform the same enzymatic function, the differences in acceptor substrate mean that they show very little homology; furthermore, WabG and WapC are more similar to each other (26% identity and 47% similarity) than WabH and WapB are (24% identity and 46% similarity), even though the latter pair forms an identical linkage. This indicates the importance of the substrates in the enzymatic reactions that build up the LPS-core molecules. K. pneumoniae WabK is a glycosyltransferase that incorporates a Glc residue in a β(1→4) linkage to GlcN. Given the established E. tarda strain EIB 202 LPS-core structure and its low but unique homology to WabK, E. tarda waa ORF4 could be the galactosyltransferase that incorporates a Gal residue in a β(1→4) linkage to GlcNAc; E. tarda waa Orf5 encodes ETAE_0079 (Table 2). All genes from the E. tarda waa cluster were found in the E. ictaluri genomes, except for wapB and wapC, which seem to be unique to the species E. tarda.
All of the other genes from the E. tarda waa cluster (hldE, waaF, waaC, ETAE_0079, waaN, waaQ, wabG, wabH, waaA, and wapE) show 98% or more identity to the related E. ictaluri LPS-core biosynthetic genes according to their genomes. As is usual, E. ictaluri WaaL, the O-antigen ligase, a bifunctional enzyme that recognizes both the O-antigen LPS and the LPS-core, shows a reduced identity (56%) compared to E. tarda WaaL. Nevertheless, the E. ictaluri waaL-encoded protein shows the typical transmembrane domains (data not shown).
The E. tarda LPS motif β-Glc-(1→2)-α-L-HepII does not seem to be encoded by any of the glycosyltransferases found in the waa cluster. This LPS motif is identical to one previously studied by us in P. shigelloides strain 302-73, where it is formed by WapG. For this reason, we ran a blastx search with the P. shigelloides 302-73 WapG [16] against the E. tarda strain EIB 202 genome. We found a single clear candidate, the gene encoding ETAE_1955, which showed 58% identity and 74% similarity to WapG. We suggest that it could be responsible for the β-Glc residue linked to HepII [(1→2)-α-L-HepII].
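Identity and similarity percentages such as those quoted above are computed over the columns of a pairwise alignment. The following is a minimal sketch, assuming an already aligned pair of sequences and a simplified grouping of conservative amino acid classes. BLAST itself scores similarity ("positives") with a substitution matrix, so the grouping, the function name, and the toy sequences here are illustrative assumptions only.

```python
# Simplified conservative groups (an assumption for illustration;
# BLAST counts "positives" using a substitution matrix instead).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def identity_and_similarity(aln_a, aln_b):
    """Return (% identity, % similarity) over the alignment length.

    aln_a and aln_b are equal-length aligned sequences with '-' for gaps.
    Identical positions also count as similar.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length only
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aln_a)
    return 100.0 * identical / length, 100.0 * similar / length


# Toy alignment (hypothetical sequences, not WapG or ETAE_1955).
ident, simil = identity_and_similarity("MKV-LDES", "MRVALEQS")
print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Dividing by the full alignment length, gaps included, follows the way BLAST reports identities; dividing by the shorter sequence length would give different percentages.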

Growth Conditions and Isolation of the Lipopolysaccharide and the Polysaccharide
E. tarda EIB 202 was obtained from the Y. Zhang laboratory [18]. The bacteria were grown and harvested as described previously [21]. The LPS was extracted from bacterial cells of E. tarda EIB 202 by the hot phenol/water method [22]. LPS (200 mg) was degraded by treatment with 1.5% acetic acid at 100 °C for 45 min. The supernatant was fractionated on a column (1.6 × 100 cm) of Bio-Gel P-10, equilibrated with 0.05 M pyridine/acetic acid buffer, pH 5.4. Eluates were monitored with a Knauer differential refractometer, and all fractions were checked by NMR spectroscopy and mass spectrometry (MALDI-TOF and ESI-MSn).

Chemical Methods
Methylation of oligosaccharide fractions was performed according to the method described by Ciucanu and Kerek [23]. The absolute configurations of the monosaccharides were determined as described by Gerwig et al. [24]. Alditol acetates and partially methylated alditol acetates were analyzed by gas chromatography-mass spectrometry (GC-MS) with the Thermo Scientific TSQ system using an RX5 fused-silica capillary column (0.2 mm by 30 m) and a temperature program of 150 → 270 °C at 12 °C/min.
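The temperature program above implies a linear ramp whose duration follows directly from the stated values. The sketch below works this out; any initial or final hold times are not given in the text and are therefore left out, and the function names are illustrative.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration (min) of a linear oven ramp from start_c to end_c."""
    return (end_c - start_c) / rate_c_per_min


def oven_temperature(t_min, start_c=150.0, end_c=270.0, rate_c_per_min=12.0):
    """Oven temperature (deg C) t_min minutes into the ramp.

    The temperature is held at end_c once the ramp is complete.
    Hold times before the ramp are not stated in the text and are ignored.
    """
    return min(end_c, start_c + rate_c_per_min * t_min)


print(ramp_minutes(150.0, 270.0, 12.0))  # ramp duration in minutes
print(oven_temperature(5.0))             # temperature halfway through the ramp
```

With the stated values, the ramp from 150 to 270 °C at 12 °C/min takes 10 min.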

Instrumental Methods
All NMR spectra were recorded on a Bruker Avance III 600 MHz spectrometer equipped with a 5 mm QCI cryoprobe with z-gradient. The measurements were performed at 303 K without sample spinning and using the acetone signal (δH/δC 2.225/31.05 ppm) as an internal reference. The signals were assigned by one- and two-dimensional experiments: 1H-1H COSY (correlation spectroscopy), TOCSY (total correlation spectroscopy), NOESY, 1H-13C HSQC-DEPT, HSQC-TOCSY, and HMBC. In the TOCSY experiments, the mixing times were 30, 60, and 100 ms. The NOESY experiment was performed with a mixing time of 200 ms, and the HMBC experiment with a delay of 80 ms. For observation of phosphorus atoms, one-dimensional 31P NMR spectra were recorded. The data were acquired and processed using standard Bruker software. The processed spectra were assigned with the help of the SPARKY program [25].

Comparative Genomics
For each analyzed genome, we gathered all coding sequence (CDS) and pseudo-CDS information by parsing NCBI GenBank records. We then obtained the UniProt Knowledge Base records for these loci using the cross-reference with Entrez GeneIDs and parsed them for gene names, functional annotations, and associated COG, PFAM, and TIGRFAM protein domains. To annotate orthologs, we wrote custom scripts to analyze reference sequence alignments made to subject genomes with blastn and tblastn via NCBI's web application programming interface.
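The CDS-gathering step can be illustrated with a minimal sketch; it is not the custom scripts used in this study. It assumes the standard GenBank flat-file layout (feature key in columns 6-20, location and qualifiers from column 22) and ignores multi-line locations and qualifier continuation lines. The function name, the toy record, and the locus tags DEMO_0001 and DEMO_0002 are hypothetical.

```python
def parse_cds_features(genbank_text):
    """Collect CDS features from the FEATURES table of a GenBank flat file.

    Returns a list of dicts holding the location string, a 'pseudo' flag,
    and any /locus_tag, /gene, or /product qualifiers found.
    """
    features = []
    current = None
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            break
        if not in_features:
            continue
        key = line[5:21].strip()   # feature key, columns 6-20
        rest = line[21:].strip()   # location or qualifier text
        if key:                    # a new feature starts on this line
            current = None
            if key == "CDS":
                current = {"location": rest, "pseudo": False}
                features.append(current)
        elif current is not None and rest.startswith("/"):
            name, _, value = rest[1:].partition("=")
            if name == "pseudo":
                current["pseudo"] = True
            elif name in ("locus_tag", "gene", "product"):
                current[name] = value.strip('"')
    return features


def _feat(key, location):
    """Format a feature line with the key in columns 6-20."""
    return f"     {key:<16}{location}"


def _qual(text):
    """Format a qualifier line starting at column 22."""
    return " " * 21 + text


# Toy record with made-up coordinates and names.
SAMPLE = "\n".join([
    "FEATURES             Location/Qualifiers",
    _feat("gene", "1..1434"),
    _qual('/gene="demA"'),
    _feat("CDS", "1..1434"),
    _qual('/gene="demA"'),
    _qual('/locus_tag="DEMO_0001"'),
    _qual('/product="hypothetical protein"'),
    _feat("CDS", "complement(1500..2100)"),
    _qual('/locus_tag="DEMO_0002"'),
    _qual("/pseudo"),
    "ORIGIN",
])

for cds in parse_cds_features(SAMPLE):
    print(cds.get("locus_tag"), cds["location"], "pseudo" if cds["pseudo"] else "CDS")
```

For real records, an established parser such as Biopython's SeqIO handles wrapped qualifiers and compound locations that this sketch deliberately skips.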