Molecular Modeling of the Shigella flexneri Serogroup 3 and 5 O-Antigens and Conformational Relationships for a Vaccine Containing Serotypes 2a and 3a

The pathogenic bacterium Shigella flexneri is a leading global cause of diarrheal disease. The O-antigen is the primary vaccine target and distinguishes the 30 serotypes reported. Except for serotype 6, all S. flexneri serotypes have a common backbone repeating unit (serotype Y), with variations in substitution creating the various serotypes. A quadrivalent vaccine containing serotypes 2a and 3a (as well as 6 and Shigella sonnei) is proposed to provide broad protection against non-vaccine S. flexneri serotypes through shared epitopes and conformations. Here we model the O-antigen (O-Ag) conformations of serogroups 3 and 5: a continuation of our ongoing systematic study of the S. flexneri O-antigens that began with serogroup 2. Our simulations show that S. flexneri serogroups 2, 3, and 5 all have flexible O-Ags, with substitutions of the backbone altering the chain conformations in different ways. Our analysis suggests three general heuristics for the effects of substitution on the Shigella O-Ag conformations: (1) substitution on rhamnose C reduces the extension of the O-Ag chain; (2) substitution at O-3 of rhamnose A restricts the O-Ags to predominantly helical conformations, (3) substitution at O-3 of rhamnose B has only a slight effect on conformation. The common O-Ag conformations across serotypes identified in this work support the assumption that a quadrivalent vaccine containing serotypes 2a and 3a could provide coverage against S. flexneri serotype 3b and serogroup 5.


Introduction
Diarrheal diseases cause over 1.6 million deaths each year [1], disproportionately affecting low-income regions [2] and young children [3]. Shigella flexneri is a leading cause of enteric infections, with no licensed vaccine currently available [4]. The increasing prevalence of antibiotic resistant strains [5-8] necessitates a broad coverage Shigella vaccine to prevent infection [4,9] and reduce the global disease burden [10,11].
Here we compare simulations of serogroups 3 and 5 with our recent work on the serogroup 2 O-Ags, contrasting the O-Ag behavior for O-acetylated serotype 2a (group O-factor 9) and 3a (group O-factors 6 and 7,8) with the non-vaccine serotype 3b (group O-factor 6) as well as serotypes 5a (type O-factor V) and 5b (group O-factor 7, 8). With this large data set for comparison, we aim to broadly identify guiding heuristics for the conformational effect of substitutions on particular positions of , (e) 3b, (f) 5a, and (g) 5b. All serogroups share the serotype Y backbone and are distinguished by substitutions, which are labelled with the associated O-factors. Vaccine serotypes 2a and 3a are indicated with an asterisk. Schematic diagrams are depicted using the Symbol Nomenclature for Glycans (SNFG) symbol set [35] where green triangle-Rha, blue square-GlcNAc, blue circle-Glc, red triangle-O-acetylation. Some studies have indicated cross-reactivity between serogroups 2, 3, and 5. Serotypes 2a, 3b, and 5a each react with group O-factor 3,4 antisera (associated backbone trisaccharide residues C-D-A) [12] and partial cross-reactivity with these strains is demonstrated from serotype 2a in human testing [36]. Serotypes 2b, 3a, and 5b share glucosylation on O-3 of rhamnose A (group O-factor 7,8) and strong cross-reactivity is reported from a candidate 2a/3a vaccine against 2b and 5b in guinea pigs [37]. Therefore, a vaccine containing serotypes 2a and 3a (expressing group O-factors 6; 7,8 and 9) is suggested to elicit broad cross-protection against the remaining serogroup 2, 3, and 5 serotypes [19].
Here we compare simulations of serogroups 3 and 5 with our recent work on the serogroup 2 O-Ags, contrasting the O-Ag behavior for O-acetylated serotype 2a (group O-factor 9) and 3a (group O-factors 6 and 7,8) with the non-vaccine serotype 3b (group O-factor 6) as well as serotypes 5a (type O-factor V) and 5b (group O-factor 7,8). With this large data set for comparison, we aim to broadly identify guiding heuristics for the conformational effect of substitutions on particular . We assume that O-factors with a significant conformational effect should be represented in the vaccine serotypes to allow for broad coverage. Ultimately, we aim to determine whether the conformational findings from our computational modeling supports the assumption that a quadrivalent vaccine containing S. flexneri serotypes 2a and 3a (as well as 6 and S. sonnei) could provide broad coverage against S. flexneri serotype 3b and serogroup 5.

Materials and Methods
The S. flexneri O-Ags have glycosidic linkages described by two dihedral angles, ϕ and ψ, defined as ϕ = H1-C1-O1-C x ' and ψ = C1-O1-C x '-H x '. These definitions are analogous to ϕ H and ψ H in IUPAC nomenclature and are consistent with our previous carbohydrate modeling [26,38] This work follows our established methodology for the computational study of carbohydrate antigens. Potential of mean force (PMF) calculations for the disaccharide fragments of the O-Ag repeating unit indicate the global minima for each linkage, which are then used to build short 3 RU chains for initial 300 ns MD simulations in solution. The most populated linkage conformations from the 3 RU simulations are then used to construct starting structures for the simulations of the 6 RU chains [39][40][41].

ϕ, ψ PMF Calculations
The low-energy conformations of the glycosidic linkages were determined by calculating the potential of mean force (PMF) for rotation about the ϕ and ψ dihedral angles of each disaccharide linkage. PMFs were calculated with the metadynamics algorithm [42] as implemented in NAMD [43]. The disaccharide PMFs were calculated in the gas-phase with the ϕ, ψ dihedral angles set as collective variables.

Molecular Dynamics
Simulations were run with the NAMD software package [43], employing CUDA extensions to leverage graphics processing units for the calculation of long-range electrostatic potentials and non-bonded forces [44]. Carbohydrates were modelled with the CHARMM36 additive force field for carbohydrates [45,46] and explicit water molecules were represented with the TIP3P water model [47].
Our in-house CarbBuilder software was used to build the carbohydrate structures prior to simulation [48]. Initial 3 RU chains of the serogroup 3 and 5 O-Ags (not discussed here) were built with glycosidic linkage conformations set to the energy minimum of the respective disaccharide PMFs. The RUs of the O-Ag chains modeled in this study are as follows with the serotype-defining moieties in bold: The 3 RU simulations were run for 1000 and 300 ns (for serogroups 3 and 5, respectively), with the most frequent dihedral angles from these simulations used to build the initial conformations for the 6 RU chains. These conformations were then subjected to 10,000 steps of standard NAMD minimization in vacuum and subsequently placed into a cubic water box with the solvate command from the Visual Molecular Dynamics (VMD) package [49]. The cubic water boxes for the 6 RU structures had side lengths of 100 and 90 Å, respectively, for serogroups 3 and 5, and periodic boundary conditions were employed. The solvated structures were gradually heated through a protocol of 5 K incremental temperature reassignments between 10 and 310 K, with 1000 steps of NAMD minimization and 1000 steps of MD after each temperature reassignment. Equations of motion were integrated using the velocity-Verlet method [50] with a 1 fs step size. Molecular dynamics simulations were performed under the isothermal-isobaric (nPT) ensemble at a temperature of 310 K and maintained with a Langevin piston barostat [43] and Nose-Hoover thermostat-a hybridized method of the Nose-Hoover constant pressure method [51] with piston fluctuations controlled by Langevin dynamics [52], as implemented in NAMD. Particle mesh Ewald (PME) summation [53] was used for calculation of long-range electrostatics, with k = 0.20 Å −1 and PME grid dimensions that were set equal to the periodic cell size. Non-bonded interactions were truncated at 15.0 Å and a switching function implemented between 12.0 and 15.0 Å. The 1-4 interactions were not scaled, in accordance with CHARMM force field recommendations.

Block Averaging Analysis
Block averaging analysis is used to assess simulation convergence and is implemented with in-house Python scripts [54]. The block averaging analysis algorithm splits a simulation trajectory with N frames into a set of M "blocks" with a length of n frames, such that N = M × n. Next, an average of a selected measurable (e.g., end-to-end distance) is calculated within each block. The block length (n) is slowly increased and, at each value of n, the set of block averages are recalculated. The standard deviation in the set of block averages, σ n , is used to determine the blocked standard error (BSE) for each value of n. The simulation is indicated to be converged once the running estimate of the BSE asymptotes to a plateau, where the plateau represents the true standard error in the estimate of the mean [55].

Data Analysis
Simulations underwent 200 ns of equilibration followed by production runs of 1 and 2 µs (for serogroups 3 and 5, respectively). Snapshots of molecular conformations were taken at 25 ps intervals from the simulation trajectories. Inter-atomic distances and dihedral angles were measured from VMD's Tcl console and graphical user interface, and statistical calculations were performed with in-house Python scripts. For all saccharides, we defined the end-to-end distance, r, as the length from C-2 of rhamnose B at the non-reducing end of the chain and C-1 of rhamnose C at the reducing end, thus excluding the very flexible terminal sugar units.
Molecular conformations were visualized in VMD [49], with carbohydrate rings highlighted by the PaperChain visualization algorithm [56]. Before conformational clustering, the trajectory snapshots were aligned on the ring atoms of the central 'C-D-A-B' fragment between RU3 and RU4-a frame-shifted full repeating unit to account for the variability in each linkage across the different serotypes. The most common chain conformations are determined by clustering the simulation snapshots into families with relative occupancies. We cluster the central 4 RU of each 6 RU chain, as the terminal repeating units are less representative of the native O-Ag backbones. VMD's internal cluster command was employed to calculate the conformational clusters in the production runs with an RMSD fit of the non-hydrogen atoms in the central 4 RUs with a cut-off of 5.5 Å. Clusters comprising less than 5% of the simulation were excluded. The conformations of the previously published serotype Y and serogroup 2 simulations [26] were recalculated under these criteria for a fair comparison between the serogroups.

Results
We begin our analysis of the simulation data with a broad comparison of the O-Ag chain extension and flexibility of S. flexneri serogroup 2 with serogroups 3 and 5; then we analyze the dominant backbone conformation of each O-Ag; finally, we explore the conformational effects of the glucosylation and O-acetylation on the orientations of the backbone glycosidic linkages.

Simulation Convergence
We used block averaging analysis [54,55] of two metrics of chain flexibility-the end-to-end distance, r, and the radius of gyration, R g -to assess the convergence of the MD simulations. Convergence is indicated by the plots of the blocked standard error (BSE) for both r and R g (shown in Figure S1a,b) reaching a plateau. The asymptote of the BSE plot represents the true standard error in the measured variable, which can be used to approximate a correlation time for the simulation. The range of correlation times from 18 to 104 ns indicate that the 200 ns equilibration period is sufficient for all the O-Ags in this study. Further analysis reveals that the number of statistically independent samples in each simulation is much greater than 1 (49,58,88, and 21 for 3a, 3b, 5a, and 5b, respectively)-as recommended for a converged trajectory [55]. Therefore, block averaging analysis indicates that the longer production runs (1000 ns for serogroup 3 and 2000 ns for serogroup 5) provide sufficient sampling of the conformational space. The simulations of the more flexible serogroup 5 O-Ags were extended to 2 µs to ensure that convergence was achieved.

O-Ag Flexibility
The fluctuation in r over the course of a simulation is a simple measure of molecular extension and flexibility for the S. flexneri O-Ags. Here we define r as the distance between C-2 of rhamnose B in RU1 and C-1 of rhamnose C in RU6 (Figure 2a).  Figure 2h). In particular, the O-factor II glucosylation that defines serogroup 2 has a very dramatic effect on r, reducing the overall extension and flexibility of the chain. In contrast, the O-factor V glucosylation that defines serogroup 5 has a less obvious effect on r, slightly increasing the average chain extension. Finally, O-factor 7,8 (O-3 glucosylation on rhamnose A; serotypes 2b, 3a, and 5b) appears to narrow the distribution of r, making the O-Ag chains more conformationally defined. For example, there is a clear difference in the shape of the r distribution for serotype 3a (Figure 2f) as compared to 3b (Figure 2g): serotype 3a has a unimodal distribution of r (mean 45 Å, σ ≈ 13 Å) whereas in 3b (which lacks O-factor 7,8) r is shifted to smaller values and has a right-skewed distribution with a peak at 25 Å (mean 29 Å, σ ≈ 14 Å). This significant shift in r distribution within serogroup 3 indicates a substantial conformational difference between the 3a and 3b O-Ags, and hence a significant conformational effect for O-factor 7,8. We have previously observed the same effect in serogroup 2 and it can be observed within serogroup 5, which is the least conformationally defined of the four serogroups. Serogroup 5 is the most similar to the Y backbone, but does not show the same clear bimodal distribution. The r distributions for serotype 5b (expressing O-factor 7,8) show a slight shift to more extended conformations (Figure 2i) as compared to 5a (Figure 2h). However, the similarity of the r distributions for 5a and 5b suggest similar conformational behavior for both serotypes.
(substitution at O-3 of rhamnose B, serogroup 5) does not have a significant effect on chain conformation. Our analysis suggests that, at a first approximation, these three heuristics are additive. To test and refine these rules of thumb, as well as investigate the potential for cross-reactivity between serotypes, we now perform a detailed comparison of the chain conformations for all O-Ags. suggests that, at a first approximation, these three heuristics are additive. To test and refine these rules of thumb, as well as investigate the potential for cross-reactivity between serotypes, we now perform a detailed comparison of the chain conformations for all O-Ags.  (Figure 3d), producing a dominant helical conformation with 3 RU per turn and an average pitch of 29 Å (3a-1, 11%). However, 3a remains more flexible than 2b and can adopt a wide range of helical conformations (3a-3, 3a-5), as well as partially extended chains (3a-2), fully extended chains (3a-4), and S-bends (3a-6). Therefore, the serogroup 3 O-Ags follow a similar trend to serogroup 2 [26]: substitution on rhamnose C restricts the chains to curved conformations for both serotypes 2a and 3b (Figure 3b,f) and additional substitution on O-3 of rhamnose A shifts the conformations towards helices for serotypes 2a-3Ac, 2b, and 3a (Figure 3c-e). Further, the axially orientated O-acetyl groups of serogroup 3 are readily accessible for antibody binding in both serogroup 3 O-Ags, which supports the reported immunodominance of O-factor 6 [31].

O-Ag Conformations
A comparison of the primary conformations of serogroup 5 with the backbone shows the effect of O-factor V (glucosylation at O-3 of rhamnose B) on the chain conformation. Relative to the backbone (Figure 3a), the 5a O-Ag shows an increase in the prevalence of elongated helices (Figure 3g): the dominant conformation (5a-1, 25%) is a right-handed helix with 3 RU per turn, in agreement with an early helical model prediction [24]. This extended helical structure (pitch of 30 Å) has significant flexibility-the helix encompasses just 25% of the simulation and frequently unwinds to extended chains (5a-2 and 5a-4) as well as C-curve conformations (5a-3 and 5a-5) that are also present in the backbone (e.g., Y-4 and Y-5). Therefore, O-factor V has only a slight impact on the backbone conformation. However, the glucose side chains (colored cyan in Figure 3g,h) are exposed for antibody binding.

O-Ag Glycosidic Linkage Conformations
As carbohydrate rings have fairly constrained chair conformations, chain flexibility in the S. flexneri O-Ags arises principally from rotations about glycosidic linkages, which are commonly measured via the φ and ψ dihedral angles. Ring substitutions can increase or, more commonly, decrease, the range of motion for a glycosidic linkage. The range of motion for each of the glycosidic linkages in the serogroup 3 and 5 O-Ag RUs over the course of the simulations is shown in Figure 4 (Figure 3h, 5b-1, 24%) to 5a, the more minor C-curve conformations in 5a are replaced with helices in 5b (5b-2, 13%; 5b-4, 7%). The chain also has a unique tight hairpin bend conformation (5a-3, 11%) corresponding to the short r values adopted early in the 5b r time series (Figure 2i).
In summary, conformational analysis suggest refinement of our proposed three broad heuristics for the effects of substitution on the backbone conformation of the Shigella O-Ags, as follows.

O-Ag Glycosidic Linkage Conformations
As carbohydrate rings have fairly constrained chair conformations, chain flexibility in the S. flexneri O-Ags arises principally from rotations about glycosidic linkages, which are commonly measured via the ϕ and ψ dihedral angles. Ring substitutions can increase or, more commonly, decrease, the range of motion for a glycosidic linkage. The range of motion for each of the glycosidic linkages in the serogroup 3 and 5 O-Ag RUs over the course of the simulations is shown in Figure 4 with scatter plot heatmaps of the ϕ, ψ dihedral angle distribution over the course of the simulations. Fragments of the O-Ag backbone showing the relative arrangements of the sidechains, the N-acetyl (blue) and the O-acetyl (red) substituents for each of the serotypes are shown in Figure 5.
For the Y-backbone (Figure 4a), the ϕ dihedral for all linkages is restricted to a narrow range of values around ϕ ≈ 40 • , whereas the ψ dihedral is more flexible with two primary conformations at ψ ≈ 10 • and ψ ≈ −35 • (hereafter referred to as +ψ and −ψ, respectively). The D-A linkage is the most constrained, having the narrowest range for psi, because the close proximity of the N-acetyl group to this β-D-GlcpNAc-(1→2)-α-L-Rhap III linkage restricts ψ rotations. The backbone dihedral angle conformations are consistent with the scatter plots of the ϕ, ψ linkages from short (60 ns) simulations of 3 RU chains [27]. Further, the estimates of key NOE distances (by r 6 averaging) are in good agreement with NMR NOE measurements for the native LPSs of serotype 3a [27] (Supplementary Materials Table S1) and 5a [24] (Supplementary Materials Table S2), providing validation for our MD simulations [57].
Comparison of the heatmaps for substituted O-Ags with the backbone (serotype Y) maps allows a closer identification of the specific effect of substitutions on the O-Ag local chain flexibility and dynamics.
For the first heuristic, we found that substitution at rhamnose C (serogroups 2 and 3) restricts the extension of the O-Ag chain to predominantly curved conformations. Comparison of the ϕ, ψ heatmaps for the Y backbone (Figure 4a In contrast, the serogroup 5 O-Ag B-C linkages remain unconstrained, adopting predominantly +ψ orientations (mean ≈ 7 • ) that result in a greater prevalence of extended structures despite glucosylation on O-3 of rhamnose B (O-factor V). This accounts for the more extended helical conformation in our molecular dynamics simulations as compared to the static model which was built with −ψ B-C orientations [24].    Figure 4, fifth column) is in turn also restricted by the N-acetyl moiety with a single conformation at ϕ, ψ ≈ −52 • , −36 • , which is in agreement with NMR measurements for short serotype 3a fragments that predict a −ψ orientation [31].
O-acetylation at this position (O-factor 9, serotype 2a-3Ac, Figure 4c) has a lesser, but similar, restriction on the range of rotation. Steric clashes between these two groups explain the large conformational restriction produced by glucosylation (serotypes 2b and 3a) as well as O-acetylation (2a-3Ac, Figure 5b) on rhamnose A. The conformational effects of this restraint are to increase the incidence of more regular helical structures in these serotypes.
Finally, for the third heuristic, we found that substitution at O-3 of rhamnose B (O-factor V, serogroup 5) has only a slight effect on conformation, shifting the backbone to somewhat more extended O-Ag conformations. Comparison of the heatmaps for serogroup 5a (Figure 4g) with the backbone shows that glucosylation at this position has little effect on the backbone A-B and B-C linkages: the glucose side chains do not interfere with rotations about the bonds (Figure 5f,g).

Discussion
Our simulations show that S. flexneri serogroups 2, 3, and 5 all have very flexible O-Ags. However, substitutions of the backbone residues limit the range and distribution of chain conformations in different ways. Our analysis has suggested three broad heuristics for the effects of substitution on the backbone conformation of the Shigella O-Ags: (1) substitution on rhamnose C has the greatest impact on restricting the extension and conformational range of the O-Ags; (2) substitution at O-3 of rhamnose A (2a-3Ac, 2b, 3a, 5b) also has a strong impact, restricting the O-Ags to predominantly helical conformations; (3) substitution at O-3 of rhamnose B (serogroup 5) has only a slight effect on conformation.
Can this conformational analysis give some insight into whether a quadrivalent vaccine containing S. flexneri serotypes 2a, 3a (as well as 6 and S. sonnei) could provide broad coverage against S. flexneri serotypes 3b, 5a, and 5b? The factors that lead to cross-protection between O-Ags are not well understood. However, an assumption that similar O-Ag conformations is a necessary (if not sufficient) criterion for cross-protection between O-Ags seems reasonable. However, immunodominant substitutions that change the binding surface (but perhaps not the conformation) may be confounding factors. On a conformational basis, we postulate that the two substitutions that produce the greatest conformational effects should be represented in the vaccine serotypes. Therefore, the vaccine should contain serotypes with substitutions on rhamnose C (O-factors II and 6) as well as rhamnose A (O-factors 9 and 7,8). Inclusion of an additional serotype with substitution at rhamnose B (O-factor V) seems less likely to be necessary. On the basis of this argument, the 2a-3Ac serotype containing both O-factors II and 9 would seem to be sufficient, whereas 2a (only substituted on rhamnose C) would not.
Cross-protection within serogroup 2 (2a-3Ac and 2b) seems likely due to the similar helical conformations of the serotypes within the group, and cross-protection with the helices in serogroup 3 and serogroup 5 may be possible. However, this conformational argument does not consider serogroup 3 s immunodominant O-2-acetylation on rhamnose C (O-factor 6) [31,32], which is a strong basis for including this serotype. Further reasons to include 3a are the prevalence of 3a infection, the lack of expected cross-protection from 2a [58][59][60], and potential cross-protection by 3a against 2b arising from the shared glucosylation of rhamnose A (O-factor 7,8). Although serotype 3b is conformationally more similar to the non-acetylated 2a chain, the minor C-curve and helical conformations of 3b may allow for partial cross-reactivity from a 2a-3Ac vaccine component. Further, serotypes 3a and 3b share minor conformational families, which may be sufficient to elicit cross-reactivity. Furthermore, serogroup 3a may provide cross-reactivity with other disease-causing serogroups: serotypes 1b and 4b express O-factor 6 and serotype X expresses O-factor 7,8.
Finally, on the basis of conformational similarity, we suggest that the inclusion of serogroup 5 is not necessary in the vaccine, as serotype 5a shares similar helical structures with the 2a-3Ac chain. Further, the partial O-acetylation on O-3 of rhamnose A (O-factor 9) for serotype 5a could provide cross-reactivity with an O-acetylated serotype 2a vaccine, although the exposure of the O-factor V glucosylation for antibody binding may be a confounding factor. In future work, we will investigate the O-Ags of the next most prevalent serogroups identified by the GEMS report-serogroups 6 and 1-to allow for further development of our heuristics for the conformations of the S. flexneri O-antigens.
Supplementary Materials: The following are available online at http://www.mdpi.com/2076-393X/8/4/643/s1, Figure S1: Blocked standard error calculations for determining the extent of simulation convergence, Table S1. Comparison of serotype 3a 1H distances to NMR measurements and previous MD study, Table S2: Comparison of serotype 5a 1 H distances to NMR measurements and previous MD study.