Extracting Conformational Ensembles of Small Molecules from Molecular Dynamics Simulations: Ampicillin as a Test Case

The accurate and exhaustive description of the conformational ensemble sampled by small molecules in solution, possibly at different physiological conditions, is of primary interest in many fields of medicinal chemistry and computational biology. Recently, we have built an on-line database of compounds with antimicrobial properties, where we provide all-atom force-field parameters and a set of molecular properties, including representative structures extracted from cluster analysis over µs-long molecular dynamics (MD) trajectories. In the present work, we used a medium-sized antibiotic from our sample, namely ampicillin, to assess the quality of the conformational ensemble. To this aim, we compared the conformational landscape extracted from previous unbiased MD simulations to those obtained by means of Replica Exchange MD (REMD) and those originating from three freely-available conformer generation tools widely adopted in computer-aided drug-design. In addition, for different charge/protonation states of ampicillin, we made available force-field parameters and static/dynamic properties derived from both Density Functional Theory and MD calculations. For the specific system investigated here, we found that: (i) the conformational statistics extracted from plain MD simulations is consistent with that obtained from REMD simulations; (ii) overall, our MD-based approach performs slightly better than any of the conformer generator tools if one takes into account both the diversity of the generated conformational set and the ability to reproduce experimentally-determined structures.

Recently, in the framework of the TRANSLOCATION consortium [26], we have started building a large database of antimicrobial compounds containing all-atom FF parameters, as well as physico-chemical descriptors derived from both quantum-mechanics and µs-long MD simulations [27].Our database, freely accessible online, is to our knowledge the first extensive one including dynamical properties for a large set of compounds.Besides the specific application to the translocation of antibiotics through bacterial porins [28,29] and to their extrusion by efflux pumps [30,31], this piece of information can be useful for protein-ligand molecular docking [32], whose success rate is known to be strongly dependent on the generation of accurate input geometries and on the description of the flexibility of both partners involved in the binding process [33].While receptor flexibility is nowadays routinely considered using different techniques [34][35][36], in the vast majority of docking campaigns, ligand flexibility is generally taken into account by considering the rotatable bonds of one given input structure, sometimes generating the 3D structure from 2D drawings or directly from SMILES strings [33].In addition, stable protomers of ligands are usually generated only at physiological pH 7.4.However, catalytic sites might feature micro-conditions that differ from those found at physiological pH, thus affecting the most likely charge/protonation state of the ligand [33].An alternative and less exploited way of generating ligand conformations is by means of MD simulations in explicit solvent.Though this methodology is computationally demanding, it has theoretically the advantage of propagating all of the conformational degrees of freedom (not only dihedrals) included in the FF function in the presence of explicit solvent and ions.Furthermore, a plethora of methods exist to accelerate the conformational sampling (e.g., metadynamics [37], accelerated MD [38] and Replica Exchange MD [39]).Therefore, MD-based methods can be useful to improve the reliability and predictive power of molecular docking.
Following the considerations above, the aim of the present work is two-fold: (i) to assess the conformational ensemble extracted from our µs-long MD simulations in terms of both the diversity of the generated conformational set and the ability to reproduce experimentally-determined structures; (ii) to provide parameters of ligands at different pHs, thus making available static and dynamic properties of a given compound as a function of different charge/protonation states.To these aims, we selected a medium-sized molecule in our sample, namely ampicillin (43 atoms, molecular formula C 16 H 19 N 3 O 4 S, molecular weight 349.40476Da; see Figure 1), a broad-spectrum β-lactam penicillin antibiotic used extensively to treat bacterial infections for more than 50 years [40].In the following, we will simply refer to the molecule using the corresponding Protein Data Bank chemical component identifier AMP.Thus, considering the AMP neutral zwitterionic form, the most populated one at physiological pH 7.4, we first compared our conformational landscape extracted from plain MD simulations [27] with those obtained by means of: (i) Replica Exchange MD (REMD) [39]; and (ii) some freely-available conformer generation tools widely used in computer-aided drug-design [41].We additionally generated the General AMBER Force Field (GAFF) parameters [42] for each major tautomer in the pH range 2-14 and performed µs-long MD simulations for the species bearing net charges −2, −1 and +1.The parameters and the molecular descriptors extracted from both quantum-mechanics and MD simulations are available online [27].Our results indicate that the conformational statistics extracted from plain MD simulations is: (i) consistent with that obtained from REMD simulations; and (ii) performs slightly better than some widely-used conformer generation tools when considering both the abilities to generate high-diversity conformational ensembles and to reproduce experimentally the available structures.

Molecular Characterization and Quantum-Chemistry Calculations
Starting from the structure data file (CID_6249.sdf)downloaded from PubChem [43], we used the package MARVIN [44] to compute the net charge dependence on pH, the isoelectric point pI and the microspecies distribution in the pH range 2-14.In particular, the microspecies distribution curves as a function of pH have been obtained using the pKa plugin [45] implemented in MARVIN [44].This plugin calculates the pKa values of all proton gaining/loosing atoms on the basis of the partial charge distribution computed empirically using the MARVIN charge plugin [44,45].We thus focused on the species bearing net charges −2, −1, 0 and +1, which are the major tautomers in the pH range 2-14.The same program has been used to obtain molecular formula, molecular weight, number of H-bond donors/acceptors, number of rotatable bonds and van der Waals volume.The 3D structure of each microspecies has been subsequently used to perform density functional theory (DFT) calculations [46] with the GAUSSIAN09 package [47].As already done in our systematic investigation [27], we employed the hybrid B3LYP functional [48,49] and the 6-31G basis-set [50].The combination B3LYP/6-31G is a good compromise between accuracy and computational cost [51,52].All GAUSSIAN09 calculations were performed using the same settings adopted in our previous study [27].For each major tautomer in the pH range 2-14, we used the DFT optimized geometry to compute logP values with XLOGP3 [53] and polar/non-polar molecular surfaces through the PLATINUM web interface [54].We then generated three sets of atomic partial charges using the RESP method [55] implemented in the ANTECHAMBER package [56]: the standard Hartree-FockF/6-31G , and the B3LYP/6-31G charges fitting the molecular electrostatic potential using both CHELPG [57] and Merz-Kollman (MK) [58] schemes.We report in Appendix A a comparison between some of the molecular descriptors extracted from MD trajectories obtained with CHELPG-B3LYP/6-31G and MK-Hartree-Fock/6-31G*.

MD Simulations and Post-Processing of the MD Trajectories
For each major microspecies of AMP with total charge −2, −1 and +1, we performed all-atom MD simulations in the presence of explicit water solution (0.1M KCl) using the AMBER14 package [59].Model systems were prepared with the program TLEAP of AMBERTOOLS14 [59] adopting GAFF parameters [42] for the molecule and the TIP3P model of water [60].For all microspecies, we used the same protocol adopted in our previous systematic investigation [27].After production runs, we obtained structural and dynamical properties of the molecules by using the CPPTRAJ program [61].During the MD runs, we monitored minimum and maximum projection areas using the MARVIN geometry plugin [44,45] and three morphologic descriptors related to the gyration tensor, i.e., asphericity, acylindricity and kappa2, as implemented in the PLUMED plugin [62].Asphericity and acylindricity give a measure of the deviation of the mass distribution from spherical and cylindrical symmetry, respectively; the relative shape anisotropy kappa2 is limited between zero and one and reflects both symmetry and dimensionality [63].Again, the full list of molecular descriptors extracted from MD simulations and the numerical settings adopted can be found in [27].

REMD Calculations and Conformer Generation Methods Adopted
REMD simulations were performed for the zwitterionic form of AMP using the AMBER14 package [59].We adopt a set of 72 replicas in the temperature range from 275-600 K.Each replica was simulated for 50 ns, for a total simulation time of 3.6 µs.The number of exchange attempts between replica pairs was 50,000, and temperature exchanges between replicas were attempted with a frequency of 1 ps −1 .We achieved a very uniform rate of exchanges among replicas of 0.33%.
Following the recent extensive comparison between different conformer generation tools [41], we selected three methods having the option of generating a fixed user-specified number of conformers: FROG2 [64], OPENBABEL [65] and RDKIT [66].More precisely,

•
FROG2: We loaded the 3D structure data file CID_6249.sdftaken from PubChem [43] in the web portal and adopted default settings.The server returned an ensemble of diverse conformers generated using a two-stage Monte-Carlo approach in the dihedral space.• OPENBABEL: We used the same structure CID_6249.sdfas an input to the genetic algorithm code implemented in the program.This is a stochastic conformer generator producing a population of conformers that arrive iteratively at an optimal solution in terms of root-mean-squared diversity (we used a cutoff of 2.0 Å), after a series of generations.

•
RDKIT: We used the python script provided in the user manual by loading the CID_6249 structure in .mol2format.The script generates the desired number of conformers and, for each of them, performs energy minimizations with the Universal Force Field [67].
For the purpose of comparing the features of the conformational ensemble extracted from our MD simulations (plain and replica-exchange) to the structures generated by conformer generators (vide infra), we performed a symmetric root mean square displacement-based cluster analysis using the hierarchical agglomerative algorithm [68] with a fixed number (30) of representatives.All figures have been produced by using PYMOL [69], VMD [70] and GNUPLOT [71] graphics programs.

Comparison among Different Conformational Ensembles
We first assessed the ability of MD simulations: (i) to generate high-diversity conformational ensembles; and (ii) to reproduce the experimentally-available structure of zwitterionic AMP, the one in the co-crystal with OmpF, one of the main general diffusion porins of Escherichia coli [72].First, we compared the structural clusters representatives of the whole µs-long trajectory with those obtained from a 50 ns-long REMD simulation with 72 replicas in the temperature range 275-600 K. Figure 2 compares the pair-wise root mean square displacement (RMSD) matrix for 30 configurations (all generated vs. all generated) extracted with the two simulations; the picture reports also the corresponding histogram distributions, whose maximum values (mean ± SD) are found to be 5.8 Å (3.2 ± 1.1) and 5.4 Å (3.1 ± 1.0), respectively.As demonstrated by these numbers and clearly seen from a visual inspection of Figure 2, the conformational statistics extracted from the plain µs-long MD simulation at T = 310 K is consistent with the 50 ns-long REMD simulations spanning 72 temperatures in the range 275-600 K.Note that we arrive at the same conclusion by comparing the pair-wise RMSD matrix for 50 and 100 conformers generated with both plain MD and REMD simulations.
As an additional test, we compared the variability of the conformational ensemble extracted from the plain µs-long MD simulation with that obtained using some freely-available conformer generation tools widely used in protein-ligand docking and, more in general, in computer-aided drug-design studies.Following the recent extensive comparison between different conformer generation tools [41], we selected FROG2 [64], OPENBABEL [65] and RDKIT [66].Similar to Figure 2, Figure 3 displays the pair-wise RMSD matrices and the corresponding histogram distributions for 30 AMP configurations generated with the three methods.While the RMSDs between the FROG2 conformations appear to follow a bi-modal distribution peaking at ∼1.5 and ∼3.0 Å, OPENBABEL and RDKIT values follow unimodal distributions centered at 3.7 ± 1.2 and 3.9 ± 1.3 Å, respectively.By comparing Figures 2 and 3, it is clear that our data for AMP, in terms of the diversity of the generated conformers: (i) appear to perform better than FROG2; (ii) are of similar quality as those of OPENBABEL; and (iii) cover a smaller conformational space with respect to RDKIT.Again, please note that similar conclusions can be draw by repeating the same comparison for 50 and 100 conformers generated with the three methods.
In order to quantify the ability to generate conformers that are structurally similar to the experimentally-determined structure of AMP [72], we compared the RMSD between the experimental configuration and each of the 30 conformers extracted from our plain MD simulation or generated with the different conformer generator tools considered.The resulting comparison is reported in Figure 4. We found that RDKIT yields structures differing, on average, by about 4-5 Å in terms of RMSD from the experimental structure.By looking in detail at the generated structures, we found that 17 over 30 conformers generated with RDKIT present a flipped conformation with respect to the experimental structure.In particular, the dihedral angle between the planes of the four-membered β-lactam ring and the five-membered ring is found to be ∼240 • , while the experimental value is about ∼120 • .This can be seen in the green box of Figure 4 and partly explains the larger scatter observed on average in the case of RDKIT.On the contrary, FROG2, OPENBABEL and our own MD results appear to oscillate around a lower mean value of about 3.0 Å, and interestingly, one of the clusters in these three ensembles is found to be very close to the experimental structure with a minimum RMSD of 0.9 Å, 1.7 Å, and 1.0 Å, respectively.For the four cases considered, the visual comparison between the lowest RMSD structure and the experimental one is reported in the same Figure 4.In summary, the results presented in Figures 2-4 for the specific system investigated here show that: (i) the conformational statistics extracted from plain MD simulations is consistent with that obtained from REMD simulations and comparable to those generated by the conformer generator tools considered; (ii) our MD-based approach performs slightly better than any of the conformer generator tools if one takes into account both the diversity of the generated conformational set and the ability to reproduce experimentally-determined structures.Clearly, we are aware that the µs-long MD simulations are not computationally inexpensive, even for small compounds.However, the main reason why we performed here 1 µs-long MD simulations is the consistency with our previous work concerning the creation of an online database of antimicrobial compounds differing in size and flexibility [27].Thus, such long simulations could not be needed in all cases, at least for compounds with a limited number of rotatable bonds.To show that this is the case for AMP, we addressed the convergence of the conformational diversity and of a few molecular properties as a function of the total simulation time.As shown in Appendix B, for the specific case of AMP, we could have reduced the computational cost grossly by one order of magnitude.

Force-Field Parameters and Molecular Properties of Different Microspecies
As shown in Figure 5A, in the pH range 2-14 considered in this work, AMP has two proton-donating atoms, one oxygen of the external carboxylic group (pKa = 3.24) and one internal nitrogen (pKa = 11.97), and one proton accepting atom, namely the nitrogen atom (pKa = 7.44) opposite the β-lactam group.As a result, the molecule can exist in four different charge/protonation states, as shown in the pH-dependent net charge distribution (Figure 5B) and the fractional microspecies distribution (Figure 2C): cationic at pH ≤ 3.0, zwitterionic between 3.0 and 7.5, anionic between 7.5 and 12.0 and dianionic for pH ≥12.0.For each of the above microspecies, we followed the same general protocol adopted to build the on-line database [27], not taking into account possible chemical changes induced by pH, such as opening of the β-lactam ring [73].In this work, we added the individual pages of each AMP microspecies displayed in Figure 5 in our online database [27].
In particular, the DFT optimized geometry (in both .xyzand .sdfformats) and the GAFF parameters files (.prep and .frcmodformats [42]) can be freely downloaded for each charge/protonation state.As previously done for the full set of compounds, we provide three sets of atomic partial charges as detailed in Section 2. The availability of the FF parameters for the major microspecies of the same compound makes it possible to perform straightforwardly MD simulations with ready-to-use input files.In the above web page, for each microspecies, separate tables report general-purpose properties and molecular descriptors derived from both DFT and MD simulations.The comparison for some of the different properties listed for the different microspecies is reported in Figures 6-8, displaying polar/non-polar molecular surfaces [54], the magnitude and spatial orientation of electric dipoles and the dynamical behavior of the spherical shape anisotropy kappa2 [62], respectively.Comparison between the polar/non-polar molecular surfaces (red-white-blue color scale) as evaluated with the PLATINUM web server [54] for the AMP microspecies considered (see Figure 5).The polar (non-polar) fractions are reported in red (blue).

Conclusions
In this work, we presented a follow-up of our long-term project of building a multi-purpose database of force-field parameters, dynamics and molecular descriptors of compounds with antimicrobial activity.We selected a medium-sized antibiotic, ampicillin, and assessed the quality of the conformational ensemble extracted from µs-long MD simulations.For the different charge/protonation states of ampicillin in the pH range 2-14, we additionally made available the GAFF parameters, as well as some general properties and molecular descriptors derived from both DFT and MD simulations.For the specific case considered in this work, we found that the finite ensemble best reproducing the whole MD trajectory is fully consistent with REMD simulations performed with 72 replicas distributed in the range 275-600 K and comparable to those generated by some widely-used conformer generation methods.In addition, by taking into account both the diversity of the generated conformational set and ability to reproduce available experimental structures, our MD-based approach is found to perform slightly better than any of the conformer generator tools considered.Atomic partial charges are among the primary FF parameters determining the dynamical behavior of a small organic molecule.All-atom MD simulations performed for the compounds included in our online database used GAFF parameters [42] for the major tautomer at physiological pH = 7.4, with CHELPG atomic charges computed at the B3LYP/6-31G level (see Section 2).To assess this specific choice, here, we compared some of the dynamical properties extracted from MD trajectories obtained to the above charges (in the following denoted as chgs-1), with the same descriptors extracted from MD simulations using the standard MK partial charges derived from Hartree-Fock/6-31G calculations (labeled as chgs-2).Figure 6 compares the RESP point charges computed with the two schemes for zwitterionic ampicillin, the most populated microspecies at physiological pH = 7.4 (see Figure 2).
We found some quantitative differences among the values of the two sets of charges.Overall, the MK-Hartree-Fock/6-31G* point charges turn out to be larger than the corresponding CHELPG-B3LYP/6-31G , something that is well known from previous investigations [55].Nonetheless, the charges are qualitatively similar, as seen by comparing the corresponding color patterns (see Figure A1).Moreover, there is a negligible impact of these differences on zwitterionic AMP dynamics as sampled along µs-long MD simulations using both sets of charges.In particular, as shown in Figure A2, we found coincident within statistical deviations all of the molecular descriptors extracted from the MD trajectories, namely the number of solvent molecules within the first and second shells, the molecular flexibility expressed in terms of root mean square fluctuations and some morphological descriptors as a function of time: asphericity, acylindricity, kappa2 and minimal projection area.

B. Assessment of the Convergence of Structural Properties vs. Simulation Time
To validate the importance of an extensive sampling, we addressed the convergence of the conformational diversity as a function of the total simulation time.Figure B1 displays the comparison between the histogram distribution of the pair-wise RMSD for the 30 conformers extracted from plain MD simulations at 10, 100, 500 and 1000 ns.We found convergent trends, as shown by a visual inspection of the histograms and by comparing the corresponding statistical parameters.This is further confirmed by Figure B2, which compares the spatial distribution of the 30 conformers corresponding to 10, 100 and 1000 ns-long MD runs.As shown by the figure, clusters corresponding to longer simulations are able to cover a larger portion of the configurational space.
We performed a similar convergence check by looking at the evolution of some molecular properties extracted from MD runs at increasing simulation times.Similar to Figure B1

Figure 1 .
Figure 1.2D and 3D structures of zwitterionic ampicillin (oxygen, nitrogen, carbon, sulfur and hydrogens atoms are marked in red, blue, gray, yellow and white, respectively).

Figure 2 .
Figure 2. Pair-wise RMSD matrix for 30 conformers extracted from a plain µs-long MD simulation (left) and a 50 ns-long REMD simulation with 72 replicas in the range 275-600 K (right; see Section 2).The color scale reflects the RMSD values expressed in Å, as shown in the right box.For each case, the bottom panel shows the corresponding histogram distribution.

Figure 4 .
Figure 4. RMSD between the experimental configuration of AMP[72] and each of the 30 conformers extracted from a µs-long MD simulation (red) or generated with FROG2 (blue), OPENBABEL (orange), and RDKIT (green).The boxes, with the same color-codes, show the comparison between the lowest-RMSD structure and the experimental one (blue).

Figure 6 .
Figure 6.Comparison between the polar/non-polar molecular surfaces (red-white-blue color scale) as evaluated with the PLATINUM web server[54] for the AMP microspecies considered (see Figure5).The polar (non-polar) fractions are reported in red (blue).

Figure 7 .
Figure 7.Comparison between the electric dipoles of the AMP microspecies considered.For each charge/protonation state, we report the absolute values (expressed in debyes); to ease visual comparison, the dipole's moduli have been arbitrarily normalized to the same quantity.Hydrogen atoms have been omitted for clarity.

Figure 8 .
Figure 8.Comparison between the relative shape anisotropy kappa2 of the four AMP microspecies considered.This quantity, related to the gyration tensor, has been monitored during the MD runs using PLUMED[62].

Figure A2 .
Figure A2.Comparison between dynamical properties of zwitterionic AMP extracted from MD trajectories obtained using chgs-1 (left) and chgs-2 (right) partial charges.From top to bottom: number of water molecules in the first (red) and second (blue) shell, root mean square fluctuations, asphericity, acylindricity, kappa2 and minimal projection area.
, Figures B3 and B4 compare the distributions of minimal and maximal projection areas, respectively, showing again a convergent trend in both cases.

Figure B1 .
Figure B1.Comparison between the histogram distribution of the pair-wise RMSD for the 30 conformers extracted from plain MD simulations at 10, 100, 500 and 1000 ns.

Figure B2 .
Figure B2.Side (left) and front (right) views of the spatial distribution of the 30 conformers corresponding to 10, 100 and 1000 ns-long MD runs (blue, red, green, respectively).All of the structures have been aligned with respect to the penam group (formed by the four-membered β-lactam ring fused to the five-membered ring).

Figure B3 .
Figure B3.Same as Figure B1 for the minimal projection area.

Figure B4 .
Figure B4.Same as Figure B1 for the maximal projection area.