A Database of Force-Field Parameters, Dynamics, and Properties of Antimicrobial Compounds

We present an on-line database of all-atom force-field parameters and molecular properties of compounds with antimicrobial activity (mostly antibiotics and some beta-lactamase inhibitors). For each compound, we provide the General Amber Force Field parameters for the major species at physiological pH, together with an analysis of properties of interest as extracted from μs-long molecular dynamics simulations in explicit water solution. The properties include number and population of structural clusters, molecular flexibility, hydrophobic and hydrophilic molecular surfaces, the statistics of intra- and inter-molecular H-bonds, as well as structural and dynamical properties of solvent molecules within first and second solvation shells. In addition, the database contains several key molecular parameters, such as energy of the frontier molecular orbitals, vibrational properties, rotational constants, atomic partial charges and electric dipole moment, computed by Density Functional Theory. The present database (to our knowledge the first extensive one including dynamical properties) is part of a wider project aiming to build-up a database containing structural, physico-chemical and dynamical properties of medicinal compounds using different force-field parameters with increasing level of complexity and reliability. The database is freely accessible at http://www.dsf.unica.it/translocation/db/.

Recently, as part of the activity of the TRANSLOCATION consortium within the Innovative Medicines Initiative antimicrobial resistance programme, New Drugs for Bad Bugs [33], we have undertaken a long-term project with the goal of building a large database of antimicrobial compounds (and medicinal compounds in general) containing, for each molecule, all-atom parameters compatible with different existing biological force fields, as well as microsecond-long dynamics and physico-chemical descriptors in different physiological conditions. Thus, using several established computational tools we have started a systematic investigation on antimicrobial compounds of different classes. Besides the specific application to the study of bacterial resistance mediated by the membrane barrier [34,35], our database contains molecular information possibly useful in different contexts such as, for example, receptor-ligand molecular docking and indirect assessment of role of flexibility in the interpretation of experimental results (see Section 4).
This work constitutes the first step of the aforementioned wider project: we report a homogenous database of the computed molecular properties for a sample of 40 different antimicrobial compounds ranging in size from ∼20 to ∼80 atoms (see Table A1 in the Appendix). For each compound, we provide the General Amber Force Field parameters [36] for the major species at physiological pH, together with an analysis of properties of general interest, including number and population of relevant structural clusters, molecular flexibility, hydrophobic and hydrophilic molecular surfaces, as well as the statistics of intra-and inter-molecular H-bonds, hydration shells structure and dynamics, etc., as extracted from µs-long molecular dynamics simulations in explicit water solution. In addition, the database includes several key molecular parameters, such as energy of the frontier molecular orbitals, normal modes of vibration, rotational constants, atomic partial charges and electric dipole moment, computed by Density Functional Theory. This database is freely accessible on-line at the address http://www.dsf.unica.it/translocation/db/ and is suitable for further inclusion of new data.

Computational Methods
For each compound we obtained the 3D structure data file (SDF format) from the Pubchem database [37]. We then used the ChemAxon's Marvin suite of programs [30] to calculate the dominant tautomer distribution and thus find the protonation/charge state most populated at physiological pH = 7.4. We used the same package to obtain other general properties of interest, such as the net charge dependence on pH, the isoelectric point pI, the Van der Waals volume, the number of rotatable bonds, and the number of H-bond donors/acceptors.

QM Calculations
The structure of the major species determined in the above step has been subsequently used as an input to quantum-chemical calculations at the Density Functional Theory level [38]. For this part of the work we used the Gaussian09 package [32]. In particular, we employed the widely used hybrid B3LYP functional, a combination of exact (Hartree-Fock) exchange with local and gradient-corrected exchange and correlation terms [39,40], in conjunction with the split-valence 6-31G Gaussian basis-set [41]. The combination B3LYP/6-31G represents a good compromise between accuracy and computational cost [42,43]. In all cases considered we disabled the use of molecular symmetry (Symmetry=None), adopted very restrictive convergence criteria for both self-consistent-field iterations (10 −8 Ha, SCF(Conver=8)) and geometry optimizations (Opt(VeryTight)), and used a pruned (99,590) grid (Int=UltraFine) for numerical integration. For each compound we optimized the ground-state structure employing the Polarizable Continuum Model [44] as to mimic the effect of water solvent (SCRF=(PCM,Solvent=Water)) particularly to avoid formation of strong intra-molecular H-bonds. To confirm the geometry obtained to be a global minimum on the potential energy surface we performed full vibrational analyses obtaining real frequencies in all cases. We processed the output of Gaussian09 with GaussSum [45] and Gabedit [46] to extract orbital data, and vibrational spectra, respectively.
On the optimized geometry we then performed B3LYP/6-31G single-point energy calculations in vacuum to generate the atomic partial charges fitting the molecular electrostatic potential. Under the constraint of reproducing the overall electric dipole moment of the molecule, we used both Merz-Kollman [47] (Pop=MK) and CHELPG [48] (Pop=CHelpG) schemes to construct a grid of points around the molecule. Atomic partial charges were then generated through the two-step restrained electrostatic potential (RESP) method [49] implemented in the Antechamber package [50]. Following the same procedure we additionally extracted the standard HF/6-31G* point charges, which are fully compatible with the charge derivation protocol within AMBER [49]. The derivation of consistent charges is hardly a straightforward task for which specifically devoted tools and databases have been developed, such as the R.E.DD.B. database to give an example [27,28]. Finally, for the B3LYP/6-31G optimized geometry, we computed logP values using the XLOGP3 program [51], and polar/non-polar molecular surfaces with the PLATINUM web interface [52].

MD Simulations
We performed all-atom molecular dynamics simulations in the presence of explicit water solution (0.1 M KCl) using the Amber14 package [31]. Model systems were prepared with the program tleap of AmberTools14 [31] adopting the TIP3P model of water [53] and the monovalent ion parameters appropriate for this choice [54]. For the antibiotics we used the General AMBER Force Field (GAFF) parameters [36]. For all systems under investigation we used the following procedure. First, geometry optimization was conducted with a two-step protocol: (i) 10,000 cycles (1000 of steepest descent plus 9000 of conjugate gradient) with harmonic restraint k = 10 kcal · mol −1 · Å −2 on each heavy atom of the solute and (ii) 10,000 conjugate gradient cycles without restraints. Next, an heating to 400 K followed by a cooling to 310 K were accomplished via NVT MD runs for 2 ns and 10 ns, respectively. As a last step preceding the productive dynamics, NPT MD was conducted for 1 ns to relax the simulation box. Finally, 1 µs-long MD simulations were performed under the NPT ensemble. Pressure and temperature were regulated at 1 Atm and 310 K using the isotropic Berendsen barostat [55] and the Langevin thermostat [56], respectively. Electrostatic interactions were evaluated using the particle mesh Ewald scheme with a cutoff of 9.0 Å for the short-range evaluation in direct space. The same cutoff was used for Lennard-Jones interactions (with a continuum model correction for energy and pressure) [31].

Post-Processing of the MD Trajectories
From the all-atom MD simulations we obtained structural and dynamical features of the compounds investigated by means of the PTRAJ and CPPTRAJ programs [57]. In detail, we extracted first and second water shells using a lower (upper) cutoff of 3.4 (5.0) Å. For the analysis of intra-and inter-molecular H-bonds we adopted angle and distance cutoffs of 135 • (donor-hydrogen-acceptor angle) and 3.5 Å (donor-acceptor), respectively [58][59][60][61][62]. The number and population of structural clusters were determined using a hierarchical agglomerative algorithm [63]. To evaluate atomic root mean square fluctuations we used the utility g_rmsf of GROMACS [64]. During the MD runs we also monitored three morphology descriptors related to the gyration tensor, i.e., asphericity, acylindricity, and kappa2, as implemented in the PLUMED plugin [65]. Asphericity and acylindricity give a measure of the deviation of the mass distribution from spherical and cylindrical symmetry, respectively; the relative shape anisotropy kappa2 is limited between 0 and 1 and reflects both symmetry and dimensionality [66]. The dynamical evolution of the minimal projection area has been determined with the combined use of Open Babel [67] and ChemAxon's Calculator Plugin [30]. Molecular graphics have been generated by using the PYMOL [68] and VMD [69] packages. The overall computational protocol adopted is schematically depicted in Figure 1.

General Structure of the Database
The total of 40 species presently included in the database, ranging in size from clavulanic acid (22 atoms, molecular weight = 198.1528 Da) to tigecycline (82 atoms, molecular weight = 586.6566 Da), covers eight different classes of antibiotics and related compounds, namely: carbapenems, cephalosporins, monocyclic beta-lactams, oxazolidinones, penicillines, quinolones, tetracyclines, and beta-lactamase inhibitors. Antimicrobial compounds of these classes are the most widely used against infections caused by Gram-negative bacteria, such as Escherichia coli, Pseudomonas aeruginosa, Salmonella aenterica and Klebsiella pneumoniae. The complete set of compounds included in the current version of the database, together with some of their general properties are listed in Table A1 of the Appendix. Most of the parameters we provide play a key role in determining the translocation of small molecules, such as antibacterials, through bacterial porins [70][71][72], as well as their extrusion by efflux pumps [73][74][75], which are among the key topics of the New Drugs for Bad Bugs programme [33].
The main-page http://www.dsf.unica.it/translocation/db/ contains the full list of compounds ordered, within each class, by increasing molecular weight (see Figure 2A). As shown in the left side of the picture, a direct link to the official web-page of all of the computer packages and tools employed is also given. A separate page can then be accessed for each given molecule ( Figure 2B). The latter includes a 2D representation of the selected molecule as well as the corresponding QM-optimized 3D structure that can be interactively manipulated by activating a JSmol script [76]. The General AMBER Force Field (GAFF) parameters files (.prep and .frcmod formats [36]) generated with the AmberTools package [31] can be downloaded within each compound's page ( Figure 2C). We provide three sets of atomic point-charges computed in vacuum: the standard HF/6-31G(d) (molecule_mk_hf.prep) and the B3LYP/6-31G(d,p) charges fitting the molecular electrostatic potential using the CHELPG (molecule_chelpg.prep) and Merz-Kollman (molecule_mk.prep) schemes (see Section 2). A separate table reports general properties as well as molecular descriptors extracted from both quantum-mechanics and molecular-dynamics simulations. As shown in Figure 3, among general properties we report molecular formula, molecular weight, number of atoms and rotatable bonds, number of H-bond donors/acceptors, physiological charge, Van der Waals volume, isoelectric point pI and pH-dependent net charge of the molecule. For the quantum-optimized structure we provide also logP values [51], and polar/non-polar molecular surfaces [52]. In the quantum-mechanical section ( Figure 4) the ground-state optimized geometry is available in both plain .xyz and MDL .sdf formats. For the above molecular structure we report: energy of the highest-occupied molecular orbital (HOMO) and lowest-unoccupied molecular orbital (LUMO), HOMO-LUMO gap, rotational constants, and electric dipole moment computed in vacuum and implicit water. The whole set of orbital data and the corresponding plots (density of electronic states, virtual and occupied levels) extracted with GaussSum [45] (Figure 4B), as well as the list of harmonic vibrational frequencies and integrated absorption coefficients with the corresponding IR absorption spectrum obtained with gabedit [46] are also provided ( Figure 4C). A graphical representation of the spatial orientation of the electric dipole is additionally given ( Figure 4D). Figure 5 offers a visual representation of some of the molecular descriptors extracted from the MD simulations. The complete set includes: number of solvent molecules within first and second solvation shells, the statistics of intra-and inter-molecular H-bonds, number and population of structural clusters, molecular flexibility expressed in terms of root mean square fluctuations, as well as dynamical behavior of asphericity, acylindricity, and kappa2.

Discussion
Among the antimicrobial compounds included in this release of the database, those for which all of the above data are simultaneously available from experiments is relatively small. Thanks to the ever increasing computational power available, the use of modeling tools represents the best alternative to obtain homogenously-derived physico-chemical descriptor of molecules, and can be furthermore useful as a guide for future experimental work. There are many Internet resources reporting relevant data for a large number of compounds of medicinal interest (PubChem, ChemSpider, DrugBank, Chemicalize, etc.). However, to the best of our knowledge, the present database is the first extensive one reporting structural, physico-chemical, and especially dynamical properties obtained combining different computational tools (in particular µs-long long MD simulations). There are basically three different levels of active use of our data. First of all the obvious stand-alone use of the tabulated static and dynamical properties of selected compounds. Second, the availability of the (at the moment only) GAFF parameters for the major microspecies at physiological pH, makes possible to straightforwardly perform MD simulations with ready-to-use input files. Finally, the knowledge of the dynamical behavior of a large number of molecular descriptors extracted from µs-long MD runs allows performing statistical analyses that go beyond the availability of one static value. This piece of information can find a plethora of possible applications, for example in the field of molecular docking, just to mention one. Most of the available docking programs take into account ligand flexibility only in terms of rotatable bonds of one given input structure (see e.g., rDock [77] and AutoDock Vina [78]) or in terms of small deformations around an equilibrium conformation, derived e.g., by applying normal modes or related analyses [79]. However, as largely demonstrated particularly in the case of protein-protein interactions [80][81][82][83][84][85][86][87], the availability of different conformations sampled by the ligand during MD simulations can be crucial also (perhaps mostly) in protein-ligand docking [73,86,[88][89][90], considerably improving the protocol and making possible to find otherwise unaccessible binding conformations. This is particularly true in the context of conformational selection theory [7], where conformations having low probability in the absence of receptor are populated by the onset of ligand-protein interactions.
The knowledge of the values distribution for specific molecular descriptors can be also useful in the study of structure-dynamics-activity relationships by means of specific molecular descriptors. For instance, the minimal projection area (MPA) has been recently used [91] in the context of bacterial resistance mediated by Resistance-Nodulation Division (RND) multidrug efflux transporters [92][93][94][95][96][97], to identify a correlation between reduced efflux activity by mutants of the transporter AcrB and the size of substrates. Table 1 reports the MPAs computed for the smallest and largest compound within each of the eight classes considered in the present work. The table reports the dynamical values extracted from the MD trajectories and compares them with the static ones publicly available from Chemicalize [98]. As shown in Table 1 the value reported in Chemicalize fall in the Mean ± Stdev range in almost all cases considered; a relevant exception is represented by ertapenem for which Chemicalize predicts the corresponding MPA to be larger than about 26% with respect to our mean value. Taking into account the minimum and maximum values and the corresponding range of variability, the distribution of the computed MPAs displays large deviations (from ∼10% up to ∼40%) around the mean value. Thus, our data can be used to take into account the dynamical nature of interacting partners when studying ligand transport processes, as well as any interaction dynamics between antibiotics and their targets.

Perspectives
In the near future we plan to extend the amount of information present in the database. For instance, in order to guarantee reproducibility, we plan to make available the input and output files of our simulations. Furthermore, the database will be extended towards two main directions. From one side we will include more compounds, covering additional antimicrobial classes. At the same time, for each compound, we will setup parameters at different pHs including at least two more cases (low and high-pH) in addition to the physiological pH already considered. We plan to include in the individual page of each compound the relevant literature reporting on molecular properties from both experimental and computational studies. From the technical point of view we plan to improve the key parameters reported, in particular dihedral angles, by comparison with quantum-mechanical dynamics in presence of implicit and explicit solvent molecules, using biased molecular dynamics techniques [99][100][101][102]. Future refinements of the database will include the possibility to use a better water model to perform classical MD simulations. Finally, future directions include also performing MD simulations in solvents different than water (e.g., non polar solvents and "organic broths") in order to further improve conformational sampling while (possibly) mimicking interactions with functional groups in biomolecules such as proteins, nucleic acids, and membranes.

Acknowledgments
The research leading to these results was conducted as part of the Translocation consortium

Conflicts of Interest
The authors declare no conflict of interest.
Appendix A