Comparison of Empirical Zn 2 + Models in Protein–DNA Complexes

: Zinc ions are the second most abundant ions found in humans. Their role in proteins can be merely structural but also catalytic, owing to their transition metal character. Modelling their geometric–coordination versatility by empirical force fields is, thus, a challenging task. In this work, we evaluated three popular models, specifically designed to represent zinc ions with regard to their capability of preserving structural integrity. To this end, we performed molecular dynamics simulations of two zinc-containing protein–DNA complexes, which differed in their zinc coordination, i.e., four cysteines or two cysteines and two histidines. The most flexible non-bonded 12-6-4 Lennard–Jones-type model shows a preference for six-fold coordination of the Zn 2 + -ions in contradiction to the crystal structure. The cationic dummy atom model favours tetrahedral geometry, whereas the bonded extended zinc AMBER force field model, by construction, best preserves the initial geometry of a regular or slightly distorted tetrahedron. Our data renders the extended zinc AMBER force field the best model for structural zinc ions in a given geometry. In more complicated cases, though, more flexible models may be advantageous.


Introduction
Metalloproteins make up a large portion of proteins in general [1] and of human proteins in particular [2]. The most abundant metal ion after iron found in humans is zinc [3,4]. It can play functional roles in enzymes such as the RNA polymerase [5]. It can also have mainly structural and stabilising roles, in particular in DNA-binding proteins. Transcription factors are an important group of DNA-binding proteins. There, the most typical zinc-binding motif is the zinc finger (ZnF), where a β-sheet with two cysteine residues is followed by an α-helix with two histidine residues, resulting in a tetrahedral C2H2-coordination of zinc [6]. Another common example of tetrahedral coordination is provided by four cysteine residues (C4), which can be found in the hormone receptor family. In such receptors, the zinc ions stabilise the secondary structure, such that protein-DNA and protein-protein contacts can be established when binding as a dimer to DNA [7,8].
Zinc is a transition metal (TM) and, thus, does not form pure covalent or ionic, i.e., non-bonded, but instead coordinate bonds, which share characteristics with both types and are thus more flexible and can break easier than covalent bonds [9]. In biological systems, zinc exists as a divalent cation Zn 2+ with an electronic configuration of [Ar]3d 10 without its two 4s-electrons. Consequently, the bonding characteristic is complex due to the complicated shape of the d-orbitals.
In solution, a Zn 2+ -ion is coordinated by six water molecules, whereas in proteins, it can have coordination numbers (CNs) of four, five, or six in different geometries [10,11]. The rather low energy barriers for switching the CN and geometry are ideal for catalysing reactions in enzymes [2]. Structural zinc ions are generally coordinated by four residues [11].
While a full quantum mechanical (QM) description of zinc and its immediate environment would provide the most accurate representation of all properties and effects, the typical sizes and timescales, especially up to microseconds, required to describe biological zinc-containing systems of interest are computationally too demanding for QM methods. Therefore, it is desirable to describe zinc ions within classical force fields for molecular dynamics (MD) simulations. The difficulty, however, is the trade-off between a force field model that is flexible enough to allow changes in CN and geometry around the Zn 2+ -ion while still maintaining structural integrity.
For that reason, as reviewed by Li et al. [9], many different models for describing all kinds of ions, including Zn 2+ , have been developed and validated over the past few decades. Among the non-polarisable models, the most popular ones are non-bonded, bonded, and cationic dummy models. Zinc parameters of polarisable models, such as those for the AMOEBA force field [12], do exist, but will not be discussed here because they need more parameters due to their complexity and variety, as well as smaller time steps compared to unpolarised models for helping energy conservation [9].
The most intuitive treatment of metal ions is to just use the non-bonded parts of the potential energy function, The ion only has electrostatic and van der Waals (vdW) interactions with its surroundings and possesses the formal charge of the metal ion. The model is very simple and requires only a few parameters. ε ij is the potential well depth, r ij is the distance between atoms i and j, and R min is the distance at which the Lennard-Jones (LJ) potential is minimum. q i and q j are the charges of the two atoms and ε 0 and ε r are the vacuum and relative permittivity, respectively. Interactions of the ion with ligands can be formed and broken. Using the 12-6 LJ potential for vdW interactions reproduces the ionic interaction between monovalent ions and low electronegativities (such as alkali metal ions) very well [9]. This model underestimates short-range interactions and cannot reproduce hydration free energy (HFE) and ion oxygen distance (IOD) simultaneously [13]. Macchiagodena et al. developed a new approach for parametrising the coordinating protein ligands by redefining their atomic charges as well as their LJ parameters [14,15]. This approach is aimed at reproducing mean distance distributions. Melse et al. found that the combination of the force field parameters of the coordinating residues from Macchiagodena et al. with the non-bonded CM Zn 2+ parameters from Li et al. led to highly stable simulations of the tetrahedral Zn 2+ -ions even on longer time scales [13,16].
To additionally mimic dipole interactions, Li and Merz proposed adding a 1/r 4 term to the 12-6 LJ potential [17]. In the AMBER force field, it is described as follows: where κ is a scaling factor and has the unit Å −2 . With this new 12-6-4 LJ-type potential HFE, IOD and CN can be reproduced simultaneously in water. Those parameters are available for a multitude of water models [18]. Song [20]. In general, ions are parametrised under infinite dilution by reproducing experimental properties. Certain properties (e.g., free energies) require extensive sampling, which makes a direct comparison with experimental data difficult [9]. The water model used for parametrisation also plays an important role. The same parameters can produce, e.g., different HFEs in different water models but similar structural properties [9,13]. If the parameters are determined in an aqueous solution, which is typically the case, they are consequently biased toward the CN of this environment, so tetrahedrally coordinated Zn 2+ will likely change to a 6-coordination in the presence of other potential coordination partners such as water molecules [21].
Another strategy is to describe the zinc ion via a bonded model, as in the (extended) zinc AMBER force field (ZAFF/EZAFF), where the ligating atoms are covalently bonded to the ion [22,23]. The potential for a zinc ion now includes not only the non-bonded electrostatic and vdW terms but also environment-dependent bond, angle, and dihedral terms. In this model, the zinc ion only carries a partial charge whereas LJ parameters are directly taken from the non-bonded model. The ZAFF is an empirical model and was developed for the TIP3P water model to best represent fourfold-coordinated zinc ions in a protein environment [22]. It was later extended and methodologically refined (EZAFF) to include not only four-but also five-and six-coordinated systems [23]. EZAFF is available in the metal center parameter builder MCPB.py, which is part of the AmberTools [24]. The coordination sphere and the resulting electrostatics are most accurately described by the bonded model but a lot of nontransferable parameters are needed. This model is the most rigid one as it does not allow an exchange between ligands or metal ions and CN [21]. It is actually problematic when zinc is also coordinated by at least two adjacent water molecules as the 1-3 non-bonded interaction is usually not considered, meaning that the two water oxygen atoms do not repel each other electrostatically. On the other hand, a 1-4 non-bonded interaction is included, and the oxygen of one water molecule and the hydrogen of the adjacent water molecule strongly attract each other without experiencing vdW repulsion because water hydrogen atoms typically have zero vdW parameters. By this attraction, the angle is distorted [25].
A hybrid version of the non-bonded and bonded model is the cationic dummy atom model (CDAM). In the CDAM, covalently bonded dummy atoms, according to the CN, are added to the ion. They usually have an equilibrium distance of 0.9 Å and carry fractional masses and charges. They all sum up to the formal mass and charge of the ion. Consequently, the charge of the ion itself can even be negative and has no physical meaning [9]. Pang introduced and validated a CDAM for Zn 2+ in tetrahedral geometry in 1999, where the ion itself was neutral and the four dummy atoms carried a charge of +0.5 e each [26][27][28]. Duarte et al. later developed and Jiang et al. then refined parameters for octahedral coordination [29,30]. There, each dummy atom possesses a charge of +0.5 e and the ion consequently a charge of −1 e. The CDAM does not need as many parameters as the bonded model. An exchange of ligands is possible but only works in practise if the first shell ligands are anionic, and the alternative ligands of the second sphere are neutral or positively charged [21].
The quality of different models for divalent ions, including zinc, has been studied extensively within proteins [16,21,23,25,[31][32][33]. In this study, we focus on structurestabilising tetrahedrally coordinated Zn 2+ -ions in the context of protein-DNA systems as they may have additional challenges on the >1 µs time scale. We take the Wilms tumor protein (WT1) ZnF2 − 4 in complex with DNA as an example of C2H2 coordination [34] and the CXXC domain of mixed-lineage leukemia 1 (MLL1) in complex with DNA [35] as an example of C4 coordination. In those two seemingly simple proteins, the multiple zinc ions are responsible for their structural stability, which is crucial for proper DNA binding.

Model Setup and Zn 2+ Models
The initial structures of WT1 (PDB ID: 5kl2; resolution: 1.69 Å) [34,36] and the MLL1 CXXC domain (PDB ID: 4nw3; resolution: 2.82 Å) [35,37], both in complex with their cognate DNA structure, were taken from the PDB [38]. Only coordinates of the protein-DNA complexes and zinc ions were used. Missing side chains were added with the AutoPSF plugin of VMD [39]. In the case of WT1, the overhanging bases of the DNA were deleted. Protonation states of histidine residues were verified by visual inspection and by using PlayMolecule ProteinPrepare [40], which utilises PROPKA 3.1. The sites of cysteines and histidines facing a Zn 2+ -ion are all unprotonated (Table S1). Then both of the systems were prepared in the AmberTools suite with three different zinc force field models, i.e., the nonbonded 12-6-4 Lennard-Jones-type model [17], the tetrahedral cationic dummy atom model [26], and the bonded extended zinc AMBER force field model [23], which are shortly described in the following. All models were solvated in cubic boxes with TIP3P water [41] and a minimum buffer distance of 12.5 Å between the protein-DNA system and the box sides. DNA was described by the parmbsc1 parameters [42] and protein by the AMBER ff14SB [43]. The systems were neutralised by adding Na + ions.

Cationic Dummy Atom Model (CDAM)
For setting up the CDAM, both cysteine and histidine residues need to be in their anionic forms, CYM and HIN, respectively. Since GLU427 (index 77) of WT1 forms a hydrogen bond with HID431 (index 81), it was changed to its neutral form GLH, and the coordinates of OE1 and OE2 were switched to account for correctly protonated oxygen. The setup procedure is described in more detail online, where the corresponding parameter files for AMBER can be found [46].

Extended Zinc AMBER Force Field (EZAFF)
For building the bonded EZAFF model, bonds between the ions and surrounding ligands as well as their charges can be parametrised with the help of the metal center parameter builder MCPB.py [24,47]. There, a large model consisting of ions and surrounding ligands is generated. The positions of hydrogen atoms in this model were optimised with Gaussian 16 [48] using the B3LYP functional and the 6-31G(d) basis set. Based on the geometry-optimised model, partial charges were calculated via the RESP fitting procedure with the ChgModB scheme [22], where the charges of backbone heavy atoms were restrained to AMBER ff14SB values [43]. Bond and angle force constants are taken from an empirical relation between equilibrium bond/angle values and force constants.

Molecular Dynamics Simulations
The solvated systems were geometry-optimised with restraints of 50 kcal·mol −1 ·Å −2 on protein and DNA for 5000 steps, followed by 10,000 optimisation steps without restraints. The method of minimisation was switched from steepest descent to conjugate gradient after the first 500 and 1000 steps, respectively. After that, during a 500 ps simulation, the systems were heated to 298.15 K in an NVT ensemble with weak restraints of 50 kcal·mol −1 ·Å −2 on protein and DNA. For each system, three independent runs of 2 µs were performed in the NPT ensemble with Langevin dynamics at 298.15 K and 1 bar (weak pressure coupling, isotropic position scaling, pressure relaxation time of 2 ps, and collision frequency of 2 ps −1 ). We used an integration time step of 2 fs and bonds involving hydrogen atoms were constrained via SHAKE [49]. Periodic boundary conditions, a non-bonding cut-off of 10 Å and particle mesh Ewald [50,51] for treating electrostatic interactions were also applied.
To prevent fraying of the DNA termini, Watson-Crick and C1'-C1' distance restraints with a force constant of 2 kcal·mol −1 ·Å −2 were imposed as in [52] in accordance with B-DNA geometry.
All simulations were performed on GPUs using pmemd.cuda of Amber 18 [44] or Amber 22 [45]. For each system, we conducted three individual runs of 2 µs, run1, run2, and run3, with different initial velocities. The first 100 ns were considered NPT equilibrations and were not used for analysis.

Analysis
Root mean square deviations (RMSDs) were calculated for backbone atoms with respect to the crystal structure, while root mean square fluctuations (RMSFs) only included C α atoms for protein and P atoms for DNA. The median structure (i.e., the structure that is closest to the average structure) based on the same atoms of each replica was used as the reference.
Radial distribution functions of water around the zinc ions were calculated considering only the oxygen atoms. These were determined for chunks of 50 ns and then averaged to have the standard deviation as an error estimate.
Hydrogen bonds were defined based on geometric criteria, i.e., a maximum donoracceptor distance of 3.2 Å and a donor-hydrogen-acceptor angle deviating from linearity by not more than 42 • .
The fraction of native contacts was calculated using the definition by Best, Hummer, and Eaton [53] where X is a conformation, r ij (X) is the distance between heavy atoms i and j in conformation X and r 0 ij is the corresponding distance in the native state conformation, i.e., the crystal structure. N is the number of all pairs of heavy atoms (i, j) belonging to different residues θ i and θ j such that |θ i − θ j | > 3 and r 0 i,j < 4.5 Å. As suggested by Best et al., we used a smoothing factor of β = 5 Å −1 and a fluctuation parameter of λ = 1.8.
For determining the metal coordination geometry, FindGeo was used on frames taken every 1 ns [54]. This tool finds metal ions in a given PDB file, identifies the coordinating atoms that are less than 2.8 Å away (C-and H-atoms excluded), and compares the given coordination geometry via RMSD calculation after superimposition to a library of various ideal geometries. The best-fitting geometry is then the one that is the lowest in RMSD.
DNA parameters were calculated with Curves+ and Canal [55]. All analyses were performed using self-written Python scripts based on MDTraj [56]. For analysis, all three runs of 1.9 µs were merged.

Protein Structure
In Figure 1, representative median structures show that the 12-6-4 model deviates the most from the initial crystal structure for both proteins whereas the other two models are in very good agreement with the crystal structures of both proteins, MLL1 and WT1.
RMSD and RMSF values were calculated to probe the overall stability and flexibility of the systems.
In the case of MLL1, high protein RMSD values ( Figure 2) are partially due to more flexible behaviour in the C-terminal end, as can be seen from the RMSF in Figure 3. Such behaviour cannot be attributed to one specific model but can rather occur in all of them. Another reason for large RMSD values is the change in secondary structure, especially from sheet to coil or even to helix (Figure 4). With regard to the secondary structure, all of the models look rather comparable with 12-6-4 being slightly worse. A closer look at the time series of the secondary structures ( Figure S1) reveals that in none of the three runs in the 12-6-4 model the beta-sheet, formed by residues 1153-1154 and 1198-1199 (indices 3-4 and 48-49), reforms after being lost at approximately 0.2, 0.75, and 1 µs in the respective runs. A similar behaviour can be observed in only one run of the CDAM model. In the EZAFF model, the sheet structure is also lost in one run at approximately 1 µs,but reforms again ( Figure 2). All coordinating residues behave similarly across all models in terms of RMSF (Figure 3). For WT1, the increased RMSD is due to multiple and different higher flexible residues. This can best be seen for the 12-6-4 model, e.g., around the cysteines coordinating ZN3 (for run3) and around the cysteines coordinating ZN1 (for run1) in Figure 3. It is also noteworthy that the sheet regions around the ZN1-and ZN3-coordinating cysteines are smaller than in the crystal structure and the other two models, and that a helix can form between the ZN1-coordinating cysteines (Figures 4 and S1). For the CDAM, a slight upward trend in protein RMSD is visible in Figure 2. The ZN3-coordinating residues fluctuate most when compared to the others and, thus, seem to be the most critical ( Figure 3). In the case of the bonded EZAFF model and run3, the helices to which the ZN2-and ZN3-coordinating histidines belong become shorter after ∼500 ns and ∼1 µs, respectively ( Figure S1).
The number of hydrogen bonds within the protein is rather unrevealing with median values of 19, 20, and 20 for the 12-6-4, CDAM, and EZAFF models of MLL1, respectively (for distributions see Figure S2). The same is true for WT1 with 35, 34, and 36 median numbers of hydrogen bonds.

DNA Structure
Intra-and inter-base pair parameters, and axis DNA parameters were calculated to check for potential long-range effects of different ion models ( Figure S3, Figure S4, and Figure S5, respectively). For all DNA parameters, the models behave identically within the error range for MLL1 and WT1, which indicates that the specific ion model does not influence the DNA structure. The WT1 simulations also agree quite well with the crystal structure except for the axis bend. MLL1, on the other hand, shows more deviations between the crystal structure and simulations (see, for example, shear and stretch in Figure S3).  . RMSF of C α (protein, residue indices 0-50 and 0-87 for MLL1 and WT1, respectively), Zn 2+ (residue indices 51-52 and 88-90 for MLL1 and WT1, respectively) and P (DNA, residue indices 53-74 and 91-108 for MLL1 and WT1, respectively) atoms after aligning to the median structure of the same atoms for MLL1 and WT1, respectively, simulated with the different zinc models. The standard deviation of the mean is shown in the transparent shade. Coordinating cysteines are shown in yellow and histidines in blue. The residues coordinating ZN1, ZN2, and ZN3 are indicated as dotted, dashed, and dash-dotted lines, respectively. Black vertical lines are added for visually separating protein, Zn 2+ -ions, and DNA. For residue names of coordinating residues see Table S1.  Table S1.

Protein-DNA Interactions
For proper DNA recognition, the protein and the DNA need to be intact on their own, as well as their interactions. The number of hydrogen bonds, as one metric for such interactions, differs only slightly between the different models for both systems. The median number is 11 for all models of MLL1 and WT1 apart from the CDAM of WT1 where it is 10 (distribution see Figure S6).
The fraction of native contacts, on the other hand, provides further insights into the behaviour of the different ion models. Again, the 12-6-4 model is the worst among the Zn 2+ -models tested in this study. The biggest penalty contribution is from within the proteins rather than from the protein-DNA interactions, as shown by the median fraction of native contacts for the entire system, within the protein, and between the protein and DNA in Figure 5a. The CDAM and EZAFF models perform similarly, with CDAM being slightly better in terms of maintaining native contacts between protein and DNA, and the EZAFF model performing better within protein. The distribution for MLL1 in all Zn 2+models is generally much broader than for WT1 because Q declines over time ( Figure S7). For WT1, the median of the whole system is above 0.9 for all models tested and, thus, very good. We also see that the model does not have a significant long-range influence on the protein-DNA interactions for WT1. The bimodal distribution for the EZAFF model is due to a change in secondary structure in one of the runs (Figures S1 and S7), which gives rise to a different set of contacts. The CDAM and EZAFF models can be considered equally good in this context.

Zn 2+ Coordination
Looking at the individual Zn 2+ -ions and their coordination spheres in more detail reveals the reason for the poorer protein structure quality for both proteins in the 12-6-4 model (as can be seen in RMSD values, secondary structure, and native contact analysis). All Zn 2+ -ions, in both systems, change their CN from four to predominantly six in octahedral geometry (ZN2 in MLL1 also has some amount of CN = 5 in a trigonal bipyramid, which is due to one run, data not shown). A detailed breakdown of the observed geometries during the simulations can be found in Table 1. While CDAM enforces the tetrahedral geometry throughout all simulations and Zn 2+ -ions, the situation for the EZAFF model is more nuanced. This can best be seen for ZN2 of MLL1, whose initial crystal geometry is a distorted tetrahedron instead of a regular one. Consequently, observed geometries are more diverse and not purely tetrahedral, though with a CN of four.   (5) 13.54 --Square pyramid (5) 0.02 --Octahedron (6) 82.12 --Pentagonal bipyramid with a vacancy (equatorial) (6) 4.02 --Irregular geometry 0.04 -- Representative structures for each Zn 2+ -ion and model are depicted in Figure 6, along with the corresponding crystal structure geometry. It can be observed that EZAFF closely resembles the original geometry due to parameter construction. The Zn 2+ -ions in the 12-6-4 model are coordinated by two water molecules, resulting in octahedral coordination. In the CDAM, the structure is also comparable to the crystal but one can see that the distance between the coordinating atoms and Zn 2+ has decreased. The average distances are visualised in Figure 7. For MLL1, almost all distances simulated by all models are shorter than in the crystal structure. The models that are closest to the crystal structure in terms of distances are EZAFF and 12-6-4. For WT1, the Zn 2+ -S γ distance is, on average, systematically underestimated by all models, while the 12-6-4 model overestimates the Zn 2+ -N ε2 distance and CDAM and EZAFF are comparable. Still, EZAFF produces the most consistent representation of distances.
In an ideal tetrahedron, the atom-Zn 2+ -atom angles for all ligating atoms should be ≈109.5 • . Figure S8 shows the distributions for all angles in all models as well as the respective angles in the crystal structures. Again, EZAFF is in the best agreement with the crystal structure due to the model construction, and CDAM benefits from a regular tetrahedral structure.
The RDFs of water oxygen atoms around Zn 2+ -ions in Figure 8 show that water molecules move into the coordination shell in all simulations using the 12-6-4 model. This is not the case for the other two models. However, due to the lower distance to the coordinating ligands, slightly more water can move closer to the Zn 2+ -ions for the CDAM than for the EZAFF model. This behaviour of the zinc ion coordination is also reflected in the RMSD time series calculated for the Zn 2+ -ions and their ligating residues ( Figure S9). Additionally, we see the effect of change in the secondary structure, resulting in changed dihedral angles but not necessarily altered coordination geometry.

Discussion
Our results show that the Zn 2+ -model used, i.e., the 12-6-4 LJ-type non-bonded model, the cationic dummy atom model, or the bonded EZAFF model, has an impact on the Zn 2+ coordination geometry and, thus, on the structural stability of at least the immediate environment of the Zn 2+ -ions. We observe deviations from the initial crystal structure to different extents, which is clearly dependent on the flexibility of the model.
In the 12-6-4 LJ non-bonded model, the CN changes to six in mainly octahedral geometry as two water molecules enter the first coordination shell of all Zn 2+ -ions. This destroys the geometry around the Zn 2+ -ions, the secondary structure is altered, and some of the native contacts, mainly in protein, break. Since the model has been parametrised in solution, where octahedral coordination is dominant, and not in the protein environment, different geometries that can be observed in a protein, such as the tetrahedral coordination of the two systems studied in this work, cannot be adequately represented. This model is only a good choice if the CN changes to six in octahedral geometry or a flexible ligand exchange is desired (or expected), as was the case for the investigation of the divalent cation binding protein PsaA by MacDermott-Opeskin et al. [33].
The CDAM by Pang maintains a tetrahedral coordination geometry reliably but underestimates Zn 2+ -S distances by approximately 0.2 Å as already pointed out by the author in the original paper [26]. These too-short distances cause slight changes in the geometry around the Zn 2+ -ion, which may propagate through the protein. The slowly increasing RMSD and the slowly decreasing fraction of the native contacts within WT1 are indicative of such a Zn 2+ -S bond contraction affecting the entire protein, at least for longer time scales.
The initial crystal geometry is best preserved by the bonded EZAFF model since it is parametrised for this input geometry. Therefore, one has to be sure that the initial structure is of good quality as the one used for WT1. The crystal structure of MLL1, on the other hand, has a lower resolution. For this protein, we observed a decreasing fraction of native contacts (much worse compared to WT1), lower distances between Zn 2+ and coordinating atoms than in the crystal structure, and deviations in DNA parameters. While the resulting structure may be a better representation of the system in solution, it is also conceivable that the constraints imposed by the bonded model force the system into a wrong Zn 2+ -ion coordination and geometry, and the remainder of the system responds by structural adaptions that better accommodate the forced, and possibly wrong, geometry. Such structures may need refinement, such as with the PDB-REDO procedure [57], prior to MD simulation or the CDAM may be the better choice in such cases as it allows for more flexibility of the coordinating ligands in the context of tetrahedral coordination.
In all models investigated in this work, the coordination geometry, once established, is stable, even if it is a wrong geometry. Thus, contrary to CDAM and EZAFF, the 12-6-4 model should not be used for tetrahedrally coordinated zinc that is accessible by water since it could lead to octahedral coordination.
We could not observe long-range effects of the zinc models on DNA, even though the protein secondary structure was altered. This overall stability of the investigated systems suggests the structural role of the zinc ions to be rather local, i.e., only relevant for the domains in which they are located. In more complex systems with dimerisation domains that contain Zn 2+ -ions, as is found in many DNA binding domains of hormone receptors [7,8], the effect of an altered zinc coordination and geometry will likely have more severe effects on the overall structure.
It is always important to validate the chosen model by comparing quantities from MD simulations, as shown here, to those of the corresponding X-ray crystallographic or the NMR structure as already pointed out in other studies [16,21,32]. Care must be taken if the crystal structure only has low resolution. In such cases, it is possible to first use geometry restraints based on the AMBER FF for crystallographic refinement with Phenix, a commonly used program for crystal structure refinement [58].
Our study confirms that there is no single perfect ion model that outperforms all others, even for describing structural-tetrahedrally coordinated Zn 2+ -ions. If the Zn 2+ environment is challenging, e.g., a ligand-binding site where a change in CN could be important, the exact coordination is not known, or none of the available parameter sets produce adequate results, new molecular mechanics parameters (hybrid bonded/nonbonded model parameters as in [31]), more sophisticated models, or (semi-empirical) QM calculations may be needed (as shown by Yu et al. [23]).

Conclusions
We investigated the influence of three popular zinc models, i.e., the non-bonded 12-6-4 LJ-type model, the cationic dummy atom model, and the bonded EZAFF model, on the protein-DNA complexes of MLL1 and WT1. All Zn 2+ -ions in these proteins are tetrahedrally coordinated and guarantee structural integrity. Our results indicate that the 12-6-4 model should not be used for this geometry if water is nearby. CDAM, on the other hand, yields a regular tetrahedral geometry, while EZAFF can additionally maintain more distorted geometries. Thus, EZAFF should only be used if the geometry in the crystal structure is correct. Consequently, the choice of an appropriate model for a given structure depends on the quality of the structure and the role of the zinc ion and should, therefore, be carefully validated for the given system.

Supplementary Materials:
The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/biophysica3010014/s1, Table S1: Protonation states of cysteines and histidines in MLL1 and WT1.; Figure S1: Secondary structure elements for all protein residues over the course of simulation for WT1 and MLL1; helices are shown in light red, sheets in light blue and everything else in grey colour. Coordinating cysteines are shown in yellow and histidines in blue. The residues coordinating ZN1, ZN2, and ZN3 are indicated as dotted, dashed, and dash-dotted lines, respectively; Figure S2: Number of hydrogen bonds within protein for MLL1 (left) and WT1 (right); Figure S3: DNA intra-base pair parameters for MLL1 (left) and WT1 (right); Figure S4: DNA inter-base pair parameters for MLL1 (left) and WT1 (right); Figure S5: DNA axis parameters for MLL1 (left) and WT1 (right); Figure S6: Number of hydrogen bonds between protein and DNA fro MLL1 (left) and WT1 (right); Figure S7: Fraction of native contacts time series of the whole system, within protein and between protein and DNA for MLL1 (left) and WT1 (right); Figure S8: Angles around the Zn 2+ -ions including their coordinating ligands for (a) MLL1 and (b) WT1. The median values are shown as white x. Crystal structure values are indicated as black dashed lines; Figure S9: RMSD of Zn 2+ -ions and their ligating residues of MLL1 and WT1 in all three ion models with respect to the crystal structure; Figure S10: Median structures of the last 100 ns of all simulation runs of MLL1 aligned (protein and DNA backbone) to the corresponding crystal structure (grey). RMSD values in Å of protein and DNA backbone with respect to the crystal structure are shown in parentheses; Figure S11: Median structures of the last 100 ns of all simulation runs of WT1 aligned (protein and DNA backbone) to the corresponding crystal structure (grey). RMSD values in Å of protein and DNA backbone with respect to the crystal structure are shown in parentheses.  Acknowledgments: The authors gratefully acknowledge the computer resources and support provided by the Erlangen Regional Computing Center (RRZE). The authors gratefully acknowledge the computer resources and support provided by NHR@FAU. S.V. is grateful for the support from the IMPRS for Biology and Computation.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: CDAM cationic dummy atom model EZAFF extended zinc AMBER force field MD molecular dynamics CN coordination number