1. Introduction
Thymine DNA glycosylase (TDG) is an enzyme of the base excision repair (BER) system that recognises and excises the nucleobase of a number of damaged or mispaired nucleotides. In addition to removing the eponymous thymine from G:T mispairs, TDG has been reported to operate on forms of oxidised methyl-cytosine, products of the ten-eleven translocation (TET) methyl-cytosine dioxygenase that transforms 5-methyl-cytosine (MC) through step-wise oxidation into 5-hydroxymethyl-cytosine (HMC), 5-formyl-cytosine (FC), and 5-carboxyl-cytosine (CAC) [1,2,3]. Whereas HMC and MC are not processed by the glycosylase, the higher oxidised forms, FC and CAC, are recognised and excised by TDG and ultimately replaced by unmethylated cytosine through the action of other enzymes in the base excision repair pathway [1,4,5].
Crystal structures of TDG complexed to lesioned DNA [6,7,8] show the mispaired or damaged bases flipped out of the helical DNA duplex into the enzyme’s active site. Base extrusion is thus an important step in the base excision process and one possible stage along a multi-step interrogation pathway at which target bases can be discriminated from non-cognate ones [9,10]. Simulations suggest that the base flip follows different dynamics for FC and CAC than for thymine, and an active role of the TDG enzyme in promoting base extrusion has been shown [11].
Besides discrimination of the cognate and non-cognate methyl-cytosine forms at the stage of base flipping, substrate specificity can ultimately be achieved at the chemical step, the cleavage of the glycosidic bond between the C1’ atom of the sugar and the N1 atom of the methyl-cytosine base. The removal of CAC has been shown to be acid catalysed, ruling out HMC and MC as substrates since these bases have no proton acceptor groups. Excision of FC, on the other hand, does not require acid catalysis [4] but appears to rely on FC being a better leaving group than HMC or MC [12,13,14]. Quantum chemical calculations on nucleotide models suggest that differences in the ‘inherent chemistry of the modifications’ compared to MC lead to lower barriers of the glycosidic hydrolysis and thus higher activity of TDG [15,16,17]. One such chemical difference has been described by shorter N-glycosidic bond lengths in the reactant transition state and another by the leaving group ability of the base [18]. Calculations of thymine excision by the TDG enzyme find an active role for a histidine residue, His151, in proton shuttling to and from the leaving base, a mechanism that is not conceivable with cytosine and methyl-cytosine bases [19]. Calculations of the excision mechanism of FC in TDG, in contrast, do not support the need for proton shuttling by His151 and do not provide further suggestions of substrate discrimination at the chemical step [20].
Biochemical DNA binding data show binding to C, MC and HMC to be significantly weaker than binding to DNA with substrate bases [4,14]. It is therefore conceivable that recognition by the glycosylase already takes place upon binding to the damaged DNA, through stronger or at least different interactions with the two cognate forms of oxidised methyl-cytosine, FC and CAC, than with the non-cognate forms, HMC and MC.
Figure 1 summarises possible steps in the protein binding, base recognition, and base excision mechanisms by which TDG could discriminate target and non-target bases.
Mispaired thymine, the deamination product of methyl-cytosine (or uracil, the deamination product of cytosine), is likely detected due to the local deformation of the DNA at the lesion site [21,22]. In contrast, none of the aforementioned oxidised methyl-cytosine forms appears to alter the conformation of the DNA in solution [23,24] in a way that can easily be recognised by the repair enzyme TDG. Experimental and simulation studies, however, report distinct structural alterations of 10 bp long DNA with two XC bases, one FC or CAC on either strand, compared to B-DNA [25]. Moreover, larger fluctuations of FC and HMC have been observed in molecular dynamics (MD) simulations, and larger fluctuations of FC:G pairs have also been confirmed experimentally [26]. Base pair opening, which can be understood as the first step of base extrusion, has been probed by NMR, using imino proton exchange rates as a marker. Imino protons are more accessible, and exchange rates thus faster, if the G:XC pair is in a (partially) open conformation and/or the base is (partially) flipped out of the DNA helix. The measured rates, though different for the different oxidised forms, do not show a trend that correlates with TDG activity [27]. In particular, CAC does not alter DNA flexibility and does not exhibit greater base pair motion (opening) or faster imino proton exchange [24,27].
C-NMR vs. pH-titration experiments show a much lower N3 pKa for FC and CAC than for HMC, which has been explained by the electron-withdrawing properties of the formyl and carboxyl group. An increased N3 acidity would then correlate with weakened hydrogen bonding and reduced base pair stability, explaining the observed lower melting temperatures for FC and CAC compared to MC [28].
It has also been suggested that the higher oxidised forms of methyl-cytosine, FC and CAC, could be recognised because of their higher propensity, compared to HMC and MC, to form imino tautomers [4,7]. Such imino tautomers would predominantly form so-called wobble pairs (see Figure 2), which resemble the mispairs formed by uracil, G:U, and thymine, G:T. Calculations of amino and imino tautomers of the isolated nucleobases FC and CAC in the gas phase and in an implicit water model show a clear preference for the amino forms [6]. Moreover, NMR experiments [27] and 2D-IR spectra accompanied by density functional theory calculations [28] find the amino forms to be predominant in free DNA in water.
Yet the situation might be different in the protein environment. With the TDG enzyme complexed to the DNA, an imino form might be stabilised by the enzyme, and/or DNA carrying one of the oxidised forms, amino or imino, might interact more or less favourably with the enzyme than DNA carrying the others. In this paper, we therefore investigate DNA carrying one G:XC pair at a time, namely guanine paired to one of the differently oxidised forms of methyl-cytosine in its amino (XC = MC, HMC, FC, or CAC) or imino (XC = IMC, IHC, IFC, or ICC) tautomeric form. By means of molecular simulations, we compare these DNA systems, in free form and complexed to TDG, so as to explore the differences in conformational dynamics and protein-DNA interactions between the different oxidised forms and between amino and imino tautomers.
2. Methods
We modelled DNA with different modifications of a G:XC pair as amino tautomers, XC = CAC, FC, HMC, MC, and as imino tautomers, XC = ICC, IFC, IHC, IMC. The starting coordinates of the uncomplexed DNA with sequence CATCGCTCA-XC-GTACAGAGC were taken from the PDB [29,30] structure 6U17 [31]. For the complex of DNA (GCTCA-XC-GTACA) with thymine DNA glycosylase we used the crystal structure with PDB code 2RBA [6]. This structure contains a 2:1 complex of the catalytic domain of human TDG (residues 111–308) with one protein bound to an abasic site analog and the other one bound to a non-cognate site with a central G:C pair. We removed the protein bound to the abasic site and kept only the part of the DNA that is complexed to the second protein. The different modifications, the amino tautomers XC = CAC, FC, HMC, MC and the imino tautomers XC = ICC, IFC, IHC, IMC, were built by adding an (oxidised) methyl group to the C5 atom of the cytosine base of the central G:C pair. Molecular dynamics simulations of the models were performed with Amber 18 [32] and Amber 20 [33] using pmemd.cuda, following a protocol established previously [34,35,36,37]. The DNA part of the system was described by the parmbsc1 [38] force field and the protein by ff14SB [39]. Both free and complexed DNA were solvated with TIP3P [40] water, and sodium counter ions were added as well as NaCl at a concentration of 150 mM [41].
5-Methylcytosine (MC) and its oxidised derivatives 5-hydroxymethylcytosine (HMC), 5-formylcytosine (FC) and 5-carboxylcytosine (CAC) and their imino tautomers IMC, IHC, IFC and ICC (one hydrogen moved from N4 to N3, see Figure 2) were parameterised following a protocol established previously [35,36,37,42,43]. Therefore, only a short summary is given here. We used RESP [44,45] charges for the modified bases; missing parameters of the bases were taken from GAFF [46,47] or parmbsc1/ff14SB [38,39].
After initial geometry optimisation and 500 ps heating to 298 K in an NVT ensemble, three independent runs of 600 ns were performed for each of the systems. These production runs were performed with Langevin dynamics in an NPT ensemble at 1 bar and 298 K, using a time step of 2 fs, SHAKE [48,49] on all bonds involving hydrogen, periodic truncated octahedral boxes (box dimensions approx. 92 Å), a non-bonding cutoff of 10 Å, and particle mesh Ewald for the treatment of electrostatic interactions. Watson–Crick distance restraints were imposed on the DNA termini of the free DNA (20 kcal·mol⁻¹·Å⁻², allowing ±0.1 Å movement from the equilibrium bond distance) to prevent fraying of the DNA termini [50]. For the protein-DNA complexes we used slightly different distance restraints on the DNA termini, with a force constant of 2 kcal·mol⁻¹·Å⁻²; these are the same as in [51] and in accordance with B-DNA geometry. Not only were the distances between the heavy atoms of the Watson–Crick H-bonds of the DNA termini restrained but also their C1’-C1’ distances and those shifted by one base pair in the 3’ or 5’ direction (for upper and lower bounds see Supporting Table S1 of [51]).
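The effect of a distance restraint with a tolerance window can be illustrated with a flat-bottomed harmonic penalty. The following is a minimal sketch, not the Amber implementation: the function name, the example equilibrium distance of 2.9 Å, and the energy convention E = k·Δ² (without a factor of 1/2) are assumptions made for illustration only.

```python
def flat_bottom_restraint(distance, equilibrium, tolerance=0.1, force_constant=20.0):
    """Return a restraint energy in kcal/mol for a distance given in Å.

    Hypothetical helper: no penalty within +/- tolerance of the equilibrium
    distance, harmonic penalty (force_constant * excess**2) outside of it.
    The missing factor 1/2 is an assumed convention, not taken from the text.
    """
    excess = abs(distance - equilibrium) - tolerance
    if excess <= 0.0:
        return 0.0
    return force_constant * excess ** 2


# Example with an assumed equilibrium donor-acceptor distance of 2.9 Å
for d in (2.85, 3.20, 3.50):
    energy = flat_bottom_restraint(d, equilibrium=2.9)
    print(f"d = {d:.2f} Å -> E = {energy:.3f} kcal/mol")
```

With the weaker force constant quoted for the complexes, the same function would be called with `force_constant=2.0`; the upper and lower bounds actually used there are those tabulated in [51], not the symmetric window assumed in this sketch.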
Only the last 500 ns were used for analysis. Cpptraj [52] from the AmberTools suite, VMD 1.9.3 [53] and Curves+/Canal [54,55] were used for further analyses of the systems’ fluctuations, hydrogen-bond interactions and the DNA conformation. A hydrogen bond was defined based on geometric criteria, that is, a donor–acceptor distance not larger than 3.2 Å and a donor-hydrogen-acceptor angle deviating from linearity by not more than 42°. A flip angle, describing to what extent the XC base is in an intrahelical or extrahelical state, is defined as the pseudo dihedral formed by the XC base, the sugar of the XC nucleotide, the sugar of the next nucleotide downstream, and the next base together with its complementary base, a definition we have used previously [21,22].
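The two geometric definitions above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the cpptraj or VMD implementation used for the analysis: the function names are hypothetical, coordinates are assumed to be (x, y, z) tuples in Å, and for the flip angle the four input points are assumed to be the centres of the four atom groups named in the text (how these centres are computed is left to the caller).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def is_hydrogen_bond(donor, hydrogen, acceptor, max_distance=3.2, max_deviation=42.0):
    """Geometric test: D...A distance <= max_distance (Å) and
    D-H...A angle within max_deviation (degrees) of 180 degrees."""
    if _norm(_sub(acceptor, donor)) > max_distance:
        return False
    hd = _sub(donor, hydrogen)
    ha = _sub(acceptor, hydrogen)
    cos_angle = _dot(hd, ha) / (_norm(hd) * _norm(ha))
    # Clamp to [-1, 1] to guard against rounding errors before acos
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return 180.0 - angle <= max_deviation


def pseudo_dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from the simulations)
print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
print(pseudo_dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

The probability of a hydrogen bond over a trajectory would then be the fraction of frames for which `is_hydrogen_bond` returns `True`.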
Relative binding free energies of the complexes of the DNA oligonucleotides containing the amino and imino tautomers of the four (oxidised) variants of 5-methylcytosine with TDG were obtained using the thermodynamic cycle shown in Figure 3. The perturbations were performed with Amber 20 [33] pmemd.cuda following a dual-topology thermodynamic integration (TI) approach [56,57,58,59]. The amino tautomers of CAC, FC, HMC and MC were perturbed into their imino tautomers ICC, IFC, IHC and IMC using a lambda coordinate of 21 windows (0.00, 0.05, …, 0.95, 1.00), both in the bound state (complex with TDG) and the free state (solvated in water) [56]. A van der Waals and electrostatic soft core potential with Amber 20 default soft core parameters was used; the soft core regions are indicated in Figure 4.
The perturbation free energies of the bound and the free state (scheme in Figure 3) were obtained from the free energy gradients by trapezoidal numerical integration. Starting structures for the TI simulations were taken from the MD simulations of the unperturbed amino forms of CAC, FC, HMC and MC after 10.5 ns equilibration. Again, Watson–Crick distance restraints on DNA termini (see above) were employed to prevent fraying of the DNA termini.
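The integration step can be sketched as follows. This is a minimal illustration under stated assumptions: the variable names are hypothetical, the gradient values are placeholders and not data from this study, and the relative binding free energy is assumed to be the difference between the bound-state and free-state perturbation free energies, as in a standard thermodynamic cycle.

```python
def trapezoid(xs, ys):
    """Integrate tabulated y(x) values with the trapezoidal rule."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points and equally long lists")
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


# 21 equally spaced lambda windows: 0.00, 0.05, ..., 0.95, 1.00
lambdas = [i / 20 for i in range(21)]

# Placeholder free energy gradients <dV/dlambda> in kcal/mol (NOT results of this study)
dvdl_bound = [10.0 * (1.0 - lam) for lam in lambdas]
dvdl_free = [8.0 * (1.0 - lam) for lam in lambdas]

# Perturbation free energies amino -> imino in the bound and the free state
dg_bound = trapezoid(lambdas, dvdl_bound)
dg_free = trapezoid(lambdas, dvdl_free)

# Assumed cycle closure: relative binding free energy of imino vs. amino tautomer
ddg_bind = dg_bound - dg_free
print(f"dG_bound = {dg_bound:.2f}, dG_free = {dg_free:.2f}, ddG_bind = {ddg_bind:.2f} kcal/mol")
```

In practice, the gradient at each lambda value would be the time average over the part of the respective window that is used for integration.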
After initial geometry optimisation, each lambda window was heated to 298 K during 200 ps NVT with weak Cartesian restraints (5 kcal·mol⁻¹·Å⁻²) on non-hydrogen DNA/protein atoms, followed by 200 ps NPT equilibration without restraints. An integration time step of 1 fs was used, with SHAKE [60] constraints on all bonds involving hydrogen except those of the perturbed residues (in addition, SHAKE was removed from bonds containing one common and one unique atom). A Monte Carlo barostat was used for pressure (1 bar) control. All other simulation parameters were chosen as suggested in the Amber TI tutorial [56]. Each lambda window was simulated for 30 ns, of which the last 20 ns were used for integration, and each perturbation was repeated twice so as to generate three runs of each perturbation simulation.
Values reported in this work are means over the three independent runs of the respective simulations, and errors are estimated as the standard deviation from the mean.
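A minimal sketch of this averaging is given below. The function name and the input values are placeholders, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def summarise_runs(values):
    """Return mean and standard deviation over independent runs."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1); assumed here, not stated in the text
    spread = statistics.stdev(values)
    return mean, spread


# Placeholder values for three independent runs (NOT results of this study)
runs = [1.8, 2.1, 2.4]
mean, spread = summarise_runs(runs)
print(f"{mean:.2f} +/- {spread:.2f}")
```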
4. Discussion
The conformational dynamics and the interactions with the protein do not exhibit significant differences between the different oxidised forms of 5-methyl-cytosine in their amino tautomers. However, the imino tautomers of all G:XC pairs exhibit significant conformational differences compared to their amino counterparts. As anticipated, the amino tautomers are in a Watson–Crick conformation whereas the imino tautomers form wobble pairs with fewer and shifted hydrogen bonds between the XC and the guanine base. Such differences can in principle be exploited for recognition by the TDG protein, and it has also been suggested that extrusion (flipping) of an XC base from a wobble pair requires less energy than from a Watson–Crick pair [7]. The wobble conformations, moreover, exhibit a second, less favourable conformational state that corresponds to a partially open and partially flipped state, as has been observed earlier for mispaired thymine [21,22]. For the mispair, this second state is stabilised upon complexation to the TDG protein, in contrast to G:C and G:MC (in amino form), which remain in Watson–Crick conformation even with the protein bound [21,22]. Among the imino tautomers of the G:XC pairs, only the lower oxidation forms, IHC and IMC, experience a lowering of the relative free energy of the partially open/partially flipped state. That is, the conformation that likely plays a crucial role in the recognition of mispairs becomes more favourable upon complexation only for the non-target bases of TDG.
There are no direct interactions, such as hydrogen bonds, between the TDG protein and the G:XC pair or, for that matter, other DNA residues that could explain the observation of a stabilised partially open/partially flipped state in the imino tautomers. The only direct hydrogen bond between the TDG protein and the XC base involves the O15 (or O16) atom of the oxidised methyl group. However, the probability of this hydrogen bond is very low, and it is observed for both tautomeric forms of carboxyl-cytosine and for IHC. Un-oxidised imino methyl-cytosine, IMC, whose more open conformation is most stabilised by complexation to TDG, lacks the oxygen atom in question and cannot form such an interaction.
However, only the IMC and the IHC systems populate conformations in which the imino proton is oriented in such a way that likely favourable contacts of Lys201 with the N4 atom of the XC base are possible. In the IFC and ICC systems, the higher oxidation level and hence the higher negative charge favours an orientation of the N4 atom towards the oxygen atom(s) of the formyl and carboxyl group, respectively. One can thus argue that closer interaction with Lys201 requires either a more open state, so as to allow the protein residue to come closer to the N4 atom, or a sufficiently polarised oxygen atom as in a formyl or carboxyl group. With two such oxygen atoms and a full negative charge, carboxyl-cytosine does not even need a more open conformation for stronger interactions, explaining the smallest stabilisation effect of a more open state upon complexation of ICC.
Moreover, the unoxidised methyl group is smaller than its oxidised counterparts, and reduced steric demands may be a simple explanation for why IMC has the highest chance to be in a more open/more flipped state. In the case of methyl-cytosine, there are strong hydrogen bonds between the DNA residues next to the lesion and protein residues Cys233 and, mainly, Lys232 that are only observed for the imino tautomer, IMC. It is interesting to note that these hydrogen bonds have also been observed in earlier simulations [11,22] of DNA with G:MC and DNA with G:T complexed to TDG in both intrahelical and extrahelical conformation.
Whereas all these interactions indicate the imino tautomer IMC to be more favourable in the complex than in free DNA, the computed binding affinities point in the opposite direction. The only interaction that is diminished upon complexation is the water-mediated hydrogen bond between XC and G, while hydrogen bonds of XC (by the N4 atom) and of the complementary G17 (by its O6 atom) with water have comparable probabilities for the free and complexed IMC system. This can be interpreted as the G:IMC pair becoming even more ‘wobbly’, and occasionally opened too far for a water molecule to wedge in between. This may be the only conformational freedom gained, whereas all the ‘stabilising’ contacts with the protein likely come at the expense of entropy and thus lead to an overall higher relative free energy.
For hydroxymethyl-cytosine, in contrast, the calculated relative binding affinities show a complexed imino tautomer to be slightly preferable over an uncomplexed IHC, in agreement with the observed stabilisation of the more open state upon complexation to TDG. For formyl-cytosine, the imino tautomer IFC is, like IMC, less favourable in the complex than in free DNA. Hydrogen bond interactions with the protein, within the DNA, and even with water are comparable for free and complexed IFC. Higher steric demands than those of the unmodified methyl group render the formyl group less likely to populate the more open wobble state. That aside, the unfavourable binding affinity to the protein for the imino tautomer IFC may also be attributed to entropic considerations. In the complexes, the DNA fluctuates less than in the free form and is, moreover, bent at the lesion site. Both effects are naturally due to interactions with the protein and are stabilised by, for example, Arg275 wedging into the DNA groove (see Figure S4). Our earlier findings for TDG complexed to mispaired, but intrahelical, G:T and the present study suggest partially open wobble pair conformations to be favoured by the protein and hence more stabilised. It appears that for the imino tautomers IFC and IMC this stabilisation cannot outweigh the loss in entropy in these tighter complexes. For the imino tautomer ICC both effects may just balance, resulting in negligible differences in binding affinities, and for the IHC form, the (tighter) complexation is slightly favourable.
Even the unfavourable relative binding affinities of ∼2 kcal/mol render the imino tautomers not much less likely in the complexes than in the free DNA. Given, however, that imino tautomers are hardly observed in free DNA in water [27,28], there is little chance for DNA with G:XC lesions in imino form to bind to the TDG protein. A transient transition from amino to imino tautomer in the complexed DNA cannot be ruled out completely, though. The subsequent base extrusion is likely considerably faster for imino than for amino tautomers, such that even small amounts of TDG-DNA complexes with imino tautomers can significantly contribute to the formation of the extrahelical state. A step-wise binding and recognition process in which imino forms play a role could look like this: first, DNA with G:XC as amino tautomer is bound; then proton transfer in the complex generates the imino form, which is subsequently extruded; finally, the base is excised, as imino or amino tautomer. Our finding of a relatively more stabilised partially open conformation, which is closer to extrusion, for the imino tautomers of the non-cognate G:XC systems, hydroxymethyl- and methyl-cytosine, implies that such a recognition mechanism would favour the wrong, that is, the non-target bases. If, however, the proton transfer step has a much lower barrier in the cognate formyl- and carboxyl-cytosine systems than in the non-cognate systems, their imino forms, ICC and IFC, have a higher probability of being formed (in the complex), and these bases would be flipped and then excised more easily than the non-target bases. Taken together, the imino forms of the oxidised methyl-cytosines are likely not decisive for recognition by TDG upon binding to the DNA.