Molecular Sciences Ab Initio Study of the Prototropic Tautomerism of Cytosine and Guanine and Their Contribution to Spontaneous Point Mutations

High-level quantum-chemical and quantum-dynamics calculations are reported on the tautomerization equilibria and rate constants of isolated and monohydrated cytosine and guanine molecules. The results are used to estimate the fraction of the bases present in the cell during DNA synthesis as the unwanted tautomers that form irregular base pairs, thus giving rise to a spontaneous GC → AT point mutation. A comparison of the estimated mutation frequencies with the observed frequency in E. coli is used to analyze two proposed mechanisms, which differ in the degree of equilibration reached in the tautomerization reaction. It was found that the fraction of the rare tautomer in the monohydrated complexes of both cytosine and guanine significantly exceeds the amount needed to account for the observed frequency of GC → AT mutations. In the absence of water the equilibrium concentration of the tautomeric forms is relatively large, but the barrier to their formation is high. The mechanism in which a high tautomerization barrier keeps the tautomeric transformation far from equilibrium therefore appears more likely than a mechanism in which water and/or polymerases produce a low equilibrium concentration of the tautomeric forms.


Introduction
Since the discovery of the DNA structure and the suggestion by Watson and Crick [1] that rare tautomeric forms of the DNA bases may exist, the properties of various isomers of nucleic acid bases differing in the position of hydrogen atoms have been studied extensively. The reason is that a proton transfer was shown to lead to mispairing of purine and pyrimidine bases, thus causing point mutations. Although it has never been demonstrated experimentally that rare tautomers are responsible for spontaneous mutations, subsequent experimental and theoretical investigations [8][9][10][11][12][13][14] seem to confirm the essential correctness of this postulate.
Experimental [17][18][19] and theoretical [21][22][23] studies have shown that cytosine (C) exists primarily in the two most stable tautomeric forms, i.e. the aminooxo (canonic) and aminohydroxo forms (see Fig. 1). The existence of a small amount of the iminooxo form (Fig. 1) has been shown both experimentally [21,24] and by theoretical studies [21][22][23]. However, since the aminohydroxo form of cytosine is obtained from the canonic form by a proton transfer from the N1 to the N3 atom, it has no biological significance, because in DNA the hydrogen atom at the N1 position is replaced by a sugar moiety. Thus, the occurrence of a rare cytosine tautomer needs to be discussed only for the iminooxo form. The results of quantum-chemical calculations suggest that the difference in the stabilities of the canonic form and this rare form is about 1-2 kcal mol$^{-1}$. The situation changes dramatically when cytosine is immersed in water. Only the canonic form has been observed experimentally [25][26][27][28][29][30][31]. A strong predominance of the canonic form has also been revealed by theoretical studies [23].
When forming a DNA double helix, cytosine forms a hydrogen-bonded pair with guanine. The rare iminooxo form of cytosine (C*), on the other hand, pairs with adenine (A) instead of guanine (G). Similarly, the rare tautomer of guanine (G*) pairs with thymine (T). After the strand separation, the counterbases will form pairs with thymine and adenine instead of guanine and cytosine, respectively. Thus, the scheme that was postulated in Ref. [32] leads to a spontaneous GC → AT transition. One of the pathways responsible for this kind of transition, which involves the tautomerization of guanine, was discussed in [33]. Similarly, we can write the following reaction scheme for cytosine that is also responsible for the GC → AT transition:

$$\mathrm{C} \rightleftharpoons \mathrm{C^*} \qquad (2)$$
From (1) one can postulate that the concentrations of both C* (from equilibrium (2)) and G* (see Ref. [33]), or at least of one of the rare forms, should be sufficient to account for the experimentally observed rate of mutations of type (1) [34]. Two mechanisms can be considered for generating a sufficient concentration of the rare forms:
1. The equilibrium in the tautomeric transformations is reached during the synthesis of DNA (equation (2) for cytosine and an analogous one for guanine [33]). Topal and Fresco [35] were the first to propose this scheme. They suggested that the value of the equilibrium constant should be in the range of $10^{-4}$ to $10^{-5}$ if the same equilibrium also governs the "proof-reading" reaction [35].
2. The second mechanism was first discussed by Löwdin [9]. He suggested that the equilibrium is not reached during the DNA synthesis because the forward reaction in (2) is too slow.
This suggests that the forward rate constant, rather than the equilibrium constant, is the limiting factor. This rate constant should therefore be large enough to generate a concentration of the rare form that reproduces the frequency of point mutations remaining after the proof-reading step of the DNA synthesis.
Some previous studies [36,37] suggest that the DNA bases are either completely dehydrated or only partially hydrated during replication and transcription. In order to study the effect of partial hydration of the DNA bases, we have also included the following reaction scheme in our current study:

$$\mathrm{C \cdot H_2O} \rightleftharpoons \mathrm{C^* \cdot H_2O} \qquad (3)$$

The tautomerization reaction is a complex multidimensional process, and the value of the barrier height alone is not sufficient to determine the time required to reach the biologically relevant concentrations of the rare forms. Standard transition state theory is not appropriate for this reaction because the tautomerization includes a proton transfer, which may involve quantum-mechanical tunneling. Thus, a multidimensional tunneling approach is necessary here [38,39]. However, because of their high computational demand, such calculations are limited to small model systems [40][41][42].
In our recent studies [33,43] we have addressed the dynamics of tautomerization using a recently developed approximate instanton approach [43,44], which can describe proton transfer in large molecular systems [45]. The investigation included the calculation of the rate constants for guanine and its mono- and dihydrated forms using energetic parameters obtained from MP2/6-31G(d) and higher levels of theory together with frequencies from HF/6-31G(d) calculations. It was concluded that the low concentrations of the rare guanine tautomer in E. coli are more easily explained if water is assumed to be absent, so that the tautomeric system remains far from equilibrium.
In this study we continue this investigation by considering cytosine, the second nucleic acid base that participates, together with guanine, in the aforementioned GC → AT transition. For the calculations of the rate constants we have used some of the data from our previous study on the stability of cytosine tautomers [23]. A further goal of this research is to determine whether both or only one of the two bases contributes to the observed frequency of spontaneous point mutations in E. coli, for which reliable data are available.

Details of Calculations
The ab initio LCAO-MO method was used for the study of cytosine tautomers and their interaction with one water molecule. The calculations were carried out with the Gaussian 98 program [46]. The standard 6-31G(d) basis set was used for optimizations of the molecular geometries. All geometries of the local minima were optimized without symmetry restrictions ($C_1$ symmetry being assumed) by the gradient procedure at the second order of closed-shell restricted Møller-Plesset perturbation theory [47]. Additional optimizations of all structures were performed for comparison purposes at the HF level of theory [48]. The local minima and transition states were verified by establishing that the matrices of the energy second derivatives (Hessians), calculated at the MP2/6-31G(d) and HF/6-31G(d) levels, have zero and one negative eigenvalue, respectively. Additional single-point calculations were performed at the MP2/aug-cc-pVTZ and CCSD(T)/aug-cc-pVDZ levels of theory. To estimate the enthalpy values, the thermal corrections calculated at the level of optimization were added to the corresponding energies (the thermal corrections calculated at the HF level were scaled by 0.9 [49]). The entropy values were evaluated from the frequency calculations at the level of optimization. The Gibbs free energies and equilibrium constants were estimated at 298.15 K from the standard relation $\Delta G = -RT \ln K$, where the equilibrium constant is given by $K = k_f/k_r$, the ratio of the forward and reverse rate constants.
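The link between a tautomerization free energy, the equilibrium constant, and the fraction of the rare form can be illustrated with a minimal Python sketch. It assumes the standard relation $K = \exp(-\Delta G/RT)$ and a two-state mixture, so that the rare fraction is $K/(1+K)$. The free-energy values in the loop are illustrative assumptions, not entries of Table 1, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal mol^-1 K^-1


def equilibrium_constant(delta_g_kcal, temperature=298.15):
    """K = exp(-dG / RT) for the tautomerization C <-> C*."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature))


def equilibrium_rare_fraction(k_eq):
    """Fraction of the rare tautomer in a two-state mixture, K / (1 + K)."""
    return k_eq / (1.0 + k_eq)


# Illustrative free-energy differences (assumed values, not Table 1 data).
for delta_g in (0.5, 1.0, 2.0):
    k_eq = equilibrium_constant(delta_g)
    fraction = equilibrium_rare_fraction(k_eq)
    print(f"dG = {delta_g:.1f} kcal/mol  K = {k_eq:.3f}  rare fraction = {fraction:.1%}")
```

Because $RT$ is only about 0.6 kcal mol$^{-1}$ at 298.15 K, a change of 1 kcal mol$^{-1}$ in $\Delta G$ changes $K$ by roughly a factor of five, which is one way to see why the predicted fractions are sensitive to the level of theory.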
To estimate the rate constants, the approximate instanton approach [44,50] was used, as implemented in the DOIT 1.2 program [51] and described in Ref. [44]. The frequencies calculated at the HF/6-31G(d) level were scaled by 0.90 and those calculated at the MP2/6-31G(d) level were scaled by 0.9661 [49]. The tunneling rate constant for the C* → C transfer was calculated by the expression

$$k_r(T) = \frac{\Omega_t^0}{2\pi}\, e^{-S_I(T)}$$

where $\Omega_t^0$ is the effective tunneling frequency in the equilibrium configuration of C*, and $S_I(T)$ is the multidimensional instanton action. The rate constants for the forward proton transfer processes in schemes (2) and (3) were calculated from $k_f(T) = K(T)\,k_r(T)$. For details of the instanton approach, we refer to the original papers [39,50].
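The two-step recipe described above (a tunneling rate for the reverse transfer, then the forward rate from the equilibrium constant) can be sketched in Python. The sketch assumes the usual instanton form, in which the rate is the effective frequency divided by $2\pi$ times $\exp(-S_I)$, with the action expressed in units of $\hbar$. It does not reproduce the DOIT program; the function names, the 3000 cm$^{-1}$ wavenumber, the action of 40, and the equilibrium constant of 0.1 are hypothetical illustration values.

```python
import math

SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10


def angular_frequency(wavenumber_cm, scale=1.0):
    """Convert a (scaled) harmonic wavenumber in cm^-1 to rad/s."""
    return 2.0 * math.pi * SPEED_OF_LIGHT_CM_PER_S * wavenumber_cm * scale


def tunneling_rate(omega_eff, instanton_action):
    """k_r = (omega / 2 pi) * exp(-S_I), with S_I in units of hbar."""
    return omega_eff / (2.0 * math.pi) * math.exp(-instanton_action)


def forward_rate(k_eq, k_reverse):
    """k_f = K * k_r, which follows from K = k_f / k_r."""
    return k_eq * k_reverse


# Hypothetical inputs: a 3000 cm^-1 mode scaled by 0.90 and an action of 40.
omega = angular_frequency(3000.0, scale=0.90)
k_r = tunneling_rate(omega, instanton_action=40.0)
k_f = forward_rate(k_eq=0.1, k_reverse=k_r)
print(f"k_r = {k_r:.3e} s^-1, k_f = {k_f:.3e} s^-1")
```

The exponential dependence on the action is the point of the sketch: each additional unit of $S_I$ lowers the rate by a factor of $e$, so modest changes in geometry or frequencies can shift the rate constant by orders of magnitude.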
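To describe the kinetics of the tautomeric transitions of schemes (2) and (3), the standard equations for the kinetics of reversible first-order reactions [52] have been employed:

$$C^*(t) = C^*_{eq} - \left(C^*_{eq} - C^*_i\right) e^{-(k_f + k_r)t} \qquad (5)$$

$$C^*_{eq} = \frac{K\left(C_i + C^*_i\right)}{1 + K} \qquad (6)$$

The fraction of C transformed into C* at time t for $C^*(t) \ll C_0$ can be calculated as

$$c^*(t) = \frac{C^*(t)}{C_0} = \frac{k_f}{k_f + k_r}\left[1 - e^{-(k_f + k_r)t}\right] \qquad (8)$$

In the limit where $(k_f + k_r)t \ll 1$, this reduces to

$$c^*(t) = k_f t \qquad (9)$$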
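The behaviour of a reversible first-order reaction, and the short-time limit $c^* = k_f t$ used later in the discussion, can be checked numerically. The sketch below assumes textbook reversible first-order kinetics starting from the pure canonic form. The forward rate constant is the guanine value quoted in the text ($1.5 \times 10^{-7}$ s$^{-1}$), while the reverse rate constant is a hypothetical value chosen only for illustration.

```python
import math


def rare_fraction_at(t, k_f, k_r):
    """c*(t) = k_f / (k_f + k_r) * (1 - exp(-(k_f + k_r) t)), starting from pure C."""
    k_sum = k_f + k_r
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return k_f / k_sum * (-math.expm1(-k_sum * t))


def linear_limit(t, k_f):
    """Short-time approximation c*(t) = k_f * t, valid for (k_f + k_r) t << 1."""
    return k_f * t


K_F = 1.5e-7   # s^-1, forward rate constant quoted in the text for isolated guanine
K_R = 7.5e-6   # s^-1, hypothetical reverse rate constant (assumption)

for t in (10.0, 100.0, 1000.0):
    exact = rare_fraction_at(t, K_F, K_R)
    approx = linear_limit(t, K_F)
    print(f"t = {t:6.0f} s  exact = {exact:.4e}  linear = {approx:.4e}")
```

With these inputs the product $(k_f + k_r)t$ stays below $10^{-2}$ for synthesis times up to 1000 s, so the linear expression deviates from the full one by well under one percent.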

Results and Discussion
The values of the calculated Gibbs free energies of tautomerization $\Delta G_{298}$ together with the corresponding equilibrium constants are presented in Table 1. One can see the predominance of the canonic forms of both cytosine and guanine. The results suggest that the rare form of cytosine will constitute from 6 to 31% of the mixture of tautomers, depending on the level of calculations. Similarly, the calculated equilibrium fraction of the rare guanine tautomer ranges from 2 to 45% (see Table 1 and Refs. [33,53]). Even though the precise concentrations of the rare forms have not been determined experimentally, these results agree qualitatively with the available spectroscopic data [17][18][19] (note that we do not consider the aminohydroxo form of cytosine, which was shown to be the most stable in the gas phase, because it cannot be present in DNA owing to the substitution of the hydrogen atom at the N1 position by a sugar moiety).

[53]. ‡ Thermochemical data used for the calculation of ∆G were calculated at the HF/6-31G(d) level of theory. The value of the thermal correction to the enthalpy was scaled by 0.9.

Even though it was shown experimentally that the canonic forms of both cytosine and guanine are the only tautomers present in water solution [16,[25][26][27][28][29][30][31], it is reasonable to consider complexes with a limited number of water molecules. In the previous study of the guanine tautomerization, one and two water molecules were added to the guanine tautomers. Since only insignificant differences were found between the mono- and dihydrated complexes, the present study has been limited to one water molecule. The data presented in Table 1 show that hydration of the cytosine and guanine tautomers with one water molecule widens the gap between the stabilities of the two tautomeric forms of each base, significantly decreasing the concentration of the rare tautomer.
An even more profound difference between the isolated and monohydrated bases is observed for the height of the tautomerization (hydrogen transfer) barriers U (see Table 2). The absence of water makes the process of proton transfer very slow. Introduction of one water molecule decreases the barrier height by approximately one half, making this process much faster (see the rate constants in Table 2). Indeed, substitution of the rate constants into equation (7) gives the time needed to reach approximately 90% of the equilibrium concentration of the rare tautomer starting from the pure canonic form. For isolated cytosine, reaching this concentration is predicted to take $10^4$ to $10^6$ hours, depending on the level of calculations. For guanine, the calculated value amounts to 73 hours.
One can see that these numbers change dramatically when a water molecule facilitates the proton transfer. The results show that the values of the rate constants increase by up to sixteen orders of magnitude for cytosine and by up to twelve for guanine. The significance of that change is that the time required for reaching the equilibrium concentrations decreases to $10^{-9}$ to $10^{-4}$ seconds for cytosine and to about $10^{-8}$ seconds for guanine [33]. In both cases the equilibrium is achieved almost instantly. One should note that there is a significant difference between the rate constants calculated with geometries and frequencies obtained at the Hartree-Fock level and those obtained at the Møller-Plesset level of theory (Table 2), even though the same energetic parameters (barrier height and reaction exothermicity) were used in both cases. In contrast, the corresponding difference for the isolated bases is close to zero. This illustrates the complexity of the proton transfer process, which can be evaluated differently at various levels of theory. Note that the calculations of the rate constants for the guanine tautomerization [33] were performed using geometries and frequencies obtained at the Hartree-Fock level of theory.
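The "time to reach 90% of equilibrium" quoted above follows from the relaxation rate $k_f + k_r$ of a reversible first-order reaction. The Python sketch below assumes that standard kinetics and the pure canonic form as the starting point, and inverts the relation to show which relaxation rate corresponds to the 73 hours quoted for isolated guanine. The function names are hypothetical, and the inferred rate is an illustration rather than a value reported in Table 2.

```python
import math


def time_to_reach(fraction_of_equilibrium, k_f, k_r):
    """Time at which C*(t) reaches the given fraction p of its equilibrium value.

    For first-order relaxation from pure C, t = ln(1 / (1 - p)) / (k_f + k_r).
    """
    if not 0.0 < fraction_of_equilibrium < 1.0:
        raise ValueError("fraction_of_equilibrium must lie strictly between 0 and 1")
    return -math.log1p(-fraction_of_equilibrium) / (k_f + k_r)


def rate_sum_from_time(fraction_of_equilibrium, t_seconds):
    """Invert the relation above: k_f + k_r implied by a known relaxation time."""
    return -math.log1p(-fraction_of_equilibrium) / t_seconds


# Relaxation rate implied by the 73 hours quoted for isolated guanine.
k_sum = rate_sum_from_time(0.9, 73 * 3600.0)
print(f"k_f + k_r = {k_sum:.2e} s^-1")

# Round trip: the same rate sum gives back 73 hours.
hours = time_to_reach(0.9, k_f=1.5e-7, k_r=k_sum - 1.5e-7) / 3600.0
print(f"time to 90% of equilibrium = {hours:.1f} h")
```

Since the time scales as the inverse of $k_f + k_r$, an increase of the rate constants by many orders of magnitude on hydration shortens the equilibration time by the same factor.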
Footnotes to Table 2: † Data from ref. [43]. ‡ The geometries and frequencies used for the calculations of rate constants were taken from HF/6-31G(d) results. The frequencies were scaled by 0.9. § The geometries and frequencies used for the calculations of rate constants were taken from MP2/6-31G(d) results.

In order to discuss the biological consequences of these findings, we apply the data in Tables 1 and 2 directly to the interpretation of the frequency of spontaneous GC → AT transitions in E. coli, assuming that all mutations are produced through reactions (2) and (3) for cytosine and the analogous transformations for guanine. The frequency of the spontaneous GC → AT transitions in E. coli has been found to be in the range of (3-9) × $10^{-10}$ [34]. The values of c* for cytosine (and g* for guanine) depend strongly on the time allotted for DNA synthesis. However, this time should be shorter than (1.2-3) × $10^3$ seconds, which is the time required for the generation of a new E. coli cell. One can see that if we consider partially hydrated complexes of the bases, the equilibrium is reached almost instantly. Thus, the fraction of the noncanonical form of cytosine would be c* = K = (1-3) × $10^{-2}$ for monohydrated tautomers. The analogous value for guanine is g* = 2 × $10^{-3}$ [33]. Both of these values are significantly larger than the observed frequency of the spontaneous mutations. If, on the other hand, we consider isolated bases, the equilibrium concentrations of the rare forms are higher, but the time required to reach them is too long. In that case, using eq. (9), we can calculate the fractions as c* = $k_f t$ = (3-10) × $10^{-9}$ t for cytosine and g* = 1.5 × $10^{-7}$ t for guanine, where t is the time of the DNA synthesis in seconds, which should be less than $10^3$ s. Therefore, for a DNA synthesis time in the range from 10 to 1000 s, the values of c* and g* are approximately $10^{-8}$ to $10^{-5}$ and $10^{-6}$ to $10^{-4}$, respectively. However, these values are still significantly larger than the observed value of (3-9) × $10^{-10}$.
In order to correct the errors that occur during the DNA synthesis, DNA polymerase checks the newly synthesized DNA strand and corrects most of the incorrect bases [45,54,55]. It was shown experimentally that this "proof-reading" step reduces the number of mutations by a factor of $10^2$ to $10^3$. Such a significant reduction should also be considered when comparing the calculated and observed frequencies of the mutations. Therefore, the frequency of the spontaneous GC → AT transitions before the checking step should be approximately in the range of $10^{-8}$ to $10^{-6}$. One can see that the fractions of rare tautomers in the monohydrated species of cytosine and guanine are still much larger than this range. On the other hand, the values calculated for isolated cytosine fit this range very well, and those for isolated guanine are on the border of this range. However, given the uncertainty in the DNA synthesis time and in the real values of the barrier heights and rate constants, we can only conclude that the predicted values for the frequencies of mutations are similar to those derived experimentally.
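The comparison made in the last two paragraphs is simple interval arithmetic, and can be written out as a short Python sketch. All numbers are those quoted in the text; the 10 to 1000 s synthesis window is the range considered above, and the function and variable names are hypothetical.

```python
# Numbers quoted in the text.
OBSERVED_FREQUENCY = (3e-10, 9e-10)   # spontaneous GC -> AT transitions in E. coli
PROOFREADING_FACTOR = (1e2, 1e3)      # reduction achieved by the proof-reading step
SYNTHESIS_TIME_S = (10.0, 1000.0)     # range of DNA synthesis times considered


def pre_proofreading_window(observed, factor):
    """Mutation frequency range before proof-reading: observed times reduction factor."""
    return (observed[0] * factor[0], observed[1] * factor[1])


def linear_fraction_range(k_f_range, t_range):
    """Range of c* = k_f * t over the given rate-constant and time ranges."""
    return (k_f_range[0] * t_range[0], k_f_range[1] * t_range[1])


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


TARGET = pre_proofreading_window(OBSERVED_FREQUENCY, PROOFREADING_FACTOR)

CANDIDATES = {
    "isolated cytosine": linear_fraction_range((3e-9, 1e-8), SYNTHESIS_TIME_S),
    "isolated guanine": linear_fraction_range((1.5e-7, 1.5e-7), SYNTHESIS_TIME_S),
    "monohydrated cytosine": (1e-2, 3e-2),   # equilibrium fraction c* = K
    "monohydrated guanine": (2e-3, 2e-3),    # equilibrium fraction g*
}

print(f"target window: {TARGET[0]:.1e} to {TARGET[1]:.1e}")
for name, (low, high) in CANDIDATES.items():
    verdict = "overlaps" if overlaps((low, high), TARGET) else "outside"
    print(f"{name:22s} {low:.1e} to {high:.1e}  {verdict}")
```

With strict bounds the isolated-guanine interval starts just above the upper edge of the target window, which corresponds to the "on the border" wording above, whereas the monohydrated fractions lie several orders of magnitude above it.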

Conclusions
In this study we have continued the investigation of the contribution of tautomerization to spontaneous point mutations by obtaining thermodynamic and kinetic data for the tautomerization of cytosine with quantum-chemical and direct-dynamics methods. We have found that the fraction of the rare tautomer in the monohydrated complexes of both cytosine and guanine significantly exceeds the amount needed to account for the observed frequency of GC → AT mutations. On the other hand, the values obtained for isolated cytosine are in good agreement with the experimental results. However, since the fraction of the rare guanine tautomer generated in the same amount of time considerably exceeds that of the rare cytosine tautomer, the contribution of the latter to spontaneous point mutations of the GC → AT type should be regarded as insignificant. We would also like to note that the predicted equilibrium and rate constants are sensitive to the level of calculations (level of theory and basis set), which suggests that higher-level calculations should also be performed.

where $C^*(t)$ is the current concentration of C* at time t; $C_i$ and $C^*_i$ are the initial concentrations of C and C*; $k_f$ and $k_r$ are the forward and reverse rate constants; $C^*_{eq}$ is the equilibrium concentration of C*; and K is the equilibrium constant for the tautomerization process. From equations (5) and (6) we can calculate the time necessary to reach a given concentration C* as

$$t = \frac{1}{k_f + k_r}\,\ln\frac{C^*_{eq} - C^*_i}{C^*_{eq} - C^*(t)} \qquad (7)$$
† Data from ref.