Computational Estimation of the Acidities of Pyrimidines and Related Compounds

Pyrimidines are key components in the genetic code of living organisms and the pyrimidine scaffold is also found in many bioactive and medicinal compounds. The acidities of these compounds, as represented by their pKas, are of special interest since they determine the species that will prevail under different pH conditions. Here, a quantum chemical quantitative structure–activity relationship (QSAR) approach was employed to estimate these acidities. Density-functional theory calculations at the B3LYP/6-31+G(d,p) level and the SM8 aqueous solvent model were employed, and the energy difference ∆EH2O between the parent compound and its dissociation product was used as a variation parameter. Excellent estimates for both the cation → neutral (pKa1, R2 = 0.965) and neutral → anion (pKa2, R2 = 0.962) dissociations were obtained. A commercial package from Advanced Chemical Design also yielded excellent results for these acidities.


Introduction
Pyrimidines play a central role in the terrestrial genetic code and the pyrimidine framework is also found in many bioactive compounds and medicines. As measurements of molecular properties are frequently difficult or expensive, it is of interest to develop theoretical means for estimating these properties. Among the most important and interesting properties of the pyrimidines are their acidities, as represented by their pK a values, which determine the forms of the compounds that will prevail in solution under different pH conditions. As a result, there has been a long-standing interest in estimating the pK a s of chemical compounds using theoretical approaches [1,2]. This interest continues, as demonstrated by the broad range of methods employed in recent pK a studies [3][4][5][6][7][8][9][10]. In an earlier study by our group [11], we presented computational estimates of the pK a s of the biologically related purines and indoles. In the present work we develop estimates for the acidities of pyrimidines and related compounds.
Three main approaches have been used in studies estimating compound acidities [1,12]. In the first approach, first-principles or absolute pK a s are determined in a straight-forward manner using direct calculations, often relying on a thermodynamic cycle to separate different hypothetical stages [13,14]. The advantage of this approach is that it follows standard procedures and does not depend on prior knowledge of experimental results. Its disadvantage is that it normally requires a high level of computational effort to achieve reasonable accuracy. A second approach employs the development of an appropriate Quantitative Structure-Activity Relationship (QSAR) for the acidities of a set of compounds [2,15]. This approach requires first the collection of measured experimental pK a values for the set and then the identification of some suitable molecular parameter that is closely related to the acidity. An advantage of this approach is that it can generally achieve high accuracy while employing more modest computational efforts, and it accordingly allows the estimation of the pK a s of related, unreported compounds using the same modest computational effort. Furthermore, digressions from the derived regression algorithms (i.e., outliers) can alert practitioners to compounds that may be exhibiting behaviors different from the others in the sample [16][17][18]. In a third approach, a number of commercial programs use algorithms derived from large acidity databases and employ empirical parameters such as Hammett constants to estimate pK a s [19,20]. The latter two approaches were employed in the present study.
As noted, the QSAR approach relies on the discovery of some parameter or property of the compounds examined that correlates with the activity of interest, in the present case the pK a . In many cases, partial atomic charges have been employed as variation parameters for acidities and other properties. However, because the notion of an "atomic charge" in a molecule is not a proper quantum chemical observable, a variety of different approximate schemes have been developed to represent these atomic charges. Of these proposed schemes, our own group [21][22][23][24] and others [5,[25][26][27] have found the natural population analysis (NPA) orbitals and charges developed by Weinhold et al. [28][29][30] to be especially useful in this role. More recently, we have also used another parameter, the energy difference ∆E H2O between the parent compound and its main dissociation product for this purpose. It is this parameter that was employed in the present study.
A particular difficulty arises in a study of the pyrimidines because these compounds do not typically appear as a single species in gas phase or solution, but rather are present as a collection of related tautomers, a situation that also prevailed in our earlier study of purines [11]. Accordingly, some accommodation must be made for this condition, as will be described in the following section.

Methods
Our aim in a recent series of reports has been to design and evaluate a QSAR protocol suitable for producing accurate pK a estimates for selected classes of compounds while employing relatively modest and commonly available computational tools.
The initial step in a pK a QSAR study involves the collection of reported experimental pK a values from the literature for the class of compounds of interest. In the present case, values for both the cation-to-neutral dissociation, AH 2 + → AH + H + , which we will designate pK a1 , and the neutral-to-anion dissociation, AH → A − + H + , which we will call pK a2 , were available for a number of pyrimidines and related compounds. These values are tabulated in Table 1. Given in Table 1 also are computed pK a values obtained from the Advanced Chemical Development (ACD) [31] commercial software package. Table 1. Reported experimental pK a1 and pK a2 values and ACD computed values for the pyrimidines and related heterocycles studied.

No.
Compound Formula pK a1 ACD pK a1 pK a2 ACD pK a2 Calculations were carried out using the Spartan'10 software package (Wavefunction, Inc., Irvine, CA). In earlier pK a studies, we found that density-functional computations at the B3LYP/6-31+G(d,p) level provided accurate accounts of molecular properties while still requiring only modest computational demands. After testing this assumption (vide infra) against available gas-phase experimental results for pyrimidines in the NIST database [40], this level of theory was adopted in the present work. For the studies in aqueous solution, the SM8 aqueous solvent model of Marenich et al. [41,42] was used. This same solvent model was also employed in our earlier study of purines and indoles [11] and other studies [16,17,43], where it was shown to perform well in helping to reproduce the experimental data.
As noted above, many of the compounds examined here exist in several tautomeric forms in solution. (For example, uracil has three cationic tautomers, six neutral tautomers, and two anionic tautomers.) Accordingly, the relative stabilities of the tautomeric formscationic, neutral, and anionic-of each compound were evaluated within the SM8 aqueous solvent model, and the most stable tautomer for each condition was taken as representative of that compound for computational purposes [44]. Fortunately, there frequently exists a substantial energy gap (>25 kJ/mol) between the most stable tautomer and the remaining tautomers of that species; however, ultimately the validity of this approximation must await justification from the results of the subsequent analysis. In earlier studies, we found that for neutral → anion dissociations (pK a2 ) the value ∆E H2O = E H2O (A − ) − E H2O (AH) for the energy difference between the parent compound AH and its dissociation product Ain aqueous solution provides an excellent regression parameter for QSAR pK a studies, and after testing several other parameters this descriptor was used in the present studies. For the cation → neutral dissociation (pK a1 ), the analogous expression ∆E H2O = E H2O (AH) − E H2O (AH 2 + ) was used. We also provide a note of caution regarding directly comparing numerical results found here using Spartan'10 with those obtained using the popular Gaussian computational package (Gaussian, Inc., Wallingford, CT 06492, USA), since these programs use different basis sets for some atoms [18,45].

Results and Discussion
We first wished to assure that the B3LYP/6-31+G(d,p) level of computation was sufficient to provide accurate results for the dissociations in question. For this, we turned to gasphase reaction results reported in the NIST chemical database [24]. This database contains gas-phase thermodynamic data for the Gibbs energy change ∆ r G • for the anion + H + → neutral reaction, for six of the compounds studied here. These experimental data are compared with our computed ∆ r G • values in This, and the coefficient of determination of R 2 = 0.998 between the computed and experimental values, suggest that the level of computation described should provide a good account of the reactions to be considered. As noted above, in previous studies we have found that the energy difference ∆E H2O between the parent compound and its dissociation product provides an excellent regression variable for pK a QSAR estimations. We first examined employment of the gas-phase ∆E values for this purpose. As expected, the use of ∆E gas values yielded good, but not exceptional, correlations for both the pK a1 (cation → neutral, R 2 = 0.707) and pK a2 (neutral → anion, R 2 = 0.874) dissociations.
We next optimized the compounds within the SM8 aqueous solvent model. Using the solvent-optimized structures and the calculated ∆E H2O values for the appropriate reactions, we obtained the following QSAR models: In these equations, n = the number of compounds in the sample, R 2 is the coefficient of determination (the fraction of the variance in the data accounted for by the model), s is the standard error of the estimate, and F is the Fisher statistic. It is evident that optimization of the structures within the solvent model significantly increases the accuracy of the model, as was also shown in earlier work [1,[7][8][9].
The results for pK a1 and pK a2 are plotted in Figures 1 and 2, and the calculated values are compared with the experimental values in Tables 3 and 4. We note that several of the pK a s for the cation → neutral dissociation (pK a1 ) fall into the difficult-to-measure negative value range and carry large uncertainties. Accordingly, these values are also not well characterized for use in this range in forming the regression Equation (1), and we prefer to recognize this uncertainty by simply indicating modestly negative (<0) or significantly negative (<<0) for the pK a s of these compounds. In these equations, n = the number of compounds in the sample, R 2 is the coefficient of determination (the fraction of the variance in the data accounted for by the model), s is the standard error of the estimate, and F is the Fisher statistic. It is evident that optimization of the structures within the solvent model significantly increases the accuracy of the model, as was also shown in earlier work [1,[7][8][9].
The results for pKa1 and pKa2 are plotted in Figures 1 and 2, and the calculated values are compared with the experimental values in Tables 3 and 4. We note that several of the pKas for the cation → neutral dissociation (pKa1) fall into the difficult-to-measure negative value range and carry large uncertainties. Accordingly, these values are also not well characterized for use in this range in forming the regression Equation (1), and we prefer to recognize this uncertainty by simply indicating modestly negative (<0) or significantly negative (<<0) for the pKas of these compounds.    We also tested the ability of a commercial software program, Advanced Chemical Development, Inc.'s ACD/Labs PhysChem Percepta Suite, to estimate these pK a s. The results for pK a1 showed an excellent correlation: pk a1 (exp.) = 1.07 (±0.03) · pk a1 (ACD) + 0.34 (±0.2) where n = 19, R 2 = 0.991, s = 0.334, and F = 1583.
These results encourage use of this software for studies of the pK a s of these compounds.

Conclusions
A primary endeavor of scientific studies is to develop models of physical, chemical, and biological systems for the purpose of understanding these systems better. All models are by their very nature approximate. However, as Gauch has noted [46], in some casescounterintuitively-a model can be more accurate than the data from which it is constructed "because it amplifies hidden patterns and discards unwanted noise" inherent in the system being examined. The QSAR equations used here take advantage of this property by "averaging through" the noise, or random errors, in the experimental pK a data. It is evident that both the QSAR Equations (1) and (2) above provide relatively simple means, via mathematical models, for estimating the pK a s of the pyrimidines and related compounds. For example, in order to estimate the pK a s of unmeasured compounds in this class or to check the reported pK a s of measured compounds, using the QSAR equations one merely needs to determine ∆E H2O for the compound and then evaluate the pK a from the appropriate regression equation. Therefore, the equations provided should allow reasonable estimations for the pK a s of other pyrimidines and compounds similar to the pyrimidines. Use of the commercial ACD/Labs program can provide a further independent and very useful check on the pK a estimates.